Urban transportation sustainability assessments: a systematic review of literature

ABSTRACT The volume of urban transportation sustainability assessments in academic literature has steadily increased over the last two decades. This paper targets these studies through the first systematic literature review to construct a synthesised and critical overview of how urban transportation sustainability is in fact assessed. The sample consists of 99 peer-reviewed articles retrieved via three scientific search engines. The results reveal a Europe-centric and single-case focus, a strong interest in introducing new indicator systems with limited references to previous work, and a lack of qualitative approaches and stakeholder diversity in the assessment methods. Nearly 2400 indicators are identified in the articles with significant variation in their use. Furthermore, the comprehensive accounting for sustainability is often overlooked, and the inconclusive assessment results are often noted by the authors of the sample articles themselves. Our findings signal that the research field is highly fragmented and to some extent fails to accumulate knowledge generated by past studies and to comprehensively operationalise the concept of sustainability. The identified shortcomings of the assessments and their implications for transportation policy-making and planning are highlighted, and, based on our results, recommendations to develop more reliable, comparable, and inclusive sustainability assessments for the urban transportation sector are made.

Currently, different indicator systems represent the dominant approach in urban transportation sustainability planning and policy-making (Olofsson et al., 2016; Sdoukopoulos et al., 2019). However, a mismatch between the overarching definitions of sustainability and what is implemented in practice exists (Marsden, Kimble, Nellthorp, & Kelly, 2010). The utilised assessment methods and indicators often lack comprehensive coverage of the sustainability concept (Jeon & Amekudzi, 2005; Marsden et al., 2010), and they are rarely developed in cooperation with diverse stakeholders (Banister, 2008; Olofsson et al., 2016). Additionally, these assessment tools may not be locally applicable, nor the selection of indicators scientifically valid (Olofsson et al., 2016). The marginalisation of data is common, and often caused by a focus on easy-to-measure and easy-to-achieve policy objectives, thus excluding the more subjective and qualitative aspects of sustainability that frequently relate to social equity and well-being (Jeon & Amekudzi, 2005; Marsden et al., 2010; Olofsson et al., 2016). Finally, it must also be acknowledged that even a comprehensive set of sustainability indicators does not automatically lead to comprehensive and inclusive policies and planning (Kennedy, Miller, Shalaby, MaClean, & Coleman, 2005; Marsden et al., 2010).
No systematic literature reviews of urban transportation sustainability assessments have been published to date. While three review articles on similar topics exist (Gillis et al., 2016; Nadi & Murad, 2017; Sdoukopoulos et al., 2019), these reviews are not systematic and the applied literature samples and their analyses are either very narrow (Nadi & Murad, 2017) or solely focus on indicators (Gillis et al., 2016; Sdoukopoulos et al., 2019). This systematic review addresses this gap. The growing body of literature needs to be reviewed in order to identify potential issues in the relationship between the ideal comprehensive definition of sustainability and its hands-on applications in academia. By synthesising and critically examining the trends, and particularly the limitations of the assessments, an overview of the field is constructed, creating a space for discussing the assessments in relation to one another including challenges, best practices, and development needs. Thus, the following research question is posed: How is urban transportation sustainability assessed in academic literature?

Systematic literature review
Systematic literature reviews focus on a clearly defined topic derived from a research question, apply pre-determined criteria for document search and selection (Berrang-Ford, Pearce, & Ford, 2015; Cooper & Hedge, 1994), and utilise a coherent analytical framework for reviewing the content (Berrang-Ford et al., 2015). Furthermore, a synthesis and critical examination of the acquired sample and the results should be embedded (Berrang-Ford et al., 2015; Van Wee & Banister, 2016).
Bibliometric analysis and qualitative content analysis are often applied together in systematic reviews (Siders, 2019). Bibliometric analysis can be used to study, for example, publication types, journal titles, inter-field connections, and frequencies of publications over time (Landauer, Juhola, & Soderholm, 2015; Siders, 2019). Qualitative content analysis focuses on the context of selected key words and requires manual reading of the analysed content guided by a questionnaire and a coding scheme (Krippendorf, 2004; Neuendorf, 2002) to achieve a systematic analysis of the sample (Berrang-Ford et al., 2015).
In order to answer the research question how urban transportation sustainability is assessed, sets of descriptive and critical sub-questions are applied. First, the descriptive sub-questions focus on the research designs of the studies and cover when and where the research was conducted, which transportation modes are assessed, whether the urban area is addressed partially or as a whole, what cases are applied, what the objects of the assessment are, and what methods and sustainability indicators are applied. Second, the critical sub-questions examine whether new assessment frameworks, tools, or indicator systems are introduced, where the sustainability indicators are sourced from, what limits the use and coverage of the sustainability indicators, and which sample articles comprehensively apply the concept of sustainability. These sub-questions enable a critical analysis of the potential limitations of indicator use, the accumulation of knowledge, as well as the use of diverse and local knowledge, and inclusive and novel approaches in the assessment literature.

Search sequences and search engines
To ensure a representative sample of the literature, three different scientific search engines were used. Scopus and Web of Knowledge (WoK) provide the most extensive scientific article databases in environmental and social sciences (Landauer et al., 2015). Google Scholar (GS) was used as a complementary search engine with a looser search sequence. No time frame limitations were set.
Scopus and WoK enquiries with the following search sequence were conducted on 25 January 2019:
Title-abs-key / Topic: (urban OR cit* OR metropolitan OR municipal*) AND (transport* OR transit OR mobility) AND sustainab* AND (transport* OR *cycling OR walking OR walkability OR pedestrian*) AND (assess* OR indicator* OR measure* OR framework) AND (policy* OR planning)
Title: (transport* OR transit OR mobility OR *cycling OR walking OR walkability OR pedestrian* OR assess* OR indicator* OR measure* OR framework OR policy* OR planning) AND sustainab*
The search was limited to journal articles and review articles, and it yielded 270 articles for WoK and 363 for Scopus.
The complementary GS search was conducted on 30 January 2019 with the following search sequences:
(1) urban transport sustainability indicator
(2) urban mobility sustainability indicator
The GS search was conducted in an incognito window while logged out of all Google accounts (as recommended by Gusenbauer (2019)) to ensure an unbiased order of hits. Due to its unknown retrieval mechanisms, GS is potentially an unpredictable and unreliable single source for literature reviews; however, it is also recognised as a highly multidisciplinary database for scholarly literature (Gusenbauer, 2019). Search (1) was limited to the first 500 hits, while search (2) was limited to 200 hits due to the abundant non-transportation related articles and repetition with results from search (1). Together, the GS searches returned 95 new articles. In total, our sample consisted of 509 articles after duplicate removal.
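The merging of results from several search engines and the subsequent duplicate removal can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure the authors used: the record format, the `normalise_title` and `merge_without_duplicates` helpers, and the choice to match records on normalised titles are all hypothetical, and the records are invented placeholders.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace (assumed matching rule)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_without_duplicates(*result_lists):
    """Merge result lists in order, keeping the first record seen for each normalised title."""
    seen = set()
    merged = []
    for results in result_lists:
        for record in results:
            key = normalise_title(record["title"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Invented placeholder records, one list per search engine.
wok = [{"title": "Urban transport indicators: A review", "source": "WoK"}]
scopus = [
    {"title": "Urban Transport Indicators - a review", "source": "Scopus"},
    {"title": "Walkability and planning", "source": "Scopus"},
]
gs = [{"title": "Walkability and planning.", "source": "GS"}]

sample = merge_without_duplicates(wok, scopus, gs)
print(len(sample), [record["source"] for record in sample])
```

For scale, the two database searches reported above returned 270 + 363 = 633 records before duplicates were removed, against a reported total of 509 articles once the GS results were added and duplicates removed.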

Selection criteria
The selection criteria consisted of the following required article foci (1-5) and characteristics (6-7): (1) Passenger transport and mobility, (2) Urban area, (3) Sustainability assessment, (4) Sustainability indicators or criteria, (5) Policy and planning process, (6) Peer-reviewed journal article, and (7) Article in English. Figure 1, below, presents the systematic selection process. The criteria-based selection and deselection of extracted articles was divided into two rounds, resulting in the final set of 99 articles.

2.3. Analysis of the sample
2.3.1. Coding
Coding was conducted in stages during spring and summer of 2019. First, the authors (referred to as coders) read a subset of 10 articles independently and recorded their findings into separate Excel 2016 workbooks (Microsoft, Redmond, WA, USA). The workbooks only included the pre-coded author, publication year, title, and journal information, alongside the coding framework based on the questionnaire presented in Section 2.1. The two sets of coding were then compared to assess the coding framework and the consistency between the two coders. The compared sets of data were nearly identical, enabling the first full round of coding to be carried out with some minor adjustments to the framework. A second full round of coding followed. The results of the two full rounds of coding were consistent, and the few individual differences were triple-checked and corrected based on the sample article content. See Supplementary material for final codebook and coding framework.
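The comparison of two independently coded data sets can be illustrated with a short sketch. It is a hypothetical example, not the authors' workflow (which used Excel workbooks): the `compare_codings` helper, the field names, and the codes are invented, and the sketch assumes both coders filled in the same articles and fields.

```python
def compare_codings(coder_a: dict, coder_b: dict):
    """Return the share of matching codes and the cells where the two coders differ.

    Assumes both dictionaries contain the same articles and the same fields.
    """
    cells = [(article, field) for article in coder_a for field in coder_a[article]]
    differences = [
        (article, field, coder_a[article][field], coder_b[article][field])
        for article, field in cells
        if coder_a[article][field] != coder_b[article][field]
    ]
    agreement = 1 - len(differences) / len(cells)
    return agreement, differences


# Invented placeholder codes for two articles and two framework fields.
coder_a = {
    "article_01": {"main_method": "MCDA", "spatial_scale": "city"},
    "article_02": {"main_method": "modelling", "spatial_scale": "neighbourhood"},
}
coder_b = {
    "article_01": {"main_method": "MCDA", "spatial_scale": "city"},
    "article_02": {"main_method": "modelling", "spatial_scale": "city"},
}

agreement, differences = compare_codings(coder_a, coder_b)
print(f"agreement: {agreement:.0%}")
for article, field, code_a, code_b in differences:
    print(f"re-check {article} / {field}: {code_a!r} vs {code_b!r}")
```

The list of differing cells corresponds to the codes that would be re-checked against the article content.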

Methodological considerations
Due to the highly structured search sequences for WoK and Scopus, it is possible that some relevant articles were excluded (as recognised by e.g. Jurgilevich, Rasanen, Groundstroem, & Juhola, 2017). For this reason, the complementary GS search was carried out with a looser keyword formula. Furthermore, our sample was limited to peer-reviewed academic articles and thus excluded all grey literature and its potentially thematically relevant content. The focus on peer-reviewed articles was used to address quality control and reliability of the literature sample (Berrang-Ford et al., 2015), as well as to reduce the large sample size.
Coder bias likely presents the most significant potential issue in this analysis (Neuendorf, 2002). The bias was controlled for by planning the coding framework collaboratively between the review authors, conducting a limited sample trial coding to expose any initial inconsistencies in the framework when coders worked independently, conducting two full rounds of coding to have two sets of comparable data, and triple-checking the few remaining differences in codes after the second full round of coding.

Bibliometrics
The sample consisted of 99 articles published between the years 2002 and 2019. The average trend line shows a steady overall increase in publications (excluding incomplete 2019 data) over the past 16 years (Figure 2). The sample is dominated by European research with 44 of the articles coming from European research institutions. Twenty-one are from North America and 20 from Asia. Australian and Oceanian studies account for eight articles in the sample, and South American research for six.

Cases, urban spatial scale, and transportation modes
The sample is abundant with cases, consisting mostly of European cities (n = 42), while Asian (n = 24), North American (n = 16), South American (n = 11, mostly Brazilian cities), and Australian and Oceanian (n = 9) cities are applied to a lesser extent. African cities only feature in five articles. Additionally, international and intercontinental comparisons are presented in only a handful of papers. Table 1 presents the transportation modes and the urban spatial scale applied in the assessments. Nearly two-thirds (n = 60) of the articles assess all modes at the city level, while private motorised transportation, cycling, and walking are rarely assessed on their own. Notably, walking, or walkability, is always evaluated at the neighbourhood level, and in general, the focus on space distribution and accessibility is emphasised within the neighbourhood context (e.g. Gössling, Schröder, Späth, & Freytag, 2016; Machler & Golub, 2012). The transportation project studies include assessments of car sharing (Awasthi, Chauhan, & Omrani, 2011) and stakeholder preferences in project evaluation (Bulckaen, Keseru, & Macharis, 2016), for example.

Main methods and data collection methods
Indicator and framework development and conceptualisation, multiple-criteria decision analysis (MCDA), and modelling dominate the main methods used in the sample (Figure 3a). Participatory data collection methods, geographic information system (GIS), documentary analysis, and purely statistical data analysis are also applied as primary methods, but to a lesser extent. However, as data collection methods, the statistical data analysis and participatory data collection methods are the most frequently applied (Figure 3b). MCDA is often coupled with participatory data collection methods (in 13 out of 22 articles), due to the collection of stakeholder preferences that are central to the analysis. Furthermore, literature reviews and statistical analyses are preferred as the data collection methods in the construction of new indicator systems and frameworks. Lastly, multiple data collection methods are applied in 16 articles, and are in many cases combinations of qualitative and quantitative methods, often due to the use of MCDA as the primary method.

What methods are applied to assess what phenomena?
Indicator or framework development. The conceptual indicator or framework development characterises nearly one-third of the sample. These studies synthesise existing knowledge on transportation sustainability indicators and assessments, and present novel ways to evaluate various aspects of urban transportation systems (Table 2). For example, Richardson (2005) introduces frameworks of identified influential transportation system factors and their interactions based on findings from literature and focus groups. Gillis et al. (2016) and Litman (2007) both present comprehensive indicator lists, sourced from previous literature, to aid planning processes. Marsden et al. (2010) discuss the definition of sustainability and develop a framework in collaboration with stakeholders to better account for the comprehensive concept in planning and decision-making. Similarly, Feng and Hsieh (2009) focus on how to better incorporate different stakeholder needs, and the concepts of transport diversity and quality of life, into transportation planning. Indicator development is also at times linked with policy goal evaluation (Black, Paez, & Suthanaya, 2002; Figueroa & Ribeiro, 2013). Furthermore, many papers aim to select appropriate indicators for service performance assessments (e.g. Haghshenas & Vaziri, 2012; Olofsson et al., 2016; Reisi, Aye, Rajabifard, & Ngo, 2014).
MCDA, modelling, and simulations. MCDA and modelling account for approximately one-third of the applied main methods. Both modelling and MCDA are applied to assess diverse issues within the sample; however, a clear majority of the MCDA articles focus on assessing policies and plans. Al-Atawi, Kumar, and Saleh (2016) survey citizen preferences to evaluate and rank different policies. Curiel-Esparza, Mazario-Diez, Canto-Perello, and Martin-Utrillas (2016), Ngossaha, Ngouna, Archimede, and Nlong (2017), and Oses, Roji, Gurrutxaga, and Larrauri (2017) utilise preferences from experts and local leaders in the transportation planning sector for their analyses of policies and performance. Ha, Joo, and Jun (2011) combine expert opinion with GIS data to study walkability. Castillo and Pitfield (2010) focus on evaluating appropriate indicators and, additionally, Wey and Huang (2018) evaluate policies and indicators that account for increased quality of life in transportation planning. Modelling is generally characterised through the creation of different types of planning tools for the transportation sector (e.g. Curtis & Scheurer, 2010; Fedra, 2004); however, some papers apply models to assess specific cases, such as policy impacts and strategies (e.g. Haghshenas, Vaziri, & Gholamialam, 2015; Jonsson, 2008), or service performance (Chen, Bouferguene, Shen, & Al-Hussein, 2019; Rajak, Parthiban, & Dhanalakshmi, 2016). Machler and Golub (2012) construct a vision for increased transportation access in a low-income neighbourhood using a mix of extensive local knowledge and modelling. Future development is only assessed in two articles (Fedra, 2004; Reisi, Aye, Rajabifard, & Ngo, 2016) through scenarios, while the built environment is only examined through an infrastructure project in one paper (Mansourianfar & Haghshenas, 2018).
Participatory data collection methods. These mostly qualitative methods are not generally used as the primary method. Additionally, they show a trend towards evaluations of citizen-related perceptions. Mameli and Marletto (2014), Marletto and Mameli (2012), and Munira and San Santoso (2017) survey citizen opinions on sustainability indicators and policy objectives. Policies are assessed using expert surveys (Palma Lima, da Silva Lima, & da Silva, 2014), and indicator frameworks are conceptualised with expert surveys and interviews (Marsden, Kelly, & Snell, 2006).
Documentary analysis. This primary method is generally applied when assessing policies and plans, although it is also used to evaluate different green certification systems (Gouda & Masoumi, 2017).
Multiple/other. This miscellaneous group includes four articles that combine MCDA and modelling as their main methods to assess policies and plans. Other papers include a study of accessibility in relation to travel modes and behaviour using a place rank method (Vega, 2012) and a study of citizen perceptions using social media data mining and sentiment analysis (Sdoukopoulos, Nikolaidou, Pitsiava-Latinopoulou, & Papaioannou, 2018).

Assessment indicators
3.4.1. Indicator sourcing
Indicators (used as an umbrella term for indicators, criteria, and other variables applied in the sample to assess sustainability) were mainly sourced from academic literature (n = 42); however, only eight articles (Currie & De Gruyter, 2018; De Gruyter et al., 2017; Haghshenas et al., 2015; Jonsson, 2008; Miranda & da Silva, 2012; Olofsson et al., 2016; Palma Lima et al., 2014; Vega, 2012) directly state that an existing indicator system was utilised in their assessment. Grey literature was often used in combination with academic literature. Experts, planners, decision-makers, practitioners, citizens, research group members, databases, surveys, and projects are rarely applied sources. Notably, citizen knowledge was only utilised three times in combination with other sources (Jones, Tefe, & Appiah-Opoku, 2013; Machler & Golub, 2012; Whitmarsh, Swartlig, & Jäger, 2009) and never on its own.

Indicator catalogue
As the volume of indicators was very large (2396 in total) and indicators overlapped frequently, regrouping and synthesis were necessary. Table 3 shows the generated catalogue of sustainability indicators and includes the most frequently applied indicators that are referenced on more than 10 occasions. A maximum of five indicators are presented from each thematic group. See supplementary material for a full list of indicators with article references.
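The two rules stated for the catalogue (indicators referenced on more than 10 occasions, and at most five indicators per thematic group) can be sketched as a small filter. This is an illustration only: the `build_catalogue` helper is hypothetical, ranking within a group by reference count is an assumption, and the group names, indicator names, and counts are neutral placeholders rather than data from the review.

```python
from collections import defaultdict


def build_catalogue(reference_counts, min_references=11, max_per_group=5):
    """Keep indicators referenced more than 10 times, at most five per thematic group.

    Within each group, indicators are ranked by reference count (an assumed rule),
    with ties broken alphabetically.
    """
    by_group = defaultdict(list)
    for (group, indicator), count in reference_counts.items():
        if count >= min_references:
            by_group[group].append((indicator, count))
    return {
        group: sorted(items, key=lambda item: (-item[1], item[0]))[:max_per_group]
        for group, items in by_group.items()
    }


# Placeholder counts keyed by (thematic group, indicator); not taken from the review.
reference_counts = {
    ("group_a", "indicator_1"): 30,
    ("group_a", "indicator_2"): 25,
    ("group_a", "indicator_3"): 20,
    ("group_a", "indicator_4"): 15,
    ("group_a", "indicator_5"): 12,
    ("group_a", "indicator_6"): 11,
    ("group_a", "indicator_7"): 10,
    ("group_b", "indicator_8"): 11,
    ("group_b", "indicator_9"): 3,
}

catalogue = build_catalogue(reference_counts)
for group, items in catalogue.items():
    print(group, items)
```

In this toy input, `indicator_7` and `indicator_9` fall below the threshold, and `indicator_6` passes the threshold but is cut by the five-per-group cap.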
The most common indicators fall under the categories of accidents and fatalities, air pollutants, GHG emissions, energy and resource use, and land use. Modal split, motorisation rates, congestion rates, and travel times are also frequently applied, as well as various accessibility measures. Safety and security are considered relatively often, yet the feeling of safety is only examined in a small number of articles. Environmental sustainability appears to be mostly operationalised through emissions, and to some extent biodiversity and environmental protection. Waste (e.g. recycling of end-of-life vehicles) is, surprisingly, very infrequently applied, alongside indicators such as light disturbance and the number of unresolved environmental cases pending.
Commonly applied social indicators include affordability, quality and access of transportation for the disadvantaged, and public participation in policy-making and planning processes. The ambiguously defined Equity presents one of the most frequently applied social indicators. Many papers list it merely as equity, while some attach it to, for example, air pollution exposure (Jeon, Amekudzi, & Guensler, 2010, 2013), equity for non-drivers, the disabled, and the low-income population (e.g. Jonsson, 2008; Litman & Burwell, 2006), or gender (Santos & Ribeiro, 2013).
Economic concerns are generally represented through efficiency, traffic levels, and congestion rates, but also through investment and operation expenditure, revenues, and expenses for households and the community. The vague Economic efficiency and development is commonly applied, while some articles utilise more specific indicators, such as created employment growth in the mobility sector (e.g. Beria et al., 2012; Joumard & Nicolas, 2010).
Health presents a commonly overlooked theme. In some studies, health impact is applied to generally account for pollution and emission induced health issues, while other studies rely on a similarly general protection and promotion of health indicator. Health hazards, disease burden related to transit, and fatality, injuries and mortality effects resulting from air pollution are all examples of more specific health-related indicators. Additionally, public health benefits from increased physical activity, portion of residents who walk or cycle sufficiently for health, and cycling trips for health represent indicators for derived health benefits from increased physical activity and sustainable mobility.
While Public participation presents the only governance-related indicator in Table 3, a rich set of indicators is applied to capture the governance theme (although most are applied only once). Examples of these governance-related indicators include intercity partnerships, transparency and responsibility, efficient use of government resources, information availability and accessibility, public acceptability, training and knowledge for practitioners, and the presence of various actions, policies, and policy integration. Marketing is also accounted for in some articles.
Issues related to comfort and design also receive little attention. Comfort, crowdedness, and cleanliness, for example, are applied in only a small number of articles to analyse public transportation (e.g. Gillis et al., 2016; Whitmarsh et al., 2009). Safety and attractiveness of the street environment appear most relevant for pedestrians, and include indicators such as presence of street lighting and connected and open communities.
Citizen satisfaction and perceptions are not consistently examined throughout the sample; however, some articles do factor them in. A general satisfaction with the transportation system and service is most commonly applied. In particular, Olofsson et al. (2016), Toth-Szabo and Varhelyi (2012), and Sdoukopoulos et al. (2018) extensively apply the citizen satisfaction and perception aspects as related to, for example, congestion, noise disturbance, or public transportation reliability.
Finally, while the majority of articles claim to assess the transportation system as a whole, a preference to focus primarily on public transportation and private motorised vehicles rather than on walking and cycling exists. For example, public transportation indicators are referenced 159 times in total, whereas indicators related to walking only reach 50 references.

Limitations and challenges in indicator use
The results of the assessments were also examined during the first coding round. However, collecting them proved meaningless because of the high variability in indicator use and the inconclusiveness of the results, which the authors of the sampled articles themselves noted. These self-identified limitations were nevertheless recorded.
The limitations are centred on data availability, and in two cases the inability to process all available data (Campos, Ramos, & Correia, 2009;Miranda & da Silva, 2012). Additionally, Alonso, Monzon, and Cascajo (2015) and Cavalcanti, Limont, Dziedzic, and Fernandes (2017) mention the measurability, reliability, and feasibility of data as limiting factors. Reisi et al. (2014) also specifically note that citizen satisfaction, quality of transportation options, quality of transportation for the disadvantaged and the disabled, noise exposure, and cost of parking could not be evaluated due to lack of appropriate data.

3.5. Comprehensive, diverse, and inclusive sustainability assessments: progressive approaches
Thirty-three sample articles apply indicators from at least the environmental, economic, and social dimensions, while additionally accounting for qualitative social aspects as a particular focus. In most cases they also utilise diverse sourcing for indicators. Using these as criteria, a sub-sample of progressive examples of urban transportation sustainability assessments in terms of comprehensive application of the sustainability concept was identified.
The articles are organised into four groups based on the issue they assess. This subsample was analysed exactly like the rest of the sample, but due to the added value as exemplar approaches, they are highlighted here with more descriptive details.
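The selection rule for this sub-sample can be written as a simple filter over the coded articles. The sketch below is hypothetical: the field names `dimensions` and `qualitative_social_focus`, the `is_progressive` helper, and the records are invented for illustration. Diverse indicator sourcing is left out of the filter because the text describes it as holding "in most cases" rather than as a strict requirement.

```python
REQUIRED_DIMENSIONS = {"environmental", "economic", "social"}


def is_progressive(article: dict) -> bool:
    """True if all three dimensions are covered and qualitative social aspects are a focus."""
    covers_all = REQUIRED_DIMENSIONS <= set(article["dimensions"])
    return covers_all and article["qualitative_social_focus"]


# Invented placeholder records; the field names are assumptions.
coded_articles = [
    {"id": "article_01",
     "dimensions": ["environmental", "economic", "social"],
     "qualitative_social_focus": True},
    {"id": "article_02",
     "dimensions": ["environmental", "economic"],
     "qualitative_social_focus": True},
    {"id": "article_03",
     "dimensions": ["environmental", "economic", "social", "governance"],
     "qualitative_social_focus": False},
]

sub_sample = [article["id"] for article in coded_articles if is_progressive(article)]
print(sub_sample)
```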

Finding the appropriate indicators and assessment techniques
The articles in Table 4 focus on suitable indicator selection, and primarily investigate the transportation system as a whole at the city level. These frameworks or indicators aim to rectify narrow definitions and applications of sustainability, with many articles highlighting liveability, quality of life, communities, and the needs of the most vulnerable citizen groups.
Local planners and experts are included in the indicator selection process in several articles (Ramani, Zietsman, Gudmundsson, Hall, & Marsden, 2011; Ramani, Zietsman, Ibarra, & Howell, 2013; Toth-Szabo & Varhelyi, 2012), and the importance of local context, and data availability and quality is often promoted. The majority of papers rely on literature reviews to generate the data for suitable indicator selection but, for example, Castillo and Pitfield (2010) and Wey and Huang (2018) both apply MCDA in their analysis. In a case study of suitable indicators for Mumbai, India, Nathan and Reddy (2013) emphasise the need to include areas of rapid urbanisation into the sustainable transportation assessment and planning field. Cottrill and Derrible (2015) focus on big data and the possibility to improve sustainability assessment through new technologies that provide increasingly reliable, personalised, and real-time data. Prevailing data availability issues are also highlighted by Ramani et al. (2013) and Toth-Szabo and Varhelyi (2012), stating that the actual operationalisation of some indicators is currently unrealistic and further research is required.

Table 4. Sub-sample articles (1/4) evaluating indicator suitability.
Article | Introduces an assessment framework/indicator system
Castillo and Pitfield (2010). ELASTIC - A methodological framework for identifying and selecting sustainable transport indicators. | x
Cottrill and Derrible (2015). Leveraging big data for the development of transport sustainability indicators. |
Gillis et al. (2016). How to monitor sustainable mobility in cities? Literature review in the frame of creating a set of sustainable mobility indicators. | x
Litman (2007). Developing indicators for comprehensive and sustainable transport planning. | x
Miller et al. (2013). Developing context-sensitive livability indicators for transportation planning: A measurement framework. | x
Ramani et al. (2011). Framework for sustainability assessment by transportation agencies. | x
Ramani et al. (2013). Addressing sustainability and strategic planning goals through performance measures. | x
Santos and Ribeiro (2013). The use of sustainability indicators in urban passenger transport during the decision-making process: The case of Rio de Janeiro, Brazil. | x
Tafidis, Sdoukopoulos, and Pitsiava-Latinopoulou (2017). Sustainable urban mobility indicators: Policy versus practice in the case of Greek cities. | x
Toth-Szabo and Varhelyi (2012). Indicator framework for measuring sustainability of transport in the city. | x
Wey and Huang (2018). Urban sustainable transportation planning strategies for livable city's quality of life. | x

Table 5 lists the articles that focus on system performance and physical infrastructure. Performance is assessed at the city level for all modes in every performance study (Haghshenas & Vaziri, 2012; Miranda & da Silva, 2012; Olofsson et al., 2016; Rajak et al., 2016; Shah, Manaugh, Badami, & El-Geneidy, 2013), while the built environment is assessed through a project (Jones et al., 2013) and at the neighbourhood level (Mansourianfar & Haghshenas, 2018). The methods applied in this group of papers are varied, but all of them introduce a novel assessment framework. Two large-N international comparisons are included in this group (Haghshenas & Vaziri, 2012; Shah et al., 2013). Jones et al. (2013) present a localised sustainability score assessment framework for urban transportation projects in developing countries, and it incorporates both scientific and indigenous knowledge. Local decision-makers, planners, and system providers are also included in the indicator sourcing process in some of the other articles in this sub-sample (Jones et al., 2013; Miranda & da Silva, 2012; Rajak et al., 2016).

Table 5. Sub-sample articles (2/4) evaluating performance and infrastructure.
Article | Introduces an assessment framework/indicator system
Haghshenas and Vaziri (2012). Urban sustainable transportation indicators for global comparison. | x
Jones et al. (2013). Proposed framework for sustainability screening of urban transport projects in developing countries: A case study of Accra, Ghana. | x
Mansourianfar and Haghshenas (2018). Micro-scale sustainability assessment of infrastructure projects on urban transportation systems: Case study of Azadi district, Isfahan, Iran. | x
Miranda and da Silva (2012). Benchmarking sustainable urban mobility: The case of Curitiba, Brazil. | x
Olofsson et al. (2016). Development of a tool to assess urban transport sustainability: The case of Swedish cities. | x
Rajak et al. (2016). Sustainable transportation systems performance evaluation using fuzzy logic. | x
Shah et al. (2013). Diagnosing transportation: developing key performance indicators to assess urban transportation systems. | x

Evaluating policies and plans
Papers in Table 6 evaluate transportation policies and plans. Most include all modes at the city level, but there is also an analysis of a city district (Wann-Ming, 2019), and three project-level studies (Bulckaen et al., 2016). A clear majority apply MCDA as their main method. However, documentary analysis is applied by Chakhtoura and Pojani (2016) in a study of transportation plans and media articles in Paris, and also by Jeon and Amekudzi (2005) in an international review of transportation initiatives that examines the definition, indicators, and metrics of sustainability. Surveying transportation planning experts is included in the analysis of policy actions by Palma Lima et al. (2014), as well as in the MCDA procedures in other papers.
All these papers include diverse sustainability dimensions and indicators in their analyses. Additionally, quality of life is again emphasised by Wann-Ming (2019) in terms of integration with growth management principles, which have emerged in Taipei City, to facilitate the creation of friendly, accessible, and sustainable living environments.

| Article | Introduces an assessment framework/indicator system |
| --- | --- |
| Haghshenas and Vaziri (2012). Urban sustainable transportation indicators for global comparison. | x |
| Jones et al. (2013). Proposed framework for sustainability screening of urban transport projects in developing countries: A case study of Accra, Ghana. | x |
| Mansourianfar and Haghshenas (2018). Micro-scale sustainability assessment of infrastructure projects on urban transportation systems: Case study of Azadi district, Isfahan, Iran. | x |
| Miranda and da Silva (2012). Benchmarking sustainable urban mobility: The case of Curitiba, Brazil. | x |
| Olofsson et al. (2016). Development of a tool to assess urban transport sustainability: The case of Swedish cities. | x |
| Rajak et al. (2016). Sustainable transportation systems performance evaluation using fuzzy logic. | x |
| Shah et al. (2013). Diagnosing transportation: developing key performance indicators to assess urban transportation systems. | x |

Focusing on citizen perceptions, satisfaction, and behaviour
The papers in Table 7 assess citizen perception, preferences, and behaviour. All papers apply a system-level approach to transportation modes and include the urban area as a whole, except for Machler and Golub (2012) who focus on a single low-income neighbourhood. Additionally, Sdoukopoulos et al. (2018) position their evaluation of citizen perceptions towards new suitable indicators and the use of social media and big data in transportation assessments. As expected, this group shows a concentration of participatory data collection methods. Surveys, focus groups, and workshops with citizens and experts are strongly represented in these studies, while stakeholder dialogue analysis (Marletto & Mameli, 2012) and data-mining combined with sentiment analysis (Sdoukopoulos et al., 2018) are also applied.

Table 7. Sub-sample articles (4/4) assessing citizen perceptions.

| Article | Introduces an assessment framework/indicator system |
| --- | --- |
| Machler and Golub (2012). Using a "sustainable solution space" approach to develop a vision of sustainable accessibility in a low-income community in Phoenix, Arizona. | |
| Mameli and Marletto (2014). Can national survey data be used to select a core set of sustainability indicators for monitoring urban mobility policies? | |
| Marletto and Mameli (2012). A participative procedure to select indicators of policies for sustainable urban mobility. Outcomes of a national test. | |
| Munira and San Santoso (2017). Examining public perception over outcome indicators of sustainable urban transport in Dhaka City. | x |
| Sdoukopoulos et al. (2018). Use of social media for assessing sustainable urban mobility indicators. | |
| Whitmarsh et al. (2009). Participation of experts and non-experts in a sustainability assessment of mobility. | |

| Article | Introduces an assessment framework/indicator system |
| --- | --- |
| Using AHP and Dempster-Shafer theory for evaluating sustainable transport solutions. | x |
| Application of fuzzy TOPSIS in evaluating sustainable transportation systems. | x |
| Bulckaen et al. (2016). Sustainability versus stakeholder preferences: Searching for synergies in urban and regional mobility measures. | x |
| Chakhtoura and Pojani (2016). Indicator-based evaluation of sustainable transport plans: A framework for Paris and other large cities. | x |
| Jeon, Amekudzi, and Guensler (2013). Sustainability assessment at the transportation planning level: Performance measures and indexes. | x |
| Palma Lima et al. (2014). Evaluation and selection of alternatives for the promotion of sustainable urban mobility. | |
| Wann-Ming (2019). Constructing urban dynamic transportation planning strategies for improving quality of life and urban sustainability under emerging growth management principles. | |
Indicator sourcing in this group is the most diverse, with many articles using experts, planners, decision-makers, and citizens in their selection processes. Furthermore, diverse stakeholder engagement and the local context are emphasised throughout these articles.

Discussion
The growing body of urban transportation sustainability assessment literature is systematically examined for the first time in this review. A critical overview of the applied research designs, the identified progressive approaches, as well as common drawbacks in the field based on 99 academic articles is produced. To summarise, a clear bias towards the global North exists, single cases are more common than comparisons, and the focus is mostly on motorised transportation as compared to walking and cycling. Additionally, there is a tendency to introduce new assessment indicators and frameworks with very limited references to existing assessment tools, indicator use is highly varied and often limited by data availability, and lastly both the indicator sourcing and stakeholder involvement are narrow. Most importantly and alarmingly, it is evident that urban transportation sustainability is evaluated in countless ways with no common baseline or minimum requirements for the application of the sustainability concept.
Next, we discuss (in Section 4.1) the mismatch between the comprehensive concept of sustainability and its narrow applications in the assessments (stemming from the limitations and high variety in indicator use and data coverage, and the few references to existing assessment tools); (in Section 4.2) the limited use of diverse knowledge and participatory approaches in the assessment literature (derived from limited utilisation of local expert and citizen knowledge, low representation of cases and research from the Global South, and exclusion of qualitative and social aspects of sustainability); and (in Section 4.3) the resulting incomplete, unreliable, and ambiguous assessment results followed by future research needs. These topics critically represent how urban transportation sustainability is assessed in academia, embody the central issues in the assessments, and establish future research directions.

Persisting definition deficit of sustainability
The field of transportation sustainability assessment literature appears to proceed on two tracks. One body of work produces an increasing volume of assessment methods and indicators, while the other debates the persisting definition deficit of sustainability in transportation research and the resulting implications for planning and decision-making. Particularly evident is the mismatch between a conceptually comprehensive sustainability assessment framework and its implementation into practice (Marsden et al., 2010;Marsden & Reardon, 2017;Olofsson et al., 2016;Sultana et al., 2019). The literature calls for the development of more comprehensive assessment tools and indicators (e.g. Olofsson et al., 2016); however, concerns over the academic understanding of their real-life use in policy and planning continue to be raised (e.g. Marsden & Reardon, 2017). Our findings of limited, narrow, and varied indicator use corroborate these statements.
The dominance of easy-to-measure aspects of sustainability has been identified both in the literature (Jeon & Amekudzi, 2005;Marsden et al., 2010) and in practice (Cottrill & Derrible, 2015). This is also evident in the volume and frequency of use of indicators identified in this review, as quantitative sustainability aspects are assessed significantly more often than the qualitative aspects, in particular the social and socio-economic ones. Moreover, many of these qualitative and socio-economic issues are referenced on a more general level (e.g. equity) and are infrequently addressed in detail. Similarly, health is mostly included in these assessments through quantitative exposures to pollution and not through, for example, the benefits of increased physical activity. Our analysis also reveals many qualitative environmental indicators that currently receive little attention. Moreover, the results demonstrate a dominance of public transportation and private motorised transportation indicators over cycling and walking, even in studies that cover the transportation system as a whole.
As sustainability becomes a policy goal for urban areas globally, cities can deliberately select indicators and measures that easily achieve a result that presents a positive view (Toth-Szabo & Varhelyi, 2012). Marsden et al. (2010) emphasise that the uncertainties embedded in the planning and policy-making processes should not automatically affect the comprehensive operationalisation of sustainability in local goals and strategies. We draw similar conclusions, as a quarter of the studies in our sample do not apply all of the indicators identified as relevant in their assessments. Data availability is commonly stated as a challenge for diverse indicator use, as also noted by Sdoukopoulos et al. (2019).
Our results show that although many studies discuss new indicators and assessment techniques, they do not operationalise them into novel data collection methods to ensure improved data coverage. Only two sample articles (Cottrill & Derrible, 2015;Sdoukopoulos et al., 2018) clearly focus on innovative and improved data generation techniques. These techniques employ social media, GIS technology, and applications of big data in new indicator development. Data availability and quality should be linked to the indicator selection more effectively, as highlighted in the sub-sample articles (Section 3.5.1). Restricted indicator use leads to inconclusive and distorted results that can then prompt the development of policies and plans that do not account for sustainability as a whole (Kaur & Garg, 2019;Olofsson et al., 2016;Pearsall & Pierce, 2010;Sdoukopoulos et al., 2019).

Stakeholder and knowledge marginalisation
Based on our results, the qualitative and social aspects of urban transportation sustainability tend to be marginalised in assessments due to limited data collection resources (as also noted by e.g. Cottrill & Derrible, 2015), which then contributes to the marginalisation of diverse local voices in transportation assessments, planning, and decision-making. Further, locally participatory approaches appear extremely scarce, which narrows the pool of knowledge included in sustainability assessments even more.
This review clearly shows the marginalisation of citizens in both the indicator selection processes and the assessments themselves. Even though MCDA, which includes stakeholders when establishing assessment criteria for policy measures, presents a popular research approach in the sample, the studies tend to mainly focus on experts from academia and the planning and policy-making sectors. Only a handful of studies either evaluate citizen perceptions or employ citizen knowledge. Our findings thus align with the conclusions of previous studies that demonstrate the neglect of diverse local and expert knowledge (Marsden et al., 2010;Tennoy, Hansson, Lissandrello, & Naess, 2016). Moreover, this review identifies a shortage of institutions engaging in sustainable transportation research, as well as a lack of cases that focus on the Global South (as also found by Sdoukopoulos et al., 2019). These are critical omissions given that rapid urbanisation is increasing the pressure on existing transportation systems and the demand for expanded public transportation services, in particular. In the Global South, the need for support in sustainability planning is thus growing as increases in traffic and the related health impacts become more and more visible, and sustainability should be introduced early in the planning and research processes as a guiding concept and policy goal (Nathan & Reddy, 2013;Pojani & Stead, 2015).
To summarise, as urban transportation policy goals are context specific, selected indicators should reflect the varying local concerns and employ local knowledge (Marsden et al., 2010). Stakeholder participation has been defined as a sustainability principle as it increases the utilisation of diverse knowledge and supports the inclusion of the local context (Castillo & Pitfield, 2010). Public acceptance and support are essential when implementing successful policies and intended behavioural changes (Banister, 2008), and, for this reason, the research should address public concerns and support a comprehensive view of the pertinent issues (Litman, 2007;Miller, Witlox, & Tribby, 2013). Acknowledging varying local conditions is essential for the production of accurate and meaningful sustainability assessments (Marsden et al., 2010;Sdoukopoulos et al., 2019). However, even when diverse knowledge and comprehensive indicators are applied, a perfect assessment does not equal perfect planning (Kennedy et al., 2005). The selection of indicators presents a challenge for local planners and policy-makers (Olofsson et al., 2016), and the selection process requires clearly defined policy goals and criteria, alongside diverse stakeholder involvement (Litman, 2007;McCool & Stankey, 2004;Olofsson et al., 2016;Pearsall & Pierce, 2010;Sdoukopoulos et al., 2019), and scientific support (Olofsson et al., 2016) to avoid assessment results that are easily manipulated by the chosen sustainability indicators (Litman, 2007).

Inconclusive assessment results
The identified shortcomings discussed above, specifically the limited use of indicators, data, diverse stakeholders, and participatory methods, lead to the marginalisation of many relevant sustainability aspects. This in turn generates incomplete, unreliable, and incomparable results. The assessment results of the sample articles were initially recorded with a plan to gather a set of data on worldwide city transportation sustainability. However, the results did not provide any meaningful comparisons due to the varied indicators, data limitations, and the inconclusive results as identified by the sample article authors.
Although initially unexpected, this finding corroborates the concerns related to the fragmented field and the definition deficit of sustainability. We do not, however, argue for one universally applicable solution (as also noted by Sdoukopoulos et al. (2019) and Pojani and Stead (2015)) that would enable comparisons but simultaneously most likely lose the local context and relevance. Instead, the persisting problems between the definition and operationalisation of sustainability are identified. The fragmented field has, first, failed to address and reconcile the data availability issues that limit indicator coverage and lead to incomplete results, and second, failed to move towards the extensively discussed comprehensive conceptualisations of sustainability alongside diverse stakeholder involvement. Currently, the assessment literature is producing large volumes of case-based and data-availability-driven research that appears nearly devoid of purpose from the sustainability point of view.
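The comparability problem described in this section can be made concrete with a small sketch. The example below is not part of the review's method: the study names and indicator labels are invented for illustration, and the Jaccard index is only one assumed way to express how little two indicator sets overlap. It shows that when studies share few indicators, only a small fraction of their results can be placed side by side.

```python
from itertools import combinations

# Hypothetical indicator sets; the study names and indicators are invented
# for illustration and are not taken from the reviewed sample.
studies = {
    "study_a": {"co2 emissions", "traffic fatalities", "public transport coverage"},
    "study_b": {"co2 emissions", "travel cost", "noise exposure"},
    "study_c": {"walkability", "travel cost", "accessibility of services"},
}


def jaccard(a, b):
    """Share of indicators two studies have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(indicator_sets):
    """Return {(study_1, study_2): Jaccard overlap} for every pair of studies."""
    return {
        (s1, s2): jaccard(indicator_sets[s1], indicator_sets[s2])
        for s1, s2 in combinations(sorted(indicator_sets), 2)
    }


overlap = pairwise_overlap(studies)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

In this toy data no pair of studies shares more than one indicator, so any cross-city comparison would rest on a single measure at best, which mirrors the situation reported for the sample.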

Future research needs
Our findings raise concerns regarding the non-cumulative nature of the assessment literature and strongly signal a need to establish a baseline of some sort for the assessments. Changes are required to address the more significant weaknesses and to drive the sustainability assessments forward in terms of quality, reliability, comparability, and inclusiveness. These changes include new data collection methods, participatory approaches, and some common minimum requirements for assessments. Special attention must also be given to indicator coverage, inclusion of local context, and a threshold system for monitoring and comparing cases over periods of time. The United Nations Sustainable Development Goals were set to address such issues through unified sustainability indicators, but even these indicators have been found to be, to some extent, locally inapplicable or irrelevant, to produce negative effects if applied alone, and to lack the outcome and policy databases needed to ensure comparability and support the exchange of best practices (Hansson, Arfvidsson, & Simon, 2019;Rozhenkova, Allmang, Ly, Franken, & Heymann, 2019).
Efforts to develop new methods to generate data are needed, particularly with the help of diverse stakeholders, the promises of big data, and real-time GIS techniques. This is of particular importance when identifying context-specific indicators and including social aspects in the assessments (Cottrill & Derrible, 2015). While the current methods are abundant and diverse, they remain separate and incomparable for the most part. For this reason, they are insufficient when presenting a consistent key set of indicators (Steg & Gifford, 2005), and when capturing a global view of transportation sustainability and best practices (Gillis et al., 2016). It is clear that there is a need to construct a more coherent baseline to better track, compare, and support the progress towards sustainable transportation systems in urban areas.

Conclusions
While the assessment literature continues its steady growth, our findings raise questions over how the knowledge is gathered and assimilated. The findings also highlight the need for an overall improvement in the quality and reliability of these evaluations. Academic literature does not provide conclusive assessment results, even with the abundant assessment methods and indicators, as the studies do not comprehensively account for the complexity of sustainability. A focus on local conditions, and the variation in indicator use and applied methods, is necessary to meaningfully assess single cities based on their unique characteristics. However, there is also a clear need for a common baseline to define what constitutes sustainability. The current situation provides no opportunity for comparisons, and the sharing of best practices in local planning and policy-making is also challenged by the loose interpretations of sustainability. By identifying trends and pitfalls, and presenting an indicator catalogue alongside a set of progressive articles that emphasise comprehensive, diverse, and inclusive approaches to sustainability assessments, this review provides an overview of the field and creates an initial baseline and a platform for improving assessment reliability and coverage. Future research needs to increase the focus on establishing a set of common criteria for the assessments, improve participation and local knowledge utilisation, develop a focus on urban areas (particularly in the Global South), and identify new ways to generate data to prevent the exclusion of essential indicators.

Notes
1. The annual sub-sample for 2019 is incomplete as the systematic searches were conducted early in the year. Therefore, the volume of publications for 2019 should be treated as incomplete in relation to publication trends.
2. The indicator regrouping was performed manually in Excel 2016 under thematically cohesive groups of indicators that emerged from the dataset. Similar indicators were merged together if they measured exactly the same phenomenon.
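The regrouping described in Note 2 was done by hand in a spreadsheet. Purely as an illustration of the same logic, the sketch below merges variant labels through a manually curated mapping and then counts how many articles use each merged indicator and each thematic group. All article identifiers, labels, mappings, and themes here are hypothetical; deciding which labels measure exactly the same phenomenon remains a manual judgement that the code does not make.

```python
from collections import Counter

# Hypothetical raw records: (article, indicator label as written in the article).
raw_indicators = [
    ("article_1", "CO2 emissions"),
    ("article_2", "Carbon dioxide emissions"),
    ("article_2", "Traffic fatalities"),
    ("article_3", "Road deaths"),
    ("article_3", "Noise exposure"),
]

# Manually curated mapping from a variant label to the merged indicator.
# Treating these labels as the same phenomenon is an assumption of this example.
merge_map = {
    "carbon dioxide emissions": "co2 emissions",
    "road deaths": "traffic fatalities",
}

# Hypothetical thematic groups for the merged indicators.
theme_of = {
    "co2 emissions": "environment",
    "noise exposure": "environment",
    "traffic fatalities": "safety",
}


def normalise(label):
    """Lower-case, collapse whitespace, then apply the manual merge mapping."""
    key = " ".join(label.lower().split())
    return merge_map.get(key, key)


def count_use(records):
    """Count the number of distinct articles that use each merged indicator."""
    seen = {(article, normalise(label)) for article, label in records}
    return Counter(indicator for _, indicator in seen)


use = count_use(raw_indicators)

# Aggregate indicator use by thematic group.
by_theme = Counter()
for indicator, n_articles in use.items():
    by_theme[theme_of.get(indicator, "ungrouped")] += n_articles

print(use.most_common())
print(by_theme.most_common())
```

Keeping the merge mapping explicit, rather than relying on automatic string matching, preserves the rule that only indicators measuring exactly the same phenomenon are combined.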