Toward Evidence-Based Urban Planning

Abstract: Literature reviews can play a pivotal role in designing urban policies. Here we introduce two tools used by public health specialists to assess the quality of studies and quantify the evidence derived from them: the Risk of Bias Assessment (RoB) and the Evaluation of Certainty of Evidence (ECE). The RoB scores articles on several domains (e.g., selection bias, study design) to appraise how rigorous each study is, whereas the ECE provides a framework for stating clearly how much certainty there is in the outcomes under study. Both tools can enhance literature review articles in urban planning and better inform practitioners on how to develop policies using a rigorous approach.

A long and rich history of collaboration exists between the fields of urban planning and public health (Jackson et al., 2013). Both are practice-oriented disciplines, and both have strong advocates for evidence-based approaches wherein research informs practice and policy development (Brownson et al., 2009; Krizek et al., 2009). A key component of evidence-based approaches is to gather all the relevant findings on a topic by, say, completing a literature review (Krizek et al., 2009). Indeed, K. Stevens (2001) called systematic reviews in health research "the heart of evidence-based practice" (p. 529).
Although both urban planning and public health conduct literature reviews, their impact seems to vary across the disciplines. Literature reviews in urban planning have been critiqued for lacking rigor (Xiao & Watson, 2019), whereas recent studies have found that reviews are used directly in public health decision making (Dobbins et al., 2004; South & Lorenc, 2020). Further, the approach taken when conducting literature reviews can vary between urban planning and other disciplines (Xiao & Watson, 2019). For instance, the public health field has developed various methods and tools for conducting reviews in a rigorous manner. These tools have, to the best of our knowledge, yet to be incorporated into urban planning. They assess the quality of studies and the evidence derived from the reviewed articles in a systematic manner and therefore have much to offer urban planners who champion evidence-based policy development.
To help urban planning reviews better inform practice, we focus here on how to adapt and integrate research tools used by public health professionals when conducting literature reviews for quantitative research in the field of urban planning. In this Viewpoint, we begin by introducing the role and impact of literature reviews in the field of urban planning. We then present the Risk of Bias Assessment (RoB) and Evaluation of Certainty of Evidence (ECE) tools for literature reviews. These tools are used to assess the quality of the existing scholarship and provide clear evidence to practitioners in the public health field. We then provide an example of each tool that is applicable to survey research studies in urban planning and discuss the possibilities of applying these tools across different types of urban planning research. We conclude by outlining the foreseen benefits of incorporating RoBs and ECEs in our discipline. Given the practice-oriented nature of urban planning, we believe that adapting these tools offers great potential to move toward a more evidence-based planning approach that many planning authorities have begun adopting in recent years.

Literature Reviews: Important Tools Yet to Realize Their Full Potential
Literature reviews not only provide a coherent and well-structured overview of the research that has been done in an area, but also add value through, for instance, identifying research gaps, putting forward research agendas or conceptual models, or critically evaluating the methods or frameworks used to study a topic (De Vos & El-Geneidy, 2021; van Wee & Banister, 2016). Literature reviews are a research tool that has had a high impact on many fields. Some transport and land-use planning review papers, for example, have been cited extensively (Cao et al., 2009, received more than 600 citations on Scopus and 588 on Web of Science).
Many different types of literature reviews exist (De Vos & El-Geneidy, 2021; Grant & Booth, 2009; van Wee & Banister, 2016). Grant and Booth (2009) outline 14 types of reviews in their article, all of which differ when it comes to search strategy, appraisal, synthesis, and analysis. From our experiences reading, publishing, and peer-reviewing literature review articles in the field of urban planning, four common types of reviews in our field are critical reviews, scoping reviews, meta-analyses, and systematic literature reviews (Table 1).
The aim of the tools we introduce in the following section is to assess the quality of the articles included in a literature review. Xiao and Watson (2019) discussed how quality assessments can be used to help authors decide which articles to include in a literature review or to help authors know which articles' results to emphasize. The specific tools we present in this Viewpoint are mainly used in systematic reviews, which are distinguished by their exhaustive search strategy. This comprehensive search is needed to appraise and synthesize all the evidence (often including gray literature evidence) to establish what is known and to make recommendations for practice (Adkins et al., 2017). Systematic reviews are also known for incorporating quality assessments (Grant & Booth, 2009). In our experience, however, urban planning researchers frequently conduct this type of review but omit these quality assessments.
Two tools typically used for quality assessments are the RoB and the ECE. Below we provide an overview of both tools and discuss how they can be adapted for urban planning research to help derive more evidence-based urban policy. We also present a version of each tool that can be used to assess quantitative survey-based studies in urban planning. These tools usually appear in a manuscript after the detailed description of each study included in the review, because they help assess the quality of the reviewed articles and provide an overall assessment of the evidence identified across them.

Risk of Bias Assessment
An RoB, also called a quality assessment or a critical appraisal, aims to establish the quality or rigor of the studies that exist on a topic. Typically, each study included in a literature review is scored on potential sources of bias (e.g., selection bias, detection bias, or reporting bias). Though researchers completing literature reviews cannot measure the presence of bias in each study, this tool can be used to assess the risk that the results are biased, based on what is stated in the methods sections of articles. Doing so can help authors avoid generating misleading literature reviews, because they can amplify the results of rigorous studies and downplay those from studies with greater risk of bias (Al-Jundi & Sakka, 2017; Ma et al., 2020).
RoBs are a typical step in many review protocols. For instance, "assess studies for risk of bias" is the

Sample RoB Adapted for Survey Research in Urban Planning
Many different RoB tools exist; however, because many of them originate in the health sciences (Krizek et al., 2009), the bulk are created for study designs rarely (or never) used in urban planning studies, such as clinical trials or cohort studies. This leads to components of the tools that are not applicable to our field. For instance, the Cochrane RoB evaluation designed for clinical trials includes a performance-bias component (whether the trial inadvertently introduced differences other than the intervention being evaluated; The Cochrane Collaboration, 2011). This type of bias is rarely relevant in urban planning, a field known for using a wider variety of methods than many others. Given this, Table 2 presents an RoB tool for survey-based research in the field of urban planning that we adapted from the Effective Public Health Practice Project (EPHPP; Effective Public Healthcare Panacea Project, 2021). The original EPHPP tool is also presented in Technical Appendix A. This version of the tool is best suited for survey-based research and incorporates eight potential sources of bias relevant to survey studies in urban planning. Table 2 outlines the types of bias assessed, guiding questions, and grading criteria. The tool also provides a global rating score that incorporates all eight bias scores.
When literature review authors are completing data extraction for the studies included in their review, they simply need to also answer the guiding questions in Table 2 for each article. Most answers can be found by carefully reading the methods sections of the included articles; therefore, these assessments require minimal additional effort. From our experience, we highly recommend including a supporting statement for each type of bias to complement the answers to the guiding questions. Once the guiding questions (and supporting statements) have been answered for all articles included in the review, the author can score each article on each type of bias using the criteria put forth in Table 2.
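The tallying step can be sketched in a few lines of code. The sketch below assumes the EPHPP convention for the global rating (strong when no domain is rated weak, moderate with exactly one weak rating, weak with two or more); the domain names and ratings shown are illustrative, and the adapted tool in Table 2 may grade its eight domains differently.

```python
from collections import Counter

def global_rating(ratings):
    """EPHPP-style global rating: 'strong' if no domain is rated weak,
    'moderate' if exactly one domain is, 'weak' if two or more are."""
    weak_count = Counter(ratings.values())["weak"]
    if weak_count == 0:
        return "strong"
    if weak_count == 1:
        return "moderate"
    return "weak"

# Hypothetical component ratings for one article, keyed by bias domain.
ratings = {
    "selection_bias": "strong",
    "study_design": "moderate",
    "confounders": "strong",
    "data_collection": "weak",
    "withdrawals_dropouts": "moderate",
}

print(global_rating(ratings))  # one weak domain -> "moderate"
```

Encoding the rule this way makes the assessment reproducible: two reviewers who agree on the domain ratings will always arrive at the same global rating.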
The results of the RoB can then be reported in the literature review article in tabular or graphical form. For instance, Figure 1 shows the results from an assessment that used the EPHPP. Although the guiding questions and criteria reported in Table 2 will often be relegated to the appendix of the literature review, the results of the assessment (i.e., Figure 1) will generally be reported in the article after the narrative synthesis of the articles. The author(s) can then comment on the overall quality of the articles, for instance by reporting on the global ratings or reporting how well articles scored on each type of bias (in this case, the literature scores highly on Withdrawal and Dropout but very low on Blinding). Other trends can be reported as well; for instance, one could report if gray literature reports or older studies received lower scores on average than peer-reviewed or more recent studies.
Because RoBs are not yet used extensively in our discipline, we see this tool as a starting point: We hope that other researchers contribute to it and adapt it to meet their assessment needs. Technical Appendix A includes several RoBs that may be relevant for literature reviews in urban planning. The Critical Appraisal Skills Program (CASP) website also includes many excellent tools to help perform a critical appraisal of articles (CASP, 2021).
It is important to note that most of the tools in Technical Appendix A were designed for quantitative research, with an emphasis on survey-based research. However, different research designs tend to require different types of quality assessments (Kitchenham & Stuart, 2007), and urban planners make use of a wide array of research tools, including interview-based research, ethnography, case studies, archival research, theoretical modeling, and legal analyses. Many of the types of bias assessed in the RoB we present here are simply inappropriate for research using these methods. Unfortunately, to the best of our knowledge, fewer RoB tools have been designed to assess these methods, which may be due to their lack of prominence in the health sciences. Quality assessment tools for qualitative research may also be less prevalent because this research draws from different epistemological, ontological, and methodological foundations, and these different framings result in different ways to assess rigor. As Small (2009) argued, qualitative research should not be assessed by the same tools and concepts as quantitative research. For instance, in terms of the data collected, qualitative research tends to collect rich (quality) data, whereas quantitative research tends to emphasize thick (quantity) data (Fusch & Ness, 2015). Further, qualitative research tends to seek logical rather than statistical inference, and data saturation rather than representativeness (Small, 2009). Nevertheless, qualitative research makes important contributions in our field, informs policy, and is often the subject of systematic reviews. We therefore encourage others to adapt tools to better suit qualitative research methods. The Cochrane Handbook recently included a chapter on qualitative quality assessments that emphasizes four criteria: credibility, transferability, dependability, and confirmability (Hannes, 2011).
The CASP has also developed a tool to appraise qualitative work (see Table 5 in Technical Appendix A), and Kitchenham and Stuart (2007) discuss quality assessments for different research designs. These tools and this literature may be important starting points for those willing to adapt RoBs for qualitative or even mixed-methods reviews.

Evaluation of Certainty of Evidence
The second tool frequently used in public health literature reviews, but seldom seen in urban planning studies, is the ECE. Whereas RoBs provide a quality assessment for each article included in a review, the ECE allows the authors to state how confident they are in the evidence as a whole. This evidence is grouped by each of the outcomes included in the review. ECE tools provide explicit, transparent, comprehensive, and structured processes for rating the quality of the evidence. Therefore, ECEs give practitioners clear guidance on how confident they can be in an identified relationship. The greater the number of studies on a topic, and the higher their quality, the higher the ECE will grade the topic (Mercuri et al., 2018). GRADE (Grading of Recommendations, Assessment, Development and Evaluations) is a framework typically used to do this assessment. Using the GRADE approach, each outcome is assigned an initial certainty of evidence score based on study design.
Randomized controlled trials, natural experiments, and quasi-experimental studies are assigned an initial level of high certainty, whereas cross-sectional studies are assigned an initial level of low certainty. Then, different factors can lead to rating the quality of evidence up or down. The criteria that can rate the evidence down are those often included in RoBs: inconsistency, indirectness, imprecision, risk of bias, and publication bias. When the certainty of evidence is not downgraded, it can be upgraded if the following are observed: a large effect, a dose response, or opposing bias and confounders. Ultimately, the quality of evidence for each outcome falls into one of four categories: high, moderate, low, and very low (Balshem et al., 2011; Guyatt et al., 2011).

Sample ECE Adapted for Survey Research in Urban Planning
As an example, Table 3 presents the ECE used in Prince et al.'s (2021) study, which examined the association between active transport and physical activity across the life course. The domains, judgments, scores, and criteria are outlined in Table 3. To follow this table, one must first group together all articles examining a given outcome. Then, for each outcome, natural experiments are separated from observational studies (e.g., cross-sectional studies). Depending on the study design, the evidence is given an initial score of high or low.1 Then, each domain is assessed using the criteria in Table 3. The resultant scores are added to the initial certainty, and the sum produces the final level of certainty. This must be calculated for each outcome and presented in the literature review. Figure 2 illustrates the results of the ECE following the criteria in Table 3 (Prince et al., 2021). As was the case with the RoB, the criteria used to generate the ECE results (Table 3) are usually presented in an appendix or as supplementary material, while the results of the ECE (Figure 2) are embedded in the article in tabular form alongside a narrative synthesis. For instance, in this case the authors could state that the evidence is graded as very low for children and youth, but moderate when studies consider both age categories together.
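The scoring arithmetic just described can be sketched programmatically. The numeric anchors below (4 = high down to 1 = very low) and the rule of conservatively rounding half-points down follow the conventions reported in Prince et al.'s (2021) notes, but the mapping itself is our illustrative assumption, not a prescribed GRADE formula; domain names are examples only.

```python
import math

# Initial certainty by study design: experimental designs start high,
# observational (e.g., cross-sectional) designs start low.
INITIAL = {"experimental": 4.0, "observational": 2.0}
LABELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}

def certainty(design, domain_scores):
    """Sum the (negative) domain scores onto the initial certainty,
    round half-points down, and clamp to the four GRADE categories."""
    total = INITIAL[design] + sum(domain_scores.values())
    total = math.floor(total)        # conservatively round half-points down
    total = max(1, min(4, total))    # clamp to the high..very-low range
    return LABELS[total]

# Cross-sectional evidence downgraded -0.5 for borderline risk of bias
# and -1 for serious inconsistency:
scores = {"risk_of_bias": -0.5, "inconsistency": -1.0,
          "indirectness": 0.0, "imprecision": 0.0, "publication_bias": 0.0}
print(certainty("observational", scores))  # 2 - 1.5 = 0.5, floored and clamped -> "very low"
```

Writing the rule down this explicitly also makes the adaptations discussed below (different weights per domain, finer increments) a one-line change rather than a new judgment call.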
As was the case for RoBs, many adaptations can be made to the ECE we present here to better suit the field of urban planning. For instance, the participant-count cutoffs in Figure 2 were generated from a textbook on survey methods that seeks rigor through large samples and large-scale studies (Daniel, 2012). This can bias against smaller scale studies, many of which explore new and important topics. When this is the case, the author(s) of the literature review can be careful to highlight the impact of more original work in their narrative synthesis to counter this bias. Further, these criteria can be modified to best suit the types of studies included in a review, for instance by reducing the sample-size cutoffs (in which case we recommend justifying the new criteria).
We envisage other potential modifications as well. For example, in the sample ECE presented here, all criteria are graded equally on a scale from −1 to 0, and all deductions are made in increments of 0.5 or 1. Depending on the study design, certain criteria could carry more weight than others (e.g., indirectness scored out of 2 points, whereas imprecision is scored out of 1 point). Further, to allow more nuanced scores, deductions could be made in smaller increments such as 0.25 or even 0.1. These adaptations can be tailored to the study designs and methods included in a review.
Further, given that the criteria in ECEs are based on RoBs, and the RoB presented here is best suited for survey-based research, we urge researchers using other methods to also develop ECEs. We recommend GRADE as a starting point. However, modifications might be necessary. For instance, in studies using qualitative methods, the number of participants may be a less appropriate criterion than, say, the richness of the interviews or the study's sampling strategy.

Conclusions
In this Viewpoint we present two tools frequently used in public health literature review articles that hold great potential to help move the field of urban planning toward a more evidence-based approach to planning and to evaluating policies and projects. Because these tools have yet to be adapted for our discipline, we also showcase a version of each tool that is best suited for survey research in urban planning. We see three primary benefits to the incorporation of these tools into urban planning literature reviews. First, no research is without bias, regardless of discipline. The quality of studies in urban planning varies, and thus the evidence they produce should not be weighed equally. These tools provide clear guidance on how to rate the quality of articles and evaluate their impact on the emergent results. Given that the results of select past reviews and meta-analyses in urban planning have been quite contentious, spurring multiple responses to the author (e.g., M. Stevens, 2017), perhaps incorporating these tools into future reviews will result in more rigorous presentation of the evidence.
Second, many of the most pressing contemporary challenges, such as climate change, noncommunicable diseases, and pandemics, will require multidisciplinary collaboration. Urban planners will need to collaborate with public health and other city-building officials to tackle many of these issues. However, fully integrating the two fields remains a challenge. Learning, adapting, and incorporating some of the tools used in public health into urban planning is one small step we can take to strengthen and ease this collaboration.
Finally, a frequent goal of literature reviews is to highlight policy implications, especially in practice-oriented fields such as urban planning (De Vos & El-Geneidy, 2021). We believe the tools discussed in this Viewpoint hold great potential to improve the generation of policy recommendations from urban planning reviews. By assessing each article included in a review (through an RoB) and by giving a clear level of certainty for the results (through an ECE), we can provide policymakers with more accurate syntheses of the literature on a topic and help them decide what types of policies to pursue. For instance, if the evidence for an intervention is weak because, say, only two studies found the expected relationship and one found no evidence for it, policymakers will know to wait for further research before investing in that intervention.
Though these tools have not yet been tailored for the field of urban planning, we showcase herein a version of each of these tools that is applicable for survey-based research in our discipline. It is our hope not only that these tools are integrated into urban planning literature reviews, but that they are seen as a starting point, as tools that can be further expanded on and refined to better reflect the needs of urban planning research. An obvious next step is to develop these tools to include different types of methods, especially those more common in qualitative studies, a type of research that makes important contributions to policy in our discipline. The potential impact of these tools on both research and practice is too important to ignore.

Table 3. ECE domains, judgments, scores, and criteria (adapted from Prince et al., 2021)

Risk of bias
- Not serious, borderline RoB (−0.5 points): ≥75% of the reviews were considered low/moderate quality as assessed by the AMSTAR2 (a); <75% of the reviews were considered critically low quality as assessed by the AMSTAR2 (a)
- Serious RoB (−1 point): ≥75% of the reviews were considered critically low quality as assessed by the AMSTAR2 (a)

Inconsistency
- No serious inconsistency (0): The direction and magnitude of the effect was consistent across reviews (>75% in the same direction); heterogeneity could be explained by measurement of AT and PA (self-report vs. device)
- Not serious, borderline inconsistency (−0.5 points): The direction and magnitude of the effect was inconsistent across reviews; heterogeneity could be partially (but not completely) explained by measurement of AT and PA (self-report vs. device)
- Serious inconsistency (−1 point): The direction and magnitude of the effect was inconsistent across reviews; heterogeneity could not be explained by measurement of AT and PA (self-report vs. device)

Indirectness
- No serious indirectness (0): There was good global representation of primary studies included in the reviews; little-to-no inclusion of other age groups captured in the reviews
- Not serious, borderline indirectness (−0.5 points): There was good global representation of primary studies; considerable inclusion of other age groups
- Serious indirectness (−1 point): There was limited global representation of primary studies; considerable inclusion of other age groups

Imprecision
- No serious imprecision (0): The total number of participants across all studies was greater than 10,000 (b)
- Serious imprecision (−1 point): The total number of participants across all studies was less than 10,000 (b)

Publication bias
- No serious risk of publication bias (0): Due to the systematic and comprehensive search strategy, including a scan of the gray literature, an a priori decision was made not to downgrade for risk of publication bias

Notes: AT: active transportation; GRADE: Grading of Recommendations Assessment, Development and Evaluation; PA: physical activity; RoB: risk of bias. The quality of the evidence was upgraded if there was no cause to downgrade and there was evidence of a large magnitude of effect from meta-analyses. Half-points were combined across domains to yield a total score; if the final score included a half-point, it was conservatively rounded down (i.e., −0.5 points = −1 point). The quality of the evidence can be interpreted as follows: High: we are confident that the true direction of association between AT and PA lies close to the association we have estimated, and further research is very unlikely to change our confidence in the effect. Moderate: we are moderately confident that the true direction of association between AT and PA is likely to be close to the association we have estimated, but there is a possibility that it is substantially different; further research is likely to have an important impact on our confidence in the direction of association and may change it. Low: we have limited confidence; the true direction of association between AT and PA may be substantially different from our estimate. Very low: we have very little confidence; the true direction of association between AT and PA may be substantially different from our estimate.
(a) ECE based on the AMSTAR2 RoB from Shea et al. (2017). (b) From Daniel (2012).
Source: Supplemental Table 2 in Prince et al. (2021).
ABOUT THE AUTHORS

LEA RAVENSBERGEN (lea.ravensbergen@mail.mcgill.ca) is a postdoctoral fellow at the Transport Studies Unit at the University of Oxford. AHMED EL-GENEIDY (ahmed.elgeneidy@mcgill.ca) is a professor at the School of Urban Planning at McGill University.