How robust is the evidence for beneficial hydrological effects of urban tree planting?

ABSTRACT Sustainable urban water management initiatives are increasingly required to combat rapid urbanization and climate pressures. Initiatives include the role of tree planting, for which there is a need for strong evidence of benefits and drawbacks to support effective future planning. We report on the robustness of evidence from an assimilated database of urban hydrological impact studies which often had differing primary purposes. Consistent impacts were found at the local level, with trees reducing runoff and increasing infiltration. Despite this consistency, much of the evidence is undermined by a lack of robustness and scientific rigour in the studies. Many studies lack adequate controls, and models are often not strongly tested against observations. Moreover, evidence of impact at larger scales is lacking. Effects of tree characteristics were also investigated, such as maturity and species (for which evidence is consistent and detailed) and arrangement (for which there is less evidence). Realizing the full potential of trees in urban water management decision-making would benefit from more rigorous evidence.


Urban water management
There is growing pressure on urban water management (UWM), exacerbated by population growth, climate change and the deterioration of current urban infrastructure systems. Alongside an increasing population comes increasing water demand (United Nations Educational, Scientific and Cultural Organization [UNESCO] 2019), and with 70% of the global population forecast to be living in cities by 2050 (Romano and Akhmouch 2019) this presents further challenges for UWM. The Urban Water Management Programme (UWMP) was set up by UNESCO to address these stressors, and the promotion of scientific policy guidelines, knowledge of new approaches and provision of sustainable tools are hoped to serve as a holistic approach to improve UWM as a whole (UNESCO 2019). As Romano and Akhmouch (2019) point out, there is currently no "one size fits all" approach to UWM. This is a concept that varies significantly with context, and there is an increasing need for more widely applicable approaches to solving these issues of UWM (Hurlimann et al. 2017). Degrading water quality and increased urban flooding are among the concerns for UWM, in conjunction with both population growth and climate change (Miller and Hutchins 2017).

Impacts of trees
Sealing of pervious surfaces such as the conversion of gardens to driveways serves to reduce the infiltration of rainfall and increase the risk of urban flooding (Warhurst et al. 2014). The overall increase in runoff volume, reduction in runoff lag time, greater peak discharges during storm events and increased streamflow flashiness are all symptoms of increased impervious surface cover. One strategy proposed to counteract this sealing has been sustainable urban drainage systems (SUDS). SUDS include interventions such as infiltration trenches, biofiltration swales, and the planting of trees and other vegetation (McGrane 2016). Street trees have been recognised as an essential part of stormwater management in the urban context. Trees are able to reduce runoff via interception by their canopy, returning some of this water to the atmosphere through evapotranspiration, and allowing greater infiltration of water through the soil surface to be absorbed by their roots or stored in litter (Center for Watershed Protection 2017). There are also technologies designed for urban areas that implement trees with the aim of reducing stormwater runoff (GreenBlue Urban 2015). The extent to which trees can provide these services has not been defined, nor has the relevant literature been objectively reviewed.

Aims and objectives
The aim of the present study is to evaluate the impacts of urban tree planting on hydrology. The primary objective to achieve this aim is:
• To critically analyse the assimilated evidence to assess the scientific robustness and the quality of the outcomes found within it.
Further objectives are:
• To assess the impact of tree arrangement or planting location on hydrology;
• To analyse the extent to which vegetation type affects hydrology; and
• To identify differences between modelled impacts and actual, measured impacts of urban tree planting.
To address these objectives, a rapid evidence assessment (REA) incorporating a systematic evaluation of evidence was undertaken, for which primary and secondary questions were formulated as follows.
Primary question:
(1) What are the impacts of urban trees on hydrology?
Secondary questions:
(i) Does the arrangement of trees or planting location affect the impact on hydrology?
(ii) Is there a difference between the monitored and modelled impacts of trees?
(iii) Does tree species have a significant effect on the impacts found on hydrology?

Methodology
To address the primary and secondary questions, a database of evidence was assembled from online literature resources via a systematic methodology. Search queries were designed on Web of Science (https://apps.webofknowledge.com/) using a list of keywords developed from population, intervention, comparator, outcome (PICO) criteria set up to address the primary and secondary questions (Table 1). With regard to the outcomes, keywords related to the variables being studied (e.g. interception) were used to narrow the search; otherwise, results would have been too broad.

Search engines and queries
Search queries were refined iteratively to focus the process whilst ensuring the return of appropriate and relevant evidence. This was achieved by using pieces of control evidence comprising literature known at the outset to be of key significance (e.g. Livesley et al. 2016, Frosi et al. 2019, Matteo et al. 2006). The search queries were put together in sections using individual elements of the PICO criteria and then combined (Table 1). The primary searches (Web of Science) were limited to return only evidence published in English due to language comprehension restrictions. Individual searches yielded around 5 900 000 hits, which on combination were reduced to 1142 (Table 2). There is the potential for published literature to be biased, with studies remaining unpublished if their findings are not significant (Gough et al. 2013, Collins et al. 2015). Therefore, search strings were set up in Google Scholar (https://scholar.google.com/) to ensure other important academic and grey literature was not excluded and that a fully representative evidence base was assimilated.
Simpler strings were used with Google Scholar as this search engine cannot recognise all Boolean operators. Following guidance from Haddaway et al. (2015), the first 200 Google Scholar hits were screened at the title level, aided by the text preview feature.
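The sectional construction of search strings described above can be illustrated with a minimal sketch. It assumes that synonyms within one PICO element are joined with OR and that the elements are then joined with AND; the keywords shown are hypothetical placeholders (the terms actually used are those in Table 1), and the comparator element is omitted for brevity.

```python
def build_query(groups):
    """Join terms with OR inside each PICO element, then AND the elements."""
    clauses = []
    for terms in groups.values():
        # Quote multi-word phrases so they are searched as a phrase.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical keywords for illustration only; not the study's search terms.
pico = {
    "population": ["urban", "city"],
    "intervention": ["tree*", "street tree"],
    "outcome": ["runoff", "interception"],
}

query = build_query(pico)
print(query)
```

Building each element separately makes it possible to run and count the individual searches before combining them, which mirrors the reduction from individual to combined hits reported in Table 2.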

Screening process
The next stages of the database creation involved screening, whereby evidence was included or removed depending on whether criteria were met (Appendix A1). This was carried out in three stages: title, abstract and full-text screening.

First-stage screening.
After all evidence was assimilated, it was first screened by title. Evidence was categorised as relevant (1), irrelevant (0), or uncertain (-). If terminology related to green infrastructures in urban areas such as bioretention pits or bioswales was mentioned, but trees were not explicitly referenced, evidence was included but scored as uncertain.
Web of Science searches were added to the database before going through the first stage of screening, but Google Scholar and Google searches were screened as they were searched for. It must be noted that both Google search engines provided a preview of the text, which was used as an aid for deciding relevance.

Second-stage screening.
Evidence reaching the second stage of screening was assessed using the abstract or first paragraph. If relevance was still uncertain after this, the full text was briefly searched for terms that made its classification uncertain. For example, if the abstract mentioned green infrastructure but not trees explicitly, the text was searched for "trees," and if the population was not certain, "urban" was searched for.

Final-stage screening.
All evidence reaching the final stage of screening was screened using the full text. An additional inclusion criterion considered at this stage was whether the evidence included primary evidence. Review studies were still included, but then separated from primary literature, as there is greater potential for bias in review papers. In review papers, the robustness of evidence cited cannot be accounted for unless their rationale for study inclusion is stated, or unless the integrity of each study is made explicitly clear. Articles that mention urban tree planting and its impacts without reference to primary data, and that are not review studies, were excluded. When screening the full text, a note was made regarding whether the text was accessible or not. Those articles that were not accessible were screened out, which has implications for this study because such evidence was potentially relevant.
After all evidence had passed the final screening stage, the evidence that had been screened out was checked again, comprehensively, to assess whether there were incorrect exclusions at both title- and abstract-level screenings. The results of this test found no studies that had been incorrectly screened out, and the final number of items of evidence to be used in the assessment, after duplicates were also removed, was 55.

Critical appraisal database
The final set of literature was compiled into a database with categorical fields, as highlighted in Appendix A2, to systematically describe the evidence (in addition to the metadata: source, title, author, publisher, publication year).

Critical appraisal (CA) scoring
Relevance was scored as either 1 or 0 depending on whether the evidence meets full inclusion/exclusion requirements for population, intervention, and outcome(s). The critical appraisal of relevance is stricter, however: it had to be explicitly stated and not inferred. For example, the impacts of trees on hydrology had to be direct, and not inferred from impacts on tree health (e.g. Grey et al. 2018a).
Robustness scores were split into sections: general, methodology and analysis. Each of these sections had a set of criteria that each piece of evidence had to fulfil to achieve a score of 1 (see Appendix A3, Table A3). If less than the majority of criteria for a section were met, that section was scored 0.
Some evidence primarily used modelling to determine the impacts of urban trees. Such evidence had additional criteria to meet for both methodology and analysis sections of robustness scoring. The way in which the model operated had to be well described, and the potential error or confidence values of the modelled impacts also had to be stated.
Once both relevance and robustness scores had been finalised, they were multiplied together to give an overall appraisal score. Scores could therefore range between 1 and 9. Evidence with higher appraisal scores was given higher weighting in the synthesis of evidence and the formulation of evidence statements. The final appraisal score was also indicative of those studies which reduce bias the most.
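The multiplication can be sketched as follows. This is an illustration rather than the scoring tool used in the study: it assumes that the three relevance components (population, intervention, outcome) and the three robustness sections (general, methodology, analysis) are each scored 0 or 1 and summed before multiplying, and the function name and dictionary layout are hypothetical.

```python
def ca_score(relevance, robustness):
    """Multiply summed relevance by summed robustness (assumed layout)."""
    for part in (*relevance.values(), *robustness.values()):
        if part not in (0, 1):
            raise ValueError("component scores must be 0 or 1")
    return sum(relevance.values()) * sum(robustness.values())


# A study meeting every criterion reaches the maximum score.
full = ca_score(
    {"population": 1, "intervention": 1, "outcome": 1},
    {"general": 1, "methodology": 1, "analysis": 1},
)

# The same study without methodological robustness drops to 3 x 2.
weak_method = ca_score(
    {"population": 1, "intervention": 1, "outcome": 1},
    {"general": 1, "methodology": 0, "analysis": 1},
)

print(full, weak_method)  # 9 6
```

Because the two totals are multiplied, a weakness in either relevance or robustness lowers the overall score more sharply than it would under addition.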

Monitoring/modelling (MM) scores
Robustness was also assessed by scoring the length of monitoring/modelling controls and interventions and their monitoring/modelling frequencies. Studies that had short or no control/intervention periods or low frequencies received a score of 1; moderate scored 2; and high scored 3 (not applicable scored 0). The total score was calculated using the sum of each category; thus, scores can range from 1 to 9. The sum was used instead of multiplying scores as in CA scoring, as it discounted the importance of studies having a control even if it was a poor one.
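A minimal sketch of the additive MM score is given below. It assumes three scored categories (control period length, intervention period length, and monitoring/modelling frequency), which is inferred from the maximum total of 9; the level names and function name are hypothetical.

```python
# Score assigned to each level, following the 0 to 3 scale described above.
LEVELS = {"n/a": 0, "low": 1, "moderate": 2, "high": 3}


def mm_score(control_length, intervention_length, frequency):
    """Sum the three category scores (assumed categories)."""
    return (
        LEVELS[control_length]
        + LEVELS[intervention_length]
        + LEVELS[frequency]
    )


# A short control period still contributes one point to the total.
with_short_control = mm_score("low", "moderate", "high")

# A source scored not applicable in every category totals zero.
not_applicable = mm_score("n/a", "n/a", "n/a")

print(with_short_control, not_applicable)  # 6 0
```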

Evidence statement (ES) index
CA and MM scores were combined in the form of an evidence statement (ES) index as a final appraisal of the outcomes in the studies assimilated. The mean CA and MM scores were calculated for each general outcome (e.g. reduced runoff), and then these were averaged to find the ES index value.
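As an illustration of this averaging, the sketch below computes an ES index for one outcome from hypothetical study scores. The data values, field names and function name are invented for the example and do not come from the evidence database.

```python
from statistics import mean


def es_index(studies):
    """Average the mean CA and mean MM scores for one general outcome."""
    mean_ca = mean(s["ca"] for s in studies)
    mean_mm = mean(s["mm"] for s in studies)
    return (mean_ca + mean_mm) / 2


# Hypothetical scores for studies reporting the same outcome.
reduced_runoff = [
    {"ca": 9, "mm": 4},
    {"ca": 9, "mm": 5},
    {"ca": 6, "mm": 3},
]

index = es_index(reduced_runoff)
print(index)  # 6.0
```

In this example the mean CA score is 8 and the mean MM score is 4, so a high appraisal score is pulled down by weaker monitoring, which is the pattern the index is intended to expose.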
We strongly emphasise that whilst potentially giving the suggestion of being definitive, final scores and the components thereof should not be interpreted as being indicative of the entire value of individual research studies, which in many cases had a different or wider purpose.

Type, spatial extent and outcome/population of study
Studies on urban tree planting and the impacts on hydrology have become more common over the last few years. This is a promising trend for this field of research. Among the 55 total pieces of evidence (Appendix B1, Table B1), 53 were journal articles and two were books. There was one piece of grey literature, as opposed to peer-reviewed, among the 55 studies. The study types of the primary evidence were split relatively evenly, with a much smaller number of review articles (Fig. 1). Although reviews were not solely focused on modelling, the secondary evidence used in three of the studies used a combination of modelled and measured tree impacts.
There were multiple populations of study across the evidence database. Some evidence covered multiple outcome categories, but all could be categorised in one of 13 different dominant populations (Fig. 2(a)). The most common populations of study in the evidence base were runoff, stormwater, and interception.
Most studied populations (infiltration, rainfall partitioning, stemflow) are close to the trees themselves (Fig. 2(a)), either adjacent to or directly beneath them. In some cases, litter leachate impacts were found farther downstream, and other studies focused on more distant surface waters or runoff, or stormwater at multiple scales.
In terms of geographic focus (Fig. 2(b)), the majority were based in North America. Five of the studies had multiple geographical foci. No evidence was found from Africa or Antarctica; and there was only one mention of a South American location, as part of a "multiple" study (Revelli and Porporato 2018). Of the five categorised as "multiple," four were reviews (secondary evidence), which unsurprisingly considered a wider range of locations.
Only a third of primary studies reported the size of the intervention area or the catchment area; of these, eight reported only the intervention size, whereas two reported only the size of the catchment (Appendix B2, Table B2). A vast range of sizes was reported: the smallest were the 0.6 m² individual plots in Grey et al. (2018b), whereas the largest was in a study by Holder and Gibbes (2017), with an intervention area of 502 km² within a catchment study area of 2409 km². Interventions that cover a very low percentage of the catchment in which they are located (as defined by the location of downstream monitoring or modelling) do not provide robust outcomes. To provide more robust evidence, a higher intervention-to-catchment-area ratio is necessary. In this case, the highest was a study with a ratio of 21% (Holder and Gibbes 2017).
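The 21% figure follows directly from the two areas reported for Holder and Gibbes (2017). A short sketch of the calculation is shown below; the function name is illustrative only.

```python
def intervention_ratio_percent(intervention_km2, catchment_km2):
    """Intervention area as a percentage of the catchment area."""
    if catchment_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return 100.0 * intervention_km2 / catchment_km2


# Areas reported for Holder and Gibbes (2017).
ratio = intervention_ratio_percent(502, 2409)
print(round(ratio))  # 21
```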

Tree type and configuration
Of the 49 primary studies, 14 focused on more than five individually named species and were classed as "mixed," and 21 studies reported one to five species. The remaining 14 studies did not specify particular species.
Some of the studies that focused on the effect of species or tree characteristics compared multiple types of trees, i.e. evergreen and deciduous (Appendix B3, Table B3). However, some only focused on one type, meaning observations had a less extensive comparator.
Many studies did not specify tree arrangement, but of those that did the majority were individually spaced (Fig. 2(c)). Trees in bioretention pits, individual stands or open areas received greater focus as they are more easily analysed than linear or group arrangements, although some studies analysed multiple arrangements. Where arranged linearly, these were street trees. When grouped, this often meant trees were part of an urban forest system or a park, or even located in parking lots. In 18 of the studies, the arrangement of trees was not specified.
Although not part of the above categories, some studies mentioned the planting of trees within green infrastructure technologies such as bioretention pits and bioswales. One study, by Maniquiz-Redillas and Kim (2016), compared the impacts of green infrastructure with and without trees.

Assessment periods
Only a few studies reported monitoring periods under control conditions (Fig. 3(a)), and of those that did none exceeded 2 years. Although relatively few control periods were reported, there were clear consistencies between length of control and length of intervention.
Only one study scored high for both intervention period and monitoring frequency (Fig. 3(b)). However, it did not report a control period, and thus its overall robustness is not as strong. In most cases, if studies reported a high monitoring frequency, the length of the monitoring period was short, which probably reflects limited resources and difficulty in sustaining intensive monitoring for extended periods. It should be noted that many studies given low scores for monitoring frequency were those that did not specify a defined frequency of study. Furthermore, most studies scoring high for intervention monitoring/modelling also had low monitoring frequency (five studies). Six primary studies scored N/A for their monitoring/modelling lengths, as well as for monitoring frequency and monitoring period. Quantitative studies would be expected to report the lengths and frequencies of their monitoring/modelling periods, so the overall robustness of these six studies is weaker. All seven review studies scored N/A in all monitoring length and frequency categories.
Of the 10 studies that reported length of monitoring periods for both controls and intervention, as well as monitoring frequency, three were ex situ, five were in situ, one was modelled, and one was modelled and measured. Considering there are only four ex situ studies in total, the robustness of monitoring for these studies is better than that of the other study designs.

Critical appraisal scoring

Relevance scoring (population, intervention, outcome).
Low relevance scores are likely to arise in studies where the primary objective was notably different from the subject of our REA. All primary evidence (quantitative observational and quantitative experimental studies) scored 1 for population relevance. Due to a lack of explicit reference to trees being planted or used, one study scored low for intervention relevance: Tirpak et al. (2019) reported the use of a tree in a suspended pavement study but did not study the impacts of the tree itself. Two primary studies received low scores for outcome relevance. Grey et al. (2018a) analysed street tree planting technologies based on their improved growth capacity, but focused not on the impacts on stormwater itself but on the effect of stormwater on tree health. Tratalos et al. (2007) scored low for outcomes as runoff reduction was reported as a result of address (housing) density and not tree density.

Fig. 2. (b) The geographical location of each piece of evidence categorised by region, with those focusing on multiple locations being classed as "multiple." Those pieces of evidence with multiple foci within one region were categorised by the appropriate region and not as "multiple." (c) Distribution of the arrangement of trees within primary studies found in database searches. Parts of a green infrastructure technology were reported as their configuration within that technology.

Robustness scores.
Although all studies passed the general criteria of robustness, 14 and 11 studies, respectively, scored low for methodological and analytical robustness criteria. Primary studies that did not fulfil the criteria for methodological robustness did so because they lacked a control group and also failed another criterion. Perhaps unsurprisingly, five out of six of the review studies also scored low for methodology criteria. In cases where robustness criteria were scored low, minimisation of bias was not evident, and most reviews did not fulfil any of the methodology criteria. The objectives of most review studies are slightly different to the objective of the present REA. The core of the protocol in this study is to minimise bias. Objectives of published reviews often favour positive outcomes of the intervention they are implementing and reflect the issue that less significant or negative results tend to go unpublished (Collins et al. 2015).
Of the primary evidence studies, only six scored low for analysis, but three of these also scored low for methodology. Whilst analytical methods were always stated by studies, many were scored low due to a lack of precision in values in combination with either a lack of defined magnitude of effects or a lack of explanation of the results that were obtained. In contrast, the same five review papers that had low methodological robustness also had low analytical robustness, in all cases due to a lack of bias minimization in the synthesis of evidence. There is a lack of systematic reviews in this field.

Critical appraisal (CA) scores.
Critical appraisal (CA) scores were calculated from the multiplication of total relevance and robustness scores. Encouragingly, primary evidence studies mostly achieved the highest possible score of 9 (Fig. 4(a)). Secondary evidence from review papers is considered separately, as appraising the rigour of the primary evidence that they used is not possible or is beyond the scope of the present study. However, the overall rigour of the reviews themselves is much lower than that of the quantitative studies. Only one of the reviews minimised bias effectively (Roy et al. 2012).

Monitoring/modelling scores.
Only one study scored ≥7 for monitoring/modelling, which suggests the overall rigour of methodology in this area was somewhat unsatisfactory. Thus, although many studies have high critical appraisal scores, all apart from this single study have low to moderate control/intervention periods and frequencies (Fig. 4(b)).

Hydrological impacts
From primary evidence collected, there were 27 studies that reported runoff and the presence of trees to be inversely related (Fig. 5). Some reported this in terms of an overall value of trees present, and some modelled the impact of reduced/increased urban tree cover. Other related hydrological responses included the increase in interception (17 studies), increased infiltration (six), and evapotranspiration loss (seven). Six primary studies and three review studies reported increased infiltration as a result of urban tree planting. The importance of trees in increasing the infiltration rate (IR) was demonstrated by Bartens et al. (2008), where extension of tree roots increased IR by 153%, a result 27 times greater than that of the unplanted controls. However, one of these studies, by Nielsen et al. (2007), reported that maximizing total infiltration could also be done by expanding the underlying pit surface area beyond the crown drip zone. In the same study, it was noted that while evapotranspiration led to water loss in soil (measured at over 10 L day⁻¹), this was not a driving mechanism in the overall hydrology of the tree pit.
The impact of differing meteorological conditions, such as storm intensity, was also identified as an important factor affecting interception and runoff (13 studies), with canopies reaching saturation faster with increased rainfall intensity (Guevara-Escobar et al. 2007). Seven of the primary studies linked a reduction in runoff to the increased interception arising from increased tree cover (Fig. 5). The urban water balance is controlled by multiple factors involving runoff, interception, infiltration, evapotranspiration, throughfall and stemflow, all of which are reported as among the outcomes in the evidence found. Eleven primary studies focused on just one of these factors, but their relationship with other processes was not always reported. For instance, Xiao and McPherson (2011) reported an increase in infiltration due to the presence of trees, but did not link this back to runoff, which can be considered the main hydrological issue in urban areas.
Meteorological conditions were found, in 13 primary studies, to be a key factor affecting the success of trees in improving the hydrological regime. Interception in low-intensity storms was much more successful than in larger storms, or in larger storms as rainfall increased past a saturation point (Xiao et al. 2000, Wang et al. 2008, Livesley et al. 2014, Zabret et al. 2018). Interception was controlled by precipitation characteristics for smaller events, but by the maximum canopy storage for larger events (Xiao et al. 1998, Xiao and. The other main variables controlling interception rates and volumes were mostly related to the characteristics of the tree itself, such as leaf area index (LAI), canopy morphology (volume, area, etc.), and bark roughness, as highlighted in 12 of the primary studies. These characteristics reflect species (see Section 4.4).
The diversion of interception to stemflow was an important factor, highlighted by two studies, which aided the reduction of throughfall and thus runoff by directing water towards the base of trees whereby greater infiltration was encouraged (Carlyle-Moses and Schooling 2015, Huang et al. 2017).
All secondary evidence reported similar impacts of trees on urban hydrology. Overall, 54 of the 55 primary and secondary studies highlight that trees are beneficial in hydrological terms on a variety of scales. The one study that does not, by Zabret et al. (2018), has a neutral conclusion, with impacts instead reported as being controlled by meteorological conditions.

Robustness and consistency of evidence.
Although most studies on hydrological impacts achieved maximum critical appraisal (CA) scores, there was only one study with a high monitoring/modelling (MM) score. This suggests that although the study designs were well structured, the frequency and length of monitoring and modelling periods for most studies were not as robust. Yet among all 27 studies reporting outcomes related to runoff, there was a reduction in runoff despite differences in overall robustness. However, there is still a need to improve the length and frequency of study interventions as well as to increase the number of controls used, to make a more reliable comparison on the impacts of trees on urban hydrology. Only 10 primary studies had a control period (Fig. 3(a)); 39% of primary studies had an intervention period of less than a year; and just 8% of primary studies had an intervention period longer than 2 years. To improve confidence in urban tree planting as a means to reduce runoff, longer periods of monitoring under intervention and control periods would be beneficial.

The effects of tree arrangement
Ten studies reported differentiation in the outcomes they recorded based on tree planting arrangement, location, and techniques. One study, by Scharenbroch et al. (2016), noted that when a tree's growth is impaired, so is its health, and thus it has a lower potential to reduce runoff.
Five studies that compared tree arrangement focused on tree density. Studies such as that by Asadian and Weiler (2009) showed that isolated, individually spaced trees with open canopies performed better in terms of increasing interception losses. In addition to tree density, Song et al. (2020) showed through modelling of different types of urban green space (Fig. 6) that replacing existing trees with ones that had a higher LAI would also have a significant effect on runoff reduction. On a neighbourhood scale, Inkiläinen et al. (2013) highlighted differences in measured total throughfall between trees in front and back yards. The higher total throughfall was found in front yards, but this was mostly attributed to the density and type of vegetation in front yards. It was suggested the arrangement of trees and thus the reduction of runoff at this scale could be controlled by the residents themselves.
Four studies had a defined linear arrangement of trees, all of which were planted in streets. Grey et al. (2018b) found that with regards to street tree pits, runoff retention was also linked to the connectedness of impervious cover. Thus, an increase in tree density as well as cover enhances the benefits for urban hydrology. This is congruent with a study (Baró et al. 2019) on street trees in Barcelona, Spain, where the total ecosystem benefits of urban street trees within each district were closely related to their density.
To explore how urban runoff might best be mitigated, Matteo et al. (2006) modelled the impacts of 10-ft street trees and 200-ft riparian buffers. It was found street trees performed better at reducing runoff than the riparian buffer zones in urban areas. On the other hand, riparian buffers were more efficient at reducing runoff in suburban watersheds.
Five studies focused on the impact of grouped trees, although Song et al. (2020) also showed that the increased density of groups of trees in different urban settings could increase potential runoff reductions further. The study has a larger intervention area focus than most others, at 33 km². However, in terms of specific CA scoring criteria for analytical robustness, the study scored lower than 80% of other studies focused on tree arrangement. Thus, although reported outcomes have been recorded positively on a larger scale, the relatively low robustness of these findings makes for tentative evidence.

Robustness and consistency of evidence.
The robustness (CA) of evidence found regarding the importance of tree arrangement comparisons is high (9) apart from two of the studies (6). There is some inconsistency of evidence between studies. Song et al. (2020) suggest that an increase in tree density will lead to further reductions in runoff, as do other studies (Inkiläinen et al. 2013, Grey et al. 2018b, Baró et al. 2019). However, Asadian and Weiler (2009) challenge this, suggesting that more isolated trees with open canopies and in good health will perform better.
There is a need for more in-depth qualitative studies comparing the influence of different tree arrangements on urban hydrologic regimes. There are different reported arrangements of trees within studies, but not many comparisons between arrangements. Making such comparisons is important for urban planners to maximise the efficiency of tree planting and increase the overall cost-effectiveness of such schemes. Song et al. (2020) presented outcomes that would be beneficial for urban planners when deciding the location and arrangement of urban trees.

Corroboration of modelled effects by observations
Although there are eight urban model study designs and 18 combined studies (modelled and measured), only four analyse differences between modelled impacts and observations. No secondary studies cover modelling. Compared to other study designs, those that employed modelling were mostly focused on runoff reduction (seven studies). Guo et al. (2017) reported an error rate of <5% for 12 of the models applied. Deutscher et al. (2019), however, reported accuracies of 66% for tree stand land cover when measuring soil moisture on a monthly scale over 2 years (potential error of 34%). These two studies highlight the range in accuracy (i.e. the difference between modelled and measured values) of different models. Inkiläinen et al. (2013) carried out sensitivity analysis to show the impact of initial canopy dryness on their model. However, their model was able to explain 94% of the variation in measured throughfall. The increase in storm magnitude also increased residuals, reflecting a decline in model performance as rainfall increased.
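The two kinds of agreement measure mentioned above, a percentage error and the proportion of measured variation explained, can be sketched as follows. This is a generic illustration using invented throughfall values; the studies cited may have defined error and explained variation differently, and the function names are hypothetical.

```python
def percent_error(modelled, measured):
    """Absolute difference between model and observation, as a percentage."""
    return 100.0 * abs(modelled - measured) / abs(measured)


def r_squared(measured, modelled):
    """Proportion of variation in the measurements explained by the model."""
    mean_obs = sum(measured) / len(measured)
    ss_res = sum((o - m) ** 2 for o, m in zip(measured, modelled))
    ss_tot = sum((o - mean_obs) ** 2 for o in measured)
    return 1.0 - ss_res / ss_tot


# Invented event totals (mm) for illustration only.
measured = [2.0, 4.0, 6.0, 8.0]
modelled = [2.5, 3.5, 6.5, 7.5]

first_event_error = percent_error(modelled[0], measured[0])
explained = r_squared(measured, modelled)
print(first_event_error, round(explained, 2))  # 25.0 0.95
```

The example shows why both measures are worth reporting: a model can explain most of the variation across events while still carrying a large relative error for an individual small event.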

Robustness and consistency of evidence.
Overall, the robustness of research into modelling the impacts of different tree species is limited. The contrasting evidence and lack of model comparison or calibration against measured impacts hinder the overall robustness of the studies, as do the relatively short modelling periods. Four of the six identified studies had moderate monitoring scores, whilst two had low scores. For most of the monitoring studies there were no control periods (25 of 27 modelled and combined study designs). Also, intervention periods tended not to last longer than 2 years, with the exception of one study, whilst modelling frequencies occurred at a temporal resolution of greater than a fortnight in only five out of 27 studies.

Tree species variation
Twenty-two primary studies focused on the different impacts caused by tree species. However, seven of these did not have baseline comparators to judge the overall impact of tree planting, as opposed to the benefit of one species over another. Evergreen and coniferous trees have advantages over deciduous trees in terms of runoff reduction and increased interception (Šraj 2015, 2019). Guo et al. (2017) studied the water storage ability per unit leaf area of different tree species, finding that coniferous trees outperformed both their deciduous and natural forest counterparts. The mean rainfall interception capacity (RIC) of conifers was over 1.5 times that of broadleaf deciduous trees. Xiao and McPherson (2016) attributed this to morphological factors such as surface roughness. The relative benefits of coniferous trees are only apparent in storms of smaller magnitude (Liu and Chang 2018, Zabret et al. 2018). In contrast, other studies have found that increases in canopy cover and plant area index (PAI) are more important in determining runoff reduction (Inkiläinen et al. 2013, Livesley et al. 2014). Increased canopy cover was also found to be better for predicting throughfall volumes than LAI, which can be an unreliable predictor of hydrological response for deciduous trees due to the unpredictable rates of fallout (Huang et al. 2017). Given the importance of increased canopy cover, evergreen species are especially beneficial in winter periods and this should be acknowledged to avoid biased conclusions (Xiao and McPherson 2011).
Although species selection is important in determining impacts on urban hydrology, there is also a need for planting areas to complement the rooting system of the chosen tree (Rahman et al. 2019). Rahman et al. (2019) found Robinia pseudoacacia had a higher growth rate with finer roots which consequently increased infiltration, yet Tilia cordata was able to influence deeper percolation of water via its deeper rooting system.
Other factors influencing the maximum amount of rainfall that can be intercepted by trees are highlighted by Kuehler et al. (2017). They found leaf area and morphology to be significant. Those species with more rigid leaves performed better, for example.
To achieve optimal tree growth, and thus ecosystem service performance, consideration of favourable soil type for different species is also important (Day and Dickinson 2008). Day and Dickinson (2008) also suggest the largest trees with the best developed root systems remove the greatest volume of water from stormwater reservoirs.

Robustness and consistency of evidence.
The size of intervention areas varied from around 25 m² (Tirpak et al. 2019) to 502 km² (Holder and Gibbes 2017), but this did not have a significant influence on results. Overall, although tree species and type (coniferous or deciduous) are important, tree characteristics are more significant in determining the magnitude of impact on hydrologic regimes. Canopy morphology, leaf density, LAI, RIC, bark roughness, tree health, and tree maturity are all pivotal in determining the volume of runoff, interception, throughfall, and stemflow. The findings of studies analysing tree species variation amongst other characteristics are relatively consistent and corroborative despite variation in robustness scores.

Synthesis of evidence statements
Evidence statements are the conclusions of the reviewed papers, aggregated into categories such as "reduced runoff." To indicate the reliability of the final evidence statements, an evidence statement index (ES) was created to provide a more accurate weighting of each statement based on both its CA and MM scores. Averages for each evidence statement were calculated to provide the CA and MM scores in Table 3. The ES index was calculated by averaging these two. All hydrological outcomes were hindered by the MM scores of their respective studies. There is little variation in the ES index values of outcomes.
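As a minimal sketch of the calculation described above, the following Python function averages the CA and MM scores of the studies supporting one evidence statement and then takes the mean of those two averages as the ES index. The function name and the example scores are hypothetical and are not values from Table 3. Equal weighting of the two averages is assumed, as implied by "averaging these two".

```python
from statistics import mean


def es_index(ca_scores, mm_scores):
    """Return (mean CA, mean MM, ES index) for one evidence statement.

    The ES index is taken as the unweighted mean of the two averages,
    which is an assumption based on the description in the text.
    """
    ca_mean = mean(ca_scores)
    mm_mean = mean(mm_scores)
    return ca_mean, mm_mean, (ca_mean + mm_mean) / 2


# Hypothetical CA and MM scores for three studies supporting one statement.
ca = [9, 9, 6]
mm = [3, 4, 2]

ca_mean, mm_mean, es = es_index(ca, mm)
print(f"CA = {ca_mean}, MM = {mm_mean}, ES = {es}")
```

Because the index is a simple mean, a statement backed by studies with high CA scores can still receive only a moderate ES value when the MM scores are low.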
The most robust outcomes found were related to evapotranspiration loss and canopy interception loss. Although these outcomes are similar, they were kept distinct in terms of their definitions. There were 27 studies reporting reduced runoff. Although the MM scores of these studies hindered their overall ES index value, they provide a substantial evidence base from which to make a summary of quantified effects. In the evidence base, 14 studies report runoff reduction attributable to tree establishment as a percentage. These are comparable, and from a graphical synthesis (Fig. 7) it is readily apparent that the establishment of trees on impermeable ground (i.e. street trees on urban roads) is highly effective at reducing runoff. The establishment of trees on a range of urban fabric types comprising a mix of permeable and impermeable surfaces provides less but still substantial benefit. There is little conflict in terms of positive and negative outcomes. All studies reported a significant benefit from increased tree cover, yet the claims made are still tentative due to their low to moderate MM scores and thus low ES index values. For more conclusive results, studies with more robust methodologies are needed.
The issues covered by the studies identified to address the primary and secondary questions are summarised to illustrate where research effort has been focused (Fig. 8). Pie charts within the diagram indicate that particular attention has been given to runoff relative to other hydrological impacts, and to individual trees rather than groups or lines of trees. A distinction between evidence from natural planting and engineered planting is also apparent, as are effects of tree management and monitoring of health; both aspects are discussed below.

Other findings
In addition to the primary objectives of the REA, additional findings of a substantial and pervasive nature were apparent; these are summarised in three subsections below.

Green infrastructure and trees.
The implementation of trees within green infrastructure was prevalent within the literature found, e.g. bioswales, green roofs, tree filter boxes, etc. Berland et al. (2017) highlighted an improved performance of trees, in terms of stormwater management, when coupled with green infrastructure technologies such as bioswales. This is not a significant conclusion in other primary studies, but it does indicate the potential for the integrated use of trees in urban environments. Tree performance can be hindered by a lack of consideration of the planting area of the tree (Day and Dickinson 2008, Rahman et al. 2019).
Increased impervious cover due to urbanization is one of the main driving factors affecting urban hydrology and the risk of flooding. Nou and Charoenkit (2020) found that an increase in pervious cover by 44% can reduce peak runoff by 1.55 m³ s⁻¹. However, they also found that permeable pavements were the most effective form of green infrastructure at reducing total runoff. In contrast, Deutscher et al. (2019) reported that treed land cover performed better in terms of reducing surface runoff than park lawns, which had much less impervious cover. However, Armson et al. (2013) revealed that whilst trees in pits surrounded by asphalt were able to remove as much as 62% of runoff, grass lawns eliminated almost all runoff. The importance of increased infiltration due to the size of the pit in which the tree was planted was recognised as a significant factor affecting the reduction of runoff. The measured reduction in runoff was more than interception alone could have caused, which suggests the infrastructure in which trees are planted can be just as significant as the tree itself.
In contrast to the studies supporting the planting of trees, Zölch et al. (2017) show that green roofs performed better in terms of runoff reduction than when trees were used as the main intervention. This is likely due to the larger permeable surface that green roofs create (10.1%) compared to tree planting (3.9%), despite similar green cover (~15%). The differences are only small, however, with green roofs leading to 0.6% greater surface runoff reductions than tree planting. Grey et al. (2018a) found that to achieve the optimum benefits of trees, management is also important. Passive irrigation of trees with stormwater can reduce the growth and even cause the death of the trees. Technologies and tree planting strategies in future must focus on avoiding the waterlogging of tree pits.

Tree management and health.
Some studies have highlighted the importance of management in terms of the medium in which trees are planted and the opportunity for successful growth. Nielsen et al. (2007) reported increased tree growth in urban parks compared to non-irrigated street trees. Grey et al. (2018a) showed that tree health can also be improved by the addition of an underdrain in the tree pit technology. In addition, Rahman et al. (2019) specified that pits in which trees are planted must be designed to complement the species of tree. Some have greater rooting zones, which can be confined by the size of the pit, with health and function deteriorating as a result. Grey et al. (2018b) reported that increased tree pit area and density, as opposed to tree density, would also have a significant impact on runoff reduction. By increasing the ratio of tree pit area to catchment to 4.4%, a 90% reduction in runoff could be achieved.
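The ratio of tree pit area to catchment area referred to above is a simple proportion, and a short Python sketch shows how it translates into a required pit area. The function names and the catchment size are hypothetical. The 4.4% figure is the ratio reported by Grey et al. (2018b), and the sketch makes no attempt to predict the associated runoff reduction.

```python
def pit_area_ratio_percent(pit_area_m2, catchment_area_m2):
    """Tree pit area expressed as a percentage of the catchment area."""
    return 100.0 * pit_area_m2 / catchment_area_m2


def pit_area_for_ratio(target_ratio_percent, catchment_area_m2):
    """Total pit area (m^2) needed to reach a target pit-to-catchment ratio."""
    return target_ratio_percent / 100.0 * catchment_area_m2


# Hypothetical 500 m^2 street catchment; 4.4% is the ratio quoted in the text.
catchment_m2 = 500.0
required_pit_m2 = pit_area_for_ratio(4.4, catchment_m2)
print(f"pit area needed: {required_pit_m2:.1f} m^2")
```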
When choosing tree species for bioswale planting, considerations should be made regarding the rate of stomatal conductance and the total leaf area (size) at maturity, and the health and condition of the trees are of key importance too (Scharenbroch et al. 2016). Despite this, Asadian and Weiler (2009) showed that, in some cases, whilst healthier tree species do capture a greater proportion of rainfall, trees in poor condition may still intercept more than others. The evidence on the extent to which tree health can impact catchment hydrology is robust but conflicting. Evidence regarding other ecosystem services, such as carbon sequestration, has found that larger and more mature trees perform better (Turner-Skoff and Cavender 2019), but this difference has not arisen clearly in the evidence found by this REA. Some characteristics such as canopy size and density are related to tree maturity, but very few studies have explicitly stated maturity as a significant variable. Trees take time to mature and provide greater ecosystem services. This is something that needs to be addressed in further research if trees are to be used effectively.
Figure 7. Effectiveness of urban tree planting on runoff reduction, differentiated by the substrate on which the trees were planted. Data is based on all primary studies reporting reduced runoff as an "outcome" which also clearly indicate the substrate in which trees are planted.

Other quality indicators.
Although this study focused primarily on the impacts of trees on hydrology, there were studies that had other foci too. Examples include Soares et al. (2011), who calculated that the reduction of runoff caused by street trees in Lisbon, Portugal, led to greater savings in financial terms (US$1.97 million) than that arising either from energy saving or improved air quality. Over a 35-year period, McPherson et al. (2011) estimated that the One Million Trees project would reduce runoff by 51-80 million m³, a saving valued at US$97-153 million. Trees can provide directly measurable economic benefits as well as environmental ones. Baró et al. (2019) measured the beneficial effects of trees on temperature and air pollution. Like their noted impacts on runoff reduction, the total ecosystem value provided by these trees was mostly correlated to the density of trees within each district. Unsurprisingly, review studies also had multidisciplinary foci. Roy et al. (2012) reported a wide range of other impacts, observing positive effects of trees in terms of social issues, economic benefits, health improvements, enhanced aesthetics, reduced noise pollution, mitigation of heat island effects, reduced energy use, and better air quality.

Conclusions and recommendations
This REA was undertaken to establish the robustness of evidence available to support whether the implementation of urban forestry has beneficial impacts on hydrology and what those impacts are. More specifically, we identified whether evidence explicitly related to nature-based solution (NBS) implementation, such as bioswales, green roofs, and tree filter boxes. The evidence statements (Table 3) were weighted based on scores for each individual paper. The scores themselves constitute an aggregate of criteria based on relevance, robustness, and rigour of monitoring/modelling. Consistent beneficial impacts were found at the local level, with trees reducing runoff and increasing infiltration.
Figure 8. Map of evidence covered in terms of tree characteristics and hydrological response. The quantitative breakdown in the pie charts into types of hydrological impact and tree arrangement is based on aggregated Evidence Statement (ES) scores of the relevant studies. Substantial attention to factors related to substrate and physiography was also apparent, but by their nature the studies could not be readily categorised for a similar quantification to be appropriate or meaningful.
The REA has identified shortcomings regarding the robustness of studies, but there is a potential for bias within the methodology, for example through the exclusion of recent research. The overall lack of grey and unpublished literature that passed the full-text screening was due to the lack of reference to primary data, which was among the criteria for inclusion/exclusion.
The presence of controls/comparators and the ratio of intervention to catchment area were important factors to consider. Without a control, or a baseline, conclusions on the effectiveness of trees for stormwater management are limited. Only 10 primary studies incorporated a control period, although 39 of 49 studies did have a valuable comparator (e.g. increase in tree cover). Furthermore, very few studies reported the size of the intervention area and/or catchment areas, and there was thus a lack of contextualization to the results found. There was little evidence of larger scale effects of trees on hydrology, a finding consistent with previous research on flooding impacts (e.g. Stratford et al. 2017). Only a minority of studies identified effects in water bodies, but trees may still be beneficial to urban environments at a more local scale. Additionally, studies found that trees were effective at mitigating a vast amount of runoff in smaller storms but were less effective in larger scale storms. Regarding methodological robustness (MM scores), studies were rarely of sufficient length to identify long-term temporal variations. To account for these, interventions must be monitored more frequently and over longer time scales. Infrequent monitoring cannot capture potentially significant short-term fluctuations.
Studies based on modelling approaches rarely reported any testing of models against observations. Although some studies did report model performance, a general lack of testing suggests that modelling studies might not be robust enough to make conclusive remarks on their findings. As with monitoring studies, there is a lack of both control periods and sufficiently extended intervention periods for such studies. There is a need for further primary observational research on the wider scale of these impacts in order to apply models confidently in potentially valuable situations comprising relatively large intervention/catchment areas.
The location and arrangement in which urban trees are planted were also found to be inconclusive in terms of how best to maximise the benefits of trees. There is a need for more studies implementing both linear and grouped trees, for example, as much existing research focuses on individually spaced trees. In urban landscapes there is often limited potential for tree planting due to the vast interconnected impervious cover, and so evidence regarding the optimal arrangement or spacing at which trees should be planted to achieve the desired ecosystem functions (e.g. runoff reduction) would be invaluable. Most studies referenced tree density as one of the most important factors determining the level of benefits the trees provide.
Of the secondary questions investigated, tree species was the most comprehensively researched. Species has not been found to have a significant impact on the variation of outcomes observed, although in broader terms some studies favoured evergreen trees over deciduous. There was little difference between tree species during larger storm events; instead, the rainfall interception capacity of each tree appeared to be a controlling factor of runoff volume. It could be beneficial to compare the impacts of tree characteristics with meteorological influences on outcomes such as interception rates, for example. In terms of mitigating urban heat island effects, the size and maturity of trees is pivotal, and this aspect should be further investigated in terms of hydrological impacts, for which there is only indirect evidence.