High-speed rail to prosperity? Assessing the role of transportation improvement in the urban economy

Abstract Investigating the impact of high-speed rail (HSR) on the local economy is of great importance and interest to policy makers and scholars. Although there is a large body of literature in this area, estimates of this impact are inconsistent or even contradictory. The empirical evidence remains problematic for several reasons: endogenous route placement, omitted variable bias, heterogeneity across regions, and various confounding factors. In this paper, we assess this impact by constructing an appropriate counterfactual: cities without HSR services whose GDP level and GDP trend were similar to those of HSR cities before the debut of HSR services. The control group fits the treatment group well, and the economic performance of the control group was even slightly stronger than that of the treatment group before 2007. Using the DID method, we find that the HSR network raised local GDP by approximately 3.3%. The introduction of HSR services helped cities attract more industrial enterprises and achieve more industrial output, but its effect on the service sector was not pronounced. Our results are robust to different sample selection procedures, dynamic analyses, and alternative empirical strategies. Our study thus provides new and solid empirical support for the argument that HSR benefits local economic development.


Introduction
Historically, the rapid economic growth of developed countries and regions such as Western Europe, Japan and the United States has been accompanied by large-scale transportation infrastructure construction (Banerjee et al., 2020). That is, better transport infrastructure generally seems to be closely associated with better economic performance. However, it remains unsettled whether transportation infrastructure is an engine of growth.
In this paper, we focus on the specific case of the large-scale high-speed rail (HSR) network in China. For China, the HSR network is part of a national strategy to support economic growth and urbanization (Meng et al., 2018). The initial target of the Chinese HSR network was to connect provincial capitals (Ke et al., 2017). China subsequently extended the goal of the HSR network to connecting medium-sized cities with more than 500,000 people (Shao et al., 2017). Before 2015, 155 out of 292 Chinese cities were connected to the HSR network. By 2016, China's HSR network had reached 22,000 km of operating mileage, with design speeds ranging from 200 to 350 km per hour (see Figure 1). A growing literature documents strong growth in passenger numbers after HSR services are launched (Vickerman, 2018). In addition, HSR expedites the relocation and adjustment of corporations and families (Zheng & Kahn, 2013). The likely result is that labor and production resources cluster in cities connected by HSR routes (Shao et al., 2017). Given its substantial contribution to improving the accessibility of cities, the HSR system has great potential for nurturing market integration and promoting agglomeration economies and urban growth.
In this paper, we quantify the economic impact of a large-scale HSR network on cities by comparing the economic performance of a treatment group and a control group before and after the introduction of HSR services. This analysis employs a difference-in-differences (DID) model with a unique dataset: the national train schedules of China between 2007 and 2014. The key identification challenge is to construct an appropriate counterfactual for what would have happened to cities' economies in the absence of HSR services. In addition, we employ an instrumental variable (IV) strategy, the PSM-DID methodology and placebo tests to address potential endogeneity concerns. On this basis, we trace the dynamic effects of the HSR network and investigate whether industrial agglomeration is the mechanism through which HSR promotes the urban economy.
For the eight years before the debut of HSR services, we find no statistically significant difference in the trend or level of local GDP between the treatment and control groups. After being connected to the HSR network, the treatment group experienced significant growth in local GDP. Our estimates indicate that HSR increases local GDP by approximately 3.3% on average, which is equivalent to moving a city from the 10th percentile of the city-level GDP distribution to the 29th percentile in 2014. Moreover, our evidence shows a significant increase in industrial agglomeration in the treatment group vis-a-vis the control group after HSR service provision, and the effect of HSR on regional economic performance is driven partly by this impact of the HSR network on industrial agglomeration.
The remaining parts of this paper proceed as follows. Section 2 reviews the related literature, Section 3 introduces the data and methodology employed in this study, Section 4 presents the results and discusses the econometric analyses, and Section 5 presents the results from the robustness tests. The conclusions and policy implications are addressed in Section 6.

Literature review
Our study closely relates to the literature on HSR project cost-benefit analysis (CBA).
Whether and to what degree HSR projects affect economic growth is of great importance to the policy analysis of new transport infrastructure investments. The coverage of CBAs of conventional HSR projects naturally varies from country to country, but as a rule, these analyses cover the construction costs of transport projects, the operation and maintenance costs of the associated transport services, direct user benefits, and a restricted list of externalities, such as transport safety impacts, congestion, overcrowding and emissions (Ollivier et al., 2014). However, there seems to be little doubt that HSR investment has wider impacts that conventional CBA does not capture (Laird et al., 2014; Venables, 2007). In particular, the regional economic effect of a large-scale HSR network is considered nontrivial and should be carefully estimated and added to an extended CBA.
There are intense debates about the causal link between transport and local economic performance (Chen & Vickerman, 2019). The mainstream literature agrees that better transport and better economic performance are causally connected (Vickerman, 2018). Chen (2012) concludes that this relationship is complex and involves a long-term evolution. Transport infrastructure can produce step changes in accessibility or generalized transport costs, which in some cases may lead individuals and businesses to significantly change the location of economic activities (Chen & Vickerman, 2019). From a macroscopic perspective, transport infrastructure can improve regional economic performance in various ways, such as enlarging the intercity tourism market, reducing information asymmetry, facilitating innovation and fostering agglomeration economies (Lawrence et al., 2019).
On the other hand, there is also evidence suggesting that this effect is negative or insignificant. Many scholars attach more value to the safety and efficiency of transportation facilities (Rosová et al., 2013; Sivalai & Rojniruttikul, 2018). Aschauer (1989) states that public investment may crowd out private investment in some scenarios. Based on worldwide experience, Huang (2011) concludes that HSR investments crowd out other transportation investments. Besides, Albalate and Fageda (2016) find that the provision of HSR services in Spain negatively affected air traffic, which suggests a negative indirect effect of HSR on tourism outcomes. There is also concern about the negative externalities of transportation for socioeconomically disadvantaged areas (Faber, 2014). The core-periphery model suggests that reductions in transport costs could prompt labor and capital in peripheral areas to flow towards core areas and hence weaken the economic development of peripheral areas.
New economic geography (NEG) shows that changes in transport costs and accessibility can, in some cases, have profound effects on the location of agglomeration (Graham, 2007; Krugman, 1991). The extent to which agglomeration occurs depends on the interaction of increasing returns in economic activities, the significance of transport costs and market size (Vickerman, 2008). HSR services increase the attractiveness of locations where industries tend to form spatial clusters based on factors such as improved access to raw materials, technical expertise, and markets. Morten and Oliveira (2016) find that transportation reduces the cost of moving people and promotes interregional migration. International studies suggest that these agglomeration effects are likely to be the most important of the economic effects of HSR (Ollivier et al., 2014). Chatman and Noland (2011) conduct a detailed literature review on this subject and conclude that transportation improvements can bring substantial positive externalities by enabling economies of agglomeration. Redding and Turner (2015) review the theoretical and empirical literature on how transportation infrastructure modifies the territorial distribution of economic activity. Studies on HSR projects in England, Germany, Spain and Japan also indicate that HSR produces a clear agglomeration effect on regional economies (Hensher et al., 2014; Matas et al., 2020; Wetwitoo & Kato, 2017). Nevertheless, extant empirical work focuses mainly on single cities or narrow spatial domains in which agglomeration effects tend to be obvious (Chen & Vickerman, 2019). Whether and to what extent HSR promotes economic performance through the agglomeration effect therefore remains ambiguous, especially for large-scale HSR networks (Shao et al., 2017). Chen and Vickerman (2017) underline that the agglomeration effect induced by HSR varies across contexts.
In postindustrial places such as Western Europe, where secondary industry has largely moved to industrializing countries, the strategic role of HSR is to strengthen knowledge economies (Chen, 2012). Evidence also shows that in Europe, HSR mainly appears to assist the division of service labor between routine and knowledge-intensive activities (Chen & Vickerman, 2017). The situation is different in China, which is still engaged in a highly uneven process of industrialization and urbanization (Friedmann, 2005): even though the HSR network has the potential to facilitate agglomeration effects in both the industrial and service sectors, estimates of these effects are inconsistent across cities along different HSR lines. This inconsistency relates fundamentally to the uneven development of secondary industry and the service sector across China. Only some developed cities in eastern China have undergone the transformation to service- and knowledge-based economies, and most Chinese cities are still important centers of secondary industry (Chen, 2012; Fang et al., 2020). The related literature on the agglomeration effects of Chinese HSR lines obtains different results in different circumstances. Using data for 25 cities in China's Yangtze River Delta region during 1995-2014, Shao et al. (2017) show that HSR has a significant impact on producer service industry agglomeration but not on consumer service or public service industry agglomeration. Chen and Vickerman (2017) show that in the Yangtze River Delta region, some cities with HSR services enjoy strong growth in secondary industry, while others enjoy strong growth in the service industry. Dai et al. (2018) find that 9 of 14 subdivided tertiary industries experienced an agglomeration trend in cities along the Beijing-Shanghai HSR line after the inauguration of HSR services.

Data and methodology
This section describes the data and methodologies employed in this study. In this paper, "cities" refers to all prefecture-level cities, the four province-level cities and the fifteen subprovincial cities in mainland China.

Data
We collect HSR data from the National Railway Passenger Train Schedule, which contains detailed information on the arrival and departure times of all trains at each station. In 2007, following China's sixth round of large-scale railway speed-ups, high-speed train services began operating in 57 Chinese cities. Eight cities were connected to the HSR network in 2008, 28 in 2009, 13 in 2010, 1 in 2011, 11 in 2012, 15 in 2013 and 22 in 2014. The socioeconomic characteristics of cities are collected from the 1999-2014 Statistical Yearbooks of Chinese Cities. Datasets on nighttime lights are obtained from the Chinese Research Data Services Platform. The innovation index of cities comes from the China City and Industry Innovation Report (2017). The least-cost HSR route data are from Rao et al. (2019). Table 1 presents the descriptive statistics, with variables measured at the city-year level. Specifically, the data cover 288 cities over the 16-year period from 1999 to 2014. Because of missing and erroneous data, the number of observations differs across regressions. Nominal variables are converted to real terms using the GDP deflator, with 1978 as the base year.
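As a quick consistency check, the yearly connection counts above can be accumulated and compared with the 155 connected cities reported in the Introduction, and the deflation step amounts to dividing by a price index. A minimal Python sketch, assuming a deflator index normalized to 100 in 1978 (the names `NEW_HSR_CITIES`, `cumulative_connected` and `to_real` are hypothetical):

```python
# First-year HSR connections by cohort, as listed in the text.
NEW_HSR_CITIES = {2007: 57, 2008: 8, 2009: 28, 2010: 13,
                  2011: 1, 2012: 11, 2013: 15, 2014: 22}


def cumulative_connected(cohorts):
    """Running total of cities connected to the HSR network by each year."""
    total, out = 0, {}
    for year in sorted(cohorts):
        total += cohorts[year]
        out[year] = total
    return out


def to_real(nominal, deflator, base=100.0):
    """Nominal value -> constant 1978 prices, assuming a deflator index with 1978 = base."""
    return nominal * base / deflator
```

For example, `cumulative_connected(NEW_HSR_CITIES)[2014]` returns 155, matching the count of connected cities before 2015.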

Methodology
We use the DID method to identify the economic impact of the HSR network and further address the concern of endogenous HSR route placement with multiple strategies (the IV strategy, PSM-DID, and placebo tests). Drawing on the work of Beck et al. (2010), we use a multi-period DID model with staggered treatment timing:

$$Y_{s,t} = \beta\, HSR_{s,t} + X_{s,t}'\gamma + A_s + B_t + \varepsilon_{s,t} \qquad (1)$$

In Equation (1), $Y_{s,t}$ is a measure of the economic development of city s in year t. $A_s$ and $B_t$ are vectors of city and year dummy variables that account for city and year fixed effects, respectively; $X_{s,t}$ is a set of time-varying, city-level control variables with coefficient vector $\gamma$, and $\varepsilon_{s,t}$ is the error term. The variable of interest is $HSR_{s,t}$, a dummy variable that equals one in the years after city s initiates HSR service and zero otherwise. The coefficient $\beta$ reflects the impact of the HSR network on urban economic performance. The year dummies control for nationwide shocks and trends that shape economic development over time, such as business cycles, policy changes, and changes in labor force participation. The city dummies control for time-invariant, unobserved city characteristics that affect economic development across cities.
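To make Equation (1) concrete, the following is a minimal Python sketch (numpy only) that simulates a hypothetical balanced panel with staggered HSR adoption and recovers $\beta$ by OLS with city and year dummies. The simulated data, the choice to treat the launch year itself as treated, and the function names are illustrative assumptions, not the paper's data or code.

```python
import numpy as np


def simulate_panel(n_cities=60, years=np.arange(1999, 2015), beta=0.033, seed=0):
    """Hypothetical balanced panel: half the cities adopt HSR in staggered years 2007-2014."""
    rng = np.random.default_rng(seed)
    s_idx = np.repeat(np.arange(n_cities), years.size)   # city index of each observation
    t_idx = np.tile(np.arange(years.size), n_cities)     # year index of each observation
    start = np.array([2007 + i % 8 if i < n_cities // 2 else np.inf
                      for i in range(n_cities)])          # np.inf = never connected
    hsr = (years[t_idx] >= start[s_idx]).astype(float)   # assumption: treated from launch year on
    x = rng.normal(0.0, 1.0, (s_idx.size, 1))            # one time-varying control X
    y = (rng.normal(0.0, 1.0, n_cities)[s_idx]           # city fixed effects A_s
         + np.linspace(0.0, 1.2, years.size)[t_idx]      # common year effects B_t
         + beta * hsr + 0.5 * x[:, 0]
         + rng.normal(0.0, 0.01, s_idx.size))            # error term
    return y, hsr, x, s_idx, t_idx


def twfe_did(y, hsr, x, s_idx, t_idx):
    """OLS estimate of beta in Y = beta*HSR + X'gamma + A_s + B_t + e using dummy variables."""
    city_d = np.eye(s_idx.max() + 1)[s_idx]              # all city dummies (absorb the intercept)
    year_d = np.eye(t_idx.max() + 1)[t_idx][:, 1:]       # year dummies, first year dropped
    design = np.column_stack([hsr, x, city_d, year_d])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]
```

The sketch returns only the point estimate; the paper's Table 2 additionally reports standard errors clustered at the city level, which this example does not compute.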
Following Beck et al. (2010), we use the following regression to test for a common pretreatment trend for the control group and treatment group:

$$Y_{s,t} = \sum_{j=1}^{10} \beta_{-j}\, D^{-j}_{s,t} + \sum_{j=1}^{7} \beta_{+j}\, D^{+j}_{s,t} + A_s + B_t + \varepsilon_{s,t} \qquad (2)$$

In Equation (2), we incorporate a set of indicator variables (the 'Ds') corresponding to 1 year, 2 years, ..., before or after HSR services were introduced. In the notation $D^{\pm j}_{s,t}$, the superscript $j$ indexes these indicator variables, and the minus or plus sign before $j$ denotes the $j$th year before or after city s is connected to the HSR network. $D^{-j}_{s,t}$ (or $D^{+j}_{s,t}$) measures whether year t is the $j$th year before (or after) city s is connected to the HSR network: $D^{-j}_{s,t}$ equals one for city s in the $j$th year before it is connected to the HSR network and zero otherwise, and $D^{+j}_{s,t}$ equals one for city s in the $j$th year after it is connected to the HSR network and zero otherwise. $A_s$ and $B_t$ are vectors of city and year dummy variables that account for city and year fixed effects, respectively.

Note: The cross-city standard deviation of Y is the standard deviation of $(Y_{s,t} - \tilde{Y}_s)$, where $\tilde{Y}_s$ is the average value of Y in city s over the sample period. (a) Government size is measured by the ratio of government expenditure to GDP. (b) Infrastructure development refers to the area of paved roads divided by the total land area of the city. (c) Marketization level is expressed by the ratio of foreign investment to GDP. Source: Authors.
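The indicator variables in Equation (2) can be built directly from each city's launch year. The Python sketch below constructs $D^{-10}, \dots, D^{-1}, D^{+1}, \dots, D^{+7}$ with the launch year omitted as the reference, then estimates the coefficients with city and year dummies. Binning relative years outside the window into the end dummies is our assumption (the text does not say how such years are handled), and all names are hypothetical.

```python
import numpy as np

LEADS, LAGS = 10, 7
# Relative years kept as regressors; 0 (the launch year) is the omitted reference.
REL_YEARS = [j for j in range(-LEADS, LAGS + 1) if j != 0]


def event_dummies(year, start):
    """(n_obs, 17) matrix of D^{-10..-1} and D^{+1..+7} indicators.

    year:  calendar year of each observation (int array).
    start: HSR launch year of the observation's city (np.inf if never connected).
    Assumption: relative years beyond the window are binned into the end dummies."""
    D = np.zeros((year.size, len(REL_YEARS)))
    treated = np.isfinite(start)
    rel = np.zeros(year.size, dtype=int)
    rel[treated] = np.clip(year[treated] - start[treated], -LEADS, LAGS).astype(int)
    for k, j in enumerate(REL_YEARS):
        D[:, k] = treated & (rel == j)
    return D


def event_study(y, D, s_idx, t_idx):
    """OLS of y on the event dummies plus city and year fixed effects."""
    city_d = np.eye(s_idx.max() + 1)[s_idx]
    year_d = np.eye(t_idx.max() + 1)[t_idx][:, 1:]       # drop one year dummy
    X = np.column_stack([D, city_d, year_d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(REL_YEARS, coef[:D.shape[1]]))
```

Never-connected cities contribute only zeros to the event dummies, so they pin down the year effects; this is why the sketch needs such cities in the sample.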
Following the literature (Faber, 2014; Rao et al., 2019), we adopt the least-cost path HSR network computed by Rao et al. (2019) as the IV. The least-cost tree network refers to the hypothetical HSR routes with the lowest construction cost between major cities, constructed from geographic characteristics (slope, relief degree of the land surface, etc.). The combination of propensity score matching (PSM) and the difference-in-differences method is used to mitigate the endogeneity caused by selection bias on observable variables. Moreover, we conduct a placebo test that randomly assigns HSR service to cities to examine whether the results are driven by omitted variables (Chetty et al., 2009; Li et al., 2016).
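One common way to implement such a placebo test is randomization inference: reassign the observed treatment paths to randomly chosen cities, re-estimate the DID coefficient many times, and compare the true estimate with this placebo distribution. The Python sketch below does this for a balanced panel without controls, using two-way demeaning (which, in a balanced panel, is equivalent to including city and year dummies). Permuting whole treatment paths across cities is our assumption about the randomization scheme, and the function names are hypothetical.

```python
import numpy as np


def twfe_beta(Y, D):
    """Two-way FE DID coefficient for a balanced panel (rows = cities, columns = years)."""
    def demean(A):
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    Yd, Dd = demean(Y), demean(D)
    return (Dd * Yd).sum() / (Dd ** 2).sum()


def placebo_betas(Y, D, n_draws=500, seed=0):
    """Re-estimate beta after randomly reassigning each city's treatment path to another city.

    Permuting rows keeps the number of treated cities and the distribution of
    launch years unchanged, so only the link between HSR and cities is broken."""
    rng = np.random.default_rng(seed)
    return np.array([twfe_beta(Y, D[rng.permutation(D.shape[0])])
                     for _ in range(n_draws)])
```

The share of placebo estimates whose absolute value is at least as large as the true estimate serves as a randomization-based p-value; a genuine HSR effect should sit far in the tail of the placebo distribution.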

Preliminary tests
The State Council noted in its 2008 edition of the Mid-to-Long Term Railway Development Plan that the goal of HSR is to connect provincial capitals and central cities. This goal explains the pattern observed in China well: cities with large populations, large economies or political importance were the first to be included in the HSR network. Figure 2 shows that throughout 1999-2014, cities served by the HSR network were economically stronger than cities without HSR services. Because the 'winner' cities are those that perform, or are expected to perform, well, it is difficult to disentangle the effects of HSR from their natural growth path. Urban economic performance and HSR connection are correlated; hence, estimations that do not tackle this endogenous route placement are likely to be biased. Our solution is to establish an appropriate counterfactual for 'winner' cities, and the large scale of the HSR network facilitates the identification of suitable counterfactuals. We first look at Figure 3, which shows histograms of the average local GDP (1999-2006) of 'winner' cities and 'loser' cities. As shown in Figure 3, in terms of economic performance, some of the 'winner' cities far exceed other cities, and some of the 'loser' cities lag far behind other cities. For these outlying cities, it is difficult to find a suitable counterfactual for what would have happened in the absence or presence of HSR services. For the remaining 'winner' cities, however, it seems possible to construct feasible counterfactuals, because Figure 3 shows that many 'winner' cities and 'loser' cities have similar economic volumes, as measured by the average level of GDP in 1999-2006.
We construct the counterfactual for 'winner' cities by matching each of them with a 'loser' city featuring a close average GDP level and GDP growth trend in 1999-2006. The specific procedure is as follows:
1. Calculate the average GDP of all cities during 1999-2006.
2. Search for potential counterfactuals for each winner city based on average GDP. A loser city is defined as a potential counterfactual for a winner city if the difference in average GDP between them is below 300 million RMB yuan, which is less than 10% of the average GDP of all cities before 2007.
3. Plot the GDP trends of each winner city and its potential counterfactuals, and select as its counterfactual the one city whose GDP trend is most similar to that of the winner city.
The matched sample includes 64 'winner' cities (the treatment group) and 64 'loser' cities (the control group). This matching method stresses the similarity of economic development between the treatment group and the control group, but it also has limitations. First, the poorest and richest cities are discarded from the sample because there is no appropriate counterfactual for them; estimation bias might arise because the discarded cities are systematically different from those retained. Second, our procedure shares a limitation of the synthetic control approach: uncertainty remains about whether the control group can reproduce the outcome trajectory that the treatment group would have experienced in the absence of treatment. We employ a set of robustness tests (full-sample regression, IV strategy, placebo tests, etc.) to alleviate these concerns.
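The three-step matching procedure can be sketched in Python as greedy one-to-one matching. Step 3 in the paper relies on inspecting GDP trend plots; the sketch replaces that visual judgment with a hypothetical numerical criterion (the sum of squared differences between GDP paths, each scaled by the city's own 1999-2006 mean). Matching without replacement, processing winners in order of average GDP, and measuring GDP in units of 100 million yuan (so that `max_gap=3.0` corresponds to the 300-million-yuan threshold) are further assumptions.

```python
import numpy as np


def match_controls(winners, losers, max_gap=3.0):
    """Greedy one-to-one matching of HSR 'winner' cities to 'loser' cities.

    winners, losers: dict city name -> 1-D array of real GDP for 1999-2006
    (assumed unit: 100 million yuan; city names assumed distinct).
    Returns dict winner -> matched loser; winners with no candidate are dropped."""
    # Step 1: average GDP of every city over 1999-2006.
    avg = {c: float(np.mean(v)) for c, v in {**winners, **losers}.items()}
    used, matches = set(), {}
    for w in sorted(winners, key=lambda c: avg[c]):
        # Step 2: unused losers whose average GDP is within the threshold.
        cands = [l for l in losers if l not in used and abs(avg[l] - avg[w]) < max_gap]
        if not cands:
            continue  # no feasible counterfactual, e.g. the richest or poorest cities
        # Step 3 (hypothetical criterion): closest GDP path after scaling by own mean.
        w_path = winners[w] / avg[w]
        best = min(cands, key=lambda l: float(np.sum((losers[l] / avg[l] - w_path) ** 2)))
        matches[w] = best
        used.add(best)
    return matches
```

Winners for which no loser lies within the threshold are simply dropped, which mirrors the loss of the poorest and richest cities discussed above.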
We believe that the control group forms a reliable counterfactual for the treatment group. First, Figure 4 indicates that before 2007, the level and trend of GDP in the two groups were very similar. Second, Figure 5 shows no evident trend in the difference between the control group and treatment group in terms of the average logarithm of GDP. Many confounding factors, observable or not, affect economic development, but their aggregate effect did not significantly distinguish the control group from the treatment group. In the absence of shocks specific to either group, the similarity of their economic performance and trends should persist.
Moreover, the economic performance of the control group was even slightly stronger than that of the treatment group before 2007. Notably, after 2007, the year HSR services were launched in China, the difference between the control group and treatment group in the average logarithm of GDP shows a downward trend until 2014. That is, the economic strength of the treatment group caught up with and surpassed that of the control group after the introduction of HSR services.

DID analyses
We employ the DID specification in equation (1) to assess the economic effect of the HSR network; the results are reported in Table 2. All models control for city and year fixed effects. The first and third models in Table 2 do not include control variables, but the second and fourth models do.
The first two columns, which use the full sample, indicate that the HSR network increases local GDP by about 2%. In the first column, the coefficient on the provision of HSR services is not statistically significant, which could be attributed to the omission of time-varying urban characteristics. After these time-varying controls are incorporated in the second column, the R-squared of the regression increases substantially to 0.924, and the coefficient becomes statistically significant at the 10% level. Similarly, the variation between the coefficients in columns (3) and (4) shows that these time-varying controls matter for urban economic development.
As discussed in the previous subsection, the control group in the matched sample is an appropriate counterfactual for the treatment group. We therefore use the matched sample in columns (3) and (4). The estimates of the HSR effect in columns (3) and (4) are significant at the 1% level, and both are larger than the estimates from the full-sample regressions. These differences indicate that ensuring similar economic strength and growth trends before the introduction of HSR services matters for the estimated impact of the HSR network. It is reasonable to speculate that the full-sample regressions may underestimate the economic effect of the HSR network.
Column (4) suggests that the average treatment effect of the HSR network on local GDP is 0.033, which amounts to 0.7 billion RMB yuan per year. This effect is equivalent to moving a city from the 10th percentile of the city-level GDP distribution to the 29th percentile. Overall, Table 2 consistently points to a positive and significant impact of HSR on local GDP. These results are consistent with previous literature on the economic effects of HSR. Meng et al. (2018) find that the average treatment effect on the treated (ATT) of HSR construction on county-level economic growth is approximately 14%. Ahlfeldt and Feddersen (2018) find that HSR introduction causally increased the average GDP of three counties with intermediate stops by approximately 8.5%. The baseline estimate is consistent with the circumstantial evidence presented in Figures 4 and 5 and is robust to a series of tests.
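Assuming, as in Figure 6, that the dependent variable in Table 2 is the natural logarithm of GDP, the coefficient of 0.033 is a change in log points. A short worked conversion to an exact percentage change is:

$$\%\Delta GDP = 100\left(e^{\hat{\beta}} - 1\right) = 100\left(e^{0.033} - 1\right) \approx 3.36$$

For a coefficient this small, the exact figure differs only marginally from the 3.3% approximation.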

Dynamics of the economic effects of the HSR network
Figure 6 plots the impact of the HSR network on the natural logarithm of GDP over a 17-year window spanning from ten years before the launch of the HSR network until seven years after the launch. The dashed lines represent 95% confidence intervals, adjusted for city-level clustering. The estimates six or seven years after HSR opening have much greater variance and are therefore less precisely measured. Specifically, we report the estimated coefficients in equation (2), accounting for city and year fixed effects. Following Beck et al. (2010), we exclude the year the HSR network was launched, so the dynamic effects of HSR on local GDP are estimated relative to the launch year. Figure 6 illustrates two key points. First, changes in the difference in local GDP between the treatment group and control group did not precede the launch of the HSR network: the coefficients on the pre-HSR dummy variables are not significantly different from zero in any year before the HSR launch, and local GDP shows no trend prior to the launch. This supports the parallel trends assumption between the treatment group and control group, which provides a solid basis for the DID analyses. Second, the positive impact of HSR on local GDP emerges after the provision of HSR services and remains significant at the 5% level until five years after the launch of the HSR network. In sum, changes in local GDP do not precede the provision of HSR services, and HSR service promotes local GDP.

Note: Standard errors are adjusted for city-level clustering and appear in parentheses. *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels, respectively. Covariates include fixed-asset investment, total retail sales of consumer goods, foreign direct investment, total population, students enrolled in universities, government size, infrastructure development, innovation index, population density, secondary industry as a percentage of GDP, tertiary industry as a percentage of GDP, and marketization level. The same applies hereinafter. Source: Authors.

Channels of the economic impact of the HSR network
In this subsection, we explore whether HSR service provision gives rise to industrial agglomeration or service agglomeration, the potential channels underlying the economic impact of the HSR network.

Industrial agglomeration effect
We first investigate whether the HSR network induces industrial enterprises to cluster in treatment-group cities, using equation (2) with a 17-year window to track annual changes in several indicators of industrial agglomeration before and after the launch of the HSR network. Figures 7-10 plot the annual impact of the HSR network on the number of industrial enterprises, gross industrial output, persons employed in manufacturing and the ratio of gross industrial output to local GDP (gross industrial output value/GDP). The dashed lines represent 95% confidence intervals, adjusted for city-level clustering. Following Beck et al. (2010), we exclude the year of HSR opening, thus estimating the dynamic effects of HSR relative to the year of HSR opening. Figure 7 indicates that more industrial enterprises cluster in the treatment group than in the control group after the launch of the HSR network. The difference in the number of industrial enterprises is insignificant for all years before the HSR launch. After the launch, the number of industrial enterprises in the treatment group increases significantly and continues to trend upward, with an overall increase of almost 30% seven years after the launch. Figure 8 shows that the HSR network leads to a significant increase in the gross industrial output of the treatment group vis-a-vis the control group. An upward trend starts two years after the launch of the HSR network and continues for the following five years, with an overall increase of almost 30%. Figure 9 shows that the number of people employed in manufacturing in the treatment group vis-a-vis the control group increases immediately after the HSR network launch, and the effects remain positive and significant at the 5% level. The impacts grow for approximately two years after HSR opening and then level off, indicating a steady increase of approximately 30% in manufacturing employment.
Having found that the HSR network induces the spatial concentration of industrial enterprises and boosts gross industrial output, we now explore whether enhanced industrial agglomeration stimulates urban economic performance. Specifically, we first calculate the ratio of industrial output to GDP and then trace the annual changes in this ratio between the treatment group and the control group. Figure 10 shows an upward trend in the ratio of industrial output to GDP starting immediately after the launch of the HSR network and lasting for the following six years. The effects of HSR service provision become significant at the 5% level beginning in the fourth year after the HSR network launch, with an overall increase of approximately 40%.
This subsection reveals a significant increase in industrial agglomeration in the treatment group vis-a-vis the control group after the introduction of HSR services, and industrial agglomeration further gives rise to disparities in regional economic performance. Our results are consistent with extant studies on the relationship between transportation and industrial agglomeration (Wetwitoo & Kato, 2017).

Service agglomeration effect
We now investigate whether the HSR network induces service agglomeration, using equation (2) with a 17-year window to track annual changes in several indicators of service agglomeration before and after the launch of the HSR network. Figures 11-15 plot the annual impact of the HSR network on the number of persons employed in the service sector, finance, ICS (information transmission, computer services and software), WR (wholesale and retail trade), and hotels and catering services. The dashed lines represent 95% confidence intervals, adjusted for city-level clustering. Following Beck et al. (2010), we exclude the year the HSR network was launched, thus estimating the dynamic effects of HSR relative to the year of launch. Figures 11-15 indicate that the introduction of HSR services has no significant and consistent impact on service agglomeration. Some service industries experience short-term shocks after the launch of HSR services (Figures 11, 14 and 15), but these effects quickly fade, and the differences between the treatment group and control group become insignificant. In addition, Figures 12 and 13 show no differences in the scale of finance and ICS between the treatment group and control group before or after the introduction of HSR services. Thus, the evidence neither confirms nor rules out an effect of HSR services on the size of the service sector. HSR may not change the absolute size of the service sector or of individual service industries but could still lead to a significant transformation in sectoral specialization patterns within services. In particular, it would be interesting to determine whether cities with and without HSR services diverge in terms of specialization. To do this, we construct two indexes to measure the diversification and specialization of the service sector.
The first is the index of service sector specialization, ISS, which is usually defined as

$$ISS_{it} = \max_j \frac{P_{ijt}}{P_{jt}} \qquad (3)$$

where $P_{ijt}$ is the share of employment in industry j in total service-sector employment in city i in year t, and $P_{jt}$ is the share of employment in industry j in total service-sector employment of all cities in year t; $P_{jt}$ serves as the benchmark service-sector employment structure. If ISS is greater than 1, the degree of specialization in the service sector of city i is higher than the benchmark, and the higher ISS is, the higher the level of specialization in the service sector of city i. The second is the index of service sector diversification, ISD, which is defined as

$$ISD_{it} = \frac{1}{\sum_j \left| P_{ijt} - P_{jt} \right|} \qquad (4)$$

Essentially, this index characterizes how far the employment structure of a city departs from the benchmark. The more diverse a city's service sector is, the smaller the denominator of equation (4) is, and thus the larger ISD is. Figures 16-17 plot the annual impacts of the HSR network on the diversification and specialization of the service sector. These two figures show no significant changes in service sector specialization or diversification between the treatment group and control group after the HSR network launch.
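Both indexes can be computed from a city-by-industry employment matrix for a single year. The Python sketch below assumes the relative specialization and relative diversity forms $ISS_{it} = \max_j P_{ijt}/P_{jt}$ and $ISD_{it} = 1/\sum_j |P_{ijt} - P_{jt}|$, with the benchmark $P_{jt}$ computed from the pooled employment of all cities; the function name and the toy data are hypothetical.

```python
import numpy as np


def service_indexes(emp):
    """Specialization (ISS) and diversification (ISD) indexes for one year.

    emp: (n_cities, n_industries) service-sector employment.
    Assumes ISS_i = max_j P_ij / P_j and ISD_i = 1 / sum_j |P_ij - P_j|,
    where P_j is industry j's share of service employment in all cities pooled."""
    emp = np.asarray(emp, dtype=float)
    p_ij = emp / emp.sum(axis=1, keepdims=True)   # industry shares within each city
    p_j = emp.sum(axis=0) / emp.sum()             # benchmark shares across all cities
    iss = (p_ij / p_j).max(axis=1)
    # Note: a city whose structure equals the benchmark exactly gets ISD = inf.
    isd = 1.0 / np.abs(p_ij - p_j).sum(axis=1)
    return iss, isd
```

With two industries and employment `[[60, 40], [80, 20], [10, 90]]`, the benchmark shares are 0.5 and 0.5; the first city, closest to the benchmark, gets the lowest ISS (1.2) and the highest ISD (5.0).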
Even though HSR has the potential to facilitate both industrial and service agglomeration, our empirical results verify only the industrial agglomeration effect. This may relate to the uneven development of the service sector across regions. In China, only some economically prosperous cities in the eastern region have undergone the transition to a service- and knowledge-intensive economy, and most second-tier cities are still dominated by secondary industry. In this context, large eastern cities with large-scale service sectors may benefit from HSR-induced service agglomeration, a conjecture consistent with existing studies (Dai et al., 2018; Shao et al., 2017). Because the matched sample does not include these economically prosperous cities, it is not surprising that we do not find a significant service agglomeration effect. This does not rule out a service agglomeration effect, which might exist in some metropolitan areas along certain HSR lines in eastern China. Overall, we find only an industrial agglomeration effect induced by the HSR network in China.
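The event-study design used for Figures 11-17 rests on relative-time dummies with the launch year omitted as the reference category. The sketch below builds those dummies for one city-year observation. It assumes, since the text does not spell this out, that the 17-year window runs from k = -8 to k = 8 around the launch year and that years outside the window are binned into the end points; the function name is hypothetical.

```python
from typing import Optional


def event_time_dummies(year: int, launch_year: Optional[int],
                       min_k: int = -8, max_k: int = 8) -> dict[str, int]:
    """Relative-time dummies D_k for a single city-year (illustrative sketch).

    Assumptions: window k = min_k..max_k (17 values by default), end points
    binned, and k = 0 (the launch year) omitted as the reference category,
    so each coefficient is measured relative to the launch year.
    Control cities (launch_year=None) receive all-zero dummies.
    """
    dummies = {f"D_{k}": 0 for k in range(min_k, max_k + 1) if k != 0}
    if launch_year is None:
        return dummies
    k = max(min_k, min(max_k, year - launch_year))  # bin years outside the window
    if k != 0:
        dummies[f"D_{k}"] = 1
    return dummies
```

For a city connected in 2010, the 2012 observation switches on `D_2`, while the 2010 observation has all dummies at zero because the launch year is the omitted reference.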

Robustness tests
In this section, we address various concerns with a series of robustness tests: an IV strategy, a PSM-DID analysis, placebo tests and sensitivity tests.

IV strategy
To address the nonrandom placement of HSR routes between cities, we use an instrumental variable strategy based on a least-cost path spanning tree network. This concept comes from Faber (2014) and has since been extended in a large body of literature. Rao et al. (2019) apply it to construct a least-cost HSR network, that is, the hypothetical HSR routes with the lowest construction cost between major cities. They compute the least-cost paths between all pairs of major cities on the basis of geographical features (land cover and elevation), and this strategy identifies the subset of routes that connects the target cities in a single continuous network while minimizing construction cost. By construction, the hypothetical HSR routes are determined by cities' geographical characteristics and are therefore plausibly unrelated to their recent economic development. Our IV is the least-cost HSR network constructed by Rao et al. (2019).
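The least-cost spanning tree idea can be illustrated on a toy cost grid. The sketch below is ours and not Rao et al.'s actual procedure: it assumes a raster of per-cell construction costs (in practice derived from land cover and elevation), computes pairwise least-cost paths between target cities with Dijkstra's algorithm, keeps the paths that lie on a minimum spanning tree (Kruskal's algorithm), and treats any city whose cell lies on the resulting network as having PHR = 1. The function names and grid representation are hypothetical.

```python
import heapq

Cell = tuple[int, int]


def least_cost_path(cost: list[list[float]], src: Cell, dst: Cell) -> tuple[float, list[Cell]]:
    """Dijkstra on a 4-connected grid; entering a cell costs cost[r][c]."""
    rows, cols = len(cost), len(cost[0])
    dist: dict[Cell, float] = {src: 0.0}
    prev: dict[Cell, Cell] = {}
    heap = [(0.0, src)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == dst:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + cost[nxt[0]][nxt[1]]
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt], prev[nxt] = nd, cell
                    heapq.heappush(heap, (nd, nxt))
    path = [dst]  # walk predecessors back to the source (grid assumed connected)
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def least_cost_network(cost: list[list[float]], targets: list[Cell]) -> set[Cell]:
    """Cells on the least-cost paths retained by a minimum spanning tree over targets."""
    edges, paths = [], {}
    for a in range(len(targets)):
        for b in range(a + 1, len(targets)):
            d, path = least_cost_path(cost, targets[a], targets[b])
            edges.append((d, a, b))
            paths[(a, b)] = path
    parent = list(range(len(targets)))

    def find(x: int) -> int:  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    network: set[Cell] = set()
    for _, a, b in sorted(edges):  # Kruskal: cheapest links first
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            network.update(paths[(a, b)])
    return network
```

Given the network, a hypothetical city-level instrument follows as `phr = {name: int(cell in network) for name, cell in cities.items()}`.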
Following Diao (2018), we first estimate a probit regression of HSR connection on the IV and a set of city-level characteristics (population, GDP per capita, and train passengers) to predict the likelihood that a city will receive HSR service after 2006. The predicted probability is then used in the second stage of the DID specification.
Column (2) of Table 3 presents the first-stage probit estimates. The dependent variable in this regression is a dummy variable, $HSR_i$, which equals one if city i obtained an HSR connection after 2006 and zero otherwise. The IV is $PHR_i$ (short for potential HSR routes), a dummy variable that equals one if city i is located on the least-cost HSR network and zero otherwise. The results show that the likelihood of receiving HSR service is positively related to the IV, which suggests that, conditional on the control variables, the least-cost HSR network is a good predictor of receiving HSR service. Column (4) of Table 3 reports the second-stage results of the IV regression. Consistent with Subsection 4.2, the coefficients on the HSR connection are significantly positive, indicating that the baseline DID estimate of the economic effect of the HSR network is robust to potential endogeneity concerns.
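The two-step procedure can be sketched as follows. The probit is fitted by Fisher scoring in plain NumPy to keep the example self-contained, and the second stage is a two-way fixed-effects DID computed by double demeaning, which is exact for a balanced panel with a single regressor. The text does not specify how the predicted probability enters the second stage; this sketch assumes it replaces the HSR dummy in an interaction with a post-2006 indicator. All function names are hypothetical.

```python
import math

import numpy as np


def norm_cdf(z):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))


def norm_pdf(z):
    return np.exp(-0.5 * np.asarray(z) ** 2) / math.sqrt(2.0 * math.pi)


def fit_probit(X, y, iters=100, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring; a constant is prepended to X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        xb = X @ beta
        p = np.clip(norm_cdf(xb), 1e-10, 1 - 1e-10)
        phi = norm_pdf(xb)
        w = phi / (p * (1 - p))
        score = X.T @ (w * (y - p))                 # gradient of the log-likelihood
        info = X.T @ (X * (w * phi)[:, None])      # expected information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_probit(X, beta):
    return norm_cdf(np.column_stack([np.ones(len(X)), X]) @ beta)


def twfe_did(y, d):
    """Coefficient on d in y_it = a_i + l_t + delta * d_it + e (balanced N x T panel)."""
    def demean(a):
        return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    yd, dd = demean(y), demean(d)
    return float((yd * dd).sum() / (dd * dd).sum())


def second_stage(y, p_hat, years, first_year=2007):
    """Assumed second stage: p_hat_i x Post_t replaces the HSR dummy x Post_t."""
    post = (np.asarray(years) >= first_year).astype(float)
    return twfe_did(y, np.outer(p_hat, post))
```

In use, `fit_probit` takes a matrix whose first column is the instrument and whose remaining columns are the city controls; `predict_probit` then supplies `p_hat` for `second_stage`.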

PSM-DID analyses
In this subsection, we combine propensity score matching (PSM) with DID to mitigate selection bias arising from observable characteristics. The first step is to estimate the conditional probability (propensity score) that a city will receive HSR service, using pre-HSR data in a probit model.
The probit specification is as follows:

$$p(X_s) = \Pr(\text{Treatment}_s = 1 \mid X_s) = F\left(h(X_s)\right)$$

where $p(X_s)$ is the probability of city s receiving the treatment (HSR service) conditional on a set of observed covariates $X_s$; $\text{Treatment}_s$ is a dummy variable that equals one if city s was connected to the HSR network during 2007-2014 and zero otherwise; and $F$ is the probability function (the standard normal CDF in a probit model) evaluated at $h(X_s)$, the linear combination of the covariates $X_s$. $X_s$ is a vector of covariates including total population, GDP per capita, the relief degree of the land surface, and railway and highway passenger volume. These indicators capture a city's political importance, economic prosperity, demand for HSR service and construction feasibility. Each covariate is set as its arithmetic mean over the 8 years from 1999 to 2006. The second step is to construct the treatment and control groups for the DID analyses by matching 'winner' cities with 'loser' cities one-to-one on propensity scores, with a caliper of 0.03. Then, the DID specification is used to test for the economic effects of the HSR network. Table 4 shows that the HSR dummy enters positively and significantly in all PSM-DID regressions. These results substantiate the positive effect of the HSR network and confirm the reliability of the baseline estimation.
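The one-to-one matching step with a 0.03 caliper can be sketched as greedy nearest-neighbour matching without replacement. The matching order (treated cities in descending order of score), the handling of ties, and the function name are assumptions for illustration; the paper may use a different matching routine.

```python
def caliper_match(treated: dict[str, float], controls: dict[str, float],
                  caliper: float = 0.03) -> list[tuple[str, str]]:
    """Greedy one-to-one nearest-neighbour matching without replacement.

    treated / controls map city names to propensity scores. Each treated city
    is paired with the closest unused control whose score lies within
    `caliper`; treated cities without such a control are dropped.
    """
    available = dict(controls)
    pairs: list[tuple[str, str]] = []
    # Assumed convention: match treated cities from the highest score down.
    for city, score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        best = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((city, best))
            del available[best]  # without replacement
    return pairs
```

For example, with treated scores `{"A": 0.50, "B": 0.70, "C": 0.20}` and control scores `{"x": 0.52, "y": 0.69, "z": 0.90}`, B is matched to y, A to x, and C is dropped because no remaining control lies within the caliper.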

Placebo tests
In this subsection, we use placebo tests to examine whether the results are influenced by potential omitted variables. Following previous studies (Chetty et al., 2009; Li et al., 2016), we first randomly allocate the timing of HSR service provision to 'winner' cities and then perform DID estimation with these virtual 'winner' cities and the 'loser' cities. This procedure is repeated 1,000 times to increase the power of these falsification tests. Because the data-generating process is random, the 'virtual' HSR connection variables should yield insignificant estimates close to zero. Figure 18 shows the distribution of the estimates from the 1,000 runs along with the benchmark estimate, 0.033, from column 4 of Table 2. The CDF of the estimates from random HSR connection assignments is clearly centered around zero, and their standard deviation is 0.011, suggesting that the random assignment produces no effect. The benchmark estimate lies outside the 99% confidence interval of the placebo distribution (only 1 of the 1,000 estimates exceeds 0.033). The placebo tests therefore suggest that the positive and significant effect of the HSR connection on economic performance is not driven by unobserved factors.
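A minimal sketch of the placebo procedure follows, assuming a balanced city-by-year panel of log GDP and a two-way fixed-effects DID. It follows one common implementation, which may differ from the paper's: each run draws as many virtual 'winner' cities as there are actual winners and assigns them the observed launch years in random order. The function names are hypothetical.

```python
import numpy as np


def twfe_did(y, d):
    """Coefficient on d in y_it = a_i + l_t + delta * d_it + e (balanced N x T panel)."""
    def demean(a):
        return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()
    yd, dd = demean(y), demean(d)
    return float((yd * dd).sum() / (dd * dd).sum())


def placebo_distribution(y, years, launch_years, reps=1000, seed=0):
    """DID estimates from randomly reassigned (virtual) HSR connections.

    y: N x T outcome array; years: length-T array of calendar years;
    launch_years: actual launch years of the treated cities (assumed to
    fall inside the sample period so every virtual treatment varies over time).
    """
    rng = np.random.default_rng(seed)
    years = np.asarray(years)
    launch_years = np.asarray(launch_years)
    estimates = np.empty(reps)
    for r in range(reps):
        fake = rng.choice(y.shape[0], size=len(launch_years), replace=False)
        timing = rng.permutation(launch_years)
        d = np.zeros(y.shape, dtype=float)
        d[fake] = years[None, :] >= timing[:, None]  # post-launch indicator rows
        estimates[r] = twfe_did(y, d)
    return estimates
```

The share of placebo estimates at least as large as the benchmark, `np.mean(estimates >= 0.033)`, then gives an empirical one-sided p-value for the baseline estimate.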

Sensitivity tests
We perform sensitivity tests with three strategies. First, we expand the sample from 128 to 170 cities by relaxing the matching criteria used in Subsection 4.1. Second, we extend the sample period from 2007-2014 to 1999-2014. Third, we replace local GDP as the dependent variable with the average brightness of nighttime lights. Nighttime light data have increasingly been used as a measure of urban economic development since Henderson et al. (2012) demonstrated that they can serve as a proxy for GDP over the long term and can track short-term fluctuations in growth.
We emphasize that several questions remain open regarding the use of light data as a proxy for local economic development. For some areas, light data may have insufficient resolution, which causes high measurement error. Moreover, some nighttime light data may be unsuitable for estimating GDP at the city level, because the unit of analysis is small and light overflow and saturation are pronounced (Dai et al., 2017). In addition, the accuracy of GDP estimates based on nighttime light data is affected by the spatial scale, density, land cover types and industrial structure of the study area. On the other hand, nighttime light data have two advantages for sensitivity tests: light intensity is strongly associated with economic performance and development, and it is free from human error and manipulation (Henderson et al., 2012). Overall, we consider light data a useful supplement to local GDP.
As shown in Table 5, the HSR dummy variables are positive and significant at least at the 10% level in all regressions, indicating that the baseline results are robust to these sensitivity tests.

Conclusions
This paper tests for and quantifies the effect of the HSR network on urban economic development and investigates whether industrial or service agglomeration is the mechanism through which the HSR network improves urban economic performance. Our analyses are based on the rapid development of the HSR network in China. This large-scale network enables us to construct an appropriate counterfactual for what would have happened to cities' economies in the absence of HSR services, which is the key prerequisite for causal identification.
In this study, we use national train schedules for the 2007-2014 period and apply DID and other empirical strategies to evaluate the average treatment effect of the HSR network on local GDP. After controlling for time-varying variables and time-invariant city characteristics, we find that an HSR connection contributes to a 3.3-percentage-point relative increase in local GDP. Numerous robustness tests confirm this baseline estimate. Our estimates also suggest that the development of the HSR network is likely to widen the development gap between 'winner' cities and 'loser' cities in China.
Regarding the mechanism through which the HSR network boosts the local economy, dynamic analyses suggest that HSR service provision gives rise to industrial agglomeration, which in turn promotes local GDP, at least in part. However, industrial agglomeration does not fully account for the economic effect of the HSR network, which underscores how much remains to be learned about the structural sources of the estimated spillovers. Moreover, we find no valid evidence of a service agglomeration effect in the matched sample. This result may be related to China's specific economic trajectory. Whereas cities in developed countries have shifted from an industrial economy to a knowledge economy, most Chinese cities are still industrializing and urbanizing. Therefore, only some major cities with a large-scale service sector are likely to benefit from HSR-induced service agglomeration. More generally, the economic context in which HSR is developed should not be ignored.
Our results help quantify the economic impact of the HSR network that is not, or not fully, captured by conventional transport cost-benefit analysis (CBA). Our findings therefore have important policy implications for countries that plan to upgrade their railway systems to accommodate growing interregional transportation demand and stimulate regional economic development. Our results are also relevant to the layout of HSR routes, given HSR's positive effects on the spatial distribution of economic activity. For instance, to drive the development of inland areas, the Chinese HSR network is gradually being extended to the less economically developed central and western regions.