Exploring metro vibrancy and its relationship with built environment: a cross-city comparison using multi-source urban data

Recent urban transformations have prompted critical reflection on blighted urban infrastructure and calls to revitalize urban places. In particular, the metro has been recognized as the backbone infrastructure for urban mobility and the associated economic agglomeration. To date, limited empirical research has investigated the relationship between metro vitality and the built environment in mega-cities. This paper presents a multi-source urban data-driven approach to quantify metro vibrancy and its association with the underlying built environment. Massive smart card data are processed to extract metro ridership, which denotes the vibrancy around a metro station in physical space. Social media check-ins are crawled to measure the vitality of metro stations in virtual space. Physical and virtual vibrancy are integrated into a holistic metro vibrancy metric using an entropy-based weighting method. Built environment characteristics, including land use, transportation and buildings, are modeled as independent variables. The influences of built environment factors on metro vibrancy are unraveled using ordinary least squares regression and the spatial lag model. Experiments conducted in Shenzhen, Singapore and London show that the metro vibrancy metrics in all three cities are spatially autocorrelated. The regression analysis suggests that, in all three cities, more affluent urban areas tend to have higher metro vibrancy, while road density, land use and buildings tend to affect metro vibrancy in only one or two cities. These results demonstrate that the relationship between metro vibrancy and the built environment is shaped by complex urban contexts. The findings help us understand metro vibrancy and thus inform policies to revitalize this important infrastructure in the future.


Introduction
Cities concentrate vast infrastructure and large populations in space. They have been undergoing great transformations because of technological innovation, industrial revolution and constant migration. Recent urban transformations have prompted critical reflection on blighted urban infrastructure and calls for the revitalization of urban places (Martí, Serrano-Estrada, and Nolasco-Cirugeda 2017). Vibrant spaces not only promote healthy urban living but also foster urban development. Urban development, if well executed, may in turn reinforce urban vibrancy. In particular, the metro, which is recognized as the backbone of cities' public transportation systems, plays an important role in urban growth (Zheng et al. 2016). The metro system supports millions of urban trips every day and the associated economic agglomeration in the mega-city (Gutiérrez, Cardozo, and García-Palomares 2011; Chakour and Eluru 2016). It is therefore an important urban infrastructure to revitalize, which requires quantifying the urban vibrancy produced by the metro and identifying its influencing factors.
Urban vibrancy was first introduced by Jacobs (1961) as human activities and the interactions between residents and spaces. Jacobs pointed out that more vibrant streets and neighborhoods tend to engage people in commercial or residential activities. Montgomery (1998) further suggested that a vibrant place should be an open space supporting diverse human activities and interactions. In general, vibrancy signifies the intensity of human activities and interactions. Following this stream, many approaches have been developed to quantify the vibrancy of streets or neighborhoods, such as field observation and interviews (Sung, Go, and Choi 2013). For example, Zarin, Niroomand, and Heidari (2015) measured street vibrancy in Tehran through a random sampling questionnaire survey. Wu et al. (2018b) quantified neighborhood vibrancy in suburban Beijing using a GPS-based activity survey. These studies paid little attention to metro vitality, even though the metro has been recognized as the backbone infrastructure for urban mobility and the associated economic agglomeration in cities worldwide, such as Shenzhen and London. This study examines metro vitality by limiting the study area to the service area of each metro station.
The development of Information and Communication Technology (ICT) is reshaping urban living. More and more human interactions take place online rather than in physical space. For example, many urban inhabitants buy food from Amazon Fresh rather than driving to the supermarket. Online chatting via instant messengers substitutes for some face-to-face meetings, especially under the threat of Coronavirus disease 2019. Online group games also attract more and more young people and therefore reduce outdoor games. Human activities and interactions in cities are changing (Yu and Shaw 2008; Shaw and Sui 2018). Previous studies have demonstrated that fewer daily activities and trips are being conducted (Best and Butler 2015). These studies also imply that the quantification of urban vibrancy in physical space should be extended into virtual space.
On the other hand, location-aware technology is widely adopted. These new tools enable us to collect massive geospatial urban data, for example, points of interest (POIs), mobile phone records (Tu et al. 2017), social media data (Lansley and Longley 2016; Aiello et al. 2016; Chen et al. 2019; Lock and Pettit 2020), smart card data (Zhong et al. 2016) and vehicle GPS trajectories (Fang et al. 2012). These new urban sensing data contain rich information about human activities and interactions in both physical and virtual spaces (Monika 2020). Such information enables new ways of quantifying vibrancy. For example, De Nadai et al. (2016) took mobile phone records as the proxy of human activities and mapped city-wide urban vibrancy in six Italian cities. Chen et al. (2019) examined the spatial distribution of urban vibrancy using Facebook data. These studies demonstrate the effectiveness of newly sensed urban data for quantifying vibrant space. Furthermore, the built environment, including land use, road networks and buildings, has a great impact on fostering urban vibrancy. These studies focus on streets, neighborhoods or the whole city and provide useful insights into the mechanism linking vibrancy in these areas to the built environment.
Few studies have attempted to reveal the relationship between metro vitality and the underlying built environment. Similar to urban vibrancy, the metro vibrancy describes the attraction, diversity and prosperity of places resulting from human activities and interactions but limited within the service area of metro station. To fill the research gap, this study presents a multisource urban big data-driven approach to quantitatively measure metro vibrancy and its association with urban built environment. Massive smart card data and social media check-ins are leveraged to measure metro vibrancy in both physical and virtual space, which are then integrated into a holistic metro vibrancy metric using an entropy-based weighting method. Several built environmental factors, including land uses and transportation infrastructures, are modeled as independent variables. The Spatial Lag Model (SLM) is implemented to unravel the mechanism of built environment's influences on the metro vibrancy. Three global cities, Shenzhen, Singapore and London, are selected as experimental cities.
The remainder of this paper is organized as follows: Section 2 reviews related literature in the domain of urban vibrancy and urban big data analytics. Section 3 introduces the studied cities and the datasets used. Section 4 describes the presented method. Section 5 reports and discusses the results. The last section concludes this study and outlines future work.

Urban vibrancy
Urban vibrancy, alternatively urban vitality, was originally proposed by Jacobs (1961) and illustrated as "liveliness and variety attract more liveliness; deadness and monotony repel life". Lynch (1984) interpreted vibrancy by decomposing it into three main components: urban morphology, urban function and urban society. Later studies mainly sought to verify Jacobs' spatial vitality theory and Lynch's hypothesis empirically using emerging data and new measures. Spatial vibrancy, particularly at the scale of the street and the neighborhood, has always been an important topic in urban studies and planning practice. A vibrant urban space not only supports diverse human activities and livable urban spaces but also facilitates sustainable development in the long run (Huang, Wong, and Chen 1998). Interest in spatial vibrancy within geographical studies has been stimulated mostly by the availability of spatial data, which allows us to describe spatially explicit urban vibrancy in quantitative form. Earlier work has attempted to use, for instance, surveyed activity data (Sung, Go, and Choi 2013), land use (Jacobs-Crisioni et al. 2014) and housing prices (Nicodemus 2013) as proxies for urban vibrancy.
More recent work on spatial vibrancy uses emerging geospatial sensed data. Compared to previous work, human mobility data provides a more direct approximation of human activity at a finer spatial and temporal granularity. For instance, De Nadai et al. (2016) extracted human activity from mobile phone data, integrated it with land use and socio-demographic information from the Italian Census and OpenStreetMap, and tested Jacobs' four conditions in six Italian cities. Other mainstream measures, also based on Jacobs' theory, define urban vibrancy as the attraction, diversity and accessibility of a place. For example, Yue et al. (2017) developed a series of mixed-use indicators profiling the characteristics of mixed-use neighborhoods using mobile phone and POI data in Shenzhen. The temporal dimension also deserves emphasis, as Jacobs describes urban vibrancy as street life over a 24-hour period. Spatio-temporal variability is therefore considered in nearly all new measures, which gives a better profile of the dynamics of a space (Wu et al. 2018a; Sulis et al. 2018). Recently, activities in virtual space have attracted much attention. Sulis et al. (2018) calculated diversity from three dynamic attributes (intensity, variability and consistency) using Twitter data and a regression model to unveil the most vibrant areas in London.
The metro system attracts much scholarly attention as the main transportation infrastructure. One stream of related works studied the physical human activities around the metro stations by examining the ridership (Zhong et al. 2016;Tu et al. 2018). For example, Gutiérrez, Cardozo, and García-Palomares (2011) quantified the metro ridership in Madrid using smart card data. They developed a distance-decay weighted regression function to explore the association with built environment. Zhao et al. (2013) examined the metro ridership in Nanjing, China. They found that population, business/office floor area, central business district, number of education buildings, entertainment venues and shop centers had a significant influence on the metro ridership. Chakour and Eluru (2016) examined the association of transit ridership and stop-level infrastructure and the built environment in Montreal using a composite marginal likelihood-based ordered response probit model. Their results demonstrated that public transportation, parks, commercial enterprises and residential neighborhoods are significantly associated with transit ridership, in other words, the activities around the metro stations. However, these studies focus on the physical human activities around the metro station and ignore the vibrancy in virtual spaces.
Rather than street vibrancy and neighborhood vibrancy, our work focuses specifically on the metro. The ultimate goal of measuring and analyzing metro vibrancy is to evaluate the urban form or land use configuration around the metro, which is in line with previous studies (Song, Merlin, and Rodriguez 2013). Previous work has reported evidence that the built environment has an impact on urban vibrancy, but the conclusions vary from case to case. For instance, Yue et al. (2017) observed a negative impact of mixed land use on neighborhood vibrancy in high-density cities, while Li et al. (2016) found that increasing the road density encourages human travel and the subsequent activities and thus cultivates vibrancy. More empirical studies and in-depth analysis of the reasoning behind spatial vitality in different urban contexts are therefore necessary.

Urban big data analytics
Technological advancements have enabled urban scholars to scrutinize human activity at multiple scales and at fine granularity. Mobile phone datasets, smart card transactions, taxi GPS trajectories, dockless bike-sharing and scooters have all received scholarly attention in the last decade (González, Hidalgo, and Barabási 2008; Zhang et al. 2018). Among these new urban data, smart card data has been heavily explored in transportation research to gain a better understanding of individual travel behavior, which can inform strategic transit planning as well as day-to-day operations (Pelletier, Trépanier, and Morency 2011). Smart card data can be used to estimate congestion dynamics, including station-level crowdedness and the frequency of missed trains (Ceapa, Smith, and Capra 2012). Transit usage is usually associated with a high degree of regularity (Goulet-Langlois et al. 2018; Zhong et al. 2016). The collective regularity in human travel leads to the emergence of "familiar strangers" in daily life. This regularity also makes it possible to anticipate individual movement from historical travel behavior using machine learning or econometric methods. At the same time, there is a certain degree of variability in individual travel behavior (Morency, Trepanier, and Agard 2008; Zhong et al. 2016). Changes in travel patterns can be abrupt, substantial and persistent in certain cases. Smart card data provides wide coverage, but it comes with disadvantages. In many cities, the alighting stop is not directly recorded and must be inferred. The data lacks user profiles and trip purposes, which hinders in-depth applications. Some studies have attempted to estimate behavioral attributes by fusing smart card data with travel survey data (Kusakabe and Asakura 2014). Additionally, one dataset alone may offer a biased answer. Multi-faceted urban analysis therefore needs to combine multiple data sources (Zhang et al. 2018, 2019).
Social media data, especially georeferenced check-in activities, have emerged as a complement to human activity datasets. Unlike the aforementioned human activity data, social media data carries important attributes of human activities, which allows us to explore spatial variations of activities in different categories (Hasan, Zhan, and Ukkusuri 2013). These crowdsourced data sources reveal individual activity-related choices and preferences, which provides a new perspective for sensing urban places and identifying urban functions (Lansley and Longley 2016; Martí, Serrano-Estrada, and Nolasco-Cirugeda 2017; Shen and Karimi 2016). By fusing social media data with traditional remote sensing classification, fine-scale land use maps can be derived. Some social media data include geotagged pictures, which have broadened the range of applications. Geotagged pictures can help to better understand how urban spaces are used by citizens. Using real-time crowdsourced pictures, scholars are able to identify urban emergency events (Xu et al. 2016) and gain a better understanding of noise distribution inside the city (Aiello et al. 2016). On the other hand, social media data records the online activity and interaction of social media users via the internet; therefore, it provides an opportunity to investigate human activity in virtual space. Using social media check-ins as an example, this study measures metro vibrancy in virtual space and then presents an integrated vibrancy indicator for in-depth vibrancy analysis.

Study area
This study is conducted in three cities: Shenzhen, Singapore and London. Shenzhen is a mega-city located in the Pearl River Delta, to the north of Hong Kong. Since its establishment in 1979 as a Special Economic Zone (SEZ) of China, Shenzhen has become one of the largest and most innovative cities in China, covering 1996 km² with 18 million people. Southern Shenzhen is the urban center, while northern Shenzhen is a suburban area with manufacturing factories and warehouses. As of 2017, the metro in Shenzhen comprises 8 lines with 167 stations, as Figure 1(a) shows. The second studied city is Singapore, a city-state that covers a total area of 719 km² and had a total population of 5.6 million as of 2016. The metro system in Singapore has 128 stations and covers 149 km, as Figure 1(b) shows. The third city is London, the capital of the United Kingdom. The subway in London has a long history dating back to 1863. As of 2014, the London metro has 11 lines and 382 stations, supporting 4 million trips per day, as Figure 1(c) shows. For completeness, several metro stations outside London are also displayed.
Outlining the Metro Station Service Area (MSSA) is an important issue for investigating the metro vibrancy. There are many approaches to capture the MSSA, for example, interviewing the destination of metro travelers. Many empirical studies (Gutiérrez, Cardozo, and García-Palomares 2011;Zhao et al., 2013;Chakour and Eluru 2016) have demonstrated that the attraction of the metro tends to decay with the distance to the metro station. The MSSA can be represented by a circular area centered at the metro station. Following previous studies (Zielstra and Hochmair 2011;Zhao et al. 2013), the radius of the circle is set to 750 meters. The vibrancy and the built environment in MSSA are studied.

Datasets
Multi-source urban datasets, including smart card data, social media check-ins and geospatial urban data, are used to measure the vibrancy and the built environment of the MSSA. Details of these datasets are described below.
The smart card dataset in Shenzhen covered one week, March 6th to 12th, 2017, containing 12 million records of more than 3 million users. The social media check-ins in the same period were crawled from Sina Weibo, one of the largest social media service providers in China. In particular, 798,789 check-ins in the MSSA were captured. The population data, land use map, road network and building footprints, provided by the commission of natural resource and urban planning, are used to quantify the built environment characteristics.
The smart card data in Singapore, covering one week from October 20th to 26th, 2014, is used to infer the physical vibrancy around metro stations. It contains 14,839,175 metro trips made by 2,778,142 users. Tweets are crawled from Twitter to compute the vibrancy in virtual space. In total, 108,594 tweets are collected in the same period as the EZ-link card transactions. The household interview travel survey of 2012 is used to capture the socio-economic characteristics of MSSAs. The road network is acquired from Navteq, a professional map company. 40,782 POIs are extracted by calling the Google Places API. The building floor area of different functions is derived using the method detailed in Zhu and Ferreira (2015).
The study in London is based on tap-in and tap-out records of Oyster card data, which is used by more than 80% of the passengers in metropolitan areas. The Oyster card data used covers one week in January 2014. The dataset captures a daily average of 3.65 million tap-ins (and tap-outs) at the 382 stations in London. More information about the original smart card data used for calculation in this paper can be found in Zhong et al. (2016) and Sulis et al. (2018). Tweets are crawled from Twitter to capture the activity in virtual space. The POI data and land use map are obtained from the Ordnance Survey (OS), which is the national mapping agency for Great Britain. The road network and building footprints are downloaded from OpenStreetMap.

Methodology
We propose a multi-source geospatial data-driven approach to investigate metro vibrancy and unravel the influences of the built environment, as Figure 2 displays. The vibrancy in physical and virtual space is measured using smart card data and social media check-ins respectively. An integrated vibrancy indicator is presented to explore the spatial pattern. Furthermore, the Ordinary Least Squares (OLS) regression model and the SLM are implemented to explore the intrinsic relationship between metro vibrancy and the built environment. Finally, the impact on metro vibrancy is analyzed and compared across the three cities, and the commonalities and differences in the interactions between vibrancy and the built environment are examined.

Physical vibrancy
The physical vibrancy is measured based on the daily ridership of a metro station, as the ridership potentially reflects the intensity of physical activity around the metro. Both the tap-in and tap-out events are summed. To eliminate the differences in metro ridership among the three cities, the physical vibrancy of a metro station is normalized to the range [0, 1] with the Min-Max normalization method, as Equation (1), where j is a metro station, R_j is the daily ridership of station j, and R_min and R_max denote the minimum and the maximum metro ridership in the city respectively.
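The normalization described above can be illustrated with a minimal sketch, assuming the usual Min-Max form (R_j − R_min) / (R_max − R_min). The function name, the dictionary layout and the sample counts are hypothetical and are not taken from the study's data.

```python
def physical_vibrancy(tap_in, tap_out):
    """Min-Max normalised daily ridership per station.

    tap_in / tap_out: dicts mapping a station id to its daily count
    (hypothetical layout chosen for this sketch).
    """
    # Ridership of a station is the sum of its tap-in and tap-out events.
    ridership = {s: tap_in.get(s, 0) + tap_out.get(s, 0)
                 for s in set(tap_in) | set(tap_out)}
    r_min, r_max = min(ridership.values()), max(ridership.values())
    if r_max == r_min:
        # Degenerate case: all stations identical, so no spread to normalise.
        return {s: 0.0 for s in ridership}
    span = r_max - r_min
    return {s: (r - r_min) / span for s, r in ridership.items()}


# Illustrative (made-up) daily counts for three stations.
stations_in = {"A": 1000, "B": 4000, "C": 2500}
stations_out = {"A": 1000, "B": 6000, "C": 3000}
vp = physical_vibrancy(stations_in, stations_out)
```

Because the minimum and maximum are taken within one city, the resulting values are comparable across cities only in relative terms, which is the purpose stated in the text.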

Virtual vibrancy
The virtual vibrancy is delineated by the intensity of human activities in virtual space, such as social networks, social media platforms and online game systems. In this study, considering the available datasets, social media check-ins are taken as the proxy of the activity in virtual space. Formally, the virtual vibrancy is measured as Equation (2), where N_j is the total number of geotagged social media check-ins within MSSA j. Similarly, the virtual vibrancy is normalized by the minimum (N_min) and the maximum (N_max) number of check-ins across all MSSAs.
Figure 2. The multi-source geospatial-data driven framework to explore the vibrancy of the metro service area.
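As an illustration of the virtual vibrancy measure, the sketch below counts geotagged check-ins inside a circular service area and then applies Min-Max normalization. The 750 m radius follows the MSSA definition given earlier; the great-circle distance, the function names and the coordinates are assumptions made for this example, and a check-in lying in two overlapping service areas is simply counted for both.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_checkins(stations, checkins, radius_m=750.0):
    """Number of check-ins within radius_m of each station (N_j)."""
    return {
        name: sum(1 for clat, clon in checkins
                  if haversine_m(slat, slon, clat, clon) <= radius_m)
        for name, (slat, slon) in stations.items()
    }


def min_max(counts):
    """Normalise counts to [0, 1] using the minimum and maximum count."""
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return {k: 0.0 for k in counts}
    return {k: (v - lo) / (hi - lo) for k, v in counts.items()}


# Made-up coordinates: 0.001 degrees of latitude is roughly 111 m.
stations = {"A": (22.54, 114.05), "B": (22.60, 114.05), "C": (22.70, 114.05)}
checkins = [(22.541, 114.05), (22.543, 114.05), (22.545, 114.05),
            (22.601, 114.05), (22.65, 114.05)]
counts = count_checkins(stations, checkins)
vv = min_max(counts)
```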

Integrated vibrancy
A single dataset describes only one facet of vitality and therefore lacks a comprehensive perspective on urban vibrancy (Huang and Wang 2020). Here, the entropy-based weighting method is utilized to integrate the physical and virtual vitality. The entropy-based weighting method is an objective weighting method that considers the distribution of values. The Shannon entropy of one type of vibrancy, H_k, is measured as Equation (3), where v_kj is the vibrancy of type k in MSSA j and n is the total number of MSSAs.
The integrated vibrancy index, V_i, is calculated as Equation (4), where v_k (k = p, v) is one type of vibrancy, physical (p) or virtual (v). Because it weights both the physical and the virtual vibrancy, the integrated vibrancy index represents the comprehensive intensity of human activities and interactions in both physical and virtual space.
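The exact form of Equations (3) and (4) is not reproduced in this text, so the following is only a sketch of the commonly used entropy weight method, stated here as an assumption: each indicator is converted to shares p_kj = v_kj / Σ_j v_kj, its normalized entropy is H_k = −(1 / ln n) Σ_j p_kj ln p_kj, its weight is w_k = (1 − H_k) / Σ_k (1 − H_k), and the integrated index is the weighted sum of the indicators. Function names and sample values are hypothetical.

```python
import math


def entropy_weights(indicators):
    """Entropy-based weights for a dict {indicator name: list of values per MSSA}.

    Assumes non-negative values, a positive total per indicator, and at
    least one indicator that is not perfectly uniform.
    """
    n = len(next(iter(indicators.values())))
    divergence = {}
    for name, values in indicators.items():
        total = sum(values)
        shares = [v / total for v in values]
        # Normalised Shannon entropy in [0, 1]; zero shares contribute nothing.
        entropy = sum(-p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergence[name] = 1.0 - entropy
    d_sum = sum(divergence.values())
    return {name: d / d_sum for name, d in divergence.items()}


def integrated_vibrancy(indicators, weights):
    """Weighted sum of the indicators for every MSSA."""
    n = len(next(iter(indicators.values())))
    return [sum(weights[k] * indicators[k][j] for k in indicators) for j in range(n)]


# Made-up normalised vibrancy values for four service areas.
vib = {
    "physical": [0.9, 0.8, 0.85, 0.7],   # fairly even across stations
    "virtual": [1.0, 0.1, 0.05, 0.0],    # concentrated in one station
}
w = entropy_weights(vib)
integrated = integrated_vibrancy(vib, w)
```

Under this scheme an indicator whose values are spread unevenly across stations carries more information and receives the larger weight, which is why the concentrated virtual indicator dominates in the example.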

Associated social-economical and built-environmental factors
Previous studies have revealed that urban vibrancy is associated with three categories of factors: demographics, socioeconomics and the built environment (Zarin, Niroomand, and Heidari 2015). In terms of demographics and socioeconomics, the population density, the employment density and the average income are calculated for each MSSA. The built environment is measured by land use, POIs, transportation and building floor space. Using the land use map, the ratio of several typical urban land uses in a MSSA is calculated, including commercial, residential, industrial and open area. The influence of mixed land use on vibrancy has also long been observed. In this study, the Shannon entropy is calculated to indicate the land use mix, as estimated by Equation (5), where p_i is the share of the area of the i-th type of land use (residential, commercial, industrial or open area).
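A minimal sketch of a Shannon-entropy land use mix index follows, assuming that p_i is the share of each land use type in the total classified area of a service area. Dividing by ln(k), with k the number of land use types, rescales the index to [0, 1]; whether the study applies this rescaling is not stated, so it is exposed here as an optional, assumed step. The function name and sample areas are hypothetical.

```python
import math


def land_use_mix(areas, normalise=True):
    """Shannon entropy of land use shares within one service area.

    areas: dict mapping a land use type to its area (any consistent unit);
    the total area is assumed to be positive.
    """
    total = sum(areas.values())
    shares = [a / total for a in areas.values() if a > 0]
    h = sum(-p * math.log(p) for p in shares)
    if normalise and len(areas) > 1:
        h /= math.log(len(areas))  # rescale so an even mix scores 1
    return h


# Made-up areas in hectares.
balanced = land_use_mix({"residential": 40, "commercial": 40,
                         "industrial": 40, "open": 40})
single_use = land_use_mix({"residential": 160, "commercial": 0,
                           "industrial": 0, "open": 0})
```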
Transportation infrastructure plays an important role in vibrancy. In general, well-connected road networks and rich public transportation services encourage urban travel and thus breed urban vibrancy. In this study, the coverage of the road network and the bus service is examined. The road density in MSSA j, D_r, is calculated as Equation (6), which denotes the ratio of the total road length to the area of the MSSA, S_j. The bus density, D_s, is calculated to represent the supply of bus service, as Equation (7), where N_s is the number of bus stops.
Sufficient concentration of urban buildings is vital for urban vitality. The Ground Space Index (GSI) metric is utilized to describe the concentration of buildings in a MSSA. The GSI denotes the density of building footprints, as Equation (8), where S_b is the total area of building footprints.
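Read literally, the three density measures described above are simple ratios over the service area. The block below writes them out under that reading; the symbol L_j for the total road length in MSSA j is introduced here for illustration and does not appear in the original text, and the bus density is assumed to be stops per unit area.

```latex
\begin{align}
D_r &= \frac{L_j}{S_j}, \\
D_s &= \frac{N_s}{S_j}, \\
\mathrm{GSI} &= \frac{S_b}{S_j}.
\end{align}
```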
Correlated independent variables will introduce biases to the following regression analysis. To avoid potential bias, the multi-collinearity test is conducted to evaluate the correlation of independent variables. The variables with high Variance Inflation Factor (VIF) are excluded as their effects have been explained by other variables.
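The multi-collinearity check can be sketched as follows: each independent variable is regressed on all the others, and VIF = 1 / (1 − R²) is computed from that auxiliary regression. The pure-Python solver, the variable names and the synthetic data are assumptions made for this example; the text does not state the VIF threshold used, and the cut-off of 10 mentioned in the comments is only a common rule of thumb.

```python
import random


def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def r_squared(X, y):
    """Coefficient of determination of the regression of y on X."""
    beta = ols_fit(X, y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(columns):
    """VIF per variable for a dict {variable name: list of values}."""
    names = list(columns)
    n = len(columns[names[0]])
    vifs = {}
    for target in names:
        others = [c for c in names if c != target]
        X = [[1.0] + [columns[c][i] for c in others] for i in range(n)]
        r2 = r_squared(X, columns[target])
        vifs[target] = float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic data: "employment" nearly duplicates "population".
rng = random.Random(0)
pop = [rng.gauss(0, 1) for _ in range(200)]
road = [rng.gauss(0, 1) for _ in range(200)]
emp = [p + 0.05 * rng.gauss(0, 1) for p in pop]
vifs = variance_inflation_factors(
    {"population": pop, "road_density": road, "employment": emp})
# Variables with a VIF above a chosen cut-off (often 10) would be dropped.
```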

Spatial autocorrelation
Spatial autocorrelation measures the degree to which a spatial variable is correlated with itself across space. Spatial autocorrelation has been examined in many urban studies (Diniz-Filho, Bini, and Hawkins 2003; Tu et al. 2019). Several indices have been proposed to measure spatial autocorrelation, such as Moran's I, Geary's C (De Jong, Sprenger, and Veen 1984) and Getis's G (Sawada 2001). Here, Moran's I is utilized to reveal the spatial autocorrelation of the vibrancy around metro stations, as Equations (9)-(11), where Z_i denotes the deviation of the vibrancy index (V_i) from the mean vitality (V̄), n is the number of MSSAs in a city, and S_0 is the sum of the spatial weights, which are measured by the inverse distance weighting method (Getis and Aldstadt 2004). Moran's I generally falls within the range [−1, 1]. A value greater than 0 suggests positive spatial autocorrelation, a value less than 0 indicates negative spatial autocorrelation, and a value close to 0 indicates a random spatial distribution. Moran's I of the three types of metro vitality is calculated for each of the three cities.
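A small sketch of global Moran's I with inverse-distance weights is given below, using the standard form I = (n / S_0) · Σ_i Σ_j w_ij Z_i Z_j / Σ_i Z_i². It assumes station coordinates in a projected system (metres), distinct station locations, and weights w_ij = 1 / d_ij with no distance cut-off; these choices, the function names and the sample values are illustrative rather than the study's configuration, and the significance test (z-value) is omitted.

```python
import math


def inverse_distance_weights(coords):
    """Dense weight matrix with w_ij = 1 / d_ij and zeros on the diagonal."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return W


def morans_i(values, W):
    """Global Moran's I for one variable and a spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    s0 = sum(sum(row) for row in W)           # sum of all weights
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


# Six hypothetical stations along a line, 1 km apart.
coords = [(x * 1000.0, 0.0) for x in range(6)]
W = inverse_distance_weights(coords)
clustered = morans_i([0.9, 0.8, 0.85, 0.2, 0.1, 0.15], W)   # similar values adjacent
alternating = morans_i([1.0, 0.0, 1.0, 0.0, 1.0, 0.0], W)   # neighbours differ
```

In this toy case the clustered pattern yields a positive index and the alternating pattern a negative one, matching the interpretation given in the text.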

Regression models
The OLS model is used to unravel the relationship between the dependent variable and the independent variables. The OLS formula is given in Equation (12), where y is the dependent variable, X denotes the independent variables, and β_0, β and ε are the constant, the estimated coefficients of the independent variables and the fitting residual respectively.
The OLS model takes the integrated vibrancy as the dependent variable. The independent variables are the demographic, socioeconomic and built environment factors. To compare the impact on metro vibrancy among the three cities, each independent variable is normalized with the Min-Max formula given by Equation (2).
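The regression set-up described above, y = β_0 + Xβ + ε with Min-Max normalized independent variables, can be sketched in a few lines. The solver below uses the normal equations and is written in plain Python only to keep the example self-contained; the variable names and the synthetic data are hypothetical, and the response is generated without noise so that the known coefficients are recovered.

```python
def min_max(values):
    """Normalise a list of values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def ols_fit(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Made-up raw factors for six service areas.
raw = {
    "income": [2000, 3500, 5000, 6500, 8000, 9500],
    "road_density": [12.0, 3.0, 9.0, 5.0, 11.0, 2.0],
}
norm = {name: min_max(values) for name, values in raw.items()}

# Synthetic, noise-free vibrancy generated from known coefficients.
y = [0.2 + 0.5 * a - 0.3 * b
     for a, b in zip(norm["income"], norm["road_density"])]

# Design matrix with a leading column of ones for the constant term.
X = [[1.0, a, b] for a, b in zip(norm["income"], norm["road_density"])]
beta = ols_fit(X, y)  # [constant, income coefficient, road density coefficient]
```

Because every independent variable is rescaled to [0, 1], the magnitudes of the fitted coefficients can be compared across variables and across cities, which is the motivation given in the text.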
Spatial dependence is a common phenomenon in geographic distributions. In other words, the value of a spatial variable at one place influences its neighbors. The OLS model neglects the spatial dependence of variables; therefore, it may overestimate the impact of the considered factors. Several geospatial regression models take spatial dependence into account, such as the SLM, the Spatial Error Model (SEM), the Spatial Durbin Model (SDM) (Elhorst 2014) and the Geographically Weighted Regression (GWR) model. After preliminary experiments, the SLM is utilized to further reveal the impact on metro vibrancy.
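For reference, the spatial lag model is conventionally written as below, using the same symbols as the OLS model plus a spatial weight matrix W and a spatial lag coefficient ρ. This is the textbook form and is given here as an assumption about the specification, since the numbered equation itself is not reproduced in this text; the second line is the implied reduced form, with I the identity matrix.

```latex
\begin{align}
y &= \rho W y + X\beta + \varepsilon, \\
y &= (I - \rho W)^{-1}\,(X\beta + \varepsilon).
\end{align}
```

The reduced form makes explicit why OLS is inappropriate when ρ is non-zero: the vibrancy at one station depends on the factors and residuals of neighboring stations through W.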
The SLM takes into account the spatial dependence of the dependent variable in the regression analysis. The formula of the SLM is given by Equation (13), where W represents the spatial weight matrix, X denotes the independent variables, ρ and β are the estimated coefficients of the spatial lag term and the independent variables respectively, and ε is the fitting residual. In comparison to Equation (12), the effect of nearby metro stations on vibrancy is considered by adding the component ρWy. Similar to the OLS models, three SLMs are formed, in which the dependent variable is the integrated vitality in each city and the explanatory factors are those independent variables found to be highly related to vibrancy in the OLS models.

Figure 3 displays the physical vibrancy in the MSSAs of the three cities. Figure 3(a) shows the cumulative frequency of the physical vibrancy. The curves show that the physical metro vibrancy in Shenzhen and Singapore follows similar distributions, with many metro stations having high physical vibrancy: 80% of MSSAs in Shenzhen and Singapore have a physical vibrancy above 0.7 and 20% below 0.7. By contrast, the distribution in London is smoother, with around 40% of metro stations having a physical vibrancy below 0.5. These distributions imply that the physical vitality in the MSSAs of London is less polarized. Figure 3(b-d) shows the spatial distributions of the physical vibrancy. Note that kernel density estimation is used to estimate the spatial vitality around metro stations. The maps reveal a general spatial trend: metro vibrancy decays from the urban center to the suburban area along metro lines. For example, the metro vibrancy in Shenzhen (Figure 3b) mainly decays from south to north, as southern Shenzhen is the original SEZ while the north is the suburban area. Two physical vibrancy centers appear in Futian and Luohu. By contrast, the physical metro vibrancy in Singapore displays a typical polycentric spatial structure, with three local vibrancy centers in the West (Choa Chu Kang), the North-East (Sengkang) and the Central area (the old Golden Shoe area). In London, the only center is located at the famous West End and its surrounding areas, and the distribution of the physical vibrancy shows a ring-shaped structure, decaying from the urban center to the suburban area. The comparative results suggest that urban spatial structure could potentially affect metro vibrancy.
Figure 4 displays the spatial patterns of virtual vibrancy in the three cities. Compared with the curves shown in Figure 3(a), the cumulative frequency curves of the virtual vibrancy shift to the left, indicating that more metro stations have lower vibrancy values. More metro stations have moderate virtual vibrancy, especially in Shenzhen and Singapore, and the differences in the number of check-ins around the metro stations in these two cities become larger. However, the general spatial distributions of the virtual vibrancy (Figure 4b-d) are largely in line with those shown in Figure 3: highly vibrant MSSAs appear at the urban center and sub-centers, and less vibrant MSSAs occur in the suburban area. In particular, the spatial distribution of the virtual vibrancy in Singapore is highly consistent with its physical vibrancy, with three local virtual vibrancy centers in the West, the North-East and the Central area respectively. In Shenzhen and London, there are some differences between the physical and the virtual vibrancy. Some sub-hotspots of the physical vibrancy in Shenzhen disappear, for example a local vibrancy hotspot at the center of Shenzhen. For London, while the general spatial trend of the virtual vibrancy is similar to that of its physical vibrancy, some areas such as Canary Wharf show local hotspots, and the core area of highly vibrant MSSAs becomes smaller.
Figure 5 further shows the joint distribution of physical and virtual vibrancy. The correlation varies from city to city. In Shenzhen, activities in physical and virtual space are nearly uncorrelated, with a Pearson's correlation coefficient of 0.02, which indicates different preferences in physical space and on social media check-in platforms. In Singapore and London, the two types of vibrancy are moderately correlated, with Pearson's correlation coefficients of 0.52 and 0.65 respectively. One possible explanation is that, as London and Singapore are well developed, residents in these cities have similar preferences in physical and virtual space.
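As a minimal sketch of how the correlation between the two vibrancy measures can be computed, the snippet below evaluates Pearson's correlation coefficient for two station-level vectors. The function name `pearson_r` and the five example values are hypothetical and are not data from this study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre each variable
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical vibrancy scores for five stations (illustrative only).
physical = [0.9, 0.8, 0.6, 0.4, 0.3]
virtual = [0.7, 0.9, 0.5, 0.2, 0.4]
r = pearson_r(physical, virtual)
print(f"Pearson r = {r:.3f}")
```

A value close to 0, as reported for Shenzhen, means that knowing a station's physical vibrancy says little about its virtual vibrancy, while values around 0.5 to 0.65 indicate a moderate linear relationship.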

Integrated vibrancy
By combining the physical and the virtual vibrancy, the integrated vibrancy metrics are obtained and displayed in Figure 6. The cumulative frequency of the integrated vibrancy lies between those of Figures 3 and 4. Shenzhen and Singapore share a similar cumulative frequency while London is quite different. Regarding spatial distribution, the integrated vibrancy also strikes a balance between the physical and the virtual vibrancy. The integrated vibrancy metrics exhibit a spatial gradient: highly vibrant metro stations appear at the urban center while low-vibrancy metro stations are located in the suburban area. Table 1 reports the Moran's I test results of the integrated vibrancy. All z-values are above 10, indicating significant spatial autocorrelation, which corroborates the spatial autocorrelation of the metro vibrancy in Figure 5. This suggests a general spatial dependency of the integrated vibrancy between nearby metro stations. The integrated vibrancy in London is the most spatially clustered, with a Moran's I of 0.461, while the metro vibrancies in Shenzhen and Singapore are somewhat more dispersed, with Moran's I values below 0.4. These findings further highlight the necessity of considering the spatial dependence of nearby metro stations when investigating the association with the built environment.
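The Moran's I statistic discussed above can be sketched as follows. This is an illustrative implementation, not the procedure used in this study: the binary adjacency weights, the six example values and the permutation-based z-score (rather than an analytical one) are all assumptions made for the example.

```python
import numpy as np

def morans_i(values, w):
    """Global Moran's I for a value vector and a spatial weight matrix w."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                      # deviations from the mean
    w = np.asarray(w, dtype=float)
    n = z.size
    return float(n / w.sum() * (z @ w @ z) / (z @ z))

def permutation_z(values, w, n_perm=999, seed=0):
    """Pseudo z-score: observed I against I under random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    sims = np.array([morans_i(rng.permutation(values), w)
                     for _ in range(n_perm)])
    return float((observed - sims.mean()) / sims.std(ddof=1))

# Hypothetical example: six stations on one line, each adjacent to its
# immediate neighbours (binary contiguity weights).
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # high values next to high values
alternating = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # high values next to low values

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
z_clustered = permutation_z(clustered, w)
print(i_clustered, i_alternating, z_clustered)
```

Positive values indicate that similar vibrancy levels cluster in space, as reported for all three cities, while negative values indicate a checkerboard-like pattern.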

Results of the OLS model
The OLS models are implemented to explore the global relationship between the metro vibrancy and demographic, socioeconomic and built environment factors. The results are reported in Table 2. Income, road density, bus stop density, land use and building footprints (GSI) are significantly associated with the metro vibrancy in one or more cities, all at the 0.05 significance level. The AIC suggests a good association between the metro vibrancy and the aforementioned factors. In general, the reported factors explain 54.9%, 59.0%, and 74.8% of the variation of the integrated vibrancy metrics in the MSSAs of Shenzhen, Singapore and London respectively.
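To illustrate how the explained variation (R2) and the AIC of such a model are obtained, the following sketch fits an OLS regression with NumPy on synthetic data. The function name `fit_ols`, the synthetic predictors and the coefficient values are hypothetical. The AIC here counts only the regression coefficients as parameters, which is one common convention; some software also counts the error variance.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X b by least squares; return coefficients, R2 and AIC."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)                     # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    n, k = X.shape
    r2 = 1.0 - rss / tss
    # Gaussian log-likelihood evaluated at the maximum-likelihood variance rss/n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    aic = 2 * k - 2 * loglik
    return beta, r2, float(aic)

# Synthetic data (illustrative only): two standardized predictors.
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(200, 2))
y_demo = (0.5 + 0.3 * X_demo[:, 0] - 0.2 * X_demo[:, 1]
          + rng.normal(0, 0.01, 200))

beta, r2, aic = fit_ols(X_demo, y_demo)
print(beta, r2, aic)
```

An R2 of 0.549, for example, corresponds to the statement that the factors explain 54.9% of the variation; a lower AIC indicates a better trade-off between fit and model size when comparing models on the same data.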
Although the derived vibrancy indicators differ in terms of spatial distribution and cumulative frequency, the global regression results suggest that income is significantly associated with the derived vibrancy in all three cities, though its influence on the metro vibrancy differs. The results in Shenzhen and London demonstrate a positive impact of income on vibrancy, while the result in Singapore shows the opposite. This may be because the high-income population in Shenzhen and London travels more by metro. In addition, MSSAs with high income usually have rich amenities and thus attract many visits.
The remaining associated factors, including multi-type land use (commercial, residential, and industrial land use), land use mix, road density, bus stop density and building footprints, influence the vibrancy metrics in one or two cities. For example, road density and industrial land are only significantly associated with the integrated vibrancy metric in Shenzhen, in line with the history of Shenzhen as a fast-developing industrial city. Residential land is highly correlated with the integrated vibrancy metric in Singapore, which lends some support to the success of transit-oriented urban planning in this city. Bus stop density has a great influence on the vibrancy metric in London, which implies good inter-connectivity between bus routes and the metro network in this city.
Three factors each have a great influence in two cities. Commercial land has a positive effect on the vibrancy in Singapore but a negative effect in London, which emphasizes the importance of commercial activities in these two cities. Land use mix has a positive influence on the integrated metro vibrancy metrics in Shenzhen and Singapore, which is in line with previous studies (Jacobs 1961). The building footprints (GSI) are significantly associated with the vibrancy metrics in Shenzhen and London. Above all, these results demonstrate that the associations between metro vibrancy and related factors differ across cities because of different urban contexts and socioeconomic structures.

Results of spatial lag model
Compared to the OLS models, the SLM accounts for the influence of nearby metro stations. Table 3 reports the estimated coefficients of the independent variables and the goodness of fit of the SLMs. The R2 suggests that 59.5%, 62.9% and 79.9% of the variation of the integrated vibrancy metrics in Shenzhen, Singapore and London can be explained by the associated factors; these results are 4.2, 3.9 and 5.1 percentage points higher than those of the corresponding OLS models. The AIC values of the three SLMs are lower than those of the OLS models above. These outcomes verify the better explanatory power of the SLMs. Hence, the SLM is able to reveal the effect of the spatially auto-correlated variable.
The coefficients of the spatial lag component demonstrate the positive influence of nearby metro stations on the integrated vibrancy metrics. These coefficients are 0.274, 0.343 and 0.429 in Shenzhen, Singapore and London respectively. This demonstrates that the vibrancy around a metro station spills over beyond its own service area. The reason is that the MSSAs in high-density cities like Shenzhen, Singapore and London usually overlap with each other; for example, the CBDs of the three cities contain clusters of many stations. Therefore, vibrant MSSAs breed the vitality of nearby metro stations in all three cities. It further suggests that fostering the metro vibrancy will have a systematic effect on the whole metro system.
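The spillover interpretation can be made explicit by writing the spatial lag specification in reduced form. The expressions below are a sketch that assumes the standard SLM form with the symbols used in this paper (y, W, X, ρ, β, ε) and a row-standardized weight matrix W; the row standardization is an assumption made here so that the series expansion converges for |ρ| < 1.

```latex
\begin{align}
y &= \rho W y + X\beta + \varepsilon \\
y &= (I - \rho W)^{-1}(X\beta + \varepsilon)
   = \sum_{k=0}^{\infty} \rho^{k} W^{k}\,(X\beta + \varepsilon),
   \qquad |\rho| < 1
\end{align}
```

Under these assumptions, the term with k = 0 is the direct effect of a station's own factors, the term with k = 1 is the effect transmitted from its immediate neighbors, and higher-order terms are effects passed on through neighbors of neighbors, each damped by a further factor of ρ. A larger ρ, such as the 0.429 reported for London, therefore implies spillovers that reach further through the network.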
A comparison of the corresponding coefficients further reveals that the influences of the other associated factors reported by the SLM are smaller than those reported by the OLS models. For example, the absolute value of the income coefficient is 0.138, 0.11 and 0.094 in the SLMs of Shenzhen, Singapore and London, against 0.160, 0.134 and 0.169 in the corresponding OLS models. The coefficients of the other factors show a similar trend, suggesting that part of the effect that the OLS models attribute to these factors is captured by the spatial lag term. These values support the good explanatory power of the SLM.

The association of metro vibrancy and built environment
This study examines the metro vibrancy in three global cities: Shenzhen, Singapore and London. The results demonstrate that income, road density, industrial land, land use mix and GSI are significantly associated with the metro vibrancy in Shenzhen, which partially agrees with findings on the urban vibrancy of Shenzhen (Tu et al. 2020). As the metro system mainly covers residential neighborhoods and industrial parks, income and industrial land are significantly associated with the metro vibrancy rather than with the urban vibrancy as a whole. This difference emphasizes the heterogeneity of spatial vitality in the city. The metro provides an essential travel service for commuters; therefore, its vibrancy is highly affected by the characteristics of commuters.
The association found between the metro vibrancy and the local built environment is in line with previous studies of metro ridership (Gutiérrez, Cardozo, and García-Palomares 2011; Zhao et al. 2013; Chakour and Eluru 2016). However, the association differs from city to city: the studies of Montreal, Nanjing and Madrid demonstrate that socioeconomic factors and transportation facilities, e.g. employment, bus stops, number of metro stations and suburban bus lines, have great influences on bus or metro ridership. The difference may be due to the virtual vibrancy added here, which considers human activities in virtual space.

Implications for fostering the metro vibrancy
Vibrant urban spaces, such as streets and metro areas, benefit human well-being by improving the urban living experience. Understanding the interactions between metro vibrancy and the built environment helps us foster metro vibrancy. This empirical study in Shenzhen, Singapore and London suggests the following. (1) Transportation. Building more roads may improve the metro vibrancy, as the results in Shenzhen indicate. Providing bus services around the metro station also breeds metro vibrancy by bringing more people to the MSSA. This is because convenient transportation stimulates travel, thereby increasing human activities and interactions in the MSSA. (2) Land use. Results in Singapore demonstrate that increasing commercial and residential land will improve the metro vibrancy, as the metro is the backbone transportation infrastructure for commuting. More commercial and residential land parcels around metro stations tend to increase shopping activities and interactions. The results in London and Shenzhen suggest that MSSAs dominated by one or two land use types, such as commercial and/or industrial land, tend to have lower metro vibrancy. Last, similar to the suggestion of Jacobs (1961), increasing the land use mix will increase human interaction and therefore breed metro vibrancy. (3) Amenities. Amenities such as POIs and buildings, e.g. residential buildings and shopping malls, play a key role in facilitating daily human activities. More buildings increase the density of human activities and interactions, which is also indicated by the results in Shenzhen and London.
Many empirical studies have investigated the relationship between the built environment and urban vibrancy by taking a single city as a case study, such as Yue et al. (2017) and Ye et al. (2018). These studies provide useful insights into quantifying urban vibrancy and unveiling the factors that influence it, and their results can support policy-making by urban governors and planners. However, although this comparative study of three global cities suggests some universal patterns between metro vibrancy and urban contexts, the significantly associated factors clearly vary from city to city. Knowledge obtained in one city can provide valuable insights for another, but it should not be transferred directly without adjustment to the local spatial and social context.

Conclusion
Revitalizing urban areas around metro stations is important for urban transformation. To date, few studies have empirically explored the pattern of metro vibrancy and its association with the built environment. This study presents a multisource urban data-driven approach to quantify the metro vibrancy and unravel the influences of the built environment. Massive smart card data is processed to extract metro ridership, which denotes the physical vibrancy of metros. Social media check-ins are used to measure the vitality of metros in virtual space. The physical and virtual vibrancies are combined into an integrated vibrancy with the entropy-based weighting method. Built environment characteristics, including land use, transportation and buildings, are modeled as independent variables. The significantly associated factors are unveiled with the SLM by considering the spatial dependence of the metro vibrancy. Case studies are conducted in three global cities, namely Shenzhen, London and Singapore.
The results demonstrate that the metro vibrancy metrics are highly spatially auto-correlated. The metro vibrancy is high in the urban center and low in the suburban area. The results of the OLS and the SLM suggest that built environmental factors including road density, bus stop density, land use and building footprints (GSI) have great influences on the integrated vibrancy metrics in one or two cities, but these influences vary between cities because of different urban contexts. These findings help us understand metro vibrancy and thus benefit policy-making to foster the vibrancy around metro stations in the future.
This study makes the following contributions: (1) The metro vibrancy is measured using both smart card data and social media check-ins, which produces a comprehensive and multi-faceted portrait of the metro vibrancy. (2) A comparative framework is developed to examine the association between the metro vibrancy and associated built environmental factors, and the commonalities and differences among the three cities are identified. The proposed comparative framework has significant potential for evaluating other urban concepts, such as urban resilience and urban innovation. (3) The effects of associated factors on the metro vibrancy can be used to evaluate the vitality of the metro and identify appropriate design interventions in an evidence-based manner.
This study still leaves room for further exploration. First, field surveys will be conducted at several metro stations in the three cities to validate the metro vibrancy metrics derived from multi-source urban data. Second, because of the limited availability of urban datasets, only a small, core set of built environmental factors is considered. More data will be collected and more independent variables will be taken into consideration.

Note
1. Ordnance Survey (OS) website, https://www.ordnancesurvey.co.uk/, accessed in June 2019.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was jointly supported by the National Natural

Notes on contributors
Wei Tu is currently an Associate Professor with the Department of Urban Informatics, Shenzhen University. He is also a Visiting Scholar with the Senseable City Laboratory, Massachusetts Institute of Technology. His research interests include urban informatics, smart mobility, and urban spatial intelligence.
Tingting Zhu received the M.S. degree from Shenzhen University. Her research interests include big spatial data analytics and smart transportation.

Chen Zhong is an Associate Professor in Urban Analytics at the Centre for Advanced Spatial Analysis. She received the PhD degree from ETH Zurich. Her research interests are spatial data analysis, machine learning, urban modeling, and data-driven methods for urban and transport planning.

Xiaohu Zhang is currently an Assistant Professor with the Department of Urban Planning and Design, Faculty of Architecture, The University of Hong Kong. Prior to this, he worked with Singapore-MIT Alliance for Research and Technology, the MIT Senseable City Laboratory, and Sun Yat-Sen University. His scholarship bridges the information gap in sustainable urban and transportation policy-making with stochastic simulation and big data analytics. Broadly interested in urban data science, his recent work explores the sustainability of new shared mobility services, such as scooter sharing, carsharing, and ridesharing. His research uses multi-source datasets to advance understanding of pressing urban and transportation issues, e.g., urban expansion, emerging mobility services, and the interactions between land use and transportation.