Measuring spatio-temporal autocorrelation in time series data of collective human mobility

Massive spatio-temporal big data about human mobility have become increasingly available. Revealing the underlying dynamic patterns in these data is essential for understanding people's behavior and urban deployment. Spatio-temporal autocorrelation analysis is an exploratory approach to recognizing how data are distributed in space and time. The most widely used spatial autocorrelation measurements, such as Moran's I and local indicators of spatial association (LISA), apply only to static data and therefore cannot handle spatio-temporal big data about human mobility. Thus, we proposed a new method that extends Moran's I to measure the spatial autocorrelation of time series data. The method was then applied to taxi ride data in Beijing, China to reveal the spatial pattern of collective human mobility. The results show strong positive spatio-temporal autocorrelation within the 5th Ring Road, weak negative spatio-temporal autocorrelation near the 6th Ring Road, and almost no spatio-temporal autocorrelation between the two ring roads. Local spatial patterns of taxi travel were also recognized. This method is useful for discovering underlying patterns in spatio-temporal big data to understand human mobility.
ARTICLE HISTORY: Received 27 February 2019; Accepted 26 May 2019


Introduction
With the popularity of mobile positioning devices, spatio-temporal big data about human mobility, such as taxi trajectories, mobile phone records, and social media check-ins, have become increasingly available. Human mobility is a dynamic process in both space and time. The aggregate values that measure a collective activity in a place vary over time, for example, the number of taxi passengers at a location, the visitor density of a tourist attraction, or the traffic flow at a road intersection. This variation is recognized as the temporal activity signature of a place, which is a proxy of collective human mobility (Pei et al. 2014; Zhi et al. 2016; Wu et al. 2018). Moreover, the variation also differs across space because each place has its own specific temporal activity signature. The underlying spatio-temporal patterns are essential for understanding human behavior and urban structures. These patterns depend not only on the spatial distribution of human activities but also on their temporal dynamics. Regions with similar temporal variation of human mobility also have similar urban functions, and vice versa (Liu et al. 2012b). Exploring the spatial autocorrelation of the time series of collective human mobility can give an insight into spatio-temporal patterns.
Tobler's first law of geography states that "everything is related to everything else, but near things are more related than distant things" (Tobler 1970). Most geographic phenomena are continuous in the real world, and when they are discretized during modeling, phenomena that are geographically close tend to be alike. Measurements of spatial autocorrelation, such as Moran's I index (Moran 1950) and Geary's C index (Geary 1954), were proposed to describe the degree of this relationship across space.
A focus on local patterns of autocorrelation and an allowance for local instabilities in overall spatial association have also been suggested, such as local indicators of spatial association (LISA) (Anselin 1995) and G* statistics (Ord and Getis 1995). To expand the visualization of spatial autocorrelation to a multivariate setting, Anselin introduced a Moran Scatterplot Matrix and Multivariate LISA maps (Anselin, Syabri, and Smirnov 2002). Liu extended conventional approaches and proposed measures to quantify global and local spatial autocorrelation for vectors (Liu, Tong, and Liu 2015).
However, these indexes apply only to static data and cannot be used directly for spatio-temporal big data concerning human mobility. Temporal dependency should be addressed in the spatial pattern measurement if a geographic phenomenon varies over time. A spatio-temporal weight matrix was developed to adjust Moran's I index for temporal consideration (Dubé and Legros 2013). Temporal attribute values of geographic events were included in the weight matrix of Moran's I and the LISA index (Lee and Li 2017) for measuring spatio-temporal autocorrelation. However, these extended indexes consider spatial observations and temporal values separately by introducing temporal lags into the spatial weight matrix. Porat, Shoshany, and Frenkel (2012) replaced the attribute deviation in the Moran's I equation with the Pearson correlation coefficient between time series to extend Moran's I and the LISA index. Still, this measurement has limitations, because the assumptions of a normal distribution and a linear relationship between variables are not always satisfied by spatio-temporal data of human mobility.
The measurement of the spatio-temporal autocorrelation between two places depends on the similarity of the two time series. The Pearson correlation coefficient and similar methods are not applicable because of the temporal autocorrelation and the periodicity of time series data about human mobility. The dynamic behavior of a time series is one criterion for distinguishing different mobility sequences; the magnitude of a mobility series, which represents the travel amount, is another significant indicator for identifying mobility modes. Therefore, we focused on both the proximity of temporal trends and the deviation of magnitudes when measuring the variation of time series of collective human mobility. We then extended the spatio-temporal autocorrelation index using this variation measurement. The new index was evaluated with taxi ride data in Beijing, China to investigate the mobility patterns.

Spatial autocorrelation measurement
Moran's I (Moran 1950) is one of the most popular indexes for measuring global spatial patterns, defined as
$$I = \frac{N}{\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}} \cdot \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i=1}^{N}(X_i - \bar{X})^2} \tag{1}$$
where $X_i$ and $X_j$ are the values of observation at locations $i$ and $j$, respectively, $\bar{X}$ is the mean value of all observations, $W$ is the spatial weight matrix defined according to spatial proximity, and $N$ is the number of observation locations. Values of $I$ usually range from -1 to +1. A positive value indicates positive spatial autocorrelation, and a negative value indicates negative autocorrelation. The larger the absolute value is, the stronger the autocorrelation is. Its statistical significance is determined by the z-score and its associated p-value.
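As a rough illustration of Equation (1), the following sketch computes the global Moran's I for a handful of locations in plain Python. The function name `morans_i`, the four-location chain, and the binary adjacency weights are assumptions made for this example only; they are not the data or the weights used in the study.

```python
def morans_i(values, weights):
    """Global Moran's I for N values and an N x N spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]                  # X_i - mean
    w_sum = sum(sum(row) for row in weights)          # sum of all W_ij
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))  # weighted cross-products
    denom = sum(d * d for d in dev)                   # sum of squared deviations
    return (n / w_sum) * (cross / denom)


# Hypothetical example: four locations on a line, each linked to its
# immediate neighbors (binary contiguity weights).
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights)    # similar neighbors
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain_weights)  # dissimilar neighbors
print(clustered, alternating)
```

In this toy setting the smoothly increasing values give a positive index and the alternating values give a negative one, which matches the interpretation of the sign described above.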
To measure spatial heterogeneity and to detect local clusters, the local Moran index, LISA, is widely employed, defined as
$$I_i = \frac{N\,(X_i - \bar{X})\sum_{j=1}^{N} W_{ij}\,(X_j - \bar{X})}{\sum_{j=1}^{N}(X_j - \bar{X})^2} \tag{2}$$
where $I_i$ is the local Moran index of sub-region $i$, which indicates the correlation between the sub-region and its neighborhood. The more similar the observations in sub-region $i$ and its neighborhood are, the larger the value of $I_i$ is. To facilitate interpretation, the relationship between the sub-region and its neighborhood is classified into four types. The High-High type represents high values surrounded by high values, the Low-Low type represents low values surrounded by low values, the Low-High type represents low values surrounded by high values, namely a cold spot, and the High-Low type represents high values surrounded by low values, namely a hot spot.
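The local index and the four types can be sketched as follows. This is a minimal illustration that assumes the common form of the local Moran statistic (deviation times the spatially lagged deviation, scaled by the mean squared deviation) and assigns a type from the signs of the two deviations; the helper names and the five-location example are hypothetical, and no significance test is included.

```python
def local_morans_i(values, weights):
    """Return (local indexes, own deviations, spatially lagged deviations)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s2 = sum(d * d for d in dev) / n                  # mean squared deviation
    lags = [sum(weights[i][j] * dev[j] for j in range(n)) for i in range(n)]
    local = [dev[i] * lags[i] / s2 for i in range(n)]
    return local, dev, lags


def quadrant(own_dev, neighbor_lag):
    """Label a location as '<own level>-<neighborhood level>'."""
    own = "High" if own_dev > 0 else "Low"
    nbr = "High" if neighbor_lag > 0 else "Low"
    return f"{own}-{nbr}"


# Hypothetical example: five locations on a line with row-standardized
# adjacency weights; the middle location is a low value among high values.
weights = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
local, dev, lags = local_morans_i([9.0, 8.0, 1.0, 8.0, 9.0], weights)
types = [quadrant(d, l) for d, l in zip(dev, lags)]
print(types)
```

A practical analysis would report a type only where the local index is statistically significant, which this sketch does not check.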

Measuring deviation of time series
For human mobility, each spatial observation is a time series, so the deviation measurement in the classical Moran index is no longer applicable. If $X_i$ is a time series, $(X_i - \bar{X})$ becomes the deviation between the time series at location $i$ and the series of global mean values. The problem then becomes measuring the similarity between two time series. We measured the deviation of two time series from two perspectives: the temporal trend proximity and the magnitude deviation. The former is estimated by the temporal correlation coefficient of the two time series, and the latter is evaluated by the difference between the two mobility volumes. If the mobility at a location has a trend similar to that of its neighbors, it is of the High-High type when its total volume is larger than the mean value and of the Low-Low type when its total volume is smaller than the mean value. Therefore, the deviation is initially measured by the difference between the accumulative magnitudes of the two time series and then adjusted by the temporal dissimilarity.
We employed the adaptive temporal dissimilarity measure proposed by Chouakria and Nagabhushan (2007), which covers both temporal correlation and data distance. The temporal trend proximity is evaluated by means of the first-order temporal correlation coefficient, defined as
$$CORT(X_T, Y_T) = \frac{\sum_{t=1}^{T-1}(X_{t+1} - X_t)(Y_{t+1} - Y_t)}{\sqrt{\sum_{t=1}^{T-1}(X_{t+1} - X_t)^2}\,\sqrt{\sum_{t=1}^{T-1}(Y_{t+1} - Y_t)^2}} \tag{3}$$
where $X_T$ and $Y_T$ are two series of length $T$, and $X_t$ and $X_{t+1}$ are the observations at times $t$ and $t+1$. The result of this formula ranges from -1 to 1. A result of -1 means that $X_T$ and $Y_T$ share a similar growth in rate but opposite in direction, while 1 indicates that they have a similar growth both in rate and direction. A result of 0 means their growth rates are stochastically linearly independent, so they have different temporal behaviors. Then, the temporal variance and accumulative volume are integrated to set up the deviation measure, defined as
$$D(X_T, Y_T) = \phi\left(CORT(X_T, Y_T)\right)\cdot(V_X - V_Y) \tag{4}$$
where $D(X_T, Y_T)$ is the deviation of the two time series $X_T$ and $Y_T$, and $V_X$ and $V_Y$ are the accumulative volumes of $X_T$ and $Y_T$ in the period $[1, T]$, respectively. $\phi(\cdot)$ is an exponential adaptive tuning function, which is defined as
$$\phi(u) = \frac{2}{1 + \exp(ku)}, \quad k \geq 0 \tag{5}$$
This function, as shown in Figure 1, tunes the first-order correlation coefficient from $[-1, 1]$ to $(0, 2)$. When the correlation coefficient is 0, the value of the tuning function is 1, and the deviation is identical to the difference of the two volume values. When the correlation is positive, the two time series have similar variance, so the value of the tuning function is less than 1; the more similar the two series are, the smaller the value is. On the contrary, the tuned value is more than 1 if the correlation is negative. The less similar the two series are, the larger the value is.
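The deviation measure described above can be sketched in a few lines of Python. The sketch assumes the tuning function takes the form 2 / (1 + exp(k u)) from Chouakria and Nagabhushan (2007); the steepness `k=2.0`, the function names, and the choice to return a correlation of 0 for a constant series are assumptions made for illustration and are not stated in the text.

```python
import math


def cort(x, y):
    """First-order temporal correlation coefficient of two equal-length series."""
    dx = [b - a for a, b in zip(x, x[1:])]            # successive differences
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx)) * math.sqrt(sum(q * q for q in dy))
    # Assumption: a constant series has no trend, so treat it as uncorrelated.
    return num / den if den else 0.0


def phi(u, k=2.0):
    """Exponential tuning function mapping [-1, 1] into (0, 2); phi(0) = 1."""
    return 2.0 / (1.0 + math.exp(k * u))


def deviation(x, y, k=2.0):
    """Volume difference of two series, tuned by their trend similarity."""
    return phi(cort(x, y), k) * (sum(x) - sum(y))


x = [1.0, 3.0, 2.0, 5.0]
y_same = [2.0, 6.0, 4.0, 10.0]        # same direction of change as x
y_opposite = [10.0, 6.0, 8.0, 2.0]    # opposite direction of change
print(cort(x, y_same), cort(x, y_opposite))
print(deviation(x, y_same), deviation(x, y_opposite))
```

With these toy series the volume difference is negative in both cases, but it is shrunk when the trends agree and amplified when they oppose each other, which is the behavior the tuning function is meant to produce.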

Measuring spatio-temporal autocorrelation
We extended Moran's I index defined in Equation (1) by modifying the deviation measurement for time series data of human mobility. Observations of human mobility are represented as a set of spatio-temporal series $\{X_i^T,\ i = 1, 2, \ldots, N\}$, where $N$ is the number of locations, $X_i^T$ is the time series at location $i$, and all $X_i^T$ have the same length $T$. The mean of $\{X_i^T\}$ is a series of the mean value at each time, i.e.
$$\bar{X}^T = \left\{\bar{X}_t = \frac{1}{N}\sum_{i=1}^{N} X_{i,t},\ t = 1, 2, \ldots, T\right\} \tag{6}$$
where $X_{i,t}$ is the observation at location $i$ and time $t$.
Then, the spatio-temporal autocorrelation index is defined as
$$I^T = \frac{N}{\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}} \cdot \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}\,Z_i^T Z_j^T}{\sum_{i=1}^{N}\left(Z_i^T\right)^2} \tag{7}$$
where $W$ is the spatial weight matrix. $Z_i^T$ is the deviation between the time series at location $i$ and the mean series, which is measured by Equation (4), i.e.
$$Z_i^T = D\left(X_i^T, \bar{X}^T\right) = \phi\left(CORT\left(X_i^T, \bar{X}^T\right)\right)\cdot\left(V_i - \bar{V}\right) \tag{8}$$
where $V_i$ is the accumulative volume of $X_i^T$, $\bar{V}$ is the accumulative volume of $\bar{X}^T$, and $CORT(\cdot)$ and $\phi(\cdot)$ are defined by Equations (3) and (5), respectively. Accordingly, the local autocorrelation index of spatio-temporal series is defined as
$$I_i^T = \frac{N\,Z_i^T\sum_{j=1}^{N} W_{ij}\,Z_j^T}{\sum_{j=1}^{N}\left(Z_j^T\right)^2} \tag{9}$$
The indexes $I^T$ and $I_i^T$ measure the spatial autocorrelation of time series data that are distributed in space. The autocorrelation indicates the similarity of both the dynamic behavior and the accumulative magnitude of time series in neighboring places. The quantity deviation determines whether, and how much, the autocorrelation is positive or negative; then the similarity of the temporal variance adjusts the degree of the correlation. Mobilities with similar quantity deviation and similar temporal variance at nearby locations will result in strong autocorrelation.
Hotspots and cold-spots can also be discovered by I T i for analyzing spatial heterogeneity.
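To make the construction concrete, the sketch below puts the pieces together for a toy set of series: each location's deviation from the mean series is the volume difference tuned by trend similarity, and the deviations are then plugged into Moran-style global and local formulas. The tuning-function form and `k=2.0`, the scaling of the local index by the mean squared deviation, the function names, and the four-location example are all assumptions for illustration, not the study's implementation.

```python
import math


def cort(x, y):
    """First-order temporal correlation coefficient."""
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(p * q for p, q in zip(dx, dy))
    den = math.sqrt(sum(p * p for p in dx)) * math.sqrt(sum(q * q for q in dy))
    return num / den if den else 0.0   # assumption: constant series -> 0


def phi(u, k=2.0):
    """Assumed exponential tuning function with values in (0, 2)."""
    return 2.0 / (1.0 + math.exp(k * u))


def st_deviations(series, k=2.0):
    """Deviation of each location's series from the mean series."""
    n, t_len = len(series), len(series[0])
    mean_series = [sum(s[t] for s in series) / n for t in range(t_len)]
    mean_volume = sum(mean_series)
    return [phi(cort(s, mean_series), k) * (sum(s) - mean_volume) for s in series]


def st_morans_i(series, weights, k=2.0):
    """Return (global index, list of local indexes, deviations)."""
    z = st_deviations(series, k)
    n = len(z)
    w_sum = sum(sum(row) for row in weights)
    lags = [sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]
    sq_sum = sum(v * v for v in z)
    global_i = (n / w_sum) * sum(z[i] * lags[i] for i in range(n)) / sq_sum
    local_i = [n * z[i] * lags[i] / sq_sum for i in range(n)]
    return global_i, local_i, z


# Hypothetical example: two busy neighboring cells followed by two quiet ones.
series = [
    [10.0, 30.0, 20.0, 40.0],
    [12.0, 28.0, 22.0, 38.0],
    [1.0, 2.0, 1.0, 3.0],
    [2.0, 3.0, 1.0, 2.0],
]
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
global_i, local_i, z = st_morans_i(series, chain_weights)
print(global_i, local_i)
```

In this example the two busy cells have positive deviations and the two quiet cells have negative ones, so the global index comes out positive, while the busy cell that borders the quiet cells can receive a negative local value.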

Experimental
In big cities, taxis are an important transportation mode, so taxi trajectories provide a good indicator of citizen travel patterns (Jiang, Yin, and Zhao 2009; Liu et al. 2012a; Zhu and Guo 2017). The dynamic quantity of taxi rides, including pick-ups and drop-offs, produces spatio-temporal series data about human mobility in a city. We used taxi ride data in Beijing, China to evaluate the proposed spatio-temporal autocorrelation index and to analyze the spatial pattern of taxi travel.

Study area and data
The region within the 6th Ring Road in Beijing was selected as the research area. The GPS traces of 3,300 taxis in Beijing from 1 to 28 December 2015 were collected. The status of each taxi, including ID, position, timestamp, and whether the taxi was occupied by passengers, was sampled every minute, 24 h a day. After filtering out invalid points and trips, a total of 1.07 million pick-up and drop-off pairs in the research area were obtained. The research area was partitioned into a regular 1 km × 1 km grid, and the taxi pick-ups and drop-offs were then spatially allocated to the corresponding grid cells. The spatial distribution of the total number of pick-ups and drop-offs in each grid cell is shown in Figure 2. The series of pick-ups and drop-offs are periodic, so the hourly quantities in each grid cell were averaged into a one-week period. As a result, the spatio-temporal series data sets about taxi mobility were constructed. The temporal variance is shown in Figure 3.
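One possible way to build such weekly series is sketched below. It assumes that events are given as projected coordinates in meters with a timestamp, that a grid cell is identified by integer division of the coordinates by the cell size, and that "averaged into a one-week period" means averaging each of the 168 hour-of-week slots over the days observed for that weekday. The function name and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def weekly_profiles(events, cell_size_m=1000.0):
    """Average hourly counts per grid cell over a one-week (168-hour) cycle.

    events: iterable of (x_m, y_m, timestamp) tuples in projected meters.
    Returns {(col, row): [168 average hourly counts]}.
    """
    counts = defaultdict(lambda: [0] * 168)
    days_by_weekday = defaultdict(set)   # distinct dates seen per weekday
    for x, y, ts in events:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        counts[cell][ts.weekday() * 24 + ts.hour] += 1
        days_by_weekday[ts.weekday()].add(ts.date())
    profiles = {}
    for cell, hourly in counts.items():
        profiles[cell] = [
            c / len(days_by_weekday[slot // 24]) if days_by_weekday[slot // 24] else 0.0
            for slot, c in enumerate(hourly)
        ]
    return profiles


# Hypothetical pick-up events on two consecutive Tuesdays and one Wednesday.
events = [
    (120.0, 340.0, datetime(2015, 12, 1, 8, 5)),
    (450.0, 900.0, datetime(2015, 12, 1, 8, 40)),
    (10.0, 10.0, datetime(2015, 12, 1, 8, 59)),
    (700.0, 200.0, datetime(2015, 12, 8, 8, 15)),
    (1500.0, 200.0, datetime(2015, 12, 2, 9, 30)),
]
profiles = weekly_profiles(events)
tuesday_8am = datetime(2015, 12, 1, 8).weekday() * 24 + 8
wednesday_9am = datetime(2015, 12, 2, 9).weekday() * 24 + 9
print(profiles[(0, 0)][tuesday_8am], profiles[(1, 0)][wednesday_9am])
```

Note that a weekday with no events anywhere in the data would not be counted as an observed day in this sketch, which is acceptable for dense data but would bias averages for sparse samples.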

Results
We constructed two spatio-temporal data sets from taxi pick-ups and drop-offs for evaluating the spatio-temporal autocorrelation index. In each grid cell, the observed variable is a time series of the hourly number of pick-ups or drop-offs, averaged over a one-week period. We applied the proposed measurement to the two data sets to interpret the global and local spatial patterns of temporal taxi travel. The spatial weight matrix is generated from the inverse squared distance between any two grid cells and is then row-standardized.
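A row-standardized inverse squared distance weight matrix of this kind can be sketched as follows. The function name and the three sample cell centers are assumptions for the example; the sketch also assumes at least two distinct centers, a zero diagonal, and no distance cut-off, none of which is specified in the text.

```python
def inverse_distance_sq_weights(centers):
    """Row-standardized weights W_ij proportional to 1 / d_ij^2, with W_ii = 0.

    centers: list of (x, y) cell-center coordinates; all must be distinct.
    """
    weights = []
    for i, (xi, yi) in enumerate(centers):
        row = []
        for j, (xj, yj) in enumerate(centers):
            if i == j:
                row.append(0.0)                       # no self-neighbor
            else:
                d2 = (xi - xj) ** 2 + (yi - yj) ** 2  # squared distance
                row.append(1.0 / d2)
        total = sum(row)
        weights.append([w / total for w in row])      # each row sums to 1
    return weights


# Hypothetical cell centers (in km) along one grid row.
w = inverse_distance_sq_weights([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)])
print(w[0])
```

For the first cell, the neighbor at 1 km receives nine times the weight of the neighbor at 3 km before standardization, so the standardized weights are 0.9 and 0.1.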
The global spatial patterns are measured by the spatio-temporal index from the time series of pick-ups and drop-offs. The global autocorrelation coefficient of pick-ups is 0.507 with a z-score of 141.22 and a p-value less than 0.01; the global coefficient of drop-offs is 0.369 with a z-score of 102.78 and a p-value less than 0.01. This reveals a global pattern of taxi travel. The spatio-temporal autocorrelation of taxi travel is significantly strong in Beijing, and pick-ups are more spatio-temporally autocorrelated than drop-offs. The arrivals of taxi passengers are more dispersed in space and time than departures. These results are consistent with common sense about human mobility in big cities.
The local patterns are measured by the spatio-temporal LISA index based on the pick-up and drop-off sequences in each grid cell. The distributions of LISA and spatial patterns are shown in Figure 4. There is strong positive spatio-temporal autocorrelation within the 5th Ring Road, weak negative spatio-temporal autocorrelation near the 6th Ring Road, and almost no spatio-temporal autocorrelation between the 5th and 6th Ring Roads. Almost all areas within the 5th Ring Road exhibit a significant High-High autocorrelation pattern. These areas, considered the urban areas of Beijing, are densely populated. More taxis and passengers result in much more traffic flow than normal. Nearby locations are more likely to have similar travel volumes and rhythms, so they tend to experience traffic peaks during similar time periods. The spatial autocorrelation is not significant outside the 5th Ring Road. Low population density and diverse land use in these urban fringes result in lower and less regular taxi travel.
We narrowed the study area to the region within the 5th Ring Road for investigating the finer spatial patterns in the city center. The global autocorrelation coefficients of pick-ups and drop-offs within the 5th Ring Road are 0.681 with z-score equal to 87.6 and 0.639 with z-score equal to 82.3, respectively, under the 0.01 significance level. The spatio-temporal autocorrelation of taxi travel within the 5th Ring Road is stronger than that outside, and the pick-ups and drop-offs exhibit similar spatiotemporal patterns.
The distributions of LISA and spatial patterns within the 5th Ring Road are shown in Figure 5. The closer to the city center, the stronger the positive correlation is. Almost all areas within the 5th Ring Road exhibit a significant High-High autocorrelation pattern.

Discussion
To evaluate the effectiveness of the spatio-temporal autocorrelation measurement, we compared the results with those of the classical Moran's I and LISA indexes. After averaging the daily number of pick-ups and drop-offs in each grid cell, we measured their classical spatial autocorrelation. The global autocorrelation coefficients of pick-ups and drop-offs were compared with the results of the spatio-temporal autocorrelation measurement, as shown in Table 1. The autocorrelation values measured by the proposed method were stronger and more significant than those measured by the classical method, especially within the 5th Ring Road in Beijing. Accounting for the temporal variance strengthened the measured spatial autocorrelation. The distributions of LISA and spatial patterns obtained by the classical measurement are shown in Figure 6. Compared to the patterns in Figures 4 and 5, the spatial patterns discovered by the classical method are less significant. Within both research areas, the results of the two methods show similar distributions. However, the spatio-temporal LISA can recognize finer patterns than the classical index. The spatio-temporal LISA values are more differentiated across locations, and more Low-Low patterns are discovered. In addition, the spatio-temporal measurement produces significant results at more locations than the classical index.

Conclusions
We proposed a spatio-temporal autocorrelation measurement by extending the classical Moran's I and LISA indexes for time series data of collective human mobility.
The new method integrates both the temporal variance and the value deviation at a location to reveal the spatial pattern. In the experiment on taxi ride data in Beijing, the method showed a good ability to discover global and local spatio-temporal patterns of taxi travel. It can reveal spatial autocorrelation and heterogeneity in collective human mobility. The method and the research results are useful for understanding human mobility and urban structures.