Quantitative spatiotemporal impact of dynamic population density changes on the COVID-19 pandemic in China’s mainland

ABSTRACT Coronavirus disease 2019 (COVID-19) and its variants have been wreaking global havoc for the last two years, but the impact of human activity on the transmission of the pandemic is difficult to ascertain. Estimating the dynamic spatiotemporal distribution of people can help us understand how to mitigate the spread of COVID-19 and thereby maintain urban health within a county and between counties within a country. This distribution can be computed using the Volunteered Geographic Information (VGI) of citizens in conjunction with other variables, such as climatic conditions, and used to analyze how the daily population density distribution quantitatively affects COVID-19 transmission. Based on the estimated population density, when the population density increases daily by 1 person/km² in a county or prefectural-level administrative unit with an average size of 26,000 km², the county would have an additional 3.6 confirmed cases and 0.054 death cases after 5 days, which is the illness onset time for a new COVID-19 case. After 14 days, which is the maximum incubation period of the COVID-19 virus, there would be 5 new confirmed cases and 0.092 death cases. However, in neighboring regions, there can be 0.96 fewer people infected with COVID-19 on average per day as a result of strong intervention by local and neighboring authorities. The primary innovation and contribution is that this is the first quantitative assessment of the impacts of dynamic population density on the COVID-19 pandemic. Additionally, the direct and indirect effects of the impact are estimated using spatial panel models. The models, which control for unobserved factors, improve the reliability of the estimation, as validated by random experiments and the use of the Baidu migration dataset.


Introduction
The World Health Organization (WHO) has declared coronavirus disease 2019 (COVID-19) to be a public health emergency of international concern (World Health Organization (WHO) 2020b). Between 31 December 2019 and the 7th week of 2021, 112,348,223 cases of COVID-19 were reported, including 2,484,324 deaths globally (European Centre for Disease Prevention and Control (ECDC) 2021). Additionally, since 9 March 2020, significant excess mortality has been observed in France, where the dramatic increase in mortality was concomitant with the COVID-19 pandemic (Fouillet, Pontais, and Caserio-Schönemann 2020). Governments worldwide have made huge efforts to control the pandemic, for example, through travel restrictions (Murano et al. 2021) and vaccine development. Although there is currently an unprecedented effort underway to generate more than 200 candidates for a safe and efficacious vaccine in various stages of development (with over 50 candidates in human clinical trials and 18 in efficacy testing (Parker, Shrotri, and Kampmann 2020)), the Coalition for Epidemic Preparedness Innovations (CEPI) estimates global vaccine manufacturing capacity at 2-4 billion doses annually (Kim, Marks, and Clemens 2021), and that its enactment will not occur until 2023-2024 (CEPI 2021). Additionally, because of the differences in the management of the pandemic and in the resilience and preparedness of the health and social care systems of different countries, which are also reflected in the heterogeneous mortality effects of the COVID-19 pandemic (Kontis et al. 2020), several new strains of SARS-CoV-2, the causative agent of COVID-19, such as the UK B.1.1.7 lineage (Variant of Concern 202012/01), have emerged (van Oosterhout et al. 2021). Thus, during the next several decades, detecting accurate quantitative factors of the natural transmission of COVID-19 (and of its new strains) will still be a top priority to reduce or eliminate this health disaster.
In the fight against COVID-19, Geographic Information Systems (GIS) and big data technologies have played a vital role in several aspects, such as rapid visualization of pandemic information (Jiang and Rijke 2021), spatial tracking of confirmed cases (Xu et al. 2021), prediction of regional transmission (Xu et al. 2021b), and spatial segmentation of the pandemic risk and prevention level (Griffith and Li 2021). This can provide solid spatial information support for the decision-making, measure formulation, and effective assessment of COVID-19 prevention and control (Zhou et al. 2020). Thus, these technologies can strongly support the detection of the spatiotemporal transmission factors of viruses (including COVID-19 and other such diseases). The correlation between the transmission of COVID-19 and the spatiotemporal factors can provide valuable tools for the application of concepts, methods, and quantitative techniques to address spatial issues in disease and medicine (known as medical geography) (Meade 2014), the community response (Andersen et al. 2021), and governmental policymaking.
An increasing number of studies are being performed to establish the critical factors affecting COVID-19 transmission. Currently, various categories of factors may considerably affect the transmission of the virus, including socioeconomic, environmental, and demographic variables (Mollalo, Vahedi, and Rivera 2020). For example, Wu et al. (2020) showed that long-term exposure to air pollution could exacerbate the health effects of COVID-19 cases, and weather conditions may contribute to the severity and rate of spread of the virus (Wang et al. 2020). However, some unobserved factors are often neglected, despite being critical in estimating the impact of the virus. These unobserved factors include those for which adequate data cannot be obtained or those that cannot be quantified in the real world, such as individuals' own pandemic-prevention measures.
Furthermore, because the spread of COVID-19 depends primarily on human-to-human transmission among close contacts (Li et al. 2020), the dissemination of the virus was mostly determined by human mobility (Tian et al. 2020; Kraemer et al. 2020; Rader et al. 2020). Additionally, human movement and contact rates play an important role in determining how infectious diseases are transmitted (Wesolowski et al. 2012; Lai et al. 2019). For example, counties in China that are intersected by railways, freeways, or national highways, or that have airports, have a higher risk of COVID-19 transmission (Wei et al. 2020). Thus, variables associated with human movement are among the most important factors affecting COVID-19 transmission. As one representation of human movement, dynamic changes in population density at high spatiotemporal resolutions (e.g. daily for a city-wide administrative unit) (Zhang et al. 2020b) should be considered when analyzing how population density changes affect COVID-19 transmission. However, many studies assume that the static census population density distribution is an adequate demographic factor. More dynamic population density distributions were not considered or incorporated into the analysis models, which may cause a large bias in the quantitative estimation of the effect of population density changes on the COVID-19 pandemic. Furthermore, although high local population density (static, not dynamic) has been confirmed as a factor that might catalyze the spread of new pathogens through higher contact rates with susceptible individuals (Rocklöv and Sjödin 2020; Kraemer et al. 2015), in-depth research on changes in population density or human movements, which play a key role in infectious disease transmission, is scant.
Because the long-term impacts of the COVID-19 crisis differ with population density, as has been shown for the case of five south Indian states (Arif and Sengupta 2021), it is also urgent to detect the different patterns of the impact of dynamic population density changes in different regions and the way they interact. For example, it is unclear how population density changes in a city affect local COVID-19 transmission and how they affect, or are affected by, neighboring cities. Several traditional statistical models, such as a generalized additive model (Ma et al. 2020) and a generalized linear model (Liu et al. 2020), can detect the ordinary relationship between several variables, but with COVID-19, they may perform inaccurately. This is because the transmission of viruses may have strong spatial spillover effects (Alexander, Lenhart, and Anaya-Izquierdo 2020) resulting from the spatial interactions occurring among individuals, which cause further virus transmission. Thus, geographical regression models such as the Bayesian spatial-temporal regression model (Jiang et al. 2021) and an exploratory spatial data analysis method (Xie et al. 2020) are applied to adjust for the estimation error caused by spatial spillover effects. However, most of the datasets used have low spatiotemporal resolution (e.g. monthly for a province-wide administrative unit), a single study area (e.g. only one city, such as Wuhan), and an aperiodic study period (e.g. during the first few months after the outbreak of COVID-19 in China).
Meanwhile, when the dynamic population density distribution is considered as the key factor, this dataset, with high resolution in the two dimensions of individuals (geographic units) and time, can fit a panel data model (panel data is a subset of longitudinal data in which the same subjects are observed at each time point), which can increase the accuracy of the impact estimation by controlling for unobserved factors (variables) (Wooldridge 2009). Then, because the transmission of COVID-19 has been shown to possess a strong spatial effect (Kang et al. 2020; Zhang et al. 2020c), spatial regression models should be applied to such a study.
Additionally, to detect the quantitative influence of the potential impact factors on the transmission of COVID-19, where these impacts are from both localities and any neighbors, a GIS-based spatial panel model can be effectively applied while considering spillover effects (Guliyev 2020).
Hence, this study regards the relative dynamic population density distribution as a key factor, which is computed from the voluntarily shared positioning data of the citizens of China (this study only focuses on China's mainland area), together with other supplementary variables, such as climatic conditions, to analyze if and how the dynamic movement of people can affect the transmission of COVID-19 quantitatively. The primary method used is a spatial panel data model based on GIS and data mining processes to build a relationship model between the population density and the daily new confirmed and death cases of COVID-19. This study covers the whole period of the first wave of COVID-19 (we define the first wave as being from the first reported case until the daily confirmed case number becomes stable, that is, from 1 December 2019 to 23 April 2020), using the more fine-grained spatial resolution of a county-level administrative unit (see Section 3.1 for a definition of this), for China's mainland.
Thus, the primary innovations and contributions of this study are as follows. First, in a wide study region, dynamic population density distributions at higher spatiotemporal resolutions (county-level, daily) are used to detect how human activity can affect the transmission of COVID-19. To the best of our knowledge, this is a new concept. It is fundamentally different from existing studies that use static census data to estimate population density. Unlike previous work, our dataset can provide a quantitative result rather than only a qualitative one. Second, using a spatial panel data model, the direct (for the local region) and indirect (for neighboring regions) spatial effects of the impact can be computed quantitatively and qualitatively. Third, the spatial panel data model can also contribute to the accuracy of the estimation results by estimating the error caused by unobserved factors that may have a significant impact on the transmission of COVID-19.
Such estimations can give governments (e.g. in China) more science-based suggestions for taking more targeted measures to control population density in a specific region, beyond simply advising every person to maintain social distance. The measures could guide authorities, to a certain extent and in a more informed way, in balancing the resumption of industry and commerce and the revitalization of the economy against controlling the spread of the pandemic. Meanwhile, citizens could return to their normal modes of work and life to a greater extent based on such government measures.
The remainder of this article is organized as follows: Section 2 presents the related work, while Section 3 introduces the dataset and method used. In Section 4, we evaluate the results of the study, which are discussed in Section 5. Finally, in Section 6, we present our conclusions, policy implications, limitations, and thoughts for future research.

Related work
In China, there has been considerable research focusing on COVID-19 spatial analysis using GIS methods. Guan et al. (2020)  Many related studies focused on the transmission factors of COVID-19 on a spatiotemporal scale in China. Gross et al. (2020) studied the spatiotemporal propagation of the first wave of the COVID-19 virus in China. They drew a comprehensive picture of the spatial transmission from Hubei to other provinces in China, considering the distance, population size, and human mobility and their scaling relations. Further, the measures of both the national and local governments of China, such as the strict quarantine, are emphasized in that study. Xie et al. (2020) found that the pandemic spread is mainly affected by population inflow and outflow from Wuhan and by the strength of economic connection, and that other factors, such as population distribution, transport accessibility, average temperature, and medical facilities, could impact the pandemic spread to varying degrees.
In terms of other countries, place-based factors (Mollalo, Vahedi, and Rivera 2020; Sugg et al. 2020), such as the median household income, income inequality, percentage of nurse practitioners, percentage of black female population, average household size, and county-level socioeconomic factors in neighboring counties (Baum and Henry 2020), explain significant variation in COVID-19 incidence in the U.S. In England and Wales, the importance of considering demographic and socioeconomic factors in anticipating local spikes in health-care demand related to the COVID-19 pandemic is recognized (Verhagen et al. 2020). In Europe, the spatial association between the socio-demographic variables and COVID-19 cases and deaths was evaluated, where many corresponding factors, including demographic, climatic, cultural, or socioeconomic differences among the countries, can contribute to the uneven distribution of the COVID-19 confirmed cases (Sannigrahi et al. 2020).
Some studies specifically focus on the climate factors (including air quality and weather conditions) that may influence the transmission of COVID-19. After controlling for population migration, the study of Ma et al. (2020) suggested that meteorological factors, particularly temperature variation and humidity, play an independent role in COVID-19 transmission. Similarly, local weather conditions with low temperatures, a mild diurnal temperature range, and low humidity likely favor the transmission (Liu et al. 2020). The study of Qi et al. (2020) suggested that both daily temperature and relative humidity influenced the occurrence of COVID-19 in Hubei province and some other provinces. Zhu et al. (2020) found that there is a statistically significant relationship between air pollution and COVID-19 infection. Short-term exposure to higher concentrations of PM2.5, PM10, CO, NO2, and O3 is associated with an increased risk of COVID-19 infection.
It is evident that researchers in different countries focus on nonuniform impact factors of transmission. Even when the same factors, such as temperature and humidity, are used, the qualitative and quantitative conclusions vary. This is because temporal and spatial heterogeneities could result in big differences in the transmission patterns of COVID-19. However, as many factors as possible should be considered if the regression and machine learning models cannot reduce the estimation error introduced by unobservable variables. Thus, a panel data structure with spatial models seems to be a good choice to address this problem because it can model the individual heterogeneity of spatial units (individual effects) and handle missing variables and estimation errors more efficiently (Elhorst 2014). Additionally, such models can manage the spatial autocorrelation effect of the dependent variable and correctly analyze the affecting factors and their spatial spillover effects. Such models have been applied to model the transmission of COVID-19. For example, Guliyev (2020) examined the factors affecting COVID-19 together with their spatial effects and used spatial panel data models to determine the relationship among the variables, including their spatial effects. Using spatial panel models, that study analyzed the relationship between confirmed cases of COVID-19, deaths, and recovery cases due to treatment. However, such studies only used the provinces of China with static population density (census population density for each province) as the individual spatial dimension, which offered only coarse resolutions in both time and space.
In terms of other factors that can also represent people's movement, transport usage seems to be important. The rapid development of transportation infrastructure and the liberalization of migration restrictions in the last decade can explain why 28% of the infections appeared outside Hubei province (Li and Ma 2020). Another study (Hadjidemetriou et al. 2020) used driving, walking, and transit real-time data to investigate the impact of government control measures on human-mobility reduction in the UK and found that human-mobility reduction had a significant impact on reducing COVID-19-related deaths, thus providing crucial evidence in support of such government measures. Huang et al. (2020b) leveraged the human-mobility data collected from a widely used Web mapping platform in China to examine the significant change in people's transportation-related behaviors during the COVID-19 pandemic and conduct data-driven analysis. However, such transport data cannot cover the dynamic distribution of the whole population at a high spatiotemporal resolution, and these studies do not examine how changes in the whole population distribution can impact the COVID-19 pandemic.

Methodology
This section first describes the data, including the dependent and independent variables of the spatial panel data model with preprocessing, and then presents the details of the model.

Data and preprocessing
The situation of the COVID-19 pandemic can be represented in two respects: the dynamic spatiotemporal changes of the incidence and of the death cases due to COVID-19 (Liu et al. 2020). These can be described by two features, the daily new confirmed and daily new death case numbers, which are both time series. When we consider the transmission of COVID-19 over China, the county level can be regarded as a fine-grained resolution. The dataset we use is therefore organized as a panel structure (Hsiao 2005), to which a panel data regression model can be applied. We first introduce the main independent variable and its preprocessing, followed by the dependent and other independent variables.

Tencent population density
Humans spread viruses to one another through contact and droplet, airborne, and fomite transmissions (World Health Organization (WHO) 2020a). Thus, population density is a key factor affecting the transmission of COVID-19.
Tencent positioning data, which is one of the most popular sources of volunteered positioning data from citizens in China, can be used to compute the Tencent Population Density (TPD) to represent the dynamic relative population density in China (Zheng and Zhang 2020; Zhang et al. 2020b). The spatial structure of the Tencent positioning data consists of vector points that are stored as SHP format files (ESRI 2021) and are evenly distributed on a map of China. For each SHP file, millions of points cover China at a specific time, for example, 8:00 p.m. on 1 March 2020. Each point has a feature named Recorded Number of People (RNP) that represents the number of people recorded at this point at this time. In terms of the temporal structure of the Tencent positioning data, the data is collected hourly, which means that an SHP file is generated from the website every hour. In this study, because the temporal resolution is daily, we compute the daily average values of the points in the SHP files over every 24 hours. Thus, the TPD at a specific time can be computed by dividing the total RNP within an administrative unit by the area of that unit. During the study period, 145 daily TPD distributions covering all of China are analyzed, one for each day. This means that, compared to traditional static population density data, for example, from a census every decade, the daily TPD is much more dynamic.
An example of the raw Tencent positioning data structure, which consists of points with spatial coordinates and a person number at each hour, is shown in Figure 1(a).Because this study aims to estimate the quantitative impacts of people's density on the pandemic of COVID-19, the real dynamic population density should be estimated based on the TPD. Figure 1(b) shows an Estimated Population Density (EPD) distribution in 2015 (Xu 2017), which is the latest available census population density, so it is regarded as the ground truth.Additionally, it is known that population density changes relatively over a short period of time.
As Figure 1(a) shows, the classified Tencent positioning values reflect well the characteristics of the population density of China (Ge and Feng 2010), where the population becomes denser from the Northwest to the Southeast and tends to cluster along rivers and coastlines. For example, the Yellow River, which is located in North China, makes this area dense, while the regions located at Hunag and Yangtze River Delta and Pearl River Delta also attract more settlers. The average Tencent Population Density derived from the Tencent positioning data has been shown to exhibit a simple linear correlation with static census population density (Zheng and Zhang 2020). Thus, we build a simple linear regression model for the two density distributions, which allows the TPD to be converted to a real population density.
To this end, we unify the two density distributions at the county level. For the TPD distribution at each hour, we count the number of people in each administrative unit by summing the values at all points in the unit and then divide by the area of the unit to get the TPD at the county level. During the study period, there are 145 days × 24 hours = 3480 hourly Tencent Population Density distributions. Because the EPD is static, we compute the average of the 3480 distributions to obtain the Average Tencent Population Density (ATPD), which is shown in Figure 1(c). Note that in Figure 1(c), tiny values (where the density is very small), e.g. for Hegang city in the north of China, are not displayed. In Figure 1(d), the null values, for example, Hong Kong, Macao, and Taiwan, are excluded from our study. For the EPD, we compute the average value for each administrative unit, and the resulting distribution, which has the same spatial resolution as the ATPD, is plotted in Figure 1(d).
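The aggregation described above can be sketched as follows. This is a minimal illustration, assuming each positioning point has already been assigned to a county; the record layout, the county names, and all numbers are hypothetical and do not come from the Tencent dataset.

```python
from collections import defaultdict

def county_tpd(points, area_km2):
    """Sum the RNP of all points in each county and divide by the county area.

    points: iterable of (county, rnp) pairs for ONE hourly snapshot.
    area_km2: dict mapping county -> area in km^2.
    """
    totals = defaultdict(float)
    for county, rnp in points:
        totals[county] += rnp
    return {c: totals[c] / area_km2[c] for c in area_km2}

def average_tpd(snapshots, area_km2):
    """Average the hourly county-level TPD over all snapshots (the ATPD)."""
    sums = defaultdict(float)
    for points in snapshots:
        for county, tpd in county_tpd(points, area_km2).items():
            sums[county] += tpd
    return {c: sums[c] / len(snapshots) for c in area_km2}

# Hypothetical toy data: two counties, two hourly snapshots.
area = {"A": 100.0, "B": 50.0}
hour_1 = [("A", 300), ("A", 200), ("B", 100)]
hour_2 = [("A", 700), ("B", 200)]
atpd = average_tpd([hour_1, hour_2], area)
```

In the study itself, the same averaging would run over the 3480 hourly snapshots rather than the two used here.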
A simple linear regression of the two density distributions is performed, where the results are reported in Section 4.1.
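As a rough sketch of this conversion step, the snippet below fits an ordinary least squares line from ATPD to EPD and uses it to map a TPD value to an estimated population density. The county values are made up for illustration, and the fitted coefficients are not those reported in Section 4.1.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical county-level values: ATPD (relative) and EPD (persons/km^2).
atpd = [1.0, 2.0, 3.0, 4.0]
epd = [52.0, 101.0, 149.0, 198.0]
a, b = fit_line(atpd, epd)

def tpd_to_density(tpd):
    """Convert a TPD value to an estimated real population density."""
    return a + b * tpd
```

Once the intercept and slope are fixed, each daily county-level TPD can be passed through `tpd_to_density` (a hypothetical helper name) to obtain the dynamic density used in the panel.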

Panel data setup
To detect how dynamic population density changes can affect COVID-19 transmission in time and space, we set the daily new confirmed case number and the daily new death case number as the dependent variables in separate regressions. Because the serial interval of COVID-19 (the time from illness onset in a primary case (infector) to illness onset in a secondary case (infectee)) has been estimated to be 4.6 days (Nishiura, Linton, and Akhmetzhanov 2020), while 14 days is the maximum incubation period of the virus (Backer, Klinkenberg, and Wallinga 2020), this study also considers a 5-day lag (≈ 4.6 days) and a 14-day lag of the new confirmed and death cases of COVID-19.
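One way to build the lagged dependent variables is sketched below. The alignment convention (pairing the density on day t with the cases observed `lag` days later) is an assumption made for illustration, and the case counts are hypothetical.

```python
def lag_cases(cases, lag):
    """Pair day t with the case count observed `lag` days later.

    Returns a list the same length as `cases`; the last `lag` entries are
    None because no later observation exists for them.
    """
    n = len(cases)
    return [cases[t + lag] if t + lag < n else None for t in range(n)]

# Hypothetical daily new confirmed cases for one county.
nc = [0, 1, 3, 2, 5, 8, 13, 9, 4, 2]
nc_lag5 = lag_cases(nc, 5)
```

Rows whose lagged value is `None` would be dropped before estimation, so a 14-day lag shortens each county's usable series by 14 days.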
Other supplementary factors that may impact the transmission of COVID-19 are set as the independent variables as well.The reasons are as follows.
Socioeconomic: Age is associated with community-level vulnerability (Andersen et al. 2021). Groups considered to be high-risk include older adults and those with underlying medical conditions, such as obesity, diabetes, and heart disease (Centers for Disease Control and Prevention (CDC) 2020). Thus, we take the number of people aged over 65 (named AO) and the Elderly dependency ratio (ER) of each province into consideration. Further, the hospital bed number (BN) is also a key factor, as more beds in a hospital means a better chance of survival (Mollalo, Vahedi, and Rivera 2020).
Mollalo, Vahedi, and Rivera (2020) found that place-based factors, such as median household income and percentage of nurse practitioners, can help explain the significant variations in COVID-19 incidence. Other adverse socioeconomic factors, such as poverty and unemployment, are likely to cause negative health consequences as well (Sun, Hu, and Xie 2021). However, the Chinese government put in place nationwide aid (medicine, nurses, and doctors) against COVID-19, e.g. Hubei's battle (XINHUANET 2021), while the Chinese authorities stipulated that all COVID-19 patients would have their medical expenses fully covered by social health insurance and public finance (World Economic Forum (WEF) 2021). Thus, medical insurance, poverty, unemployment, and the percentage of nurse practitioners are not regarded as key factors in our study.
Environmental factors: Average and minimum temperature were significantly associated with the COVID-19 pandemic (Bashir et al. 2020; Chen et al. 2020a). Further, humidity also plays a significant role in the seasonal spread of coronaviruses (Sajadi et al. 2020; Poole 2020). Thus, in terms of the weather conditions, we consider temperature (TP) and dew point (DP), which plays the same role as relative humidity, as the key environmental factors for COVID-19 transmission.
Air pollution plays a key role in human health (Zhang et al. 2020a; Zhang, Rui, and Fan 2018). Previous studies have suggested that ambient air pollutants are risk factors for respiratory infection because they carry microorganisms that make pathogens more invasive in humans and affect the body's immunity, making people more susceptible to pathogens (Becker and Soukup 1999; Cai et al. 2007; Horne et al. 2018; Xie et al. 2019; Zhu et al. 2020). COVID-19 is a respiratory disease, and SARS-CoV-2 can remain viable in aerosols for hours (Van Doremalen et al. 2020). A positive relationship exists between PM2.5, PM10, CO, NO2, and O3 and COVID-19 transmission (Zhu et al. 2020). Thus, in terms of the air quality factors, the AQI together with the six main air pollutants should also be considered.
However, some of the independent variables above might be highly correlated, which can cause multicollinearity in the regression (Alin 2010); for example, the AQI is computed from the other six pollutants. Thus, following Hu et al. (2021), the Variance Inflation Factor (VIF), which measures how much the variance of a regression coefficient is inflated due to multicollinearity, is used to exclude redundant independent variables (VIF > 10 is used as the cutoff (James et al. 2013)). The comparison of the selected variables before and after the VIF analysis is reported in Appendix 1. The definitions, sources, and summary descriptions of the selected variables are given in Table 1. In the remainder of the paper, we name the models after their dependent variables; for example, when the dependent variable is NC and a Spatial Durbin Model (SDM) is used, the model is called NC-SDM.
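The VIF screening can be sketched in plain Python as follows, using the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing variable j on all the other candidate variables. The variable values are hypothetical (the AQI column is deliberately built to be almost a multiple of the PM2.5 column), and the helper names are illustrative only.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, columns):
    """R^2 of an OLS regression of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(variables):
    """VIF_j = 1 / (1 - R_j^2), regressing variable j on all the others."""
    result = {}
    for name, values in variables.items():
        others = [v for other, v in variables.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(values, others))
    return result

# Hypothetical observations; AQI is nearly 2 * PM25, TP is unrelated.
data = {
    "PM25": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "AQI": [21.0, 39.0, 62.0, 80.0, 101.0, 119.0],
    "TP": [5.0, 3.0, 8.0, 1.0, 7.0, 4.0],
}
vifs = vif(data)
redundant = [name for name, value in vifs.items() if value > 10]
```

With these toy values, the two nearly collinear columns exceed the cutoff of 10 while TP does not, which mirrors the reason given above for screening the AQI against its component pollutants.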
The basic data structure is determined as:
• Temporal resolution: daily.
• Temporal period: 1 December 2019 to 23 April 2020 (145 days).  1 is not used to build a model.

Model
The transmission of COVID-19 over China is not spatially independent; for example, the confirmed or death case number of a city might be affected by the case numbers of contiguous cities. Thus, ignoring the spatial correlation of COVID-19 cases may lead to a misspecified model. For this reason, this paper selects a spatial panel data analysis that takes the spatial correlation of COVID-19 cases (including both confirmed cases and death cases) into account to investigate its relationship to dynamic population density changes. At the same time, the direct and indirect effects are computed using the spatial panel data model. A spatial panel data model requires a spatial weights matrix. Thus, in this section, we first introduce the spatial weights matrix, then propose the spatial panel data model, and finally define the special outputs of the model, that is, the direct effect and the spatial spillover effect.

Spatial weights matrix
The formula of a spatial weights matrix W is given in Equation (1). Suppose there are n spatial objects, and w is the weight value between every two spatial objects. For example, $w_{13}$ is the weight value between the first and the third objects.
There are two common weight values, which can construct two kinds of spatial weights matrix.
In a spatial contiguity matrix, if two spatial objects, for example, two counties, are adjacent, the weight w between them is set to 1; if they are not adjacent, the value is 0 (Equation (2)).
The other matrix is a spatial inverse distance matrix, where the weight w between objects i and j is $1/d_{ij}$, as shown in Equation (3). Specifically, w is calculated as the inverse of the distance between the geographical centers of counties i and j.
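Written out, the three definitions above take the following standard forms. This is a reconstruction from the surrounding text, so the notation may differ in detail from the original Equations (1)-(3); setting the diagonal of the inverse distance matrix to zero is the usual convention and is an assumption here.

```latex
% Equation (1): general n x n spatial weights matrix
\[
W =
\begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{n1} & w_{n2} & \cdots & w_{nn}
\end{pmatrix}
\]

% Equation (2): spatial contiguity weights
\[
w_{ij} =
\begin{cases}
1, & \text{if } i \text{ and } j \text{ are adjacent} \\
0, & \text{otherwise}
\end{cases}
\]

% Equation (3): spatial inverse distance weights
\[
w_{ij} =
\begin{cases}
1 / d_{ij}, & i \neq j \\
0, & i = j
\end{cases}
\]
```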
A spatial contiguity matrix highlights the impact of the closest neighbors, while a spatial inverse distance matrix emphasizes that the effect diminishes as distance increases. Using both matrices allows the robustness of the spatial regression model to be tested. Therefore, we use both matrices as inputs to our spatial models.
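A minimal sketch of how the two matrices could be built is given below. It assumes that adjacency is supplied as a neighbor list and that county centers are given in planar (projected) coordinates; the three counties and their coordinates are hypothetical.

```python
import math

def contiguity_matrix(units, neighbours):
    """w_ij = 1 if units i and j are adjacent, else 0 (diagonal stays 0)."""
    index = {u: k for k, u in enumerate(units)}
    w = [[0.0] * len(units) for _ in units]
    for u, adjacent in neighbours.items():
        for v in adjacent:
            w[index[u]][index[v]] = 1.0
            w[index[v]][index[u]] = 1.0  # adjacency is symmetric
    return w

def inverse_distance_matrix(centres):
    """w_ij = 1 / d_ij between unit centres (diagonal stays 0)."""
    n = len(centres)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(centres[i], centres[j])
    return w

# Hypothetical example: three counties lying on a line, A - B - C.
units = ["A", "B", "C"]
w_contig = contiguity_matrix(units, {"A": ["B"], "B": ["C"]})
w_dist = inverse_distance_matrix([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
```

In the contiguity matrix, A and C do not interact at all, whereas in the inverse distance matrix they still receive a small weight; this difference is what makes the pair useful as a robustness check.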

Spatial dependency examination
Before using a spatial model, Moran's I spatial statistic is needed to test whether spatial dependencies (also known as spatial autocorrelation) exist in the dependent variables (Kang et al. 2020; Li, Calder, and Cressie 2007). In this study, we examine the monthly spatial dependency of the COVID-19 related dependent variables by computing the global Moran's I (Goodchild 1986; Saffary et al. 2020).
A positive global Moran's I value with statistical significance indicates spatial clustering, while a significant and negative Moran's I value indicates spatial dispersion across the study region. However, because our dataset has a panel structure, in which both dependent and independent variables vary continuously in time and space, daily or monthly Moran's I values cannot easily capture the spatial dependency of the panel as a whole, which can alternatively be expressed in several other ways, for example, as a spatial lag effect (Elhorst 2001). Thus, another index, such as the coefficient of the spatial lag effect in a model, is needed as a further test. We use the coefficient δ in Equation (4) for this purpose (LeSage and Pace 2009) in the next section.
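For reference, the global Moran's I statistic discussed in this section can be computed as in the sketch below, using the standard formula I = (n / S0) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where S0 is the sum of all weights. The four-county chain and the values are hypothetical, and the significance test (for example, by permutation) is omitted.

```python
def morans_i(values, w):
    """Global Moran's I for one cross-section of values and weights w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Hypothetical chain of four counties: 0 - 1 - 2 - 3 (contiguity weights).
w_chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 2, 3, 4], w_chain)    # similar values are neighbours
alternating = morans_i([1, 4, 1, 4], w_chain)  # dissimilar values are neighbours
```

The smoothly increasing series gives a positive value (clustering), while the alternating series gives a negative value (dispersion), matching the interpretation given above.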

SDM
There are two common spatial panel data models: a Spatial Autoregressive Model (SAR) (Griffith 1988), which considers only the spatial dependence of the dependent variable, and a Spatial Error Model (SEM) (Anselin and Griffith 1988), which contains only the autocorrelation of the spatial error terms. However, spatial effects, including spatial heterogeneity and spatial dependence, may operate simultaneously and be represented as variation in the error term caused by the spatial lag of the dependent variable and by random shocks. Thus, a Spatial Durbin Model (SDM) is used (LeSage and Pace 2009) to account for the spatial interaction; for example, a dependent variable in one city is influenced not only by the independent variables in the same city but also by both the dependent and independent variables in one or more contiguous cities.
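To make this spatial interaction concrete, the sketch below computes the average direct and indirect (spillover) effects of a single regressor under the generic textbook SDM y = δWy + xβ + Wxθ + ε, following the summary measures of LeSage and Pace (2009): the effects matrix is S = (I − δW)⁻¹(βI + θW), the direct effect is the mean of its diagonal, and the indirect effect is the mean row sum minus the direct effect. This is not the paper's Equation (4), which contains additional terms; the weights matrix and the coefficient values are hypothetical.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]

def sdm_effects(w, delta, beta, theta):
    """Average direct and indirect effects of one regressor in a generic SDM."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - delta * w[i][j] for j in range(n)]
         for i in range(n)]
    b = [[(beta if i == j else 0.0) + theta * w[i][j] for j in range(n)]
         for i in range(n)]
    a_inv = invert(a)
    s = [[sum(a_inv[i][k] * b[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    direct = sum(s[i][i] for i in range(n)) / n
    total = sum(sum(row) for row in s) / n
    return direct, total - direct

# Hypothetical two-county system where each county is the other's neighbour.
w = [[0.0, 1.0], [1.0, 0.0]]
direct, indirect = sdm_effects(w, delta=0.5, beta=2.0, theta=1.0)
```

Because of the feedback through Wy, the direct effect (10/3 in this toy case) is larger than β itself, which is why the estimated coefficients of an SDM are not read directly as marginal effects.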
This study mainly uses SDMs to detect the quantitative relationship between the confirmed or death case number and dynamic population density changes.Nevertheless, because different spatial panel data models are proposed for different purposes, to obtain the best fitted spatial panel data model, we present a process to pretest the robustness (prerobustness test) of the SDMs: first, building SDMs for the datasets, which have been described in Section 3.1.2.Second, testing if the SDMs could be simplified to SARs.Third, testing if the SDMs could be transferred to SEMs.Finally, using the Hausman test to determine if we use a Fixed Effect (FE) or a Random Effect (RF) to the SDMs.The specific steps are shown below.
Second, when the spatial interaction examined by the SDM does not exist and there is only a one-way spatial correlation between regions, the spatial interaction coefficients are zero. Thus, we only test whether θ_1, θ_4, γ, and ϑ in Equation (4) equal zero, to determine whether the SDM can be simplified to Equation (5) (SAR); this is set as Hypothesis 0. If the hypothesis is rejected (p < 0.05), the SDM cannot be simplified to a SAR.
Third, when the coefficient of spatial interaction θ_i, the spatial lag coefficient δ, and the regression coefficient β_i of the independent variable meet the requirements θ_i = −δβ_i, γ = −δα, and ϑ = −δρ, the model changes to a Spatial Error Model (SEM). Thus, we set a Hypothesis 0 based on a nonlinear test; if the hypothesis is rejected (p < 0.05), the SDM cannot be transferred to an SEM (Equation (6)).
Finally, we consider using an FE or RE estimator to perform the SDMs. Because in Equation (4), ε_it = λ_i + υ_it, for random effects we assume the λ_i are part of the composite error term ε_it. To construct an efficient estimator, we need to evaluate the structure of the error and then apply an appropriate generalized least squares estimator.
The Random Effects estimator has the standard generalized least squares form summed over all individuals in the dataset (Equation (7)), where e_T = (1, 1, 1, …, 1) is a vector of ones of size T.
The fixed effects estimator can be written as: If there is no correlation between regressors and effects, then FE and RE are both consistent, but FE is inefficient.
If there is a correlation, FE is consistent and RE is inconsistent. Under the null hypothesis of no correlation, there should be no differences between the estimators. Thus, we follow the process of the Hausman test (Hausman 1978) to determine which effects estimator is better. The covariance of an efficient estimator with its difference from an inefficient estimator should be zero. Under the null hypothesis, we test: If H is significant, we should use the fixed effects estimator. Otherwise, a random effects estimator is better.
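The Hausman comparison described above can be sketched as follows. This uses the standard textbook form of the statistic, H = (b_FE − b_RE)' [Var(b_FE) − Var(b_RE)]^(−1) (b_FE − b_RE), which is asymptotically chi-square distributed with as many degrees of freedom as compared coefficients; it is not necessarily identical to the expression used in the study. The function name and all numbers are hypothetical.

```python
import numpy as np


def hausman_statistic(b_fe, b_re, cov_fe, cov_re):
    """Return (H, degrees of freedom) for the FE versus RE comparison."""
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    # Under the null, Var(b_FE - b_RE) = Var(b_FE) - Var(b_RE).
    # In finite samples this difference may fail to be positive definite.
    cov_diff = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    h = float(diff @ np.linalg.solve(cov_diff, diff))
    return h, diff.size


# Hypothetical estimates for two coefficients.
h, dof = hausman_statistic(
    b_fe=[0.12, -0.30],
    b_re=[0.10, -0.25],
    cov_fe=[[0.0004, 0.0], [0.0, 0.0009]],
    cov_re=[[0.0002, 0.0], [0.0, 0.0005]],
)
# The 5% critical value of the chi-square distribution with 2 d.o.f. is 5.991.
use_fixed_effects = h > 5.991
```

With these made-up inputs H exceeds the critical value, so the null of no correlation would be rejected and the fixed effects estimator preferred; a smaller H would favour random effects.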

Direct and indirect effects
In a spatial panel data model with a spatial lag term, the influence of an independent variable on the dependent variable cannot be represented simply by the regression coefficient. According to the scope and objects of the spatial effect, LeSage and Pace (2008) divided the influence of independent variables on dependent variables in a spatial panel data model into a direct effect and an indirect effect (spatial spillover effect). A direct effect reflects the average influence of an independent variable x on y in the same region. An indirect effect reflects the average influence of x on y in other regions. Later, they found that a partial differential method can make up for the shortcomings of the point estimation method in explaining the spatial effect. This more effectively explains the impact of any random impact on each variable, and determines the direct and indirect effects of an independent variable on the dependent variable in a spatial econometric model (LeSage and Pace 2009). This further processing is defined below: First, transferring the general form of an SDM to: Second, using the setting below: Then the formula becomes: which is equal to a matrix form: Within Equation (13), m = 1, 2, …, k indexes the k independent variables. The right-hand side of the equation is a partial differential matrix, whose diagonal elements reflect the average influence of a change in the variable X_ik in a specific spatial unit on the dependent variable of that unit, namely the direct effect. The off-diagonal elements represent the average influence of a change in X_ik in a specific spatial unit on the dependent variable of other spatial units, namely the indirect effect (also known as the spatial spillover effect). Equations (14) and (15) give the formulae for the direct and indirect effects:
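To make the partial differential approach concrete, the sketch below computes the scalar summary measures of LeSage and Pace (2009) for one independent variable in a generic SDM, using the matrix S = (I − δW)^(−1)(βI + θW). The direct effect is the mean of the diagonal of S, the total effect is the mean row sum, and the indirect effect is their difference. This is the standard textbook form and is assumed, not confirmed, to correspond to the equations of this study; the weights matrix and coefficient values are hypothetical.

```python
import numpy as np


def sdm_effects(W, delta, beta_k, theta_k):
    """Average direct, indirect and total effects of one regressor in an SDM.

    W       -- n x n spatial weights matrix
    delta   -- spatial lag coefficient of the dependent variable
    beta_k  -- coefficient of the regressor
    theta_k -- coefficient of the spatially lagged regressor
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    identity = np.eye(n)
    # Matrix of partial derivatives dy_i / dx_jk.
    S = np.linalg.solve(identity - delta * W, beta_k * identity + theta_k * W)
    direct = np.trace(S) / n      # mean own-unit response (diagonal)
    total = S.sum() / n           # mean row sum
    indirect = total - direct     # mean spillover (off-diagonal)
    return direct, indirect, total


# Hypothetical row-normalised weights for three mutually adjacent units.
W_demo = [[0.0, 0.5, 0.5],
          [0.5, 0.0, 0.5],
          [0.5, 0.5, 0.0]]
direct, indirect, total = sdm_effects(W_demo, delta=0.2, beta_k=0.1, theta_k=-0.05)
```

For a row-normalised W the total effect reduces to (β + θ) / (1 − δ), which provides a quick sanity check on any implementation.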

Validation
To test our results of the estimation in both qualitative and quantitative scales, two validation strategies are proposed.

Validation strategy 1 (VS1): random experiment
Random experiments can address the potential endogeneity issue caused by an omitted variable (Berk 1983), and they have been widely used in the cross-validation of panel data models (Zhang et al. 2021). A ten-fold cross-validation has been conducted in this study. The process is described below.
Over the study period, we randomly extract 60% of the data samples as temporally continuous days (87 days) in each of 10 rounds, where each round is regarded as one random experiment, and then build an SDE model for each round. To quantify the reliability of the results, Reliability Indices (RIs) are defined for both the coefficient (β_1 in the SDE model) and its significance. In Equation (16), the RI of the coefficient is denoted RI_coef, β_1m is the β_1 of the main result, and β_1cv−mean is the mean β_1 over the 10 rounds of random experiments. Similarly, the RI of the significance is denoted RI_sig, N_cons is the number of experiments whose significance is consistent with the main result, and N_total is the total number of random experiments, which is 10 in this study.
These parameters can provide key references to the reliability of the SDE models, which also improve the evaluability of the study.
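The random experiment procedure can be sketched as below. Drawing a contiguous 87-day window and computing RI_sig as N_cons / N_total follow the description above. The form of RI_coef used here (one minus the relative gap between the main coefficient and the mean cross-validated coefficient, floored at zero) is only an assumption for illustration, since the exact expression is not reproduced in this text. The total period length of 145 days is likewise a hypothetical value chosen so that 87 days is 60% of it.

```python
import random


def contiguous_window(n_days, window, rng):
    """Pick a random block of `window` consecutive days; returns [start, end)."""
    start = rng.randrange(n_days - window + 1)
    return start, start + window


def ri_sig(main_significant, cv_significant):
    """Share of experiments whose significance matches the main result."""
    n_cons = sum(1 for s in cv_significant if s == main_significant)
    return n_cons / len(cv_significant)


def ri_coef(beta_main, cv_betas):
    """ASSUMED form: 1 - |beta_main - mean(cv_betas)| / |beta_main|, min 0."""
    beta_cv_mean = sum(cv_betas) / len(cv_betas)
    return max(0.0, 1.0 - abs(beta_main - beta_cv_mean) / abs(beta_main))


rng = random.Random(0)                      # fixed seed for repeatability
windows = [contiguous_window(145, 87, rng) for _ in range(10)]

# Hypothetical outcomes of 10 (here shortened to 4) experiments.
sig_index = ri_sig(True, [True] * 8 + [False] * 2)
coef_index = ri_coef(0.10, [0.09, 0.11, 0.08, 0.10])
```

Each window would be used to refit the model once, and the two indices summarise how far the refitted coefficients and their significance agree with the main estimate.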

Validation strategy 2 (VS2): independent dataset test
To further test the robustness of the results of this study, VS2 uses an independent dataset to validate the results.
Baidu migration data (Baidu 2020), another popular source of volunteered GIS data that is widely used in GIS-oriented work (Wei and Wang 2020; Zhan et al. 2020; Chen et al. 2017), can be applied as an independent test. The migration of people from city to city throughout the country was recorded as location-based data using the search engine Baidu. Every city is represented using two vectors. The first describes the inflow to a city, called the Migrate In Index. The other reflects the population movement from this city to other cities, called the Migrate Out Index. The two indices are available as a daily dataset. These indices are only indicative of the relative volume of movement of people from one city to another. Thus, the migration strengths of cities serve as indicative measures of the volume of people moving in and out of administrative units (Zhan et al. 2020).
However, the difference between the Migrate In Index and the Migrate Out Index can reflect the population Net Increase (NI) of an administrative unit in a day, which qualitatively measures the change in people's density at a spatiotemporal scale in the same way. Thus, the Baidu migration NI (BMNI) can validate the main results of this study in a qualitative way.
The available historical Baidu migration data cover the days from 10 January 2020 to 15 March 2020 (65 days). We therefore only build models between the independent variables and the COVID-19 cases (confirmed and death) on the same day and after 5 days; if a 14-day lag were considered, the period would be reduced to 51 days, which equals losing more than 20% of the samples. Accordingly, we extract the panel dataset corresponding to this period for the same-day and 5-day-lag COVID-19 situations. NPD is then replaced by BMNI to construct the SDE models following the same steps.
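The two data preparation steps described above, forming the net increase and pairing it with lagged case counts, can be sketched for a single administrative unit as follows. The function names and the placeholder series are hypothetical; only the arithmetic (inflow minus outflow, and the loss of `lag` days when shifting) reflects the text.

```python
def baidu_net_increase(migrate_in, migrate_out):
    """Daily net increase: Migrate In Index minus Migrate Out Index."""
    return [i - o for i, o in zip(migrate_in, migrate_out)]


def align_with_lag(regressor, outcome, lag):
    """Pair the regressor on day t with the outcome on day t + lag."""
    usable = len(regressor) - lag          # days that still have a lagged outcome
    return list(zip(regressor[:usable], outcome[lag:]))


# Placeholder 65-day series for one unit (values are not real data).
days = 65
bmni = baidu_net_increase([1.0] * days, [0.5] * days)
cases = list(range(days))

pairs_lag5 = align_with_lag(bmni, cases, lag=5)
pairs_lag14 = align_with_lag(bmni, cases, lag=14)
```

With 65 days of data, a 5-day lag leaves 60 usable day pairs while a 14-day lag leaves 51, which is the sample loss referred to above.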

Relationship between Tencent and real population densities
We find a simple linear relationship between the logarithms of the ATPD and the EPD in 2015, and the fitted function is statistically significant (p < 0.001, R² = 0.93). Figure 2 shows the details of the regression.
The function can be represented by Equation (18), where the coefficient is 0.0817. According to this equation, the EPD can be computed by Equation (19). Based on this, the relationship between the Tencent population density at higher temporal resolutions, such as hourly and daily (defined as New PD, NPD; Table 1), and the estimated dynamic population density (EDPD) can be represented by Equation (20). The EDPD can thus be computed from NPD, which means the quantitative impact of NPD on the transmission of COVID-19 can be converted to the quantitative impact of the estimated dynamic population density on transmission.
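A log-log linear relationship of this kind can be fitted and inverted as in the sketch below. It assumes the generic form ln(EPD) = a + b · ln(ATPD) with an intercept, which may differ from the exact specification of the study's equations, and it uses synthetic data rather than the Tencent or census-based densities. All function names are hypothetical.

```python
import math


def fit_loglog(tencent_density, estimated_density):
    """Ordinary least squares fit of ln(y) = intercept + slope * ln(x)."""
    x = [math.log(v) for v in tencent_density]
    y = [math.log(v) for v in estimated_density]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_density(tencent_density, intercept, slope):
    """Back-transform the fitted line to the original density scale."""
    return math.exp(intercept + slope * math.log(tencent_density))


# Synthetic data generated from y = 2 * x ** 0.5 (not real densities).
intercept, slope = fit_loglog([1.0, 4.0, 9.0, 16.0], [2.0, 4.0, 6.0, 8.0])
edpd_demo = predict_density(25.0, intercept, slope)
```

The same back-transformation would be applied to a daily or hourly density series once the intercept and slope have been estimated on the annual averages.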

Spatial dependency examination
The monthly global Moran's I values are shown in Table 2, where values that are not significant have been excluded. The remaining values are all significant and positive, which means the COVID-19 related variables have a high spatial dependency during those months.
Note that at the start of the COVID-19 pandemic, e.g. December 2019, the cases were sparsely distributed among the cities in China, which causes the Moran's I values to be insignificant. Overall, however, there is a high spatial dependency of the COVID-19 related variables (at least 60% are significantly positive), which is consistent with other related research that focuses on China (Kang et al. 2020; Zhang et al. 2020c) and has also been found in other countries (Bag et al. 2020; Shariati et al. 2020).

Model prerobustness test
Table 3 shows the results of the tests of whether an SDM should be simplified to a SAR or an SEM, together with the Hausman test. According to Section 3.2.3, because all the p-values in the SAR and SEM columns are less than 0.01, the original hypotheses are rejected. Thus, the SDMs should not be simplified to SARs or SEMs. Based on the Hausman test, a random effects estimator should be selected to perform the SDMs.

Main results of SDM
After the prerobustness test and the Hausman test, an SDM with a random effects estimator is selected. The main coefficients of NPD in the regression results are shown in Figure 3. The left 6 bars in the figure show the results using a spatial contiguity matrix, while the right 6 bars show the results using a spatial inverse distance matrix. The full results are reported in Tables A.3 and A.4 in Appendix 2, where the spatial lag coefficient δ of all 12 SDMs is significantly positive, which means the (lagged) confirmed or death cases of COVID-19 in one administrative unit are affected by the situation in its neighboring units. Thus, a spatial panel data model is necessary rather than a normal panel data model.

Direct and indirect effects
However, the regression coefficients of the SDM cannot directly reflect the degree of influence of the independent variables on the dependent variables, so the direct and indirect effects need to be calculated. The results are shown in Table 4.
In Table 4, the direct effects of NPD in all 12 SDMs are positive, although the coefficients for ND with the contiguity matrix and for NC with the inverse distance matrix are not significant. The indirect effects are all negative, and the p-values for all six SDMs using the contiguity matrix are less than 0.05, which means they are significant. However, for the inverse distance matrix, the coefficients of NPD in the SDMs of NC, ND, and ND_Lag5 are significant, while the other three are insignificant.
The effects of the supplementary independent variables vary. For example, the indirect effects of AO in the contiguity matrix SDMs are all significantly positive, while its direct effects are not significant for either spatial matrix.
According to Equation (18), the effects of the estimated dynamic real population density on the transmission of COVID-19 are computed (Figure 4).
For SDMs with the same dependent variable, we regard the impact of NPD on the dependent variable as significant only if the results using both the contiguity and the inverse distance matrices are significant (p < 0.05). Otherwise, we state that there is no obvious effect of the NPD. The mean effect value of the two spatial matrices is then computed. Thus, in Figure 4, the mean and significant results derived from (a) and (b) are plotted in (c), while (d) and (e) determine (f).
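The combination rule described above can be written as a short function. The function name and the numeric inputs are hypothetical; only the rule itself (both matrices significant at p < 0.05, then average the two effects) comes from the text.

```python
def combine_effects(effect_contiguity, p_contiguity,
                    effect_inverse_distance, p_inverse_distance,
                    alpha=0.05):
    """Mean effect if both spatial matrices give p < alpha, otherwise None."""
    if p_contiguity < alpha and p_inverse_distance < alpha:
        return (effect_contiguity + effect_inverse_distance) / 2
    return None   # treated as "no obvious effect"


# Hypothetical effect estimates and p-values.
both_significant = combine_effects(0.12, 0.01, 0.10, 0.03)
one_insignificant = combine_effects(0.12, 0.01, 0.10, 0.20)
```

Returning `None` for the mixed case keeps insignificant results out of the averaged panels instead of plotting them as zero.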
It is evident that for the NC-SDM the direct effect is not significant but a significant indirect effect is plotted, while the other two NPD (i.e. EDPD) results in Figure 4(c) have only significant direct effects. For the new death case dependent variables, the coefficients in the ND_Lag5-SDM have both significant direct and indirect effects, where the direct effects are positive and the indirect effects are negative. However, the coefficient in the ND_Lag14-SDM is significantly positive only for the direct effect, and that in the ND-SDM is significantly negative only for the indirect effect.

Validation
This section illustrates the validation results using the two validation strategies.

VS1
The RI_coef and RI_sig of the random experiment are shown in Table 5. For the main results of the SDE models, all RIs of the NPD coefficient are more than 60%, while the significance RIs reach 100% in some cases and are much lower in others. The RI_coef values of the direct effects of NPD are also all more than 60% over the 12 SDE models, while the corresponding RI_sig values follow the same pattern as the main results. For the indirect effects, except when the dependent variable is NC_Lag14, the RI_coef values are all above 50%, while RI_sig is less than 50% for only three of them.
The validation by using the random experiment strategy can compute the reliability of both coefficients and significance, which overall supports the main results based on all six dependent variables and the two spatial weights matrices.
Full regression results of the SDE models for the 10 random experiments are reported in Table A.5 to A.24 in Appendix 3.

VS2
Table 6 illustrates the results of the SDE model for VS2. The consistency comparison with the original results is shown as well, where the corresponding coefficients with their significance are tabulated. The full regression results are shown in Tables A.25 and A.26 in Appendix 4.
The consistency tuples of signs for each dependent variable, based on the two kinds of spatial weights matrices, show that only two tuples are inconsistent. The others, including both the sign of the coefficients and their significance, are all consistent, which means the results of VS2 are more than 95% consistent on a qualitative scale. These results validate our original qualitative results at a high level of reliability and robustness.

Discussion
Overall, dynamic population density changes have a significant impact on the COVID-19 pandemic, based on the significant coefficients of NPD in all SDMs, which supports similar findings in other studies (Rader et al. 2020; Li et al. 2020; Kraemer et al. 2020). The mostly significant and positive global Moran's I values for the monthly average dependent variables show that the transmission of COVID-19 features spatial dependencies. Further, according to the results in Table 2, the spatial lag coefficients δ in all 12 SDMs are significantly positive. Thus, spatial features play a key role in the analysis of how dynamic population density can impact the transmission of COVID-19. This also shows that the transmission of COVID-19 is affected by local dynamic population density changes as well as by density changes in neighboring regions, and that these relationships are positively correlated.
In Tables A.3 and A.4, the overall spatial lag coefficients δ using an inverse distance spatial matrix are higher than those using a contiguity matrix (the average coefficients are 0.292 and 0.122, respectively). This means the transmission varies as the spatial distance from the local region increases, and is not limited to the adjacent regions. This result highlights that when detecting factors that impact the transmission of COVID-19, we need to consider spatial regression models. However, because these two spatial matrices emphasize different aspects, we explain their results separately.
When using a contiguity weights matrix, all the coefficients of NPD are significant except for the ND-SDM (Table 4). This result illustrates that between two adjacent administrative units, there is an obvious effect of population density changes on the transmission of COVID-19, which we define as the spatial contiguity effect. Dynamic population density changes directly and significantly influence the transmission of the virus. To be more specific, when NPD increases by 1 unit, the number of new confirmed cases on the same day in a county grows by 0.101, while the number after 5 days grows by 0.131, and the number after 14 days grows by 0.154. This phenomenon relies on the mechanism of human-to-human contact. Meanwhile, for the same unit increase in NPD, the number of new confirmed cases increases with the time lag, which means the impact of dynamic population density changes on the transmission of COVID-19 is likely to last more than 14 days. However, when NPD increases by 1 unit in a county, the new confirmed cases in neighboring counties decrease by 0.207 on the same day, by 0.160 after 5 days, and by 0.205 after 14 days. There may be two reasons for these results. First, when the local new confirmed cases increase, governments publish and strengthen strict intervention measures to control the spread of cases and the virus, for example, restricting road and foot travel using road blocks. These policies can reduce the extent to which a neighbor (e.g. county B) is affected by the confirmed cases that have been found, while potential cases that have not yet been confirmed may still spread to neighboring regions. This means local potential cases (not the locally confirmed cases) might increase the neighbor's confirmed cases. Second, for a neighboring county B, when more confirmed cases are reported in county A, the measures taken in county B also help prevent the spread of the virus, which includes both the government's control and the self-protection of citizens in county B (Zhu et al. 2021).
Similarly, the spatial contiguity effect of dynamic population density changes on the new death cases is intuitive as well. The insignificant coefficient of NPD in the ND-SDM (0.00366, p ≥ 0.05) shows that dynamic population density changes during a day do not raise the number of COVID-19 deaths on that day, because of the resistance of the human immune system and timely medical intervention. An infected person would not die within one day (the time between symptom onset and death ranges from about 2 weeks to 8 weeks (World Health Organization (WHO) 2020c)). However, a 1 unit increase in NPD can lead locally to 0.00423 deaths after 5 days and 0.00749 deaths after 14 days. Indirectly, however, the increase in people's density can raise the probability of survival in neighboring regions: in neighboring county B, 0.00787 more people survive COVID-19 on the day when NPD increases by 1 unit in local county A, while after 5 days 0.00879 more people survive in county B, and after 14 days there are 0.00739 fewer deaths. The reasons are similar to those given for the new confirmed case SDMs above.
When using an inverse distance weights matrix (Table 4), the spatial effect of dynamic population density changes on the transmission of COVID-19 varies with geographical distance, and this effect is defined as the spatial attenuation effect. With this matrix there are both significant and insignificant coefficients of NPD (with more insignificant than significant results), which means the spatial attenuation effect is weaker than the spatial contiguity effect in terms of the impact of people's density on the transmission of COVID-19. To be more specific, when NPD increases by 1 unit, the local number of new confirmed cases increases by 0.0944 on the same day, by 0.118 after 5 days, and by 0.138 after 14 days. Meanwhile, in other regions, which include not only adjacent counties but also more distant non-adjacent counties, confirmed cases decrease by 0.712 on the same day and by 0.566 after 14 days, whereas after 5 days the influence of dynamic population density changes is not clear. The reason is that 5 days is too short a time slot to accurately register new COVID-19 infections. As the spatial area is extended, the indirect spatial attenuation effect disappears as time goes on.
However, in terms of the spatial attenuation effect for death cases, when local NPD increases by 1 unit, the number of deaths increases by 0.00413 on the same day, by 0.00444 after 5 days, and by 0.00693 after 14 days. This means that when a direct spatial attenuation effect occurs, the change in people's density affects local deaths. For other regions, the indirect effects are only significant for the ND-SDM (−0.0212, p < 0.01) and the ND_Lag5-SDM (−0.02, p < 0.01). These results are further evidence that, with the extension of the spatial impact and the prolongation of the temporal impact, the influence of local dynamic population density changes on the transmission of COVID-19 in other regions fades.
To compute the quantitative impact of people's density on the new confirmed and death cases of COVID-19 in China over the study period, we use Equation (18) to convert the NPD to a real people's density (EDPD). At the same time, we calculate the mean values of the coefficients obtained with the two different spatial matrices to determine the overall quantitative impacts. Furthermore, the average RI_coef and RI_sig are also given, see below.
In terms of the influence of the other supplementary independent variables shown in Table 4, socioeconomic factors have not been shown to have a significant impact on the transmission of COVID-19 in this study (for the reasons already given). Among the environmental factors, NO2, as a key air pollutant, also plays a key role in the transmission of COVID-19. Because NO2 is a diffusive gas whose concentration attenuates with distance (Massman 1998), a model using an inverse distance spatial matrix is more interpretable than one using a contiguity matrix. According to the results of the SDMs using an inverse distance matrix, the direct effects of NO2 concentration are significantly positive, but the indirect effects are significantly negative. This result is similar to the study of Ogen (2020), who explains that a high NO2 concentration accompanied by a downwards airflow causes a NO2 buildup close to the surface, which prevents the dispersion of air pollutants. This phenomenon can cause a high incidence of respiratory problems and inflammation in the local population, which contributes to the high COVID-19 fatality rates in these regions. Our study also provides evidence that NO2 is likely to be one of the causes of higher local infections and even deaths due to COVID-19 (direct effect). Meanwhile, this kind of effect prevents the large-scale spread of NO2, causing an insignificant increase of NO2 concentration around a county (indirect effect). This effect does not speed up the spread of the pandemic in the surrounding area (the coefficients are negative).
The other environmental factors, including both air pollution and weather conditions, all attenuate with increased distance, just like NO2. Thus, the results computed using an inverse distance matrix are more interpretable. TP, DP, SO2, PM25, and PM10 all have a significant impact on the spread of the pandemic in certain SDMs. For example, five SDMs show that SO2 has a significant positive indirect effect on the newly confirmed cases and on death cases. In addition, the significant results of TP, DP, SO2, PM25, and PM10 in certain models can also support other relevant research results. However, this paper focuses on the impact of dynamic population density changes on the pandemic situation, so the other supplementary variables are not discussed in depth here.

Conclusion
Dynamic population density changes, which lead to changes in people's density, have a significant impact on the transmission of COVID-19, and this impact is spatiotemporally related and has been estimated quantitatively. To be more specific, dynamic population density changes in one county not only affect the daily new confirmed and death cases of that county but also influence the transmission of COVID-19 in the neighboring counties. Further, dynamic population density changes on a given day not only affect the transmission of the virus on that day but also influence the situation in the future (e.g. at 5- and 14-day lags). This study not only provides estimated quantitative values but also quantifies the reliability of these numbers.
While a pandemic such as COVID-19 is still ongoing, even as more and more people are vaccinated, a new mutant strain may require the same measures to be restarted in order to contain it, by restricting human mobility en masse in a measured way. Therefore, for such a pandemic-laden world, this paper proposes some additional clear policy implications. From a public health perspective, policymakers and governments all over the world should not only ask citizens to maintain a social distance of a specific length between individuals, e.g. 2 m, but also consider place-based restrictions on crowd activities with quantitative limits. These restrictions can cover a large region like a county or a smaller area such as an indoor space, for example, limiting the number or density of people in shopping malls, which can be computed from historical dynamic population density data, to reduce the diagnosis rate and mortality rate on that day and in the future. Moreover, when any county-level government authority considers instigating COVID-19 related mobility restrictions, good cooperation with adjacent counties is needed to maximize the best interests of the population as a whole, because both local and neighboring counties' dynamic population density changes can impact the transmission of the virus, and these direct and indirect effects may be quite different. Furthermore, from the perspective of resource exploitation, access to open (spatial) data, such as volunteered GIS data, plays a more vital role than before, especially for the COVID-19 pandemic. Traditional governments need to set up a special policy research department to make such datasets available and to exploit them using state-of-the-art technology, so that they can make more efficient policies that relate to such spatial and temporal quantitative measurements.
This study has some limitations that could be addressed in future work. First, the spatial and temporal resolutions of the dataset only reach the county level (in space) and the hour level (in time). For more accurate pandemic prevention measures, higher spatial and temporal resolutions are still needed, such as population density changes at kilometer and minute levels. Second, once a certain level of vaccination has been reached, the transmission mechanism of the pandemic may be significantly different from that in a non-vaccinated region. Therefore, projects that focus on vaccination should not only consider the temporal aspect; the spatial model needs to be improved too. Finally, different countries or regions have different pandemic environments, demographics, geographical topologies, transport links, and government measures. The quantitative results in this paper are not equally applicable to all regions or countries. Therefore, future research is needed to better understand the effect of such different national conditions and spatial-temporal population density changes in relation to pandemic situations.

Notes on contributors
observed the rapid spread of COVID-19 throughout China during the first 2 months of the current outbreak and highlighted the geographical characteristics of COVID-19 cases. Chen et al. (2020b) computed the distribution of the people who were infected by the virus and analyzed its correlation with the migration of the Wuhan population in the initial stage of the pandemic. In studying the relationship between the spatiotemporal and epidemiological characteristics of COVID-19, Huang et al. (2020a) also considered the control measures set by governments. Xiong et al. (2020) used spatial statistics and Pearson correlation methods to analyze the spatial autocorrelation and influencing factors of the COVID-19 pandemic from 30 January 2020 to 18 February 2020. Tang et al. (2020) analyzed the changing patterns and the spatiotemporal features of the COVID-19 pandemic in China to provide further evidence of the effectiveness of any real-time responses.

Figure 1. Tencent Population Density and estimated population density in 2015. (a) shows the Tencent positioning points at 12 A.M. on 1 December 2019. (b) shows the estimated population density distribution in 2015. (c) shows the 32-level classified results of the Average Tencent Population Density (ATPD) during the study period. (d) shows the 32-level classified density distribution corresponding to (b).

• Spatial resolutions: county-level administrative unit, with an average size of 26,000 km² in China. N.B. In other countries, this size can vary and counties can be referred to by other names, such as local government areas (Australia), shires (UK), or municipal governments (Canada).

Table 1. Variable definition, source, and statistic description.
Note: Std. refers to standard deviation. Obs. refers to the number of observations.

Table 2. Monthly global Moran's I for dependent variables.
Note: The insignificant values are shown as ".".

Table 3. SDM pre-robustness test and effect selection.

Table 4. Direct and indirect effects of variables on COVID-19 transmission.

Table 6. SDE model results of VS2. Dir. and Ind. refer to the direct and indirect effects. Cons. indicates the consistency. The left sign of each consistency entry represents the consistency of the sign (positive/negative), while the right one represents the consistency of significance. "+" means they are consistent, while "-" means they are contrary. For example, "Cons." under "Ind." in the second column is "+/-", which means the signs of the indirect effect coefficients of BMNI and NPD for the NC_Lag5-SDE models are consistent ("+"), as both are positive, while their significance is inconsistent ("-"), as the BMNI coefficient is insignificant while the NPD coefficient is significant.

This research received funding from the National Science and Technology Major Project of the Ministry of Science and Technology of China [grant number 2017YFB0503605], the National Natural Science Foundation of China [grant number 41771478], the Fundamental Research Funds for the Central Universities [grant number 2019B02514], Natural Science Foundation of Beijing, China [grant number 8172046], the China Scholarship Council (CSC), and Queen Mary University of London.

Guangyuan Zhang received BSc and MSc degrees in Geographical Information Science from Hefei University of Technology and the University of Chinese Academy of Sciences, China. He received the PhD degree in Computer Science from Queen Mary University of London, UK. His research interests include Internet of Behaviors and Urban Computing.
Stefan Poslad received the PhD from Newcastle University. He is currently an Associate Professor at Queen Mary University of London, UK, where he heads the IoT Lab. His research interests are Internet of Things, ubiquitous computing, semantic Web, and distributed system management.
Yonglei Fan received BSc and MSc degrees from South China Normal University and the University of Chinese Academy of Sciences, China. He is currently pursuing a PhD degree in Computer Science at Queen Mary University of London, UK. His research interests are Geographical Information Science, indoor positioning techniques, human activity recognition, and Internet of Things.
Xiaoping Rui received the PhD degree in Cartography and Geographic Information System from the Graduate University of Chinese Academy of Sciences, Beijing, China, in 2004. He is currently a full professor with the School of Earth Sciences and Engineering, Hohai University. His research interests include geographical big data mining, 3D visualization of spatial data, and remote sensing image understanding.