Revealing population flow patterns in the Sichuan-Chongqing region, China, during the COVID-19 epidemic in 2020

ABSTRACT
COVID-19 has had a serious impact on the lives and health of people and severely affected the population flow in 2020. Baidu migration data offer great opportunities to study spatiotemporal interactions among cities. Revealing population flow patterns has important scientific significance for the precise prevention and control of the COVID-19 epidemic. The aim of this article is to reveal the spatiotemporal patterns of population flow and associated influential factors in 22 cities in the Sichuan-Chongqing region (SCR), which is regarded as the fourth pole of China’s economy. Four typical time periods are selected to study the spatiotemporal patterns of population flow. The regional population flow intensities in all cities and between different cities in the SCR are illustrated. Stepwise regression is used to analyse the factors affecting regional population flow intensity in four selected periods. The results show that (1) the COVID-19 epidemic greatly affected population flow in the SCR, (2) more travel occurred between cities on holidays than on weekdays in the SCR when the epidemic was not serious, and (3) the regional population flow intensity was strongly correlated with the population education level and transportation facilities when the epidemic was not serious.


Introduction
Population flow is an important feature that reflects the level of urban development and interconnectedness between cities, and it provides important information on economic links and transportation needs between cities (Shumway and Otterstrom 2001; González, Hidalgo, and Barabási 2008). Since January 2020, COVID-19 has been spreading worldwide (Chung, Xu, and Zhang 2020), and the WHO declared it an international public health emergency by the end of January 2020 (Lai et al. 2020). COVID-19 has not only had a serious impact on the lives and health of people around the world but has also significantly damaged the economic development of various cities (Connolly, Ali, and Keil 2020). Studies have shown that limiting population flow can substantially reduce the spread of COVID-19, thus resulting in a significant reduction in prevalence (Wells et al. 2020; Wang et al. 2021; Wu et al. 2021). During the severe epidemic periods in China, major changes in population flow patterns occurred due to policy restrictions (Chen et al. 2020).
Scholars have explored population flow in China during the COVID-19 epidemic using statistical data (Zhang et al. 2020a). However, these studies focused on a short period of population flow because collecting statistical data to determine population flow is not only very costly but also very time sensitive. In China, large-scale population surveys are conducted every ten years, and it is impossible to obtain real-time information on population flow from statistical data (Xu et al. 2017). With the development of communications and networks in the last decade, big data, especially location-based big data, offer great opportunities to study the sense of place and spatial interactions between cities (Meijers and Peris 2019; Yao et al. 2019; Ye, Li, and Peng 2021).
Rapidly available, yet accurate, big-data analytics on population flow and its influencing factors are invaluable to decision-makers. The emergence of location-based big data provides a new perspective for studying population flow, and revealing population flow patterns and their influencing factors is scientifically important for the precise prevention and control of the COVID-19 epidemic. The main contribution of this article is to reveal the spatiotemporal patterns of population flow and associated influential factors in 22 cities in the Sichuan-Chongqing region (SCR). This paper proceeds as follows. Section 2 introduces the related work. Next, the basic situation of the SCR and the Baidu migration data are explained. We then briefly describe the impact of the epidemic on population flow. Stepwise regression is then used to analyse the factors influencing regional population flow intensity in four selected periods. Finally, we analyse and discuss the results and conclude the paper.

Related work
The emergence of big data has facilitated our research on population flow. Population flow is affected by many factors. Here, the related works that explore population flow and influencing factors are reviewed.
The use of location-based big data to detect population flow is becoming mainstream (Zhang et al. 2020b). Cell phone signalling data are generated when cell phone users make calls, send text messages or use mobile location services, and these data are acquired by the operator's communication base station, which records the user's signalling trajectory. Many scholars use cell phone signalling data to analyse the user's trajectory and derive population mobility patterns (Ratti et al. 2006; Fan et al. 2018; Gao et al. 2020; Hu et al. 2021). The use of cell phone signalling data to study population flow has the advantages of wide coverage and high accuracy; however, cell phone signalling data are generally expensive and often contain a large amount of redundant data (Jarv et al. 2012). Floating car data refer to the trajectories of vehicles equipped with Global Navigation Satellite System (GNSS) devices that automatically record real-time traffic data (Erdelić et al. 2021). Floating car data have the advantages of stability and high accuracy. Many scholars are now using floating car data to study the population flow pattern of cities (Veloso, Phithakkitnukoon, and Bento 2011; Tang et al. 2015; Shen, Liu, and Chen 2017). Floating car data emphasize the flow of vehicles; however, the flow of vehicles and the population flow are not completely equivalent. Social media data are also a very popular source for studying population movement (Wu et al. 2014; Huang et al. 2020; Ye, Li, and Peng 2021). However, social media data come disproportionately from young people, whereas social media platforms are less frequently used by children and older people. Therefore, using social media data to study population mobility and flow may introduce bias. Location-based services (LBS) denote applications integrating geographic location with the general notion of services, and examples of such applications include emergency services, car navigation systems, and tourist tour planning (Schiller and Voisard 2004).
Baidu migration data are based on Baidu's LBS data. Baidu is an open platform that collects the results of various internet releases and the location information of various user mobile applications, and it provides a unified data format and a unified open interface (Deville et al. 2014). These data therefore offer great opportunities to study spatiotemporal interactions between cities.
The large-scale population flow in 2020 was influenced not only by the COVID-19 epidemic but also by other factors. The main factor underlying population flow was the economic situation of each city or region (Xu et al. 2017; Cao et al. 2018). Cities or regions with a high level of urbanization tend to have a more developed economy, more jobs and higher income, which can attract people from other regions who are seeking employment and development, and the flow rate tends to be higher. Economically weaker regions tend to have fewer jobs and a large outflow of talent (Shen 2006; Gao et al. 2014; He et al. 2016; Jin et al. 2019; Li et al. 2019; Zhao, Liu, and Wang 2019). In addition to economic conditions, population flow is also related to quality of life (Mueser and Graves 1995), which is largely influenced by the public facilities in the city. Cities and regions with better public facilities, such as cinemas, shopping malls and parks, can better attract immigrants. Research shows that the general budget expenditures of local governments are an important factor that influences the distribution of migrating populations (Xu and Ouyang 2018). Traffic conditions can also greatly affect the flow of people. Convenient and well-located areas are more attractive to immigrants (Kotavaara, Antikainen, and Rusanen 2011). However, these studies did not consider the impact of epidemics. Moreover, some of the influencing factors may be highly correlated with one another, and an excessive number of factors may introduce redundant data and increase computation time. In addition, in a large-scale municipal or provincial administrative area, the relationship or structure of the variables may change as the geographical location changes, which is known as spatial nonstationarity. Using a traditional regression model may therefore cause a certain deviation in the result (Matthews and Parker 2013).
Most of the cities in the SCR are located in the Sichuan Basin, which is surrounded by the highlands of the Wu Mountains in the east, the Yunnan-Guizhou Plateau in the south, the Tibet Plateau in the west, and the Daba Mountains in the north. In recent years, some studies have focused on the SCR, for example, on temperature variability (Shao, Li, and Ni 2012), a comprehensive drought index (Ji et al. 2018), precipitation (Lai and Gong 2017), and the atmosphere (Wang et al. 2018; Huang et al. 2021), whereas few studies have explored population flow patterns in the SCR. Since the SCR is a relatively closed geographical unit, the population flow in the SCR has strong regional characteristics, and the connections between cities in the SCR are closer. During the COVID-19 pandemic, revealing population flow patterns in the SCR is therefore scientifically important for the precise prevention and control of the epidemic.

Study area
The SCR is located in southwestern China in the upper reaches of the Yangtze River. It is bordered by Hubei and Hunan Provinces in the east, the Tibet Autonomous Region and Qinghai Province in the west, Yunnan and Guizhou Provinces in the south, and Shaanxi and Gansu Provinces in the north. The area of the SCR, which is 568,422 square kilometres, accounts for 5.9% of China's total area. It includes 22 cities in Sichuan Province and Chongqing municipality. It is the urbanization region with the highest development level and significant development potential in western China and is regarded as the fourth pole of China's economy. By the end of 2020, the permanent population of the SCR reached 115.8 million, which accounted for 8.2% of China's total population, and the gross domestic product (GDP) of the SCR reached 73,601.55 hundred million yuan (approximately 7.36 trillion yuan), which accounted for 7.24% of China's total GDP (National Bureau of Statistics of China 2021). The study area and the population of each city in 2020 are shown in Figure 1.

Data sources
The population flow data were obtained from the Baidu platform (http://qianxi.baidu.com/), which has been online since 2014. The platform uses Baidu's own LBS open platform to calculate and analyse LBS big data and provides a full, dynamic, and instantaneous visualization of the trajectory and characteristics of population flow before and after the Chinese New Year. Baidu provides daily population flow data for two periods in 2020 (1 January to 15 March and 22 September to 31 December). It records the population inflow trend, population outflow trend, urban population inflow source, and urban population outflow destination. Baidu migration data can help us reveal population flow patterns. Tables 1 and 2 show examples of data on population inflow trends and on the source of urban population inflow, respectively. In Table 1, population flow intensity is a relative value that reflects the size of population flow, and the value of population flow intensity can be compared across cities. In Table 2, the percentages of population flow sum to 100.
The population flow was greatly affected by the COVID-19 epidemic during 2020. Considering the COVID-19 epidemic and holiday factors, four typical time periods were selected for the analysis of population flow patterns (Table 3). The first time period (Period I) is the first month of the Chinese lunar calendar, from 25 January to 22 February 2020. There were 516 and 498 new local cases of COVID-19 in Chongqing and Sichuan, respectively, during Period I. In Period I, shortly after COVID-19 appeared in China, people self-isolated at home. On 8 April 2020, China lifted outbound travel restrictions on Wuhan, the city hardest hit by the COVID-19 outbreak in 2020. At that time, COVID-19 was basically controlled in China, and people's lives generally returned to normal in the following days. The second time period (Period II) is the National Day holiday from 1 to 8 October 2020. Since Period II was a holiday and the COVID-19 epidemic was basically under control, many families chose to travel. The third period (Period III), from 2 to 29 November 2020, is a period of normal population flow in the SCR that was not greatly impacted by the COVID-19 epidemic. There were no new local cases during Periods II and III. Several COVID-19 cases occurred in Chengdu from 7 December 2020, and population flow in the SCR was therefore affected to some extent. Accordingly, the fourth period (Period IV), from 8 to 27 December 2020, was selected. There were 36 new local cases during Period IV.
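The four periods can be encoded as inclusive date ranges, for example to select the matching days from a daily series. The following is a minimal sketch; the names `PERIODS`, `period_length`, and `period_of` are hypothetical helpers introduced here for illustration and are not part of any Baidu interface.

```python
from datetime import date

# Four study periods as stated in the text (inclusive start and end dates).
PERIODS = {
    "I": (date(2020, 1, 25), date(2020, 2, 22)),
    "II": (date(2020, 10, 1), date(2020, 10, 8)),
    "III": (date(2020, 11, 2), date(2020, 11, 29)),
    "IV": (date(2020, 12, 8), date(2020, 12, 27)),
}


def period_length(name):
    """Number of days in a period, counting both end points."""
    start, end = PERIODS[name]
    return (end - start).days + 1


def period_of(day):
    """Return the period label containing `day`, or None if it is in no period."""
    for name, (start, end) in PERIODS.items():
        if start <= day <= end:
            return name
    return None
```

Under these definitions, `period_length` gives the value of n used when averaging daily intensities over a period.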
Considering the availability of data, indicators of population, society, economy, education, health, infrastructure, and public security were selected. Finally, the resident population, number of employed persons, gross domestic product (GDP), number of college students, number of health institutions, number of personnel in health institutions, total mileage of highways, and rate of criminal cases were obtained from the Sichuan Statistical Yearbook (2020) and Chongqing Statistical Yearbook (2020). The temporal resolution of these data is the annual scale, and the spatial resolution is the city scale; these data were used to reveal the causes of population flow.

Calculation of regional population flow intensity
Baidu migration data provide the population flow intensity for each city to characterize the overall situation of a city's population flow as well as the percentage of the population flow of other cities to the population flow intensity of that city. To calculate the regional population flow intensity, the daily regional population flow intensity of each city was first calculated, and then the average daily regional population flow intensity in a time period was obtained. The regional population flow intensity can be calculated by Formula (1).
where M_k represents the regional population flow intensity of city k, I_i represents the population flow intensity of city k on day i, p_j represents the percentage of the population flow of city j in the population flow intensity of city k on day i, n is the total number of days in the time period, and m represents the number of cities in the study area other than city k. If I_i is the in-flow intensity, then M_k is the regional population in-flow intensity; if I_i is the out-flow intensity, then M_k is the regional population out-flow intensity.
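As a hedged illustration of this calculation, the definitions above suggest a formula of the form $M_k = \frac{1}{n}\sum_{i=1}^{n} I_i \sum_{j=1}^{m} \frac{p_j}{100}$, where p_j is taken on day i. The division by 100 (treating p_j as a percentage on a 0 to 100 scale, as in Table 2) is an assumption, and the function and argument names below are hypothetical. This is a sketch of one reading of the definitions, not the authors' implementation.

```python
def regional_flow_intensity(daily_intensity, daily_shares, region_cities):
    """Average over days of I_i times the in-region share of the flow on day i.

    daily_intensity: list of I_i values for city k, one per day.
    daily_shares: list of dicts (one per day) mapping a counterpart city
        to its percentage of city k's flow on that day (0-100 scale, assumed).
    region_cities: set of the m other cities inside the study area.
    """
    if len(daily_intensity) != len(daily_shares):
        raise ValueError("need exactly one share table per day")
    if not daily_intensity:
        raise ValueError("need at least one day of data")
    total = 0.0
    for intensity, shares in zip(daily_intensity, daily_shares):
        # Keep only counterpart cities that belong to the study area.
        in_region_pct = sum(p for city, p in shares.items() if city in region_cities)
        total += intensity * in_region_pct / 100.0
    return total / len(daily_intensity)
```

For example, with hypothetical intensities of 2.0 and 4.0 on two days, and half of the flow on each day coming from cities inside the region, the function returns 1.5.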

Stepwise regression analysis
Regression analysis is widely used to infer the relationship between a dependent variable and a series of independent variables. Stepwise regression analysis is used here to exclude independent variables that have no significant influence on the dependent variable. Unlike conventional regression analysis, which considers all independent variables, stepwise regression constructs a regression model iteratively, selecting step by step the independent variables to be used in the final model. Stepwise regression can therefore provide a good explanation of the outcome, and we chose it for the analysis of factors affecting regional population flow intensity in this study. R² and RMSE are widely used indicators for evaluating regression models. Therefore, the performance of the stepwise regression is evaluated by using R² and RMSE in this article.
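The idea of stepwise selection and its evaluation by R² and RMSE can be sketched in plain Python. Note the assumptions: this sketch performs forward selection only and uses the gain in adjusted R² as its entry criterion, which is a swapped-in simplification; the study itself works at a significance level of 0.05, which would require a p-value test for entering and removing variables. All function names and the `min_gain` threshold are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns (normal equations)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, d)) for d in design]
    return beta, pred


def r2_rmse(y, pred):
    """Coefficient of determination and root-mean-square error."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_stepwise(rows, y, min_gain=1e-3):
    """Add, one at a time, the column that most improves adjusted R^2."""
    chosen, best = [], 0.0  # the intercept-only model has adjusted R^2 = 0
    remaining = list(range(len(rows[0])))
    while remaining and len(chosen) < len(y) - 2:
        scores = []
        for c in remaining:
            _, pred = fit_ols(rows, y, chosen + [c])
            r2, _ = r2_rmse(y, pred)
            scores.append((adjusted_r2(r2, len(y), len(chosen) + 1), c))
        score, c = max(scores)
        if score - best < min_gain:
            break  # no candidate improves the model enough
        chosen.append(c)
        remaining.remove(c)
        best = score
    return chosen
```

With 22 cities and eight candidate variables, `rows` would hold one list of eight values per city and `y` the regional population flow intensity for one period; the returned indices identify the retained variables.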

Regional population flow intensity in the SCR
Regional population flow intensity varies greatly across time periods and cities in the SCR. The regional population flow intensity in different time periods is illustrated in Figure 2. Figure 2(a, b) shows the regional population out-flow intensity and regional population in-flow intensity, respectively. From a temporal perspective, the following conclusions can be drawn from Figure 2. Ordered from the smallest to the largest population flow, the time periods were Period I, Period IV, Period III, and Period II. In Period I, since the Chinese government took very strong measures to curb the spread of COVID-19 in the early days of the outbreak, the lowest population flow was observed in the SCR. In Period IV, population flow was affected to some extent by the COVID-19 epidemic and decreased considerably in the SCR compared to that in Period III. However, because the epidemic was small in scale, the population flow in Period IV was significantly greater than that in Period I. In Period III, because the population flow was only slightly affected by the COVID-19 epidemic and most people worked on weekdays, the population flow was relatively large in the SCR. During Period II, since the COVID-19 epidemic was basically under control and people did not work during the holiday, population flow occurred on a massive scale, and the highest population flow was observed in the SCR.
From the perspective of the spatial distribution of population flow, cities can be roughly divided into five categories according to the regional population flow intensity. The greatest population flow is observed in the first group of cities, including Chongqing and Chengdu, which are the dual core of the SCR. The second group of cities includes Nanchong, Leshan, Neijiang, Mianyang, Deyang, and Meishan, which are the subcentres of the SCR. The third group of cities includes Ziyang, Dazhou, Guangan, Yibin, Suining, Luzhou, and Zigong. The fourth group of cities includes Liangshan, Aba, Yaan, and Guangyuan. The fifth group of cities includes Ganzi, Bazhong, and Panzhihua.

Regional population flow intensity between different cities in the SCR
To reveal the spatiotemporal patterns of population flow between different cities, the regional population flow intensity between different cities in the SCR for the four time periods was illustrated. Figures 3 and 4 show the regional population out-flow intensity and regional population in-flow intensity, respectively, between different cities in the SCR.
From Figure 3, the following conclusions can be drawn. There are 11, 19, 18, and 14 cities in the SCR with the largest population flowing out to Chengdu in Period I, Period II, Period III, and Period IV, respectively. Therefore, Chengdu is the centre of the population out-flow. There are 14 cities in the SCR with the smallest population flowing out to Ganzi in Period I. There are 17, 15, and 8 cities in the SCR with the smallest population flowing out to Panzhihua in Period II, Period III, and Period IV. The population from other cities to Ganzi and Panzhihua is relatively small.
From Figure 4, the following conclusions can be drawn. Chengdu was the largest source of population inflow for 17, 19, 17, and 14 cities in the SCR in Period I, Period II, Period III, and Period IV, respectively. Therefore, Chengdu is the centre of population inflow. Aba was the smallest source of population inflow for 10 cities in the SCR in Period I. Panzhihua was the smallest source for 19 and 13 cities in Period II and Period III, respectively. Ganzi was the smallest source for 8 cities in Period IV. The population moving from Aba, Panzhihua, and Ganzi to other cities is relatively small.
Figure 2. Regional population flow intensity during the four periods. (a) Regional population out-flow intensity; and (b) regional population in-flow intensity.
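Counts such as "the number of cities whose largest out-flow goes to Chengdu" amount to taking, for each origin city, the destination with the maximum intensity and tallying the results. A minimal sketch follows; the function name and the nested-dictionary layout are assumptions made for illustration, and the values used in any example are hypothetical rather than taken from Figures 3 and 4.

```python
from collections import Counter


def top_destination_counts(outflow):
    """Count, per destination, how many origin cities send it their largest flow.

    outflow: dict mapping origin city -> dict of destination city -> intensity.
    """
    counts = Counter()
    for origin, dests in outflow.items():
        # Ignore any self-flow entry so that only inter-city flows are compared.
        candidates = {d: v for d, v in dests.items() if d != origin}
        if candidates:
            counts[max(candidates, key=candidates.get)] += 1
    return counts
```

Replacing `max` with `min` gives the corresponding count for the smallest flows, and applying the same function to an in-flow table gives the counts by source city.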

Analysis of population in-flow outside the study area
Because there were no new local cases of COVID-19 during Periods II and III and only a small number of new local cases during Period IV, only the population in-flow from outside the study area in Period I is analysed further. In Period I, the number of new local cases was mainly related to the population in-flow from outside the study area. The top ten provinces with population inflows to Chongqing and Sichuan are shown in Figure 5.
Chongqing and Sichuan are each other's largest source of population inflow. The top ten source provinces accounted for 94.19% and 87.32% of the total population in-flow to Chongqing and Sichuan, respectively. Since Hubei, which had the most local cases in China in Period I, was Chongqing's third largest source of out-of-province population inflow, Chongqing had more new local cases than Sichuan in Period I. Although the population flowing into Sichuan from Hubei was relatively small, there were also some new local cases in Sichuan due to the widespread interprovincial movement of people in Period I.

Population flow pattern affected by the lockdown policy
Regional population flow intensity in 2020 was affected by the lockdown policy. For example, COVID-19 cases were reported in Chengdu on 7 December 2020, and some areas in Chengdu imposed lockdown policies. Since Chengdu and Chongqing are the two main cities in the SCR, the regional population flow intensities on two weekends before and after the outbreak of COVID-19 in Chengdu, Chongqing, and all cities in the SCR are listed in Table 4.
The changes in population flow intensity show the following. (1) The population out-flow intensity of Chengdu, Chongqing, and all cities in the SCR in the first weekend after the outbreak of COVID-19 was 59.21%, 72.99%, and 67.06% of that in the previous weekend, respectively.
(2) The population in-flow intensity of Chengdu, Chongqing, and all cities in the SCR in the first weekend after the outbreak of COVID-19 was 50.29%, 77.38%, and 67.06% of that in the previous weekend, respectively.
(3) Due to the close connection of population flows between Chengdu and Chongqing, the population flow in Chongqing was affected to a certain extent, but the impact was less serious than that in Chengdu. (4) Under the COVID-19 lockdown policy, there was a major drop in population flow intensity in cities where COVID-19 cases were reported, and population flow intensity was also affected to a certain extent in cities closely linked to those cities.
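The percentages in items (1) and (2) express the intensity in the weekend after the outbreak relative to the weekend before it. As a small worked sketch (the function name is hypothetical, and any input values are illustrative rather than taken from Table 4):

```python
def percent_of_previous(before, after):
    """Intensity after the outbreak as a percentage of the intensity before it."""
    if before <= 0:
        raise ValueError("the reference intensity must be positive")
    return 100.0 * after / before
```

A value of 59.21, for instance, corresponds to a drop of 100 - 59.21 = 40.79 percentage points relative to the previous weekend.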

The influencing factors of the regional population flow intensity
Stepwise regression analysis is used to analyse the factors affecting regional population flow intensity. Resident population (X1), number of employed persons (X2), GDP (X3), number of college students (X4), number of health institutions (X5), number of personnel in health institutions (X6), total mileage of highways (X7), and rate of criminal cases (X8) are selected as independent variables, and the regional population flow intensities (Y) during the four time periods are selected as dependent variables for analysis at the significance level of 0.05.
Four stepwise regression models are built to model the relationships between the regional population out-flow intensity during the four time periods and the eight independent variables. The stepwise regression models are given in Formulas (2-5). Y1, Y2, Y3 and Y4 are the regional population out-flow intensities of Period I, Period II, Period III, and Period IV, respectively. The R² and RMSE in Formula (2) [...]
$$Y_2 = 2.30534 + 1.54152 \times 10^{-5} X_4 - 6.04767 \times 10^{-5} X_7 - 4.2067 X_8 \qquad (3)$$
Likewise, four stepwise regression models are built to model the relationships between the regional population in-flow intensity during the four time periods and the eight independent variables. The stepwise regression models are given in Formulas (6-9). Y5, Y6, Y7 and Y8 are the regional population in-flow intensities of Period I, Period II, Period III, and Period IV, respectively. The R² and RMSE in Formula (6) [...]
The following conclusions can be drawn from Formulas (2-9). Seven of the stepwise regression models (Formulas (3-9)) reveal the relationship between the regional population flow intensity and the eight independent variables very well. In contrast, because strict home quarantine measures were taken in Period I, the population out-flow was very small, and Formula (2) did not fit the results well. Therefore, although X8 appears more explanatory of population flow in Formula (2), this occurs in a model that does not perform well. Six stepwise regression models include variable X4, and seven stepwise regression models include variable X7, indicating that these two variables are more explanatory of population flow. Educational resources and transportation facilities are mainly concentrated in core cities, which helps to increase population flow.
Since the regional population flow intensity in these three time periods (Period I, Period III, and Period IV) is less than 30% of the regional population flow intensity in Period II, the RMSE of the regional population flow intensity in Period II is larger than that of the other three periods.
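To illustrate how a fitted model of this kind is applied, Formula (3) for the Period II out-flow intensity can be written as a small function. The coefficients are those given in Formula (3); the function and argument names are hypothetical, and because the units of X4, X7, and X8 are those of the statistical yearbooks and are not restated here, any input values used with this sketch are illustrative only.

```python
def predict_period2_outflow(college_students, highway_mileage, criminal_case_rate):
    """Evaluate Formula (3): Y2 as a linear function of X4, X7, and X8."""
    return (
        2.30534
        + 1.54152e-05 * college_students    # X4: number of college students
        - 6.04767e-05 * highway_mileage     # X7: total mileage of highways
        - 4.2067 * criminal_case_rate       # X8: rate of criminal cases
    )
```

With all three inputs set to zero the function returns the intercept, 2.30534, which is a quick check that the coefficients were entered correctly.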

Conclusions
In this article, the spatiotemporal patterns of population flow in the SCR during the COVID-19 epidemic in 2020 are explored. The difference in regional population flow intensity across time periods showed that population flow in the SCR was seriously affected by the COVID-19 epidemic and that more travel occurred between cities on holidays than on weekdays in the SCR. Under the COVID-19 lockdown policy, there was a major drop in population flow intensity in cities where COVID-19 cases were reported, and population flow intensity was also affected to a certain extent in cities closely linked to those cities. Stepwise regression showed that the regional population flow intensity is strongly correlated with the population education level and transportation facilities. This study also has some limitations. Since Baidu did not release population flow data for all days in 2020, it was impossible for us to study the daily population flow for the whole year. Limited by the Baidu migration data, we have not studied the strength of connections between different districts in the same city.