Dynamic characteristics of the COVID-19 epidemic in China’s major cities

ABSTRACT The novel coronavirus disease of 2019 (COVID-19) first appeared in Wuhan and subsequently spread rapidly in cities and provinces across the country and all over the world. In order to effectively control the spread of the epidemic in different areas, zonal management and endemic prevention and control policies should be implemented according to local epidemic situations. This study proposes a time-series clustering method to discover dynamic characteristics of the COVID-19 epidemic by categorizing the epidemic situations in China’s major cities into groups based on daily reported confirmed cases and analysing the driving factors of the city background conditions for each category. Our results show that, according to the dynamic patterns of the COVID-19 epidemic, there are eight types of epidemic situations: extreme outbreak areas, large spread areas, potential resurgence areas, middle spread areas, controlled outbreak areas, limited growth areas, delayed outbreak areas, and lag report areas. These dynamic patterns are mainly related to the city background conditions, such as population flow, local resident number, government emergency response capability, and medical resource conditions. Based on our results, different endemic prevention and control measures are recommended for containing the COVID-19 epidemic in cities with different types of epidemic situations.


Introduction
Since December 2019, when the coronavirus disease of 2019 (COVID-19) first appeared in Wuhan, COVID-19 has spread rapidly in cities and provinces across the country and reached Asia, Europe, America, and the rest of the world, posing a serious threat to global public health (Desjardins, Hohl, and Delmelle 2020). As of 14 July 2021, 187,717,255 people had been infected and 4,047,597 had died worldwide (Dong, Du, and Gardner 2020). Facing a pandemic of such a massive scale, governments around the world have implemented a substantial number and variety of policies (Koo et al. 2020; Kissler et al. 2020; Ding et al. 2020), including individual isolation, family quarantine (Chinazzi et al. 2020), travel restrictions (Lai et al. 2020), contact reduction (Ferretti et al. 2020), etc. (Zhang, Litvinova, and Liang et al. 2020; Liu et al., 2020b). Although most policies slowed down the spread of COVID-19 to a certain extent, their effects differ between regions. Because local epidemic situations differ, zonal management and regional prevention and control measures are key to controlling the epidemic (Q. Li et al. 2020c).
In order to effectively achieve zonal management and endemic prevention and control policy implementation for mitigating the spread of COVID-19, it is crucial to understand the various patterns of the COVID-19 epidemic and the mechanisms influencing it in different places. Studies on the dynamic characteristics of the COVID-19 epidemic can be grouped into three categories: outbreak trend surveillance, spatial hotspot identification, and spread process analysis. In the early period of the epidemic, several outbreak indices, including the basic reproduction number (R0), growth rate, and endemic case numbers, were estimated from epidemic dynamic models to evaluate the epidemic severity in China (Wu, Leung, and Leung 2020; R. Li et al. 2020a), the United States (Yadav et al. 2020), Europe (Flaxman et al. 2020), and other countries (Adekunle et al. 2020). Other studies focused on the increasing (or decreasing) trends in different epidemic stages and the identification of turning points through time-series analysis methods (Tang et al. 2020). In addition, epidemic risk analysis through prediction models has also been addressed (Chimmula and Zhang 2020). Studies on spatial hotspot identification of COVID-19 mainly focused on discovering severely hit areas at different scales. Common spatial analysis methods, including spatial scan statistics (Desjardins, Hohl, and Delmelle 2020), spatial clustering and local spatial autocorrelation (Xiong et al. 2020), were widely applied to different cities (Koo et al. 2020), provinces, and countries (Desjardins, Hohl, and Delmelle 2020). Studies on the spread process aim at discovering the source and sink areas and analysing potential transmission paths of COVID-19 through population flow between regions using human movement data (Hu et al. 2020).
Although there are few studies on this topic, previous works have proposed many effective methods for spread process analysis of other infectious diseases, such as malaria (Ruktanonchai et al. 2016), dengue fever (Wesolowski et al. 2015), cholera (Finger et al. 2016), etc.
Studies on the mechanisms influencing regional COVID-19 outbreaks mainly focused on social factors, natural factors, and biological factors. Among social factors, the population outflow from the source is believed to be the most important factor affecting the epidemic spread in the early period (Jia et al. 2020), and the local population density, along with local activity, has a large impact on local epidemic outbreaks. Natural factors of the COVID-19 epidemic mainly include temperature (Sajadi et al. 2020), humidity and sunshine (Paez et al. 2020). These factors may affect the transmission ability of COVID-19 by reducing the biological activity and survival time of the virus. The influence of biological factors is mainly due to the antiviral ability of individuals of different ages, genders, blood types and health levels (Dowd et al. 2020). For instance, many studies have found that older male patients suffer higher incidence and mortality of COVID-19 (Lighter et al. 2020).
Although the above studies have analysed dynamic characteristics of the COVID-19 epidemic, along with factors influencing it, in many regions, no research has yet been carried out to categorize regional epidemic dynamic situations and summarize the evolution patterns. Therefore, this study proposes a time-series clustering method to discover dynamic patterns of the COVID-19 epidemic by categorizing regional epidemic situations into groups and analysing the driving factors among the city background conditions for each category. Through this work, endemic prevention and control measures for containing COVID-19 can be effectively recommended in areas with ongoing (or upcoming) outbreaks according to the city background conditions. The epidemic dynamic processes in Chinese cities serve as an example because the outbreaks in those cities have been under control since March 2020, with a total of over 80,000 confirmed cases (84,047 cases, including 4,638 deaths, reported on 17 May 2020).

Data
COVID-19 case data of 362 cities (or autonomous prefectures or special administrative regions) in China were collected from the DXY.cn website. This data set includes daily reported numbers of confirmed cases at the prefecture level. In order to analyse the dynamic process of the epidemic, we chose the data from 11 January to 27 March 2020, when the epidemic had been brought under control for the most part in most cities, and adjusted the records in some cities by a simple reassignment process. Based on this adjustment, sudden increases in the number of reported cases due to a change in testing are proportionally reassigned to the days of the previous week (Figure 1); the data could thus be better used for further time-series analysis. We also extracted human mobility data from the Baidu Smart Eye Map (Baidu, Inc., Beijing, China). These data contain two indices: the Baidu migration index (BMI) and the intra-city travel intensity (ITI). The BMI represents the relative population flow from one city to another in per cent, whereas the ITI is the ratio of travellers to the local resident population in a city. The Baidu migration data have been widely used in many studies on the COVID-19 epidemic (Wu, Leung, and Leung 2020; Xiong et al. 2020; Kraemer, Yang, and Gutierrez et al. 2020). We collected these data for 83 major cities in China from 17 January to 30 January 2020, 1 week before and after the quarantine of Wuhan, and calculated several indices representing city background conditions for further analysis. In addition, the statistical data for the residential population and number of hospitals were obtained from the China Statistical Yearbook 2018. Table 1 shows some descriptive statistics for the yearbook data.
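The reassignment of reporting spikes described above can be illustrated with a minimal Python sketch. It is an interpretation under stated assumptions rather than the procedure actually used: the function name `reassign_spike`, the `baseline` argument (the count kept on the spike day) and the even-split fallback are hypothetical, and the text only states that the sudden increase is proportionally reassigned to the days of the previous week.

```python
def reassign_spike(daily, spike_day, baseline, window=7):
    """Spread the excess of a reporting spike over the preceding `window` days.

    Assumption: the excess is whatever exceeds `baseline` on the spike day,
    and each earlier day receives a share proportional to its own count.
    """
    adjusted = [float(x) for x in daily]
    start = max(0, spike_day - window)
    previous = adjusted[start:spike_day]
    excess = adjusted[spike_day] - baseline
    if excess <= 0 or not previous:
        return adjusted  # nothing to reassign
    total = sum(previous)
    for k, value in enumerate(previous):
        # Proportional share; fall back to an even split if the window is all zeros.
        share = value / total if total > 0 else 1.0 / len(previous)
        adjusted[start + k] += excess * share
    adjusted[spike_day] = float(baseline)
    return adjusted


if __name__ == "__main__":
    series = [10, 20, 30, 40, 500]  # hypothetical counts with a spike on the last day
    print(reassign_spike(series, spike_day=4, baseline=50))
```

The adjustment keeps the cumulative total unchanged, so city-level outbreak sizes are not affected; only the timing of the reported cases is smoothed.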

Time-series analysis of epidemic characteristics
In order to analyse the epidemic characteristics, we counted the cumulative confirmed cases and analysed the date of the first case occurrence and the outbreak duration in each city. The date of the first case occurrence is the first date on which the number of reported confirmed cases was greater than zero. The outbreak duration is the number of days between the date of the first case occurrence and the date when the outbreak was brought under control, which is defined as the first date from which the number of new confirmed cases was zero for 7 consecutive days.
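The two definitions above translate directly into a short routine. The following Python sketch is illustrative only: the function name `outbreak_timing` is hypothetical, days are represented as list indices (index 0 would correspond to 11 January 2020), and the control date is assumed to be the first day of the first run of seven consecutive zero-case days.

```python
def outbreak_timing(daily_new, zero_run=7):
    """Return (first_case_index, duration_in_days) for one city.

    `daily_new` is a list of daily new confirmed cases. The duration is None
    if no run of `zero_run` zero-case days occurs after the first case.
    """
    first = next((i for i, n in enumerate(daily_new) if n > 0), None)
    if first is None:
        return None, None  # no confirmed case in the study period
    run = 0
    for i in range(first + 1, len(daily_new)):
        run = run + 1 if daily_new[i] == 0 else 0
        if run == zero_run:
            control = i - zero_run + 1  # first day of the zero-case run
            return first, control - first
    return first, None


if __name__ == "__main__":
    series = [0, 0, 3, 5, 2, 0, 1] + [0] * 8  # hypothetical city
    print(outbreak_timing(series))
```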

Time-series clustering for categorizing cities of different patterns
We also implemented a time-series clustering method to categorize the cities with larger numbers of confirmed cases (greater than 50) into different types according to the local epidemic dynamics in each city. We first construct a synthesized time series, which is a linear combination of the normalized pattern and the cumulative number of confirmed cases. Then, to determine different types of patterns, we use a semi-supervised K-means clustering method. The quality of the clustering result is evaluated by the silhouette coefficient (Rousseeuw 1987).
The process of clustering is divided into the following three steps:
(1) Construct the synthesized time series that combines the normalized pattern and the cumulative number of confirmed cases.
(2) Cluster the time series of confirmed cases using K-means and determine the optimal number of clusters based on the silhouette coefficient.
(3) Post-process the clustering result by assigning clusters to cities of different patterns.
Each of these steps is now described in detail.

Construct the synthesized time series
The synthesized time-series data of each city used for clustering combine three parts. The first is the total cumulative number of confirmed cases from 11 January 2020 to 27 March 2020, which reflects the outbreak size of the local epidemic. The second is the normalized time series of daily new confirmed cases, which indicates the dynamic variation of the local epidemic. The third is the cumulative number of confirmed cases when the epidemic was under control in most cities (from 5 March 2020 to 27 March 2020), which reflects the resurgence situation of local epidemics. In the synthesized time series, $Y_i$ denotes the series used for clustering in city $i$, $N_i$ is the cumulative number of confirmed cases in city $i$, $Z_i = \{Z_{ij},\ j = 1, 2, \ldots, T\}$ is the standardized time series of daily new confirmed cases in city $i$, $n_{ij}$ is the number of daily new confirmed cases in city $i$ on day $j$, and $N_i^r$ is the cumulative number of confirmed cases in city $i$ from 5 March 2020 to 27 March 2020.
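To make the three-part construction concrete, the sketch below builds one feature vector per city in Python. The exact normalization and weighting are not specified in the text above, so everything beyond the three ingredients is an assumption: daily cases are divided by the city total, the two cumulative counts are scaled by their maxima across cities, and the hypothetical weights `w_size` and `w_resurge` control how strongly outbreak size and resurgence count relative to the shape of the curve.

```python
def synthesize(daily_by_city, resurgence_start, w_size=1.0, w_resurge=1.0):
    """Build a feature vector [size, Z_i1..Z_iT, resurgence] for every city.

    `daily_by_city` maps a city name to its list of daily new confirmed cases.
    `resurgence_start` is the list index of the first day of the late period
    (5 March 2020 in the study). Scaling and weights are assumptions.
    """
    totals = {c: sum(s) for c, s in daily_by_city.items()}
    late = {c: sum(s[resurgence_start:]) for c, s in daily_by_city.items()}
    max_total = max(totals.values()) or 1  # avoid division by zero
    max_late = max(late.values()) or 1
    features = {}
    for city, series in daily_by_city.items():
        n_i = totals[city]
        # Normalized daily pattern: share of the city's total reported each day.
        z_i = [n / n_i if n_i else 0.0 for n in series]
        features[city] = (
            [w_size * n_i / max_total] + z_i + [w_resurge * late[city] / max_late]
        )
    return features


if __name__ == "__main__":
    demo = {"A": [1, 3, 0, 0], "B": [2, 2, 2, 2]}  # hypothetical cities
    for city, vector in synthesize(demo, resurgence_start=2).items():
        print(city, vector)
```

Scaling the cumulative counts keeps a single very large outbreak from dominating the Euclidean distances used by K-means; a different scaling, for example a logarithm, would be an equally defensible choice.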

Cluster the time series for categorizing
After generating the synthesized epidemic time series, we applied a K-means clustering algorithm (Jain 2010) to categorize cities with different epidemic characteristics. The number of clusters is determined by the silhouette coefficient (Rousseeuw 1987) as a quantitative index, which is defined as
$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} $$
Here, $a(i)$ stands for the average distance between cluster centre $i$ and the other samples in the same cluster, and $b(i)$ stands for the minimum of the average distances between cluster centre $i$ and the samples in other clusters. A large silhouette coefficient indicates a good clustering result. Thus, in this process, we try different numbers of clusters and select the clustering result with the highest silhouette coefficient.
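The selection of the number of clusters can be sketched in plain Python as follows. This is a simplified illustration, not the MATLAB implementation used in the study: it runs ordinary (unsupervised) K-means, uses a deterministic farthest-point initialization chosen here only for reproducibility, and computes the silhouette per sample as in Rousseeuw (1987). All function names are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def init_centres(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centres))
        centres.append(list(far))
    return centres


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centres = init_centres(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean per-sample silhouette coefficient."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Try each candidate k and return the one with the highest silhouette."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scored, key=scored.get), scored


if __name__ == "__main__":
    toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
           [20, 0], [20, 1], [21, 0]]  # three hypothetical groups of cities
    print(best_k(toy, [2, 3, 4]))
```

In practice, the input points would be the synthesized time series of the cities, and the candidate values of k would span the range considered plausible for the number of epidemic types.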

Post-processing to assign clusters to cities of different patterns
Once the clusters have been determined, we perform post-processing to assign each cluster to cities with specific patterns. Each pattern is determined by a decision tree based on three quantification indices: cumulative number of cases, duration, and turning point (Figure 2a,b). A cluster is assigned to the pattern type (quantified by these three indices) whose centre is closest to the cluster centre obtained in step 2. To account for spatial effects, the final clustering result is also adjusted so that neighbouring cities fall into the same category where possible. To do so, we select the cities surrounded by cities of another type and compare their pattern quantification indices. If each index of the selected city is within one standard deviation of the mean value in the cluster of the surrounding cities, we change the type of the city to that of the surrounding cities.
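The spatial adjustment rule in the previous paragraph can be sketched as a small check. The code below is a hedged reading of that rule: it assumes that "surrounded" means all neighbouring cities share one single other type, that the reference cluster is the one of the surrounding cities, and that the sample standard deviation is used. The function names and the index keys (`cases`, `duration`, `turning`) are hypothetical.

```python
from statistics import mean, stdev


def within_one_sd(city_indices, cluster_members):
    """True if every index of the city lies within one SD of the cluster mean.

    `city_indices` is a dict of index name -> value for one city;
    `cluster_members` is a list of such dicts (at least two members).
    """
    for name, value in city_indices.items():
        values = [member[name] for member in cluster_members]
        if abs(value - mean(values)) > stdev(values):
            return False
    return True


def adjust_type(city_indices, city_type, neighbour_types, members_by_type):
    """Return the (possibly changed) type of a city after spatial adjustment."""
    others = set(neighbour_types)
    # Only adjust cities whose neighbours all share one type different from their own.
    if city_type in others or len(others) != 1:
        return city_type
    target = others.pop()
    if within_one_sd(city_indices, members_by_type[target]):
        return target
    return city_type


if __name__ == "__main__":
    type_b = [  # hypothetical members of the surrounding cluster
        {"cases": 100, "duration": 20, "turning": 10},
        {"cases": 200, "duration": 30, "turning": 14},
        {"cases": 150, "duration": 25, "turning": 12},
    ]
    city = {"cases": 170, "duration": 27, "turning": 11}
    print(adjust_type(city, "A", ["B", "B", "B"], {"B": type_b}))
```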
The complexity of this entire process is mainly determined by the clustering step, which is at most $\sum_{nCluster} nCluster \times O(n \times nIteration)$. Here, $n$ is the number of cities, $m$ is the length of the synthesized epidemic time series, $nIteration$ is the number of iterations in the K-means algorithm, and $nCluster$ is the number of clusters to be determined. With the other quantities held fixed, the overall complexity of the algorithm simplifies to $O(n)$, that is, it is linear in the number of cities. The algorithm was implemented in MATLAB, and the results are presented in ArcMap.

Dynamic characteristics of COVID-19 in China's major cities
Figure 3 provides the daily confirmed cases in China. As can be seen, Wuhan was the most severely hit area during the COVID-19 epidemic in China. With a sharply increasing trend in January, daily confirmed cases there reached the peak value in mid-February and then gradually decreased until the epidemic was under control in March. In contrast, the outbreak trends in Hubei Province (excluding Wuhan) and other places are relatively flat, and the peaks of daily confirmed cases appeared earlier. Figure 4a shows the date of the first confirmed case occurrence for each city analysed in this study. In addition to Wuhan and Huanggang, the earliest occurrences of confirmed cases are mainly distributed in megacities (or major cities) in China, including Beijing, Shanghai, Tianjin, Chongqing, Guangzhou, etc. These cities are densely populated and have large population flows from Wuhan. Therefore, while the epidemic spread in Hubei Province, infected individuals also travelled to other major cities across the country through the interregional transportation network and the disease spread to the surrounding urban agglomerations, which led to outbreaks of the epidemic all over China. Figure 4b shows the cumulative number of confirmed cases for each city analysed in this study. It can be seen that confirmed cases of COVID-19 are mainly distributed throughout the central and eastern regions in China. The areas with severe outbreaks are centred on Wuhan and are mainly distributed in Hubei Province and its neighbouring major cities, including Chongqing, Nanchang, Changsha and Xinyang. The number of cumulative confirmed cases in these cities exceeded roughly 1,000, and the number in Wuhan reached more than 50,000. Other endemic areas away from Wuhan include Beijing, Shanghai, Wenzhou, Guangzhou and Shenzhen. More than 400 cases were confirmed in each of these cities. The durations of the respective outbreaks are shown in Figure 4c.
It can be seen that Hubei Province and its neighbouring cities (Chongqing, Chengdu, Nanyang, etc.) are the areas with the longest outbreak durations. Outbreaks in provinces in central and eastern China (Hunan, Jiangxi, Anhui, Henan, etc.) mostly lasted no more than 1 month. In other provinces, the outbreaks lasted no more than 3 weeks. Although the epidemic was under control in most cities by the end of March, there are still a number of imported infection cases in Beijing, Shanghai, and Guangzhou. In these cities, along with Wuhan, Hong Kong and Taiwan, the COVID-19 epidemic is still ongoing.

City clusters of different epidemic dynamic patterns
The spatial distribution of each type of epidemic situation is presented in Figure 5, and six epidemic indices in different types of epidemic situation are presented in Figure 6. On the basis of these, Table 2 summarizes the different types of epidemic situations with different COVID-19 dynamic patterns in 83 major cities. Type 1, represented by Wuhan, is categorized as 'extreme outbreak area' for the earliest appearance of cases, largest number of cumulative cases, and longest duration. The type 2 cities around Wuhan in Hubei Province, such as Xiaogan, Huanggang, Ezhou, etc. are categorized as 'large spread areas' with thousands of confirmed cases and outbreak durations of over 1 month. Type 3 are mainly megacities with large population and traffic flows, including Beijing, Tianjin, Shanghai and Shenzhen. These cities are categorized as 'potential resurgence areas' with fewer confirmed cases compared to type 2 cities. However, due to large population flows, a few imported infection cases persist despite the fact that the epidemic is under control in most cities. Type 4 includes two small cities in Hubei Province and some major cities away from Hubei, including Harbin, Dongguan and Tangshan. These cities are categorized as 'middle spread areas' because the cumulative confirmed cases and outbreak durations of these cities are all at midlevels. Cities of type 5 include several major cities in south Henan, west Anhui, and Chongqing. These cities are categorized as 'controlled outbreak areas', because the outbreak durations were relatively short and the turning points were earlier due to effective prevention and control measures. Cities of type 6 are mainly distributed in north Hunan, east Jiangxi, west Jiangsu, etc. These cities are categorized as 'limited growth areas' based on small numbers of cumulative confirmed cases and no turning points. 
Cities of type 7 include Hong Kong, Macau and Taiwan, where daily new confirmed cases began to rise in the late period of the epidemic. These cities are categorized as 'delayed outbreak areas'. Type 8, Jining, is a special case where 200 confirmed cases suddenly emerged in a prison on 20 February 2020. This city is categorized as a 'lag report area'.

City background conditions of different types of epidemic dynamic patterns
In order to understand the driving factors among the city background conditions for each category of epidemic dynamic pattern, we calculated five indices using the Baidu migration and China Statistical Yearbook data: the population flow from Wuhan, the distance from Wuhan, the local residential population, the activity ratio (the cumulative ITI from 17 to 23 January, the week up to the quarantine of Wuhan on 23 January, divided by the cumulative ITI from 24 to 30 January, the week after it), and the number of local hospitals. The first two indices represent interactions with the extreme outbreak area. The third index suggests the potentially susceptible population size. The fourth index reflects the government emergency response capability, and the last index reflects the medical resource conditions. Table 3 shows the statistics of the city background condition indices for each type of epidemic situation. It is clear that in most cases extreme outbreak areas are likely to appear in metropolises with large populations, such as Wuhan (Hu et al. 2020), Milan (La Maestra, Abbondandolo, and De Flora 2020) and New York City (Desjardins, Hohl, and Delmelle 2020). These cities generally hold the most or earliest confirmed cases in the early period of the epidemic. The large spread areas are generally the cities around extreme outbreak areas, because these cities have large population flows and thus import cases from there. In addition, limited hospital resources are also one of the reasons for epidemic outbreaks in such cities. The potential resurgence areas tend to be international cities or port cities (including Beijing, Shanghai and Shenzhen) with large populations and flows of domestic or foreign people; thus, population flows are also believed to be a potential driver of the COVID-19 epidemic spread (Jia et al. 2020; Ren et al. 2020).
The middle spread areas are also cities with relatively high population flows from extreme outbreak areas (though not as high as those of large spread areas), but they have fewer local residents. Controlled outbreak areas are mainly cities with strong government emergency response capabilities, such as Chongqing (He et al. 2020) and Wenzhou (Han et al. 2020). In these cities, once the government realized the severity of the situation, strict prevention and control policies were implemented immediately to contain the spread of the epidemic. The limited growth areas are generally cities far away from extreme outbreak areas. These cities have a limited population flow from areas severely hit by the epidemic, and thus only a few occasional cases are imported. The delayed outbreak areas are special administrative regions where new confirmed cases are more likely to be imported from overseas. The lag report areas are very likely to be cities with poor medical treatment capabilities, where new confirmed cases are reported late due to limited testing capacities. However, the single lag report area in China was a special case, because most of its confirmed cases occurred in a prison.
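The activity ratio used in this section as a proxy for government emergency response capability (cumulative ITI from 17 to 23 January divided by cumulative ITI from 24 to 30 January) can be computed as in the following Python sketch. The function name and the dictionary layout of the ITI data are hypothetical; the dates follow the definition given above.

```python
from datetime import date, timedelta


def activity_ratio(iti_by_date,
                   before_start=date(2020, 1, 17),
                   after_start=date(2020, 1, 24),
                   days=7):
    """Cumulative ITI of the week before divided by that of the week after.

    `iti_by_date` maps a datetime.date to the intra-city travel intensity
    of one city on that day.
    """
    before = sum(iti_by_date[before_start + timedelta(days=k)] for k in range(days))
    after = sum(iti_by_date[after_start + timedelta(days=k)] for k in range(days))
    if after == 0:
        raise ValueError("cumulative ITI after the quarantine is zero")
    return before / after


if __name__ == "__main__":
    start = date(2020, 1, 17)
    # Hypothetical city whose intra-city travel halves after 23 January.
    iti = {start + timedelta(days=k): (5.0 if k < 7 else 2.5) for k in range(14)}
    print(activity_ratio(iti))
```

Under this definition, a larger ratio means a sharper drop in intra-city travel after the quarantine of Wuhan.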

Endemic prevention and control measure recommendations
Based on the above analysis, we recommend different policy guidelines according to the empirically predicted epidemic dynamic patterns, their city background conditions, and the early epidemic situation (Table 4). We also list some specific endemic prevention and control measures for each type of city. For the primary source city and its surrounding cities, strict control policies are indispensable for containing the outbreak, especially for metropolises with large populations (type 1 cities) and cities with large population flows from those regions (type 2, 3 and 4 cities), because preventing the spread of infected cases and containing the epidemic is the primary task in these cities. In this respect, quarantine of the outbreak area and travel restrictions from neighbouring cities with large population flows, along with family isolation, are recommended as the most effective measures (Liu et al., 2020a; Flaxman et al. 2020; Tian et al. 2020; Jia et al. 2020). Moreover, rapid surveillance based on efficient virus testing capacities and adequate medical resources are also considered key measures for these cities once outbreaks have started (R. Liu et al., 2020b; Zhang, Litvinova, and Liang et al. 2020). The medical staff and resources sent from other cities to Hubei Province strongly supported COVID-19 containment later on (Chinazzi et al. 2020). At the same time, the establishment of new hospitals, including Leishenshan Hospital, Huoshenshan Hospital and the Fangcang shelter hospitals, also played an important role in the response to the COVID-19 epidemic (Z. Li et al. 2020e). These measures have effectively contained the COVID-19 epidemic in the outbreak areas. For cities with strong government emergency response capabilities and cities far away from extreme outbreak areas where the epidemic is not serious, semi-control and gradual resumption policies are recommended.
This is because, during the middle period of the epidemic, when quarantine and travel restrictions are well implemented and most cities hit by the epidemic have been isolated, the gradual resumption of work under epidemic control is also an important task. In this respect, mild control policies, including zonal management, personal contact tracing, family isolation and other social distancing measures (such as workplace distancing and bans on public gatherings), may be more efficient for controlled outbreak areas (type 5 cities) and limited growth areas (type 6 cities) (Ferretti et al. 2020). For delayed outbreak areas (type 7 cities) and cities with poor medical treatment or socioeconomic conditions (type 8 cities), however, the government should maintain surveillance to ensure that local infected cases have disappeared and to prevent a resurgence of the epidemic. Therefore, surveillance and control policies are considered appropriate measures. It is also recommended to delay loosening restrictions and to improve testing capacities in these areas even if the outbreaks are under control.

Guidance for the global COVID-19 epidemic from China's situation
Although the number of infected cases is relatively small and the duration relatively short compared to the global situation, China's experience with COVID-19 provides the earliest complete case study (Huang et al. 2021). The dynamic process of the COVID-19 epidemic in China, covering a whole 'emergence, outbreak, under control, gradually vanish' epidemic period, is believed to provide important guidance for global epidemic prevention and control (Liu et al. 2021). First, the epidemic indices proposed according to China's situation at different stages can be useful for describing the dynamic process of the COVID-19 epidemic in foreign cities, and some indices, such as first occurrence, turning point, and trend of infected cases, provide important guidance for the implementation of epidemic prevention and control policies (Li, Feng, and Quan 2020; Tang et al. 2020). Second, by classifying the different dynamic patterns in Chinese cities, the COVID-19 epidemic situation in foreign cities with similar characteristics can be categorized, and some epidemic indices, such as duration and number of infected cases, may then be predicted. More importantly, different endemic prevention and control policies can be recommended for foreign cities according to the categorized dynamic patterns (Lei et al. 2020; Fricke et al. 2021), so that the epidemic spread can be contained effectively and work can resume as soon as possible. In general, China's epidemic dynamic process has many similarities with the global epidemic, and the epidemic dynamic patterns, medical response, and endemic prevention and control policies of this period provide valuable experience and reference for the global COVID-19 epidemic.

Conclusions
In this study, we categorized eight types of epidemic situations with different epidemic dynamic characteristics in Chinese cities, based on a time-series clustering method, and analysed the driving factors among the city background conditions, including population flows, local resident numbers, government emergency response capabilities, and medical resource conditions. We suggest that different endemic prevention and control measures for containing the COVID-19 epidemic should be implemented depending on the epidemic situation type. Our work may provide effective recommendations for cities with ongoing (or upcoming) COVID-19 outbreaks. We emphasize the importance of focusing on the primary outbreak areas (source cities) of the epidemic and their surrounding areas, as well as international (and port) cities with large populations and flows of domestic and foreign people. Moreover, we believe that contact tracing and testing capacity will play an important role in some cities during future outbreaks of COVID-19.
This study is subject to some limitations. First, the data used for analysis are daily reported numbers of confirmed cases, which differ from the actual numbers of infected people because of the delay from disease onset to diagnosis. Therefore, some dynamic characteristics in cities, such as rise duration and turning point, may be shifted by a few days. Second, uptrends of cases may be overestimated in cities of Hubei Province, because the testing capacities of these cities improved rapidly in the early period of the epidemic owing to the assistance from other provinces and the establishment of the Fangcang shelter hospitals. Nevertheless, we believe that the final clustering results for major cities are little affected and that the endemic prevention and control measures are still instructive. Third, our study is based solely on the characteristics of the epidemic in China's major cities. Considering that the situation in foreign cities may be different, future work should focus on worldwide epidemic dynamics with a more appropriate data set.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This study was funded through support from the National Natural Science Foundation of China [Grant Nos. 42041001, 42071435, 41525004, and 41421001].