Using an age-structured COVID-19 epidemic model and data to model virulence evolution in Wuhan, China

COVID-19 is a disease caused by infection with the virus 2019-nCoV, a single-stranded RNA virus. During the infection and transmission processes, the virus evolves and mutates rapidly, though the disease has been quickly controlled in Wuhan by ‘Fangcang’ hospitals. To model the virulence evolution, in this paper, we formulate a new age structured epidemic model. Under the tradeoff hypothesis, two special scenarios are used to study the virulence evolution by theoretical analysis and numerical simulations. Results show that, before ‘Fangcang’ hospitals, two scenarios are both consistent with the data. After ‘Fangcang’ hospitals, Scenario I rather than Scenario II is consistent with the data. It is concluded that the transmission pattern of COVID-19 in Wuhan obey Scenario I rather than Scenario II. Theoretical analysis show that, in Scenario I, shortening the value of L (diagnosis period) can result in an enormous selective pressure on the evolution of 2019-nCoV.


Introduction
Severe acute respiratory syndrome coronavirus 2 (2019-nCoV) is the causative agent of coronavirus disease . The virus emerged in Wuhan of China in December 2019 and quickly spread all over the world [1]. The World Health Organization declared the COVID-19 a pandemic on 11 March 2020 [1]. By end of June 2020, the virus has infected more than 10 million people worldwide and more than 500,000 of them have died [2]. In the process, it has forced massive lockdown and caused significant economic loss and devastation.
Because of its enormous public health impact and the need for guidance of health measures, many researchers have focused their efforts on the modelling of 2019-nCoV and its spread in the population (see mini review in [18]). Both mathematical models (e.g. [11,14]) and statistical approaches (e.g. [7]) have been used. The disease spread mechanism has been studied by using data fitting [9,11], and the disease spread has been forecasted in different regions to provide suggestions to government officials for the disease control, work resumption and return to school [3].
As of the end of June 2020, the spread of the disease in China has been contained. Medical technology and quarantine measures for the disease control are both improved significantly in mainland of China. Looking back at the course of the defense activities, quarantine policy played a key role. Especially, the quarantining of close contacts was very important. Following the progression of the disease, susceptible individuals who have been exposed undergo a short exposed period when they are not contagious. The exposed period lasts from exposure to 2 days before showing symptoms. After the exposed period, infected individuals become contagious, and after they show symptoms, they can be confirmed and possibly hospitalized [5].
While there are still many open questions on 2019-nCoV in regard to whether it imparts immunity and for how long, some very recent studies suggest that the virus is subject to continuous evolution [17]. By the tradeoff hypothesis, epidemic model with infection age has been used to study the virulence evolution. In the infection-age model, two scenarios, i.e. two pattern of transmission rate, were used to study the virulence evolution process [8].
On 12 February 2020, the government of Wuhan built several 'Fangcang' hospitals and more than 10,000 infected individuals were diagnosed and lived in a hospital. As a result, the spread of COVID-19 in Wuhan was quickly contained. Naturally, there are some questions that need to be addressed. Which is the key factor in the control of COVID-19 when 2019-nCoV continuously evolves? What is the virulence evolutionary path of 2019-nCoV?
To answer the above questions, the dynamic model should be involved in the distribution of quarantined individuals with respect to quarantined age ([4]) and the distribution of confirmed individuals with respect to confirmation age. Especially, the disease transmission process and the within host infection process should be linked by use of the infection age. Motivated by the above discussion, in this paper, we will formulate a new epidemic model involving infection age, confirmation age and quarantined age to study the virulence evolution of 2019-nCoV. This paper is organized as follows. In Section 2, we establish a new age structured SEI-JGR epidemic model to track the continuous infection stages, confirmation processes and quarantine period. Both theoretical and numerical analysis are carried out to reveal the virulence evolution of 2019-nCoV in two special scenarios in Section 3. Based on the data after 12 February 2020 and sensitivity analysis, our assessment works helps us identify the model of the virulence evolutionary path of 2019-nCoV in Section 4. In Section 5, we give a brief discussion.

Dynamical model of COVID-19 transmission
Motivated by the mechanism of COVID-19 transmission, we use the flowchart in Figure 1 to describe the disease spread and to formulate an SEIJGR epidemic model. In this model, S(t) is the number of susceptible individuals at time t, E(t) is the number of exposed individuals at time t who are not infectious; I(t) is the number of infectious individuals at time t who have not been confirmed or detected, J(t) is the total number of the confirmed and isolated individuals at time t and G(t) is the total number of the quarantined individuals at time t. R(t) is the total number of the recovered individuals at time t. Susceptible individuals get into a contact with an unconfirmed infective individual. A fraction (1 − ) of the newly infected individuals become exposed, while a fraction become quarantined. After the exposed period the undetected exposed individuals become undetected infectious and can be detected and isolated at rate γ 1 . On 29th January, a paper in NEJM about 2019-nCoV spread in China [12] showed that the mean value of the time from infection to showing symptoms is 5.2 days ∈ [4.1, 7] (CI 95%) and the maximum value is 12.5 days ∈ [9, 18] (CI 95%). The same reference suggests that 89% of infectious individuals see a doctor more than 5 days since infected (see [12]). Thus symptoms appear on the average in the 6th day or later after exposure. Further, it is known that the exposed individuals may be infectious approximately 2 days before showing symptoms [5]. Hence, the exposed individuals cannot infect others in the first 4 days of infection [5]. Thus we assume that the transfer rate from the exposed to the infectious σ = 0.25.
Note that the susceptible individuals will enter into the quarantined compartment at the ratio once they have been identified as close contacts of an infectious individual. The individuals in the quarantined compartment remain quarantined during the exposed period (5.2 days). Then the quarantined individuals either become confirmed and hospitalized or transfer into the susceptible class. We assume the recovery rate of quarantined individuals is δ 2 = 1/14 [10,16] and the recovery rate of infected individuals in J class δ 1 = 0.1 [15].
After 23 January 2020, when the lockdown of Wuhan started, the control measures started excerpting evolutionary pressure on the virus and the evolution of 2019-nCoV started play an important role in the disease transmission. To study the evolution of the virus, we will use two life histories of the virus which can be considered in the context of age-since-infection structured epidemic models [8]. Thus we address this disagreement by using an age-since-infection model: with the initial conditions The independent variables a, b and θ in model (1) are the infection age (time since entering into the infectious class), the confirmation age (time since entering into the confirmed and isolated class) and the quarantine age (time since quarantined) respectively. We surmise that the confirmation time of the quarantined individuals γ 2 (θ ) should obey a gamma distribution with the mean value 4 days, due to the mean value of the exposed period being 4 days (σ = 0.25) and the exposed period ranging from 1 day to 14 days (see [4]). Further we assume that since the mean time of each confirmed individuals in hospital δ 3 (b) is 9 days and the range is the interval [5, 12.75], the recovery time of the hospitalized obeys a normal distribution with minimum and the maximum recovery time of 5 days and 13 days and mean value of 9 days [6]. To summarize, we assume that the expressions for γ 2 (θ ) and δ 3 (b) are gamma distribution function 28 × (32, 8) and a normal distribution function 2.6 × N (9, 1.2) respectively (see Figure 2). The other parameters' values are given in Table 3.
It follows from the idea of [13], we define the basic reproduction number of our model (1) as follows: In the next section, we use our model (1) and the cumulative confirmed case data in Wuhan before 'Fangcang hospitals' were introduced to perform data fitting. From the data fitting, we will obtain the values of the parameters related to the viral evolution, and then get a preliminary understanding on the evolution of 2019-nCoV.

Fitting data to understand 2019-nCoV virulence evolution
In this section, we will carry out the data fitting to derive the values of the parameters β(a) and γ 1 (a) which shed light on the viral evolution of its transmission rate β(a) in terms of the control measures modelled by γ 1 (a). We also compute the value of the basic reproduction number R 0 and obtain a rough understanding about the virulence evolution of 2019-nCov. For the period starting on January 23, the initial conditions of our model are assumed as S 0 = 6, 081, 000, E 0 = 750 and I 0 (a), J 0 (b), G 0 (θ ) are given in Figure 3. The predetermined parameter values are given in Table 3. To carry out the data fitting, we will use the reported case data in Tables 1-2 and consider two highly simplified parasite life histories, captured through the form of the parameters β(a) and γ 1 (a), which are taken as Scenario I and Scenario II from [8] (see Figure 4). These scenarios provide two possible virulence evolutionary paths of 2019-nCoV and help us get a clear understanding about the disease transmission mechanism.
In Scenario I, we assume that transmission begins at infection age τ β and is constant thereafter. Parasite-induced confirmation begins at infection age τ γ and is constant  thereafter. It then follows that Obviously, τ γ is larger than τ β . Let L = τ γ − τ β (see Figure 4). Then the basic reproduction number simplifies to In Scenario II, from Figure 4, we take Obviously, τ γ is larger than τ β , and we let L < D < +∞ (see Figure 4). Then it follows that the basic reproduction number takes the form In the following, we will do data fitting based on our model (1) in Scenario I and Scenario II, respectively. By taking we use the cumulative confirmed cases 23 January-11 February in Wuhan Y i to fit the variable J c , which satisfies with the object function   The values of the parameters β, L, and D are obtained in Table 4 and the fitting result is shown in Figure 5. Figure 5 shows the real data is well fitted by our model when γ 1 = 0.4. Note that the value of γ 1 is influenced by the virulence of the virus. To improve the data fitting and obtain a rough understanding of the viral evolution, we will choose various values of γ 1 and perform the fitting for each one of them.
In Scenario I, we choose γ 1 = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 respectively. From the results presented in Table 5, we see that the value of the transmission rate β is increasing with the increasing of the value of the confirmation rate γ 1 . However, this dependence results in the decreasing of the value of the basic reproduction number R 0 , and the decreasing process seems to be steady. As we will see in the next section, analytical computations show that the increase of β with respect to γ 1 can be expected. This fact also demonstrates the accuracy of data fitting results. Even with the change, the values of β fall in the interval [2.8202 × 10 −7 , 3.3168 × 10 −7 ], and the value of R 0 falls in an interval [1.8967, 2.4186].   In Scenario II, let τ β = 1 and we take     Tables 5-8, has shown the relationships between different parameters. Figure 6(d) shows that the virus 2019-nCoV tries to keep the value of R 0 as a constant which is larger than 1. Then the disease will persist if R 0 > 1, based on the knowledge of infectious disease dynamics.

Understanding the virulence evolution of 2019-nCoV using theoretical results
In this section, we will justify the virulence evolution of 2019-nCoV results obtained from the fitting using analytical tools. It is assumed that the virulence evolution of 2019-nCoV is continuously and naturally before 'Fangcang hospitals', i.e. the case-confirmation rate γ 1 is totally determined by the virulence, the evolutionary path is fixed (one of the two scenarios).
Note that the disease transmission rate β and confirmation rate γ 1 are both derived from the virulence of 2019-nCoV. It is reasonable that the disease transmission rate β has a strong correlation with the confirmation rate γ 1 and the values of them are changing with the virulence evolution of 2019-nCoV. Based on the trade-off hypothesis [8], higher virulence is directly related to higher transmission which would help the virus spread, however, the higher virulence leads to higher confirmation rate which would reduce the virus spread. These results lead to the balance of disease transmission, and the magnitude of the basic reproduction number R 0 will stay the same. Based on this idea, we will carry out the following theoretical analysis.
In Scenario I, R 0 is given by Equation (5) and we treat β as a function of γ 1 . Differentiating Equation (5) with respect to γ 1 and setting the derivative equal to zero, we obtain the following equation: If L = 0, Equation (11) reduces to β β = 1 γ 1 +δ 1 . Then β can be expressed as a function of γ 1 , β(γ 1 ) = β(0) γ 1 +δ 1 δ 1 , and it is an increasing function of the case-confirmation rate. If L > 0, we get a simple interpretation of the time lag L from the left-hand side of Equation (11). The value of the factor in the square bracket is larger than one if L > 0, and the growth rate of β with respect to γ 1 is smaller than that with L = 0. In the following, we will perform analysis to obtain a deeper understanding of the role of L in the virus's evolution processes. Let Then, from (11), we have Integrating both sides of Equation (13) with respect to γ 1 from 0 to γ 1 , we have Meanwhile, the factor kδ 1 + 1 (kδ 1 + 1) + kγ 1 ≤ 1, and it is decreasing with the increasing of k. Note that k is an increasing function of the time lag (diagnosis period) L (see (12)). Thus the increasing of time lag L reduces the growth speed of β with respect to γ 1 and slows down the evolution speed of the virus. In the selective pressure the virulence is exerting on the transmission rate, the value of γ 1 is growing faster than the value of β as L increases. Therefore, shortening the value of L (diagnosis period) can result in an enormous selective pressure on the evolution of 2019-nCoV. In Scenario II, treating β as a function of γ 1 , we differentiate Equation (7) with respect to γ 1 and set it equal to zero and obtain the equation Equation (15) will become Equation (11), when the value of D is infinity.
If the period D is finite, we can observe from Equation (15) that the growth of β with respect to γ 1 is nonlinear which agrees with the results in Tables 6-8 obtained from the fitting. However, comparing with the case when D is infinity, we conjecture that the finite period D slows down the evolution speed of the virus. Especially, the smaller the value of D, the smaller the dependence of β on γ 1 for L < D < +∞, which is consistent with data fitting results (see Figure 5).
From the above theoretical analysis, we get a main idea of the virulence evolution of 2019-nCoV from the perspective of the tradeoff hypothesis [8]. These theoretical results clearly show the relationship between γ 1 and β. It is reasonable to believe that β and γ 1 are positively coupled through their mutual dependence on host exploitation. Under the selective pressure, the virulence of 2019-nCoV forms a trade-off with transmission and confirmation, that is, the virulence chooses middle-of-the-virulence value to keep transmission and detection in balance and to coexist with humans for a long time. These two virulence evolutionary paths of 2019-nCoV (Scenario I and Scenario II) can provide reasonable explanations from theoretical analysis and data fitting. The problem which virulence evolutionary path is the true for 2019-nCoV is of interest. To investigate, we perform sensitivity analysis with the data after 'Fangcang' hospitals.

Determining key parameters of R 0
To understand better the virulence evolution of 2019-nCov, we perform sensitivity analysis of the basic reproduction number R 0 in (5) (Scenario I) and (7) (Scenario II) with respect to its parameters. Based on the data fitting results, we employed Latin Hypercube Sampling (LHS) to identify the rank of key parameters that affect the basic reproduction number. Using the LHS technique with 100,000 samples, we plot the partial rank correlation coefficient (PRCC) respectively for Scenario I and Scenario II in Figure 7.
In Scenario I, from Figure 7 we see that the parameters , γ 1 , β and L have PRCC larger than 0.5 in absolute value and are therefore the most sensitive parameters. Other parameters that are also influential are δ 1 and τ β . Consequently improving the values of and γ 1 can lead to putting the disease under control. In Scenario II, we see that the parameters D,  L, and γ 1 are the most sensitive parameters, and the parameters β, are no longer sensitive parameters. That raises the question whether improving the values of γ 1 and can lead to putting the disease under control. Note that D is the most influential parameter in Scenario II but it is not present in Scenario I. D will effect the multiples of the parameters γ 1 and .
These above differences between two evolutionary paths may give us a chance to investigate further the virulence evolutionary path of 2019-nCov by use of the data after 'Fangcang' hospitals.

Assessing control measures
The confirmation rate of COVID-19 is determined by the detection technology and the virulence of 2019-nCoV. Medical detection technology improves the value of γ 1 , and epidemiological investigations increase the value of . 'Fangcang' hospitals provide sufficient space for the quarantined and confirmed individuals. Confirmation and quarantining were two useful and achievable interventions against the spread of COVID-19 in Wuhan. In this subsection, we will assess the role of these two control measures in the containment of COVID-19 in Wuhan.
In Scenario I, we obtained from the PRCC analysis that and γ 1 are the two most sensitive parameters. One important question is whether applying the control measures jointly is better than applying a single control measure. Another important question concerns the timing of control measures. Based on the evolution of 2019-nCoV, we will address the question of the timing of the intervention strategies.
If we use a single control strategy, namely detection of infected individuals, even if we increase the confirmation rate (replace the value of γ 1 by γ 1 = 0.95) and use the remaining parameters in lines 2, 4, 6 of Table 5, respectively (see Figure 8 (a)), we obtain that the theoretical plots of Figure 8 (a) are far away from the data. In another single control strategy, by increasing quarantining rate (replace the value of by = 0.95) from 12 February 2020 only, we perform simulations (see Figure 8 b), the theoretic plots also do not match the data. Results show that using a single measure is not enough.
It is necessary to combine the two control measures. We need to both enhance the quarantine rate and improve the detection technology. By 11 February 2020, the detection technology was still relatively inadequate, and hence, the value of γ 1 is almost totally derived from the virulence of 2019-nCoV. After 11 February 2020, with the development of the nucleic acid detection, artificially improving the value of γ 1 became possible.
In Scenario I, the discussions in Section 3.2 show that the transmission rate β is monotonically increasing with respect to γ 1 . The lower the value of γ 1 the smaller the value of β, and the more efficient the joint control measures. Numerical simulations show the effect of the control measures according to the real data (see Figure 9). In Scenario I, if the confirmation rate γ 1 is less than 0.2 before 11 February 2020, then the data of the cumulative confirmed cases will be matched by the joint control measures, otherwise, agreement between the data and the projections cannot be achieved. We conclude that the start of the joint control measures should occur when the disease transmission rate is very small (i.e. γ 1 = 0.2).
In Scenario II, the data of the cumulative confirmed cases cannot be matched by the joint control measures. Hence, we conclude that Scenario I is a better model of COVID-19 than Scenario II.

Discussion
It took 3 months to control COVID-19 in Wuhan, China, from the outbreak to the near elimination of the disease. Although it has been contained by the 'Fangcang' hospitals, a risk of a second outbreak still exists, potentially due to the virulence evolution of 2019-nCoV. In the context of infection age, confirmation age and quarantine age usually affect the viral evolution. In this paper, an age structured epidemic model is formulated to study the virulence evolution of 2019-nCoV by use of data and tradeoff hypothesis.
By taking two possible special scenarios as the virulence evolutionary paths of 2019-nCov, we perform data fitting based on our age structured epidemic model. Results show that the disease transmission rate and the confirmation rate of COVID-19 are strongly correlated before 'Fanfcang' hospitals. Using the tradeoff hypothesis in [8], we present theoretical analysis which deepens the understanding of the virulence evolution of 2019-nCov in two possible scenarios. In Scenario I, the virulence evolution of 2019-nCov is slowed down by the lag L which models the period of transmission before confirmation. In Scenario II, the transmission and confirmation only last for a fix duration D, and the value of D slows down the virulence evolution, compared with the virulence evolution in Scenario I.
Using sensitivity analysis, we find that the quarantining rate and the confirmation rate are two most sensitive parameters of the basic reproduction number in Scenario I, but they are not the most sensitive parameters in scenario II. Simulations suggest that after 'Fangcang' hospitals, the joint intervention measures play a key role in the disease control in both scenarios, regardless of the sensitivities of and γ 1 . In Scenario I, the viral evolution process is simple and the COVID-19 can be controlled if the joint intervention measures be carried out as soon as possible. However, in Scenario II, the viral evolution process is complex, the control of COVID-19 spread will be more difficult.
Note that the disease is quickly controlled by 'Fangcang' hospitals in Wuhan. The assessment based on our age-structured SEIJGR epidemic model provides us an opportunity to identify the virulence evolutionary path of 2019-nCov. We find that Scenario I is a better model of virulence evolutionary path of 2019-nCov than Scenario II. Our theoretical analysis coupled with data gives us more clear understanding of 2019-nCoV and the disease COVID-19.