Parameter identifiability and optimal control of an SARS-CoV-2 model early in the pandemic

We fit an SARS-CoV-2 model to US data of COVID-19 cases and deaths. We conclude that the model is not structurally identifiable. We make the model identifiable by prefixing some of the parameters from external information. Practical identifiability of the model through Monte Carlo simulations reveals that two of the parameters may not be practically identifiable. With thus identified parameters, we set up an optimal control problem with social distancing and isolation as control variables. We investigate two scenarios: the controls are applied for the entire duration and the controls are applied only for the period of time. Our results show that if the controls are applied early in the epidemic, the reduction in the infected classes is at least an order of magnitude higher compared to when controls are applied with 2-week delay. Further, removing the controls before the pandemic ends leads to rebound of the infected classes.


Introduction
Mathematical models have been widely implemented to study the spread and interventions aimed to mitigate the impact of the COVID-19 pandemic. Model complexity has varied significantly, from the simple Susceptible, Exposed, Infected, Recovered (SEIR) model involving a few parameters to more elaborate models with many more parameters that incorporate testing efforts, contact tracing and social distancing protocols [8,12,26,29,31,38,40,41]. For instance, Tang et al. [29] showed that intensive contact tracing followed by quarantine and isolation was effective for reducing the reproduction number and risk of transmission in a model that incorporated clinical progression of disease. In an extended SEIR model, Zou et al. considered testing and under-reporting of COVID-19 cases, which was later trained by machine learning algorithms [41]. Lu  of information about testing efforts overt time [19]. Researchers have used the cumulative number of cases, quarantined population [29], number of confirmed and fatality cases [41], at early stages of the pandemic. However, the ability to accurately forecast epidemics is highly dependent on (i) amount and type of data, whether there exists sufficient data to identify the model parameters, and (ii) model structure, whether the model is structured to reveal its parameters from the ideal noise free data [33,34]. Insights from mathematical models that incorporate features of disease transmission, incubation period, and other features of a virus can be used to improve our understanding of the spread of diseases and investigate optimal intervention strategies. We are often challenged with the decision of implementing models that are simple enough to provide meaningful inference given the limited data but also models that realistic enough to capture key underlying virus dynamics and epidemiological states. The implementation of more complex features often involves models with large number of unknown parameters and epidemiological states that while important are often not observed or captured. The variability and lack of accuracy in the projections based on some mathematical models have raised concern among public health officials, specially during the early phase of the COVID-10 pandemic. Therefore, it is important that modelling studies of COVID-19 investigate the question of parameter identifiability [21]. A model is said to be (structurally) identifiable, if its parameters can be determined uniquely from the observations such as daily cases and fatalities. If a model is unidentifiable, then infinitely many parameters would give the same observation. In that situation, multiple parameter combinations satisfying the parameter correlations would fit the same data sets, which will result in misleading and inaccurate predictions. Formally studying the parameter identifiability for COVID-19 models will be valuable for several reasons. First, it will provide some guidance about the limitations of the models themselves being proposed in providing valuable predictions and inferences based on data that is currently available. Findings of studies on parameter identifiability are also essential for increasing awareness about the importance of timely access to data that are instrumental for improving estimates and projections.
In this study, we propose a model to study the spread of SARS-CoV-2 and assess the effect of social intervention measures to curtail the spread of the COVID-19 pandemic during the early phase of the outbreak. The findings of this study are most relevant to understanding the spread of the pandemic as it unfolds initially. To our knowledge, this is the first study that has considered the question of both structural and practical identifiability of a COVID-19 model, but the structural identifiability of several COVID-19 models has been recently considered in [21]. Structural and practical identifiability of epidemic models for Ebola, Cholera, Dengue, Rift Valley Fever and Zika have been studied before [5,9,10,[32][33][34][35]. Even though, it is not possible to identify individual parameters of the epidemic model from the observations, two independent studies confirmed that it is possible to accurately estimate the basic reproduction number from the epidemic data [9,33]. In this study, we investigate the structural and practical identifiability of a COVID-19 model that considers six epidemic states. Susceptible (S) individuals can be transitioned into a latency period (E), become infectious with no symptoms and not being tested (I n ), or can move into symptomatic infectious and tested state (I s ), symptomatic isolated and treated (I h ), and recovered individuals (R).
In Section 2, we consider a COVID-19 model with no infection control measures (e.g. social distancing), derive the basic reproductive number, and investigate parameter identifiability using daily COVID-19 cases and deaths data. In Section 3, we discuss structural and practical identifiability of the model. We determine structural identifiability using the Differential Algebra approach given its ability to eliminate the unobserved state variables. Using data on daily incidence of cases and deaths, we establish that the proposed model is not structurally identifiable but we improve its structural identifiability by fixing some of the parameters from external information. We further investigate parameters practical identifiability by performing Monte Carlo simulations on a constrained minimization problem numerically from noisy observations. This approach is local and is similar in nature to approaches with derived 95% confidence intervals, bootstrapping or sensitivity analysis. Although Monte Carlo Simulations (MCS) are not a novel approach, the novelty in our approach is that we are applying MCS only after we have determined that the parameter estimation problem is well-posed. This framework is expected to lead to much more reliable solutions. A subset of two parameters were deemed practically unidentifiable. Given that shelter-in-place order were initiated in March 19, 2020, Section 4 incorporates an optimal control with control variables of reduction in transmission rates for exposed and asymptomatic infected individuals, as well as self-isolation for symptomatic infected individuals. The forward-backward sweep method is used to approximate the solution of the model with controls. Section 5 provides a discussion of the findings described in this study.
Structural non-identifiability arises from excessive parametrization of the proposed model or by choice of a completely wrong model structure [22,32]. This problem can often be resolved by reformulating the model to reduce the number of states and parameters or fix parameter values that are less relevant to model predictions of interest. Lack of practical identifiability can be remedied by employing an alternative model of lower complexity, collecting more data about model states to better characterize the system dynamic features, increasing spatial-temporal resolution of the data to better constrain the model parameters, and/or reducing the number of parameters that are jointly estimated.

A basic compartmental model for COVID-19 infection
We introduce a mathematical model of COVID-19 by splitting the total population, N(t), into non-intersecting classes based on the disease status. In US individuals with symptoms of COVID-19 were asked to self isolate at home. We denote this class by I s (t). If clinical symptoms were eventually developed, individuals were admitted to a hospital. We denote the hospitalized individuals who are isolated and treated by I h (t). Individuals who were not tested but may have had COVID-19 without symptoms are denoted by I n (t). So, the total population N(t) is divided into sub-populations that consists of susceptible individuals S(t), exposed individuals E(t), symptomatic and infectious individuals that are tested I s (t), asymptomatic and infectious individuals that are not tested and remain in the community I n (t), severe symptomatic individuals who are hospitalized and isolated I h (t), and recovered individuals R(t). The total population size is N(t) = S(t) + E(t) + I s (t) + I n (t) + I h (t) + R(t). We use standard incidence but since the isolated individuals do not mix in the population, we divide by N − I h . This implicitly assumes that the infection rate of hospital workers from the hospitalized COVID-19 individuals is relatively low and can be neglected. The model takes the form: Susceptible individuals acquire COVID-19 infection upon effective contact with an exposed individual or an infectious individual who may or may not show COVID-19 symptoms. The COVID-19 infection rate is assumed to be different for each infection class to account for the heterogeneity of the contact rates of infectious individuals with or without the clinical COVID-19 symptoms. Namely, the parameters β E , β n and β s represent the transmission rates for individuals in E, I n and I s compartments respectively. It is estimated that infectiousness starts 5, 8 and 11 days before the incubation period ends [14]. Recent papers have modelled the latent period and the pre-symptomatic period separately [13]. In this study, to keep the model simple we lumped these two stages in a class called E, which denotes the incubation period. The average transmission rate of two stages is not zero.
Hence β E = 0. Therefore, newly infected individuals that move to the incubation class E become infectious while they are still in class E. After the incubation period, which lasts for 1/k days, a fraction (0 < ρ < 1) of   We compute the reproduction number of (1) to obtain where N(0) = S(0) + E(0) + I n (0) + I s (0) + I h (0) + R(0) is the initial total population size. The reproduction number consists of three terms, each of which corresponds to secondary cases produced by one type of infectious individuals. The first term is the number of secondary infections produced by exposed individuals in an entirely susceptible population during their lifespan as exposed, the second term is the number of secondary infections produced by asymptomatic and uncounted individuals in an entirely susceptible population during their lifespan as asymptomatic; finally, the third term corresponds to the number of secondary infections produced by symptomatic individuals during their lifespan as symptomatic and counted.

Identifiability analysis: can we uniquely estimate the model parameters?
In the literature, the term identifiability analysis is used to describe whether it is possible to uniquely recover model parameters from a given set of data. It is a crucial step in parameter estimation problem and is usually performed in two steps. The first step is the structural identifiability which is a theoretical property of the epidemic model. Structural identifiability explores whether the epidemic model is structured to reveal its parameters from the observations. Here we use the term observation to refer to continuous smooth output where the data is measured at the discrete time points. If a compartmental epidemic model is structurally unidentifiable, it means that different sets of parameters could produce the same output. Hence a structurally unidentifiable epidemic model should not be used to analyse epidemiological data or design intervention strategies. If an epidemic model is structurally identifiable, then the second step is the practical identifiability. Practical identifiability considers the actual epidemiological data which is contaminated with noise.

Structural identifiability analysis
To study the structural identifiability of the epidemic model, we represent the model (1) in the following compact form: where x denotes the state variables and p denotes the parameters of the model of (1). The observations, such as daily incidences and deaths, are functions of the state variables. Let y 1 (t) denote the daily incidences and y 2 (t) denote daily reported deaths, then in terms of model variables the observations take the following forms; In compact form, we denote the observations as y(t) = (y 1 (t), y 2 (t)) such that Structural identifiability determines whether the model (2) is structured to reveal its parameters, p, from observations (4). Structural identifiability is a theoretical result which considers the 'perfect' scenario assuming that observation is given continuously for all time and is noise free. A model is said to be structurally (uniquely) identifiable if all the parameters in p can be recovered from the observations y. If even one parameter cannot be identified, the mode is said to be structurally unidentifiable. Several methods for determining structural identifiability have been introduced and studied widely [10,22,33]. Here, we will use the Differential Algebra approach to study the structural identifiability [22]. The goal of Differential Algebra approach is to eliminate the unobserved state variables using the system of differential equations in (2) and to obtain input-output equations. We obtain the input-output equations of the model (1) using the observations (3), which consist of a couple of monic differential polynomials in the observed state variables and their derivatives. The coefficients of these polynomials are rational functions of the parameters of the model. We obtain the following input-output equations using DAISY [1].
Input-output equations (5) and (6) are differential algebraic polynomials of outputs y 1 and y 2 whose coefficients consist of model parameters. We say that the model (2) is structurally identifiable if the mapping from parameter space to the coefficients of input-output equations (5) and (6) are one-to-one [32,34]. Within differential algebra approach, we define the structural identifiability as follows [10,34]: denote the coefficients of the input-output equations (5) and (6). That is denoting the parameters of the model (1). Then we say that the model (1) is structured to reveal its parameters from the observations (3) if and only if Following the definition of the structural identifiability, we setp = (β E ,β n ,β s ,k,ρ,γ n ,γ s ,α,γ h ,ν). Then solving c(p) = c(p), we obtain the following parameter correlations: (8), parameters of the model (1) cannot be obtained from the observations of daily incidences and deaths. Some parameter combinations can be identified such as γ s + α or να, but individual parameters γ s , α or ν cannot be identified. Even if α is fixed toα in (8), then we can only obtain γ s =γ s , ν =ν and γ h =γ h , and the parameters β E , β n , β s , γ n , k, ρ still cannot be identified. Thus we have proved the following Proposition 3.1.

Proposition 3.1:
The parameters of the model (1) are not structurally identifiable from the observations of daily incidences and deaths.
In this approach, we assume that the initial conditions are given and fixed. An important advantage of using Differential Algebra approach in structural identifiability analysis is that we obtain parameter correlations as in (8). These parameter correlations are then used to obtain a structurally identifiable model [10,32]. However, the correlations in (8) are highly complex. In an attempt to get a structurally identifiable model, we rewrite the parameter correlations into the following simpler forms; Next, we factor out β s from the incidences term in model (1) as follows: With this setting, if we fix parameter α atα, and γ n atγ n then the parameter correlations in (9) become We can see from (10) that the parameters α, γ n , γ s , γ h and ν can be identified; parameter combinations q E kρ and q n (ρ−1) ρ can be identified; but parameter β s cannot be identified. For instance, the parameter correlation q E kρ implies that infinitely many values of the parameters q E , k and ρ would generate the same output as long as the parameter relationship q E kρ remains constant atq Ê kρ . Same is true for parameters ρ and q n satisfying the relationship In the view of these results, we obtain the following Proposition 3.2.

Proposition 3.2:
If the parameters, α, γ n , k and ρ are fixed, then solving c(p) = c(p), we obtain Since, β s still cannot be obtained we say that the model (1) is structurally unidentifiable from the observations of daily incidences and deaths.  (1) is not structurally identifiable, because the parameter β s does not appear in the parameter relations in (10). But all the other parameters are identifiable. We use the following strategy in an attempt to make β s an identifiable parameter; we use the basic reproduction number R 0 to determine the infection rate β s as If we use estimate from the literature of R 0 , and fix R 0 at that value, then β s will be identifiable as well.

Structural identifiability of COVID-19 epidemic models
We take this opportunity and study the structural identifiability of some COVID-19 models. Several models are proposed within the last year for COVID-19 infection and most of these models, including our model (1), are extensions of the following Kermack-McKendrick type SEIR epidemic model without human demography.
As before we assume that the observations are daily incidences and COVID-19 deaths. In terms of the state variables of the model (M 1 ), observations becomes We have determined the input-output equations of the model M 1 and show that it is structurally (globally) identifiable and obtain the following result (see Appendix for details).

Proposition 3.3:
The model (M 1 ) is structurally identifiable from the observations of daily incidences y 1 (t) = kE, and deaths y 2 (t) = νI.
In Proposition 3.3, it is assumed that all new incidences, kE are observed. However, this is not the case in reality. A key feature of COVID-19 cases is that majority of new infections are generated by individuals who do not show any clinical symptoms of COVID-19. A big portion of these asymptomatic individuals will remain undetected and only some might be detected through random testing or contact tracing. This leads to the fact that only the fraction of new incidences are reported. We continue with the identifiability analysis assuming that only fractions of daily incidences are observed, that is if we set y 1 (t) = ρkE for 0 < ρ < 1 then the parameter correlations for model (M 1 ) become Assuming a much realistic scenario that only a fraction of actual incidences can be observed, makes the simple SEIR model unidentifiable. Thus we have obtained the following result for model (M 1 ).

Proposition 3.4:
The model (M 1 ) with a fraction of observed cases is not structurally identifiable from the observations of daily incidences, y 1 (t) = ρkE, and deaths y 2 (t) = νI. (11), we see that if we set ρ =ρ, that is if we assume that the fraction of the incidences being reported daily is known, then the model (M 1 ) becomes structurally identifiable from daily incidences and deaths. The next model, we will study is the one where infected class I is further divided into two subclasses, as asymptomatic individuals I n and symptomatic individuals I s . Then we obtain the following model:

Remark 3.2: From parameter correlations
In this model, we assume that all the infected individuals who are showing symptoms are getting tested. That is for this model (M 2 ), observations of daily incidences and deaths take the form y 1 (t) = ρkE, y 2 (t) = νI s . Determining the input-output equations and then solving c(p) = c(p), gives the following parameter correlations; Proposition 3.5: The model (M 2 ) is not structurally identifiable from the observations of daily incidences, y 1 (t) = ρkE, and deaths y 2 (t) = νI s .
If we assume both the incubation period and the fraction of infected individuals who are showing symptoms are known, and set k =k, ρ =ρ. Furthermore, if we set as before, q E = β E β s and q n = β n β s , then the following parameters of the model (M 2 ) can be identified, Since all the parameter but β s can be identified. We use the same strategy as before and determine β s from the basic reproduction number of the model (M 2 ). We summarize the results in the following proposition.
Our intention for considering models (M 1 ) and (M 2 ) is to find the most complex COVID-19 model which is intrinsically identifiable, that is identifiable without any restrictions on the parameters. It is worth mentioning that the models (M 1 ), (M 2 ) and (1) are nested models. Model (M 1 ) is nested in model (M 2 ), and model (M 2 ) is nested in (1). The simplest model (M 1 ) is the only structurally identifiable model from the observations of daily incidences and casualties. As the model gets more complex as we increase the number of infectious classes in an attempt to capture the transmission dynamics of the COVID-19 infection, the epidemic model becomes unidentifiable.

Parameter estimation
We obtain the parameter estimates of the model (1) by fitting the model to the observed daily COVID-19 incidences and deaths in the USA. The time series data from March 3 to April 26, 2020, is collected [36]. We are interested in estimating the epidemiologically important parameters. In particular, the parameters to be estimated from data are infection rate of S by exposed individuals (β s q E ), infection rate by individuals who are not showing any clinical COVID-19 symptoms (β s q n ), infection rate by individuals with clinical COVID-19 symptoms (β s ), recovery rate of individuals with clinical symptoms (γ s ), recovery rate of hospitalized individuals (γ h ) and death rate of hospitalized individuals (ν). Based on the structural identifiability analysis of the epidemic model (1), we fix some model parameters such as the incubation period to estimate the model parameters accurately. The literature search of peer-reviewed publications reporting the incubation period of SARS-COV-2 listed the incubation period as ranging from 2 to 18 days [11]. We set the incubation period 1/k as 14 days. Mild or asymptomatic cases usually recover within 14 days [2]. Since we have already set the incubation period to 14 days, we set the recovery period of the asymptomatic individuals (1/γ n ) to 1 day. The mean estimated basic reproduction number of SARS-COV-2 early in the pandemic is found to vary from 1.95 to 6.47 [16]. We set R 0 = 4.3 for our parameter estimation problem. For the proportion of asymptomatic individuals, several studies give a wide range, namely 30-70% of individuals who have tested positive have mild or no symptoms ( [39] and the references therein). Due to this large uncertainty, we take the numbers in the Diamond Princess cruise ship; 712 individuals in the cruise ship Diamond Princess tested positive for COVID-19, 410 of these infected infected individuals in the cruise ship were asymptomatic. Therefore, we set the fraction of asymptomatic individuals to 410/712 = 58% [27]. That is we take ρ = 0.42. Hospitalization rate of infected individuals with clinical symptoms of COVID-19 per day (α) is not known. However, CDC reports that at an average 15% of infected individuals are admitted to hospitals [4]. Thus we set  We estimated the parameters of the model (1) p = [β s , q E , q n , γ s , γ h , ν] by solving the following constrained optimization problem: We use the confirmed COVID-19 cases for the parameter estimation problem. However, it is worth mentioning that the due to limited (testing only individuals with moderate to severe symptoms especially early in the pandemic) and imperfect (false negative results) testing, high number of cases in the USA were undetected or under-reported [39]. It is estimated that the actual COVID-19 cases is 3-20 times higher than the reported cases [39]. In our modelling approach, we take this into consideration with ρ-proportion of individuals who are getting tested. Another consequence of limited testing capacities is that true initial population size N(0) is not captured in the data. We pre-estimate the initial effective population size, N(0), and then fix the initial conditions, since the goal of this study is whether we can uniquely determine the epidemiologically important COVID-19 parameters such as the transmission rates of the different infectious classes. We present the estimated parameters in Table 3, and model (1) predictions with daily COVID-19 cases and deaths in Figure 2.

Practical identifiability analysis
We continue the identifiability analysis by performing Monte Carlo Simulations (MCS) and study whether the parameters of the model can be estimated by solving the constraint minimization problem numerically (16) from the noisy observations Y = (Y i 1 , Y j 2 ) (15). This type of identifiability is called practical identifiability [22]. To determine the practical identifiability, first we estimate the model parameters by (16) using the reported daily incidences and deaths. Then using the estimated parametersp, we generate synthetic data sets  (i) Error structure with normal distribution: We assume that the measurement error is proportional to data. Namely we assume that the measurement error has the following form: where ε i and ε j are independent and identically distributed with mean zero and standard deviation σ 0 . Thus the random variables Y i 1 and Y j 2 have mean y 1 (t i ) and y 2 (t j ) respectively, and standard deviation y 1 (t i )σ 0 and y 2 (t j )σ 0 respectively. In particular, we generate M = 1000 data sets using normal distribution whose mean is the model predictions and standard deviations are y 1 (t i )σ 0 and y 2 (t j )σ 0 respectively. That is , y 1 (t i ,p)σ 0 and Y j 2 = N y 2 (t j ,p), y 2 (t j ,p)σ 0 (18) where N (μ, σ ) denotes the normal distribution with mean μ and standard deviation σ . We generate M = 1000 synthetic data sets for each measurement errors, by increasing the errors gradually from σ 0 = 1%, to σ 0 = 5%, 10% and σ 0 = 20%. In Figure 3, we present a single synthetic incidence (Y i 1 ) and deaths (Y j 2 ) for each measurement error σ 0 . Comparing Figure 3 with the actual data in Figure 2, it appears the actual data has about 20% error.
(ii) Error structure with poisson distribution: For the Poisson distribution, the mean and the standard deviation are equal.
where P(μ) denotes the Poisson distribution with mean μ. We present the synthetic data generated by Poisson distribution in Figure 4. It is apparent from Figure 2, Figure 3 and Figure 4 that the error structure in actual data most likely has normal distribution with σ 0 = 20%. Top row: A single set of incidences data generated for the Monte Carlo Simulations is presented for each measurement error. The incidence data is generated by taking the estimated parameters given in Table 3 as true parametersp and generating random incidences with normal distribution whose mean is y 1 (t i ,p) and the standard deviation is y 1 (t i ,p)σ 0 . Bottom row: The simulated deaths data is generated by taking the estimated parameters given in Table 3 and as true parametersp and generating random deaths with normal distribution whose mean is y 2 (t i ,p) and the standard deviation is y 2 (t i ,p)σ 0 as in (18). First column is for σ 0 = 1%, the second column is for σ 0 = 5%, the second column is for σ 0 = 10% and the fourth column is for σ 0 = 20%. We stopped at σ 0 = 20%, because comparison with the actual data suggests that the measurement error is about 20%.  Table 3 as true parametersp and generating random data with Poisson distribution.
We then fit the COVID-19 epidemic model (1) to each M = 1000 data sets by solving the optimization problem (16) and then calculate the Average Relative Errors (AREs) of each parameter for each noise level. The AREs are computed as (20) where p k is the k th parameter in p andp are the parameters that generate the random variables Y i 1 and Y j 2 . AREs of the parameters for each measurement error are presented  (16) to M = 1000 synthetic data sets for each measurement error σ 0 . in Table 4 and Table 5. AREs give insight about the practical identifiability of the model parameters. In Table 4, we present the AREs of each parameter when the error structure in the data is assumed to be normal. AREs of each parameter increases as the measurement error level (σ 0 ) increases. We see that two of the parameters, namely β s and q E have high AREs compared to other parameters. The AREs of parameters β s and q E are more than the 20 times the measurement error when σ 0 = 20%. On the other hand, the AREs of the parameters q n , ν, γ s and γ h are bounded by the measurement error at σ 0 = 20%. Based on AREs in Table 4, we conclude that the parameters β s and q E are not practically identifiable. Parameters q n , ν, γ s and γ h are practically identifiable. Next we run the MCS by generating M = 1000 data sets using Poisson error structure with mean y 1 (t i ,p) and y 2 (t i ,p). We then compute the AREs and the results are presented in Table 5. Checking the ARE of each parameter in Table 5, we see that the parameters β s and q E have higher ARE compared to other parameters and declare these parameters to be practically unidentifiable. Comparing the ARE values in Table 4 with ARE values in Table 5, regardless of the error structure assumption, Normal vs. Poisson, we see that in both cases the parameters β s and q E have significantly higher AREs compared to other parameters. In conclusion, error structure does not influence the parameter identifiability.
We continue with the identifiability analysis by assessing the evolution of the parameter identifiability. Thus we investigate how the parameter identifiability changes as function of time. We performed the MCS simulations at three different stages of the pandemic; (i) before it reached its peak, (ii) at the peak and (iii) after the peak. Since normal error Table 6. Monte Carlo simulation results: Average relative errors (ARE)s of parameters of the COVID-19 model (1) are presented after performing parameter estimation problem (16) to M = 1000 synthetic data sets for each measurement error σ 0 . Error structure is assumed to be normal. Before the peak At the peak structure is a better representation of the error structure in the actual data, we performed the MC simulations with normal error structure. We present the AREs of each parameter before the peak and at the peak in Table 6. AREs presented in Table 4 are the AREs of the parameters after the peak. The AREs of the infection rate of symptomatic individuals, (β s ) decrease as the pandemic progresses, same is true for q n . We do not observe the decline of the AREs for the parameter q E . On the other hand, AREs of the parameters ν and γ s remain almost the same for all three phases of the pandemic. And AREs of the parameter γ s increase from before the peak to at the peak phase and then decrease again from the peak to the after the peak phase. For this practical identifiability analysis, we conclude that as the pandemic progresses, as we add more data points to the parameter estimation problem, it is not possible to say that the identifiability of the parameters improves.

Predicting the future epidemic waves
Kermack-McKendrick type SEIR epidemic model (M 1 ) or its variations such as (M 2 ) and (1) can only forecast a single outbreak wave. Due to their nature, these types of outbreak models cannot predict multiple epidemic waves. In this section, we give a methodology on how to use an epidemic model to predict multiple waves. We use the methodology used in sub-epidemic curves introduced in [6]. Authors in [6] used a single generalized-logistic growth model which describes the incidences to forecast multiple epidemic waves. Here we generalize this methodology to outbreak models. We model the epidemic wave which consist of three overlapping sub-epidemics using the COVID-19 model (1): where A i (t) = 0 for t < t i 1 for t ≥ t i for i = 1, 2, 3 with t 1 < t 2 < t 3 and t 1 = 0. For the sub-epidemic model (21), the observation of incidences and deaths becomes respectively as follows: To estimate the model parameters, we modified the optimization problem (16) for the subepidemic model (21). The estimated parameters are presented in Table 7. The fit is shown in Figure 5.

An optimal control scenario
US states initiated various community mitigation policies in March 2020. Some of the widely implemented strategies included the issuance of orders requiring persons to stay home, schools were closed and universities transitioned to online instruction. Stay-athome orders started from California on March 19, 2020, and ultimately were issued in all 50 states, covering 316 million people between March 19 and May 20, 2020 [30]. To model the social distancing, we introduce control u 1 (t) which effectively reduces the transmission rates for the exposed E and non-symptomatic individuals I n . To model the self-isolation required from symptomatic individuals I s , we introduce a control u 2 (t). Although, direct correspondence between the real life controls, social distancing and self-isolation to our controls u 1 and u 2 is not clear, such an investigation still yields to useful observations, particularly in the extreme case, when we want to achieve maximum effect. Both controls belong to the admissible set where T is the final time considered. The model with the controls takes the following form: The main goal of the quarantine was to reduce the total number of infected individuals over the time [0, T]. To express that we use the following cost functional: Our goal is to find an optimal control functions (u * 1 , u * 2 ) such that We do not consider isolation to be a control variable, as isolation was applied only to severe cases. Since 80-85% are mild to moderate [23], hospitalizations are determined by the progression to severe symptoms and cannot be controlled.
We use Pontryagin's Minimum Principle [20] and introduce a time-varying Lagrange multiplier vector λ(t), whose elements are called the adjoint variables of the system. The Hamiltonian H is defined ∀t ∈ [0, T] by That is we set Hamiltonian as In the next theorem, we present the adjoint system and optimal control characterizations.

Numerical simulations with optimal control
We use forward-backward sweep method [18] to approximate the solution of model (22) with controls u 1 , u 2 . We use MATLAB built-in function ode15s to approximate the solutions of the model (22) with controls by taking initial conditions x(0) = x 0 and also use the ode15s to solve the adjoint differential differential equations backward in time by taking the value at the final time as λ(T) = 0. We use the following forward-backward sweep algorithm to get the optimal control functions [18].
(4) We update the new values of u = (u 1 (t), u 2 (t)) using (26) and (27). (5) Iterate the same process until the following convergence criteria is met for every state variable, adjoint and control functions For the optimal control simulations, we use the parameter values obtained in the fitting as presented in Table 3. For the first simulation, we solve the optimal control problem with controls social distancing and self-isolation setting off at time t = 0 and effective for 75 days. Figure 6 presents the state of infectious populations with and without social distancing and self-isolation. We see that the controls are most effective when it is applied at 100%, that is at their upper bound of 1. We recognize that achieving 100% social distancing or isolation is not practically possible. However our results show that the higher the social distancing or isolation the lower the infection. Therefore, depending on the level of the cost parameters and our cost structure, one should aim at the maximum possible social distancing and isolation. Even if we set a realistic upper bound, u max for the controls which is strictly less than 1, the optimal control will still reside at the u max most of the time. For the next simulation, we apply the social distancing and self-isolation from March 19 till May 20, for about 2 months as this was the case for the USA. We perform the iterative process to find the optimal pair which would be only effective from March 19 till May 20 that would minimize the objective function (23) subject to the COVID-19 model (22) starting on March 3 and ending on June 23, 2020. In Figure 7, we only present the infectious states with controls, since the infectious populations without controls are same as in Figure 6 second row. Comparing Figure 6 and Figure 7, we see that, one of the main differences in starting the social distancing and self-isolation 2 weeks after the launch of the outbreak is that the number of infectious classes are at least 1 magnitude order higher if not more. Further, we see, comparing Figure 6 with Figure 7 that if we had continue to apply the controls there would not have been an uptick in cases.

Discussion
Mathematical compartmental models can provide valuable insight about the spread of SARS-CoV-2 in a population. In particular, they can provide guidance about the role of  specific interventions in reducing the impact of an ongoing disease outbreak. All models are limited by factors that include the levels of complexity in the epidemiological states being considered, the parameters to be estimated, and the limited data that often involves delay and uncertainty. These models have the potential to provide valuable guidance for pandemic preparedness and planning to reduce the burden of an outbreak, particularly during the early phase of an outbreak. In this study, we proposed a COVID-19 epidemic model that considers six epidemic states and incorporates important phases of the coronavirus life cycle, such as incubation and latency period. Scenarios that incorporate social distancing efforts are implemented to limit the spread of SARS-CoV-2 were considered in our study. We computed the reproduction number, estimated model parameters based on daily cases and deaths, and conducted structural and practical identifiability analysis for the proposed model.
We show that parameters of the proposed model are not structurally identifiable from observations of daily incidence and deaths. We further establish that for a sequence of models, nested in the proposed model, only the SEIR model where all incidences are observed, is structurally identifiable. For our proposed model, we show that if the parameters α, γ n , k, ρ are fixed from external information, then all parameters are structurally identifiable, except β s . To identify β s we use estimates of R 0 from the literature.
We further investigated the practical identifiability of these parameters by performing Monte Carlo Simulations and found that transmission rate for exposed β E = q E β s and symptomatic infectious β s were not practically identifiable. This result largely persists even if we use outside information for the proportion of individuals who progress to hospitalization. A surprising outcome of the practical identifiability analysis was that fixing the proportion of individuals who become hospitalized in fact makes the parameters of the model practically identifiable. The practical identifiability conclusions remain the same regardless of the error structure (Poisson or Normal) chosen. We also investigated whether taking a shorter time series compared to taking a longer time series affected identifiability results. For the most part identifiability improved with the longer data set but we think that the results were not strong enough and uniform enough to be conclusive.
The investigation of the optimal control problem shows that socials distancing should be practiced at 100% to reduce the disease quickly, while isolation can be practiced at more relaxed conditions, particularly as time progresses. Social distancing should be practiced at 100% for most of the time interval even if the cost of that is very high. High levels of social distancing lead to exceptionally high reduction in the number infected individuals without symptoms and hospitalized individuals which leads to quick curtailing of the pandemic and maintenance of susceptible individuals at higher level.
An important finding of our study is our investigation of social distancing and selfisolation as control measures. We investigated two scenarios: control is active during the entire studied interval of time and the control is active only when the lock down was in effect, which is a subset of time. With the first scenario both optimal controls gradually go to the maximum and stay there for the remainder of the studied duration. In this case, the epidemic states are well controlled and reach peaks at least an order of magnitude smaller than with the second scenario when the control is applied with delay. In the second scenario, the optimal controls jump to the maximum and stay there for the entire allowed duration. However, the numbers in the infected classes are much larger, and as soon as the control is stopped, these classes start growing in magnitude. The observation we make here support again the rule of thumb that control measures have to be applied as early as possible in an epidemic as strongly as possible. Our numerical simulations for each of the epidemic states of the proposed model show the critical importance of the timely implementation of control measures, particularly in reducing the number of individuals at risk of exposure and infection from the coronavirus. The implications of these findings are most relevant during the very early phase of the pandemic as the simulations are based on data captured very early-on. To further investigate the trajectories of these epidemic states and parameters as the pandemic progresses beyond its early phase, we fit overlapping epidemic waves to the data of the USA as described by [6]. This allows us to investigate how the virus and epidemic characteristics of the COVID-19 pandemic have evolved over the year since it first became established in the USA. To accurately model the spread of the coronavirus and the role of specific interventions, it is important that we update modelling efforts to account for new virus information, as well as additional relevant interventions such as testing and contact tracing [3,12,17]. It is also important to consider that morbidity and mortality associated with COVID-19 differ among individuals in the population and such trends in risk have also evolved over time [7,24,37]. During the early phase of the COVID-19 pandemic in the USA older age individuals seemed to have experienced a higher burden while in the current months of this pandemic, younger and mid-age individuals seem to experience higher risk of infection. A consistent observation with this pandemic is that individuals of older age, and higher pre-existing comorbidities such as heart disease, diabetes and obesity have a higher risk of infection and severe COVID-19 experience [15,25,28].
In summary, this study presents a COVID-19 model that investigates the spread of SARS-CoV-2 and the role control interventions play to reduce the burden associated with the COVID-19 pandemic. In particular, our study of the structural and practical identifiability of the proposed model highlights the importance of parameter identifiability and the limitations inherent to uncertainty in data available, particularly during the early phase of a novel pandemic. We have presented some unique methodology for turning unidentifiable into identifiable parameters. Using a mixed approach to the problem of parameter identifiability, we are able to provide specific guidance about the identifiability of a subset of parameters that can be useful in guiding some of the model predictions. We further investigate the relevance of control measures through social distancing efforts and self-isolation. We show that the timely implementation of control measures can be extremely valuable to reduce the burden of this novel pandemic.