Discrete stochastic analogs of Erlang epidemic models

ABSTRACT Erlang differential equation models of epidemic processes provide more realistic disease-class transition dynamics from susceptible (S) to exposed (E) to infectious (I) and removed (R) categories than the ubiquitous SEIR model. The latter is itself is at one end of the spectrum of Erlang SEIR models with concatenated E compartments and concatenated I compartments. Discrete-time models, however, are computationally much simpler to simulate and fit to epidemic outbreak data than continuous-time differential equations, and are also much more readily extended to include demographic and other types of stochasticity. Here we formulate discrete-time deterministic analogs of the Erlang models, and their stochastic extension, based on a time-to-go distributional principle. Depending on which distributions are used (e.g. discretized Erlang, Gamma, Beta, or Uniform distributions), we demonstrate that our formulation represents both a discretization of Erlang epidemic models and generalizations thereof. We consider the challenges of fitting SEIR models and our discrete-time analog to data (the recent outbreak of Ebola in Liberia). We demonstrate that the latter performs much better than the former; although confining fits to strict SEIR formulations reduces the numerical challenges, but sacrifices best-fit likelihood scores by at least 7%.


Introduction
An SEIR epidemic modelling framework, where S, E, I and R represent the named susceptible (S), exposed (E: infected but not yet infectious), infectious (I), and removed (R: either dead or recovered and now immune) disease classes -and italicized letters S, E, I, and R represent the number of individuals in each of these classes -underpins all infectious disease modelling at the population level [27,52]. This holds whether the models are continuous or discrete-time [21], deterministic or stochastic [2,10], or framed at the compartmental-or individual-level formulation [1,11,24]. Fitting epidemiological models to real data possess significant challenges because most epidemics do not conform to the assumptions underlying the basic SEIR formulation. In particular, populations are rarely spatially homogeneous [25,30,32], and disease transmission varies with age [55] and other individual-level factors, including behaviour [16]. An additional acknowledged weaknesses of the SEIR formulation is that if a group of n individuals enter class X (X = E or I) at time t and exit at rate γ , then the number of individuals still in state X at time t + τ is given by the exponential function n(t + τ ) = n(t) e −γ X τ [56]. This approach has been shown from outbreak data to underestimate the basic reproductive ratio R 0 of an epidemic process [56]. A more reasonable assumption is that individuals will mostly exit around a characteristic diseaseclass-specific residence time ρ X , which corresponds to the latent period and infectious period for any disease [49,56]. Models that address deviations from the above assumptions have been proposed. For example, one way to modify the standard SEIR continuous-time formulation so that the modal exit-time of individuals from disease states E and I is close to empirically measured values, ρ E and ρ I respectively, is to use either a discrete-time formulation in which individuals take a set number of time periods to move through each disease state (e.g. as has been done in modelling SARS [23,37]), or to use a continuous time distributed-delay approach [7,9,13,33]. This latter approach can be modelled using a so-called boxcar formulation: the process by which individuals pass through each disease state (i.e. compartment) X is represented by individuals passing through k X concatenated sub-compartments at a rate γ X in each of the sub-compartments [7,31,33,35] (Figure 1). In this case, the number of individuals n(t) entering at time t still in state X at time t + τ is not given by an exponential function, but rather n(t) multiplied by (1 − F Erlang (τ , k X , γ X )), where F Erlang (τ , k X , γ X ) is the cumulative Erlang distribution with parameters k X and γ X . Note that the Erlang is a special case of the Gamma distribution when the shape parameter k X is a positive integer [33]. Further, the fact that a distributed-delay integrodifferential equation formulation can, under certain assumptions, be turned into an ordinary differential equation formulation of higher dimension, using the so-called 'linear chain trick' has been known in biological modelling for the past 40 years ( [14,40,44], and see [51] for a more recent treatment).
Beyond individuals emerging from a disease state according to an Erlang distribution, a more general discrete-time modelling approach can be taken in which any distribution of emergence can be specified. Here we develop a general discrete-time, distributed-delay formulation, taking the novel approach of identifying disease subclasses that represent the number of days individuals have left before they transfer from the current disease class to the next in the chain (i.e. E to I and I to R, and so on), as depicted in Figure 1. This permits exponential, Erlang, and Uniform distributions, among others, to be used. Beyond comparing our discrete-time formulation that corresponds to continuous-time Erlang formulations, we compare simulation output among the Erlang, Exponential, and Uniform cases, demonstrating some surprising results. For example, the case that yields the highest levels of prevalence switches as the values of the reproductive ratios R 0 increase from low (R 0 = 1.25) to very high (R 0 = 20). We also demonstrate how easily discrete-time formulations can be implemented stochastically, and we compare sample mean ensembles of stochastic simulations to discrete-time solutions. Finally, we compare the efficiencies of fitting continuous-time Erlang models and our various discrete models to real data. To keep our analysis focused, our formulation and simulations ignore the processes of recruitment and mortality. These, of course, are easily added, as we discuss, along with other important extensions such as spatial structure using metapopulation networks. The standard continuous-time SEIR compartmental model has individuals flowing through the E and I compartments at rates γ E and γ I respectively, with rates of flow from compartments S to E and V to S (V is the immune part of R, D is the dead part of R) represented by τ and γ V respectively. The boxcar or Erlang version divides E into k E and I into k I sub-compartments with the mean time that it takes to pass through compartments E and I now given by ρ E = k E /γ E and ρ I = k I /γ I , respectively. The discrete time-to-go model can be regarded as the discrete-time analog of the Erlang model where the proportions (deterministic setting) or probabilities (stochastic setting) p E i , i = 1, . . . , n E and p I j , j = 1, . . . , n I follow Erlang distributions, as described in the text.

Continuous deterministic
The basic equations for the SEIR distributed-delay model [7] are well known. For the sake of completeness, however, we present these equations here in terms of a general transmission function T ) that depends on the total number I tot of infectious individuals in the population. As depicted in Figure 1, if the number of E and I class compartments is k E and k I , respectively, with associated rate parameters γ E and γ I , then the boxcar models takes the form: . . .
The simplest form of the transmission function is T (I) = T (I tot ) = βI tot . It is convenient when applying the model to normalize the transmission rate parameter β > 0 in T by total population size N so that β does not have to be adjusted for populations of different sizes. This ensures that expressions for the basic reproductive ratio, R 0 , of the disease is independent of population size (cf. [4]). This normalization is done by dividing β by N, where If deaths are ignored during the course of the epidemic, then N(t) is constant and the transmission function varies only with I tot (t) over time, otherwise it will vary with N(t) as well. In this latter case, T (I tot )(t) is referred to as frequency-dependent transmission. This representation of transmission is particularly appropriate when contact rates are influenced by processes other than population density, as in sexually transmitted diseases [22]. The ubiquitous density-dependent form τ (I, N) = βI though is useful when population density is low. A more general approach that approximates density-dependent transmission at relatively low populations densities and frequency-dependent transmission at relatively high population densities, for some arbitrary constant L > 0 (to be estimated when fitting the model to data), is given by the transmission function [21,41] The assumption that the transmission rate parameter β is time independent is also problematic. The primary reason that epidemics subside, other than running out of susceptible individuals (which is linked to the so-called threshold effect [22,38]), is that β(t) precipitously falls due to behavioural changes that influence contact rates as the epidemic proceeds (e.g. as in the recent Ebola outbreak in West Africa [12,24]). A remedy for this is to assume that β has the exponential form β(t) = β 0 e −εt (e.g. as in [3]). This approach, however, implies that β declines most steeply at the start of the epidemic -a highly suspect assumption. A more likely situation is that β only starts to decline once public awareness of the full potential of the epidemic becomes apparent. In this case, a reverse S-shaped curve, such as is more realistic, where t c > 0 is that amount of time after the start of the epidemic that it takes for β(t) to fall to half its original intensity, and ε > 1 is an abruptness parameter for the onset of this fall (in a manner analogous to the onset of density-dependence discussed in [20]). In the absence of an independent estimate for t c when fitting models to data, one can estimate β 0 by fitting a constant β model to the early stages of an epidemic, and then use this value to estimate the reproductive ratio parameter R 0 (the number of individuals that the index case can be expected to infect at the start of the epidemic [15,26]). Using the simple frequency-dependent transmission rate (Equation (4)) and assuming that β is constant over time, the model given by Equations (1)-(4) is a five parameter system (β > 0, γ E > 0, and γ I > 0, as well as positive integers k E and k I ) that can be fitted to data to obtain the best estimate of the transmission rate parameter β, and the mean residence times in the disease states E and I, respectively. This also implies that the mean residence from exposure to removal is Here we make use of the fundamental definition of R 0 as the ratio of all newly infected individuals transferring into disease classes E+I to the net change of individuals across classes E+I [15,26]. This is formally equivalent to finding the dominant eigenvalue of the next generation matrix, as defined in [53]. Since we are ignoring births and deaths, this ratio has the particularly simple form Note that ignoring births and deaths can also be regarded as an approximation to situations where the epidemiological process is an order of magnitude faster than demographic processes. This is, however, an assumption that breaks down once disease-induced mortality rates become significantly greater than background natural mortality rates.

Discrete time-to-go formulation
In formulating a discrete-time analogue of Equation (1), we take an approach of placing individuals into subclasses that represent the length of time that each will remain within the disease class under consideration. Thus, we use the notation E i (t + 1) to represent the number of individuals in disease class E at time t+1 who will depart from this class (to enter class I) at time t+1+i. These individuals were either in compartment E i+1 at time t or are newly allocated from the individuals that were infected over the time interval [t, t + 1) using the proportions described below. The same process applies to the compartments represented by the variables I i (t + 1), with individuals transitioning to disease class R. These proportions are denoted by 0 < p E i < 1, i = 1, . . . , n E and 0 < p I i < 1, i = 1, . . . , n I , and represent the proportions of individuals entering the disease classes E and I who only stay in those classes for i additional periods of time. The integers n E and n I are selected so that all individuals (essentially > 99% in the case of populations of several thousand or more individuals) upon entering diseases classes E and I at time t have exited these disease classes on or before times t + n E and t + n I , respectively. In the context of stochastic models, these proportions can be used to characterize the cumulative functions F(x; k E , γ E ) and F(x; k I , γ I ) of two distributions: namely, In our discrete model, we use the variables E 0 (t) and I 0 (t) to represent the number of individuals changing state over the discrete-time interval [t, t + 1] from S to E and E to I, respectively. We also use E tot (t) and I tot (t) to represent the total number of individuals in classes E and I at time t. Over the time interval [t, t + 1), the susceptibles S(t) are assumed to be exposed to I tot (t) infectious individuals (taken to be constant over the interval, which is generally a reasonable approximation), thereby resulting in the infection of approximately susceptible individuals. Hence we obtain the equation: Recall that the equations for the compartments E i (t + 1) need to account for the transfer of individuals from E i+1 (t), as well as the newly allocated individuals p E i+1 E 0 (t). These considerations yield the system: These newly infectious individuals I 0 (t) can themselves be distributed so that proportions p I i , where i = 1, . . . , n I enter the removed class at time t. That is, we obtain the equations: Finally, we need to compute E tot (t) and I tot (t) by applying Equation (2) with n E and n I replacing k E and k I to obtain

Stochastic implementation
The discrete stochastic formulation results in a model that is identical to the discrete deterministic model except for the fact that proportions in Equations (10) and (11) are replaced by random drawings from multinomial distributions, which approach proportions (i.e. the discrete formulation) as R 0 (i.e. β) increases in size and the number of samples grows large.
In presenting the stochastic model, we use the notation to denote that x is a random variable representing the number of heads that might be obtained when flipping a biased coin n times, with a probability p of getting heads on each flip (i.e. a Bernoulli process with probability p). Thus, x is binomially distributed. We will use the notation X to represent an instance of the number of heads (positive outcomes) of an actual n-trial sampling of this distribution with probability parameter p. More generally: In other words, it is the probability of gettingX i positive outcomes of type i = 1, . . . , q when the probability of getting a positive outcome of type i at any drawing is p i , with q i=1 p i = 1, and the total number of drawings is n. To keep our presentation succinct, the convention (X 1 , . . . ,X q ) := MULTINOMIAL n, p 1 , . . . , p q , is adopted in the model formulations below to denote one outcome of a random drawing from the indicated distribution.
With this above notation, the stochastic equivalent of the deterministic model represented by Equations (9)-(12) is . . .

Erlang probabilities
If we want our probabilities to follow truncated Erlang distributions, then we can first select n E such that We can then define the probabilities associated with passage through disease class E as Similarly, we select n I such that Fitting the discrete Erlang model to data involves estimating five parameters: (k E , γ E , k I , γ I , β). It also requires constraining k E and k I to be positive integers. In general, however, we can allow k E and k I to vary continuously on the positive reals, and use gamma functions, which interpolate the Erlang when k E and k I are not integers. The boxcar version of the distributed-delay model (which, in addition to its formulation in terms of k first-order differential equations, can also be formulated as a kth order ordinary differential equation), under assumptions that the rate is constant through all the box cars associated with each disease class, is known to produce Erlang distribution residence times [29]. This implies that the distribution of residence times x is given by the probability density function and, hence, the cumulative distribution function From these we can calculate the probabilities (proportions in the deterministic model) that individuals entering a k boxcar complex at time t, and exiting this concatenation of cars during the time A theoretically simpler, and computationally faster, approach to fitting the discrete model to data is to use one parameter Uniform distributions f Unif (x; 0, a) rather than two parameter Erlang. These probabilities are provided in the Appendix.

Initial conditions and cases considered
Various approaches can be taken to setting initial conditions in comparing discrete and continuous time simulation models. The primary consideration is to have a good way to match up initial values because the continuous model has k E + k I + 2 states while the discrete model has n E + n I + 2 states (where, typically, n X > 2k X , X = E, I). Further, when injecting stochastic models into the comparative study, one needs to be cognizant of the fact that continuous and discrete deterministic simulations are independent of scale -i.e. the size of N -and only the stochastic simulations are affected by the size of N. Specifically, the influence of demographic stochasticity on simulated values reduces at a rate proportional to 1/ √ N. In our first set of simulations, we selected S(0) = 10, 000, but varied S(0) from 1000 to 10 6 in subsequent sets of simulations that explore aspects of demographic stochasticity and compare the efficiency of fitting discrete versus continuous models to simulated data. To initiate the start of an epidemic, we placed one individual in the last Esubcomponent class, which corresponds to setting E k E (0) = 1 in the continuous time model and E 0 (0) = 1 in the discrete-time deterministic and stochastic Erlang models. In all of the simulations, the initial values of the remaining E and I subcomponent states were set to 0, as was R(0).
For purposes of illustration and comparison of simulations we consider continuous and discrete-time implementations for cases involving k E and k I equal to 5 and 10, as well Table 1. Simulation cases 1-33 are sorted from the shortest (5 days) to longest (10 days) mean infectious period, and then subsorted within infectious periods from shortest to longest total duration (exposed plus infected) and within that from least to most distributed-delay in infected and then exposed.

Differences among continuous deterministic solutions
Continuous-time, deterministic simulations of prevalence levels over time (i.e. plots of I tot (t) against t) for Cases 1-32 are illustrated in Figure 2(a). These 32 cases can be organized into four groups, each of which yields the same set of (ρ E , ρ I , ρ E+I ) values, differing only in whether 5 or 10 boxcars were used in each of the E and I classes ( Table 1). The main difference among the four groups corresponding to a particular set of values (ρ E ,ρ I ,ρ E+I ) is the amount of variation associated with the distributions around ρ E and ρ I : specifically, 5-boxcar formulations with associated progression rateγ have larger variation around ρ X than 10-boxcar formulations with associated rate 2γ . These four groups of four cases each were implemented in the context of two values of β to yield corresponding values of the reproductive ratio R 0 . The resulting eight sets of four cases fall very neatly into eight sets of outputs (Figure 2(a)), indicating that the selection of 5 versus 10-boxcar formulations makes very little difference in terms of the projected levels of disease prevalence associated with a given (β,ρ E ,ρ I ,ρ E+I ) combination. Clearly, as expected, lower R 0 leads to significantly lower peak prevalence, but as we see in Figure 2(a), it also results in longer outbreaks.

Differences among deterministic and mean stochastic solutions
The discrete Erlang model produces very similar results to its corresponding continuous Erlang model, as we see in Figure 3 for both Cases 29 and 32, although the discrete peak is slightly lower. The mean of the corresponding stochastic ensemble for the case N = 10,000 is slightly lower than either deterministic case.  [42,46,47,50], which we designate here as an initial fadeout (IFO) and breakout (BO) The basic disease reproductive rates (R 0 ) and mean residence times in the exposed (ρ E ) and infectious (ρ E ) classes for each of the eight groups of four cases, as detailed in Table 1 Table 1. component. Instantiations of the stochastic model that belong to the IFO component, occur with probability P IFO , and those belonging to the BO component occur with probability 1 − P IFO . It is only the BO component that can be approximately modelled by a corresponding deterministic system [46], provided N is sufficiently large. The size of the IFO component is strongly linked to the critical value R 0 = 1. For R 0 ≤ 1 only the IFO  Table 1. Note that mean of the non-fadeout ensemble is averaged over only those that do not initially fadeout. component exists, and solutions are referred to as stuttering transmission chains [8,39]. When R 0 > 1, the size of the IFO stuttering transmission chain component rapidly diminishes with increasing R 0 . For example, in generating the prevalence profiles in Figure 3, which corresponds to the case R 0 = 10 (β = 1), no stuttering chains were evident in our ensemble of 100 runs. Selecting β = 0.25 and repeating the exercise for Runs 1 and 4 for the case N = 10,000 -i.e. R 0 = 1.25, we obtained the prevalence profiles plotted in Figure 4. Now, 14 and 9 of the 100 Monte Carlo instantiations for Cases 1 and 4, respectively, were stuttering transmission chains that faded out before they could break out.
IFO rates vary somewhat among repeated sets of ensemble runs and are also affected by the size of the population. In particular, IFO rates are much more difficult to assess in small than in large populations because the sorting of runs into the IFO and BO components is much more difficult when N is relatively small. This problem is evident from the results obtained from simulations conducted for varying N, as reported in the next section.

Small and large populations
When the transmission rate is normalized by the value N, as we have in Equation (4), the solutions to the continuous and discrete deterministic model are scale-independent with regard to population size N. That is, epidemics in isolated towns of 10,000 individuals are per-capita the same as epidemics in isolated mega-cities of 10 million individuals. This is not the case for stochastic populations because demographic stochasticity is not scale-independent. Rather, due to sampling processes for which the variance declines in proportion to sample size, the effects of demographic stochasticity are proportional to 1/ √ N. But even large populations are structured into smaller subpopulations that may not be well-mixed on the time scale of an epidemic. This is particularly true of local school populations where the same group of several hundred children aggregate for a substantial portion of the day, on five of the seven days each week. Thus, disease outbreaks in mega-cities, particularly those associated with children, may behave somewhat like disease outbreaks in interconnected sets of large villages or small towns, with large population behaviour depending on suitably high transfer rates of individuals among localities.
To gain insights into the effects of population size, we will consider outbreaks in isolated groups of N = 500, 2000, and 50,000 individuals. For the scenario N = 500, 90/500 runs (i.e. 18.0%) belonged to the initial fadeout component (IFO), while for the scenarios N = 2000 and N = 50,000 the IFO contained 21.0% and 18.2% of the runs, respectively.

Discrete model convergence
In Figures 3 and 4, and the third column of Figure 5, we have plotted the average of a number of instantiations of our stochastic Erlang model, which looks relatively smooth over an ensemble of 100 runs, except when R 0 is small (i.e. R 0 = 1.25) or N is small (i.e. N = 500). It remains unclear precisely which circumstances enable the convergence between the mean of an ensemble of stochastic instantiations and the solution of its discrete counterpart. This convergence depends, at least, upon the following three aspects:  Figure 5. To see Figure 6. The left, middle, and right columns are plots of prevalence obtained from the discrete deterministic model (blue curves) and averages over ensembles of 100 runs (n = 100) of the discrete stochastic Erlang models (red curves; broken lines are ± one standard deviation around solid curve mean) for Cases 1, 17, and 33 (cf. Table 1). the effects of N and R 0 on the mean of ensembles of instantiations of the stochastic model, consider the (γ E , k E , γ I , k I ) = (5, 5, 1, 1) configuration of the model for the cases β = 0.25 (R 0 = 1.25), β = 1 (R 0 = 5), and β = 2 (R 0 = 20), which correspond to, Runs 1, 17, and 33 in Table 1, for scenarios N = 10 3 , 10 4 ,and 10 6 . The results obtained are presented in Figure 6. From these results, we see that ensembles larger than n = 100 are needed to smooth out the curves generated when R 0 and N are both relatively small (e.g. R 0 = 1.25 and N = 10 3 ; top left panel), but are quite smooth when either N is relatively large (e.g. R 0 = 1.25 and N = 10 6 ; bottom left panel), or when R 0 is somewhat larger (e.g. R 0 = 5 and N = 10 3 ; top middle panel). When R 0 = 1.25, the ensemble mean underestimates the deterministic solution (left column of panels) by 25-30%. This falls to just under 10% when R 0 = 5 (centre column of panels), and below 2% when R 0 = 20 (right column of panels).

Model comparisons
We compared the deterministic solutions obtained from the (k E , k I , γ E , γ I ) = (1, 5, 1, 5) Erlang models (both discrete and continuous) with their R 0 -equivalent deterministic Uniform and Exponential discrete-time models for the scenarios R 0 = 1.25 (β = 0.25), R 0 = 5 (β = 1) and R 0 = 20 (β = 0.4). The prevalence curves obtained, normalized to proportions (since deterministic solutions do not depend on N) are presented in Figure 7. Except in cases with a low R 0 value (top panel), the Uniform provides a much better fit to the Erlang model than the Exponential does. Figure 7. The prevalence in the population (normalized to a proportion) is plotted for the continuous and discrete (k E , k I , γ E , γ I ) = (1, 5, 1, 5) Erlang models (dotted and blue curves respectively), as well as their Uniform (brown curve) and Exponential (green curve) discrete-time counterparts for the cases β = 0.25, 1 and 4 (R 0 = 1.25, 5, and 20 respectively).

Fitting models to Ebola data
Using published data from the Ebola outbreak in Liberia during the 74 weeks following initial reporting in March of 2014 (compiled by [5]), we fitted several of the proposed formulations to real outbreak data. The models tested on the Ebola data (the discrete and continuous Erlang) were compared to the discrete and continuous versions of the standard SEIR formulation (with an exponential distribution governing transitions between compartments) to judge the utility of the alternative methods for model fitting. Because the transmission rate was treated as a constant throughout the epidemic, only the initial quasi-exponential increase was used for the purposes of fitting. In order to fit these models to the subsequent decline in infections (when the population of susceptible hosts is not exhausted during the initial increase), assumptions regarding the form of the change in the transmission parameter (β) must be made. While some have used an exponential decay to represent this shifting β value over time [3], others have used threshold functions [34] or related the transmission parameter to environmental covariates like seasonality [43]. Fitting to the initial increase in the epidemic should be informative with regard to the ability of each alternative formulation to fit to real epidemic data without requiring a potentially confounding assumption about the form of the change in the transmission rate over time. As in [3], a total population size of 10 6 was used for each model, and the epidemic was initialized with a single infected case.
The standard SEIR formulation, which consists of three parameters (β, γ , and σ ), converged relatively quickly, as indicated by the resulting multivariate potential scale reduction factor (PSRF; a metric of MCMC convergence associated with Gelman and Rubin [19], where values close to 1 indicate success). The negative log-likelihood of the discrete model was lower than that of the continuous formulation, no matter the number of iterations, despite being slightly less computationally efficient. The MCMC sampling algorithm mixed well in both cases, with acceptance rates between approximately 0.1 and 0.6 generally treated as suitable (Table 3). Importantly, the R 0 values for each of the models were fairly consistent with one another, ranging from 4.25 to 4.62 (Table 4).  Note: Three of the parameters (σ , γ , and R 0 ) were estimated directly using an MCMC approach, whereas β was derived based on the other parameter estimates. Note: The computational time required to run six parallel chains, the mixing rate (as determined by the mean acceptance rate across all chains) and convergence (PSRF), as well as the negative log-likelihood of the best fitting chain are displayed. Four different sets were run, each with a different number of iterations to illustrate how the diagnostic statistics and fit changed with additional time. The asterisk indicates the fact that final run of the continuous Erlang was analysed using a longer burn-in period of 200,000 iterations (40%) rather than the 20% used for the other runs.
The Erlang models have more dynamic model structures than the standard SEIR formulations due to the complications involved in altering the number of compartments through which exposed and infected individuals move. This resulted in substantially longer computation times for these models compared to the simpler SEIR models ( Table 5). The Erlang formulations also required more iterations to converge consistently on the same parameter values. Despite nearly optimal mixing rates, the six chains of the discrete Erlang model still had PSRF values substantially higher than 1, indicating that even more iterations (and computational resources) were required for the six chains to converge on the same parameter  Note: In the discrete case, three of the parameters (β, γ E , and γ I ), whereas the shape parameters (k E and k I ) were fixed at 5. These shape parameters were multiplied by 3 to determine the number of boxcars in the E and I classes to ensure that ∼ 99% of individuals made it through the E and I states. Finally, the R 0 values were derived from the other parameter values. In the case of the continuous model, five parameters (k E , k I , γ E , γ I , and R 0 ) were fit directly with the MCMC sampler, while β was derived from the other parameter estimates. The asterisk indicates the fact that final run of the continuous Erlang was analysed using a longer burn-in period of 200,000 iterations (40%) rather than the 20% used for the other runs.
estimates. The best fitting chains of the discrete Erlang model (after at least 100,000 iterations) did fit more closely to the data than the best fitting models from the discrete standard SEIR model (Table 5; Figure 8). This indicates that the discrete Erlang offers a more accurate representation of epidemic dynamics than the simple SEIR model, but the additional computation time and the failure to converge consistently on the same parameter values makes the method much more demanding than the approximation offered by the standard SEIR model. Notably, the R 0 values estimated using the Erlang models ranged from 3.76 to 4.60, a slightly larger range than the standard SEIR models (likely due to the lack of convergence), but similar in magnitude to those obtained using the simpler alternatives (Table 6). In both cases, these R 0 values are higher than previously published estimates for Ebola, which have ranged from 1.35 to 3.65 [28].
The discrete stochastic Erlang model was not used to fit the empirical epidemic data because of the complexities involved in the interpretation of the results. Even when averages over a substantial number of instantiations are used to calculate the fit associated with a particular parameter set, the uncertainty bounds surrounding this mean may be just as important in producing the empirical data as the mean itself. A real-world epidemic is indeed a stochastic process, but it is unclear whether the epidemic represents an instantiation at one of the extremes around the set of parameters that govern the process or the mean or a realization somewhere in between. This fact hinders model fitting procedures in epidemiology when using a deterministic model as well. However, due to computational limitations, the most efficient method likely involves fitting a discrete deterministic model first, and then exploring the potential spread that an epidemic can exhibit around those resulting parameters using a stochastic formulation of the model.

Conclusion
Continuous-time Erlang SE n I m R models may be aesthetically appealing because epidemic events can arise at any time (i.e. time is continuous), and humped-shaped specific diseaseclass residence times provide better fits to real data than the exponentially distributed residence times associated with the standard single component SEIR model [49,56]. Data, however, are usually aggregated and binned into discrete-time intervals, not only for convenience, but also because the precise times of actual events (e.g. when exactly a particular individual was exposed to a dose of pathogen, or when exactly a particular individual became or ceased to be infectious) is not usually known. Thus, continuous models cannot be regarded as intrinsically more accurate than discrete models, particularly when noise within the data is taken into account. All else being equal, we should choose models that are computationally more tractable. This is particularly the case for discrete-time models where the interval is synchronized with the temporal aggregation of data, typically daily or weekly for fast epidemics, or monthly or quarterly for slow epidemics. Further, when extending formulations to incorporate stochasticity, discrete models are more easily implemented than having to apply a Gillespie or modified Gillespie algorithm to simulate continuous time stochastic processes, which may also require an unrealistic Poisson process assumption [17,18,54]. No matter how little or much stochasticity is associated with environmental factors, demographic stochasticity always plays a major role in epidemics because all epidemics start with a small number of infected individuals. This implies that even for values of R 0 comfortably larger than 1 (e.g. in the range 2-4), the fadeout of an epidemic before it can breakout is an event we can expect to occasionally observe [2,10,24,45,47,48].
The real challenge in applying epidemic models is to either strategically identify key factors that lead to outbreaks (thereby providing a guide to reducing the risk of such outbreaks) or tactically manage ongoing epidemics through appropriate interventions. In both cases, we should avoid focusing our computational efforts on model fitting at the expense of numerical simulations to obtain broad comprehension: the imperative to find the best fitting set of parameters for the model at hand comes at the expense of obtaining a deeper understanding through exploratory simulations. This exploration includes a full assessment of possible ranges of outcomes due to stochastic factors (both demographic and environmental). Additional exploration requires that we incorporate factors that profoundly affect the course of epidemics, yet are not included in our SEIR formulations. Specifically, here we addressed the issue of replacing exponential residence times in disease classes with the more realistic distributed-delay residences times. We also overcome the issue of modelling declining transmission rates over time by fitting our models only to data from early stages of the epidemic.
Future studies that fit models to data that continue beyond the exponential outbreak phase need to include time-varying transmission rate functions of the form depicted in Equation (5). Equally important for many diseases is eliminating the assumption that the epidemic is occurring in a spatially homogeneous population. In these cases, models need to include spatial structure. This is most easily done by assuming a metapopulation structure: i.e. a small network of relatively large homogeneous subpopulations [6,30,36]. In the context of epidemics, spatial structure typically has two major components. The first is the way populations are distributed into major population centres, each of which can be assumed to constitute a homogeneous subpopulation. The second is to appropriately characterize the movement of individuals among subpopulations [32], particularly individuals who are infected, but still asymptomatic (e.g. individuals who are in class E, but make the transition to class I once they have moved).
Whatever approach is taken in future studies to modelling epidemics in populations with notable substructures -whether that substructure depends upon spatial organization, age, behavioural, or alternative means of categorizing individuals -computationally efficient stochastic models are needed. Further, these models should exhibit realistic residence-time properties for individuals within disease classes. The discrete time-to-go model formulated here represents a useful step towards satisfying the latter proposition.