On the reproduction number in epidemics

This note provides an elementary derivation of the basic reproduction number and the effective reproduction number from the discrete Kermack–McKendrick epidemic model. The derived formulae match those derived from the continuous version of the model; however, the derivation from discrete model is a bit more intuitive. The MATLAB functions for its calculation are given. A real case example is considered and the results are compared with those obtained by the R0 and the EpiEstim software packages.


Introduction
In addition to the incidence and prevalence [1,2], the reproductive number is one of the most useful epidemiologic metrics for monitoring and controlling the development of an epidemic [3][4][5][6][7][8][9]. While the incidence and prevalence tell us something about the size of the epidemic, the reproduction number informs us of its spread rate.
There are two kinds of reproduction numbers: the basic reproduction number, R 0 , and the effective (or time-varying) reproduction number, R. R 0 is defined only at the beginning of an outbreak and 'represents the number of secondary cases that one case can produce if introduced to a susceptible population' [10]. R is defined over the entire period of an epidemic and represents 'the average number of secondary infectious produced by typical infective during the entire period of infectiousness' [11]. Neither R 0 norR is a natural constant, because their value depends on the epidemic model used. For various models used for analyzing properties and for calculation of R 0 , see [12][13][14][15][16][17][18][19][20][21][22][23][24]. For properties and calculation of R, see [25][26][27][28][29][30]. For critical consideration of R 0 , see [31] and references therein.
We first review the formulas for calculating R 0 and R based on the continuous Kermack and McKendrick epidemic model [32,33]. Heesterbeek and Dietz [14] used the following form of the model where I(t) is the incidence rate, S(t) is the density of susceptibles in the population, A(τ ) is excepted infectivity of a person with infection age τ . They define and derived the following expressions where S 0 is the initial density of the susceptible population and r is the natural growth rate. From these expressions, Wallinga and Lipsitch [34], by introducing generation time distribution deduce a formula for calculation of R 0 , which has the form For practical calculation, they offer a discrete version of this formula implemented in the R 0 package [35].
It seems that Fraser [26] was the first who deduce a general formula for calculation of R from the Kermack and McKendrick model. He starts with a generalization of Equation (1) in the form of a continuous renewal equation where β(t, τ ) is the transmissibility. He defines the case reproduction number R c (t) as 'the average number of people someone infected at time t can expect to infect', thus and the instantaneous reproduction number R(t) as 'the average number of people someone infected at time t could expect to infect should conditions remain unchanged', expressed by the formula, Assuming that β(t, τ ) can be factorized as he deduces and from this formula, a discrete version of the formula for calculation of R and R c . The method was implemented in the EpiEstim package [36,37]. We will follow the above ideas, but, unlike the authors mentioned, we will derive formulas for calculating the reproduction number from the discrete form of the Kermack-McKendrick epidemic model. In particular, this derivation does not need the factorization assumption Equation (9), and the derivation of the formula (4) for generation time is a bit more intuitive. For other discrete models, see [38] and references therein.

Kermack-McKendrick epidemic model
Let I n,k denote the number of infectious peoples at calendar time n and infection-age k. Let I n,0 be the number of peoples who became infected at the calendar time n. We call them nth generation. The transition process of the epidemic for the nth generation is as follows I n,0 → I n+1,1 → · · · → I n+k,k → · · · , and the whole epidemic process may be schematically represented in this way [32,33]: Here, t is calendar time and τ is the infection age. Two hypotheses govern the epidemic process. The first one is that a decline in the number of infected in nth generation depends only on age. Thus, we have where 0 ≤ ψ k−1 ≤ 1 is the removal rate. From this, it follows that can be interpreted as the probability that an individual of age k is still infectious [3,19]. Before stating the second hypothesis, we decompose I n,0 to local (I local n,0 ) and imported (I imported n,0 ) cases [37]: The second hypothesis assumes that I n,0 is proportional to the number of contacts each generation has with all susceptible left in time interval n [32]: where φ k is the infective rate at age k and S n is the number of susceptible individuals at the time interval n. Using Equations (12), (13) and (14), we obtain where A k ≡ φ k B k is the infectivity at age k [19,39]; i.e. the number of new infectives in unit time per ineffective contact. Note that the Equation (15) assumes that 'an individual is not infective at the moment of infection' [32], so, starting at the 0th generation, we assume where κ ≥ 0 is a constant and I 0 the number of infected individuals initially. Equations (16) and (15) represent the discrete Kermack-McKendrick epidemic model that we will use to derive the formulas for calculating R 0 and R.

Basic reproduction number
To derive the formula for calculation of R 0 , we start with the epidemic process for 0th generation: Thus, the number of infectious individuals of 0th generation at each time interval is, by Equation (12), and the new infected individuals produced by these infectives are, by Equation (15), The total number of secondary infections produced by 0th generation is thus i.e. the susceptible population remains the same (which would literally mean that the newly infected individual is immediately removed and replaced by a new susceptible, but practically that the population is large), and we set I 0 = 1 and I imported 0,0 = 0 in Equation (16), then I 0,0 = κ, and thus, by definition [14], This is a discrete form of Equation (2). Note. We could set I 0 = 0 and I imported 0,0 = 1, but we want to keep κ in the expression in order to use an empirical law for I, which depends on the initial condition I 0 .
The sequence of infected individuals produced by one infective in a completely susceptible population is thus, by Equation (17), Dividing each term of this sequence by Equation (20), we obtain the sequence of the fractions of infected produced by one infected case: where Thus, w k can be interpreted as the probability that an infective individuals, produces a new case at age k. Equation (23) is a discrete form of Equation (4), and thus, by definition, the generation-time distribution. For more on generation time, see [40][41][42]. From Equations (20) and (23), we can express infectivity at age k as Substituting Equation (24) for A k into Equation (15), and taking I imported n,0 = 0 and S 0 = S 1 = · · · , we obtain This formula cannot be used directly for calculation of R 0 from observed incidences because these are assumed to be produced only by the 0th generation in a wholly susceptible population. Thus, to obtain the formula for computing of R 0 from Equation (25), we assume that infection is a continuous process that in its initial phase is governed by the exponential growth rate where I 0 and r can be determined from observable data by regression analysis. Using Equation (26) and integrating it over a single time interval, we obtain where κ = e r − 1 r .
Substituting Equation (27) for I n,0 and Equation (28) for κ into Equation (25), we obtain Wallinga-Lipsitch formula [34]: This is a discrete form of Equation (5). Note that calculation of R 0 by Equation (29) depends on n; however, practically w k = 0, for k > K, so we should take n = K. For more on the calculation of R 0 by Equation (29), see [35].

The effective reproduction number
By similar reasoning as in the previous section, we will derive the formula for calculation of the effective reproduction number R. For the nth generation, we have the following transition process I n,0 ⇒ I n+1,1 → I n+2,2 → · · · .
Using Equation (12), the number of individuals who are still infective in the nth generation is The sequence of newly infected individuals generated by these infectives is The total number of secondary infections generated by the nth generation is therefore We have two possibilities regarding S n . The first is that, similar to the calculation of the R 0 for the 0th generation, where the population of the initial number of susceptibles does not change, we assume that the current population of the susceptibles does not change If we insert this and I n,0 = 1 into Equation (30), then we can define the instantaneous reproduction number R n as This is a discrete analogue of Equation (8). From Equation (15), we have Extracting A k from Equation (23) and substituting Equation (33) into (32), we obtain the well-known Fraser formula [26]: In particular, if I n,0 = 1 (or any other constant) for all n, then R n → 1 as n → ∞. When I local n,0 = 0 for some n, then, for this n, we have R n = 0. If w k = 1 for k = m and w k = 0 for all other k, i.e. an infected person is practically infectious for only one day, then R n = I n,0 /I n−m,0 . This relation can be used for a quick but crude estimate of R n . Equation (34) is a discrete form of Equation (10). Substituting Equation (24) into (32), and taking κ = 1, we obtain This is a well-known result for the SIR model [11]: the ratio of R and R 0 represents the fraction of susceptibles who are still in the population.
If we abandon condition (31) and put I n,0 = 1 into Equation (30), then we can define the case reproduction number R c,n as We note that none of the formulas (29), (34) and (37), depend on the size of the initial susceptible population. Thus, they can thus be used to calculate the reproduction number from daily incidence reports once the distribution of generation time is known.

Examples
For practical calculation, we write a MATLAB programme using functions that are listed in the Appendix. These functions assume w k = 0, k > K and I n,0 = 0, n > N.
For a numerical example, we take data for Spanish influenza in Germany 1918-19 [43] and compare results of this calculation with results obtained by the R0 package [35] and the EpiEstim package [36].
In Figure 1, we see that the calculation of R c by Equation (37) and those obtained by the R 0 package using the Wallinga-Teunis method [25] are practically identical; the absolute difference is less than 10 −7 . The difference is noticeable towards the end of data; i.e. on the length of the vector giving the generation time. Calculation of R c by formula (37) implicitly assumes that R c,n = 0; i.e. I n,0 = 0 for n > N. In particular, this yields R c,N = 0. The calculation of R c that includes only valid data is therefore limited to n < N − K.
In Figure 2, we see that the absolute difference between the calculation of instantaneous reproduction number by Equation (34) and by function estimate_R from the EpiEstim package is in a range 10 −2 to 10 −1 . The reason for the difference is that the function estimate_R implements a formula for calculating R t , which, for statistical reasons, adds Gamma distribution parameters a in the numerator and 1/b in the denominator of Equation (34) [36]. We note that this has a side effect: R t > 0, even if I t = 0; however, it prevents the possibility of an undefined 0/0 case.

Conclusion
We have shown that the well-known formulae for calculating the various epidemic reproduction numbers can be derived from the classical discrete Kermack-McKendrick epidemic model. An advantage of the model is that it is more intuitive and some of the assumptions of the continuous model can be omitted. In particular, as already mentioned, the derivation does not require a factorization assumption on the epidemic's transmissibility.
The practical example shows that case reproduction number calculated by the formula (37) gives numerically almost the same result as the reproduction number determined by the algorithmically much more demanding Wallinga-Teunis method.

Disclosure statement
No potential conflict of interest was reported by the author(s).