Building Mean Field State Transition Models Using The Generalized Linear Chain Trick and Continuous Time Markov Chain Theory

The well-known Linear Chain Trick (LCT) allows modelers to derive mean field ODEs that assume gamma (Erlang) distributed passage times, by transitioning individuals sequentially through a chain of sub-states. The time spent in these states is the sum of $k$ exponentially distributed random variables, and is thus gamma (Erlang) distributed. The Generalized Linear Chain Trick (GLCT) extends this technique to the much broader phase-type family of distributions, which includes exponential, Erlang, hypoexponential, and Coxian distributions. Intuitively, phase-type distributions are the absorption time distributions for continuous time Markov chains (CTMCs). Here we review CTMCs and phase-type distributions, then illustrate how to use the GLCT to efficiently build mean field ODE models from underlying stochastic model assumptions. We generalize the Rosenzweig-MacArthur and SEIR models and show the benefits of using the GLCT to compute numerical solutions. These results highlight some practical benefits, and the intuitive nature, of using the GLCT to derive ODE models from first principles.


Introduction
Continuous time state transition models, often formulated as mean field ODE models, are widely used throughout the biological sciences and across multiple scales. Examples include models of multi-species interactions, infectious disease transmission, cell proliferation, and various other applications in which entities transition among a finite number of states (e.g., Allen, 2007;Anderson and May, 1992;Arrowsmith and Place, 1990;Beuter et al., 2003;Clapp and Levy, 2015;Dayan and Abbott, 2005;Edelstein-Keshet, 2005;Ellner and Guckenheimer, 2006;Hirsch, Smale, and Devaney, 2012;Izhikevich, 2010;Keener and Sneyd, 2008a,b;Meiss, 2017;Murray, 2011a,b;Strogatz, 2014;Wiggins, 2003;Yates, Ford, and Mort, 2017). One criticism of mean field ODE models is that they often implicitly assume the time individuals spend in the different states are exponentially distributed, and it is known that the timing of state transitions can very meaningfully affect model dynamics and model outputs in an applied setting (Getz et al., 2018;Krylova and Earn, 2013;Metz and Diekmann, 1986;Metz and Diekmann, 1991;Nisbet, Gurney, and Metz, 1989;Robertson et al., 2018;Wearing, Rohani, and Keeling, 2005). That is, an ODE model with a linear loss rate can be interpreted as implicitly assuming an underlying stochastic state transmission model with an exponentially distributed dwell-time in that focal state. For example, the simple model dx/dt = b − d x is consistent with assuming an underlying stochastic state transition model in which individuals spend an exponentially distributed amount of time with mean 1/d in the state corresponding to variable x.
One remedy to address this issue with ODE models is known as the Linear Chain Trick (LCT; Hurtado and Kirosingh, 2019;Smith, 2010, and references therein), which allows modelers to instead assume gamma (Erlang 1 ) distributed passage times (a.k.a., dwell times). This is accomplished by partitioning a state into a series of k sub-states, where individuals transition sequentially through this ''linear chain" of states. The resulting time spent in this collection of sub-states is thus the sum of k exponentially distributed random variables, and therefore follows an Erlang distribution (if each exponential has the same rate) or a generalized Erlang 2 distribution if the rates differ.
The Generalized Linear Chain Trick (GLCT) (Hurtado and Kirosingh, 2019) extends this technique to allow modelers to assume these passage times follow a much broader family of distributions that includes the phase-type family of distributions (Bladt and Nielsen, 2017a,b;Horváth, Scarpa, and Telek, 2016;Horváth and Telek, 2017;Reinecke, Bodrog, and Danilkina, 2012). This broad family includes exponential, Erlang, hypoexponential, hyperexponential, Coxian and some other named distributions. Intuitively, phase-type distributions can be thought of as the family of all possible hitting time (or absorption time) distributions for continuous time Markov chains (CTMCs). In addition, statistical methods exist for estimating such distributions from data (Horváth, Scarpa, and Telek, 2016;Horváth et al., 2012;Hurtado and Kirosingh, 2019, and references therein) allowing researchers to build approximate empirical distributions into ODE models using a more flexible family of distributions than only the Erlang distributions.
In this paper, we illustrate how to use the GLCT alongside concepts and techniques from CTMC theory to build and numerically solve mean field ODE models using a much richer set of possible underlying stochastic model assumptions. The paper is organized as follows. First, we review CTMCs and phase-type distributions. We then state the GLCT for phase-type distributions and, for comparison, the well-known LCT. In the Results section, we generalize some simple biological state transition models by replacing their implicit assumption of exponentially distributed passage time assumptions with arbitrary phase-type distributions. Lastly, we investigate some of the computational costs and benefits of using this generalized model framework with regards to computing numerical solutions.

Continuous Time Markov Chains and Phase-Type Distributions
To provide proper context for an intuitive understanding of the Generalized Linear Chain Trick (GLCT), we briefly review continuous time Markov chains (CTMCs) with a focus on CTMCs that have a single absorbing state. Our focus will then be on the probability distributions that describe the time it takes to reach that absorbing state starting from one of the transient states, since these absorption time distributions define the phase-type family of probability distributions. The following summaries build upon similar descriptions laid out in Hurtado and Kirosingh (2019).

Continuous Time Markov Chains
Discrete time Markov chains (DTMCs) describe the transition of an individual (or other distinct entity) among a set of n states. The transition probabilities from a state i to a state j (p ij where 1 ≤ i, j ≤ n) are best organized using a transition probability matrix P, where p ij is value in the i th row and j th column of the matrix (P ij = p ij ). For our purposes below, we will restrict our attention to Markov Chains in which the first k = n − 1 states are transient states, and the last (k + 1) state is an absorbing state. This means the system eventually enters this last state and remains there on each subsequent time step with probability 1.
The transition probability matrix P can be written in block form according to these first k = n − 1 transient states (we'll call this set of states X) and the last absorbing state as where P X is a k × k matrix describing transition probabilities among transient states, P a is the k × 1 vector of probabilities of transitioning from the i th transient state to the absorbing state, and 0 is a 1 × k vector of zeros.
In a continuous time Markov chain (CTMC), these transitions don't occur according to a fixed time step, but instead each transition occurs after an exponentially distributed amount of time.
If the individual is in state i, that exponential distribution has rate λ i or equivalently has mean duration 1/λ i . Let R be the vector of the rates λ i , for i = 1, . . . , k. Due to the memorylessness property of exponential distributions transitions from state i to state i can effectively be ignored, thus we can formulate a new transition probability matrix that describes an equivalent Markov chain but where we only track transitions to new states. This reformulated Markov chain is known as the embedded jump process (or embedded Markov chain), and it is described with a transition probability matrix P where the diagonal entries are 0 and the off-diagonal transition probabilities p ij = p ij / j =i p ij . That is, these are just the off-diagonal entries of P with the diagonal set to 0, and the rows normalized so each row sums to 1. For our purposes, since a Markov chain with an absorbing state is not ergodic and therefore does not properly have an embedded jump process representation, the above procedure is only applied to k rows corresponding to transient states.
Thus, the last row of P and P will be the same, so that the last state in the Markov chain remains an absorbing state.
In a CTMC context, the transition probability matrix P and rate vector R are combined into a single matrix called the transition rate matrix Q. The entries of Q can be thought of as the mean-field, per-individual loss rates from each state (along the diagonal) and the transition rates from state i into state j (the off diagonal entries; see below). It has the block form where A is a k × k matrix describing the transition rates among transient states, B is the k × 1 vector of transition rates from the transient states to the absorbing state, and the bottom row is all zeros for the absorbing state.
Matrices A and B are constructed from the entries of the transition probability matrix P and the vector of rates for the exponential distributed dwell-times R as follows. The i th diagonal entry of A is the loss rate −λ i and the rest of the entries in the i th row are the product of λ i and the transition probability p ij . That is, the ij off-diagonal entries of A are the per-individual transition rates from the i th state into the j th state, given by λ i p ij . Since the rows of P sum to 1, the rows of Q sum to 0, and it then follows that vector B is equal to the negative of the row sums of A. Thus, we can write B = −A 1, where 1 is a column vector of k ones.
Finally, assume that the initial state of such a CTMC is one of the k transient states. Let α be the length k column vector of probabilities (i.e., k i=1 α i = 1) that define the initial state distribution over these k states.
Note that, for CTMCs which have k transient states and 1 absorbing state, all of the information necessary to describe the CTMC is contained in the transition rate matrix for transient states, A, and initial distribution vector α. As detailed next, these quantities are also sufficient to parameterize the corresponding phase-type distribution.

Phase-Type Distributions
With the above family of CTMCs in mind, let T i be defined as the duration of time that it takes to first reach the absorbing state, given the CTMC starts in the i th transient state. We call this an absorption time. More generally, let T be the absorption time given that the initial state is determined by the initial probability vector α (i.e., T follows the mixture distribution of random variables T i with mixing probabilities α). Phase-type distributions are the family of absorption time distributions for all such T described above.
The most familiar examples are the exponential distribution, and the Erlang distribution (i.e., those gamma distributions that have an integer shape parameter k ≥ 1) which can be thought of as the sum of k independent exponentially distributed random variables, each with rate r. Erlang distributions can be parameterized in terms of their mean µ and coefficient of variation c v (the standard deviation over the mean), or their rate r and shape k, where µ = k r , σ 2 = k r 2 , c v = 1 √ k , and thus, r = µ σ 2 , and k = More generally, phase-type distributions are parameterized by vector α and transient state rate matrix A (as defined in the previous section), and have the probability density function, cumulative distribution function, and j th moment given (respectively) by: Here 1 is a column vector of ones that has the same number of rows as α and A. Note that α and A are not a unique parameterization of a given phase-type distribution, and there are equivalent parameterizations using vector-matrix pairs of the same dimension as well as different dimensions. Phase-type distributions can be classified as cyclic (transient states can be visited infintely often) and acyclic (transient states can only be visited once). This family of distributions has been relatively well studied in the queuing theory literature, and elsewhere, and readers are encouraged to consult Altiok (1985), Bladt and Nielsen (2017a,b), Horváth, Scarpa, and Telek (2016), Horváth et al. (2012), Reinecke, Bodrog, and Danilkina (2012), and Reinecke, Krau, and Wolter (2012) for further details.
Additionally, freely available computational tools such as BuTools for Matlab and Python (Horváth andTelek, 2017, 2020) enable researchers to fit phase-type distributions to data. This fact, combined with the Generalized Linear Chain Trick, allows for the construction of ODE models that incorporate empirically derived distributional assumptions for the time spent in a given state.

Generalized Linear Chain Trick
The GLCT provides modelers with a direct way to take an existing ODE model that includes a state that has an exponentially distributed dwell time, and obtain a new set of ODEs where that exponentially distributed dwell time has been replaced with a phase-type dwell time distribution. This is done by partitioning that focal state into a set of sub-states and using the GLCT to write the new systems of ODEs that govern those sub-states using the matrix and vector parameterization of the assumed phase-type distribution. This technique can also be used to implement the classic Linear Chain Trick (LCT), since Erlang distributions (i.e., gamma distributions with integer shape parameters) are a subfamily of phase-type distributions.
The GLCT in its most general form (Hurtado and Kirosingh, 2019) extends the GLCT for phasetype distributions to the scenario where the rates and probabilities in the CTMC framework described above can vary with time. Here, we only provide a statement of the GLCT for phase-type distributions: Theorem 1 (GLCT for phase-type distributions [Corollary 2 in Hurtado and Kirosingh, 2019]). Assume individuals enter state X at rate I(t) and that the distribution of time spent in state X follows a continuous phase-type distribution given by the length k initial probability vector α and the k × k matrix A. The mean field equations for these transient sub-states x i are given by where the rate of individuals leaving each of these sub-states of X is given by the vector (−A 1) • x, where • is the Hadamard (element-wise) product of the two vectors, and thus the total rate of individuals leaving state X is given by the sum of those terms, i.e., Note that the influx of individuals at time t (at rate I(t)) is distributed across the sub-states of X according to the initial distribution vector α, and the second term in eq. (5) describes both the movements among sub-states of X as well as the loss rate from the state X from each sub-state.
The Linear Chain Trick (LCT) has been known for decades, and is special case of the GLCT for phase-type distributions stated above (Hurtado and Kirosingh, 2019). Here we give a formal statement of the LCT, which assumes an Erlang distributed dwell time, with shape parameter k and rate parameter r.
Corollary 1 (Linear Chain Trick). Consider the GLCT above. Assume that the dwell-time distribution is an Erlang distribution with shape k and rate r (or if written in terms of shape k and mean τ = k/r, then use rate r = k/τ ).
Then the corresponding mean field ODE equations for the k sub-states of X are where the total loss rate from state X at time t is the loss rate from the final sub-state, r x k (t).
Proof. The phase-type distribution formulation of the Erlang distribution with shape k and rate r is given by v and M below, and substituting these into eq. (5) which yields the desired result.
See Hurtado and Kirosingh (2019) for a direct proof that uses a recursive relationship between Erlang density functions and their derivatives.

Results
In the sections below, we extend two well-known models using the GLCT by replacing the implicit exponentially distributed dwell time assumptions of these models with phase-type distribution assumptions. These more general model formulations can also be used as a way to formulate models that could otherwise be derived using the standard LCT (i.e., the assumption of Erlang distributed dwell times). This may be the more desirable approach since the phase-type formulation of such models can be more practically and computationally advantageous to work with, which we show in section 2.4.

Rosenzweig-MacArthur Predator-Prey Model
Maturation delays in population models can influence model outputs, although such delays are not always incorporated into models used in applications (Robertson et al., 2018). In this section, we illustrate how one can use the GLCT to incorporate phase-type maturation times into such population models, using the widely used Rosenzweig-MacArthur model of predator-prey (consumer-resource) dynamics (Murdoch, Briggs, and Nisbet, 2003;Rosenzweig and MacArthur, 1963): In the absence of predators (P ), the prey population (N ) is subject to logistic growth, and predators consume prey following a Holling's type II functional response (Dawes and Souza, 2013;Holling, 1959a,b;Murdoch, Briggs, and Nisbet, 2003). Predators will then live for an exponentially distributed lifetime with mean 1/µ.
One approach to incorporating a maturation delay of duration τ is to formulate a delay differential equation (DDE), as in (Xia et al., 2009): This can be thought of as the limit of a distributed delay model, with mean delay time τ , for which the variance or coefficient of variation goes to zero. This corresponds to a delay distribution with point mass at τ (i.e., the distribution can be described with a Dirac delta function). The LCT has long been used to approximate such limiting cases in DDE models by assuming instead a delay distribution that is Erlang distributed with mean τ and a very small coefficient of variation, i.e., a large shape parameter k = 1/c 2 v (Hurtado and Kirosingh, 2019;Smith, 2010). Writing this approximating model, as in Hurtado (2020), yields the Rosenzweig-MacArthur model with Erlang distributed maturation time in the predators: The sub-states x j , j = 1, . . . , k, track the immature stages of the predators before they mature.
Using the GLCT, the above model can be generalized in two ways. First, the Erlang distributed maturation time assumption that yields the sub-states x i can be replaced by the assumption of a more general phase-type distribution with matrix-vector parameterization α X and A X . Similarly, the exponentially distributed time duration that predators spend as adults can also be replaced with a more general phase-type distribution with parameter vector α P and matrix A P . According to the GLCT -where x(t) denotes the vector of maturing predator sub-states x i (t), y(t) is the vector of adult predator sub-states y j (t), and where P (t) = all j y j (t) -these assumptions yield the more general model: Observe that eqs. (10) are the special case of eqs. (11) where the phase-type distribution matrixvector parameterization for an Erlang distribution with mean τ and shape k is given by and for an exponential distribution with rate µ, Note that eqs. (10) are a much more compact way of formulating such generalized models without the need to specify the number of sub-states. As shown below, this formulation allows modelers to write more efficient computer code for computing numerical solutions to such models, and also can be used with computer algebra systems to generate ODEs like eqs. (10) from first principles.

SEIR Model
SIR-type models of infectious disease transmission are widely used in the study of infectious diseases, and can help inform public health efforts to limit the spread of infectious diseases (Anderson and May, 1992;Diekmann and Heesterbeek, 2000;Keeling and Grenfell, 1997;Kermack and McKendrick, 1927, 1932, 1933, 1991aLloyd, 2009;Wearing, Rohani, and Keeling, 2005). For example, such models are currently being used in response to the ongoing SARS-CoV-2/COVID-19 pandemic.
It is known that including a latent period prior to the onset of symptoms and infectiousness, as well as incorporating non-exponential distributions for the time spent in the different infection states, can be important to include in models that are being used in such applications (Feng, Xu, and Zhao, 2007;Wang et al., 2017;Wearing, Rohani, and Keeling, 2005).
Here we use the GLCT to formulate a more general SEIR model where we assume that the latent period (time spent in state E) and infectious period (time spent in state I) follow independent phase-type distributions. For simplicity, we assume the state variables have been scaled by the total population size so that S + E + I + R = 1, and that there are no births or deaths in the model.
To begin, consider this simple SEIR model, where S is the fraction of susceptibles in the population, E the fraction of exposed individuals with latent infections, I the fraction of individuals with active infections, and R the fraction of recovered or removed individuals: Next, assume the latent period distribution is phase-type with parameters α E and A E , and the infectious period distribution is also phase-type, but with parameters α I and A I . Let x = [E 1 , . . .] T and y = [I 1 , . . .] T be the column vectors of the fraction of individuals in each of the exposed and infectious sub-states, respectively, where E = E j and I = I i . Then by the GLCT we can write the mean field ODEs for the generalized SEIR model as follows: To assume, for example, an Erlang distributed latent period with mean τ E and shape k E , i.e., rate 1/τ E and coefficient of variation c v = 1/ √ k E , then one would use Similarly, an Erlang infectious period distribution with mean τ I and shape parameter k I (coefficient of variation 1/ √ k I ) would be parameterized by Simplifying the right hand side of eqs. (15) using the matrix and vector definitions above yields the familiar sub-state equations for an SEIR model with Erlang distributed latent and infectious periods, eqs. (18).
Other phase-type distributions could be assumed, e.g., by fitting non-Erlang phase-type distributions to data using computational tools like the free software BuTools (Horváth and Telek, 2017, 2020).

SEIR Model with Heterogeneity Among Infected Individuals
The examples above illustrate how an existing DDE or ODE model can be generalized by assuming that states with fixed or exponentially distributed dwell times instead have phase-type distributed dwell times. Here we take the generalized SEIR model above and use the GLCT to further explore more complex model assumptions. We do this by considering two special cases of this generalized model (see Figs. 1, 2): one that models hospitalization in a manner that does not change the distribution of time in the infected class, and a second case that models heterogeneity in the severity and duration of disease. In each case, we assume that infected individuals will either experience mild or severe disease with the potential for distributional differences in the duration of infection.
For simplicity, here we assume the removed class R contains both recovered and deceased individuals, and that, upon transitioning from the class of individuals with latent infection (E) to the infectious class (I), a fraction ρ will go on to develop severe symptoms (in state I s ) that may require hospitalization, whereas the remaining fraction (1 − ρ) do not develop severe disease and instead enter a different state of more mild disease (I 0 ). Using the GLCT, the states I 0 and I s are partitioned into sub-states, where the numbers in each are described by vectors y 0 and y s , respectively, and their dynamics obey the following system of ODEs: Eqs. (19) can be viewed as a special case of eqs. (15) defined in terms of a mixture of two phase-type distributions, where y = [y 0 T , y s T ], the vector α I = [(1 − ρ)α I 0 T , ρα Is T ], and A I is the block diagonal matrix A I =diag(A I 0 , A Is ). The two cases below are described in the context of eqs. (19).

Case 1: Hospitalization Independent of Progress Towards Infection Resolution
In this scenario, individuals progress towards recovery or death according to an Erlang distribution with rate λ and shape k y 0 . Independently, they also move towards hospitalization according to an Erlang distributed hospitalization time with rate r and shape k H (for an example of this structure used to model Ebola, but where k H = 1, see Feng et al., 2016). Modeling this with Erlang latent and infectious period distributions for simplicity, and according to the competing Poisson processes motif detailed in (Hurtado and Kirosingh, 2019), yields a sub-state structure as shown in Fig. 1. The matrix-vector pairs α E , A E , and α I 0 , A I 0 are as described above for Erlang distributions.
The matrix-vector pair A Is and α Is are defined as follows 3 . If we order these states starting at the I 11 entry and work across each rows left to right before moving down to the next row (see Fig. 1), then the associated rate matrix has the following block form with each column and row having k H blocks of dimension k 1 × k 1 . This block structure corresponds to each row of the I 1 sub-states shown in the lower portion of Fig. 1 as a k H × k 1 grid of sub-states.
where the diagonal and superdiagonal blocks are R · · · · · · · · · · · · · · · . . . . . . . . . . . . Figure 1: SEIR-type model with heterogeneity in illness severity and hospitalizations that do not alter the infectious period. A special case of eqs. (19) (cf. Fig. 2). See the main text for details. Here the standard LCT has been applied to the exposed (E) state, and the GLCT is applied to the infectious (I) states, using the competing Poisson Process approach (Hurtado and Kirosingh, 2019) to model hospitalizations in a fraction of the infectious individuals. This ensures that the time spent in I is independent of whether or not individuals transition to the hospital. and the bottom right diagonal entry is Together, the matrix A Is and the length k H k Is initial distribution vector α Is = [1, 0, · · · , 0] T complete the parameterization of model eqs. (19) to yield the model structure illustrated in Fig. 1.
Observe that the matrix above has the same diagonal and superdiagonal blocks in all rows except for the last row, for which the diagonal entries are −λ and not −λ − r. Here the dwell time in each sub-state of I s (except for the last row) follows the minimum of two independent exponential distributions with rates λ and r, so by the properties of exponential distributions, those dwell times are each exponentially distributed with rate λ + r. Individuals leaving those sub-states then either move horizontally towards resolving their infections with probability λ/(λ + r), or move downwards towards hospitalization (the last row) with probability r/(λ + r) (Hurtado and Kirosingh, 2019). In the last row, individuals are hospitalized and can only move horizontally towards the resolution of their illness. This structure ensures that time transition to the hospital (the last row) does not impact their overall distribution of time spent in state I 1 .
· · · · · · · · · · · · Figure 2: SEIR-type model with heterogeneity in illness severity in which a fraction of infected individuals experience severe illness and a fraction of those require critical care. This is also a special case of eqs. (19) (see Fig. 1). See the main text for details.

Case 2: Hospitalization With Heterogeneous Need for Critical Care
To further illustrate the flexibility of eqs. (15) and (19), we now consider the model structure illustrated in Fig. 2. In this case, we make similar assumptions to the case above, except for the states that pertain to the fraction ρ of individuals who experience severe illness. Those individuals exhibit an Erlang distributed period of more mild disease, with rate r 1 and shape parameter k 1 . Those individuals then either recover (with probability f ) after an Erlang distributed period of time with rate r R and shape k R , or they become even more ill (with probability 1 − f ) and require hospitalization for an Erlang distributed amount time with rate r C and shape k C . As above, all individuals eventually enter a removed state R which, for our purposes here, makes no distinction between recovery and death.
Comparing Figs. 1 and 2, we see that this second scenario is also a special case of eqs. (19), and only differs from that case in the definition of matrix A Is and the length k 1 + k R + k C initial distribution vector α Is = [1, 0, · · · , 0] T . Ordering the substates of I s from I 11 to I 1k 1 to I R1 to I Rk R to I C1 to I Ck C , then, by the assumptions above, A Is is given by The two examples above illustrate the utility of deriving models using the GLCT and by thinking about deriving model structure from first principles using intuition from CTMCs. This approach can be leveraged for the analytical study of such models (Hurtado and Richards, in prep.), but as we show in the next section it can also facilitate the process of computing numerical solutions to such systems of ODEs.

Benchmarking Numerical Solutions: LCT vs GLCT
Software like Matlab, Python, Julia, and R, have built-in ODE solvers that implement various numerical methods for obtaining numerical solutions to ODEs. In Fig. 3 8)] is included as a baseline). The second and third cases are mathematically equivalent systems. For smaller shape parameters (lower dimension systems) the GLCT model formulation is relatively slower than explicitly writing out the 2M + 1 equations, whereas for larger shape parameters (higher dimension systems) the GLCT formulation becomes markedly faster. This is likely due to the efficiency of the matrix computations. The x-axis values M indicate the number of maturing predator sub-states (k x = M ) and adult predator sub-states (k y = M ), which yields a 2M + 1 dimensional model. Numerical solutions were computed using the ode() function in the deSolve package (Soetaert, Petzoldt, and Setzer, 2010) in R (R Core Team, 2020), using method ode45 with atol=10^-6, for time points 0 to 500 in increments of 1, with r = 1, K = 1000, a = 5, k = 500, χ = 0.5, µ x = 0.5, µ y = 1, N (0) = 1000, x i (0) = 0 (i ≥ 1), y 1 (0) = 10, and y j (0) = 0 (j > 1). See Appendix A for details.  14)] is included as a baseline). The second and third cases are mathematically equivalent systems. For smaller shape parameters (lower dimension systems) the GLCT model formulation is relatively slower than explicitly writing out the 2N+2 equations, whereas for larger shape parameters (higher dimension systems) the GLCT formulation becomes faster, likely due to the efficiency of the matrix computations. The x-axis values N indicate the number of sub-states in each of E (k E = N ) and I (k I = N ), which yields a 2N + 2 dimensional model. Numerical solutions were computed using the ode() function in the deSolve package (Soetaert, Petzoldt, and Setzer, 2010) in R (R Core Team, 2020), using method ode45 with atol=10^-6, for time points 0 to 100 in increments of 0.5, with β = 1, µ E = 4, µ I = 7, S(0) = 0.9999, E 1 (0) = 0.0001, E i (0) = 0 (i > 1), I 1 (0) = 0, I i (0) = 0 (i > 1), and R(0) = 0. See Appendix A for details.
To summarize these results, neither approach performs uniformly better than the other. In both comparisons, low dimensional models (i.e., smaller shape parameters and thus larger coefficients of variation) coded using the more explicit (LCT-based) model formulation, like eqs. (10) and (18), yielded numerical solutions faster than mathematically equivalent models coded using the more general phase-type (GLCT-based) formulation, like eqs. (11) and (15).
For higher dimensional models (i.e., larger shape parameters and thus smaller coefficients of variation), the phase-type (GLCT) formulation of these models outperformed their LCT-type counterparts. This is very likely the result of the matrix calculations used in the phase-type (GLCT) formulation of the models being more computationally efficient.
It is noteworthy that the GLCT-based ODE function only needs to be coded once, as it is agnostic of the number of sub-state variables. In contrast, researchers typically hard-code the number of sub-states for such computations using an LCT-based model, and must write multiple ODE functions to consider model outputs using different shape parameters for the assumed Erlang distributions.
In summary, the GLCT may allow for faster computing times for high dimensional systems and it can simplify writing code for ODE solvers since a single instance of the model can be used to simulate ODE models with an arbitrary number of dimensions.

Discussion
ODE models are widely used in the biological sciences and can often be viewed as mean field models for some (oftentimes, unspecified) stochastic state transition model. Such ODE models are sometimes criticized for their implicit assumption of exponentially distributed dwell times (e.g., exponentially distributed lifetimes of organisms), and their frequent lack of age or stage structure, which may not adequately capture important time lags in the system being modeled, such as the maturation times of organisms or latent periods in disease transmission models.
In this paper, we have provided an overview of the Generalized Linear Chain Trick (GLCT), a relatively new tool modelers can use to improve upon existing ODE models to address these shortcomings, and we illustrate its utility using multiple examples. The GLCT extends the wellknown Linear Chain Trick (LCT) to a broad family of probability distributions known as the phase-type distributions, and also clarifies in a straightforward way how mean field ODEs reflect underlying stochastic model assumptions when viewed from the perspective of continuous time Markov chains (CTMCs). Therefore, we have also provided an overview of CTMCs, and their absorption time distributions in particular. Importantly, the phase-type distributions comprise these absorption time distributions, and include a broad set of named probability distributions including exponential, Erlang, hypoexponential (generalized Erlang), hyperexponential, and Coxian distributions. Freely available statistical tools like BuTools (Horváth andTelek, 2017, 2020) exist for fitting phase-type distributions to data, enabling modelers to build approximate empirical dwell time distributions into ODE models using the GLCT.
We have illustrated how to apply the GLCT by using it to derive extensions of two familiar models: the Rosenzweig-MacArthur Predator-Prey model, and the SEIR model of infectious disease transmission. We showed how two special cases of the generalized SEIR model can be constructed to accommodate additional complexity among infected individuals: the first case models a hospitalization scenario in which the transition to the hospitalized state has no impact on the distribution of the overall time individuals spend sick (cf. Feng et al., 2016), and the second case models heterogeneity in the progression and severity of infection outcomes. These examples illustrate how the GLCT can be used to refine model assumptions in a rigorous manner, but without the need to explicitly derive mean field ODEs from stochastic models and/or mean field integral equations.
Lastly, we showed some of the potential computational benefits of using a GLCT-based approach to ODE model formulation by comparing the time it takes to compute numerical solutions of ODEs using the standard approach versus using a GLCT-based approach. We found that, for low dimensional models, the GLCT-based computations are slower than using a more traditional approach; however for higher dimensional models, the GLCT-based computations were faster. This improvement in computing time is likely the result of the computational efficiency of doing matrix and vector based computations. In addition to faster computation times, another benefit of the GLCT-based approach is that only one ODE model function needs to be written since it is agnostic of the model dimension. In contrast, models that have been extended using the standard LCT typically have the number of sub-states (i.e., the shape parameter for the Erlang distributions) hard-coded, and therefore multiple model functions must be coded to explore different shape parameters.
In conclusion, we hope this work encourages other researchers to think more carefully about underlying model assumptions when deriving ODE models. We also hope this work demonstrates the relative ease with which some basic intuition from Markov chain theory can be used to specify clear model assumptions from first principles, which can then be very quickly realized as one or more mean field ODE models using the GLCT (Hurtado and Kirosingh, 2019).

Funding
This work was supported by a grant awarded to PJH by the Sloan Scholars Mentoring Network of the Social Science Research Council with funds provided by the Alfred P. Sloan Foundation; and this material is based upon work supported by the National Science Foundation under Grant No. DEB-1929522. This work was partly motivated by PJH's participation in the ICMA-VII conference held at Arizona State University, October 12-14, 2019, with travel support provided to PJH from NSF grant #DMS-1917512 awarded to the Organizing Committee of the ICMA-VII conference.

Disclosure statement
The authors declare that they have no conflict of interest.

Appendix A R Code for Numerical Solutions to ODEs
For the complete R code used to generate Figs. 3 and 4, see the Electronic Supplements. The following R code shows a portion of that code, to illustrate how the GLCT-based model formulations differ from the LCT-based formulations.