COMBOS2: an algorithm to the input–output equations of dynamic biosystems via Gaussian elimination

Differential algebra (DA) methods are currently being exploited for analyzing dynamic biosystem models for their structural identifiability (SI) properties. An early step in this approach entails finding an equivalent input–output (I/O) model. A recent approach for finding these equations, based on Grbner bases and imbedded in the app COMBOS is relatively slow and sometimes unsuccessful, even for moderate size models. In this paper, we propose a faster algorithm, using a Gaussian elimination approach for analyzing the subclass of linear dynamic biosystem models. We apply this methodology to find the simplest set of globally SI parameter combinations of these models and show that it works effectively for linear biosystem models of simple to moderate complexity.


Introduction
Structural identifiability (SI) analysis deals with the problem of finding the solutions from input-output {u(t), y(t)} data [1,2] for the unknown parameters p of a dynamic system model of the form(1) with state and output equations: x(t, p) = f (x(t, p), u(t), t; p), t ∈ [t 0 , T] y(t, p) = g(x(t, p); p), ( 1 ) where p is a k-dimensional parameter vector, x is an n-dimensional state vector, y is an m-dimensional output vector, u is an r-dimensional input vector, and f and g are rational polynomial functions of their arguments, a reasonable assumption in most applications [3]. Dynamic biosystem models typically have this form, with function f being polynomial, ratios of polynomials, or representative of linear compartmental biomodels in, e.g. tracer perturbation experiments on nonlinear biosystems. Parameter p ∈ p is said to be SI locally if there are finite number of distinct solutions for it, and globally SI if there is only one solution. It's unidentifiable if there are uncountably number of solutions. Nevertheless, even when a model has unidentifiable parameters, finding useful combinations of parameters that are structurally identifiable is always possible. Furthermore, these combinations of parameters can often be utilized to reparameterize the original model into a structurally identifiable one. Practically speaking, SI is a prerequisite for finding parameter values (or values for parameter combinations) in a real "noisy" data problem, usually known as the numerical identifiability problem.
Differential algebra (DA) methods have been illustrated quite applicable in dealing with global and local SI properties of dynamic system models [4][5][6][7]; and many software tools have been elaborated and implemented using many differential algebra algorithms [7,8]. but, they are restricted by complexity of computational algebraic or other difficulties, and are then far limited in application to models of low to moderate complexity.
An early and critical step in DA identifiability analysis is finding the model input-output (I/O) equations, by first eliminating its state variables. To make the I/O equations monic, the coefficients c(p) of them are fixed uniquely by normalizing the Equation [7]. Structural identifiability is then recognized by checking the injectivity of the coefficients c(p), in other words, the model (1) is globally identifiable if and only if c(p) = c(p * ) implies p = p * for arbitrary p * [7]. Therefore, recognizing structural identifiability is facilitated to finding the solutions to c(p) = c(p * ) using many differential algebra algorithms.
Several methods are available for finding the I/O equations, including Ritt's pseudodivision [9], a Gröbner Basis alternative to Ritt's pseudodivision (GRA), developed by Meshkat et al. [3], and some other primary elimination methods [10][11][12][13]. All are encumbered by the computational algebraic complexity encountered in finding the I/O equations and are thus limited to relatively small to moderately complex models [14][15][16].
Gröbner Basis computations in current algorithmic approaches are a major encumbrance to solution speed, in some cases causing one symbolic algebra programme (COMBOS 1 ) to hang, thereby providing no solutions, or running on for several hours in another DA programme (DAISY 2 ) before providing SI solution both for a not-so-complex linear biosystem model (see Section 8 below). Gröbner Basis computations can be quite slow, even for relatively small problems; and increasing model complexity by even a small amount (the number of equations, equation terms, or unknown parameters) can have a great impact on speed and efficiency, especially for finding the I/O equations.
In this paper, we describe an algorithm that simplifies computation of and accelerates the speed of determining the I/O equations, limited only to linear ODE models. In brief, we use a Gaussian elimination approach directly applicable to linear biosystem and other ODE models [17][18][19][20], replacing the Gröbner Basis-based GRA Method in COMBOS for obtaining I/O equations when the ODEs are linear. We also extend some results associated with the GRA algorithm, finding a strict bound on the minimum number of derivatives of the model equations required for finding the I/O equations of linear ODEs. The new methodology is admittedly limited, but it is a step forward, because even some linear models present major encumbrances to ID analysis overcome by the updated algorithms in this new version of COMBOS.

Algorithm for finding the I/O equations
To obtain the I/O equations for (1), we need to first eliminate the state variables and their derivatives. For linear ODE models, if the number of equations (Neq) is more than the number of state variables and their derivatives (Nx), and at least two equations have dependent state variables, we can use one of them to eliminate the other, and can eliminate all state variables and their derivatives by repeating this step.
The condition (Neq > Nx) ensures that at least one input-output equation is found. To find the input-output equations involving all output variables with the highest order, we also show and prove that condition Neq ≥ Nx + m is sufficient (m is the number of output equations). This does not work for nonlinear ODE models [3].
Finding the I/O equations is done in three steps: (1) Eliminate all state variables and their derivatives. (2) Assemble the coefficient matrix of the new system of equations. (3) Apply the Gaussian elimination algorithm to the coefficient matrix.
Step (1). In model (1), the equations y(t, p) = g(x(t, p); p) without derivatives of x is called the output equations. First take the derivative of the output equations, to eliminate the first derivative of x from the state variable equations, by back substitution. Every generated unknown variable then should exist in two equations of the new system. If Neq ≥ Nx + m, this additional equation is enough to eliminate the state variables, otherwise, the second derivative of the output equations is required. Differentiation of the corresponding state variable equations is continued until Neq ≥ Nx + m.
In the GRA algorithm, taking derivatives (n − m) + 1 times is sufficient. In Section 6, we show that this boundary is not always reached, and usually differentiating again generates more I/O equations.
Step (2). For using Gaussian elimination, we need to write all equations in matrix form, thereby assembling a coefficient matrix for the new system in which we can summarize all the information of the new system. This matrix must have as many columns as there are variables and their corresponding derivatives, and as many rows as equations. The order must be specificfirst state variables and their derivatives, next output variables and input variables and their derivatives. The matrix is constructed one row at a time, writing a 1 for the coefficient if the corresponding variable is present and 0 (zero) if it is absent.
Step (3). Following Gaussian elimination of the coefficient matrix, we generate a matrix in reduced rowechelon form. Rows of the new matrix, in which elements corresponding to state variables and their derivatives are zero, are input-output equations coefficients.  [3], after differentiating k times, we have Neq ≥ Nx + m for the new system. Since all equations of a linear ODE system of the form (1) are linearly independent, the new system remains linearly independent. Then, when we apply the Gaussian elimination algorithm on the new system, we obtain the reduced row-echelon matrix (Rrem) with non-zero elements shown in Figure 1. In each row of Rrem, the leading entry is the only non-zero entry in its column. In addition, Neq ≥ Nx + m, and we conclude that we have Nx + m rows involving

Linear compartment models and illustrative examples
We use linear compartment models [21,22] in several examples below to illustrate our novel algorithm for finding I/O equations and identifiable parameters combinations. We also compare this with results obtained using the GRA algorithm, where available.
where u is an r-dimensional input vector, x is an n-dimensional state vector, y is an m-dimensional output vector, and K is n × n compartmental matrix with rate constants k ij that satisfy: where k 0i is the transfer rate constant from compartment i to the external environment; B is n × r input matrix; C is m × n output matrix, p is a p-dimensional parameter vector containing the K, B and C parameters (a positive subset of the real numbers).

2-Compartment model example
SI analysis of a simple 2-compartment model ( Figure 2) demonstrates all of the steps in our new algorithm.
u is the input to compartment x 1 , the dashed line with a circle is the output inform compartment x 2 . The same notation will be used in the following figures.ẋ We first use GRA and then our algorithm for finding the I/O equations and show that the results are the same. Then we use the Meshkat algorithm [3] for finding identifiable parameters combinations (combos) and reparametrize the model in terms of these combos. There are n = 2 state variables {x 1 , x 2 }, m = 1 output equation y = x 1 /v 1 and Neq = 3, Nx = 4. Then, according to the GRA Algorithm, taking derivatives n − (m − 1) = 2 − (1 − 1) = 2 times is sufficient to find the input-output equation.

1:
We first differentiate x 1 − V 1 y (output equation) and then add the result to the system (Neq = 4, Nx = 4):ẋ 2: Differentiateẋ 1 − V 1ẏ to obtainÿ, and the corresponding state variable equation to obtainẍ 1 and then add the result to the system (Neq = 6, Nx = 5): The coefficient matrix for model (4) is the following form: we now use Gaussian elimination on this coefficient matrix and obtain the following matrix in the reduced row-echelon form: We select the rows of the new matrix which all elements corresponding to state variables and their derivatives are 0 (zero). Multiplying these rows with variables, we obtain the I/O equations (Figures 3 and 4).
The I/O equation for model (4) is: Multiplying Equation (8) by V 1 , we see our I/O equation is to the same as the GRA algorithm I/O Equation (7). Now we investigate the parameter identifiability of the system (4) using Gröbner bases for the solutions to c(p) = c(p * ) and show V 1 , k 12 k 21 , −k 02 − k 12 , −k 01 − k 21 are uniquely identifiable. Finally, the most appropriate identifiable combinations are {q 1 = −k 01 − k 21 , Then the reparameterized equations are:

3-Compartment model example
We analyse the 3-compartment model with one output equation shown in Figure 5, to show that we cannot find I/O equations when the number of equations (Neq) is equal to the number of state variables and their derivative (Nx).

4-Compartment thyroid system model example
The model shown in Figure 7, which represents the linear dynamics of two thyroid hormones (T3 and T4) in blood (1 and 2) and tissues (3 and 4), with conversion of T4 into T3 in the lumped tissue compartment. It has one input and two output. For this example, we find sufficient input-output equations for finding identifiable combinations, with fewer derivatives than with the GRA algorithm.ẋ We use the same method as above, but the minimum bound for the number of taking derivatives is 2 using our algorithm, while it is 3 in GRA algorithm. There are n = 4 state variables {x 1 , x 2 , x 3 , x 4 } and m = 2 output equations {y 1 = x 1 , y 2 = x 2 }. Then, according to Meshkat [14], taking derivatives n − (m − 1) = 4 − (2 − 1) = 3 times is sufficient for finding the I/O equation. 1: Differentiate y 1 − x 1 , y 2 − x 2 (output equations) and then add the result to the system (Neq = 8, Nx = 8): The number of equations is still not enough to eliminate all unknowns. 2: Differentiateẏ 1 andẏ 2 to obtainÿ 1 andÿ 2 , and the corresponding state variable equation to obtainẍ 1 andẍ 2 . Then add the result to the system (Neq = 12, Nx = 10):ÿ Then new system now has 12 equations in 10 unknown state variables and their derivatives, {x 1 , x 2 , x 3 , x 4 ,ẋ 1 ,ẋ 2 , x 3 ,ẋ 4 ,ẍ 1 ,ẍ 2 }, enough to eliminate all unknowns. So, we do not need to take another derivative and the algorithm terminates here. According to the GRA algorithm and the ranking {ẍ 2 ,ẍ 1 ,ẋ 4 ,ẋ 3 ,ẋ 2 ,ẋ 1 , x 4 , x 3 , x 2 , x 1 ,ÿ 2 ,ÿ 1 ,ẏ 2 ,ẏ 1 ,u, y 2 , y 1 , u}, we find the Gröbner Basis of it using Mathematica.Then I/O equations have the following forms: Coefficient matrix for model (17) is: We then use Gaussian elimination and acquire a reduced row-echelon form (not shown, too big). We select two rows with 0 (zero) elements corresponding to state variables and their derivatives ( Figure 8).

Evaluation of the third derivative
We investigate the result of obtaining a third derivative using the GRA algorithm. we show that the third derivative gives more input-output equations and, consequently, more coefficients. But identifiable parameters combination results are the same as using the Meshkat alghorithm [3]. Let's explore the relationships among the coefficients of the I/O equations obtained from the third and second steps, which do not change the results. A coefficient table of these input-output equations for the second derivative are: We did not repeat the same coefficients because they do not affect the exhaustive summary or the Gröbner Basis.

Theorem 7.2:
. . a n = p n Proof: By induction.  A, A , B, B , C and C respectively. We should show S C ⊆ S C and S C ⊇ S C . For S C ⊆ S C , Suppose s ∈ S C = S A ∪ S B . Then s ∈ S A or s ∈ S B ; if it is in S A , then by relation A ∼ = A , we have s ∈ S A ⊆ S C =⇒ S C ⊆ S C , by symmetry s ∈ S B leads to the same conclusion. We also can show S C ⊇ S C by using this method.
Theorems 7.1, 7.2 and 7.3 are true for Gröbner Basis of the same model, that is, we can perform Gröbner Basis on each side of above expressions without changing the equality [23]. For theorem 7.1, we need to get the coefficient domain of polynomials as rational functions instead of integers numbers to simplify the solutions. We use these theorems for comparing between the obtained coefficients of I/O equations by taking derivatives two times (Table 1) and three times ( Table 2).
We introduce a method to show the relation between the coefficients in Table 1 (T 1 ) and Table 2 (T 2 ).
First, we specify each coefficient according to its position number in T 1 or T 2 . For example, the relation number of coefficient k 03 + k 13 + k 43 is 7 in T 1 .
Second, the coefficient in each table may have a different relation number. For example, the relation number of coefficient k 03 + k 13 + k 31 + k 43 is 9 in T 1 and 11 in T 2 .
Third, Suppose coefficient α is in T 2 , but not in T 1 . To find relation number in T 1 , we find a subset A including α of T 2 and a subset B of T 1 such that A ∼ = B. that is: The relation number of α ∈ A in T 1 is equal to all the relation number of the coefficients in B.
Then the relation number of k 03 k 04 k 42 + k 04 k 42 k 43 is 2, 3, 7 in T 1 . The relation number is not unique and we use them only to show related coefficients do not effect the final answer and can be eliminated. The best way to find these subsets is to write the coefficients in T 2 Note: No. in T 1 is the position of the coefficient in Table 1, No. in T 2 is the position of the coefficient in Table 2.

Linear thyroid hormone distribution and metabolism (D&M) model example (Figure 9)
This more complex version of a well-studied biosystem model [24][25][26] illustrates the power of our algorithm.
The new system now has 22 equations in 20 unknown state variables and their derivatives, {x 1 , x 2 , x 3 , x 4 , x 5 , x 6 ,ẋ 1 ,ẋ 2 ,ẋ 3 ,ẋ 4 ,ẋ 5 ,ẋ 6 ,ẍ 1 ,ẍ 2 ,ẍ 3 ,ẍ 4 ,ẍ 5 ,ẍ 6 , ... To test structural identifiability of this model, we adopt the quasi-numerical method used in DAISY [7]. We provide the exhaustive summary c(p), using coefficients of the I/O equations plus random numbers. For every function of the unknown p in the exhaustive summary we assign a pseudo-randomly chosen numerical value for the elements of vector p, providing a system of nonlinear algebraic equations in the unknown p. This quasi-numerical approach allows significantly reduced symbolic computations and hence broadens the class of testable models. For this example, the resulting nonlinear algebraic equations (with random numbers) were solved using Mathematica, which returns solutions for each unknown parameter. When these solutions indicate that the parameter is equal to its assigned number, the model is globally structurally identifiable, otherwise it is not. We chose numerical parameter values as follows: {k 02 → 2, k 03 → 3, k 05 → 5, k 06 → 7, k 12 → 11, k 13 → 13, k 21 → 17, k 31 → 19, k 45 → 23, k 46 → 29, k 52 → 31, k 54 → 37, k 63 → 41, k 64 → 43} After solving exhaustive summary, none are equal to their numerical value, so the model is not structurally identifiable (Appendix 1).

Implementation of method as web application
Our new algorithm is implemented as web application COMBOS2, available online at: http://biocyb1.cs.ucla. edu/combos2. To show the speed of our algorithm compared with the Gröbner Basis approach in COMBOS, we give some examples of analyzing linear models for identifiable combinations, which DAISY does not provide. COMBOS [8] provides identifiable parameter combinations for the 4-Compartment biosystem model in about 17 seconds, decreasing to 5 seconds with COMBOS2. COMBOS2 shows that a 5-Compartment identifiable model [27] is identifiable in about 9 seconds, whereas COMBOS needs 28 seconds. Each is at least 3 times faster with COMBOS2. For the 6-compartment linear thyroid model above, COMBOS did not provide any answer because it hung during the I/O phase of the computation. COMBOS2 took only 2 seconds for finding I/O equations, but it also hangs when attempting to find identifiable parameter combinations, because it uses the Gröbner Basis approach in its paramter combinations algorithm. An alternative method for solving this problem using Mathematica is described in the Appendix 2. DAISY does not compute identifiable parameters combinations, but it did analyse for structural identifiability of individual parameters, and this took several hours to complete.

Conclusion
We introduced a novel algorithm for finding the I/O equations for systems of linear biosystem models and other linear ODEs of the form (1) with m output equations, by taking derivatives of the system and then using a Gaussian elimination approach. The new algorithm runs much faster and is simpler to implement than a Grobner basis approach. We also showed the number of derivatives needed for computing I/O equations is smaller when the condition Neq ≥ Nx + m is satisfied. We demonstated the utility of the new algorithmic approach in a new app COMBOS2, applying it to several example models and, in particular, to a 6-dimensional thyroid hormone dynamics model, shown in 2 seconds to be not structurally identifiable, while COMBOS provided no solution. Another app DASIY needed several hours to provide only a structural identifiability analysis for the original individual parameters, and no parameter combination results. The main limitation of our new algorithm is, of course, that it is not applicable to nonlinear biosystem models. But we believe it gets a bit closer.