Approximate reduction of linear population models governed by stochastic differential equations: application to multiregional models

ABSTRACT In this work we develop approximate aggregation techniques in the context of slow-fast linear population models governed by stochastic differential equations and apply the results to the treatment of populations with spatial heterogeneity. Approximate aggregation techniques allow one to transform a complex system involving many coupled variables and in which there are processes with different time scales, by a simpler reduced model with a fewer number of ‘global’ variables, in such a way that the dynamics of the former can be approximated by that of the latter. In our model we contemplate a linear fast deterministic process together with a linear slow process in which the parameters are affected by additive noise, and give conditions for the solutions corresponding to positive initial conditions to remain positive for all times. By letting the fast process reach equilibrium we build a reduced system with a lesser number of variables, and provide results relating the asymptotic behaviour of the first- and second-order moments of the population vector for the original and the reduced system. The general technique is illustrated by analysing a multiregional stochastic system in which dispersal is deterministic and the rate growth of the populations in each patch is affected by additive noise.


Introduction
Nature offers many examples of systems with an inherent complexity whose study leads to mathematical models with a large number of state variables whose analytical study is, in most cases, not feasible. In order to be able to extract important information about the behaviour of some of these complex models, one can resort to 'approximate aggregation methods'. These are mathematical techniques, which are usually applied in systems governed by processes with different time scales, in which appropriate approximations are introduced in order to transform the system under consideration into a reduced system with a lower number of variables, called 'global variables'. In this way, the behaviour of the original system can be approximated, but not known with exactitude, in terms of the knowledge of the behaviour of the reduced system. Approximate aggregation techniques in population dynamics have been widely studied in the context of deterministic systems with different time scales both in continuous and discrete time (see the review in [2] and the references therein), as well as in discrete stochastic models incorporating either environmental [19] or demographic stochasticity [21].
Stochastic differential equations (SDEs) can be thought of as resulting from the introduction of environmental stochasticity in the coefficients of deterministic ODEs. In spite of the difficulty of their analytical treatment, they have become popular as population modeling tools (see for example [6,8] for the scalar case and [13,24] for applications to competition models and epidemiology respectively).
The aim of this work is to formulate an approximate aggregation technique valid to reduce fast-slow linear population models governed by SDEs, to give sufficient conditions for the solutions of the models to be positive and to relate the asymptotic behaviour of the first-and second-order moments of the population vector for the original and the reduced system. The original model is built by considering a fast deterministic process that converges to an equilibrium, together with a slow process in which the parameters are affected by additive noise. In the resulting system the quotient between the rate of diffusion squared and the speed of drift is O(ε), ε → 0, where ε is a measure of the difference of time scales between the fast and the slow process. This is in contrast with most previous works in the field [5,10,11,16] which are valid in other situations. An exception is [14] which covers our case of interest but which only deals with the relationships between the original and the reduced systems in finite time intervals.
The existing literature on the field of the approximate reduction of SDEs has been developed mainly in the context of physics and control theory. Our approach, although general in nature, is aimed to population dynamics applications. Specifically, we will employ it to study a multiregional model consisting of a single population living in a multipatch environment in such a way that dispersal is deterministic and the growth rate of the population in each patch is affected by additive noise. Models with spatial heterogeneity have been widely studied through aggregation techniques (see [3] for the continuous time case and [2] for discrete time), normally making use of the fact that in many practical situations the dispersal of individuals amongst the different patches is fast with respect to other processes like demography or competition (an exception to this is [20]). In this work we contemplate a situation that has not been dealt with so far: dispersal between some spatial patches can be fast with respect to demography whereas migration between some other patches can happen at the same time scale of demography. For example we can think of a population of birds living in different islands in such a way that inter-island movements happen at the scale of demography but, in comparison, intra-island migrations are fast.
The manuscript is organized as follows: in Section 2 we present the general formulation of linear SDE models and, in order to be able to use them in population dynamics applications, we give sufficient conditions that guarantee that if initial conditions are positive, the solution to such a system remains positive for all times. Section 3 presents the general formulation of a linear two time scale population model with stochasticity affecting the slow process. Section 4 carries out the reduction of the system, which is accomplished by letting the fast process reach equilibrium and defining and adequate set of new variables. In Section 5 we apply the general technique to the case of a stochastic multiregional model, with special attention to the case in which all migrations are fast with respect to demography, since it is the most frequent situation in practice and moreover in that case the reduced system is scalar. Section 6 presents a result that allows one to relate the asymptotic behaviour of the first and second statistical moments of the population vector for the original and the reduced system, and applies it to study the asymptotic behaviour of the multiregional models of Section 5. A brief discussion of results and the Appendix with mathematical proofs complete the manuscript.

Linear population models with two time scales
Throughout the paper we assume we are working in a complete probability space ( , F, {F t } t≥0 , P) where the filtration {F t } t≥0 satisfies the usual conditions [18]. Let us consider a structured population modelled by an linear autonomous homogeneous stochastic differential equation. The model has the form where T is a mdimensional standard Wiener process defined in the previous probability space. Moreover, we assume that X 0 ∈ R d is a non-random vector. Models of the kind (1) are obtained from a linear deterministic model if we add noise to the population vital rates a ij . Indeed, system (1) can be written in the form from where we see that each coefficient b h ij characterizes the intensity of the noise dW h affecting a ij . Note that the case in which the noises dW j (t) are correlated can be reduced to this setting through an appropriate transformation [1, p. 126]. Systems of the kind (1) can be interpreted in the sense of Ito or in the sense of Stratonovich, and the results obtained in the two cases differ [23]. However, one can choose any of the two interpretations as long as one defines appropriately the parameters of the model [7]. In this work we will make use of the Ito interpretation. It is well known [1, p. 126] that in the previous conditions there exists a unique solution to (1) which is continuous with probability one.
In order to be a valid model for a population, the solution of (1) for any positive initial condition must remain non-negative for all times, so we turn our attention to this property. Given a vector or matrix x we will write x ≥ 0 (resp. x > 0) to denote that all the components of x are non-negative (resp. positive). Let us recall that a square matrix A is said to be a Metzler matrix or a essentially non-negative matrix when it has non-negative off-diagonal entries, that is, a ij ≥ 0 for i = j [22]. It is well known that if A is a Metzler matrix then solutions of the deterministic system (2) meet the desired property. However, in model (1) additional requirements are needed in order to ensure positivity of solutions. The next result gives sufficient conditions: Theorem 2.1: If matrix A is a Metzler matrix and matrices B 1 , . . . , B k are diagonal, then given X(0) = X 0 > 0, the solution of (1) verifies that P(X(t) > 0 for all t ≥ 0) = 1.
Proof: See Appendix.

A model with two time scales
We suppose a stage-structured population in which individuals are classified into stages or groups attending to any characteristic of the life cycle. Moreover, each of these groups is divided into several subgroups that can correspond to different spatial patches, different individual activities or any other characteristic that could change the life cycle parameters. The model is therefore general in the sense that we do not state in detail the nature of the population or the subpopulations.
We consider the population being subdivided in q populations (or groups). Each group is subdivided in subpopulations (subgroups) in such a way that for each i = 1, 2, . . . , q, group i has N i subgroups. Therefore, the total number of subgroups is N := N 1 + N 2 + · · · + N q . We denote the fast time as τ , while the slow time will be denoted by t. In this way we have t = ετ where ε << 1 is a small number that represents the ratio between the slow and the fast times.
Le x ij (τ ) be the density of subpopulation j of population i at time τ , with i = 1, 2, . . . , q and j = 1, 2, . . . , N i . In order to describe the population of group i at time τ we will use vector where T denotes transposition. The composition of the total population is then given by vector In the evolution of the population we will consider two linear processes whose corresponding characteristic time scales, are very different from each other. In order to include in our model both time scales we will model these two processes, to which we will refer as the fast and the slow dynamics, by two different matrices.
In principle, we will make no special assumptions regarding the characteristics of the slow dynamics other than linearity and restrictions to guarantee positivity of solutions to the slow process. Thus, we will assume that the parameters of the slow process are defined by where: S.1. Matrix S ∈ R N×N models the deterministic part of the slow process and has non-negative off-diagonal entries, that is, S is a Metzler matrix. We consider S divided into blocks S ij , 1 ≤ i, j ≤ q in such a way that Each block S ij = [s αβ ij ] has dimensions N i × N j and characterizes the rates of transference of individuals from the subgroups of group j to the subgroups of group i. More specifically, for each α = 1, 2, . . . , N i and each β = 1, 2, . . . , N j , entry s αβ ij represents the rate of transference of individuals from subgroup β of group j to subgroup α of group i. S.2. W 1 , . . . , W k are independent standard Wiener processes (so that by dW 1 (t)/dt, . . . , dW k (t)/dt we denote the associated weakly defined gaussian white noises) and, for each h = 1, . . . , k, matrix B h ∈ R N×N models the contribution of noise dW h (t) to the dynamics of the slow process. In order to be able to guarantee positivity of solutions (Theorem 2.1), we assume that matrices B h are diagonal, that is, Note from Equation (3) that b h iα models the intensity of the noise dW h (t) affecting coefficient s αα ii . In this context, we say that an eigenvalue λ of a certain square matrix A is strictly dominant when the real part of λ is strictly larger than the real part of the rest of the eigenvalues of A.
As far as the behaviour of the fast dynamics is concerned, we will make the following three assumptions: The fast dynamics is an internal process for each group, that is, there is no transference of individuals from one group to a different one. Therefore, for each i = 1, . . . , q, the fast dynamics of group i will be represented by a Metzler matrix F i of dimensions N i × N i . We will assume that F i is irreducible in the sense that there exists r > 0 such that rI + F i is a primitive non-negative matrix [22]. Therefore, the matrix that governs the fast dynamics for the whole population is F. 3. The fast process in each group has a non-trivial equilibrium point which is asymptotically stable. More specifically, for each i = 1, . . . , q matrix F i has eigenvalue 0 and it is strictly dominant so that the rest of the eigenvalues of F i have negative real parts. Since F i is a Metzler irreducible matrix, eigenvalue 0 is simple [22, Theorem 2.6] and moreover, there exist positive right and left eigenvectors v i and u i of F i associated to eigenvalue 0 for which we choose the following normalization conditions where * 1 denotes the 1-vector norm. In order to incorporate both time scales in our model, we will make use of parameter ε = t/τ . The model, to which we will refer as original system, has the following form in the slow time Alternatively, using the fast time τ we have where we have made an abuse of notation and kept the same notation for X and the W h in the new time, and we have used [1, p. 47] that if W(t) is a standard Wiener process then so is √ εW(t/ε). We stress that the quotient between the rate of diffusion squared and the speed of drift is O(ε), ε → 0, which is in contrast with most approaches in the analysis of two-time scales systems [5,10,11,16] which are valid in other situations.

Approximate reduction of the model
In order to reduce the original system (7), we will use the fact that the fast process has an asymptotic stable equilibrium, and we will approximate this system by another one in which the fast process has reached equilibrium.
Let i = 1, . . . , q be fixed and letF i := v i u T i . From Equation (5), we have that if the system were governed by the fast process exclusively, for any initial condition From this expression we note that vector v i defines the equilibrium population structure for group i and u i is a vector of reproductive values, that is, the larger u j i , the higher the contribution of the j-th subgroup of the ith group to the equilibrium population. Therefore u i characterizes the size of the equilibrium population.
We define matrixF in the following way, and then for any initial condition X ∈ R N + , the equilibrium population for the fast dynamics in the whole system is given byX =FX. Let us define the non-negative matrices whose interpretation is immediate bearing in mind what we pointed out about v i and u i . Some of the properties of these matrices are gathered in the following lemma, whose proof is straightforward: Now, from Equation (6) we will build an auxiliary system replacing the state variables in the right side by its equilibrium values for the fast process and use that FF = 0, Now we define the vector of global variables as and, multiplying Equation (8) on the left by U and using thatF = VU and Equation (9) we obtain the aggregated system where we have defined Note that the global variables Y(t) defined by Equation (9) have the following expression in terms of the variables X(t) of the auxiliary system: Note that: (a) y i (t) is a linear combination of the variables corresponding to group i, being the coefficients of the combination the components of vector u i . Recall that u i is a vector of reproductive values for the fast process in group i. Therefore, for each j = 1, . . . , N i , variable x ij (t) has a relative weight in y i (t) which is proportional to u j i , that is, proportional to the contribution to the total equilibrium population that an individual initially present in group i and subgroup j would have in the case that the system were governed by the fast process exclusively. (b) The global variables are conservative for the fast process. Indeed, suppose that the fast process is the only one acting in the system. Then we would haveẊ(t) = FX(t)/ε and using that UF = 0,Ẏ(t) = UẊ(t) = UFX(t)/ε = 0. (c) The components of the matrices representing the drift and the diffusion for the reduced system are certain linear combinations of their analogues for the original system, where the coefficients of the combination are determined by the equilibrium characteristics of the fast process.
The next result together with Theorem 2.1 guarantees that the original and the aggregated systems have positive solutions for any positive initial conditions. Proposition 4.2: F + εS andS are Metzler matrices and matricesB h , h = 1, . . . , k are diagonal. Therefore, according to Theorem 2.1 both the original system (7) and the aggregated system (10) verify that for any positive initial condition the solution remains positive for all t > 0 with probability one.
Proof: F + εS is clearly a Metzler matrix for it is the sum of Metzler matrices. Now let i = j. Since S is a Metzler matrix we have that S ij ≥ 0 and using the fact that u i and v j are positive vectors, from Equation (11) it follows thats ij = u T i S ij v j ≥ 0 and soS is a Metzler matrix. Moreover, from Equation (12) it is clear thatB h is a diagonal matrix for each h = 1, . . . , k.

Multiregional models with two time scales
In this section we will illustrate the reduction technique by applying it to a multiregional model.

Model setting
We consider a population living in a multipatch system. We assume that there are a number N of different patches among which the individuals can migrate. The growth of the population in each patch is linear and is affected by stochasticity. Migration among patches is assumed to be deterministic.
We number the patches in the form (i, α), i = 1, 2, α = 1, . . . , N i , where N := N 1 + N 2 . Coming back to the terminology of Section 3, the first index, i, defines the 'group' of patches and the second, α, the 'subgroup' within that group. Note that in our setting we have chosen that the number q of groups is 2 just for the sake of simplicity in the expression of the matrices involved, but the generalization of the model to an arbitrary number q of groups is straightforward. Let x iα (t) denote the population in group i and subgroup j at time t and let X = (x 11 , . . . , x 1N 1 , x 21 , . . . , x 2N 2 ) T ∈ R N be the population vector.
We assume that migration is fast within each group of patches, that is, from any patch (i, α) to any other patch of the form (i, β), β = α. Migration is assumed to be slow between patches belonging to different groups, that is, from any patch (i, α) to any other patch of the form (j, β), with j = i. We can think of each group of patches as spatial regions located close to each other so that migration between them is easy, whereas different groups of patches correspond to regions amongst which migration is more difficult. For example, in a population of birds each group of patches can correspond to an island and the subgroups can correspond to the different spatial locations within an island, so that intra-island migrations are fast with respect to inter-island movements.
Let τ and t denote, respectively, the times corresponding to the fast and the slow migrations and let ε := τ/t be the ratio between both. We assume that the growth of the population takes place in the slow time scale. For each pair (i, α), i = 1, 2, α = 1, . . . , N i , let r α i be the deterministic population growth rate in patch (i, α) and let us assume that this growth rate is affected by a noise defined by a certain linear combination σ 1 iα dW 1 (t)/dt + · · · + σ k iα dW k (t)/dt of (weakly defined) independent white noise processes, where σ h iα ≥ 0 for each h = 1, . . . , k. Now we define matrices Regarding the slow migration between different groups of patches, for each i = j and each α = 1, . . . , N i , β = 1, . . . , N j , we define l aβ ij ≥ 0 as the (slow) migration coefficient from patch (j, β) to patch (i, α). Similarly we define l aβ ii = 0 for each α, β = 1, . . . , N i with α = β (as there is no slow migration within a group of patches) and Then the slow process, that is, the joint effect of the growth process and of the slow migration between patches of different groups, can be modelled by the following system of SDEs Regarding the fast migration between patches of the same group, for each i = 1, 2 and α, β = 1, . . . , N i with α = β, let m αβ i ≥ 0 be the (fast) migration rate from patch (i, β) to patch (i, α). For β = α we define We assume that the m αβ i are such that M i is irreducible for each i = 1, 2. From Equation (13) we have that the columns of each M i add up to zero and so matrix M i + I is a non-negative primitive column stochastic matrix. Therefore 0 is the (strictly) dominant eigenvalue of M i and moreover it is simple. Let be its associated positive left and right eigenvectors, where we assume the normalization condition 1 T i v i = 1. Note that vector v i defines the equilibrium distribution between the different patches of group i when we consider the fast migration as the only process acting on the system. Now we define matrices Then, the complete model that takes into account the joint effect of the slow population growth in each patch and the slow and fast migrations between patches is given by or, using the fast time τ , which constitutes a system of N linear SDEs.
Note that under the above conditions we are in the general setting of Section 3 by taking and Hypotheses S1, S2, F1, F2 and F3 for the slow and fast processes are met.

Model reduction
Therefore we can proceed to the aggregation of the original system following the procedure developed in Section 4. We define the global variables Y := UX, that is, so that y i is the total population in all patches of group i, and the reduced aggregated system is the two dimensional SDE whereS In order to be more specific we will consider the particular case in which N 1 = N 2 = 2, that is, there are 4 patches (1, 1), (1, 2), (2, 1) and (2,2), and the population vector is X = (x 11 , x 12 , x 21 , x 22 ) T . Dispersal is fast between patches (1, 1) and (1,2) and between (2, 1) and (2,2), and is slow in the rest of cases.
Let i = 1, 2 be fixed. We will keep the notation introduced above except in the case of the fast migrations, where in order to simplify it we will denote by p i and q i the (fast) migration rates from (i, 1) to (i, 2) and from (i, 2) to (i, 1) respectively. Therefore We assume 0 < p i , q i < 1 so that matrix M i is irreducible and vector v i has the The global variables are Y = (y 1 , y 2 ) T = (x 11 + x 12 , x 21 + x 22 ) T and the reduced system has the form (15) with 11 12 + l 21 12 ) + p 1 (l 12 12 + l 22 12 ) p 2 + q 2 q 1 (l 11 21 + l 21 21 ) + p 1 (l 12 21 + l 22 21 ) p 1 + q 1 q 2 (−l 11 12 − l 21 12 + r 1 2 ) + p 2 (−l 12 12

Case in which migration is fast
Let us now consider the particular but relevant case in which there is only one group of patches, and therefore all the migrations among the patches are fast with respect to demography. This has been the usual setting in the literature of approximate aggregation techniques [2,3]. We can consider this case as a particular instance of the setting in Section 5.1 when we take q equal to 1 instead of equal to 2. In order to simplify the notation, we omit the index 1 regarding the group of patches. Therefore, we denote the patches as j = 1, . . .  , σ h  N ). Therefore, the original system (14) takes the form In this case there is only one global variable y(t) = x 1 (t) + x 2 (t) + · · · + x N (t) and if v = (v 1 , . . . , v N ) T and 1 = (1, . . . , 1) T are the positive right and left eigenvectors of M associated to eigenvalue 0 scaled so that 1 T v = 1, we have V = v, U = 1 T . Therefore, the reduced system is the scalar SDE

Relationships between the original and the reduced systems
The aim of this Section is to relate the asymptotic behaviour of the first-and second-order moments of the solution of the original and the reduced systems (8) and (10) introduced in Section 3. The two systems fit in the framework of [14] in which the quotient between the rate of diffusion squared and the speed of drift is O(ε p ), ε → 0, with p > 1/2, but the results obtained in that reference, essentially a stochastic extension of the Tikhonov theory of singular perturbations, are valid only for finite time intervals. The first-and second-order moments of the solution of general linear stochastic equations with a deterministic initial condition are finite and can be calculated as the solution of certain linear ordinary differential equations. Specifically [1], for system (1) EX(t) is the unique solution to the equation whereas the matrix of second-order moments H(t) := E(X(t)X(t) T ) ∈ R d×d is the unique non-negative-definite symmetric solution of the equation Regarding Equation (18), in order to work with a more tractable expression, we will make use of the Kronecker matrix product and the 'vec' operator (see [9] for details). The Kronecker matrix product for two matrices A = [a ij ] ∈ C m×n and B = [b ij ] ∈ C r×s is defined as a matrix of size mr × ns with mn blocks in which the block in position (i, j) has the form a ij B. For any matrix A ∈ C m×n , vecA is defined as the column vector that contains, in order, the columns of A. For any matrices A, Y, and B for which the product AYB makes sense, one has vec(AYB) = (B T ⊗ A)vecY [9,Theorem 6.8], and so applying the 'vec' operator to both sides of Equation (18) we obtain that the second-order moments verify and I N denotes the N × N identity matrix. Therefore, the asymptotic behaviour of the first-and second-order moments of system (1) can be characterized in terms of the dominant eigenvalue and eigenvectors of matrices A and A 2 . We will now use this fact to study the moments of systems (8) and (10).
In the case of the original system (8), the analogous to matrices A and A 2 in the previous reasonings are matrices C ∈ R N×N and C 2 ∈ R N 2 ×N 2 given by where we have defined In the case of the aggregated system (10) the corresponding matrices areC(ε) = εS ∈ R q×q ,C 2 (ε) = εS 2 ∈ R q 2 ×q 2 wherē We introduce the following two hypotheses: Hypothesis 6.1: MatrixS has a simple and strictly dominant real eigenvalue μ associated to non-negative right and left eigenvectors r and l, respectively. Hypothesis 6.2: MatrixS 2 has a simple and strictly dominant real eigenvalue μ 2 associated to non-negative right and left eigenvectors r 2 and l 2 , respectively.
From Hypothesis 6.1 we have that the first-order moments of the reduced system have the following asymptotic behaviour lim t→∞ EY(t) exp(μt) = l, y 0 l, r r, where y 0 := UX 0 and * , * denotes the standard vector scalar product. Analogously, from Hypothesis 6.2 we have, for the second-order moments, The next result relates the asymptotic behaviour of the first-and second-order moments of the original system (8) with that of the reduced system when ε is small, that is, when the separation between the time scales of migration and of demography is large enough: Theorem 6.1: (a) Let Hypothesis 6.1 hold. Then for small enough ε > 0, matrix C(ε) = F + εS has a simple and strictly dominant eigenvalue λ(ε) that can be written in the form with associated right and left non-negative eigenvectors r(ε) and l(ε) such that Consequently, for the original system (8) we have: (b) Let Hypothesis 6.2 hold. Then for small enough ε > 0, matrix C 2 (ε) = F 2 + εS 2 has a simple and strictly dominant eigenvalue λ 2 (ε) that can be written in the form with associated right and left non-negative eigenvectors r 2 (ε) and l 2 (ε) such that Consequently, the asymptotic behaviour of the second-order moment of the original system is characterized by: Proof: See Appendix.
From the last theorem it follows in particular that if μ < 0 (resp. μ > 0), then λ(ε) < 0 (resp. λ(ε) > 0) for small enough ε, so that if the expected value of the population vector tends to zero (resp. infinity) in the reduced system the same happens for the original one when ε is small. Something analogous happens for the second-order moments regarding μ 2 and λ 2 (ε).
Let us apply this result to relate the behaviour of the multiregional models of Section 5. In the first place we will consider the model with q = 2, N 1 = N 2 = 2. Note from (16) that if there is at least a non-zero coefficient l αβ ij , matrixS is irreducible and then Hypothesis 6.1 holds. Its dominant eigenvalue is given by μ =s 11 +s 22 + (s 11 −s 22 ) 2 + 4s 12s21 2 and so the rate of growth of the first-order moments for the original system is εμ + O(ε 2 ) for small enough ε. This expression allows one to study how the different combinations of the migration and growth parameters affect the asymptotic behaviour of the expected value of the population. In particular, μ < 0 if and only if trS < 0 and detS > 0. Sincē S has order two we can calculate right and left eigenvectors r and l associated to μ and then apply Equations (20) and (23) to obtain the full information about the asymptotic behaviour of the first-order moments of the system. We could argue analogously for the second-order moments: in this case matrixS 2 is of order four and so an analytical computation of μ 2 , r 2 and l 2 is non-feasible, but for instance we can use the Routh-Hurwitz criterion [17] to obtain conditions for μ 2 to be negative, so that εμ 2 + O(ε 2 ) will also be negative for small ε.
In the case of the system of Section 5.3, the analysis is simpler cause the aggregated system is scalar and so areS andS 2 , so that Hypotheses 6.1 and 6.2 hold trivially. Moreover, μ =s and μ 2 = 2s + k h=1 (ḡ h ) 2 wheres and theḡ h are given by Equation (17) and we can take r = l = r 2 = l 2 = 1. Therefore from Equations (20) and (23) we have Note in particular that, when ε is small enough, the expected value of the population vector tends to zero ifs < 0, and the second-order moments also tend to zero ifs < − k h=1 (ḡ h ) 2 /2.

Discussion
In this work we have presented a technique to carry out the reduction of linear two-scale population models governed by SDEs, therefore allowing one to simplify the treatment of these models. The reduction of the original model with N variables is carried out by letting the fast process reach equilibrium and defining an appropriate set of q global variables, q < N, which are linear combinations of the state variables and are conservative for the fast process. Moreover, we have obtained conditions that guarantee the positivity of solutions to the model. We have also presented a result that allows one to know the the asymptotic behaviour of the first-and second-order statistical moments of the population vector of the original system through the computation of the dominant eigenvalues and eigenvectors of certain matrices of dimensions q × q and q 2 × q 2 associated to the reduced system.
The aggregation technique has been applied to study stochastic multiregional models in which migration between some patches can be fast with respect to demography whereas dispersal between other patches happens at the scale of demography. In the simpler case in which all migrations are fast, the reduced system is a scalar SDE whose analysis is straightforward, therefore allowing one to easily characterize the asymptotic behaviour of those moments for the original multiregional model. This work suggests possible lines of future development: on the first hand, and still within the linear setting, work needs to be done to try to relate the stochastic stability [15] of the origin in the original and the reduced system, for this will provide more information regarding the persistence-extinction of the population. Secondly, in order to be able to study more realistic population models, the technique should be extended to nonlinear settings.

Disclosure statement
No potential conflict of interest was reported by the authors.
We state as a Theorem a collection of results from Baumgärtel [4] about perturbation of semisimple eigenvalues of matrices that we will use in the proof of Theorem 6.1: Essentially Theorems 2 and 3 in [4], pp. 267-269): Let E be a complex linear space of finite dimension N and let A(ε) be a linear operator defined on E that admits the following holomorphic expansion A(ε) = A 0 + εA 1 + ε 2 A 2 + · · · , |ε| < R. Let λ 0 be a semi-simple eigenvalue of A 0 and let P be its associated eigenprojection.
Proof of Theorem 6.1: For small ε we will consider matrix C(ε) = F + εS as a perturbation of matrix F. From Equation (4) and the properties of each matrix F i , we have that 0 is a semisimple eigenvalue of F and that it is strictly dominant. Therefore is a Jordan canonical decomposition of F [12] where V ∈ R N×(N−q) , U ∈ R N×(N−q) and ∈ R (N−q)×(N−q) are certain matrices and is associated to eigenvalues with strictly negative real part. From Equation (A5) and Lemma 4.1, we have thatF = VU is the eigenprojection matrix of F associated to eigenvalue 0.
Using the continuity of eigenvalues on matrix entries, the dominant eigenvalue of C(ε) for small enough ε necessarily must belong to the set of eigenvalues of C(ε) that tend to 0 when ε → 0. Using part 1 of Theorem A.1 applied to C(ε) = F + εS with A 0 = F, A 1 = S, λ 0 = 0 and P =F, we conclude that this set of eigenvalues has the form where the μ j are the eigenvalues of the restriction T | ImF of T :=FSF to ImF. Clearly ImF is invariant for T and, using Equation (11) and Lemma 4.1, we have T =FSF = VUSVU = VSU, so thatS is the matrix of the restriction T | ImF expressed in the basis of ImF defined by the columns of V. Therefore, T | ImF andS have the same eigenvalues and, moreover, if x and y are, respectively, right and left eigenvectors ofS associated to a certain eigenvalue λ, then Vx = 0 and U T y = 0 are right and left eigenvectors of T | ImF associated to eigenvalue λ. Therefore, using Hypothesis 6.1 it follows that μ is a simple eigenvalue for T | ImF and is associated to right and left eigenvectors Vr = 0 and U T l = 0, respectively. Moreover, μ is strictly dominant for T | ImF so that, using Equation (A6), μ(ε) := εμ + o(ε) is a simple eigenvalue for C(ε) and strictly dominant when ε is small enough. Now, using part 2 of Theorem A.1 and the fact that μ is simple for T | ImF , it follows that μ(ε) has the form μ(ε) := εμ + O(ε 2 ) and that is associated to right and left eigenvectors which can be written in the form Vr + O(ε) and U T l + O(ε) respectively. Then, for small enough ε and so Equation (20) is proved. (b) The proof is analogous to that of part (a). A notable property of the Kronecker product that we will use in the sequel is [9, Theorem 6.1] We will use Equation (19) and for small ε will consider matrix C(ε) as a perturbation of matrix F 2 = I N ⊗ F + F ⊗ I N . First, let us study the spectral properties of F 2 . Letv i , i = 1, . . . , N, denote the ith. column of matrix (V | V ), that is,v i is an eigenvector of F associated to eigenvalue 0 if 1 ≤ i ≤ q or to an eigenvalue with strictly negative real part if q + 1 ≤ i ≤ N.
From [9, Theorem 6.5] we know that the eigenvalues of I N ⊗ F + F ⊗ I N are the set {λ i + λ j : λ i ∈ σ (F), i = 1, . . . , N} and that, associated to each eigenvalue λ i + λ j we have right and left eigenvectorsv i ⊗v j andû i ⊗û j wherev i andû i are, respectively, right and left eigenvectors of F associated to eigenvalue λ i , i, j = 1, . . . , N. Taking into account the spectrum of F we conclude that σ (F 2 ) consists of eigenvalue 0 with multiplicity q 2 associated to right (resp. left) left eigenvectorsv i ⊗v j , (resp. u i ⊗û j ) i, j = 1, . . . , q, and N 2 − q 2 eigenvalues with strictly negative real parts associated to right (resp. left) eigenvectorsv i ⊗v j (resp.û i ⊗û j ), where either i or j belong to the set {q + 1, . . . , N}. Clearly, eigenvalue λ = 0 is semisimple and is strictly dominant for F 2 .
It is easy to check that if A and B are any matrices with the block structure A = (a 1 | · · · | a k ), B = (B 1 | · · · | B s ) where the a i are column vectors, then A ⊗ B = (a 1 ⊗ B 1 | · · · | a 1 ⊗ B s | · · · | a k ⊗ B 1 | · · · | a k ⊗ B s ).

Applying this result we can write
(v 1 ⊗v 1 | · · · |v 1 ⊗v q | · · · |v q ⊗v 1 | · · · |v q ⊗v q ) = V ⊗ V (û 1 ⊗û 1 | · · · |û 1 ⊗û q | · · · |û q ⊗û 1 | · · · |û q ⊗û q ) = U T ⊗ U T so that we can write a Jordan canonical decomposition of F 2 in the form where 2 , V 2 and U 2 are appropriate matrices. From this expression we see thatF 2 := (V ⊗ V)(U ⊗ U) = (VU) ⊗ (VU) =F ⊗F is the eigenprojection of F 2 corresponding to eigenvalue 0. Like in part (a), we know that, for small enough ε, the dominant eigenvalue of C 2 (ε) necessarily must belong to the set of eigenvalues of C 2 (ε) that tend to 0 when ε → 0. Let us consider the restriction T 2 | ImF 2 of matrix T 2 :=F 2 S 2F2 to ImF 2 . Clearly ImF 2 is invariant for T 2 and, using Equation (11), Lemma 4.1 and the properties of the Kronecker product we have T 2 =F 2 S 2F2 = (V ⊗ V)S 2 (U ⊗ U) so thatS 2 is the matrix of T 2 | ImF 2 expressed in the basis of ImF 2 defined by the columns of V ⊗ V. Therefore, T 2 | ImF 2 andS 2 have the same eigenvalues and, moreover, if x and y are, respectively, right and left eigenvectors ofS 2 associated to a certain eigenvalue λ, then (V ⊗ V)x = 0 and (U ⊗ U) T y = 0 are right and left eigenvectors of T 2 | ImF 2 associated to eigenvalue λ. Now let us assume Hypothesis 6.2. Then we can reason exactly as we did in part (a), substituting C(ε), F, S,S,F, V and U for C 2 (ε), F 2 , S 2 ,S 2 ,F 2 , V ⊗ V and U ⊗ U, respectively, and obtain that for small enough ε, C 2 (ε) has a simple and strictly dominant eigenvalue that can be written in the form (21) and associated to eigenvectors with the form (22), from where the asymptotic behaviour (23) follows.