A Central Limit Theorem for Punctuated Equilibrium

Current evolutionary biology models usually assume that a phenotype undergoes gradual change. This is in stark contrast to biological intuition, which indicates that change can also be punctuated - the phenotype can jump. Such a jump can especially occur at speciation, i.e. dramatic change occurs that drives the species apart. Here we derive a central limit theorem for punctuated equilibrium. We show that, if adaptation is fast, for weak convergence to hold, dramatic change has to be a rare event.


Introduction
A long-standing debate in evolutionary biology is whether changes take place at times of speciation (punctuated equilibrium Eldredge and Gould [16], Gould and Eldredge [20]) or gradually over time (phyletic gradualism, see references in Eldredge and Gould [16]).Phyletic gradualism is in line with Darwin's original envisioning of evolution (Eldredge and Gould [16]).On the other hand, the theory of punctuated equilibrium was an answer to what fossil data was indicating (Eldredge and Gould [16], Gould and Eldredge [19,20]).A complete unbroken fossil series was rarely observed, rather heavy functional analysis apparatus, which unlike the direct one here, is completely inaccessible to the applied reader.We can observe the same competition between the tree's speciation and OU's adaptation (drift) rates, resulting in a phase transition when the latter is half the former (the same as in the no jumps case Adamczak and Miłoś [1,2], Bartoszek and Sagitov [6]).We show here that if large jumps are rare enough, then the contemporary sample mean will be asymptotically normally distributed.Otherwise the weak limit can be characterized as a "normal distribution with a random variance".Such probabilistic characterizations are important as there is a lack of tools for punctuated phylogenetic models.This is partially due to an uncertainty of what is estimable, especially whether the contribution of gradual and punctuated change may be disentangled (but Bokma [11] indicates that they should be distinguishable).Large sample size distributional approximations will allow for choosing seeds for numerical maximum likelihood procedures and sanity checks if the results of numerical procedures make sense.Most importantly the understanding of the limiting behaviour of evolutionary models with jumps will allow us to see the effects of these jumps, especially how much do they push the populations out of stationarity.
In Section 2 we introduce the considered probabilistic model.Then in Section 3 we present the main results.Section 4 is devoted to a series of technical convergence lemmata that characterize the speed of decay of the effect of jumps on the variance and covariance of tip species.Finally in Section 5 we calculate the first two moments of a scaled sample average, introduce a submartingale related to the model and put this together with the previous convergence lemmata to prove the Central Limit Theorems of this paper.

A model for punctuated stabilizing selection 2.1 Phenotype model
Stochastic differential equations (SDEs) are today the standard language to model continuous traits evolving on a phylogenetic tree.The general framework is that of a diffusion process dX(t) = µ(t, X(t))dt + σ a dB t . (1) The trait follows Eq. ( 1) along each branch of the tree (with possibly branch specific parameters).At speciation times this process divides into two processes evolving independently from that point.The full generality of Eq. ( 1) is not implemented in contemporary phylogenetic comparative methods (PCMs).Currently they are focused on the OU processes dX(t) = −α(X(t) − θ (t))dt + σ a dB t , where θ (t) can be piecewise linear, with different values assigned to different regimes (see e.g.Bartoszek et al. [7], Butler and King [12], Hansen [21]).There have been a few attempts to go beyond the diffusion framework into Lévy process, including Laplace motion, (Bartoszek [4,5], Duchen et al. [14], Landis et al. [22]) and jumps at speciation points (Bartoszek [5], Bokma [9,10]).
We follow in the spirit of the latter and consider that just after a branching point, with a probability p, independently on each daughter lineage, a jump can occur.We assume that the jump random variable is normally distributed with mean 0 and variance σ 2 c < ∞.In other words, if at time t there is a speciation event, then just after it, independently for each daughter lineage, the trait process X(t + ) will be where X(t −/+ ) means the value of X(t) respectively just before and after time t, Z is a binary random variable with probability p of being 1 (i.e.jump occurs) and Y ∼ N (0, σ 2 c ).The parameters p and σ 2 c can, in particular, differ between speciation events.

The branching phenotype
In this work we consider a fundamental model of phylogenetic tree growth -the conditioned on number of tip species pure birth process.We first introduce some notation, illustrated in Fig. 1 (see also Bartoszek [5], Bartoszek and Sagitov [6], Sagitov and Bartoszek [31]).We consider a tree that has n tip species.Let U (n) be the tree height, τ (n) the time from today (backwards) to the coalescent of a pair of randomly chosen tip species, τ i j the time to coalescent of tips i, j, ϒ (n) the number of speciation events on a random lineage, υ (n) the number of common speciation events for a random pair of tips bar the splitting them event and υ (n) i j the number of common speciation events for a tips i, j bar the splitting them event.Furthermore let T k be the time between speciation events k and k + 1, p k and σ 2 c,k be respectively the probability and variance of the jump just after the k-th speciation event on each daughter lineage.
The above intuitive descriptions can be made more precise.We first introduce two separate labellings for the tip and internal nodes of the tree.Let the origin of the tree have label "0".Next we label from "1" to "n − 1" the internal nodes of the tree in their temporal order of appearance.
The root is "1", the node corresponding to the second speciation event is "2" and so on.We label the tips of the tree from "1" to "n" in an arbitrary fashion.This double usage of the numbers "1" to "n − 1" does not generate any confusion as it will always be clear whether one refers to a tip or internal node.
ϒ (i,n) = {number of nodes on the path from the root (internal node 1, including it) to tip node i} Definition 2.4 For i ∈ N Tip (U (n) ), j is a node on the root to tip node i path and I be a binary random variable such that be a binary random variable equalling 1 iff a jump took place just after the r-th speciation event in the sequence be a binary random variable equalling 1 iff 1 Definition 2.8 For i, j ∈ N Tip (U (n) ), We have the −1 in the above definition of υ (i, j,n) as we are interested in counting the speciation events that could have a jump common to both lineages.As the jump occurs after a speciation event, the jumps connected to the coalescent node of tip nodes i and j cannot affect both of these tips.Definition 2.11 For i, j ∈ N Tip (U (n) ) and r ∈ {1, . . ., max{I (i, j,n) } − 1}, let 1 (i, j,n) r be a binary random variable such that Definition 2.12 For i, j ∈ N Tip (U (n) ), 13 For i, j ∈ N Tip (U (n) ) and r ∈ {1, . . ., υ i, j }, let J (i, j,n) r be a binary random variable Definition 2.14 For i, j ∈ N Tip (U (n) ) and r ∈ {1, . . ., n − 1}, let Z (i, j,n) r be a binary random variable Definition 2.15 Let R be uniformly distributed on {1, . . ., n} and (R, K) be uniformly distributed on the set of ordered pairs drawn from {1, . . ., n} (i.e.Prob((R, Remark 2.16 For the sequences Remark 2.17 We drop the n in the superscript for the random variables 1 i , 1i , J i , Ji , Z i and Zi as their distribution does not depend on n.In fact, in principle, there is no need to distinguish between the version with and without the tilde.However, such a distinction will make it more clear to what one is referring to in the derivations of this work.
The following simple, yet very powerful, Lemma comes from the uniformity of the choice of pair to coalesce at the i-th speciation event in the backward description of the Yule process.The full proof can be found in Bartoszek [5] on p. 45.
Lemma 2.18 For all i ∈ {1, . . ., n − 1} We called the model a conditioned one.By conditioning we consider stopping the tree growth just before the n + 1 species occurs, or just before the n-th speciation event.Therefore, the tree's height U (n) is a random stopping time.The asymptotics considered in this work are when n → ∞. 1 = 1 always.The between speciation times on this lineage are T 1 , T 2 , T 3 + T 4 and T 5 .If we "randomly sample" the pair of extant species "A" and "B", then υ (n) = 1 and the two nodes coalesced at time τ (n) .The random index of their joint speciation event is Ĩ1 = 1.See also Bartoszek  The key model parameter describing the tree component is λ , the birth rate.At the start, the process starts with a single particle and then splits with rate λ .Its descendants behave in the same manner.Without loss generality we take λ = 1, as this is equivalent to rescaling time.
In the context of phylogenetic methods this branching process has been intensively studied (e.g.Bartoszek and Sagitov [6], Crawford and Suchard [13], Edwards [15], Gernhard [17,18], Mulder and Crawford [27], Sagitov and Bartoszek [31], Steel and McKenzie [35]), hence here we will just describe its key property.The time between speciation events k and k + 1 is exponential with parameter k.This is immediate from the memoryless property of the process and the distribution of the minimum of exponential random variables.From this we obtain some important properties of the process.Let H n = 1 + 1/2 + . . .+ 1/n be the n-th harmonic number, x > 0 and then their expectations and Laplace transforms are (Bartoszek and Sagitov [6], Sagitov and Bartoszek [31]) Now let Y n be the σ -algebra that contains information on the Yule tree and jump pattern.Bartoszek [5] previously studied the branching OU with jumps model and it was shown (but, therein for constant p k and σ 2 c,k and therefore there was no need to condition on the jump pattern) that, conditional on the tree height and number of tip species the mean and variance of an arbitrary tip species, A key difference that the phylogeny brings in is that the tip measurements are correlated through the tree structure.One can easily show that conditional on Y n , the covariance between an arbitrary pair of extant traits, ) . ( We will call, the considered model the Yule-Ornstein-Uhlenbeck with jumps (YOUj) process.

Remark 2.19
Keeping the parameter θ constant on the tree is not as simplifying as it might seem.
Varying θ models have been considered since the introduction of the OU process to phylogenetic methods (Hansen [21]).However, it can very often happen that the θ parameter is constant over whole clades, as these species share a common optimum.Therefore, understanding the model's behaviour with a constant θ is a crucial first step.Furthermore if constant θ clades are far enough apart one could think of them as independent samples and attempt to construct a test (based on normality of the species' averages) if jumps have a significant effect (compare Thms. 3.1 and 3.5).

Martingale formulation
Our main aim is to study the asymptotic behaviour of the sample average and it actually turns out to be easier to work with scaled trait values, The initial condition of course will be Y 0 = δ 0 .Just as was done by Bartoszek and Sagitov [6] we may construct a martingale related to the average Then (cf.Lemma 10 of Bartoszek and Sagitov [6]), we define This is a martingale with respect to F n , the σ -algebra containing information on the Yule n-tree and the phenotype's evolution.

Asymptotic regimes -main results
Branching Ornstein-Uhlenbeck models commonly have three asymptotic regimes (Adamczak and Miłoś [1,2], Ané et al. [3], Bartoszek [5], Bartoszek and Sagitov [6], Ren et al. [29,30]).The dependency between the adaptation rate α and branching rate λ = 1 governs in which regime the process is.If α > 1/2, then the contemporary sample is similar to an i.i.d.sample, in the critical case, α = 1/2, we can, after appropriate rescaling, still recover the "near" i.i.d.behaviour and if 0 < α < 1/2, then the process has "long memory" ("local correlations dominate over the OU's ergodic properties", Adamczak and Miłoś [1,2]).In the OU process with jumps setup the same asymptotic regimes can be observed, even though Adamczak and Miłoś [1,2], Ren et al. [29,30]) assume that the tree is observed at a given time point, t, with n t being random.In what follows here, the constant C may change between (in)equalities.It may in particular depend on α.We illustrate the below Theorems in Fig. 2.
The scaled sample mean, (n) Y n converges weakly to random variable whose characteristic function can be expressed in terms of the Laplace transform of (II) If 0.5 = α, then the conditional variance of the scaled sample mean The scaled sample mean, (n/ ln n) Y n converges weakly to random variable whose characteristic function can be expressed in terms of the Laplace transform of σ 2 (III) If 0 < α < 0.5, then n α Y n converges almost surely and in L 2 to a random variable Y α,δ with first two moments Remark 3.2 For the a.s. and L 2 convergence to hold in Part (III), it suffices that the sequence of jump variances is bounded.Of course, the first two moments will differ if the jump variance is not constant.
Definition 3.3 A subset E ⊂ N of positive integers is said to have density 0 (e.g.Petersen [28]) if where χ E (•) is the indicator function of the set E. (II) If 0.5 = α, then (n/ ln n) Y n is asymptotically normally distributed with mean 0 and variance 2.

Remark 3.6
The assumption σ 4 c,n p n → 0 with density 1 is an essential one for the limit to be a normal distribution, when α ≥ 0.5.This is visible from the proof of Lemma 4.5.In fact, this is the key difference that the jumps bring in -if they occur too often (or with too large magnitude), then they will disrupt the weak convergence.
One natural way is to keep σ 2 c,n constant and allow p n → 0, the chance of jumping becomes smaller relative to the number of species.Alternatively σ 2 c,n → 0, which could mean that with more and more species -smaller and smaller jumps occur at speciation.Actually, this could be biologically more realistic -as there are more and more species, then there is more and more competition and smaller and smaller differences in phenotype drive the species apart.Specialization occurs and tinier and tinier niches are filled.

Key convergence lemmata
We will now prove a series of technical lemmata describing the asymptotics of driving components of the considered YOUj process.
PROOF For a given realization of the Yule n-tree we denote by τ 2 two independent versions of τ (n) corresponding to two independent choices of pairs of tips out of n available.We have, Let π n,k be the probability that two randomly chosen tips coalesced at the k-th speciation event.We know that (cf.Bartoszek and Sagitov [6], Steel and McKenzie [35]) . Writing and as the times between speciation events are independent and exponentially distributed we obtain On the other hand, Taking the difference between the last two expressions we find Using the simple equality we see that it suffices to study the asymptotics of, where To consider these two asymptotic relations we observe that for large n Notice that obviously Var E e −2ατ ) derived from the same random lineage and a fixed jump probability p we have PROOF We introduce the random variables ) and Immediately (for i < j) The term E (E [1 i |Y n ]) 2 can be [see Lemma 11 in 6] expressed as E 1 (1) i 1 (2) i where 1 (1) i and 1 (2) i are two independent copies of 1 i , i.e. we sample two lineages and ask if the i-th speciation event is on both of them.This will occur if these lineages coalesced at a speciation event k ≥ i. Therefore E 1 Together with the above .

III
We use the equality [cf.Lemma 11 in 6] and consider the three parts in turn.
Cn −1 ln n α = 0.25 Cn −1 ln n α = 0.25 Putting these together we obtain On the other hand the variance is bounded from below by III .Its asymptotic behaviour is tight as the calculations there are accurate up to a constant (independent of p).This is further illustrated by graphs in Fig. 3. PROOF We consider the case, α > 0.25.Notice that in the proof of Lemma 4.3 Var If the jump probability and variance are not constant, but as in the Corollary, The Corollary is a consequence of a more general ergodic property that if u > 0 and the sequence a i → 0 with density 1, then For 1 ≤ u the above is a direct consequence of Petersen [28]'s Lemma 6.2 (p.65).For 0 On the other hand if a i does not go to 0 with density 1, then lim sup When α = 0.25 we obtain the Corollary using the same ergodic argumentation for ln −1 n ∑ n−1 i=1 p i σ 4 c,i i −1 .
Lemma 4.5 For random variables (υ (n) , Ĩ(n) , Ji i=1 ) derived from the same random pair of lineages and a fixed jump probability p PROOF We introduce the notation ) and by definition we have Var E We introduce the random variable and obviously (for As usual let (τ 2 ) be two independent copies of (τ (n) , υ (n) , Ψ (n) ) [cf.Lemma 11 in 6] and now Using the above we consider each of the five components in this sum separately.
Putting I-V together we obtain The variance is bounded from below by III and as these derivations are correct up to a constant (independent of p) the variance behaves as above.This is further illustrated by graphs in Fig. 4. 5 Proof of the Central Limit Theorems 3.1 and 3.5 To avoid unnecessary notation it will be always assumed that under a given summation sign the random variables (ϒ (n) , ) are derived from the same random lineage and also (υ (n) , Ĩ(n) , Ji i=1 ) are derived from the same random pair of lineages J k e −2α(T n +...+T PROOF The first equality is immediate.The variance follows from J k e −2α(T n +...+T k Jk e −2α(τ (n) +...+T This immediately entails the second moment.
Lemma 5.2 Assume that the jump probability is constant, equalling p, at every speciation event. Let and then for all α > 0 and n greater than some n(α) is a submartingale and furthermore W n converges a.s. and in L 1 to a random variable W ∞ with expectation PROOF PROOF FOR α > 0.5 , where ξ i is a binary random variable indicating whether it is the i-th lineage that split (see Fig 5).
In particular note Therefore W n will be a submartingale with respect to Y n from a certain n.We know that E [W n ] < C E for some constant C E , as E [W n ] → 4p/(2α(2α − 1)) [Appendix A.2. 5] and hence by the martingale convergence theorem W n → W ∞ a.s.for some random variable W ∞ .As all expectations are finite ).Also the previous implies uniform integrability of {W n } and hence L 1 convergence.
PROOF FOR α = 0.5 is the same as the proof for α > 0.5 except that we will have for n large enough PROOF FOR 0 < α < 0.5 is the same as the proof for α > 0.5 except that we will have for n large enough

Now following again Bartoszek
Following Bartoszek [5] converges to a constant by Lemma 4.5.
PROOF OF THEOREM 3.1, PART (I), α > 0.5 We will show convergence in probability of the conditional mean and variance for a finite mean and variance random variable σ 2 ∞ .Then due to the conditional normality of Y n this will give the convergence of characteristic functions and the desired weak convergence, i.e.

E e ix
Using Lemma 5.1 and that the Laplace transform of the average coalescent time [Lemma 3 in 6] is we can calculate Therefore we have µ n → 0 in L 2 and hence in P. PROOF OF PART (III), 0 < α < 0.5 We notice that the martingale H n = (n + 1)e (α+1)U (n) Y n has uniformly bounded second moments.
Namely by boundedness of σ 2 c , Lemma 5.1 and Cauchy-Schwarz E H 2 n = (n + 1) 2 E e 2(α−1)U Hence, sup n E H 2 n < ∞ and by the martingale convergence theorem, H n → H ∞ a.s. and in L 2 .As was done by Bartoszek and Sagitov [6] we obtain (Bartoszek and Sagitov [6]'s Lemma 9) n α Y n → V (α−1) H ∞ a.s. as in L 2 .Notice that for the convergence to hold in this regime, it is not required that σ 2 c,n is constant, only bounded.We may also obtain directly the first two moments of n α Y n (however, for these formulae to hold, σ 2 c,n has to be constant) ) |Y n implying that the convergence p n σ 4 c,n → 0 with density 1 is a necessary assumption for the asymptotic normality.

Definition 2. 1 N
Tip (t) = {set of tip nodes at time t} Definition 2.2

Figure 1 :
Figure 1: A pure-birth tree with the various time components marked on it.If we "randomly sample" node "A", then ϒ (n) = 3 and the indexes of the speciation events on this random lineage are I (n) 3 = 4 I (n) 2 = 2 and I (n) 1 = 1.Notice that I (n) [5]'s Fig.A.8. for a more detailed discussion on relevant notation.The internal node labellings 0-4 are marked on the tree.

Theorem 3 . 1
Assume that the jump probabilities and jump variances are constant equalling p and σ 2 c < ∞ respectively.Let Y n = (X n − θ )/ σ 2 a /2α be the normalized sample mean of the YOUj process with Y 0 = δ 0 .The process Y n has the following, depending on α, asymptotic with n behaviour.(I) If 0.5 < α, then the conditional variance of the scaled sample mean σ 2 n := n Var Y n |Y n converges in P to a random variable σ 2 ∞ with mean

Definition 3 . 4 A
sequence a n converges to 0 with density 1 if there exists a subset E ⊂ N of density 0 such that lim n→∞,n / ∈E a n = 0. Theorem 3.5 If σ 4 c,n p n is bounded and goes to 0 with density 1, then depending on α the process Y n has the following asymptotic with n behaviour.(I) If 0.5 < α, then (n) Y n is asymptotically normally distributed with mean 0 and variance (2α + 1)/(2α − 1).

Figure 5 :
Figure 5: The situation of the process between the n-th and n + 1-st split.Node m split so ξ m = 1 and ξ i = 0 for i = m.The time between the splits is T n+1 ∼ exp(n + 1).