The recent and rapid spread of Themeda triandra

Abstract Tropical savannas cover over 20% of land surface. They sustain a high diversity of mammalian herbivores and promote frequent fires, both of which are dependent on the underlying grass composition. These habitats are typically dominated by relatively few taxa, and the evolutionary origins of the dominant grass species are largely unknown. Here, we trace the origins of the genus Themeda, which contains a number of widespread grass species dominating tropical savannas. Complete chloroplast genomes were assembled for seven samples and supplemented with chloroplast and nuclear ITS markers for 71 samples representing 18 of the 27 Themeda species. Phylogenetic analysis supports a South Asian origin for both the genus and the widespread dominant T. triandra. This species emerged ~1.5 Ma from a group that had lived in the savannas of Asia for several million years. It migrated to Australia ~1.3 Ma and to mainland Africa ~0.5 Ma, where it rapidly spread in pre-existing savannas and displaced other species. Themeda quadrivalvis, the second most widespread Themeda species, is nested within T. triandra based on whole chloroplast genomes, and may represent a recent evolution of an annual growth form that is otherwise almost indistinguishable from T. triandra. The recent spread and modern-day dominance of T. triandra highlight the dynamism of tropical grassy biomes over millennial time-scales that has not been appreciated, with dramatic shifts in species dominance in recent evolutionary times. The ensuing species replacements likely had profound effects on fire and herbivore regimes across tropical savannas.


Introduction
Tropical savannas are among Earth's most productive biomes and cover around 20% of global land area (Lehmann and Parr 2016). Throughout the tropics and subtropics, these grassy biomes are dominated by C 4 grass species, which displaced C 3 species across these regions during the Late Miocene and Pliocene as Earth's environment warmed and dried (3-8 Ma; Cerling et al. 1997;Edwards et al. 2010). Consumer centred feedbacks related to fire and grazing have been invoked as underlying determinants of the globally synchronised rapid expansion of tropical grassy biomes (Beerling and Osborne 2006;Edwards et al. 2010;Scheiter et al. 2012). Indeed, there is evidence for the co-evolution of grazing mammals with the grass species they consume (Bouchenak-Khelladi et al. 2009;Sage and Stata 2015), just as there is palaeo-evidence of a global increase in fire with the expansion of C 4 grass-dominated savannas (Edwards et al. 2010). Today, these tropical grassy biomes both sustain large numbers of grazing mammals and promote frequent fire (Archibald and Hempson 2016), and feedbacks between both fire and herbivory have been demonstrated as central to the modern dynamics of tropical grassy biomes over much of their range (Archibald and Hempson 2016;Lehmann et al. 2014). Similarly, at local scales, many grass species rely on fire and herbivory for their persistence and dispersal (Bond et al. 2003). The origins of these co-dependency feedbacks remain, however, poorly understood.
While the timing of the origin of C 4 grass dominated biomes is relatively well resolved, many questions regarding the composing flora are unanswered. Today, a number of C 4 grass species from these habitats have exceptional geographic ranges that span multiple continents. This is in stark contrast to the woody flora of savannas, which are typically derived from local forest ancestors in response to the novel ecological conditions generated by the development of a C 4 grass ground layer OPEN ACCESS (Maurin et al. 2014;Simon et al. 2009). Establishing why savannas are dominated by relatively few species requires considering the origins of the grass species that dominate these habitats, in terms of space, time and ancestral conditions. Among C 4 grass species, variation in flammability is phylogenetically constrained, and the traits of the grassy ground layer have ecosystem-level impacts on fire frequency and intensity (Archibald et al. 2013;Simpson et al. 2016). A key question is therefore whether grass traits underlying savanna dynamics existed before the colonisation of these ecosystems, or whether they evolved in situ. This problem needs to be addressed by elucidating the history of groups that are nowadays involved in the fire/herbivore feedbacks in these systems.
In this study, we reconstruct the phylogeography of the genus Themeda Forssk. (Poaceae, Panicoideae, Andropogoneae). This genus includes one of world's most widespread C 4 grasses species, Themeda triandra Forssk., along with a modern invasive, Themeda quadrivalvis (L.) Kuntze, a grassy weed common in disturbed areas worldwide (Keir and Vogler 2006). In total, there are 27 recognised Themeda species with both annual and perennial lifeforms (Clayton et al. 2014). The range of the genus is defined by the range of T. triandra and all other species are either regionally or locally restricted. The highest concentration of species diversity is located in Asia, in particular India (Morales 2014). Despite ongoing phylogenetic work in Andropogoneae (Giussani et al. 2001;Mathews et al. 2002;Sánchez-Ken and Clark 2010;Arthan et al. 2017), the tribe that includes Themeda, detailed analyses of this ecologically important genus are lacking.
Today, T. triandra is central to the ecological dynamics of palaeotropical savannas (Gibbs Russell et al. 1991;Jessop, Dashorst, and James 2006;Snyman, Ingram, and Kirkman 2013), where the absence of this particular species has been used as an indicator of reduced ecosystem function and soil quality (Mills et al. 2005;du Preez and Snyman 1993). T. triandra is generally abundant where it is found, with both fire and grazing by mammals necessary for its persistence (Snyman, Ingram, and Kirkman 2013). Indeed, the ecological dominance of T. triandra relies on periodic burning and its ability to rapidly resprout post fire (Bond et al. 2003;Morgan and Lunt 1999). For example, in Serengeti region T. triandra comprises approximately 50% of grass cover in areas of light to moderate grazing (Vuorio et al. 2014), but is lost from a system where both fire and grazing are excluded (Danckwerts 1993). Themeda triandra is also a crucial food source for domestic livestock and wildlife in both Africa and Australia (McNaughton 1985;Morgan and Lunt 1999). Despite the ecological importance of T. triandra, the timing and rate of its geographic spread remain unknown.
Here, we investigate the evolutionary origins of the Themeda genus and T. triandra in particular via phylogenetic reconstructions. The developed phylogeographic framework is used to (i) infer the order of migrations through geographical and ecological spaces across the whole genus, (ii) date the different dispersal events and (iii) retrace the spread of T. triandra through space and time. Our analyses of this paleotropical savanna dominant sheds new light on the biogeographic factors underlying the assembly of new ecosystems during recent geological times.

Sampling and sequencing
Leaf samples for Themeda (n = 71; species = 18) and other closely related Andropogoneae (n = 7; species = 6) were collected from the field or obtained from the herbaria at the Royal Botanic Gardens Kew and Edinburgh (Table S1). The sampling strategy optimised the morphological, ecological and geographic diversity within the genus given the sample availability. DNA was extracted from silica dried leaves and herbarium material using the DNeasy Plant Mini Kit (Qiagen, Texas, USA), following the manufacturer's protocol. PCR and Sanger sequencing of five plastid regions (trnK-matK, rpl16, ndhF, rpoC2 and trnL-trnF) and the nuclear-encoded ITS marker (i.e. internal transcribed spacers of the ribosomal DNA) were performed as described in Lundgren et al. (2015). In brief, 25 μl PCR reactions were amplified with a 48°C annealing temperature, subsequently cleaned using the Exo-SAP treatment (Affymetric, High Wycombe, UK) and finally sequenced using the Big Dye 3.1 Terminator Cycle Sequencing chemistry (Applied Biosystems, California, USA). Multiple sets of PCR primers were used (Table S2). The older herbarium specimens yielded degraded DNA, and markers were consequently amplified in short overlapping fragments. The resultant chromatographs were manually corrected in Geneious v.5.3.6. All Sanger sequences were deposited in the NCBI GenBank database (KY991068-KY991366).
Four Themeda triandra, two Heteropogon Pers. and one Themeda quadrivalvis accessions were subjected to low-coverage whole-genome sequencing (genomeskimming), as described in Lundgren et al. (2015). In brief, samples were individually barcoded, pooled and sequenced on an Illumina HiSeq-2500 or HiSeq-3000 at the Genopole platform of Toulouse. In total, 1/24th of a lane paired-end data were generated per sample, with all raw data deposited in the NCBI Sequence Read Archive (project identifier PRJNA377519; Table S3).

Assembly, alignment and phylogenetic inference
A complete chloroplast genome (hereafter plastome) was assembled de novo for one sample using the genome-walking method described in Besnard et al. (2013). In brief, a combination of extractread2 (included in the OBITools package, http://metabarcoding.org/obitools) and velvet (Zerbino and Birney 2008) was used to identify overlapping chloroplast reads in the raw data and subsequently assemble them. For each sample, we then mapped the paired-end Illumina data back to this initial reference using bowtie2 v.2.2.9 (Langmead and Salzberg 2012). Indels and SNPs were called with SAMtools mpileup function v.1.2 (Li et al. 2009) and filtered using the accompanying vcfutils.pl script (parameters: d = 10, a = 5 and Q = 30). Plastome sequences for each sample were generated using vcftools v.0.1.11 (Danecek et al. 2011). To remove sequencing errors from SNP calls, we discarded minor alleles that had a within sample frequency of < 0.2. Plastome sequences were submitted to GenBank (accession numbers: KY707767-KY707773).
The newly generated plastomes were aligned with others publicly available for Andropogoneae and Arundinella Raddi using MAFFT v. 7.123b (Arthan et al. 2017;Katoh and Standley 2013). The alignment was manually refined and the inverted repeat region removed. The Sanger sequencing data, and additional data retrieved from GenBank, were aligned to the plastome matrix using MAFFT and then concatenated by sample. A phylogeny was obtained using Bayesian inference, as implemented in MrBayes version 3.2.0 (Huelsenbeck and Ronquist 2001) with the GTR + G + I substitution model. Two analyses, each composed of four chains, were run for 10 million generations, sampling a tree every thousand generations. Convergence was evaluated with Tracer v. 1.5.0 (Drummond and Rambaut 2007), and a consensus was computed using all the trees sampled after a relative burn-in period of 10%. To verify the effect of missing data on phylogenetic inference, we also constructed a phylogeny using a trimmed alignment containing five chloroplast genes (trnK-matK, rpl16, ndhF, rpoC2 and trnL-trnF) either obtained via Sanger sequencing or extracted from plastomes. The ITS sequences were similarly aligned with MAFFT, and a Bayesian phylogeny was inferred as above.

Molecular dating
Divergent groups within Themeda were identified from the plastid and ITS phylogenies, and the accessions from each clade with the most complete sequences were retained for molecular dating analysis as implemented in BEAST v. 1.8.4 (Drummond and Rambaut 2007). Arundinella deppeana Nees was used as the outgroup, based on the results of Grass Phylogeny Working Group II (2012) that demonstrate the monophyly of the ingroup (Andropogoneae). Dating subgroups of grasses is complicated by the limited fossil record, and no informative fossil is available for the group studied here. We consequently adopted a secondary calibration approach, with a calibration point extracted from a previous, angiosperm-wide dating analysis ). The age of grasses has been hotly debated, with controversial fossils potentially pushing the origin of the group back in time (Prasad et al. 2011;Burke, Lin et al. 2016), although their effect is reduced when the evidence for non-grass groups is incorporated . Because this debate affects only the scale of the dating analyses and not the ages relative to each other, we decided to fix the calibration point to values previously estimated and present the results of our dating analyses under two different scenarios; with only macrofossils and with macrofossils plus phytoliths, in both cases using the dates that take into account the evidence for groups outside grasses . Our approach has the advantage of showing the uncertainty resulting from the genetic data (confidence intervals around the age estimates) independently of the calibration density (uncertainty about the scale). The divergence of Zea mays L. and Sorghum bicolor was fixed at 15.26 Ma (or 22.38 when taking into account controversial microfossils; Christin et al. 2014), which was achieved with a normal distribution that had a mean of 15.26 and standard deviation of 0.0001. Two different analyses were run for 50,000,000 generations, sampling a tree every 1,000 generations with a GTR + G model, a Yule process speciation prior and a relaxed log-normal clock. After evaluating the convergence of the runs in Tracer v. 1.5.0 (Drummond and Rambaut 2007), the burn-in period was set to 10%, and the maximum credibility tree was identified from all trees sampled from both runs after the burn-in period, mapping median ages on nodes. To check the effect of missing data on our results, we repeated this analysis using only the samples with full chloroplast genomes.

Genome-wide phylogenetic tree
The seven genome-skimmed individuals were also genotyped across the nuclear genome as described in Olofsson et al. (2016). In brief, the reads were cleaned using NGS QC Toolkit v.2.3.3 (Patel and Jain 2012) and mapped onto the chromosomes of the closest available model species (Sorghum bicolor (L.) Moench, version Sorbi1; Paterson et al. 2009) using Bowtie2 v. 2.2.3 (Langmead et al. 2009). Only reads uniquely aligned in pairs were subsequently used for SNP calling using SAMtools v.0.1.19 and previously published scripts (Olofsson et al. 2016). The low-coverage genome-skimming data mean that some alleles are likely missed, leading to an overestimation of homozygosity. However, no bias is expected in the missing allele, so that the low coverage is unlikely to lead to spurious groupings (Olofsson et al. 2016). A Bayesian phylogeny was inferred for the SNP data using MrBayes as described above, run for three million generations.

Phylogenetic relationships within Themeda based on plastid markers
In this study, we sequenced and assembled seven plastomes, which were supplemented with a further 26 publicly available plastomes from Andropogoneae incorrectly identified individual as it is nested within T. triandra (Figure 1) and is subsequently treated as such. Themeda triandra then forms a monophyletic group with T. quadrivalvis (clade H; posterior probability = 1; Figure 1). The T. triandra clade is sister to T. tremula (Nees ex Steud.) Hack. (clade F; Figure 1), with the two being sister to clade G (Figure 1), which includes species from Australia and Papua New Guinea (Figure 1). Some of the other clades include multiple morphological species without variation in the genetic markers (clades C and D; Figure 1). Within the T. triandra / T. quadrivalvis clade, there appears to be three (including five Themeda accessions), and the plastome of Arundinella deppeana. Sanger-sequenced chloroplast data were obtained for a further 62 samples with a mean of 3,572 bp per sample (SD = 1623 bp; Table  S1). These were aligned to the reference chloroplast genome, trimmed of the inverted repeat, and the phylogeny inferred from this 125,270 bp alignment included 18 of the 27 Themeda species (Table S1).
The Bayesian chloroplast tree identified seven major Themeda clades (clades A-H; Figure 1, Table 1). A previously published chloroplast genome attributed to Themeda arguens (L.) Hack. may come from an here) and 17 of the 27 Themeda species (Fig. S2). One notable exception is the division of T. novoguineensis (Reeder) Jansen from T. arguens, which was not achieved with the chloroplast markers. Based on ITS, T. triandra, T. quadrivalvis and T. australis (R.Br.) Stapf (not sampled for chloroplast markers and hypothesised as a synonym of T. triandra) form a clade with no distinction among the morphological species, nor the clades identified based on plastid markers (Figure 1).

Molecular dating
Based on our dating analyses, the divergence between Themeda and its sister group occurred at 7.98 Ma (95% HPD 5.01-12.78), which is largely congruent with previous estimates (Estep et al. 2014;Spriggs, Christin, and Edwards 2014). Within Themeda, the divergence of T. triandra from its sister group (T. arguens) occurred at 3.50 Ma (95% HPD 2.16-5.02 Ma) and the first divergence within T. triandra (crown node) at 1.48 Ma (95% HPD 0.79-3.45; Figure 2). This first divergence within T. triandra corresponds to the divergence of mainland African accessions (H.1) from those from Madagascar (H.2) and Australia/Thailand (H.3; Figure 2). The Madagascan samples split from the Australian and Thai groups, which match the geographic regions (H.1, H.2 and H.3; Figure 1 and Fig. S1). Clade H.1 includes all the mainland African samples, in addition to those from Turkey and Yemen (Figure 1). Clade H.2 contains all the T. quadrivalvis samples, whether they come from Madagascar or Asia (Figure 1). Finally, clade H.3 encompasses all the Australian accessions, in addition to those from Bhutan, Nepal and Thailand (Figure 1). In clade H.3, there is a division between some of the Australian samples and the rest. Note that the overall diversity among T. triandra plastomes (defined as clade H) is low. Across the plastomes of T. triandra (mean length with one inverted repeat = 116,090 bp, SD = 35 bp), 99.2% of sites were conserved, for a 99.8% pairwise identity between the eight T. triandra plastomes. There is some effect of reducing the data-set on the topology, but the relationships within T. triandra remain the same when only the four densely sampled markers are used (Fig. S1).

Phylogenetic relationships based on the nuclear ITS
A majority of the clades from the plastid phylogeny were congruent with the phylogeny inferred from the 625bp ITS alignment containing 71 samples (59 sequenced Table 1. Themeda species used in this study and their distribution. In total, we sequenced 18 out of the 27 known species. the clade assignment is based on chloroplast and nuclear markers, with the number of samples sequenced indicated (n), and ecological information derived from Morales (2014 SNPs were used to infer a Bayesian phylogeny, with a mean of 11% missing data per samples. All nodes were fully supported (Figure 3). The tree was rooted on the branch separating Heteropogon from Themeda. Unlike in the chloroplasts and ITS phylogenies (Figures 1, 2, S2), T. quadrivalvis collected from Madagascar is strongly supported as sister to T. triandra in this tree (Figure 3). Within T. triandra, the Australian sample separates first, with the three African samples forming a derived monophyletic group (Figure 3).

Discussion
Based on the phylogenetic relationships and species distributions, Themeda triandra spread to its current cosmopolitan distribution surprisingly recently. This ecologically important species began to diversify approximately one and a half million years ago and has since been able to colonise the grassy biomes of Africa, Madagascar and Australia, long after these biomes initially assembled ones at 1.37 Ma (95% HPD 0.73-3.07), and this represents the divergence of clades H.1 and H.2 ( Figure 1). Australia was colonised over 1 Ma, and the two Thailand samples diverged 0.69 Ma (95% HPD 0.21-1.22). The African lineage began diversifying relatively recently, with the Uganda sample splitting from the South African ones 0.52 Ma (95% HPD 0.19-1.30; Figure 2). Note that the three African samples capture the earliest split within clade H.1 (Figure 1). While all date estimates are older when considering microfossils, the diversification of T. triandra is still placed within the last two million years ( Figure 2). Missing data appeared to have little effect on the dates, since removing partially sequenced individuals does not alter the results (Fig. S3).

Genome-wide nuclear phylogeny
A nuclear phylogeny was inferred from the genomeskimming data generated here, using the methods described in Olofsson et al. (2016). A total of 753,190 perennial. Previously, T. quadrivalvis has been considered as a potential synonym of T. triandra (Veldkamp 2016). However, if speciation is relatively recent, we would predict similar patterns for savanna species to what we observe in our phylogenies (Pennington and Lavin 2016) with a monophyletic daughter species (T. quadrivalvis) nested within a paraphyletic ancestor (T. triandra). This scenario is supported by the whole chloroplast genomes and ITS phylogenies (Figures 1, 2, S1, S2, S3), and it may be concluded that T. quadrivalvis represents a recent evolution of an annual growth form from a perennial ancestor. Switching between these two life histories may be a surprisingly simple transition in grasses (Linder and Rudall 2005), being controlled by only two genes (Hu et al. 2003). However, the nuclear phylogeny identifies T. quadrivalvis as sister to T. triandra ( Figure 4). Therefore, T. triandra replaced other C 4 grass species already established in these habitats, rather than being one of the founding species. Themeda trianda is a major component of modern day savannas and has rapidly become a key component of the ground flora of these ecosystems (Snyman, Ingram, and Kirkman 2013). This highlights the dynamic turn-over of tropical grassy biomes, with dramatic shifts in species dominance occurring in relatively recent evolutionary times Is Themeda quadrivalvis a synonym of Themeda triandra?
Both species are extremely similar and often dominate savannas. The key morphological difference between the two is that T. quadrivalvis is annual, and T. triandra is  accessions compared to the divergence of the mainland African accessions sampled (South Africa and Uganda). Our dating analyses show that the lineage containing the mainland African accession diverged from those that remained in Asia approximately one and a half million years ago. Individuals from this group subsequently colonised mainland Africa, with the first split within this lineage estimated at half a million years ago (South Africa and Uganda), further supporting a recent and rapid spread throughout Africa (Figure 4). The fact that this species was able to conquer multiple continents while its congeners remained geographic restricted is intriguing. The capacity for wind dispersal of T. triandra seeds is poor, with most landing <1.75 m from a parent plant (Everson, Yeaton, and Everson 2009), but they can travel epizoochorously potentially over long distances (Agnew and Flux 1970;Milton 1993;Reynolds and Cumming 2016). Hydroscopic awns might aid their attachment to animal fur or feathers, but such awns are found in numerous Andropogoneae genera (Kellogg and Watson 1993) and in all Themeda species (Morales 2014). It is therefore possible that the impressive spread of T. triandra is not the result of a higher dispersal ability, but instead reflects an enhanced survival after occasional long distance dispersal.
Evolving from a genus already associated with tropical grassy biomes over a long period means that T. triandra is inherently adapted to these habitats. Its height and rapid growth enable T. triandra to out compete neighbouring species for light, water and nutrients. Fire also plays an important role in its dominance, with T. triandra being highly flammable relative to co-occurring grass species (Simpson et al. 2016) and its persistence reliant on burning (Danckwerts 1993). These characteristics are, however, at least partially shared across the genus (Morales 2014).
Polyploidy may also play a role in the success of T. triandra following dispersal, with the species recognised as having numerous populations of different ploidy levels (Birari 1980;Hayman 1960). Ploidy has been associated with the invasive potential of species, through improving the capacity to colonise new environments, via maintaining genetic diversity as new populations establish (te Beest et al. 2011). However, other species within the genus also include polyploids (Birari 1981). Furthermore, previous studies investigating allopolyploidy and diversification during Miocene grassland expansion showed that there is no correlation with the origin of novel morphological characters (Estep et al. 2014). We therefore conclude that the success of T. triandra cannot be attributed to a single characteristic, but probably reflects the combined action of multiple traits favouring dispersal and survival, but also potentially contingency. Indeed, a handful of chance dispersal events over long distances would have been sufficient to bring the species to the African continent, where its characteristics would have allowed a rapid colonisation of C 4 grassy biomes that already existed there.
( Figure 3). The inconsistencies between the plastome and nuclear phylogenies might stem from incomplete lineage sorting or hybridisation during the early diversification of the species complex, in which case T. quadrivalvis and T. triandra represent recently diverged sister taxa. Either way, both species are extremely closely related, and whether the two forms do interbreed in the wild or represent distinct species will require dedicated analyses in the future.

A long history in Asian savannas
Both the chloroplast and nuclear phylogenies place a series of Asian species as successive sister groups to the T. triandra clade (Figs. 1, 2, S2), clearly suggesting Asia as the ancestral area for the group. The earliest division within the genus divides clades A-D from E-H (Figure 2 Clayton and Hyparrhenia subplumosa Stapf). We therefore conclude that the savanna ecology, with its suite of associated traits, predates the genus. The Andropogoneae tribe as a whole contains many of the species typical of grassy biomes in the paleotropics (Osborne 2008). The group evolved C 4 photosynthesis more than 17 Ma (Christin et al. 2008), a trait that, in association with their capacity to quickly accumulate biomass between fires (Bond et al. 2003;Morgan and Lunt 1999), might have favoured their spread in novel grassy biomes as they expanded around the world. The savanna Themeda species themselves (clade E-H) began to diversify approximately 5 Ma (Figure 2), during the global expansion of C 4 grassy biomes (Cerling et al. 1997;Edwards et al. 2010). This means that the ancestors of T. triandra spent millions of years in the extensive and diverse grassy habitats of Asia (Dixon et al. 2014), and therefore T. triandra was probably well suited for the African savannas when it was eventually able to colonise this continent.

Paleotropical spread of Themeda triandra
From its Asian origin, T. triandra has spread to Australia and Africa (Figure 4). It is likely that T. triandra arrived in Australia before mainland Africa, as supported by higher genetic diversity within Australia and to a lesser extent, the earlier divergence of Madagascan and Australian

Conclusion
Themeda triandra is one of the species dominating savannas throughout Africa, Asia and Oceania. Interestingly, our dating analyses indicate that this species originated relatively recently in Asia, long after the expansion of C 4 grassy biomes. It emerged from a group that had inhabited Asian savannas for millions of years. Therefore, characteristics favouring the dominance of African C 4 grassy biomes predate the colonisation of these habitats and evolved in distinct ecosystems with similar properties. Interestingly, the spread of the species is independent from the expansion of the ecosystems, highlighting the importance of the biogeographic history, including random dispersal events, in the assemblage of large communities around the world. Following the colonisation of Africa, T. triandra likely displaced other savanna species that lived there, since it now represents the dominant grass in multiple systems across the continent. Overall, our results highlight the turnover of tropical grassy biomes, which may have experienced many dramatic shifts in species dominance during recent evolutionary times, with a succession of dominant species. This is likely to have had a profound effect on fire and herbivore regimes across these globally important ecosystems.