Complete plastomes of three endemic Mexican pine species (Pinus subsection Australes)

Abstract We assembled the plastomes of Pinus greggii, P. jaliscana and P. oocarpa from 100 bp paired-end Illumina reads. We combined de novo (comparing Velvet and SPAdes) with reference-guided assembly and a final step of gap filling. SPAdes performed better than Velvet based on scaffold number (180 vs. 263) and mean length (1886 vs. 560 bp), and number of gaps (2 vs. 4). Annotations were automatically transferred from P. taeda NC_021440 and carefully revised by hand. Phylogenetic analysis with additional plastomes revealed very short branch lengths, supporting a rapid diversification within Australes and close relatedness among pines from Western Mexico.

Understanding diversification processes in Pinus has proven a difficult task due to their large genome with highly repetitive elements (ca. 23 Gbp; Neale et al. 2014), large effective population sizes, weak reproductive barriers and the recent origin of many of its species. Published data based on plastome sequences have revealed limited amounts of diversity and haplotype sharing among closely related species, suggesting incomplete lineage sorting, gene flow or both (e.g. Hern andez-Le on et al. 2013; Willyard et al. 2016). Plastomescale datasets are therefore necessary for more accurate phylogenetic hypotheses, analysis of phylogeographical structure and the estimation of gene-flow patterns.
The Mexican pines Pinus greggii Engelm. ex Parl., P. jaliscana P erez de la Rosa and P. oocarpa Schiede belong to subsection Australes, a group comprising approximately 30 species from North and Central America and the Antilles (Gernandt et al. 2005), hypothesized to have undergone a recent radiation (Willyard et al. 2007). A partial plastome sequence has been published for P. greggii (Parks et al. 2012), whereas only five plastid DNA markers are available for P. jaliscana and P. oocarpa (Hern andez-Le on et al. 2013). Detailed phylogenetic relationships among these endemic and recently diverged species have therefore not been addressed. These species are characterized by contrasting ecological and geographical features, from wide to restricted distribution ranges, and from temperate to subtropical environments. Pinus greggii is considered Vulnerable and P. jaliscana is considered Near Threatened by the International Union for the Conservation of Nature (IUCN 2016) (Supplementary Material 1).
Seeds used in this study were collected within the natural distribution range of each species and voucher specimens were deposited in the Herbario Nacional de M exico (Supplementary Material 2). Total genomic DNA was extracted from megagametophyte (haploid tissue of maternal origin found in the seed) using a Wizard Genomic DNA Purification kit (Promega, Madison, WI). DNA samples were stored at the Laboratorio Sistemática Molecular Bot anica (Instituto de Biolog ıa, UNAM, Mexico). Genomic libraries were prepared with 800 ng of DNA, sheared and size-selected to 250 bp (MYcroarray, Ann Arbor, MI). Libraries were sequenced in an Illumina Hi-Seq 2500 with 100 bp paired ends. The pipeline was developed in P. greggii for comparing the two de novo short-read assemblers, Velvet ver. 1.2.10 (Zerbino and Birney 2008) and SPAdes ver. 3.6.2 (Bankevich et al. 2012), and then replicated in P. jaliscana and P. oocarpa using SPAdes only. FASTQ files were processed in Geneious 9.2.6 (Kearse et al. 2012), removing duplicate reads and trimming low-quality bases.
In the first stage of the pipeline, plastome reads were filtered using the Geneious 9.2.6 mapper tool using P. taeda NC_021440 as the reference ). Filtered reads were independently assembled de novo using Velvet ver. 1.2.10 and SPAdes ver. 3.6.2. For each independent assembly, de novo contigs were mapped over NC_021440 using the Geneious Mapper tool with the highest sensitivity option and five iterations for fine tuning. The consensus sequence was generated and segments without any mapped contigs were annotated and filled with P. taeda reference sequence to produce a first chimeric draft plastome. In a second stage, the entire read dataset was assembled over the chimeric plastome. Ambiguous calls produced by the assembly of different reads to positions already covered by de novo contigs were resolved by selecting the nucleotide displayed by the de novo contig. Regions previously filled with P. taeda sequence that displayed a poor mapping of reads leading to low coverage (<30Â) and/or ambiguous calls were considered gaps. The consensus sequence for this second draft of the plastome was then obtained. In the third stage of the pipeline, we ran the gap filling software Sealer (Paulino et al. 2015) for finishing the genome. For details on the parameters used for each step, see Supplementary Material 4.
We compared the two resulting assemblies and retained the sequence based on SPAdes as the best version of the P. greggii plastome based on number and length of de novo contigs (Stage 1) and the number of gaps to be filled during Stages 2 and 3 (Table 1). This result agrees with assays performed on prokaryotic genomes (Magoc et al. 2013).
The P. greggii whole plastome sequence reported here is 274 bp longer than P. greggii partial plastome JN854198 published by Parks et al. (2012), which has 120,227 bp (plus 370 bp as Ns). Main differences of our sequence when compared to JN854198 include 2 insertions (291 bp in ycf1, 55 bp in trnN-chlL) and 2 deletions (99 bp in rrn16, 356 in cemA-ycf4 characterized by a high proportion of ambiguities). Improvements made on the P. greggii plastome can be attributed to methodological differences. For example, Parks et al. (2012) used Illumina 60 to 80 bp single-end lengths vs. 100 bp paired ends here. Plastome JN854198 was based on a 5Â coverage for conserved positions and a minimum of 20Â for SNPs, while our sequence had a coverage of 146Â.
Phylogenetic relationships based on P. greggii, P. jaliscana and P. oocarpa plastomes (GenBank: KY963967, KY963968 and KY963969) were investigated by aligning them to 17 plastomes from Pinus subsection Australes (Parks et al. 2012) and 2 plastomes from section Pinus as an outgroup (Wakasugi et al. 1994;Fang et al. 2016), using MAFFT as previously described. A maximum-likelihood tree was built using Stage 1 corresponds to the de novo assembly, Stage 2 to the reference-guided assembly, and Stage 3 to the gap filling step. Final plastome sequences resulted from the pipeline based on SPAdes. In P. jaliscana and P. oocarpa, only scaffolds >200 bp were used for mapping. a Value of k-mer length where gaps were successfully filled.
RAxML-HPC ver. 8 (Stamatakis 2006) on XSEDE (Miller et al. 2010) with a general time reversible þ gamma (GTR þ G) nucleotide substitution model and a random stepwise addition parsimony starting tree. Support values were based on 300 rapid bootstrap replicates. The alignment revealed an 805 bp psbA sequence inserted within the trnK-UUU-matK region of P. taeda NC_021440 (already identified as poorly resolved and considered as a gap during our assembly), which is exclusive of this plastome. Whether this insertion is a characteristic of this species or an inaccuracy remains to be determined. Genes ycf1 and ycf2, despite large differences in size between species (5157-5742 bp and 6090-6210 bp, respectively), were correctly aligned. This is in accordance with previous results showing that these genes harbour a great amount of variability at large and intermediate phylogenetic scales (Parks et al. 2012;Hern andez-Le on et al. 2013).
Phylogenetic analysis supports the existence of a monophyletic Oocarpae (egg-cone pines from Mexico and Central America) and Attenuatae (California closed-cone pines; Figure 1). Within Oocarpae, P. greggii (north-eastern Mexico) grouped more closely to P. leiophylla and P. chihuahuana (central and north-western Mexico, respectively). Pinus jaliscana (central western Mexico) and P. oocarpa (NW Mexico to Nicaragua) were closely related to P. lawsonii, P. pringlei (SW Mexico) and P. lumholtzii (NW Mexico) in general agreement with Hern andez-Le on et al. (2013). Yet, very short internal branch lengths and low bootstrap values hindered a precise determination of sister relationships, suggesting a very recent radiation of Western Mexican pines within Australes. Notoriously, our analysis revealed P. patula (eastern Mexico) as sister to all the other Mexican species, in contrast to the pattern reported by Parks et al. (2012) where it is sister to the P. greggii clade. However, current available Australes plastomes from Parks et al. (2012) display a high number of ambiguities (from 120 in P. lawsonii to 29,040 in P. attenuata), which means that internal branch lengths and phylogenetic relationships should be interpreted cautiously. Future investigations based on fully resolved plastome sequences shall help in building a more detailed understanding of pine evolution in Mexico, which represents the centre of diversity of the genus (Farjon and Filer 2013).

Disclosure statement
The authors declare no conflict of interest and are responsible for the content and writing of the paper.