Reconstructing the geographic origin of the New World jays

We conducted a biogeographic analysis based on a dense phylogenetic hypothesis for the early branches of corvids, to assess geographic origin of the New World jay (NWJ) clade. We produced a multilocus phylogeny from sequences of three nuclear introns and three mitochondrial genes and included at least one species from each NWJ genus and 29 species representing the rest of the five corvid subfamilies in the analysis. We used the S-DIVA, S-DEC, and BBM analyses implemented in RASP to create biogeographic reconstructions, and BEAST to estimate timing of NWJ diversification. Biogeographic reconstructions indicated that NWJs originated from an ancestor in the Eastern Palearctic or Eastern + Western Palearctic, diversified in Mesoamerica and spread subsequently to North and South America; the group has been diversifying in the New World since the late Miocene.

The origins of corvids can be traced to Australia, from where the ancestor of the family dispersed to Asia, followed by radiations in Asia, Europe and elsewhere [8]. Ericson and colleagues [9][10][11] provided evidence that the origin of the oscine passerines, to which the Corvidae belongs, dates to the split of the Australo-Papuan tectonic plate from Antarctica in the early Tertiary,~53M years ago. Thus, the replacement of the early Tertiary rainforests in Australia by drier vegetation may have placed presumably forest-adapted early corvids in more open habitats, resulting in a new radiation that led to the present global distribution of the family. However, a recent study [12] estimated the age of the family at~20M years. Colonization of the New World by corvids occurred more recently, apparently via a trans-Beringian route [8]. The ancestor of NWJs, thought to be jay-like lineages related to Cissa and Urocissa, reached North America 10−8M years ago, in the Miocene [13]. Once in the Americas, a rapid radiation in South America by the early Pliocene, was apparently followed by a secondary radiation in North America [1]. This hypothesis was not supported by the later analyses of Bonaccorso and Peterson [14], who offered support for a hypothesis that the NWJ ancestor was in Mesoamerica or North America + Mesoamerica and radiated into North and South America independently. The origins of this diverse clade are of considerable interest in light of questions about the biogeography of Asian and American tropical groups [15], morphological innovation [2] and the evolutionary origins of social behavior [2].
Several previous studies have examined the evolutionary history, phylogeny and biogeography of the NWJs, but invariably based on sparse representation of NWJ genera and limited outgroup sampling. De los Monteros and Cracraft [1] proposed a first hypothesis of intergeneric relationships in the group based on cytochrome b sequences. Ericson et al. [11] added markers but included only single species per genus. Saunders and Edwards [16] examined mitochondrial DNA control region (CR) variation among representatives of all NWJ genera. Finally, Bonaccorso and Peterson [14] derived a multilocus phylogeny with much-improved NWJ species representation and included a historical biogeographic analysis that concluded that NWJ origins were in Mesoamerica or Mesoamerica + North America.
These previous studies, however, did not provide sufficient detail in terms of representation of taxa of NWJs and other corvids to permit thorough understanding the geographic origin of NWJs or to identify their closest corvid relatives. The key deficiency has been in terms of representation of taxa from the deepest branches of the corvid phylogeny; thus, we here derive a denser phylogenetic hypothesis by deriving DNA sequences for these early lineages, placing several species of treepies, blue magpies and green magpies in a molecular phylogeny for the first time, to permit more robust biogeographic analyses. In light of this new data set, we discuss geographic origins of NWJs and offer a new hypothesis.
DNA extraction, PCR amplification, sequencing, alignment and model selection DNA was extracted from frozen tissues using proteinase K digestion under manufacturer's protocols (Qiagen DNeasy tissue kit). PCR amplifications were performed using Promega GoTaq Flexi DNA polymerase. We employed a touchdown protocol for the amplification of ND2 and ND3, with annealing temperatures of 55 and 50°C, respectively. Constant annealing temperatures of 58, 50, 59 and 56°C were used for Cytb, βfib7, Myo2 and TGFβ2, respectively. PCR products were sent to Beckman Coulter Genomics (Danvers, MA) for purification and final sequencing.
Sequences were cleaned, assembled and aligned in Geneious 8.1.6 (Biomatters, Auckland, New Zealand) and checked against an automated alignment in MUS-CLE [42]. Sequence data were partitioned by codon position for mitochondrial DNA and by gene for nuclear introns. We used MrModeltest 2.3 [43] to identify appropriate substitutional models for each partition under the Akaike information criterion (AIC).

Phylogenetic analysis
Single-gene matrices were concatenated using Geneious 8.1.6 to produce an overall mitochondrial data set, a nuclear intron data set and a total combined matrix. Phylogenies were estimated with Bayesian analysis (BA) and maximum-likelihood (ML) inference methods. GARLI v2.01 [44] was used for the maximum-likelihood analysis. We performed 100 independent analyses and chose the tree with the best likelihood score. Nodal support values for topologies were evaluated via 500 nonparametric bootstrap replicates [45]. The 50% majority-rule consensus tree was acquired in PAUP* 4.0 [46]; 70% bootstrap support was considered as indicative of solid nodal support [47,48].
Bayesian phylogenetic estimates were obtained in MrBayes 3.2.2 [49][50][51]. Two independent Markov chain Monte Carlo (MCMC) runs of 2 Â 10 7 generations using four chains (temp = 0.1) per run were conducted. We sampled trees every 2000 generations and discarded the first 25% of the trees as burn-in. We used TRACER 1.6 [52] to assess convergence of parameter estimates and posterior probabilities. The remaining trees were summarized to obtain a 50% majority-rule consensus tree.

Ancestral area reconstructions
We employed S-DIVA (Statistical Dispersal-Vicariance Analysis), 'Bayes−Lagrange' or Statistical Dispersal-Extinction-Cladogenesis (S-DEC [53,54]), and Bayesian Binary Method (BBM) analysis implemented in RASP 3.2 [55,56] to reconstruct ancestral areas of corvid lineages. DIVA is an event-based method in which vicariance is assumed and includes potential contributions for dispersal and extinction by assigning a cost [57]. S-DIVA is an expansion of Bayes-DIVA implemented by Nylander et al. [58] that incorporates the entire posterior distribution of tree topologies, thus accounting for both phylogenetic and ancestral state uncertainty [55,56]. S-DEC model has been implemented using the source code for the C++ version of Lagrange [59]. This model gives the likelihood of all possible biogeographic scenarios estimated at a given node and summarizes biogeographic reconstructions across all user-supplied trees [56]. BBM is a statistical procedure in which a full hierarchical Bayesian approach is used to reconstruct ancestral distributions at ancestral nodes [60]; it is implemented in RASP [56], wherein the source code of MrBayes 3.1.2 is used to create the MCMC analysis.
We defined 8 geographic areas (A = North America, B = Mesoamerica, C = South America, D = Western Palearctic, E = Eastern Palearctic, F = Afrotropics, G = Indo-Malaya, H = Australia). We assigned each species included in the phylogenetic analysis to a region or regions, based on published range maps [61]. S-DIVA, S-DEC and BBM analyses were conducted using 10,000 postburn-in trees derived from the overall combined data matrix from the Bayesian Markov chain Monte Carlo (MCMC) procedure implemented in BEAST 1.8.2 [62]. For the S-DEC analysis, we considered the probability of dispersal between areas as equal, and all values in the dispersal constraint matrix were set to 1. BBM analysis used the maximum clade probability tree (condensed tree) from the BEAST analysis and included two independent runs of 10 chains (temp = 0.1) that ran for 1M generations, sampling every 100 generations, but discarding the initial 25% as burn-in. We used the F81 + G evolutionary model for the analysis, based on the results of   the model testing described above. In all analyses, the maximum number of areas included in ancestral distributions was set to 2. We excluded the broadly distributed outgroup Lanius from these analyses.

Timing of diversification
The fossil record for passerine birds is scarce and many described fossils lack information necessary for use as informative calibration points [63]. Thus, we estimated the time of diversification of NWJs using previously published ND2 substitution rate priors using BEAST 1.8.2 under a relaxed clock model. We unlinked nucleotide substitution models, and used MrModelTest 2.3 [43] to obtain best-fitting substitutional models. We linked clock and tree models, used a birth-death speciation process and ran them for 5 × 10 8 generations, sampling every 50,000th generation. Convergence was examined in TRACER [52]; all effective sample size (ESS) values were > 200. We discarded the initial 25% of samples as burn-in. Calibrating a phylogeny based on empirically determined mitochondrial DNA substitution rates offers only a rough estimate of actual time, though lacking fossils within the clade, such an external rate calibration offers the only option for linking to actual dates [64]. ND2 is widely studied in ornithology [65]; we used two empirically estimated substitution rate priors derived from ND2 substitution rates (2.4 and 3.3%/lineage/ million years) for Hawaiian honeycreepers [66] that were derived based on three recently obtained geologic calibration points, to estimate divergence times for various corvid lineages.

Results
Topologies inferred via ML and BA analysis were highly congruent for the concatenated mitochondrial and nuclear sequence datasets (Figure 1; also see Appendix 2 for concatenated mitochondrial, and nuclear datasets). Results from the BEAST analysis were similar to the ML and BA topologies, except that Pyrrhocorax was placed as sister to clade A in the BEAST analysis (posterior probability 0.84); ML and BA could not resolve the placement of this genus. In general, these mostly congruent phylogenetic hypotheses all agreed in identifying five clades within the corvids: A, including Temnurus, Platysmurus, Crypsirina, and Dendrocitta; B, including Cissa and Urocissa; C, including Cyanopica and Perisoreus; D, including Pica, Podoces, Ptilostomus, Garrulus, Nucifraga and Corvus; and E, including Cyanolyca, Aphelocoma, Gymnorhinus, Cyanocitta, Cyanocorax, Psilorhinus and Calocitta. The NWJ clade (clade E) was placed as sister to clade D with strong support (posterior probability 1.0, 100% ML bootstraps). Monophyly of the clades C, D and E was highly supported (1.0, 87%), although the sister relationship of Perisoreus and Cyanopica was ambiguous (0.72, 47%). Placement of Podoces as sister to Ptilostomus [11] was not well-supported; in contrast, a sister relationship between Podoces and Pica was better-supported, with a posterior probability of 0.95% and 58% bootstrap support.
S-DIVA, S-DEC and BBM ancestral area reconstructions for NWJs produced congruent results, estimating Mesoamerica as the most probable ancestral area for the common ancestor of clade E (Figure 2). According to S-DIVA analysis, Mesoamerica + Eastern Palearctic was the most probable ancestral area (p = 1.0) for the common ancestor of clades D and E. S-DEC also showed Mesoamerica + Eastern Palearctic as the most probable ancestral area (p = 0.63), and Eastern Palearctic was the other possible area (p = 0.37). BBM results were equivocal, with four possible ancestral areas: Eastern Palearctic (p = 0.40), Eastern Palearctic + Western Palearctic (p = 0.31), Western Palearctic (p = 0.16) and ambiguous (p = 0.13). Because all NWJ lineages were connected to American areas, uncertainty in phylogenetic arrangements within clade C had no effect on our reconstructions of areas of lineages ancestral to the NWJs.
BEAST relaxed-clock analysis based on the two substitution rate priors indicated that the time to the most recent common ancestor for NWJs was 5.8−4.3M years. The split of the NWJ clade (clade E) from its sister clade (clade D) was estimated at 6.7−5.0M years (Figure 3). Our analysis estimated that the Corvidae likely originated in the late Miocene, 8.0−5.9M years ago.

Discussion
Our biogeographic analysis was based on denser taxonomic sampling of the early branches of the corvid phylogeny than had been available in any previous study. Our results provided solid evidence regarding the geographic origin of NWJs. Mesoamerica + Eastern Palearctic or Eastern Palearctic was consistently identified as the most probable ancestral area for the ancestor of the NWJ and its sister clade; thus, the lineage probably reached the New World from the Eastern Palearctic through a trans-Beringian route. An interesting observation that resulted from the ancestral area reconstruction was the absence of the common ancestor of clades D and E in North America (Figure 2). A plausible explanation to this observation can be drawn from studies on other taxa: for example, Fox [67], based on mammalian fossils, documented aridification of North America during the late Miocene, thus linking it with increased seasonality. Similarly, Kohn and Fremd [68] documented decreased faunal diversity in western North America in the late Miocene, which they postulated to coincide with increased aridity and seasonality. The ancestor of clades D and E probably largely bypassed North America during the late Miocene, perhaps using western coastal habitats under the climate-stabilizing effects of the Pacific Ocean that may have offered temporary corridors for passage from the Eastern Palearctic to Mesoamerica. Hence, early cladogenesis-at least among extant taxaof the NWJ lineage probably took place either in Mesoamerica or in the Eastern Palearctic.
Our results indicated that the NWJs are most closely related to a clade comprising Pica, Podoces, Garrulus and others. Ericson et al. [11] identified Indo-Malaya as the ancestral area of origin of these taxa. In another study, De los Monteros and Cracraft [1] recovered Perisoreus as the sister taxon to the NWJ clade. Additionally, Craig et al. [13] described a close relationship between the NWJ ancestor and Cissa + Urocissa. Our results refute these hypotheses, with better taxonomic representation, a more robust multilocus data-set, and better support ( Figure 1).
All of our analyses identified Mesoamerica as the most probable area for the most recent common ancestor of NWJs. This result narrows down the results of Bonaccorso and Peterson [14], which showed the ancestral distribution of the NWJs as either Mesoamerica or Mesoamerica + North America. The proposed origin of Calocitta + Cyanocorax + Psilorhinus in Mesoamerica, dispersing thence into South America [14], was supported in our analysis. However, our results were unclear as to whether the Aphelocoma + Cyanocittta + Gymnorhinus clade originated in North America vs. North America + Mesoamerica.
De los Monteros and Cracraft [1] proposed the idea that the origin of the NWJs was the result of a single dispersal event into the Americas, which could have taken place around the early Miocene. However, our time calibration estimates indicated a later (Miocene−early Pliocene) origin for the NWJs (Figure 3), in which the ancestor probably dispersed to the New World from the Eastern Palearctic, or Eastern + Western Palearctic, around 6.7−5.0 M years ago (Figure 3). This event could have taken place when a continuous coniferous belt linked Northern Asia and America across Beringia from mid-to-late Miocene to the late Pliocene [69], yet the timing of our reconstructions does not appear to coincide with any specific geological events that might have triggered the dispersal event.
Our time calibration estimated a late Miocene-early Pliocene origin for the most recent common ancestor (MRCA) of NWJs. This timing makes intuitive sense considering the absence of the MRCA in North America given the continent's increased seasonality and aridity by that time period, as stated above [67]. The time calibration further estimated the age of the family Corvidae at 8.0−5.9 M years, substantially younger than the recent estimate of~20M years in the phylogeny of corvoid passerine birds [12]. However, we note that the timing of Corvidae-Laniidae split of the recent phylogeny of songbirds based on~5000 loci by Moyle et al. [70] coincides with our estimate for the same node (not shown here). Moreover, our time estimates for the MRCA of the South American genera (2.8−2.0M years for Cyanolyca, and 2.3−1.7M years for Cyanocorax) make intuitive sense, since dispersal of theses clades from Mesoamerica into South America should postdate the emergence of the Isthmus of Panama, around 2.8 M years ago [71]. However, we emphasize that our time estimates are rough estimates of actual time and must be considered with caution.
Our results provide novel insights into the biogeographic origin of NWJs, offering a solid hypothesis based on different ancestral area reconstruction methods, backed by strong support from a robust, multilocus phylogeny. However, clearly, more complete taxon representation with genome-level sampling could perhaps improve support for some remaining ambiguous nodes in the phylogenetic hypothesis. Such an improved phylogeny would help to resolve ambiguities associated with some of our phylogenetic and geographic hypotheses, and could allow insight into other important questions, such as tropical-temperate transitions in the family.

Acknowledgments
We are grateful to the museum collectors and field workers who collected specimens used in this study. We thank the collections managers and curators of American Museum of Natural History, Louisiana State University Museum of Natural Science, National Taiwan Normal University, and the University of Kansas Natural History Museum, for providing tissue samples. For providing sequences of Crypsirina temia, Dendrocitta cinerascens, and Platysmurus leucopterus, we thank Carl Oliveros. We thank Matthew Girard for helping with data cleaning and statistical analysis. For wise guidance in the laboratory, we thank Alana Alexander. Thilina De Silva provided support in laboratory work and statistical analysis. We are grateful to Leo Smith for the assistance and guidance in the lab. The Office of the Provost, University of Kansas, supported this study financially.

Disclosure Statement
No potential conflict of interest was reported by the authors.  Figure 2B. Phylogeny based on Bayesian and maximum-likelihood analyses of the combined mitochondrial data set. Numbers at the nodes refer to Bayesian posterior probability/maximum-likelihood bootstrap support.