Complete mitochondrial genomes provide current refined phylogenomic hypotheses for relationships among ten Hirundo species

Abstract Hirundo is the most species-rich genus of the passerine swallow family (Hirundinidae) and has a cosmopolitan distribution. Here we report the complete, annotated mitochondrial genomes for 25 individuals from 10 of the 14 extant Hirundo species; these include representatives from four subspecies of the barn swallow, H. rustica. Mitogenomes were conserved in size, ranging from 18,500 to 18,700 base pairs. They all contained 13 protein-coding regions, 22 tRNAs, a control region, and large and small ribosomal subunits. Phylogenetic analysis resolved most of the relationships between the studied species and subspecies which were largely consistent with previously published trees. Several new relationships were observed within the phylogeny that could have only been discovered with the increased amount of genetic material. This study represents the largest Hirundo mitochondrial phylogeny to date, and could serve as a vital tool for other studies focusing on the evolution of the Hirundo genus.


Introduction
The genus Hirundo comprises 14 species of swallows and has a cosmopolitan distribution spanning every continent except Antarctica. Of these species, the barn swallow (H. rustica) is the most widespread, breeding throughout Eurasia and North America, and overwintering across the Southern Hemisphere (Brown and Brown 1999). Eleven of the Hirundo species reside primarily in Africa, and three others inhabit the Pacific. Members of Hirundo build mud-cup nests in a variety of habitats (e.g. cliffs, anthills, human dwellings, bridges) and are highly adapted to aerial feeding. Among members of the genus Hirundo, variation in melanin-based plumage color is thought to be a prominent aspect of phenotypic differences. In addition to differences in plumage color among members of the genus Hirundo, length of outer tail streamers and body size tend to be helpful aspects of phenotype for identifying species (Dor et al. 2010). Differences in these character traits are often minimal, which presents challenges to taxonomic efforts (Sheldon et al. 2005;Dor et al. 2010).
Dor et al. published a species tree of all members of the genus Hirundo using one nuclear gene and six mitochondrial DNA genes with the purpose of understanding lineage diversification and phylogeographic relationships within this clade (Dor et al. 2010). This was significant as this was the only phylogeny of the Hirundo genus using more than 1000 base pairs of genetic material at the time. An earlier paper by (Sheldon et al. 2005) presented the entire swallow family, Hirundinidae, and used even fewer genes, thus a lower base pairs count, for its phylogeny. The ability to sequence and assemble entire mitogenomes is now more feasible than in the past due to reduced financial costs and improved computational accessibility; re-evaluating the phylogeny using more genetic data is now a more achievable task.
In the present study, we revisited the evolutionary relationships among members of the genus Hirundo by assembling complete mitochondrial genomes of 10 species. Genomes were assembled de novo from Illumina shotgun sequence data. These mitochondrial genomes offer greater resolution and support for some aspects of the previous Dor et al. (2010) Hirundo mtDNA gene tree and also offer insight to a new topology.

Sample collections
Samples were collected from both museum breast tissues and field collection blood samples as depicted in Table 1.

DNA extraction and sequencing
Genomic DNA was extracted using a QIAamp DNA Blood Mini kit, following instructed blood or tissue protocol. Genomic libraries were then prepared using Nextera V R XT DNA library prep kits (Illumina V R ), and each sample was barcoded using unique dual index adapters Nextera V R i5 and i7 by the University of Colorado's BioFrontiers Institute Next-Generation Sequencing Facility in Boulder Colorado. Samples that passed quality control were processed for paired end 150 base pair reads on the Illumina Novaseq S4 sequencer (800 raw data/lane) at Novogene V R .

Mitochondrial genome assembly
The whole genome data were aligned to the Hirundo rustica gutturalis reference mitogenome (GenBank accession KP148840.1) with Samtools function bwa-mem using default settings (Li 2013). Only the reads that aligned to the mitochondria were retained for use in downstream assembly (this subsetting of the data improved assembly performance). Reads were trimmed of adapters and low-quality reads using Trimmomatic v0.39 (Bolger et al., 2014)) with the following parameters: Illuminaclip: NexteraPE-PE.fa:2:20:10 Leading:20 Trailing:20 Sliding window:4:15 Minlen:100. De novo assembly of trimmed reads into scaffolds was performed with SPAdes v3.11.1 (Bankevich et al., 2012). The relative position, order, and orientation of scaffolds were determined by comparison with available H. rustica rustica and H. rustica erythrogaster reference genomes available on GenBank (GenBank accessions KP148840.1, KX398931.1). When multiple contigs represented the same genomic region, contig selection was based on maintaining approximate consistency in read coverage across contigs. Properly ordered contigs were then combined by trimming overlapping sequences. Gaps between scaffolds were filled by tiling from raw or trimmed reads.

Mapping and error correction
Assembled genomes were then aligned to reference Hirundo mitochondrial genomes (GenBank accessions KP148840.1, KX398931.1) with Zpicture, which allowed visualization of structural differences between the reference and our mitochondrial genomes (Ovcharenko et al., 2004). Assembled genomes were also compared according to relatedness for structural agreement following the work of Dor et al. (2010). Samtools tView was then used to identify possible SNPs (single nucleotide polymorphisms) or Indels (insertions and/or deletions) in the genome; modifications were made if mapped reads supported these assembly errors/variants (Li 2011). Completed draft assemblies were compared using Multiple Alignment using Fast Fourier Transform (MAFFT) ver. 7 (Katoh and Standley 2013). Large discrepancies in assemblies based on alignment to neighboring or same species were taken into account when making changes to the draft genome sequences.

Annotation
Genomic features were annotated using MITOS and MITOS2 (Bernt et al., 2013). Accuracy of these annotations was checked by comparison to annotations from Mitofish (Iwasaki et al., 2013) and tRNAscan-SE 2.0 (Lowe and Eddy 1997). Annotations were further verified using BLAST to compare the nucleotide and translated protein sequences of our mitogenomes against the same reference mitogenomes as used in the alignment step (GenBank accessions KP148840.1, KX398931.1). Sequence and annotation information was entered in NCBI's Sequin 15.50 software for final submission to GenBank. In some cases, Sequin identified errors in assembly (e.g. premature stop codons), which were then fixed via the methods described above.

Genome annotation
Mitochondrial genome content across birds has been studied intensively, and has been found to generally include the following features, which were present in all of our assemblies: 13 protein-coding genes (ND1, ND2, ND3, ND4, ND4l, ND5, ND6, COI, COII, COIII, ATPase6, ATPase8, and Cytb), 22 tRNAs, and the large and small rRNA of the ribosome (Boore, 1999). We found incomplete stop codons for COXIII and NAD4, which is a common feature of avian mitogenomes (Anmarkrud and Lifjeld 2017). Post-transcriptional polyadenylation of RNA is a likely completion mechanism of these stop codons, however, studies of transcriptomic data will be necessary to verify this hypothesis. The mitochondrial genomes also exhibit characteristic compactness, with a maximum separation of 40 base pairs between genes. The vast majority of genes were encoded on the þ strand. Of the 37 identified mitochondrial genes, only nine were encoded on the À strand, grouped into two operons and two standalone genes. Additionally, the genes were ordered identically across all sampled species, including the outgroup D. urbicum. The length of the highly variable rrnS and rrnL genes were relatively consistent across species, with rrnS measuring $1000 base pairs across species, and rrnL measuring $1600 base pairs.

Phylogenetic comparison
In recreating the Hirundo genus phylogeny, several differences can be observed between the previous Dor et al. phylogeny and the current complete mitochondrial genome phylogeny topologies. We found two main differences between our Hirundo phylogeny and those preceding it (Sheldon et al., 2005;Dor et al., 2010); both differences are within the 'Barn Swallow' clade ( figure 1) Our findings also clarify a previously unresolved node connecting the 'Blue Swallow' and 'Pearl-Breasted Swallow' clades with the joint 'Pacific Swallow' and 'Barn Swallow Clades' (Dor et al., 2010). We found that the latter two clades are sister to the 'Pearl-Breasted Swallow' clade (bootstrap support ¼ 84/100). The previous phylogeny (Dor et al., 2010) had suggested this relationship; however, a low bootstrap value of 54/100 prevented a definitive conclusion.
In addition to an updated tree topology, our whole mitogenome data also provide greater support for branch length values. Higher support within branch length is generally associated with more genetic data that agrees with the topology patterns and high levels of congruency (Wiens et al., 2008).
It is interesting that the whole mitogenomes presented here have clarified certain aspects of the Hirundo phylogeny while at the same time introducing an unresolved polytomy. An exact topology remains elusive even at the level of whole mitochondrial genomes. We suspect this is attributable to how closely relate these species are. Despite the benefits mitogenomes offer for phylogenetic studies, it is important to remember mitochondrial genes are linked and ultimately represent a single, matrilineal locus. We anticipate additional nuclear genomic markers will be of use in resolving all nodes of the Hirundo phylogeny. Future phylogenomic studies of Hirundo will investigate whether the nuclear gene tree agrees with the mtDNA tree presented here. The potential ILS associated with the polytomy between H. albigularis, H. smithii, and the H. nigrita clade will also be addressed using a phylonetwork approach during the nuclear genome analysis.