Describing novel mitochondrial genomes of Antarctic amphipods

Abstract To date, only one mitogenome from an Antarctic amphipod has been published. Here, novel complete mitochondrial genomes (mitogenomes) of two morphospecies are assembled, namely, Charcotia amundseni and Eusirus giganteus. For the latter species, we have assembled two mitogenomes from different genetic clades of this species. The lengths of Eusirus and Charcotia mitogenomes range from 15,534 to 15,619 base pairs and their mitogenomes are composed of 13 protein coding genes, 22 transfer RNAs, 2 ribosomal RNAs, and 1 putative control region CR. Some tRNAs display aberrant structures suggesting that minimalization is also ongoing in amphipod mitogenomes. The novel mitogenomes of the two Antarctic species have features distinguishing them from other amphipod mitogenomes such as a lower AT-richness in the whole mitogenomes and a negative GC- skew in both strands of protein coding genes. The genetically most variable mitochondrial regions of amphipods are nad6 and atp8, while cox1 shows low nucleotide diversity among closely and more distantly related species. In comparison to the pancrustacean mitochondrial ground pattern, E. giganteus shows a translocation of the nad1 gene, while cytb and nad6 genes are translocated in C. amundseni. Phylogenetic analysis based on mitogenomes illustrates that Eusirus and Charcotia cluster together with other species belonging to the same amphipod superfamilies. In the absence of reference nuclear genomes, mitogenomes can be useful to develop markers for studying population genetics or evolutionary relationships at higher taxonomic levels.


Introduction
Mitogenome DNA sequence data or parts of mitogenomes have been widely used to reconstruct evolutionary relationships or detect cryptic diversity (Caterino et al. 2000;Tang et al. 2020). For instance, in amphipods, sequencing mitochondrial cox1 or cytb together with nuclear genes (e.g. 18S, 28S, ITS2) has revealed cryptic species of Hyalella S.I. Smith, 1874 (Witt et al. 2006), Caprella penantis Leach, 1814 (Pilar Cabezas et al. 2013), Gammarus fossarum Koch, 1836 (Grabowski et al. 2017) and some Eusirus Krøyer, 1845 species (Baird et al. 2011). Molecular data from 13 protein coding genes of Alicella gigantea Chevreux, 1899(Li et al. 2019b, Baikalian amphipods (Romanova et al. 2016), Gammarus roeselii Gervais, 1835 (Cormier et al. 2018), Halice sp. Boeck, 1871 (Li et al. 2019a), Metacrangonyctidae Boutin & Messouli, 1988(Bauz a-Ribot et al. 2012, and from all mitochondrial genes (protein coding genes, rRNA, tRNA) of Gammarus pisinnus Hou, Li & Li, 2014 and Gammarus lacustris G.O. Sars, 1863  have been used to reconstruct evolutionary relationships. The broad application of molecular data from mitogenomes can be explained by several advantages, which the mitogenome has compared to the nuclear genome. These include its simpler structure, conserved gene content and limited size (Boore 1999;Li et al. 2019b;Krebes and Bastrop 2012) facilitating sequencing of mitogenomes from those species for which reference nuclear genomes are not yet available. The uniparental, usually maternal inheritance of mitogenomes furthermore simplifies analyses because recombination is either totally absent or very rare (Barr et al. 2005, Lin andDanforth 2004). The relatively high evolutionary rate of mitogenomes generating relative large genetic differences makes mitogenomic DNA sequence data furthermore suitable for studies at the genus or species level investigating population genetic or and phylogeographic patterns (Ballard and Whitlock 2004;Krebes andBastrop 2012, Tang et al. 2020;Li et al. 2019a). The inclusion of whole mitogenomes has resulted in phylogenies with better statistical supports (Haran et al. 2013; L opez-L opez and Vogler 2017) and clearer phylogeographic patterns (Keis et al. 2013). Moreover, despite the highly conserved gene content of the mitogenome, gene order has been found to be variable and can provide additional data for reconstructing phylogenetic relationships and evolutionary histories (Cormier et al. 2018; Krebes and Bastrop 2012;Zhang et al. 2020).
Amphipods are widely distributed crustaceans inhabiting a range of different habitats (V€ ain€ ol€ a et al. 2008;Li et al. 2019b). In Antarctica, amphipods are among the most diverse components of the benthic community (Gallardo 1987) and show high levels of endemism (Knox and Lowry 1977) making them ideal model organisms to study evolutionary patterns and divergences based on mitogenomes. Currently, there is only one published complete mitogenome of an Antarctic amphipod, namely of Gondogeneia antarctica Chevreux, 1905(Shin et al. 2012, and no mitogenomes are yet available for abundant amphipods of the genera Eusirus Krøyer, 1845 and Charcotia Chevreux, 1905. In this paper, we have assembled and analyzed complete mitogenomes of three Antarctic amphipods from two morphospecies (Charcotia amundseni d'Udekem d'Acoz, Sch€ on & Robert, 2018 and Eusirus giganteus Andres et al., 2002) and two genetic clades of the latter species. Our aims are to (1) provide full mitogenomic data of selected amphipod species for future research and (2) compare gene content and order with published amphipod mitogenomes to unravel shared and unique patterns of mitogenome evolution in amphipods.

Sample collection
Specimens of two species of Antarctic amphipods, Charcotia amundseni d'Udekem d'Acoz, Sch€ on & Robert, 2018 and two genetic clades of Eusirus giganteus Andres et al., 2002 (G1 and G2; which might resemble different genetic species (Verheye and D'Udekem D'Acoz 2021) have been collected during different Antarctic expeditions (Table 1) and are curated in the collections of the Royal Belgian Institute of Natural Sciences, Brussels, Belgium.
Eusirus amphipods belong to the superfamily Eusiroidea Stebbing, 1888. Eusirus cf. giganteus has previously been confused with Eusirus perdentatus Chevreux, 1912 due to small morphological differences (Andres et al. 2002). The genetic study of Baird et al. (2011) reveals cryptic diversity of Eusirus giganteus including the so-called clades G1-G4, and the existence of a species complex is supported by Verheye and D'Udekem D'Acoz (2021). The same authors report that potential Eusirus giganteus species that still need to be formally described showed at least minor morphological differences and different color morphs but that a thorough morphological analysis of the putative genetic species is still required. Given the possibility of multiple cryptic species, we follow here the suggestion of Greco et al. (2021) to use the name Eusirus cf. giganteus in our study. Our other target species, Charcotia amundseni, belongs to the superfamily Lysianassoidea Dana, 1849. The genus Charcotia has formerly been known as Waldeckia (Chevreux 1905) but recently has undergone a change in nomenclature (D'Udekem D'Acoz et al. 2018) which we follow here.
Mitochondrial genome sequencing, assembly, annotation, and analyses DNA has been extracted from a pleopod of each specimen using the DNeasy Blood & Tissue Kit (Qiagen, Germany) for both Eusirus cf. giganteus clades and the Qiamp DNA Minikit (Qiagen, Germany) for Charcotia amundseni following the manufacturer's protocol. DNA concentration and quality have been checked with a Nanodrop ND-1000 Spectrophotometer (ThermoFisher Scientific, USA) and a Qubit 2.0 fluorometer (Life Technologies, USA).
A low coverage skimming sequencing approach has been applied at the Genomics Core at the KU Leuven (Leuven, Belgium) using an Illumina HiSeq2500 sequencing platform in the 2 Â 125 bp mode. Samples were indexed separately as unique libraries. Reads have been quality-checked using FASTQC (Andrews 2010) and pre-processed with Geneious Prime 2019 v1.8.0 (https://www.geneious.com) by merging paired reads, removing duplicates and trimming of lowquality ends using the BBDuk trimmer in Geneious with the minimum quality set to 20. These pre-processed reads have then been used for de novo assemblies in MITObim v1.9.1 (Hahn et al. 2013) with the MIRA 4.0.2 (Chevreux et al. 1999) assembler with default settings (kmer size ¼ 31) and an iteration limit of 100. The Onisimus nanseni G.O. Sars, 1900 mitogenome (GenBank accession number FJ555185.1) which belongs to the same superfamily as Charcotia and a partial 16S to COI sequence of Eusirus perdentatus have been used as seed references. The longest resulting contigs from the de novo assembly have been imported into Geneious and further assembled with the 'map to reference' approach with medium-low sensitivity and 50 iterations. Identity of the resulting consensus sequences have been verified with BLAST searches (Altschul et al. 1997). Automatic annotation has subsequently been conducted with the MITOS web server, versions 1 and 2 (Bernt et al. 2013). The identity of the rrnL region of both Eusirus species has been confirmed by BLAST searches only, since it has not been annotated by MITOS. The resulting annotations have been viewed and gene boundaries manually corrected in Geneious. The boundaries of the 13 protein coding genes and 2 rRNA genes have been identified by comparing alignments of the novel assemblies with mitochondrial genes of other amphipod species. Protein coding genes boundaries have been further corrected by avoiding any overlap with the subsequent tRNA gene and by noticing any partial stop codons (T or TA). Such partial stop codons are atypical features of mitochondrial protein coding genes (Cameron 2014 (Zhang and Hewitt 1997). Gene orders of the novel mitogenome assemblies were compared to the putative pancrustacean ground pattern which is derived from both Crustacea and Hexapoda (often referred to as pancrustacea) as they share the same ground pattern in terms of their mitochondrial gene order (Kilpert and Podsiadlowski 2006;Boore et al. 1998). Possible gene rearrangements have been analyzed with the CREx web service (Bernt et al. 2007). CREx utilizes a strong common interval tree to heuristically deduce the plausible rearrangement scenarios to change one gene order to another (Bernt et al. 2007). AT and GC skew have been calculated using the formulas of Perna and Kocher (1995) Only other amphipod species with complete and published mitogenomes have been analyzed for their AT and GC skew (Supplementary Table 1). Nucleotide diversity (p) has been computed for each protein coding gene with DnaSP v6.12.03 (Rozas et al. 2017).
To verify the phylogenetic position of the studied species, the three novel and assembled mitogenomes were supplemented with data from other amphipod species for phylogenetic reconstructions. Published amino acid sequences of 13 protein coding genes were obtained from GenBank and aligned separately for each gene using MAFFT v7.0 online (Katoh et al. 2019), together with the amino acid sequences of the current study. The resulting alignments were concatenated with Geneious Prime 2019 v1.8.0 (https://www.geneious. com) and trimmed with Bioedit v7.2.5 (Hall 1999) with additional checking by eye. The MtArt þ G þ F was chosen as the best fitting model of molecular evolution as identified with ModelGenerator v0.85 (Keane et al. 2006) using four discrete categories for gamma distribution. Phylogenetic analyses based on maximum likelihood methods were carried out using PhyML v3.0 (Guindon and Gascuel 2003) with 1000 bootstrap replications. Bayesian inference was conducted with MrBayes v3.2.7 (Ronquist and Huelsenbeck 2003) with 1 million generations, tree sampling every 1000th generation, and 10% of the initial trees being discarded as burn-in.

Mitogenome organization
The total length of the obtained complete mitochondrial genomes of Eusirus cf. giganteus (G1), Eusirus cf. giganteus (G2), and Charcotia amundseni is 15,558, 15,534, and 15,619 bp, respectively (Genbank accession nos. OK489458, OK489459, OK489457, respectively) which is within the range of complete mitogenomes from other amphipods (13,517-18,424 bp) ( Table 2). The three newly assembled mitogenomes are each composed of 13 protein coding genes, 22 tRNAs and 2 rRNAs. For E. cf. giganteus, 23 genes are encoded on the positive (þ) strand and 14 on the negative (-) strand while 17 genes are encoded on the þ strand and 20 on thestrand in C. amundseni (Supplementary Figure 1a and b, Supplementary Table 2). A putative control region (CR) has also be identified in all three mitogenomes and is located between trnS2 and rrnL in Eusirus and between trnF and nad5 in C. amundseni. The mitogenome also contains 20 intergenic regions for E. cf. giganteus and 18 intergenic regions for C. amundseni. The whole mitogenomes of the two species show AT-richness of 61.9% for E. cf. giganteus and 68.7% for C. amundseni, respectively, which contributes to the positive AT skew (0.008 to 0.092) and negative GC skew (À0.317 to À0.201) values observed in the three mitogenomes (Table 2). A relatively high AT content is also observed in the complete mitogenomes of other amphipod species varying from 61.09 to 77% (Table 2).

Gene order and rearrangements
A translocation of nad1 gene in E. cf. giganteus is observed while cytb and nad6 are translocated in C. amundseni as compared to the pancrustacean ground pattern (Supplementary Figure 2). We furthermore also find shifts in the position of tRNAs and the control region in E. cf. giganteus and C. amundseni as compared to the pancrustacean ground pattern (Supplementary Figure 2). While also trnG has been translocated in the three species investigated here, we find other tRNA gene strings consisting of trnA, trnS1, trnR, trnN, and trnE for E. cf. giganteus and trnS1, trnN, trnE, and trnF for C. amunseni (Supplementary Figure 2). Similar with the pancrustacean ground pattern, the trnV is located between the rrnL and rrnA genes in E. cf. giganteus while trnC and trnV are inserted between these genes in C. amundseni (Supplementary Figure 2).
Results of the CREx analyses indicate that E. cf. giganteus and C. amundseni have undergone multiple transpositions and rearrangements relative to the pancrustacean ground pattern (Supplementary Figure 3b).

Protein coding genes
The most frequent start codon in E. cf. giganteus and C. amundseni is ATG (Supplementary Table 2). Defining the protein coding gene boundaries following a tRNA results in a few partial or incomplete stop codons (T or TA). The AT content of the protein coding genes of the three amphipod mitogenomes is estimated as 59.6% for E. cf. giganteus (G1 and G2) and 67.3% for C. amundseni (Table 2). Mitochondrial genomes of the two species in this study have negative GC skew values in the protein coding genes encoded on both strands (Supplementary Table 4). The highest AT content is found in the third codon position of C. amundseni and the second codon position of E. cf. giganteus while the lowest AT content is observed in the first codon position in all three species. (Supplementary Table 3).

Ribosomal RNA
The two ribosomal RNA (rrnS and rrnL) genes in the three mitogenomes are located on the negative (À) strand. In the two Eusirus species, the length of both RNAs is 681 bp and 871 bp, respectively (Supplementary Table 2). Unlike in E. cf. giganteus, the two mitochondrial rRNAs of C. amundseni are shorter (529 bp and 739 bp) (Supplementary Table 2). The mitochondrial rRNA genes of the two amphipod species in this study also show a high AT content with 69.7% for E. cf. giganteus (G1 and G2) and 70.8% for C. amundseni.

Transfer RNA
In the three mitogenomes of this study, 22 tRNAs are present with a length ranging from 52 to 67 base pairs (Supplementary Table 2). The AT content of tRNAs of E. cf.
giganteus is 66.2% and 69.9% for C. amundseni (Table 2). In E. cf. giganteus, 14 tRNAs are encoded in the þ strand and 8 in the À strand. In C. amundseni 10 tRNAs are encoded in the þ and 12 in the À strand. Typical clover leaf secondary structures are observed in most predicted tRNAs although some tRNAs show wobble base pairs, atypical pairing or the DHU and TWC arm are missing (Supplementary Figure 4a-c). More specifically, the DHU arm is missing in trnS1, trnS2 and trnV of E. cf. giganteus ( Supplementary Figure 4a and b) and trnS1, trnS2, and trnI of C. amundseni (Supplementary Figure  4c). We also find that the TWC loop is absent in trnK, trnD, trnN, trnM, trnS2, trnI, and trnQ of E. cf. giganteus ( Supplementary Figure 4a and b) and in trnD, trnH, trnL1, trnC, trnV, trnQ, trnK, and trnM of C. amundseni (Supplementary Figure 4c).

Phylogenetic analysis
Phylogenetic analysis revealed that the phylogenetic grouping follows the superfamily identity and fall under different family groups (Figure 1)

Discussion
In the current study, we have assembled and annotated novel complete mitogenomes from two Antarctic amphipod species with a low coverage skimming sequencing approach. We have obtained very low percentages of ambiguities (<0.01%) illustrating that this cost efficient approach is very successful. Besides our study, only two complete mitogenomes from amphipods of the polar regions are currently available, namely, from Gondogeneia antarctica Chevreux 1905 from Antarctica (Shin et al. 2012) and Onisimus nanseni G.O. Sars, 1900 from the Arctic (Ki et al. 2010). Our study thus provides important novel genomic data for further research and the first complete mitogenomes of the widely spread amphipod genera Eusirus and Charcotia. Our comparisons of mitogenomes between two genetic clades, possibly resembling two different genetic species of E. cf. giganteus illustrate that mitogenomic features such as length, gene order, AT content, and tRNA structure are similar at the intraspecific level (Table 2 (Romanova et al. 2016;Li et al. 2019b). The observed AT-richness of the mitogenomes of the current study (61.9% and 68.7%) is slightly lower than in other studies based on complete (Table 2) and incomplete amphipod mitogenomes where AT range between 69.79% and 74.35% (Li et al. 2019a). However our data are in line with Wilson et al. (2000) reporting such an AT-rich bias as typical for arthropods.
The negative GC skews on both strands of the protein coding genes of the two species in this study differ from the so far known common Malacostraca pattern where genes encoded on the þ strand usually exhibit negative and genes encoded on the À strand positive GC skews (Hassanin 2006). The strand bias in nucleotide composition of metazoan mitogenomes is attributed to varying mutational pressure during replication or transcription (Pons et al. 2014;Hassanin et al. 2005). Future research will need to test if these factors are responsible for the different GC patterns observed in the two Antarctic amphipod species of the current study.

Gene order and rearrangements
The translocations of trnG and a commonly derived pattern of a gene string consisting of trnA, trnS1, trnN, trnE, and trnR are presumed to be apomorphic features of certain amphipods (Kilpert and Podsiadlowski 2010;Krebes and Bastrop 2012;Li et al. 2019a). The two studied species exhibit the translocation of trnG relative to the pancrustacean ground pattern. However, the altered tRNA gene order of the two species results in a unique tRNA string that is dissimilar to the apomorphic gene string of trnA, trnS1, trnN, trnE, and trnR. Moreover, the observed rrnL-trnV-rrnS pattern of E. cf. giganteus is known to be common in most Malacostraca (Ki et al. 2010) and is also observed in the pancrustacean ground pattern. This is, however, not the case for C. amundseni with the trnC being present. In addition, the large-scale gene reversals that have been found in three species have also been observed in Halice sp. Boeck, 1871 (Li et al. 2019a). It may be attributed to intramitochondrial recombination allowing breaking and rejoining of the mitochondrial genome (Dowton and Austin 1999;Li et al. 2019a).

rRNA genes
The shortest complete rrnL in amphipods of 577 bp is currently known from Hirondellea gigas Birstein & Vinogradov, 1955(Lan et al. 2016, which is much shorter than the rrnL that has been found in the species of the current study. On the other hand, the rrnS of C. amundseni has with 529 bp the same length as Alicella gigantea (Li et al. 2019b) which has so far been the shortest reported rrnS length in amphipods. Also, the shortest total length of rrnL and rrnS together has been described from the amphipod Hirondellea gigas Birstein & Vinogradov, 1955(Lan et al. 2016 with 1120 bp, and we find that the total length of the two rRNAs in C. amundseni is with 1268 bp rather similar. Short rRNA genes have also been observed in Gammarus duebeni Lilljeborg, 1852 (Krebes and Bastrop 2012) where they have been attributed to a minimization strategy of the mitogenome.  (Li et al. 2019a) and 'Metacrangonyx boveii' (Pons et al. 2014). The absence of the TWC loop is another aberrant and common structure in amphipod that has also been observed in trnC, trnE, and trnT of Caprella mutica Schurin, 1935 (Kilpert and Podsiadlowski 2010), trnQ and trnV of Gammarus duebeni Lilljeborg, 1852 (Krebes and Bastrop 2012), and trnC, trnQ, trnK, and trnF of Onisimus nanseni G.O. Sars, 1900 (Ki et al. 2010). The pressure for minimization of the mitogenome has been put forward as one of the explanations for these aberrant tRNA structures (Yamazaki et al. 1997). Other explanations could be replication slippage resulting in sequence deletions or insertions (Macey et al. 1997). Despite these aberrant structures in tRNAs, these are most likely still functional (Watanabe et al. 2014).

Nucleotide diversity
Information on nucleotide diversity can be helpful for the design of new molecular markers (Romanova et al. 2016;Zhang et al. 2018). Here, we have shown that the most variable mitogenes of Eusirus for intraspecific comparisons between genetic clades are nad6, nad5, and nad1 while for comparisons between Eusirus and Charcotia, atp8, nad6, and nad2 are most variable, which could be suitable for future phylogeographic and population genetic studies. Contrary, the least variable mitogenes for Eusirus are nad4, nad3, nad2, and cox1 and for interspecies comparisons between Eusirus and Charcotia cox1, cytb, and cox3 could be more suitable for future deep phylogeny investigations. Surprisingly, despite its wide use in DNA barcoding initiatives (Witt et al. 2006;Hebert et al. 2003), the cox1 gene appears to have relatively low nucleotide diversities between closely and distantly related amphipods. Consistent with our results, also Romanova et al. (2016) describe the mitogenes atp8, nad2, nad4l, nad5, and nad6 as most variable in Baikalian amphipods and cox genes to be less variable, with cox1 having the lowest nucleotide diversity.

Phylogenetic analysis
Our evolutionary tree (Figure 1) constructed from mitochondrial protein coding genes is well supported and shows phylogenetic clades according to amphipod superfamily identity. Moreover, our results are congruent to the current taxonomic classification where the species were categorized into their respective family and superfamily (Horton et al. 2021). Previous classification have placed Eusirus in the same Eusiroidea superfamily as Epimeria (Bousfield 1978) while the recent classification have placed Eusirus under superfamily Eusiroidea and Epimeria under Iphimedioidea (Lowry and Myers 2017). Phylogenetic evidence using 18S rDNA have shown that Eusirus has a close relationship with Epimeria which showed a well-supported clade of Eusiridae, Calliopiidae, Astyridae, Iphimediidae, Epimeriidae, and Pleustidae families (Englisch 2001). Phylogenetic evidence using 13-protein coding genes further corroborates these close relationships (Figure 1).
Our grouping of Charcotia amundseni with other species from the superfamily Lysianassoidea (Figure 1) is supported with the morphological phylogeny of Lowry and Myers (2017), which characterized this superfamily as often having a type 3 lysianassoid calceolus and a cleft telson. Molecular phylogenetic analyses using concatenated 16S-COX1-18S Figure 1. Phylogenetic tree based on the concatenated 13 protein coding genes amino acid alignment using maximum likelihood and Bayesian methods. Only bootstrap values of !50 (above the nodes) and posterior probabilities >0.80 (below the nodes) are shown. Scale bar corresponds to the number of substitutions per site. Target species of the current study are indicated in bold. The Genbank accession numbers for the mitochondrial genomes are shown in the parenthesis. data in Ritchie et al. (2015) show clustering of families and superfamilies similar to our study which further backs up our results. The phylogenetic grouping of the two morphospecies invested here based on the three novel mitogenomes thus follow the expected patterns according to taxonomic relationships.

Conclusions
The current study provides three additional novel complete mitogenomes of Antarctic amphipod species and the first complete mitogenomes of the abundant amphipod genera Eusirus and Charcotia. In comparison to other published amphipod mitogenomes, the novel mitogenomes show distinct features such as a lower AT-richness in their whole mitogenomes, negative GC skews on both strands of the protein coding genes, and unique gene rearrangements. The novel mitogenomes also share characteristics with other amphipod mitogenomes including aberrant tRNA and short rRNA genes, which could be linked to minimalization of mitogenomes. Moreover, the estimation of the nucleotide diversity (p) provides information to choose mitogenes as most suitable markers for future phylogenetic studies of amphipods. The novel mitogenomes are certainly useful for future phylogenetic analyses as put the investigated species into phylogenetic positions matching superfamily and family identity.