The determination and analysis of the complete mitochondrial genome of Dario dario (Anabantiformes: Badidae)

Abstract The morphology-based classification of the family Badidae has been revised several times, but complete mitogenome data are scarce. In the present study, the complete mitochondrial genome of the badid fish Dario dario was characterized for the first time. The whole mitogenome was 16,830 bp in size and consisted of 13 protein-coding genes, 22 tRNA genes, two rRNA genes, a control region, and an origin of light-strand replication. The coding sequences had a total length of 11,431 bp (67.92% of the genome) and encoded 3800 amino acids. The genome composition was highly A + T biased (58.12%) and exhibited a negative AT-skew (−0.0045) and GC-skew (−0.2347). All protein-coding genes started with ATG except CO1, which started with GTG, and terminated with a standard TAN stop codon or a single T. The control region (D-loop), spanning positions 15,658 to 16,830, was 1173 bp in size. Phylogenetic analysis showed that D. dario was most closely related to Badis badis. The complete mitochondrial genome sequence provides new insight into taxonomic classification and a more complex picture of species diversity within the Anabantiformes.

Badidae is a small family (about 30 species). A few species have been described in recent years, and the family currently comprises six species of Dario (Rüber et al. 2004) and about 20 species of Badis (Basumatary et al. 2016). Dario dario (Hamilton, 1822) is placed in Badidae. It is a small predatory fish, mainly distributed in Southeast Asia and India. Specimens of D. dario collected by the Zhejiang Engineering Research Center for Mariculture and Fishery Enhancement Museum (PH18146) from the Yamuna River in New Delhi (28°36′50″N, 77°12′30″E) were identified by both morphological features and the COI sequence. Tissue samples were preserved in 95% ethanol and stored at −20 °C, and total genomic DNA was extracted from muscle using the phenol-chloroform method (Barnett and Larson 2012). Sequences were amplified by PCR with long and accurate Taq (LA-Taq) DNA polymerase (Takara, Tokyo, Japan) following the manufacturer's protocol and assembled with CodonCode Aligner 5.1.5 (CodonCode Corporation, Dedham, MA). The primers used in this study (Table S1) are universal primers designed in conserved regions. Transfer RNA genes were identified with the program tRNAscan-SE (Lowe and Eddy 1997); the nucleotide composition and the relative synonymous codon usage (RSCU) were obtained using MEGA X (Kumar et al. 2018). The base compositional bias of the mitochondrial genome was calculated as AT-skew and GC-skew using the formulae: AT-skew = (A − T)/(A + T); GC-skew = (G − C)/(G + C) (Perna and Kocher 1995).
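The skew formulae above can be illustrated with a minimal Python sketch. The function name `skews` and the toy sequence are illustrative assumptions and are not part of the original analysis.

```python
from collections import Counter

def skews(seq):
    """Return (AT-skew, GC-skew) for a DNA sequence string.

    AT-skew = (A - T) / (A + T); GC-skew = (G - C) / (G + C).
    Assumes the sequence contains at least one A or T and one G or C.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew

# Toy sequence (hypothetical, not from the mitogenome): 4 A, 2 T, 1 G, 3 C
at_skew, gc_skew = skews("AAAATTGCCC")
print(at_skew, gc_skew)
```

A negative value indicates that T exceeds A (AT-skew) or that C exceeds G (GC-skew) on the strand being counted.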
Similar to the typical vertebrate mitogenome, the mitogenome of D. dario deposited in GenBank (MT344964.1) is a closed, double-stranded circular molecule of 16,830 bp that includes 13 protein-coding genes (PCGs), two ribosomal RNA genes, 22 tRNA genes, and two main noncoding regions (Boore 1999). The overall contents of A, C, G, and T were 28.93%, 25.83%, 16.02%, and 29.19%, respectively. The A + T and G + C contents were 58.12% and 41.88%, indicating a high AT bias. Both the AT-skew and the GC-skew of the mitogenome were negative (−0.0045 and −0.2347, respectively). Most mitochondrial genes are encoded on the H-strand, except for ND6 and eight tRNA genes, which are encoded on the complementary L-strand. In addition, 11 intergenic spacers totalling 55 bp were found in the D. dario mitogenome, ranging from 1 to 34 bp in length. Nine overlaps (28 bp in total) were also observed among PCGs and tRNA genes, the largest of which is 10 nucleotides, between ATP6 and ATP8. The lengths of 12S rRNA and 16S rRNA were 950 bp and 1694 bp, while the control region was 1173 bp long, spanning positions 15,658 to 16,830.
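As a quick consistency check, the A + T content and the two skews can be recomputed from the base percentages reported above. This is only a sketch: because the inputs are rounded percentages rather than raw base counts, the recomputed values may differ slightly in the last digits from those reported.

```python
# Base percentages for the whole mitogenome, as reported in the text.
pct = {"A": 28.93, "C": 25.83, "G": 16.02, "T": 29.19}

at_content = pct["A"] + pct["T"]
at_skew = (pct["A"] - pct["T"]) / (pct["A"] + pct["T"])
gc_skew = (pct["G"] - pct["C"]) / (pct["G"] + pct["C"])

print(f"A+T content = {at_content:.2f}%")
print(f"AT-skew     = {at_skew:.4f}")
print(f"GC-skew     = {gc_skew:.4f}")
```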
The 13 PCGs totalled 11,431 bp (67.92% of the genome) and encoded 3800 amino acids. Both the AT-skew (−0.0733) and the GC-skew (−0.2918) of the PCGs in D. dario were negative. All PCGs used the initiation codon ATG except CO1, which used GTG. CO2, ND4, and CytB ended with a single T, ND3 ended with TAG, and all the others ended with TAA. The nucleotide contents differed in the sense strands of the PCGs (T, 31.05%; A, 26.80%; G, 14.92%; C, 27.22%). The overall A + T content in the sense strands of the PCGs (57.85%) showed an obvious bias toward AT in nucleotide composition. The RSCU values showed that the amino acids encoded by four synonymous codons (Leu2, Val, Ser1, Pro, Thr, Ala, Arg, and Gly) had higher codon usage, at a similar level, whereas the others, encoded by either three or two codons, had lower codon usage.
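For readers unfamiliar with RSCU, the following minimal sketch shows how the measure is commonly defined: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` and the codon counts are hypothetical and are not data from this study, where RSCU was obtained with MEGA X.

```python
def rscu(codon_counts, synonymous_codons):
    """Relative synonymous codon usage for one synonymous codon family.

    RSCU(codon) = observed count * (number of synonymous codons) / family total.
    A value of 1.0 means the codon is used exactly as often as expected
    under equal usage; values above 1.0 indicate preferred codons.
    """
    total = sum(codon_counts.get(c, 0) for c in synonymous_codons)
    n = len(synonymous_codons)
    return {c: codon_counts.get(c, 0) * n / total for c in synonymous_codons}

# Hypothetical counts for the four proline codons (CCN); illustrative only.
pro_counts = {"CCA": 40, "CCC": 30, "CCT": 20, "CCG": 10}
pro_rscu = rscu(pro_counts, ["CCA", "CCC", "CCT", "CCG"])
print(pro_rscu)
```

By construction, the RSCU values within a family sum to the number of synonymous codons in that family.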
The 12S rRNA and 16S rRNA genes were 950 bp and 1694 bp long, respectively; the rRNA genes showed a positive AT-skew (0.2138) and a negative GC-skew (−0.0755). The total length of the 22 tRNA genes in the D. dario mitochondrial genome was 1553 bp, and their overall A + T content was 56.60%. The tRNA genes had a positive AT-skew (0.1081) but a negative GC-skew (−0.1246). The control region (CR) was 1173 bp long, spanning positions 15,658 to 16,830. It contained 420 A and 385 T nucleotides, which together accounted for 68.63% of the whole D-loop; its AT-skew and GC-skew values were 0.0435 and −0.1957, respectively.
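The control-region figures above can be reproduced with simple arithmetic. The sketch below assumes the reported coordinates are 1-based and inclusive, which is the usual GenBank convention; the variable names are illustrative.

```python
# Control region coordinates and base counts as reported in the text.
start, end = 15658, 16830
a_count, t_count = 420, 385

# 1-based inclusive coordinates: length = end - start + 1
length = end - start + 1

at_percent = 100 * (a_count + t_count) / length
at_skew = (a_count - t_count) / (a_count + t_count)

print(f"length  = {length} bp")
print(f"A+T     = {at_percent:.2f}%")
print(f"AT-skew = {at_skew:.4f}")
```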
Based on the Akaike information criterion (AIC), GTR + G + I + F was selected as the best-fitting substitution model for the phylogenetic analysis. As shown in Figure 1, D. dario was most closely related to B. badis; these two species formed a monophyletic clade with high support, constituting the Badidae group. In addition, Anabantidae + Helostomatidae + Osphronemidae formed a monophyletic clade that was sister to Badidae + Pristolepididae + Channidae. The phylogenetic analysis thus helps clarify the classification status of D. dario. The discovery of more species in this group will further promote research on Badidae.
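Model selection by AIC can be sketched as follows, using the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The model names, log-likelihoods, and parameter counts below are hypothetical placeholders and not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical candidates: name -> (log-likelihood, number of free parameters).
candidates = {
    "model_A": (-1050.0, 5),
    "model_B": (-1020.0, 9),
    "model_C": (-1019.5, 12),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)  # lowest AIC is preferred
print(scores, best)
```

In this toy example the richest model fits marginally better than the intermediate one, but not by enough to offset the penalty for its extra parameters.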

Disclosure statement
20200202], and Starting Research Fund from the Zhejiang Ocean University.

Data availability statement
The genome sequence data that support the findings of this study are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov/ under the accession no. MT344964.1.