Sequencing and analysis of the complete mitochondrial genome of Datnioides campbelli (Datnioididae)

Abstract
In this study, the complete mitochondrial genome of the New Guinea tiger fish Datnioides campbelli (Whitley 1938) (Lobotiformes: Datnioididae) was determined by next-generation sequencing. The assembled mitochondrial genome is 16,416 bp in length and consists of 13 protein-coding genes, 22 transfer RNA genes, and two ribosomal RNA genes. The overall base composition of the D. campbelli mitogenome was 29.31% A, 29.02% C, 15.14% G, and 26.54% T. A phylogenetic tree based on the 13 protein-coding genes (PCGs) provides important molecular data for further phylogeographic and evolutionary analysis of Lobotiformes.


Introduction
Datnioides campbelli, commonly known as the New Guinea tiger fish, is an ornamental ray-finned fish belonging to the genus Datnioides (Family: Datnioididae, Order: Lobotiformes, Series: Eupercaria). The New Guinea tiger fish is restricted to the Gulf of Papua drainages and the coastal waters of New Guinea, where it occurs in brackish river mouths, coastal lagoons, and rivers above tidal influence (Roberts and Kottelat 1994). We focused on this species because of its restricted distribution, high commercial value, and the paucity of genetic research on it. This is the first report of a genomic study on D. campbelli. It provides a novel reference mitogenome and important molecular data for Lobotiformes, which is a fundamental step toward resolving the phylogenetic relationships of the highly diverse ray-finned fishes.

Materials
The specimen of D. campbelli (Figure 1) was collected from Guangzhou Lanhai Marine Technology Co., Ltd., Guangzhou, Guangdong Province, China (latitude: 23°12′51″N, longitude: 113°28′6″E) and identified according to the morphological characters described by Roberts and Kottelat (1994). Muscle tissue was preserved in 95% ethanol and stored at −80 °C. A specimen was deposited at the National Freshwater Genetic Resource Center, Guangzhou, Guangdong Province, China (https://cafs-germplasm.app.msorg.cn/; contact person: Yexin Yang, yangyexin@prfri.ac.cn) under the voucher number DA-camp-1.

Methods
Genomic DNA was extracted using a TIANamp Genomic DNA Kit (TIANGEN) and assessed by 0.8% agarose gel electrophoresis and a Qubit® 2.0 fluorometer (Life Technologies, USA). High-quality genomic DNA was used to prepare a DNA library with an insert size of 350 bp using the NEBNext® Ultra DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's recommendations. Paired-end reads (2 × 150 bp) were generated on an Illumina NovaSeq 6000 platform using sequencing protocols provided by the manufacturer (Illumina, Inc., San Diego, CA). About 2.07 G of raw data was generated. Raw reads were subjected to adaptor trimming and filtering of low-quality reads with fastp v0.20.1 (https://github.com/OpenGene/fastp) (Chen et al. 2018). The minimum read length after trimming was set to 150 nucleotides, and the quality threshold was set to Q20. The genome was assembled using SPAdes v3.15.2 (http://cab.spbu.ru/software/spades/) (Lapidus et al. 2014) with the '--plasmid' option and k-mer sizes of 33, 55, 77, 99, and 127. The assembled contigs included a mixture of sequences from the organellar and nuclear genomes. We identified mitochondrial contigs by similarity searches with BLASTN 2.13.0+ against the NCBI Nucleotide collection (nt) database. MitoFish (http://mitofish.aori.u-tokyo.ac.jp/) (Iwasaki et al. 2013) was used to annotate the mitochondrial genome. The circular mitochondrial genome map of D. campbelli was visualized with CGView (Stothard and Wishart 2005). The study protocol was approved by the Laboratory Animal Ethics Committee of Pearl River Fisheries Research Institute, CAFS (number: LAEC-PRFRI-20201219).
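The read filtering and assembly steps can be sketched as command lines assembled in Python. This is a minimal sketch, not the exact invocation used in this study: the file names and output directory are hypothetical, and mapping "Q20" to fastp's per-base qualified quality threshold is an assumption, since the text does not state which fastp option was set.

```python
from typing import List, Sequence


def fastp_command(r1: str, r2: str, out1: str, out2: str,
                  min_quality: int = 20, min_length: int = 150) -> List[str]:
    """Build a fastp argument list for paired-end trimming and filtering.

    Assumption: Q20 is applied as the per-base qualified quality threshold.
    """
    return [
        "fastp",
        "--in1", r1, "--in2", r2,
        "--out1", out1, "--out2", out2,
        "--qualified_quality_phred", str(min_quality),
        "--length_required", str(min_length),
    ]


def spades_command(r1: str, r2: str, outdir: str,
                   kmers: Sequence[int] = (33, 55, 77, 99, 127)) -> List[str]:
    """Build a SPAdes argument list using the plasmid mode and k-mer sizes from the text."""
    return [
        "spades.py", "--plasmid",
        "-1", r1, "-2", r2,
        "-k", ",".join(str(k) for k in kmers),
        "-o", outdir,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(fastp_command("raw_1.fq.gz", "raw_2.fq.gz",
                                 "clean_1.fq.gz", "clean_2.fq.gz")))
    print(" ".join(spades_command("clean_1.fq.gz", "clean_2.fq.gz", "spades_out")))
```

Either list could be passed to subprocess.run once the tools are installed; printing them first makes the parameters easy to check against the settings described above.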
The complete mitochondrial genome of D. campbelli was searched against the GenBank database at NCBI using BLAST, and 17 species highly similar to D. campbelli, with Max scores between 11,924 and 12,381, were selected for the phylogenetic analysis, including Banjos banjos (KT345965; Liu et al. 2016; Oh et al. 2021). Among them, Monodactylus argenteus was selected as the outgroup. The concatenated 13 protein-coding genes (PCGs) of D. campbelli and the above 17 species were aligned using ClustalW in BioEdit (Hall et al. 2011), and the alignment was converted to a Nexus file with PGDSpider 2.1.1.5 (Lischer and Excoffier 2012) for use as the input file in the phylogenetic analysis. A Bayesian phylogenetic tree was constructed based on the 13 PCGs using MrBayes 3 (Ronquist and Huelsenbeck 2003). The specific settings were as follows: two independent analyses, each with four Metropolis-coupled Markov chain Monte Carlo (MCMC) chains, were run for 1,000,000 generations and sampled every 100 generations (mcmc ngen = 1,000,000; nchains = 4; temp = 0.01; samplefreq = 100; burnin = 2500) using the default general-time-reversible + gamma + invariants (GTR + G + I) model of sequence evolution and running until the standard deviation of the split frequencies was below 0.01. The first 25% of the sampled trees were then discarded as burn-in, and the posterior probabilities (PP) were calculated from the remaining trees. The final tree was visualized in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/).
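The relationship between the run length, the sampling frequency, and the burn-in value can be checked with a short calculation. The sketch below is illustrative only; the function name is hypothetical, and it counts sampled trees per run without counting the starting state.

```python
from typing import Tuple


def mcmc_sample_counts(ngen: int, samplefreq: int,
                       burnin_fraction: float) -> Tuple[int, int, int]:
    """Return (sampled trees, burn-in trees, retained trees) for one run.

    Assumption: one tree is saved every `samplefreq` generations and the
    initial state is not counted.
    """
    total = ngen // samplefreq
    burnin = int(total * burnin_fraction)
    return total, burnin, total - burnin


if __name__ == "__main__":
    total, burnin, kept = mcmc_sample_counts(1_000_000, 100, 0.25)
    print(total, burnin, kept)
```

With ngen = 1,000,000 and samplefreq = 100, each run yields 10,000 sampled trees, so discarding the first 25% corresponds to burnin = 2500 and leaves 7,500 trees per run for calculating the posterior probabilities.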

Results
The complete mitochondrial genome of D. campbelli was 16,416 bp in length (GenBank: MZ930121) and included 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a noncoding control region (D-loop) (Figure 2). The gene order of the D. campbelli mitogenome was identical to that of D. polota (MZ930122, unpublished), and the overall sequence identity between D. campbelli and D. polota was 84.4%. The overall base composition of the complete mitogenome of D. campbelli was 29.31% A, 29.02% C, 15.14% G, and 26.54% T.
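Base composition of this kind can be computed directly from a sequence. The following is a minimal sketch with hypothetical function names; it ignores any ambiguous bases, and the AT-skew and GC-skew formulas are the commonly used (A − T)/(A + T) and (G − C)/(G + C) definitions, which are not values reported in this study.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Return the percentage of A, C, G and T, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_skew(comp: Dict[str, float]) -> float:
    """AT skew: (A - T) / (A + T)."""
    return (comp["A"] - comp["T"]) / (comp["A"] + comp["T"])


def gc_skew(comp: Dict[str, float]) -> float:
    """GC skew: (G - C) / (G + C)."""
    return (comp["G"] - comp["C"]) / (comp["G"] + comp["C"])


if __name__ == "__main__":
    # Percentages reported for the D. campbelli mitogenome.
    reported = {"A": 29.31, "C": 29.02, "G": 15.14, "T": 26.54}
    print("A+T %:", round(reported["A"] + reported["T"], 2))
    print("AT skew:", at_skew(reported))
    print("GC skew:", gc_skew(reported))
```

From the reported percentages, the A + T content is 55.85%, the AT skew is positive, and the GC skew is negative.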
The mitogenome of D. campbelli had the typical vertebrate organization. All 13 mitochondrial protein-coding genes used the start codon ATG, except COXI, which used GTG. The complete stop codon TAA was present in ND2, COXI, ATP8, ATP6, COXIII, ND4, ND5 and ND6; TAG was present in ND1 and ND3; and the incomplete stop codon 'T--' was found in COXII, ND4 and CYTB. Among the protein-coding genes, the longest was ND5 (1,839 bp) and the shortest was ATP8 (168 bp).
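The distinction between complete and incomplete stop codons can be illustrated with a small classifier. This is a sketch under stated assumptions: the function name is hypothetical, the input is an annotated coding sequence on its coding strand starting at the first base of the start codon, and the set of complete stop codons is taken from the vertebrate mitochondrial genetic code (NCBI translation table 2).

```python
# Complete stop codons in the vertebrate mitochondrial code (NCBI table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds: str) -> str:
    """Classify the termination signal at the 3' end of a coding sequence.

    Returns the complete stop codon, 'T--' or 'TA-' for an incomplete stop
    (assumed to be completed to TAA by polyadenylation), or 'none'.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "none"
    tail = cds[-remainder:]
    if tail == "T":
        return "T--"
    if tail == "TA":
        return "TA-"
    return "none"


if __name__ == "__main__":
    # Toy sequences, not taken from the D. campbelli mitogenome.
    for toy in ("ATGGCCTAA", "GTGGCCTAG", "ATGGCCT"):
        print(toy, classify_stop(toy))
```

A gene with an incomplete 'T--' stop has a length that leaves a remainder of one when divided by three, whereas the reported lengths of ND5 (1,839 bp) and ATP8 (168 bp) are both multiples of three, which fits the complete TAA stop codons listed for them.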
Phylogenetic analysis based on the 13 PCGs revealed that Lobotiformes was more closely related to Lutjaniformes than to Pempheriformes or Acanthuriformes (Figure 3).

Discussion and conclusion
As the largest and most diverse group of vertebrates, ray-finned fishes account for half of all vertebrate species (Near et al. 2012; Hughes et al. 2018). In the present study, we report the first mitochondrial genome of D. campbelli. The phylogeny inferred using the Bayesian method based on the 13 PCGs showed that Lobotiformes is the sister lineage of Lutjaniformes, which is consistent with a previous study on the phylogeny of ray-finned fishes based on transcriptomic and genomic data (Hughes et al. 2018). Our study not only provides critical molecular data for further phylogeographic and evolutionary analysis of Lobotiformes, but also offers a basis for further in-depth research on phylogenetic relationships in ray-finned fishes.

Authors' contributions
Hong Zhou: Performed the experiments, wrote the main manuscript and analyzed the data. Yexin Yang: Collected samples, designed the experiments and analyzed the data. Yi Liu: Performed the experiments and analyzed the data. Hongmei Song: Analyzed the data. Xuejie Wang: Analyzed the data. Sudong Xia: Conceptualization, revised the manuscript. Xidong Mu: Conceptualization, designed the experiments, Funding acquisition. All authors agree to be accountable for all aspects of the work.

Disclosure statement
The authors report no conflicts of interest. The authors alone are responsible for doing the research and writing the paper.

Data availability statement
The genome sequence data that support the findings of this study are openly available in GenBank of NCBI (https://www.ncbi.nlm.nih.gov/genbank/) under accession number MZ930121. The associated BioProject, SRA, and BioSample numbers are PRJNA787735, SRR17211917, and SAMN23839792, respectively.