Complete chloroplast genome sequence and phylogenetic analysis of winter oil rapeseed (Brassica rapa L.)

Abstract
Winter oil rapeseed ‘18 R-1’ (Brassica rapa L.) is a new variety that can survive in northern China, where the extreme low temperature in winter is −20 °C to −32 °C. It differs from traditional B. rapa and Brassica napus. In this study, the complete chloroplast (cp) genome of ‘18 R-1’ was sequenced and analyzed to assess its genetic relationships. The cp genome is 153,494 bp long and comprises a large single copy (LSC) region of 83,280 bp and a small single copy (SSC) region of 17,776 bp, separated by two inverted repeat (IR) regions of 26,219 bp each. The GC content of the whole genome is 36.35%, while those of the LSC, SSC, and IR regions are 34.12%, 29.20%, and 42.32%, respectively. The cp genome encodes 132 genes, including 87 protein-coding genes, eight rRNA genes, and 37 tRNA genes. Repeat structure analysis identified 288 simple sequence repeats (SSRs). The cp genome of ‘18 R-1’ was closely related to those of Brassica chinensis, B. rapa and Brassica pekinensis.


Introduction
Winter oil rapeseed (Brassica rapa L.) is a new cultivar grown as an oil crop in northern China. It can survive in fields where the extreme low temperature in winter is −20 °C to −32 °C. It has made it possible to grow winter rapeseed in northern China, which was previously a spring rapeseed zone (Wancang et al. 2010; Dongmei et al. 2014). Growing winter rapeseed has several advantages in northern China. First, winter rapeseed is sown in mid-August, turns green in late March of the following year, and is harvested in early June, one and a half months earlier than spring crops. After harvest, a succeeding crop such as maize, potato, millet, buckwheat or vegetables can therefore be grown (Sun et al. 2016). This makes full use of the heat and light resources of the area and changes the traditional one-harvest-per-year pattern into a three-harvests-in-two-years pattern (Wang et al. 2009). It also avoids spring farming in this area and increases land cover during winter, so winter rapeseed acts as a winter cover crop that reduces surface soil dust (Xuefang et al. 2009).
Chloroplasts (cp) are found in plants and other photosynthetic organisms. Because the cp genome is structurally simple, highly conserved and rarely rearranged, it is widely used to analyze the origin and evolution of species (Szymon et al. 2016). In our research, we found that winter rapeseed (B. rapa) differs from B. napus and other B. rapa cultivars: it has strong cold tolerance and a low growing point. In this study, we therefore assembled the complete cp genome of the winter oil rapeseed variety '18 R-1' and compared it with those of other Brassica species to determine its evolutionary relationships. We expect the results to provide a theoretical basis for determining its phylogenetic status and for future breeding research.

Sampling, DNA extraction, sequencing, and assembly
The experiments were carried out at Gansu Agricultural University, China (36.05° N, 103.87° E). '18 R-1' seeds were sown in pots. Fresh leaves were collected at the five-leaf stage, immediately frozen in liquid nitrogen and stored at −80 °C until analysis. Genomic DNA was extracted with a modified CTAB method. After passing quality control, the genomic DNA was fragmented mechanically by ultrasonication. The fragmented DNA then underwent purification, end repair, 3′ A-tailing and sequencing adapter ligation. Fragment sizes were selected by agarose gel electrophoresis, and the sequencing library was generated by PCR amplification. After the library passed inspection, it was sequenced on an Illumina HiSeq 2500 platform (Nanjing, China, 31.14° N, 118.22° E) with paired-end 150 bp reads (PE150), yielding at least 11.02 Gb of clean bases. All raw reads were quality-checked with FastQC and trimmed. The chloroplast genome was then assembled de novo with the core module of SPAdes (Bankevich et al. 2012), independently of any reference genome.
Figure 1. Cp genome map of '18 R-1'. Genes inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes of different functions are color-coded. The darker gray in the inner circle shows the G + C content, while the lighter gray shows the A + T content.
Table 1. List of genes annotated in the cp genomes of winter rapeseed.

Annotation and analysis of the cpDNA sequences
CpGAVAS was used to annotate the sequences, and DOGMA (http://dogma.ccbb.utexas.edu/) and BLAST were used to check the annotation (Liu et al. 2012; Wyman et al. 2004). The circular gene map of '18 R-1' was drawn with OGDRAW v1.2 (Lohse et al. 2007). Synonymous codon usage, relative synonymous codon usage (RSCU) values and GC content were analyzed for the complete plastid genome and for the commonly analyzed coding sequences (CDS). Chloroplast SSRs (cpSSRs) were identified with MISA (Song et al. 2019).

Genome comparison
mVISTA (Mayor et al. 2000) was used to compare the complete cp genome of '18 R-1' with published cp genomes of related species.

Phylogenetic analysis
The analysis was based on whole-genome sequences, after setting the same starting point for each circular sequence. Multiple sequence alignment was performed with MAFFT (v7.427, auto mode), and the alignment was trimmed with trimAl (v1.4.rev15). A maximum likelihood tree was then built with RAxML v8.2.10 (https://cme.h-its.org/exelixis/software.html) under the GTRGAMMA model with rapid bootstrap analysis (1000 replicates). The phylogenetic tree was constructed from the cp genome sequences of 25 Cruciferae species obtained from the NCBI Organelle Genome and Nucleotide databases (Katoh et al. 2005; Lam-Tung et al. 2015; Huelsenbeck and Ronquist 2001; Xiayu et al. 2019).
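Because cp genomes are circular, each one must begin at the same point before alignment. The section does not say how this was done, so the following is only a sketch of one simple approach: rotate each sequence so it begins at the first occurrence of a shared anchor motif. The anchor (for example, the first bases of a conserved gene) is a hypothetical user choice, and only the given strand is searched.

```python
def rotate_to_anchor(seq: str, anchor: str) -> str:
    """Rotate a circular sequence so it starts at the first occurrence of `anchor`.

    Matches that wrap around the end of the linear string are also found.
    Raises ValueError if the anchor is absent on this strand.
    """
    if not anchor or len(anchor) > len(seq):
        raise ValueError("anchor must be non-empty and no longer than the sequence")
    # Append the first len(anchor) - 1 bases so a match spanning the origin is visible.
    doubled = seq + seq[:len(anchor) - 1]
    pos = doubled.find(anchor)
    if pos == -1:
        raise ValueError("anchor not found")
    return seq[pos:] + seq[:pos]
```

For example, `rotate_to_anchor("GGGATCCC", "ATC")` returns `"ATCCCGGG"`, and an anchor split across the origin is handled too: `rotate_to_anchor("CATTTG", "GCA")` returns `"GCATTT"`. Genomes stored on the opposite strand would first need to be reverse-complemented.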

RSCU analysis
RSCU measures relative synonymous codon usage. Because the genetic code is degenerate, each amino acid is encoded by one to six codons. Codon usage varies greatly among species and organisms, and this bias is thought to result from natural selection, mutation and genetic drift.
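For reference, the commonly used definition of RSCU (assumed here, since the formula is not stated in this section) is:

$$\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{k=1}^{n_i} X_{ik}}$$

where $X_{ij}$ is the observed count of codon $j$ for amino acid $i$ and $n_i$ is the number of synonymous codons (1 to 6) for that amino acid. An RSCU of 1 means the codon is used exactly as often as expected under uniform synonymous usage, and values above 1 indicate preferential use. For example, if a six-codon amino acid such as Leu is observed 3 times and 2 of these are UUA, then $\mathrm{RSCU}_{\mathrm{UUA}} = 2 / (3/6) = 4$. Amino acids with a single codon (Met, Trp) always have an RSCU of 1 under this definition.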
Excluding stop codons, UUA (Leu) was the most frequently used codon, while GUG (Met) was the least used (Table 2; Figure 2). Thirty codons had RSCU values greater than 1, and all of them end in A or U at the third position, indicating that winter rapeseed prefers codons ending in A/U. Only one codon, UGG (Trp), has an RSCU value of exactly 1.
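A minimal sketch of how RSCU values can be computed from a set of in-frame coding sequences, using the standard definition. It uses the standard genetic code; the plastid code (NCBI table 11) has the same amino-acid assignments and differs only in start codons. Alternative starts such as GUG read as Met are not modeled here, so GUG is counted as Val. This is an illustration, not the pipeline used for Table 2.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order. The plastid code (table 11)
# assigns the same amino acids and differs only in its set of start codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds_list):
    """RSCU of every sense codon of each amino acid observed in in-frame CDS.

    Codons are reported in the DNA alphabet (T rather than U); stop codons and
    codons containing ambiguous bases are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if GENETIC_CODE.get(codon, "*") != "*":
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(synonyms)  # count expected under uniform usage
        for c in synonyms:
            values[c] = counts[c] / expected
    return values
```

For the toy input `rscu(["TTATTATTGTGG"])`, UUA (TTA) gets 4.0, UUG (TTG) 2.0, the four CUN codons 0.0, and UGG (TGG) 1.0, so the six Leu values sum to 6, the number of synonymous codons.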

Repeat sequence and SSR analysis
REPuter analysis identified 37 repeat sequences in the cp genome (Table 3). Apart from one 26,219 bp repeat, which corresponds to the IR region, the repeats range from 30 bp to 58 bp, and most are located in the LSC region. They comprise 18 palindromic, 14 forward, 3 reverse and 2 complement repeats.
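To make the four REPuter repeat classes concrete, this sketch classifies how a second copy relates to a first (forward = identical, reverse = reversed, complement = base-complemented, palindromic = reverse complement) and runs a brute-force search for exact repeat pairs of one length. It is only an illustration: REPuter itself also allows mismatches and uses efficient suffix-based search, and this sketch ignores genome circularity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_type(a: str, b: str):
    """Return the REPuter-style relation of copy b to copy a, or None.

    If several relations hold (e.g. for self-complementary k-mers), the first
    one checked is returned.
    """
    if b == a:
        return "forward"
    if b == a[::-1]:
        return "reverse"
    if b == a.translate(COMPLEMENT):
        return "complement"
    if b == a.translate(COMPLEMENT)[::-1]:
        return "palindromic"
    return None


def exact_repeats(seq: str, k: int):
    """Brute-force O(n^2) search for exact repeat pairs of length k (illustration only)."""
    seq = seq.upper()
    hits = []
    n = len(seq)
    for i in range(n - k + 1):
        a = seq[i:i + k]
        for j in range(i + 1, n - k + 1):
            kind = repeat_type(a, seq[j:j + k])
            if kind:
                hits.append((i, j, kind))  # 0-based start of each copy
    return hits
```

For instance, `repeat_type("AACG", "CGTT")` returns `"palindromic"`, and `exact_repeats("AACGAACG", 4)` finds the single forward pair `(0, 4, "forward")`.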
The cp genome contains 288 SSRs, including 228 mononucleotide repeats (mainly A and T), 17 dinucleotide repeats, 63 trinucleotide repeats and 6 tetranucleotide repeats (Figure 3). Most SSRs (63.50%) are located in the LSC region, with 21.90% in the SSC region and 14.60% in the IR region (Figure 4). SSRs composed of tandem guanine (G) or cytosine (C) are rare, indicating a strong A/T bias. Most SSRs are located in intergenic regions, followed by exons, with the fewest in introns. These repeats can be used to develop molecular markers and may guide evolutionary studies of winter rapeseed.
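The following minimal sketch shows the kind of perfect-SSR scan that MISA performs. The minimum repeat numbers per motif length (10/5/4/3/3/3 for mono- to hexanucleotides) are an assumed, commonly used cp setting and are not reported in this section. Motifs that are themselves repeats of a shorter unit (such as AA or ATAT) are skipped so that each run is reported once, at its shortest unit; compound SSRs are not handled.

```python
# Hypothetical MISA-style thresholds: minimum number of tandem units per motif
# length. Commonly used for cp genomes, but not necessarily the authors' settings.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound_unit(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect SSRs, 0-based starts, sorted by start."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit, min_copies in sorted(min_repeats.items()):
        i = 0
        while i + unit <= n:
            motif = seq[i:i + unit]
            if set(motif) - set("ACGT") or is_compound_unit(motif):
                i += 1
                continue
            j = i + unit
            while seq[j:j + unit] == motif:  # extend the run one unit at a time
                j += unit
            copies = (j - i) // unit
            if copies >= min_copies:
                hits.append((i, motif, copies))
                i = j  # resume scanning after the reported run
            else:
                i += 1
    return sorted(hits)
```

On `"GC" + "A" * 10 + "GC" + "AT" * 5 + "G"`, the scan reports a mononucleotide A repeat of 10 copies at position 2 and a dinucleotide AT repeat of 5 copies at position 14.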

IR scope analysis
The cp genomes of eight other cruciferous species were selected to compare the LSC/IR and SSC/IR boundaries with those of '18 R-1'. In all nine species, the LSC/IRb boundary lies within the coding region of rps19, which spans the junction with 166 bp in the LSC region and 113 bp in the IRb region. The LSC/IRb boundary has been reported to be stable in many species (Zhao et al. 2019). In most species, the IRb/SSC boundary lies in the overlap between the ycf1 and ndhF genes (Zhao et al. 2019), and this is also the case in the nine cruciferous species examined here. At the SSC/IRa boundary, ycf1 straddles the junction in seven species. No ycf1 was found in Brassica juncea, and '18 R-1' is also unusual in that its ycf1 is shorter than in the other species and lies 17,776 bp from the junction. Near the IRa/LSC junction, rpl2 lies in the IRa region and trnH in the LSC region, 2 bp to 30 bp from the boundary (Shi et al. 2020). In some plants, trnH is instead located in the IR region and rpl22 straddles the IRa/LSC boundary (Figure 5).
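The junction positions of '18 R-1' follow directly from the region lengths, and how many bases of a gene fall on each side of a junction is a simple interval overlap. This sketch assumes, hypothetically, that the sequence starts at the LSC and runs LSC-IRb-SSC-IRa with 1-based inclusive coordinates; the rps19 placement in the example is derived only from the reported 166 bp / 113 bp split.

```python
def region_map(lsc: int, ir: int, ssc: int) -> dict:
    """1-based inclusive coordinates, assuming the order LSC-IRb-SSC-IRa from position 1."""
    bounds, start = {}, 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        bounds[name] = (start, start + length - 1)
        start += length
    return bounds


def overlap_by_region(gene_start: int, gene_end: int, bounds: dict) -> dict:
    """Number of gene bases falling in each region (1-based, inclusive)."""
    out = {}
    for name, (start, end) in bounds.items():
        overlap = min(gene_end, end) - max(gene_start, start) + 1
        if overlap > 0:
            out[name] = overlap
    return out
```

With the '18 R-1' lengths, `region_map(83280, 26219, 17776)` places the LSC at 1 to 83,280, IRb at 83,281 to 109,499, SSC at 109,500 to 127,275 and IRa at 127,276 to 153,494. A 279 bp rps19 placed at 83,115 to 83,393 then gives `{"LSC": 166, "IRb": 113}`, matching the reported split across the LSC/IRb junction.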

Cp genome sequence homology analysis
We used the online mVISTA software to assess the differences between '18 R-1' and other Brassica species. The sequences of the nine species were highly similar, with little variation in the length of each region. Collinearity analysis detected no large rearranged fragments in the cpDNA sequences of the nine species, indicating that the cpDNA sequences are relatively conserved (Figure 6).
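mVISTA plots percent identity in sliding windows along a pairwise alignment. The following sketch reproduces that idea on two already-aligned sequences (gaps written as "-"). The 100-column window and 70% conservation cutoff are assumed defaults for illustration, not settings reported here, and identity is simply the number of identical non-gap columns divided by the window width.

```python
def window_identity(ref: str, qry: str, window: int = 100, step: int = 1):
    """Percent identity in sliding windows over a pairwise alignment.

    Returns (0-based start column, percent identity) pairs. A column counts as
    identical when both bases match and are not gaps.
    """
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    ref, qry = ref.upper(), qry.upper()
    match = [1 if a == b and a != "-" else 0 for a, b in zip(ref, qry)]
    return [
        (start, 100.0 * sum(match[start:start + window]) / window)
        for start in range(0, len(match) - window + 1, step)
    ]


def conserved_windows(profile, threshold: float = 70.0):
    """Start columns of windows whose identity meets the (assumed) cutoff."""
    return [start for start, pct in profile if pct >= threshold]
```

For example, `window_identity("ACGTACGTAC", "ACGTTCGT-C", window=5, step=5)` returns `[(0, 80.0), (5, 80.0)]`, so both windows pass a 70% cutoff.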

Phylogenetic analysis
Phylogenetic analysis was based on the complete cp genomes of 14 Cruciferae species (Figure 7). Almost all branches have high bootstrap support (93 to 100), except the branch joining Brassica rapa and Brassica pekinensis; the higher the support value, the more reliable the inferred relationship. Capsella rubella and Camelina sativa are early-diverging groups, whereas '18 R-1' belongs to a later-diverging group. It clusters first with Brassica chinensis and then with B. rapa and B. pekinensis, indicating that the cp genome of '18 R-1' is most closely related to that of Brassica chinensis. Brassica rapa and Brassica pekinensis are located at the innermost branch, suggesting that they are probably the most recently diverged Brassica groups (Figure 7).

Conclusions
In this study, we reported and analyzed the complete cp genome of '18 R-1' (B. rapa), a new winter oil rapeseed cultivar in China. The cp genome is highly conserved and shares similar characteristics with other Brassica species. An analysis of the phylogenetic relationships among nine species showed that '18 R-1' is closely related to B. chinensis, suggesting that it differs from B. rapa. This may be because '18 R-1' is an oil crop, whereas the published B. rapa cp genomes come from vegetable crops. This study provides a complete chloroplast genome assembly of B. rapa used as an oil crop, which may facilitate future genetics, breeding and biological research.