Diversity and authentication of Rubus accessions revealed by complete plastid genome and rDNA sequences

Abstract
Complete plastid genome (plastome) and ribosomal DNA (rDNA) sequences of three Rubus accessions (two of Rubus longisepalus and one of R. hirsutus) were newly assembled from Illumina whole-genome sequence data. Rubus longisepalus Nakai and R. longisepalus var. tozawai, described as different varieties, have identical plastome and rDNA sequences. The plastomes are 155,957 bp in R. longisepalus and 156,005 bp in R. hirsutus, and the 45S rDNA transcription units are 5809 bp and 5811 bp, respectively. The 5S rDNA transcription unit is 121 bp and identical in all three accessions. Based on plastome diversity, we developed three DNA markers to authenticate R. longisepalus and R. hirsutus. Phylogenomic analysis classified the Rubus species into two clades, with R. longisepalus, R. hirsutus, and R. chingii being the most closely related species in clade 1.


Introduction
The genus Rubus comprises about 500 species, whose taxonomy remains unclear because of frequent hybridization, polyploidization, and asexual reproduction (Alice and Campbell 1999; Wang et al. 2016; Hytönen et al. 2018). The genus has been divided into 12 subgenera (Focke 1910, 1914). However, this classification is not universally accepted, and each subgenus has been reported to be non-monophyletic (Alice and Campbell 1999; Yang et al. 2012; Wang et al. 2016; Hummer et al. 2019). Although previous studies have contributed to the current phylogenetic framework, short barcode regions such as the internal transcribed spacer (ITS) and universal barcoding loci in the plastid genome (plastome) have their own limitations (Li et al. 2015). Recently, nuclear genomes and whole plastomes have been used to analyze phylogenetic relationships within the genus Rubus, and a chromosome-scale genome assembly has been released for R. occidentalis (VanBuren et al. 2016; Jibran et al. 2018; VanBuren et al. 2018; Hummer et al. 2019; Yang et al. 2021).
A super-barcoding approach using whole plastomes overcomes the limitations of short barcoding regions and can clearly distinguish inter- and intraspecific diversity (Hollingsworth et al. 2009; Li et al. 2015). Because the plastome is maternally inherited in many plants, the absence of recombination helps conserve genome size, gene number, and gene order in most plants (Palmer 1985; Wicke et al. 2011). Nevertheless, enough variation accumulates between species to allow their evolutionary history to be estimated (Wolfe et al. 1987).
Nuclear ribosomal DNA (rDNA) occurs in the plant nuclear genome as arrays of thousands of tandem repeats (Roa and Guerra 2012). Although rDNA is part of the nuclear genome, its sequences are highly conserved (Malinska et al. 2010). However, the internal transcribed spacers (ITS1 and ITS2) that separate the 45S rDNA subunits (18S, 5.8S, and 28S) show meaningful variation among species (Álvarez and Wendel 2003). Whole-genome sequences produced by second- and third-generation sequencing platforms allow complete plastome and rDNA sequences to be assembled simultaneously in a time- and cost-effective manner (Kim et al. 2015a; Kim et al. 2015b). Comparisons of plastome and rDNA sequences have proved very useful for phylogenetic analysis and for developing barcoding markers (Kim et al. 2017; Lee et al. 2019; Nguyen et al. 2020; Lee et al. 2021).
Rubus longisepalus Nakai and R. longisepalus var. tozawai (Nakai) T.B.Lee are endemic to the southern coasts and islands of the Korean Peninsula, whereas R. hirsutus Thunb. is widely distributed in East Asia. R. longisepalus Nakai and R. longisepalus var. tozawai are regarded as distinct varieties, with the common names 'Macdo' and 'Geoje,' respectively. R. hirsutus has a habitat and morphology similar to those of the two R. longisepalus varieties. Therefore, clear taxonomic identification and molecular markers are needed to distinguish these edible plant resources on the Korean Peninsula.

Plant materials and genome sequencing
Leaf samples of the three Rubus accessions were provided by the Hantaek Botanical Garden, Gyeonggi-do, Republic of Korea. Each sample was ground to a powder in liquid nitrogen, and DNA was extracted using an Exgene Plant SV Midi Kit (GeneAll Biotechnology, Seoul) following the manufacturer's protocol. The extracted DNA was sequenced on the Illumina MiSeq platform by Phyzen (www.phyzen.com, Seongnam, Gyeonggi-do). Approximately 1.3 Gbp of paired-end sequence data was obtained for each accession.

Assembly and annotation of plastomes and rDNAs
Plastomes and 45S rDNA sequences were assembled using the de novo assembly of low-coverage whole-genome sequencing (dnaLCW) method (Kim et al. 2015b). Briefly, raw reads were trimmed with the trimming tool in CLC Assembly and assembled de novo with the CLC de novo assembly tool (CLC Inc, Denmark). Contigs similar to the reference plastome (Rubus trifidus, NC_046585.1) were identified using MUMmer (Kurtz et al. 2004), contigs with the same structure as the reference plastome were retained, and assembly of the three Rubus plastomes was completed through manual curation. The complete plastomes were annotated using GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) and manually curated in Artemis (Carver et al. 2012; Tillich et al. 2017). Finally, gene maps were drawn using OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) (Greiner et al. 2019).
The 45S rDNA sequences were assembled in the same way: contigs similar to the reference (Sorbus commixta, MN215997.1) were selected and curated manually. After assembly, the boundaries of each region (18S, ITS1, 5.8S, ITS2, and 28S) were determined using RNAmmer and by comparison with the reference (Lagesen et al. 2007). The 5S rDNA sequences were assembled by reference mapping: reads were first mapped to the reference (Arabidopsis thaliana, AF330993.1), and positions that differed from the reference were then corrected. The intergenic spacers (IGS) of the 45S and 5S rDNA were characterized by extending the end of each rDNA unit through read mapping until the extended IGS reached the start of the next rDNA unit. Manual curation was then conducted to obtain complete rDNA repeat sequences.
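The IGS extension step amounts to walking along overlapping reads from the end of a known rDNA unit. The Python sketch below is a simplified illustration, not the procedure used in this study: it assumes error-free reads on a single strand, exact k-mer seeds, and a majority vote for each new base, and the function name `extend_by_reads` and its parameters are hypothetical. In real tandem rDNA arrays, repeated k-mers and sequencing errors make manual curation necessary.

```python
from collections import Counter


def extend_by_reads(seq, reads, next_unit_start, k=8, max_steps=10_000):
    """Extend `seq` to the right, one base at a time, using overlapping reads.

    Hypothetical helper. Assumes error-free reads on the same strand as `seq`.
    At each step, reads containing the last k bases of `seq` (the seed) vote
    for the base that follows the seed; the majority base is appended.
    Extension stops when `next_unit_start` (the first bases of the next rDNA
    unit) appears in the newly added sequence, or when no read covers the end.
    """
    original_len = len(seq)
    for _ in range(max_steps):
        seed = seq[-k:]
        votes = Counter()
        for read in reads:
            pos = read.find(seed)
            while pos != -1:
                # A read can only vote if it extends beyond the seed.
                if pos + k < len(read):
                    votes[read[pos + k]] += 1
                pos = read.find(seed, pos + 1)
        if not votes:
            break  # no read overlaps the current end; extension stalls
        seq += votes.most_common(1)[0][0]
        if next_unit_start in seq[original_len:]:
            break  # reached the start of the next rDNA unit
    return seq
```

Calling `extend_by_reads(unit_end, reads, next_unit_start)` returns the unit end joined to the reconstructed IGS and the first bases of the next unit; if no read overlaps the current end, the sequence is returned unchanged from that point on.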

Polymorphism and marker development
The three complete plastomes and rDNA sequences were aligned using the online version of MAFFT (Katoh et al. 2019), and plastome and rDNA variants were identified from the alignments. Among the polymorphic regions, two single nucleotide polymorphisms (SNPs) and one insertion/deletion (InDel) region were selected for marker development. The two SNPs were converted into derived cleaved amplified polymorphic sequence (dCAPS) markers using dCAPS Finder 2.0 (http://helix.wustl.edu/dcaps/) (Neff et al. 2002), and the InDel region was developed into a codominant marker. The three primer sets were validated in silico using NCBI Primer-BLAST (Ye et al. 2012) before being applied to the three Rubus accessions (Table 1).
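Counting variants from an alignment requires a convention for InDels. The sketch below uses a hypothetical helper, `count_variants`, rather than the software used here: each aligned column with two different bases is scored as a SNP, and each maximal run of gap columns on the same sequence is counted as one InDel event. Ambiguity codes are not handled specially, and counts obtained under other conventions (for example, scoring every gapped column separately) would differ.

```python
def count_variants(aln_a, aln_b, gap="-"):
    """Count SNPs and InDel events between two aligned sequences.

    Hypothetical helper. A SNP is a column where both sequences have a base
    and the bases differ. An InDel event is a maximal run of consecutive
    columns in which the same sequence carries the gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indels = 0
    side = None  # which sequence carries the gap in the current run
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            continue  # column gapped in both (possible in multiple alignments)
        if a == gap or b == gap:
            this_side = "a" if a == gap else "b"
            if this_side != side:
                indels += 1  # a new InDel event starts here
                side = this_side
        else:
            side = None  # a shared base ends any InDel run
            if a != b:
                snps += 1
    return snps, indels
```

For example, `count_variants("ACGT--AC", "ATGTTTA-")` returns `(1, 2)`: one SNP (C/T), one two-base gap in the first sequence, and one single-base gap in the second.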

Phylogenetic analysis
A phylogenetic tree was reconstructed from plastome coding sequences (CDSs). Plastome sequences of 11 additional Rubus species and three outgroup species from the family Rosaceae were obtained from NCBI GenBank (https://www.ncbi.nlm.nih.gov/genbank/). The 74 CDSs shared by all 16 species were extracted using FeatureExtract (Wernersson 2005) and concatenated into a single sequence per species. The 16 concatenated sequences were aligned using PRANK with the translate option (Löytynoja 2014), and a maximum-likelihood tree was reconstructed in MEGA X with 1000 bootstrap replicates (Kumar et al. 2018).
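Selecting the CDSs common to all species and concatenating them in a consistent order can be sketched as follows. This minimal Python illustration assumes the CDSs have already been extracted (for example, with FeatureExtract) into a mapping from gene name to sequence for each species; the helper names and the alphabetical gene order are assumptions, and duplicated IR genes or inconsistent gene names among GenBank records would need to be resolved beforehand.

```python
def concatenate_shared_cds(cds_by_species):
    """Concatenate the CDSs present in every species, in the same gene order.

    Hypothetical helper. `cds_by_species` maps species name ->
    {gene name: CDS sequence}. Returns (gene_order, {species: sequence}).
    The alphabetical gene order is an arbitrary but consistent choice.
    """
    if not cds_by_species:
        raise ValueError("no species given")
    gene_sets = [set(genes) for genes in cds_by_species.values()]
    shared = set.intersection(*gene_sets)
    gene_order = sorted(shared)
    concatenated = {
        species: "".join(genes[name] for name in gene_order)
        for species, genes in cds_by_species.items()
    }
    return gene_order, concatenated


def to_fasta(records):
    """Format {name: sequence} as FASTA text, e.g. as input for an aligner."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

Using one fixed gene order for every species keeps homologous genes in corresponding positions, so the concatenated sequences can be passed directly to a codon-aware aligner.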

Characteristics of complete plastomes
The assembled plastomes have the typical quadripartite structure, consisting of a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeats (IRa and IRb) (Figure 1). Comparison of the R. longisepalus and R. hirsutus plastomes revealed 1882 SNPs and 325 InDels.
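In a quadripartite plastome, IRa is the reverse complement of IRb. The sketch below splits a plastome in the conventional LSC-IRb-SSC-IRa order and checks this relationship. The region lengths of the Rubus plastomes are not given in this section, so the helper `split_quadripartite` and the toy sequence in the usage example are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_quadripartite(plastome, lsc_len, ir_len, ssc_len):
    """Split a plastome in LSC-IRb-SSC-IRa order and check the inverted repeats.

    Hypothetical helper; region lengths must be known (e.g. from annotation).
    """
    if lsc_len + 2 * ir_len + ssc_len != len(plastome):
        raise ValueError("region lengths do not sum to the plastome length")
    irb_end = lsc_len + ir_len
    ssc_end = irb_end + ssc_len
    regions = {
        "LSC": plastome[:lsc_len],
        "IRb": plastome[lsc_len:irb_end],
        "SSC": plastome[irb_end:ssc_end],
        "IRa": plastome[ssc_end:],
    }
    # IRa should read as the reverse complement of IRb.
    regions["IR_ok"] = regions["IRa"] == reverse_complement(regions["IRb"])
    return regions
```

For example, `split_quadripartite("AAAACGGTACTTTGTACC", 5, 5, 3)["IR_ok"]` is `True`, because the 5 bp IRa (`GTACC`) is the reverse complement of IRb (`GGTAC`).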

Marker development
We developed molecular markers based on polymorphisms between the plastomes of R. longisepalus and R. hirsutus and applied them to the three Rubus accessions. Sequence alignment confirmed that the two dCAPS markers (based on SNPs) and the codominant marker (based on an InDel) target polymorphic regions. All three markers successfully distinguished R. longisepalus from R. hirsutus (Figure 2), which also supports the accuracy of the assemblies.
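A dCAPS primer carries a deliberate mismatch near its 3' end so that, together with one SNP allele, the PCR product contains a restriction site that the other allele lacks. The sketch below simulates this in Python; it is only an illustration, with EcoRI (G^AATTC) as an example enzyme, while the actual primers and enzymes are those listed in Table 1. The helper names `find_binding`, `pcr_product`, and `digest` are hypothetical, primer binding tolerates a set number of mismatches, and the primer sequence (not the template) is copied into the product, which is what lets the mismatch create the site.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_binding(template, primer, max_mismatch):
    """Leftmost position where `primer` matches `template` with <= max_mismatch mismatches, else -1."""
    n = len(primer)
    for i in range(len(template) - n + 1):
        mismatches = sum(1 for t, p in zip(template[i:i + n], primer) if t != p)
        if mismatches <= max_mismatch:
            return i
    return -1


def pcr_product(template, fwd, rev, max_mismatch=1):
    """Simulated amplicon; primer sequences (including dCAPS mismatches) end up in the product."""
    rev_site = reverse_complement(rev)  # the reverse primer's reverse complement lies on the top strand
    start = find_binding(template, fwd, max_mismatch)
    end = find_binding(template, rev_site, max_mismatch)
    if start == -1 or end == -1 or end < start + len(fwd):
        return None
    return fwd + template[start + len(fwd):end] + rev_site


def digest(amplicon, site, cut_offset):
    """Fragment lengths after cutting at every `site`, `cut_offset` bases into the site (top strand only)."""
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", amplicon)]
    bounds = [0, *cuts, len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

For a hypothetical product in which the primer mismatch completes `GAATTC` only with one SNP allele, `digest(product, "GAATTC", 1)` returns two fragments for that allele and a single uncut band for the other. For an InDel marker such as the codominant one described here, comparing `len(pcr_product(...))` between alleles is enough.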

Phylogenetic analysis
To determine the phylogenetic positions of R. longisepalus and R. hirsutus, plastomes of 11 additional Rubus species and three other species of the family Rosaceae were retrieved from NCBI GenBank. A total of 74 shared CDSs were used to reconstruct a phylogenetic tree (Figure 3). Ten of the 13 Rubus species are classified into two subgenera in the
sequences are the same among all three accessions. The 45S rDNA IGS sequences differ among all three accessions, whereas the 5S rDNA IGS sequences are identical in the two R. longisepalus accessions but differ from that of R. hirsutus.

Discussion
Completion of the three newly assembled Rubus plastomes and rDNA sequences allowed us to identify their polymorphisms and phylogenetic relationships. The two R. longisepalus accessions, regarded as the same species but classified as different varieties, have identical plastome and rDNA sequences. Despite substantial variation between R. longisepalus and R. hirsutus, they are the most closely related species among the 13 Rubus species studied. Most of the Rubus species analyzed here belong to subgenus Idaeobatus, and only one is classified in subgenus Malachobatus. Because most branches of our tree agree with those obtained in previous studies (Yang and Pak 2006; Yang et al. 2012; Wang et al. 2016; Hummer et al. 2019; Wang et al. 2020; Yang et al. 2021), we consider its overall topology reliable. The genome data and barcode markers developed in this study provide a basis for clarifying the phylogenetic relationships of Rubus species worldwide.

Disclosure statement
The authors declare that there are no competing interests.