Complete chloroplast genome sequencing of five Salix species and its application in the phylogeny and taxonomy of the genus

Abstract
In this study, the whole chloroplast genomes of five Salix species (S. argyracea, S. dasyclados, S. eriocephala, S. integra 'Hakuro Nishiki', and S. suchowensis) were sequenced. The chloroplast genomes were 155,605, 155,763, 155,552, 155,538, and 155,550 bp in length, respectively, and harbored 131 genes (77 unigenes), comprising 37 tRNA genes, 8 rRNA genes, and 86 mRNA genes. The genes ycf1, psaI, ycf2-2, rpoC2, rpl22, atpF, and ndhF were under positive selection among the 21 Salix species. psaI, ycf2-2, atpF, and ycf1-2 were under positive selection between tree and shrub willows, and rpoC2, rpl22, and ycf1-2 were positively selected among the shrub genomes. The gene rps7 was the most variable among the genomes. Phylogenetic analysis of 21 Salix species and Chosenia arbutifolia showed that the cp genome data only partially support the traditional taxonomic concepts in the Flora of China. These chloroplast genomes help to clarify Salix taxonomy and provide a resource for evolutionary research.


Introduction
Chloroplast DNA (cpDNA) is maternally inherited and thus provides essential information for molecular marker development, breeding of new varieties, and plant phylogeny (Cui et al. 2019; Njuguna et al. 2019). The willow genus (Salix spp.) comprises 350-520 species distributed worldwide. In the 'Flora of China', the species distributed in China are classified into 37 groups (Wang and Shi 2019). The five species sequenced here (S. argyracea, S. dasyclados, S. eriocephala, S. integra 'Hakuro Nishiki', and S. suchowensis) are widely planted in Jiangsu Province and produce a large amount of biomass. Salix eriocephala was introduced from the United States for its high biomass yield and as a source of bioenergy. All these species absorb the heavy metal cadmium (Cd) in their roots and are the most promising candidates for phytoremediation among the willow species. In addition, their leaves and flowers have great ornamental value. Salix integra 'Hakuro Nishiki' is available from nurseries in shrub and tree form with vibrant white and pink leaves. Salix argyracea, S. suchowensis, and S. dasyclados are widely used in crafts for wickerwork and decorations. Thus, cpDNA sequencing and molecular marker mining should be effective methods for distinguishing willow germplasm and revealing phylogenetic relationships.

Plant materials
The five Salix species were collected and deposited in the willow collection at Jiangsu Academy of Forestry (31.861947 N, 118.777145 E). The voucher specimens of S. argyracea, S. dasyclados, S. eriocephala, S. integra 'Hakuro Nishiki', and S. suchowensis were deposited at the herbarium of Jiangsu Academy of Forestry under the voucher numbers P102, P126, 87, P646, and P63, respectively. The person in charge of the sample collection can be contacted at zjwin718@126.com.

cpDNA sequencing and de novo assembly
Fresh leaves were collected for DNA isolation and library construction, and the DNA samples were stored at the Key Laboratory of Jiangsu Academy of Forestry, Nanjing, China. Genomic sequencing was performed on the Illumina Novaseq PE150 platform (San Diego, CA, USA). The raw reads were filtered using fastp (version 0.20.0, https://github.com/OpenGene/fastp) to obtain clean data. The clean reads were then assembled de novo using SPAdes v3.10.1 (http://cab.spbu.ru/software/spades/) to obtain the complete pseudo genome.
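The read filtering and assembly steps described above can be sketched as follows. This is a minimal illustration rather than the authors' actual pipeline: the file names, output layout, and the helper names `build_fastp_cmd`, `build_spades_cmd`, and `plan_pipeline` are hypothetical, and only the basic paired-end input/output options of fastp (`-i`, `-I`, `-o`, `-O`) and SPAdes (`-1`, `-2`, `-o`) are used, since the text does not report any other parameters.

```python
import subprocess
from pathlib import Path


def build_fastp_cmd(raw_r1, raw_r2, out_dir):
    """Build a fastp command that filters one paired-end library.

    Output file names are hypothetical; fastp defaults are assumed
    because the paper does not list filtering thresholds.
    """
    out = Path(out_dir)
    return [
        "fastp",
        "-i", str(raw_r1),                         # raw forward reads
        "-I", str(raw_r2),                         # raw reverse reads
        "-o", (out / "clean_R1.fq.gz").as_posix(),  # filtered forward reads
        "-O", (out / "clean_R2.fq.gz").as_posix(),  # filtered reverse reads
    ]


def build_spades_cmd(clean_r1, clean_r2, out_dir):
    """Build a SPAdes command for de novo assembly of the clean reads."""
    return [
        "spades.py",
        "-1", str(clean_r1),
        "-2", str(clean_r2),
        "-o", Path(out_dir).as_posix(),
    ]


def plan_pipeline(raw_r1, raw_r2, out_dir, dry_run=True):
    """Return the two commands in order; execute them unless dry_run is set."""
    out = Path(out_dir)
    fastp_cmd = build_fastp_cmd(raw_r1, raw_r2, out)
    spades_cmd = build_spades_cmd(
        out / "clean_R1.fq.gz", out / "clean_R2.fq.gz", out / "spades"
    )
    if not dry_run:
        out.mkdir(parents=True, exist_ok=True)
        for cmd in (fastp_cmd, spades_cmd):
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return [fastp_cmd, spades_cmd]
```

Calling `plan_pipeline("raw_1.fq.gz", "raw_2.fq.gz", "out")` with the default `dry_run=True` only returns the command lists, which makes it easy to inspect them before running the tools on real data.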

Positive selection genes
The nonsynonymous substitution rate (Ka), the synonymous substitution rate (Ks), and their ratio (Ka/Ks) are commonly used to infer the direction and strength of selection acting on protein-coding genes. The genes ycf1, psaI, ycf2-2, rpoC2, rpl22, atpF, and ndhF were under positive selection in the 21 Salix species (Ka/Ks > 1) (Table 2). The gene rps7, located in the IR region, had the highest nucleotide diversity (Pi) value (Figure 1), indicating that it is the most variable gene among the 21 Salix genomes and could be used as a potential molecular marker.
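The two statistics used in this section can be illustrated with a short sketch. It assumes that Ka and Ks have already been estimated by a dedicated program (the section does not name one) and that Pi is the mean number of pairwise differences per site over gap-free alignment columns; the function names `classify_selection` and `nucleotide_diversity` are hypothetical and do not reproduce the software or window settings used in the study.

```python
from itertools import combinations


def classify_selection(ka, ks):
    """Interpret a Ka/Ks ratio: >1 positive, <1 purifying, =1 neutral."""
    if ks == 0:
        return "undefined"  # ratio cannot be computed without synonymous changes
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site (Pi) for aligned sequences.

    Columns containing gaps or ambiguous bases in any sequence are
    skipped, which is an assumption of this sketch.
    """
    seqs = [s.upper() for s in aligned_seqs]
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    valid_cols = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not valid_cols:
        raise ValueError("no gap-free columns to compare")

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        total_diffs += sum(1 for i in valid_cols if a[i] != b[i])
        n_pairs += 1
    return total_diffs / (n_pairs * len(valid_cols))
```

For example, `nucleotide_diversity(["ACGT", "ACGA", "ACGT"])` counts two differences over three sequence pairs and four sites. Applying the same function to each gene alignment in turn would allow genes to be ranked by variability, in the way rps7 is singled out above.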

Phylogenetic analysis
With Eucalyptus spathulata as the outgroup, a phylogenetic tree was constructed from 30 complete cp genomes, namely 21 Salix (5 newly sequenced and 16 published), 1 Chosenia arbutifolia, and 8 Populus, which were aligned using MAFFT (auto mode) (Figure 2). Salix formed one robust monophyletic clade. The 21 species within Salix were clustered into two subclades. Of the 5 species newly sequenced in this study, S. argyracea, S. suchowensis, and S. eriocephala fell in one clade (together with S. gracilistyla), whereas S. dasyclados clustered with S. integra 'Hakuro Nishiki' in another. Based on the phylogenetic relationships inferred from the cp genomes, the genus Salix in China can be divided into two major groups.
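The alignment step can be sketched as below. The `--auto` option is MAFFT's automatic strategy selection, matching the "auto mode" mentioned in the text; the file names and the helpers `build_mafft_cmd`, `run_mafft`, `parse_fasta`, and `is_valid_alignment` are hypothetical. Tree inference itself is not shown, because this section does not name the program used to build the tree.

```python
import subprocess


def build_mafft_cmd(fasta_in):
    """Command list for MAFFT with automatic strategy selection."""
    return ["mafft", "--auto", str(fasta_in)]


def run_mafft(fasta_in, fasta_out):
    """Run MAFFT; it prints the alignment to stdout, so redirect it to a file."""
    with open(fasta_out, "w") as handle:
        subprocess.run(build_mafft_cmd(fasta_in), stdout=handle, check=True)


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, using the first word of each header."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def is_valid_alignment(records, outgroup):
    """Check that the outgroup is present and all sequences share one length."""
    lengths = {len(seq) for seq in records.values()}
    return outgroup in records and len(lengths) == 1
```

A quick sanity check such as `is_valid_alignment(parse_fasta(open("aligned.fasta").read()), "Eucalyptus_spathulata")` (with a hypothetical file name and sequence label) can catch a missing outgroup or a truncated alignment before the tree is built.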

Discussion
Five Salix species were sequenced, and the complete cp genomes of 16 previously published Salix species and that of C. arbutifolia were annotated. The cp genome size of the five Salix species was ~155 kb, similar to that of the other 17 previously published species (154-156 kb). The GC content of the IR region was high, similar to previously reported plant cp genomes (Huang et al. 2017). The results revealed that the structure and synteny of the cp genomes of the 21 Salix species and C. arbutifolia were highly conserved. Positively selected genes are vital for pinpointing specific targets in adaptive evolution processes, such as environmental, geographical, and host response (Wang et al. 2017). In a photosynthetic organism, loss of activity of atpF could impair respiratory activity and affect morphology (Lapaille et al. 2010). The gene psaI encodes photosystem I reaction center subunit VIII, suggesting that its selection was associated with changes in photosynthesis during evolution. Positive selection on ndhF has been linked to its involvement in adaptation to hot and dry climates (Carbonell-Caballero et al. 2015; Caspermeyer 2015). These positively selected genes are central to evolutionary patterns and might have driven the successful adaptation of the genus Salix.
The taxonomy and systematic phylogeny of the genus Salix have long been obscure. Chosenia arbutifolia fell within the clade comprising the Salix species (Figure 2), which is consistent with previous reports (Chen 2008). In the 'Flora of China' (Wu and Raven 1999), S. dasyclados and S. integra 'Hakuro Nishiki' are assigned to the same section as S. suchowensis and S. koriyanagi. However, the cp genome data only partially support these traditional taxonomic relationships. The rps7 gene encodes ribosomal protein S7 (uS7), which is crucial for the assembly and stability of the ribosome. It was the most variable gene among the 21 genomes, indicating that it could serve as a molecular marker for species identification. The cp genomes reported here therefore provide valuable molecular resources for studying the taxonomy and phylogeny of Salix and can be further applied in evolutionary studies of the genus.

Disclosure statement
No potential conflict of interest was reported by the author(s).