Genome-wide identification of Cellulose-like synthase D gene family in Dendrobium catenatum

Abstract Dendrobium catenatum, which grows on semi-humid rocks in the mountains, has been ranked first among the ‘Nine Immortals of China’ since ancient times. It is a yin-tonic medicine whose main active component is polysaccharide. Cellulose synthase-like D (CslD) genes are predicted to catalyse the biosynthesis of the 1,4-β-d-glycan backbone of hemicelluloses, which play fundamental roles in plant development. To investigate the role of CslD in the development of D. catenatum, eight CslD genes (DcCslD1, 2a, 2b, 3a, 3b, 4a, 4b and 5) were identified. Protein prediction and analysis showed that the DcCslD2a/2b/4a/4b proteins were acidic. All the proteins had obvious hydrophobic or hydrophilic regions and transmembrane structures. Phylogenetic analysis revealed that the DcCslD family could be divided into groups I, II, III and IV. DcCslD proteins had a typical Cellulose synthase domain and protein structures similar to those of CslDs from other plants. Their promoter regions contain cis-regulatory elements related to stress and hormone responses. The qRT-PCR results showed that the identified DcCslDs were differentially expressed in roots, stems and leaves, and most of them were highly expressed in stems and leaves. Examination of environmental stresses showed that the expression level of DcCslD5 was closely associated with drought-recovery treatment, and that the expression levels of DcCslD1, DcCslD2a, DcCslD2b, DcCslD3a and DcCslD5 were significantly influenced by low temperature. This study systematically analyzed the sequence characteristics of the CslD proteins of D. catenatum and can provide a reference for further study of the function of CslD proteins in the polysaccharide metabolism of D. catenatum. Supplemental data for this article is available online at https://doi.org/10.1080/13102818.2021.1941252.


Introduction
Dendrobium catenatum (D. catenatum), a perennial herb of the genus Dendrobium in the Orchidaceae, is a valuable traditional Chinese medicine with the effects of nourishing the stomach, promoting hydration, nourishing yin and reducing fever [1][2][3]. The principal functional substances of D. catenatum are polysaccharides, alkaloids, flavonoids, terpenes, fluorenones, phenanthrenes and bibenzyls. D. catenatum polysaccharide (DOP), which is mainly composed of mannose and glucose with a minor amount of galactose, is especially abundant in D. catenatum stems [4,5]. Structurally, DOPs are 2-O-acetylglucomannan or 2,3-O-acetyl-glucomannan, made up primarily of β-1,4-linked mannose and β-1,4-linked glucose [6,7]. They have shown antioxidant, antitumor and anti-inflammatory bioactivities [8]. For example, D-mannose played an immunomodulatory role in the T cell immune response [9]; D. catenatum polysaccharides could stimulate splenocytes, T-lymphocytes and B-lymphocytes, and promote the cell viability and NO production of RAW 264.7 macrophages [10,11]. However, the structures of these polysaccharides are complex, and the functions of the genes in the polysaccharide synthesis pathway remain unclear.
Cellulose synthase A (CESA) superfamily genes, which include the cellulose synthase (CESA) genes and the cellulose synthase-like (Csl) genes, are related to mannan synthesis [12,13]. The first plant CesA gene was reported in cotton in 1996 [14]. Based on this, Cutler and Somerville [15] identified the Csl gene family in Arabidopsis and speculated that some of its members might function as different types of β-1,4-glucan synthases. These findings provided the basis for further study of the Csl gene family.
In this study, we analyzed the physicochemical properties and domains of eight complete DcCslD proteins from D. catenatum and their homology with CslD proteins from other plants. Through the analysis of expression patterns, the functions of DcCslD genes in the polysaccharide synthesis pathway of D. catenatum were characterized. These results can provide a theoretical basis for further study of polysaccharide synthesis and DcCslD gene function in D. catenatum.

Plant materials
D. catenatum cultivar 'Jingpin NO. 1' (Breed NO. Zhe R-SV-DO-015-2014) was from the State Key Laboratory of Subtropical Silviculture in Zhejiang Province, China. Fresh roots, stems and leaves from the plants were collected and frozen in liquid nitrogen for subsequent RNA isolation.

Genome-wide identification of CslDs proteins in D. catenatum
The complete CslD protein sequences of D. catenatum were downloaded from NCBI. We then downloaded the HMM (hidden Markov model) profile of the Cellulose synthase domain (PF03552) from the Pfam database [44]. This HMM profile was used to identify the Cellulose synthase domain in the D. catenatum genome with HMMER 3.0 software, with the E-value cutoff set at 10^−5. At the same time, we used 6 A. thaliana CslD protein sequences and 5 Oryza sativa CslD protein sequences to conduct a local BLAST search of the D. catenatum protein database. The gene IDs obtained by BLAST were compared with those from the HMMER search, and 8 DcCslD genes were finally obtained. Some genes were found to have multiple sequences, which were distinguished by the suffixes a and b.
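The comparison of the two hit lists can be sketched as a simple filter followed by a set intersection. This is only an illustration: the protein IDs and E-values are hypothetical, and treating "compared" as an intersection of the HMMER and BLAST results is an assumption, not a statement of the exact procedure used.

```python
# Hypothetical example data: (protein ID, HMMER full-sequence E-value).
hmmer_hits = [
    ("prot_A", 1.2e-150),
    ("prot_B", 3.0e-40),
    ("prot_C", 2.0e-3),  # fails the 1e-5 cutoff
]
# Hypothetical IDs returned by the local BLAST search with the
# A. thaliana and O. sativa CslD query sequences.
blast_hits = {"prot_A", "prot_B", "prot_D"}

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text


def candidate_ids(hmmer, blast, cutoff=E_VALUE_CUTOFF):
    """Return IDs that pass the E-value cutoff and also appear among the BLAST hits."""
    passing = {pid for pid, evalue in hmmer if evalue <= cutoff}
    return sorted(passing & blast)


candidates = candidate_ids(hmmer_hits, blast_hits)
print(candidates)
```

Candidates supported by only one of the two searches (here prot_C and prot_D) are dropped under this assumption; a more permissive workflow could take the union and inspect the domain content manually.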

Gene structure analysis and identification of conserved motifs
We used TBtools to compare the CDS sequences and analyze the exon-intron structures of the DcCslD genes. The conserved motifs of the DcCslD proteins were analyzed with the MEME Suite (http://meme-suite.org/tools/meme), and homology alignment of the DcCslD family was performed with DNAMAN software. We also used TBtools to extract the 2000 base pairs (bp) upstream of the start codon (ATG) of each DcCslD gene, and then searched the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) for cis-acting elements in these promoter regions.
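The promoter extraction step was done with TBtools; the underlying coordinate logic can be illustrated with a short sketch. The function names and the coordinate convention (0-based leftmost forward-strand position of the start codon) are assumptions made for this example, and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, codon_start, strand, length=2000):
    """Extract up to `length` bp upstream of a start codon.

    codon_start is the 0-based leftmost forward-strand index of the three
    bases occupied by the start codon. For '-' strand genes the upstream
    region lies to the right of the codon and is reverse-complemented so
    that it reads 5'->3' in gene orientation.
    """
    if strand == "+":
        begin = max(0, codon_start - length)  # clip at the sequence start
        return chrom_seq[begin:codon_start]
    if strand == "-":
        begin = codon_start + 3
        return reverse_complement(chrom_seq[begin:begin + length])
    raise ValueError("strand must be '+' or '-'")


# Toy forward-strand gene: start codon at index 6.
plus_example = upstream_region("AAACCCATGGGG", 6, "+", length=4)
# Toy reverse-strand gene: the start codon appears as CAT at index 3.
minus_example = upstream_region("TTTCATGGGTAC", 3, "-", length=4)
print(plus_example, minus_example)
```

Both toy calls return the same four bases because the second sequence is the reverse complement of a gene with the same upstream context. Regions shorter than the requested length are returned when the gene lies near a scaffold end.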

Real-Time quantitative RT-PCR (RT-qPCR)
For the DcCslD gene expression pattern analysis, roots, stems and leaves were collected from six-month-old D. catenatum seedlings and put in liquid nitrogen immediately. Total RNA was extracted from all samples using the TaKaRa MiniBEST Plant RNA Extraction Kit. First-strand cDNA synthesis was performed using the PrimeScript™ RT reagent Kit with gDNA Eraser (Perfect Real Time). Primers were designed with Primer Premier 5 (Additional file 1). The real-time quantitative reverse transcription-polymerase chain reaction (PCR) was conducted using the CFX96™ Real-Time PCR System. The reaction volume consisted of 10 μL TB Green Premix Ex Taq II (Tli RNaseH Plus) (2X), 2 μL cDNA, 0.8 μL upstream primer (10 μmol/L), 0.8 μL downstream primer (10 μmol/L) and 6.4 μL ddH2O (20 μL in total). The reaction was performed with the following cycling profile: 95 °C for 3 min, followed by 40 cycles of 94 °C for 20 s, 60 °C for 20 s and 72 °C for 20 s. Three technical replicates were performed for each sample. Gene expression levels were calculated with the 2^(−ΔΔCT) method described by Livak and Schmittgen [45]. SPSS 21.0 software was used to test the normality of the data, and one-way ANOVA was then used to test for differences in expression among samples.
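As a worked illustration of the 2^(−ΔΔCT) calculation, the sketch below averages technical replicates and computes the fold change of a target gene relative to a calibrator sample. The Ct values, the choice of calibrator and the reference gene are hypothetical; the text does not specify them here.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change by the 2^(-ddCt) method from lists of technical-replicate Ct values."""
    # dCt: target gene normalized to the reference gene within each sample.
    delta_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    # ddCt: sample dCt relative to the calibrator dCt.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values (three technical replicates each).
fold_change = relative_expression(
    ct_target_sample=[24.0, 24.5, 23.5],      # target gene, tissue of interest
    ct_ref_sample=[18.0, 18.0, 18.0],         # reference gene, tissue of interest
    ct_target_calibrator=[26.0, 26.0, 26.0],  # target gene, calibrator tissue
    ct_ref_calibrator=[18.0, 18.0, 18.0],     # reference gene, calibrator tissue
)
print(fold_change)
```

In this toy case ΔCt is 6 in the sample and 8 in the calibrator, so ΔΔCt is −2 and the target is four-fold more abundant in the sample. The method assumes amplification efficiencies close to 100% for both primer pairs.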

In silico expression profiling of DcCslD genes
Based on the transcriptome data of D. catenatum, we screened the expression of the DcCslD gene family under drought and low-temperature conditions and then used TBtools to draw the heat map [46]. Vigorous 8-month-old D. catenatum plants with a height of ∼12 cm were chosen for the drought stress and stress removal experiment [47]. Irrigation was performed on the 1st day, omitted from the 2nd to the 7th day, and resumed on the 8th day, with watering every two days at 15:30. The raw RNA-seq reads were obtained from leaves harvested at both 06:30 and 18:30 on the 2nd [DR5 (NCBI: SRR7223299) and DR8 (SRR7223300)], 7th [DR6 (SRR7223298) and DR10 (SRR7223296)] and 9th [DR7 (SRR7223301) and DR15 (SRR7223297)] days, respectively, and at 18:30 on the 8th day [DR11 (SRR7223295)]. To analyze the expression of DcCslD genes in response to cold stress, the raw RNA-seq reads of leaves under the 20 °C control condition (SRR3210630, SRR3210635 and SRR3210636) and under 0 °C cold treatment for 20 h (SRR3210613, SRR3210621 and SRR3210626), provided by Wu et al. [48], were obtained from NCBI. The HISAT package [49] was used to align the reads of all samples to the D. catenatum reference genome from NCBI. StringTie [50] was used to assemble the mapped reads for each sample. A Perl script was then used to merge the transcriptomes of all samples and reconstruct a comprehensive transcriptome. After the final transcriptome was generated, StringTie and edgeR were used to estimate the expression levels of all transcripts, with StringTie calculating FPKM values as the measure of expression level.
PtrCslD3, Potri.003G097100; PtrCslD4, Potri.001G136200; PtrCslD5, Potri.019G046700; PtrCslD6, Potri.013G082200; PtrCslD7, Potri.004G208800; PtrCslD8, Potri.009G170000; PtrCslD9, Potri.003G177800; PtrCslD10, Potri.001G050200; PAXXG023130; PAXXG228240; PAXXG000590; PAXXG120890; PAXXG156700; PAXXG228570; PAXXG188550; PAXXG254610.
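For readers unfamiliar with the FPKM unit used in the expression estimates of this section, the textbook definition (fragments per kilobase of transcript per million mapped fragments) can be written as a one-line function. This is the standard formula only, not a reproduction of StringTie's internal coverage-based estimation, and the numbers in the example are hypothetical.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million).
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical transcript: 500 fragments on a 2000 bp transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
print(example_fpkm)
```

Because FPKM normalizes for both transcript length and library size, values can be compared across genes within a sample, while comparisons across samples are usually made with additional normalization such as that provided by edgeR.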

Identification and characterization of DcCslD genes in D. catenatum
Through HMMER analysis and BLAST search, we identified eight DcCslD genes, which were named DcCslD1, DcCslD2a, DcCslD2b, DcCslD3a, DcCslD3b, DcCslD4a, DcCslD4b and DcCslD5. The predicted physicochemical properties showed that the deduced DcCslD proteins varied in size between 740 and 1180 amino acids (aa), with an average of 1009.75 aa; the molecular weight (MW) varied from 82.39 to 131.73 kDa, and the theoretical pI of these proteins ranged from 5.68 to 8.84. DcCslD2a, DcCslD2b, DcCslD4a and DcCslD4b were acidic proteins and the rest were basic proteins. The instability index was greater than 40, indicating that they were unstable proteins. According to the grand average of hydropathicity (GRAVY), DcCslD3a was hydrophobic, while the other proteins were hydrophilic (Table 1).
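The labels used above (acidic or basic, stable or unstable, hydrophobic or hydrophilic) follow simple thresholds on pI, instability index and the grand average of hydropathicity (GRAVY). The sketch below computes GRAVY with the Kyte-Doolittle scale and applies those thresholds. The function names are illustrative, the pI cutoff of 7 is a simplifying assumption, and the example instability index and GRAVY value are hypothetical; only the pI of 5.68 comes from the reported range.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def describe(pi, instability_index, gravy_value):
    """Apply the conventional thresholds used to label predicted proteins."""
    return {
        "charge": "acidic" if pi < 7 else "basic",  # simplification: pI 7 counted as basic
        "stability": "unstable" if instability_index > 40 else "stable",
        "hydropathy": "hydrophobic" if gravy_value > 0 else "hydrophilic",
    }


# pI 5.68 is the lower bound reported in the text; the other two values are hypothetical.
labels = describe(pi=5.68, instability_index=45.0, gravy_value=-0.2)
print(labels)
```

A positive GRAVY score indicates an overall hydrophobic protein and a negative score a hydrophilic one, which is the basis for the single hydrophobic call among the eight proteins.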

Prediction of peptide, signal peptide and transmembrane domains of CslD protein from D. catenatum
Prediction analysis showed that the CslD proteins of D. catenatum had no obvious signal peptide or guiding peptide. The transmembrane domains of the DcCslD proteins were predicted using TMHMM server v. 2.0. The results showed that, except for the DcCslD4b protein, the DcCslD proteins had 6-8 transmembrane domains, of which 1-2 were in the N-terminal region of the protein and 5-6 were in the C-terminal region (Table 2).

Prediction and analysis of secondary structure of CslD protein in D. catenatum
The secondary structure of the protein mainly includes α-helix, β-fold, β-turn, random coil and extended chain. The prediction results showed that the DcCslD protein sequences were composed of four structural components (α-helix, β-fold, extended chain and random coil), and the proportions of their structures were very similar. Among them, α-helix, extended chain and random coil were the main elements, and β-fold accounted for the least proportion (Table 3).

Prediction and analysis of the tertiary structure of CslD protein in D. catenatum
The tertiary structures of the DcCslD proteins were predicted and compared using the normal mode of the Phyre2 structure prediction server, and their plausibility was checked with the protein structure testing tool PDBsum Generate (Figure 2). All proteins except DcCslD4b had similar tertiary structures, and some structural differences might be responsible for the different functions of the proteins. The proportion of residues in the disallowed regions was less than 2% for all DcCslD proteins, indicating that the spatial structures were stable. More than 80% of the amino acid residues were in the most favorable region, meaning that the conformation was reasonable. The G-factor values were all greater than −0.5, which indicated that all DcCslD proteins had normal spatial structures (Table 4).
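The three acceptance criteria described above can be combined into a single check, shown here as a minimal sketch. The function name and the example values are hypothetical; the thresholds are the ones stated in the text.

```python
def model_is_reasonable(most_favoured_pct, disallowed_pct, g_factor):
    """Return True when a model meets all three criteria stated in the text.

    most_favoured_pct: % of residues in the most favourable Ramachandran region (> 80)
    disallowed_pct:    % of residues in disallowed regions (< 2)
    g_factor:          overall G-factor from the structure check (> -0.5)
    """
    return most_favoured_pct > 80.0 and disallowed_pct < 2.0 and g_factor > -0.5


# Hypothetical quality metrics for two models.
good_model = model_is_reasonable(85.0, 1.0, -0.2)
poor_model = model_is_reasonable(78.0, 1.0, -0.2)
print(good_model, poor_model)
```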

Phylogenetic analysis and classification of CslD proteins in D. catenatum
Due to the high sequence homology of the DcCslD genes, we investigated their evolutionary relationships. To understand the dynamic topological evolution, a neighbor-joining phylogenetic tree was constructed with MEGA 7.0 using CslD protein sequences, including the above D. catenatum CslD proteins and those of Arabidopsis thaliana, Oryza sativa, Phalaenopsis aphrodite, Zea mays and Populus trichocarpa (Figure 3).
The phylogenetic tree showed that the 8 DcCslD proteins belong to three clades (I-III). Phylogenetically, DcCslD2a, DcCslD2b, DcCslD3a and DcCslD3b shared clade I with ZmCslD5, OsCslD1, OsCslD2, AtCslD2, AtCslD3, PtrCslD5, PtrCslD6, PAXXG120890, PAXXG156700, PAXXG000590 and PAXXG228570. In this clade, AtCslD2, AtCslD3 and OsCslD1 are required for root hair morphogenesis and the formation of the cell wall, and ZmCslD1 is essential for cell division in rapidly growing tissues [35,37,42,[51][52][53]. There were 15 proteins in clade II, in which DcCslD1, DcCslD4a and DcCslD4b shared high similarity with AtCslD1, AtCslD4, OsCslD3, OsCslD5, ZmCslD3, ZmCslD4, PtrCslD7, PtrCslD8, PtrCslD9, PtrCslD10, PAXXG228240 and PAXXG023130. Among these proteins, mutations of AtCslD1 and AtCslD4 caused abnormal flowers, pollen tubes and pollen grains; PtrCslD7, PtrCslD8, PtrCslD9 and PtrCslD10 may participate in flower and pollen tube development [34,54,55]. In addition, DcCslD5 was in clade III together with OsCslD4, PtrCslD2 and AtCslD5, whose functions have been studied and whose mutants displayed reduced stem growth and polysaccharide synthesis [56,57]. In clade IV, AtCslD6, PtrCslD3, PtrCslD4 and PAXXG254610 shared the same lineage; the functions of these proteins have not been identified [33]. Based on the phylogenetic tree analysis, DcCslD2a, DcCslD2b, DcCslD3a and DcCslD3b may function in root hair formation, cell wall formation and cell elongation; DcCslD1, DcCslD4a and DcCslD4b may participate in flower and pollen tube development; and DcCslD5 may be involved in the formation of the cell wall, the growth and development of plants and the synthesis of polysaccharides.

Analysis of gene structures, conserved domains and motifs of CslD proteins in D. catenatum
The exon/intron structures and intron numbers of the most closely related members in the same sub-families were not very similar. For example, DcCslD3b had seven exons whereas DcCslD3a had just one; DcCslD1 had three exons whereas DcCslD4a and DcCslD4b had five.
The functional domains of the DcCslD proteins were analyzed with the HMMER online tool. All 8 DcCslD proteins contained a Cellulose synthase domain. The motif analysis showed that all DcCslD proteins contained 12 motifs, except that DcCslD3a lacked motif 7 in the C-terminal region and DcCslD4b lacked motifs 2, 6, 12, 8 and 9 in the N-terminal region (Figure 4). According to the multiple sequence alignment of the DcCslD protein domains, the DcCslD proteins had a large number of conserved amino acid sites, and the sequence of the Cellulose synthase domain contained the nucleotide sugar-binding sites and the conserved D, D, D, QXXRW motifs (Figure 5), indicating the highly conserved function of the DcCslD proteins.
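The QXXRW part of the conserved D, D, D, QXXRW signature is easy to locate with a regular expression, as the sketch below illustrates. The function name and the toy sequence are hypothetical, and the search does not check the three upstream aspartate residues, whose spacing varies between proteins.

```python
import re

# Q, any two residues, then R and W.
QXXRW = re.compile(r"Q..RW")


def find_qxxrw(protein_sequence):
    """Return (0-based position, matched residues) for each QXXRW occurrence."""
    return [(m.start(), m.group()) for m in QXXRW.finditer(protein_sequence.upper())]


# Hypothetical peptide fragment containing one QXXRW site.
hits = find_qxxrw("MDDAAQVLRWGG")
print(hits)
```

In a full analysis the position of each hit would be compared with the alignment columns of the Cellulose synthase domain to confirm that it is the conserved site rather than a chance match.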

Analysis of cis elements in DcCslD promoter
The plant gene promoter is an important cis-acting element which is the control center of gene transcription. The cis-acting elements analysis revealed that the promoter regions of these eight DcCslDs contain more than three cis-acting elements (Figure 6). These cis-regulatory elements could be broadly divided into seven categories: light responsive elements, hormone responsive elements, promoter related elements, development related elements, environmental stress-related elements, site-binding related elements and other elements [58,59]. The light responsive category contains nineteen kinds of cis-regulatory elements. Twelve types of cis-regulatory elements were found to be involved in plant hormone responsiveness, including TATC-box, P-box and GARE-motif, gibberellin-responsive elements; ERE, ethylene-responsive element; ABRE, abscisic acid-responsive element; TGA-element and AuxRR-core, auxin-responsive elements; TCA-element, salicylic acid responsiveness; and CGTCA-motif and TGACG-motif, MeJA-responsiveness. The third category was related to plant growth and development and contains five types of cis-regulatory elements: CCAAT-box, CAT-box, GCN4-motif, HD-Zip3 and MSA-like. The fourth category might respond to environmental stress, such as ARE, anaerobic induction; GC-motif, anoxic specific inducibility; TC-rich repeats, defense and stress responsiveness; MBS, MYB binding site involved in drought-responsiveness; LTR, low-temperature responsiveness; and W-box, wound-responsive element. In addition, two categories were promoter related elements and site-binding related elements, each of which had two types of cis-regulatory elements. Finally, the functions of the remaining three types of cis-regulatory elements were unclear.
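The grouping of elements described above can be expressed as a lookup table that tallies predicted elements by category. The sketch below covers only the hormone, development and stress elements named in the text; the category labels, the function name and the example hit list are hypothetical, and any element not in the table (including the light-responsive ones, which are not listed individually here) falls into "other".

```python
from collections import Counter

# Element-to-category table built from the elements named in the text.
ELEMENT_CATEGORY = {
    # hormone responsiveness
    "TATC-box": "hormone", "P-box": "hormone", "GARE-motif": "hormone",
    "ERE": "hormone", "ABRE": "hormone", "TGA-element": "hormone",
    "AuxRR-core": "hormone", "TCA-element": "hormone",
    "CGTCA-motif": "hormone", "TGACG-motif": "hormone",
    # growth and development
    "CCAAT-box": "development", "CAT-box": "development",
    "GCN4-motif": "development", "HD-Zip3": "development",
    "MSA-like": "development",
    # environmental stress
    "ARE": "stress", "GC-motif": "stress", "TC-rich repeats": "stress",
    "MBS": "stress", "LTR": "stress", "W-box": "stress",
}


def tally_categories(element_names):
    """Count predicted cis-elements per category; unknown names count as 'other'."""
    return Counter(ELEMENT_CATEGORY.get(name, "other") for name in element_names)


# Hypothetical list of elements predicted in one 2000 bp promoter.
promoter_hits = ["ABRE", "LTR", "MBS", "CAT-box", "ABRE", "unnamed-element"]
counts = tally_categories(promoter_hits)
print(dict(counts))
```

A tally of this kind makes it straightforward to check, for example, which promoters carry an LTR element before relating promoter content to the low-temperature expression data.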

Expression patterns of DcCslD gene in different organs
DcCslD genes are important for plant growth and development [60,61]. To preliminarily elucidate the function of DcCslDs in D. catenatum, we used qRT-PCR to examine the expression of the eight DcCslDs in roots, stems and leaves. Overall, most DcCslD genes were highly expressed in stems and leaves, while the DcCslD4b gene was highly expressed in the root.

Expression of DcCslDs in response to drought and low-temperature treatments
It has been shown that DcCslDs are involved in plant responses to environmental stresses [62][63][64]. To investigate the response of DcCslDs under stress conditions, we used the transcriptome data to analyze transcript levels in tissues subjected to drought and low-temperature treatments. Under drought treatment, the expression of the DcCslD5 gene changed most obviously: it increased with the duration of the drought treatment and decreased after rewatering (Figure 8A).

Discussion
D. catenatum has been known as the first of the 'nine magic herbs in China' since ancient times. Because of its high medicinal value and wide application, it has been described as the 'giant panda of the pharmaceutical kingdom' in the international medicinal plant community [65,66]. DOP, which is mainly composed of mannose and glucose, is one of the main active components in D. catenatum [67]. Studies have reported the first draft genome sequence of D. catenatum and discovered extensive duplication of genes, including the DcCslD gene family, involved in glucomannan synthase activities and likely related to the synthesis of medicinal polysaccharides [27]. These findings allow us to systematically analyze the CslD gene family in D. catenatum.
In this study, a total of eight DcCslD genes were identified and characterized. The predicted physicochemical properties showed that all DcCslD proteins had long sequences and were unstable proteins; the DcCslD2a/2b/4a/4b proteins were acidic and the rest were basic, which might be related to protein function. Except for DcCslD3a, the proteins had obvious hydrophilic regions, which could let them bind to the substrate and function better. Each protein contains several transmembrane structures, suggesting that they might be transported to the outside of the cell membrane after synthesis. The prediction of protein secondary structure showed that the main components of the DcCslD secondary structure were α-helix, extended chain and random coil, and that α-helix was the main part of the transmembrane structure. In this study, the high-level structure models of the DcCslD proteins were established by homology modeling and tested with PROCHECK. All DcCslD proteins had normal spatial structures and reasonable conformations. The prediction and analysis of DcCslD protein sequence structure can guide the expression and modification of the proteins; in addition, the prediction and analysis of secondary and tertiary structures will help to further explore the relationship between structure and function and the mechanism of action.
Figure 7. Expression of the CslD gene family in organs of D. catenatum. Six-month-old seedlings of D. catenatum were used to extract RNA from roots, stems and leaves for qRT-PCR. To ensure the accuracy of the experiment, three biological and three technical replicates were designed for each sample. Error bars show the standard deviation of the means.
According to the phylogenetic analysis, CslD proteins could be divided into four clades (I-IV), and the DcCslD proteins were distributed in clades I-III. DcCslD2a/2b/3a/3b showed close phylogenetic relationships with AtCslD2/3, OsCslD1 and ZmCslD1, which are required for plant growth and development, such as root hair morphogenesis and the accumulation of mannose in internode fiber. DcCslD1/4a/4b shared high similarity with AtCslD1/4 and PtrCslD7/8/9/10, indicating that they might participate in flower and pollen tube development. Based on clade III of the phylogenetic tree, DcCslD5 might have the same function as AtCslD5 in promoting plant stem growth and polysaccharide synthesis. The functional differences among the three clades of proteins indicated that they diverged markedly during evolution. This might be related to the growth environment and internal genetic transformation of plants.
Based on the phylogenetic, gene structure and motif analyses of the DcCslD family, we found that the exon/intron structures and intron numbers of the most closely related members in the same sub-families were not very similar. For example, the closely related DcCslD1, DcCslD4a and DcCslD4b had different intron/exon structures and numbers. These findings indicated that intron loss, along with intron gain events, might have occurred during the structural evolution of the DcCslD gene family. In contrast, the DcCslD proteins had similar motif distributions. The clade I members DcCslD2a/2b/3b, the clade II member DcCslD1 and the clade III member DcCslD5 had 12 distinct motifs, while the clade I member DcCslD3a and the clade II member DcCslD4b lost motif 1 and motifs 2, 6, 8, 9 and 12, respectively. All of them contained the marker Cellulose synthase domain and the highly conserved D, D, D, QXXRW motifs, which are essential for the activation of the monosaccharide binding enzyme [68][69][70].
Analyses of cis-elements in the promoters suggest that DcCslDs might regulate plant growth and development and respond to different environmental stresses and stimuli. To determine where the DcCslD genes were expressed in D. catenatum, qRT-PCR analysis was performed on RNA extracted from various organs. As shown in Figure 7, DcCslD transcripts accumulated in all tested D. catenatum tissues, but at different relative expression levels. Except for DcCslD4b, the expression of the genes in the stems and leaves was higher than that in the roots. A possible explanation is that the six-month-old seedlings were in a period of vigorous growth, and the formation of stems and leaves required the expression of DcCslD genes; the high expression of DcCslD genes in the stem is consistent with this. DcCslD genes function in synthesizing polysaccharides, which provide the conditions for the formation of the plant cell wall. Similarly, multiple genes were expressed ubiquitously in D. catenatum leaves and stems, but especially in stems, mainly because stems are the principal storage organs for DOPs in D. catenatum.
Heat map data showed that drought could induce the expression of the DcCslD5 gene, whereas the effect on the other genes was not obvious. This result is similar to that reported by Wan et al. [62], who showed that moderate drought would not lead to a significant change in gene expression in D. catenatum because of its strong ability to adapt to drought. The expression of 5 DcCslD genes (DcCslD1, DcCslD2a, DcCslD2b, DcCslD3a and DcCslD5), whose promoters contain a low-temperature response element, was up-regulated by low temperature. It was also reported that low temperature induced the expression of DcCSLA5, which belongs to the same superfamily as DcCslD [71]. This might be related to the adaptation of D. catenatum to harsh environments such as cliffs and cold. Moreover, some studies speculated that the polysaccharides produced by CslDs might have a signaling role in plant development. For example, the plasma-membrane-bound receptor-like kinase THESEUS1, which is present in elongating Arabidopsis cells, could not only mediate the cell response to disturbance of cellulose synthesis, but also act as a sensor of cell wall integrity [72]. The OsCSLD4 mutant showed significantly slowed plant growth through delayed cell cycle progression, and OsCSLD4 was closely related to several genes involved in cell cycle regulation [73]. Based on the above results and our prediction analysis, we speculated that the growth of D. catenatum was inhibited under low temperature. To resist the environmental stress, the expression of DcCslDs was up-regulated, which promoted the synthesis of polysaccharides and then returned the cell cycle and growth process of D. catenatum to normal. These results also provide a good basis for further analysis of the function of this gene family in the growth, development and glucomannan synthesis of D. catenatum.

Conclusions
In this study, we identified eight putative CslD proteins in D. catenatum and analyzed their biochemical characteristics, structures and phylogeny. In addition, we analyzed the expression profiles of DcCslD genes in different tissues and organs and their responses to diverse environmental stresses. Our findings provide comprehensive information on the classification and expression profiles of DcCslD genes, and will lay the foundation for the functional characterization of the DcCslD gene family in orchids.

Authors' contributions
HX, JL, QL, JS, LZ, and CL planned and designed the research. HX performed the experiments. HX, JL, QL, XC, CL, YZ, JY and DC analyzed the data. HX, LZ, JL and JS wrote the article. All the authors approved the manuscript.

Availability of data and materials
All data generated or analyzed during this study are included in this published article and its Additional files. The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Disclosure statement
The authors declare that they have no competing interests.

Ethics approval and consent to participate
The D. catenatum used in this study is the commercial cultivar 'Jingpin NO. 1', which was cultivated by Prof. Jinping Si (Zhejiang A&F University) and authorized by Zhejiang Province with Breed NO. Zhe R-SV-DO-015-2014. It does not require ethical approval.

Funding
This work was funded by National Key R&D Program of China (2017YFC1702201) focusing on wild imitation cultivation and stress response mechanisms of Dendrobium, the State Key Laboratory of Subtropical Silviculture (ZY20180206) focusing on metabolic mechanism of environmental fitness in Dendrobium, and 2021 SAAS Project on Agricultural Science and Technology Innovation Supporting Area [SAAS Application Basic Study 2021 (09)]. The funding agencies were not involved in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.