Genome-wide identification of the CCCH gene family in rose (Rosa chinensis Jacq.) reveals its potential functions

Abstract The CCCH (C3H) zinc finger proteins are an important transcription factor family in plants, involved in plant development, hormone regulation and stress responses. Rose is one of the most important commercial flower crops worldwide. However, there are few reports about the C3H genes of rose. The reference genome sequence of Rosa chinensis Jacq. provides a valuable source of data for the C3H genes in rose (RcC3Hs). We identified 31 RcC3H members in total, which were distributed on seven chromosomes and clustered into 11 subgroups in R. chinensis. Members of each subgroup share group-specific features, including motifs, gene structure, cis-acting elements and collinearity, in R. chinensis and two other species (Arabidopsis thaliana and Oryza sativa L.). Tissue-specific expression analysis of RcC3Hs by quantitative real-time PCR (qRT-PCR) showed that the expression of RcC3Hs was higher in roots and leaves than in other tissues. These results provide a theoretical basis for the follow-up exploration of the functions of RcC3Hs and the molecular breeding of rose. Supplemental data for this article is available online at https://doi.org/10.1080/13102818.2021.1901609 .


Introduction
Transcription factors play a key role in regulating many important biological processes in plants [1]. Understanding the functional characteristics of transcription factors is essential for studying their transcriptional regulatory networks and the biological processes they control [2]. As a transcription factor family in plants, CCCH (C3H) is a zinc finger protein family, and zinc finger proteins are regarded as one of the most abundant protein families. CCCH proteins contain one or more motifs composed of three cysteine residues and one histidine residue [3,4]. Compared with the C2H2-type zinc finger proteins, the CCCH-type zinc finger proteins have received limited research attention. The CCCH-type zinc finger proteins usually contain at least one zinc finger motif, whose most distinctive feature is three Cys (C) residues and one His (H) residue. The consensus sequence of the motif is defined as C-X(4-15)-C-X(4-6)-C-X(3-4)-H (X represents any amino acid), based on the number of amino acids separating the cysteine and histidine residues in CCCH [5].
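As an illustration of the consensus sequence above, the following minimal Python sketch scans a protein sequence for C-X(4-15)-C-X(4-6)-C-X(3-4)-H with a regular expression. It is not the identification procedure used in this study, which relied on domain databases; the function name find_ccch_motifs and the toy sequence are hypothetical, and a pattern match alone does not confirm a functional zinc finger.

```python
import re

# Consensus C-X(4-15)-C-X(4-6)-C-X(3-4)-H, where X is any amino acid.
CCCH_PATTERN = re.compile(r"C.{4,15}C.{4,6}C.{3,4}H")


def find_ccch_motifs(protein_seq):
    """Return (0-based start, matched text) for non-overlapping consensus matches."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in CCCH_PATTERN.finditer(seq)]


# Hypothetical toy sequence (not a real RcC3H protein), built so that the
# spacers between C, C, C and H are 8, 5 and 3 residues long.
toy_seq = "MA" + "C" + "A" * 8 + "C" + "G" * 5 + "C" + "S" * 3 + "H" + "KK"
hits = find_ccch_motifs(toy_seq)
print(hits)
```

Because finditer reports non-overlapping matches only, tandem motifs that share residues would be undercounted; a profile-based search is more sensitive for real annotation work.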
The CCCH genes in plants are reported to participate in plant development, adaptation, hormone regulation and stress physiology. For example, in the model plant Arabidopsis thaliana, AtC3H54 and AtC3H61 were expressed at a higher level in mature dry embryos and at a lower level during seed imbibition [6]. Compared with wild-type plants, transgenic Arabidopsis plants over-expressing AtC3H2, AtC3H6 and AtC3H54 germinated later, whereas mutants of AtC3H2, AtC3H6 and AtC3H54 germinated earlier [6]. AtC3H2 is expressed specifically at the sprout stage and participates in the ABA and gibberellin pathways to regulate plant growth [7]. AtC3H23, a gene that regulates the ABA and gibberellin pathways, is expressed continuously during plant growth [8]. AtC3H20 and AtC3H49 are also affected by ABA [9,10]. AtC3H66 can mediate immune responses triggered by pathogen-associated molecular patterns (PAMPs). The knockout (KO) mutant in Arabidopsis showed altered responses, such as decreased reactive oxygen species (ROS) accumulation and changes in disease resistance [11]. AtC3H20 and AtC3H49 have also been reported to function in stress responses: over-expression of these genes enhanced resistance to oxidative, salt and drought stress compared with wild-type plants, while antisense or RNAi plants were more sensitive to stress [9,10]. CCCH genes have also been reported to be involved in growth regulation and stress resistance in many crops. Rice plants over-expressing the CCCH gene OsTZF1 showed higher salt tolerance than wild-type plants [12]. Over-expression of GhZFP1 (a CCCH gene in cotton) in tobacco plants increased disease resistance and up-regulated salicylic acid (SA), which regulates disease resistance. These results indicate that GhZFP1 participates in the plant defense response [13].
The CCCH gene family has been characterized in Arabidopsis and rice (Oryza sativa) [5], maize (Zea mays) [14], Chinese cabbage (Brassica rapa) [15], citrus (Clementine mandarin) [16], tomato (Solanum lycopersicum) [17], poplar (Populus trichocarpa) [18], alfalfa (Medicago truncatula) [19] and common bean (Phaseolus vulgaris L.) [20]. Although CCCH genes have been a hot spot in plant gene research, comprehensive identification and analysis of the CCCH gene family in rose has rarely been reported. Rose (Rosa chinensis Jacq.) is one of the most important commercial flower crops worldwide. The publication of the rose genome [21] has greatly promoted research on the important traits of rose, including the comprehensive identification of rose genes. Based on the rose genome, this study identified the rose CCCH gene family and examined the structure, conserved domains and tissue-specific expression of these genes, in order to provide a theoretical basis for future analysis of the mechanisms by which CCCH genes participate in the growth and development of rose.

Identification of RcC3H
The profile of the CCCH domain (PF00642) was obtained using HMMER [22] and the Pfam database [23]. The amino acid and nucleotide sequences were obtained from the NCBI database (ID:11717). InterPro, P3DB and ExPASy were used to confirm the integrity of the C3H domain in RcC3Hs. Each RcC3H was given a unique name according to its position on the chromosomes. All websites and software reference addresses are listed in Supplemental Table S1. The C3H protein sequences of other species were obtained from Ensembl Plants using the C3H domain.
The MUSCLE program in MEGA X [24] was used to perform multiple sequence alignments between the identified RcC3H protein sequences and the protein sequences of CCCH members in Arabidopsis and rice (Oryza sativa L.). Maximum Likelihood analysis was performed using the WAG + G model, which was predicted to be the best model by MEGA. One thousand replicates were used to produce bootstrap values. FigTree was used to edit and display the phylogenetic tree.

Analysis of RcC3H
The MEME tool [25] was used to detect additional motifs in the RcC3H protein sequences outside the C3H domain. The motif lengths were set to 10-50 amino acids and the E-values were less than 1e-20. The motifs were numbered according to their order in the protein sequences and compared among RcC3Hs to identify group-conserved or group-specific signatures. GSDS [26] was used to analyze and display the exon/intron structure of RcC3Hs. GeneWise [27] was used to obtain location information from the DNA sequences (containing both exons and introns) and the protein sequences. The coordinates of the C3H domain in the predicted amino acid sequences of the putative RcC3H proteins were transformed by using Perl scripts. All RcC3H sequences were extracted with TBtools [28] from the genome sequence and GFF3 file downloaded from NCBI. The 1500-bp promoter regions of RcC3Hs were analyzed with PlantCARE. MCScanX [29] was used to detect gene duplication events, and the results were visualized with Circos.
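The promoter extraction step can be sketched as follows. This is an illustrative Python outline, not the TBtools procedure used in the study: it assumes that the promoter is the 1500 bp immediately upstream of the annotated gene start, that coordinates follow the 1-based inclusive GFF3 convention, and that minus-strand genes are read from their end coordinate. The function names and the example GFF3 line are hypothetical.

```python
def parse_gff3_gene(line):
    """Extract (seqid, start, end, strand) from one tab-delimited GFF3 feature line."""
    fields = line.rstrip("\n").split("\t")
    # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
    return fields[0], int(fields[3]), int(fields[4]), fields[6]


def promoter_window(start, end, strand, chrom_len, length=1500):
    """Return the 1-based inclusive upstream window, or None if no upstream sequence exists."""
    if strand == "+":
        p_end = start - 1
        p_start = max(1, p_end - length + 1)  # clip at the chromosome start
    elif strand == "-":
        p_start = end + 1
        p_end = min(chrom_len, p_start + length - 1)  # clip at the chromosome end
    else:
        raise ValueError("strand must be '+' or '-'")
    if p_start > p_end:
        return None
    return (p_start, p_end)


# Hypothetical annotation line; the identifiers are placeholders, not real RcC3H records.
example = "Chr5\texample\tgene\t5001\t8000\t.\t+\t.\tID=gene_example"
seqid, g_start, g_end, g_strand = parse_gff3_gene(example)
window = promoter_window(g_start, g_end, g_strand, chrom_len=100000)
print(seqid, window)
```

For minus-strand genes the extracted window must also be reverse-complemented before it is submitted for cis-element analysis, which this sketch leaves out.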

Plant materials and treatments
A local rose variety, "Month-pink", mainly grown in Heilongjiang, was chosen as the plant material. Samples were collected during the flowering period. The roots, stems, leaves, flowers, stamens and pistils of the plants were sampled as different tissues. Each sample weighed about 0.2 g. RNA was extracted from these samples with the Trelief™ RNAprep Pure Plant Kit (TSINGKE, TSP411, Beijing, China) and quantified with a NanoDrop 2000C (Thermo Scientific, Waltham, Massachusetts, USA); the OD 260/280 of the RNA samples was required to be between 1.8 and 2.2, and the 28S/18S ratio not less than 1. cDNA was synthesized with the Goldenstar™ RT6 cDNA Synthesis Kit (TSINGKE, TSK301, Beijing, China). The primers for quantitative real-time PCR (qRT-PCR) were designed with Primer Premier (Supplemental Table S2), and RcUBR was used as the reference gene based on preliminary experiments in our laboratory. qRT-PCR was run on the Roche LightCycler 480 II system (Roche, Roche Diagnostics, Switzerland) with the 2 × TSINGKE Master qPCR Mix (SYBR Green I) (TSINGKE, TSE201, Beijing, China), using three biological replicates. The relative expression level of RcC3Hs was determined according to the 2^(-ΔCt) method.
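The relative expression calculation can be illustrated with a short Python sketch. It assumes that the method is read as 2^(-ΔCt), with ΔCt = Ct(target gene) - Ct(reference gene, here RcUBR), and that the three biological replicates are summarized by their mean and sample standard deviation. The Ct values below are invented for illustration and are not data from this study.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference):
    """Relative expression as 2^(-dCt), where dCt = Ct(target) - Ct(reference)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


def summarize_replicates(ct_pairs):
    """Mean and sample standard deviation of 2^(-dCt) across biological replicates."""
    values = [relative_expression(t, r) for t, r in ct_pairs]
    return mean(values), stdev(values)


# Invented (target Ct, reference Ct) pairs for three biological replicates.
root_cts = [(22.0, 20.0), (21.0, 20.0), (23.0, 20.0)]
root_mean, root_sd = summarize_replicates(root_cts)
print(root_mean, root_sd)
```

Under this reading, a target gene that crosses the threshold two cycles later than the reference gene has a relative expression of 0.25.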

Identification of RcC3Hs
Thirty-one protein sequences containing the C3H domain were identified in rose after screening by SMART and InterPro analyses. They were designated RcC3Hs and named RcC3H01 to RcC3H31 according to their positions on the chromosomes (Chrs); detailed information is given in Supplemental Table S3. These 31 members were located on the seven chromosomes of R. chinensis.
RcC3H members are present on all chromosomes of the rose genome (R. chinensis 'Old Blush'). Chromosome 5 has the most RcC3Hs, with 11 members; Chr4, Chr6 and Chr7 each have three members, and Chr1 has only two. The predicted protein length of RcC3Hs ranges from 150 to 1174 amino acids, the isoelectric point from 5.30 to 9.37, the molecular weight from 16797.72 to 133095.47 Da, and the instability index from 32.64 to 65.42 (Supplemental Table S3 and Figure 1).

Evolutionary, motif and gene structure analysis of RcC3Hs
The 31 RcC3H protein sequences were used to construct a phylogenetic tree (Figure 2). The Maximum Likelihood method was used with the WAG + G model, which was predicted to be the optimal model by MEGA X. According to the evolutionary tree, the RcC3H members were clustered into 11 subgroups, named subgroups I-XI.
The motifs of the 31 RcC3H members were analyzed. The motifs of each RcC3H are shown in different colors, and the motif compositions are shown in Figure 3B. Subgroups VI, VII, VIII, X and XI contained only motif 2, while subgroup IV had the highest number of motifs, with at least six. RcC3H members in the same subgroup of the evolutionary tree had similar motifs.
The gene structure of the RcC3H members is shown in Figure 3C. The number of exons varied from 2 to 14 in RcC3Hs and the RcC3Hs that were grouped together possessed similar gene structures.

The cis-regulatory elements of RcC3Hs
The cis-regulatory elements in the promoters of RcC3H members were analyzed. In total, 11 cis-regulatory elements were found (Supplemental Table S4), of which six may function in hormone responses. Another four cis-regulatory elements were predicted to be associated with stress responses, suggesting that RcC3Hs may be involved in stress responses (Figure 4).

Evolutionary analysis of C3H members in rose, Arabidopsis and rice (O. sativa)
The CCCH genes in Arabidopsis and rice (O. sativa) were also identified here. In total, 68 CCCH genes in Arabidopsis and 50 C3H genes in rice were compared with RcC3Hs. The motifs and gene structures of the C3H genes in the three species are shown in Figure 5. The C3H genes of the three species were also divided into 11 subgroups. Genes within each subgroup have similar gene structures and motifs, which indicates that the C3H family genes of each subgroup in these three species may have similar functions.

Collinearity analysis of RcC3Hs
The results of the collinearity analysis of RcC3Hs are shown in Figure 6. Three collinear pairs of RcC3Hs were found, namely RcC3H05 vs RcC3H06, RcC3H11 vs RcC3H18 and RcC3H13 vs RcC3H27, indicating that the genes in each pair are probably related. The collinearity analysis of RcC3Hs with Arabidopsis showed that only one Arabidopsis gene (AT5G18550) was collinear with an RcC3H gene, RcC3H13, which indicates that these two genes may have similar functions.

Tissue-specific expression analysis of RcC3Hs
The expression of RcC3Hs was analyzed by qRT-PCR in multiple tissues, including flowers, leaves, stems, roots, stamens and pistils. As shown in Figure 7, 24 RcC3Hs were chosen for testing. Most RcC3Hs were expressed in several tissues, and their expression levels differed among tissues. Overall, the expression of RcC3Hs was higher in roots or leaves, indicating that roots and leaves may be the target tissues for future research into the function of RcC3Hs during growth and development.

Discussion
The CCCH (C3H) family genes might participate in seedling development, plant growth regulation or responses to stress (Bogamuwa et al., 2014). In order to fully understand the C3H family genes in rose, 31 rose C3H members were identified based on the published rose genome data. The evolutionary tree of the rose CCCH family was constructed by cluster analysis, and the 31 RcC3H members were divided into 11 subgroups. Similarly, the CCCH-type zinc finger gene families of other species were also divided into 11 subgroups, as reported in Arabidopsis and rice (Oryza sativa L.) [5], common bean (Phaseolus vulgaris L.) [20], citrus (Clementine mandarin) [16], Brassica rapa [15], chickpea (Cicer arietinum L.) [30] and tetraploid switchgrass [31].
Genes with similar or identical functions tend to be grouped into the same subgroup, which provides a basis for studying the functions of genes in this family. In subgroup I, the Arabidopsis C3H genes AT3G48440, AT5G16540 and AT3G06410 are all related to mRNA binding [32,33]. It is speculated that the RcC3Hs at a close genetic distance to these genes might have similar functions. The C3H genes AT3G12130 and AT5G06770 have been reported to function in regulating leaf senescence and flowering in Arabidopsis [34]. RcC3Hs that clustered into the same subgroup in the phylogenetic analysis might have similar functions in rose. AT3G55980 and AT2G41900 have been reported to play important roles in modulating the tolerance of Arabidopsis plants to salt stress or hypoxia [35-37]. The RcC3Hs in the same subgroup in the evolutionary analysis might have similar functions in stress responses. In the collinearity analysis of RcC3Hs with Arabidopsis, the collinear Arabidopsis gene was AT5G18550, which has been reported to participate in mRNA binding [33,38]. Taken together, the RcC3Hs may be widely involved in plant growth and development. Understanding the structure and function of RcC3Hs is of great significance for rose breeding and life science research in the future.
The RcC3H members have different expression patterns in different tissues. RcC3Hs were expressed in all sampled tissues and may play an important role in plant growth and development. It is possible that the function of some RcC3Hs is related to leaf development and stress tolerance. Based on the tissue-specific expression, it is speculated that RcC3Hs have broad functions and play an important role in the normal growth and development of plant tissues (roots, stems, leaves, flowers). Similarly, the tissue-specific expression of BnZFP1, encoding a CCCH transcription factor, during grain formation changes the oleic acid content of Brassica napus [39]. The expression of a CCCH-type zinc finger transcription factor gene in switchgrass, PvC3H72, was rapidly induced by cold stress, and its overexpression in transgenic switchgrass improved chilling tolerance at 4 °C [40]. Several CarC3H members [30] were expressed preferentially in specific tissues of chickpea (Cicer arietinum) and also showed differential expression under abiotic stress (drought, cold, salinity). Here, most of the RcC3Hs were expressed at a high level in roots, and it is speculated that the functions of these RcC3Hs are related to root development. Some RcC3Hs were also expressed at a high level in leaves.

Conclusions
In this study, 31 RcC3Hs were identified in the published reference genome of rose (R. chinensis) and analyzed in terms of their gene structure, predicted physical and chemical properties, conserved domains, collinearity and evolutionary relationships. The expression levels of RcC3Hs in different tissues were also analyzed. These results provide a theoretical basis for the follow-up exploration of the functions of RcC3Hs in rose and the molecular breeding of rose.

Conflicts of interest
No potential conflict of interest was reported by the author(s).

Data availability
All data that support the findings reported in this study are available from the corresponding author upon reasonable request.

Funding
Financial support was received from the Science and Technology Innovation Project of Jilin Academy of Agricultural Sciences (C02103109) and the Agricultural Science and Technology Innovation Project of Jilin Province.