Identification and expression analysis of capsaicin biosynthesis pathway genes at genome level in Capsicum chinense

Abstract Capsicum chinense is one of the best ingredients for making chili sauces and extracting capsaicin. In this study, we identified 59 genes associated with the shikimic acid, phenylpropane, unsaturated fatty acid, phenylalanine, valine, leucine, isoleucine and other metabolic pathways based on the pepper genome. Of these 59 genes involved in capsaicin synthesis, 58 were unevenly distributed on 11 chromosomes, and several genes with the same number of exons and high homology were clustered in duplication regions of the chromosomes. The promoter regions of these genes were predicted and analyzed, and multiple gene promoters were found to contain MYB binding sites, which provides a reference for the subsequent exploration of MYB target genes. The expression levels of genes related to capsaicin synthesis were also analyzed based on different transcriptomes, and these genes showed significant differences among plant development stages and tissues. The results of this study will help further research on the synthesis mechanism of capsaicin in Capsicum chinense and on the functional relationships between these genes.


Introduction
Pepper, a plant of the genus Capsicum in the Solanaceae, is popular all over the world because of its rich varieties, high nutritional quality and unique spicy taste. Owing to its special pungency, pepper can be used not only as a fresh vegetable to supplement nutrition but also processed into condiments such as Hainan yellow lantern pepper sauce, pickled millet pepper, Guizhou old Ganma and Hunan 'spicy girls'. As the main ingredient and determinant of the pungency of pepper fruit, capsaicin not only relieves pain, reduces inflammation and swelling, regulates appetite and helps prevent rheumatism, and is therefore widely used in the pharmaceutical industry [1][2][3][4], but also has contact and repellent effects on pests, which makes it a new raw material for developing environmentally friendly pesticides [5,6]. In addition, capsaicin is an important component of the chili water used by police for riot control and by the military for dispersal [7]. Therefore, highly pungent pepper has broad consumer demand and application prospects.
So far, about 50 genes that may be involved in capsaicin metabolism have been discovered, including Pal, C4H and 4-CL of the GPMP; COMT, pAMT and AT (or Pun1, encoding capsaicin synthase, CS) of the capsaicin branch pathway (CBP); BCAT, Kas, FatA and ACS of the BCFAP; CCR and CAD of the lignin biosynthesis pathway (LBP); and genes of other metabolic pathways such as GS (glutamine synthetase) and CSE (caffeoyl shikimate esterase) [15,16]. Silencing of the Pun1, COMT, pAMT, Kas and CAKR1 genes involved in the capsaicin biosynthesis pathway has been shown to decrease the capsaicin content [17,18]. Sequence analysis showed that asymmetric replication, deletion of promoter and exon sequences, and frameshift mutations arising during Pun1 gene evolution led to the decrease or loss of pungency in C. annuum, C. chinense and C. frutescens, respectively [19][20][21], and that a 12 bp deletion and a 7 bp insertion in the pAMT gene were responsible for the low spiciness of C. frutescens and C. chinense no. 4034, respectively [22,23]. In our previous study, the key enzymes CCR1 and CCR2 of the lignin pathway showed different binding affinities for different substrates, and silencing of their encoding genes also had different effects on capsaicin content [24,25]. Moreover, many of these genes have been found to be regulated by different types of MYB in plants [26][27][28].
In recent years, genome-wide association studies (GWAS) have found more pathways and genes related to spicy taste, such as ankyrin (anchor protein, which is speculated to encode an acetyl transferase) and CSE [4,14,29]. At the same time, with the successful sequencing of different pepper genomes, the identification and analysis of genes and gene families at the whole-genome level has become feasible, and more and more studies analyze structural genes and transcription factors related to biosynthetic substances in pepper based on transcriptome data [30,31]. Owing to the wide range of spiciness among varieties, the planting area of pepper has stabilized at more than 2.1 million hectares, with a total output of more than 60 million tons and a contribution rate to farmers' income of 1.14%, and pepper has become an important starting point for targeted poverty alleviation and rural revitalization in China [32]. Through transcriptome analysis, we can search for genes that may be related to capsaicin biosynthesis and characterize them, which helps us better understand how chili spiciness is formed.
Pepper fruits come in different shapes and sizes depending on the variety, and some varieties can be used as ornamental plants [33]. The most important trait, however, is the synthesis of capsaicin, which not only gives a unique taste but is also good for human health [34]. The purpose of this study is to analyze the structural characteristics and expression patterns of genes related to the branch routes (GPMP, BCFAP, LBP) of phenylpropane metabolism based on transcriptome data of Capsicum chinense.

Selection of genes associated with capsaicin biosynthesis
Genes related to capsaicin synthesis were searched by protein annotation in the Capsicum chinense genome database of NCBI (https://www.ncbi.nlm.nih.gov/genome/?term=capsicum+chinense). The GenBank numbers are shown in Table 1.

Sequence analysis of genes associated with capsaicin biosynthesis
The genomic and CDS sequences for each of the capsaicin metabolism genes were extracted from the NCBI database. Visual analysis of introns and exons of genes was performed using the Gene Structure Display Server (http://gsds.gao-lab.org/). Then, the physicochemical properties of the related proteins were predicted by ProtParam (https://web.expasy.org/protparam/). The subcellular location of the proteins encoded by the pepper capsaicin metabolic genes was predicted by ProtComp 9.0 (http://linux1.softberry.com/berry.phtml?topic=protcomppl&group=programs&subgroup=proloc).

Phylogeny, structure, chromosomal localization, duplication and promoter analysis of genes associated with capsaicin biosynthesis
The full genome of Capsicum chinense and the information on genes related to capsaicin synthesis were downloaded from NCBI, and the location of genes on chromosomes was visualized by TBtools [35]. TBtools was used to BLAST the protein sequences of capsaicin-related genes in pairs, and the results were combined with the distribution of each gene on the chromosome and the number of introns and exons to determine gene duplication events. In MEGA X software, ClustalW was used for sequence alignment of the coding sequences, and the neighbor-joining (NJ) method was used to create the phylogenetic tree [36]. Pairwise deletion as the gaps/missing data treatment, the p-distance model and 1,000 bootstrap replications were selected as the parameters for the NJ analysis. The branch lengths were assigned from the pairwise calculations of the genetic distances. The 2000-bp region upstream of the ATG was selected as the promoter region. PlantCare (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) was used to predict cis-acting regulatory elements in the promoter region, and TBtools was used to visualize the predicted results.
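The p-distance model with pairwise deletion named above can be illustrated with a minimal Python sketch. It is not the MEGA X implementation; the function names, the toy sequences and the choice of "-" and "?" as gap or missing symbols are assumptions made only for illustration.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a site is skipped only when one of the two
    sequences being compared has a gap or missing symbol at that site.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # site removed for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(aligned)
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy aligned sequences (hypothetical, not taken from the study).
toy = {"seqA": "ATGGCT-AAG", "seqB": "ATGGATCAAG"}
print(distance_matrix(toy))  # one mismatch over nine comparable sites
```

A distance matrix of this kind is the input from which a neighbor-joining tree is built; the tree construction and bootstrap steps are not shown here.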

Expression analysis of genes associated with capsaicin biosynthesis
The transcriptome data of fruit development were acquired from published scientific data [12,37,38]. The transcriptome information was downloaded using SRA Toolkit, and the TPM values were converted and extracted using FileZilla Client and PuTTY. TBtools software was then used to create and visualize heat maps of the expression of genes involved in capsaicin synthesis.
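The text does not state how the TPM values were derived, so the following is only a sketch of the standard TPM definition, assuming raw read counts and gene lengths are available. The gene IDs and numbers are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by gene ID; lengths are in bp.
    Step 1: divide each count by the gene length in kb (reads per kb).
    Step 2: rescale so that the values of one sample sum to one million.
    """
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical sample with two genes of different length.
example_counts = {"geneA": 100, "geneB": 300}
example_lengths = {"geneA": 1000, "geneB": 3000}
print(counts_to_tpm(example_counts, example_lengths))
```

Because both hypothetical genes have the same read density per kilobase, they receive the same TPM even though their raw counts differ, which is the reason for normalizing by length before comparing genes.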

Selection and characterization of genes associated with capsaicin synthesis pathway
Bioinformatics tools were used to identify 59 genes associated with capsaicin synthesis pathways in pepper. The physical locations and physicochemical properties of these genes are shown in Table 2, including the gene name, chromosome location, gene length, number of exons, protein length, isoelectric point (pI), molecular weight and subcellular localization. Nineteen genes encoding enzymes were identified as being involved in the GPMP, including five PAL genes (CcPAL1, CcPAL2, CcPAL3, CcPAL4 and CcPAL5), which are believed to be involved in the synthesis of cinnamic acid, and three C4H genes (CcC4H-1, CcC4H-2 and CcC4H-3), which are thought to be involved in the synthesis of coumaric acid. Finally, p-coumaroyl-CoA was synthesized with the participation of 4CL genes (4-CL1-1,

Chromosomal localization and duplication of genes associated with capsaicin synthesis
As shown in Figure 1, 58 out of 59 genes related to capsaicin synthesis were unevenly distributed on 11 chromosomes. The largest number of genes, 11, is located on chromosomes 3 and 6. The fewest, only two genes (CcACL2 and CcC4H-1), are on chromosome 5. Based on the available genomic data, one gene (CcCAD9-3) could not be assigned to any chromosome and lies on CCv1.2.scaffold1155. Gene duplication events play an important role in plant evolution [39,40]. To better understand the evolution of capsaicin biosynthesis genes in pepper, we searched for tandem duplications. CcCCR1-1 and CcCCR1-2 clustered in a gene duplication region on chromosome 1; both have 6 exons, and their identity is 78.69%. Hence, we think one of these genes is a copy of the other. Both CcCOMT-1 and CcCOMT-7 have 4 exons, their protein sequences are of the same length, and their identity is up to 85.24%; the two genes are clustered in a gene duplication region of chromosome 3. Both CcCOMT-8 and CcCOMT-9 have four exons with similar distribution regions and an identity of up to 89.26%, and are clustered in a gene duplication region on chromosome 6. Both CcPAL-2 and CcPAL-3 have two exons, with only one difference in protein sequence and 98.47% identity, and are clustered in a gene duplication region on chromosome 9.
Both CcC4H-2 and CcC4H-3 have three exons, their protein sequences are of the same length, and their identity is up to 98.61%; they are clustered in a gene duplication region on chromosome 6. CcCCR2-2 and CcCCR2-1 have five exons, are 95.4% identical, and are clustered in a gene duplication region on chromosome 11. No segmental duplication event was detected.
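The criteria used above (same chromosome, same exon number, high pairwise identity) can be written as a simple filter. This is a sketch rather than the TBtools procedure: the 75% cutoff and the function name are hypothetical, and physical adjacency on the chromosome, which a real tandem-duplication call also needs, is not checked because gene coordinates are not given in this section. The chromosome numbers, exon counts and identities are those reported in the text.

```python
from itertools import combinations


def candidate_duplicates(genes, identity, min_identity=75.0):
    """Return gene pairs sharing chromosome and exon count with high identity.

    genes:    dict of gene name -> (chromosome, exon_count)
    identity: dict of frozenset({gene_a, gene_b}) -> percent identity
    """
    hits = []
    for a, b in combinations(sorted(genes), 2):
        pct = identity.get(frozenset((a, b)))
        if pct is None:
            continue  # no pairwise alignment available for this pair
        chrom_a, exons_a = genes[a]
        chrom_b, exons_b = genes[b]
        if chrom_a == chrom_b and exons_a == exons_b and pct >= min_identity:
            hits.append((a, b, pct))
    return hits


# Values reported in the text for three of the gene pairs.
genes = {
    "CcCCR1-1": (1, 6), "CcCCR1-2": (1, 6),
    "CcPAL-2": (9, 2), "CcPAL-3": (9, 2),
    "CcC4H-2": (6, 3), "CcC4H-3": (6, 3),
}
identity = {
    frozenset(("CcCCR1-1", "CcCCR1-2")): 78.69,
    frozenset(("CcPAL-2", "CcPAL-3")): 98.47,
    frozenset(("CcC4H-2", "CcC4H-3")): 98.61,
}
for pair in candidate_duplicates(genes, identity):
    print(pair)
```

Raising the hypothetical cutoff, for example to 90%, would keep the CcPAL and CcC4H pairs but drop CcCCR1-1/CcCCR1-2, which shows how strongly the result depends on the chosen threshold.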

Exon-intron organization and promoter of genes associated with capsaicin synthesis
To gain further insight into the genes related to capsaicin synthesis, phylogenetic trees and exon-intron structures were constructed and analyzed (Figure 2). As shown in the figure, there are significant differences in the number of exons and introns among the genes associated with capsaicin synthesis. The number of exons ranges from 1 to 19: the CcpAmt-4 gene has only one exon and no intron, whereas the gene with the largest number of exons, CcBCAT-like1, has 19. Among these genes, the longest is CcKas-2, which is about 35 kb but has only 14 exons, while CcCAD1, which is only 1.7 kb in length, has six exons. These structural differences among genes associated with capsaicin synthesis suggest that the pepper genome may have undergone significant evolutionary changes. The 2000-bp sequence upstream of the start codon of each gene was submitted to PlantCare online for prediction of elements involved in expression regulation. The promoters of genes related to capsaicin synthesis contain elements related to growth and development, stress, hormone regulation and light (Figure 3). Because all of these promoters contain the core regulatory elements CAAT-box and TATA-box, these elements are not shown in Figure 3. The AACA-motif, which participates in endosperm-specific negative regulation, exists only in the promoter of the CcCCR1-2 gene. There are many MYB binding sites in these gene promoters.
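Taking the 2000-bp region upstream of the start codon requires attention to the strand of each gene. The sketch below shows one way to do this on a plain sequence string; the function names and the 0-based coordinate convention are assumptions for illustration and do not describe how TBtools or PlantCare handle the sequences.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N; case is kept)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, atg_pos, strand, length=2000):
    """Return up to `length` bases upstream of the start codon.

    atg_pos is the 0-based forward-strand index of the base that carries
    (for '+') or pairs with (for '-') the A of the ATG start codon.
    The result is always written 5'->3' in the orientation of the gene.
    """
    if strand == "+":
        start = max(0, atg_pos - length)
        return chrom_seq[start:atg_pos]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher forward coordinates.
        region = chrom_seq[atg_pos + 1:atg_pos + 1 + length]
        return reverse_complement(region)
    raise ValueError("strand must be '+' or '-'")


# Toy sequences (hypothetical): ATG at index 8 on '+', CAT at 3-5 for '-'.
print(upstream_region("AAAACCCCATGGGG", 8, "+", length=4))
print(upstream_region("GGGCATTTTT", 5, "-", length=4))
```

If a gene lies closer than 2000 bp to the end of a chromosome or scaffold, the slice is simply shorter, so the length of each extracted promoter should be checked before it is submitted for element prediction.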

Expression patterns of genes associated with capsaicin synthesis during the development of the pepper fruit
To further understand the function of the genes involved in the capsaicin synthesis pathway during fruit development, we analyzed their expression levels in two tissues and at different developmental stages based on three published transcriptomes [12,37,38], and heat maps were built from the results (Figures 4 and 5). The expression of these genes differed significantly among the stages of fruit development and between placenta (PL) and pericarp (PR). In the GPMP, genes showed different levels of expression; two of these genes were up-regulated in placenta tissue during the development of pepper fruits (Figure 4B). Two genes (CcPAL-2 and CcPAL-3) were up-regulated in the pericarp (Figure 4A), and seven genes (CcPAL-4, CcPAL-5, CcC4H-1, CcC4H-2, CcC4H-3, Cc4CL1-1 and Cc4CL-like1) were up-regulated in both the pericarp and the placenta. The expression levels of CcPAL-2, CcPAL-3, CcC4H-3 and Cc4CL1-1 were higher in the placenta than in the pericarp during the mature green stage (Figure 4C), but the results were reversed at breaker plus 10 days (Figure 4D). In another transcriptome, CcPAL-4 expression reached its highest level at around 40 DAP (days after pollination) in the placenta tissue (Figure 5A). CcPAL-1 was expressed only slightly, or sometimes not at all, in both tissues.
In the CBP, most capsaicin biosynthesis genes were highly expressed at the mature green stage. However, CcCOMT-9 was almost not expressed in the pericarp. CcCOMT1-1 and CcCOMT-8 were mainly expressed at breaker plus 10 days. The expression levels of CcCOMT-1, CcCOMT-2 and CcCOMT-3 in the placenta were higher at 40 DAP, whereas the expression level of CcPun1 in placental tissues was high at around 50 DAP, which was similar to the CcCOMT-10 expression pattern (Figure 5A). In addition, another set of transcriptome data showed that CcCOMT-8, CcCOMT-9, CcCOMT-10, CcpAmt-2 and CcPun1 were expressed at a high level during the fruit-breaker stage (30-45 DPA, days post-anthesis) (Figure 5B). These results indicate that these genes may play an important role in capsaicin biosynthesis in Capsicum fruit.
In the LBP, most of the lignin synthesis genes were highly expressed at the mature green stage and were expressed at a low level at the breaker plus 10 days stage. CcCCR1-1, CcCCR1-2 and CcCCR1-4 were expressed at higher levels in the placenta during the breaker plus 10 days stage. CcCAD1 and CcCAD2 were highly expressed at 40 DAP and 50 DAP, respectively.
In the BCFAP, the genes were mainly expressed in the pericarp tissue at the mature green stage, but at breaker plus 10 days the expression of all genes except CcACL1-1 and CcACS was higher in the placenta than in the pericarp. CcACL1-3, CcACL1-4, CcACL2, CcACL3 and CcACS were highly expressed in placental tissues at about 40 DAP (Figure 5B). CcKas-1 was highly expressed in placental tissues at breaker plus 10 days.
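The figure note for the expression heat maps states that the data were normalized to a row scale. One common reading of row scaling is a per-gene z-score across samples, sketched below; whether TBtools applies exactly this transformation, and the use of the sample standard deviation, are assumptions. The gene IDs and values are hypothetical.

```python
from statistics import mean, stdev


def row_scale(expression):
    """Scale each gene (row) to mean 0 and sample standard deviation 1.

    expression: dict of gene ID -> list of expression values across samples.
    Rows that are constant, or have fewer than two values, become zeros.
    """
    scaled = {}
    for gene, values in expression.items():
        if len(values) < 2:
            scaled[gene] = [0.0] * len(values)
            continue
        mu = mean(values)
        sd = stdev(values)
        if sd == 0:
            scaled[gene] = [0.0] * len(values)
        else:
            scaled[gene] = [(v - mu) / sd for v in values]
    return scaled


# Hypothetical TPM values for two genes across three stages.
toy_tpm = {"geneA": [1.0, 2.0, 3.0], "geneB": [5.0, 5.0, 5.0]}
print(row_scale(toy_tpm))
```

Row scaling makes the colour of a heat-map cell reflect how a gene changes across stages rather than its absolute abundance, so colours are comparable within a row but not between genes.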

Discussion
A large number of the genes and enzymes associated with capsaicin biosynthesis have been identified, cloned and studied [41,42]. In this study, we predicted the physicochemical properties and regulatory genes of capsaicin-related enzymes in Capsicum chinense.
CcPAL proteins have isoelectric points ranging from 5.87 to 6.61, indicating that they are acidic proteins. The isoelectric point of one of the CcC4H proteins is 5.88, and that of the other two is 9.18, indicating that this family includes both acidic and basic proteins, which may be caused by differences in the non-conserved domain. CcBCAT-like1 and CcKas proteins were basic proteins. CcHCT, CcPun1 and CcACS proteins were acidic proteins. The Cc4CL, CcCOMT, CcpAmt, CcCCR, CcCAD and CcACL families each include both acidic and basic proteins. However, how the pH of the specific cell microenvironment affects the synthesis of the related proteins needs further research.
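The acidic/basic labels above follow from comparing each predicted pI with neutral pH. A minimal sketch of that rule, assuming a cutoff of 7.0 (the function name is hypothetical; the pI values are those quoted in the text):

```python
def classify_by_pi(pi, neutral=7.0):
    """Label a protein as acidic, basic or neutral from its predicted pI."""
    if pi < neutral:
        return "acidic"
    if pi > neutral:
        return "basic"
    return "neutral"


# pI values quoted in the text for CcPAL (range ends) and CcC4H proteins.
for value in (5.87, 6.61, 5.88, 9.18):
    print(value, classify_by_pi(value))
```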
Meanwhile, we searched for tandem and segmental duplications among the capsaicin synthesis genes and found multiple genes located in the same duplication region. This suggests that some of the genes involved in capsaicin synthesis probably originated through gene duplication events. Multiple copies of a gene help to counteract the effects of gene mutations and to survive in adverse environments [43]. Therefore, identifying these duplication events provides a reference for analyzing the evolutionary relationships and predicting the functions of genes related to capsaicin synthesis.
We classified these genes according to their metabolic pathways. It has been found that, although the time at which the maximum amount of capsaicin accumulates varies with subtle differences in the environment, it generally occurs at about 40-50 DPA [44]. In our study, most genes related to capsaicin synthesis did reach their highest expression level at about 40 DPA and then showed a downward trend. PAL is an enzyme related to capsaicin synthesis as well as to disease resistance in plants [45]. All CcPAL genes were subcellularly localized in the cytoplasm, which is consistent with existing subcellular localization results for PAL genes [46]. In our study, CcPAL-1 did not show significant expression changes during the whole fruit development process of pepper; it may be mainly involved in resistance formation in pepper, which needs further functional verification. At present, the role of C4H in pepper has not been fully elucidated. In this study, we speculated that the three CcC4H genes play an important role in capsaicin synthesis, laying a foundation for subsequent study of the mechanism and function of C4H in pepper. p-Coumaroyl-CoA is transformed into caffeoyl-CoA under the catalysis of the HCT protein. In this study, the expression level of CcHCT was the highest at around 20 DAP, when the maximum amount of caffeoyl-CoA was produced, while the expression level of most CcCOMT genes tended to be the highest at around 30 DAP, indicating a certain delay between successive steps of the synthesis pathway in pepper. It remains to be explored whether this delay is due to environmental influence or to the influence of metabolite concentration.
Figure caption (partial): ... [38] and placental tissue in Capsicum chinense L. [12] during the fruit development stage, based on the RNA-Seq data. Note: these data were normalized to a row scale. Days after pollination (DAP); days post-anthesis (DPA); Fruit-early (20 DPA); Fruit-breaker (30-45 DPA); Fruit-mature (45-60 DPA).
At present, the CCR gene that has been studied in pepper is highly homologous to CcCCR2 [24], but the remaining CCR genes need further study. Subcellular localization prediction placed CcACL1-1, CcACL1-2 and CcACL1-3 in chloroplasts, indicating that they may be involved in photosynthesis in pepper plants.
At present, important progress has been made in studies on the expression of genes related to capsaicin synthesis, biosynthesis process [42], molecular markers of capsaicin content [47] and clinical use of capsaicin, etc. However, there is still a lack of research on the regulation of secondary metabolite synthesis by controlling the synergistic action of related enzymes.

Conclusions
In this study, we analyzed 59 genes related to capsaicin synthesis in terms of gene structure and protein physicochemical properties, and described the basic characteristics of these genes. By predicting cis-regulatory elements in the promoters of these genes, possible MYB binding sites were identified as a reference for the subsequent search for target genes of MYB transcription factors in the capsaicin synthesis pathway. We then analyzed the expression patterns of these genes in different developmental stages and different tissues based on three transcriptomes. These results suggest that the genes related to capsaicin synthesis play a very important role in the growth and development of pepper. In general, this comparative genomic analysis can provide a reference for the study of the capsaicin synthesis mechanism and for pepper breeding.

Data availability statement
The data that support the findings of this study are available from the corresponding author, [SHC], upon reasonable request.