Gene mining to discover pumpkin response genes to gummy stem blight infection caused by Stagonosporopsis cucurbitacearum

ABSTRACT Pumpkin is one of the region’s main cashcrops in northeast China cultivated for its edible seed and sarcocarp. Gummy stem blight (GSB), caused by the main pathogenic fungus of Stagonosporopsis cucurbitacearum (Sc.) which was identified in our previous study, has hampered pumpkin industry development with the affection of its reduced biological yield and edible quality. To clear the plant immune response mechanisms against Sc. infection, RNA-seq technology was employed to sequence pumpkin leaf transcriptome, the total number of distinct differentially expressed genes (DEGs) between the treatments and control was 2351, 4892, 5285, 4698, and 4213 for 2, 6, 12, 24, and 96 h treatments, respectively. KEGG analysis results revealed 16 distinct significantly up-regulated DEGs associated with MAPK signaling and plant-pathogen interaction pathways that may actively participate in Sc. infection-associated processes, while other five DEGs exhibited significantly altered expression during GSB infection. An apyrase-like gene, Gene3360 (LOC111476496), with significantly different expression levels, was possessing an early-stage primary infection-based function, which was hypothesized an important role in the resistance of pumpkin to Sc. infection. This study further deepens our understanding of the mechanism and mechanism of pumpkin leaf resistance to fungal infection.


Introduction
Gummy Stem Blight (GSB) caused by Stagonosporopsis cucurbitacearum (Sc.) infection, one of the most common diseases affecting cucurbits production worldwide, has seriously undermined pumpkin (Cucurbita maxima) production in Northeast China (Zhao et al. 2018a(Zhao et al. , 2018b. GSB infestations of fields progresses rapidly and can easily overtake large areas by infecting aboveground plant structures, such as leaves, flowers, and fruits. In fact, GSB infection incidence rates have reached 50-80% within continuous cropping fields and even within protected areas such as greenhouses (Ha et al. 2009;Yao et al. 2016;Tsutsumi and Silva 2004). Meanwhile, continual Sc. evolutionary processes and genetic mutations will ensure that GSB infestations will continue to negatively impact pumpkin yield and quality future years. At present, GSB control depends on long-term applications of large quantities of various chemical fungicides to fields, leading to fungicide resistance that undermines future GSB control efforts (Tsutsumi and Silva 2004). Therefore, elucidation of key genes involved in the pumpkin response to Sc. infection would provide molecular mechanistic information for use toward future development of pumpkin varieties with GSB resistance to aid GSB prevention and control efforts.
Transcriptome-based gene mining can be useful for revealing host genes, infection-associated metabolic pathways, signal transduction pathways, and molecular control networks involved in plant-pathogen interactions. In turn, such information would help clarify cucurbit molecular mechanisms underlying host resistance to Sc. to facilitate future efforts to create pumpkin varieties with GSB resistance. However, to date only a few research studies have focused on plant genes expressed in response to GSB infection. Frantz et al. investigated inheritance of melon GSB resistance, they pinpointed five independent resistance genes in melon that individually conferred resistance to GSB infection such that loss of function of these genes led to GSB susceptibility (Frantz and Jahn 2004). Meanwhile, comparisons made between a cucumber GSB resistancebased high-density genetic map to corresponding maps for melon and watermelon revealed that these three cucurbitaceae species exhibited similar chromosomal linkage relationships. In turn, these similar chromosomal maps permitted identification of past gene recombination events that led to the repositioning of cucumber genome scaffolds. Subsequent use of the resulting cucumber genetic map aided identification of genes conferring GSB resistance and susceptibility to cucumber varieties and ultimately demonstrated that a single dominant gene mediated the resistant phenotype. Therefore, super-dense genetic map was conducive to identification of new disease-resistant genes ) as a successful example for mining plant resistance genes (Kankanala et al. 2019). RNA-seq technology was also applied to study the infection of estern white pine (Pinus monticola) by cronartium ribicolaw, it was found that many secretory proteins, secondary metabolic enzymes and receptor protein encoding genes may play important roles in host-pathogen interactions (Liu et al. 2013). Multiple unique genes related to rice genome were excavated by RNA sequencing for rice blast infection at different time points, most of which were involved in metabolism, transportation and signaling, and these genes may play a role in rice blast resistance . Similarly, Liu et al. mined 1070 up-regulated DEGs that were enriched in growth/development, photosynthesis/biogenesis and defense-related reactions by using RNA-Seq . Therefore, as a powerful tool, transcriptomic analysis can effectively dig out the genes of host responding to pathogen infection.
The release of pumpkin genome-wide information (Sun et al. 2017a) has provided a convenient platform for conducting whole genome-based analyses of plant-pathogenic fungi with transcriptome. However, few studies to date have been focused on investigating pumpkin transcriptome changes during GSB infection. Thus, we conducted RNA-seq to compare DEGs expressed in untreated pumpkin leaves (0 h infection as control) to DEGs expressed in leaves after Sc. infection of leaves for 2, 6, 12, 24, and 96 h. Subsequent analysis of infection-related DEGs permitted identification of pumpkin genes involved in plant resistance to GSB. The results of this study should accelerate progress made toward successful breeding of GSB-resistant cucurbits.

Materials
Host plants of pumpkin variety 'Long Xue 1' were obtained from Heilongjiang Academy of Agricultural Sciences, Harbin, China. The more details of Sc. and assigned it the fungal strain designation was descripted in our previous study (Zhao et al. 2018a(Zhao et al. , 2018b. Seeds were sown in composite soil (at a volume-based ratio of vermiculite: field soil = 1:1) within an illumination incubator followed by incubation under the following conditions: 16-h light (25°C)/8-h dark (22°C) and humidity of 80%. Three weeks after the time of emergence of seedlings, the first treatment (96 h infection) was conducted, 6 mm mycelium block was taken from the edge of the colony and inoculated on the leaves covered with cellophane, which was moistened with sterile watersoaked filter paper and cling film, and so on. We harvested leaves from plants infected with Sc. for 0, 2, 6, 12, 24, and 96 h (designated CK, T1, T2, T3, T4, T5, respectively) in the same time then were immediately frozen in liquid nitrogen and stored in a −80°C freezer for future use. The entire infected leaf was collected for each individual sample in this study, all samples were prepared with three biological replicates.
RNA extraction, library construction, sequencing, and assembly The above-mentioned leaves were milled while frozen in liquid nitrogen then were extracted with an RNA extraction kit (Omiga Bio-tek, 100T, USA). After that, RNA preparations were assessed for degradation and contamination via electrophoresis using a 1% agarose gel. RNA purity was determined using a nucleic acid protein assay (Implen, CA, USA) and RNA concentration was determined using the Qubit ® RNA Assay Kit 2.0 Fluorometer (Life Technologies, CA, USA). RNA (3 μg) of each sample was used for library construction after samples were confirmed to be of suitable quality. Library concentration and inserted fragment sizes were detected using Qubit 2.0 and Agilent 2100 systems, respectively. An Agilent 2100 RNA Nano 6000 System (Agilent Technologies, CA, USA) was used to evaluate RNA integrity and a Q-PCR method was used to quantify the effective concentration of the library. High-quality clean reads were obtained by high-throughput sequencing with Illumina HiSeq 4000. Next, Hisat (version 2.2.1, https:// daehwankimlab.github.io/hisat2/) was used to align sequences obtained here against the reference genome (Sun et al. 2017a) and map them to genome site locations. Fragments Per Kilobase of transcript per Million (FPKM) mapped reads method was used to eliminate biases due to variable gene lengths and sequence coverage effects on final calculated gene expression level values. Calculated gene expression levels were used to compare gene expression differences among different samples.

Differential expression analysis
DESeq2 (Leng et al. 2013) was used to screen for DEGs with significant expression changes based on fold-change values and q values (Padj value, corrected P-value) with |log2 Fold change| ≥ 1 and q < 0.05 were selected as thresholds for determinations of significant DEGs. Based on the set of DEGs identified for each comparison group, a volcano map of DEGs was drawn. R package was used to perform hierarchical clustering analysis of DEGs for the different samples. Meanwhile, the K-means algorithm was used to draw heat maps depicting differential expression levels for each comparison analysis group showing biological replicates. DEGs were enumerated for each comparison group then were annotated using NCBI (https://www.ncbi.nlm. nih.gov/), UniProt (https://www.uniprot.org/), GO (https:// www.geneontology.org/), and KEGG (https://www.genome. ad.jp/kegg/) to obtain detailed information for significant DEGs.

Functional analysis
GO terms corresponding to each DEG were obtained using Blast2GO (https://www.blast2go.com/). For second-level GO database terms, the number of DEGs (up-regulated and down-regulated) for each term was determined with their percentages were calculated and statistical results were obtained. According to the number of annotated DEGs assigned to each GO term, numbers of GO terms that were significantly enriched in DEGs groups were compared to enrichment of identified GO terms across the entire genome background. After correction of the calculated pvalue, GO items meeting the cutoff threshold of q < 0.05 were defined as GO terms that were significantly enriched in DEGs of a particular experimental comparison group. In this way, main biological functions of DEGs were subsequently determined as significant using GO enrichment analysis.
KEGG database was used to integrate DEGs, GENES/ SSDB/KO database was used to mine genomes for functional information to help decipher functions of predicted genes and proteins then a distribution map was made according to the enrichment degree (Q value) of a given KEGG item in the pathway. KEGG items meeting this condition were defined as significantly enriched in DEGs based on q < 0.05 as threshold.

Quantitative real-time PCR analysis
Quantitative real-time PCR (qRT-PCR) was performed to verify the expression of six selected differentially expressed genes in order to validate libraries used for RNA-seq. For RNA preparation, total RNA obtained from pumpkin leaf samples was prepared using a PrimeScript RT reagent Kit with gDNA Eraser (Perfect Real Time) (Takara, Otsu, Japan). Total RNA template (1 μg) was reverse transcribed to generate cDNA using instructions provided with the kit. Next, qRT-PCR was performed for biological samples in triplicate using an ABI PRISM 7500 Fast Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). Thermal cycling conditions were as follows: 1 cycle of initial denaturation at 95°C for 5 min followed by 35 three-step cycles of denaturation at 95°C for 15 s, annealing at 60°C for 15 s, and extension at 72°C for 15 s. Resulting qRT-PCR data were analyzed using the comparative 2 −ΔΔCt method (Livak and Schmittgen 2001). Means and standard deviations were calculated based on data from 3 independent biological replicates. Primers used in qRT-PCR for DEGs validation are shown in Table S4.

High-throughput RNA sequencing
Based on sampling time after inoculation, transcriptome sequencing was conducted in triplicate at five post-infection time points: 2 h (T1A, T1B, T1C), 6 h (T2A, T2B, T2C), 12 h (T3A, T3B, T3C), 24 h (T4A, T4B, T4C), and 96 h (T5A, T5B, T5C), with 0 h serving as control (CKA, CKB, CKC), for a total of 18 samples. However, the quality of four RNA samples (CKA, T3A, T4B, T5B) were of unacceptable and thus were not used. Ultimately we obtained 6.19 × 10 8 bp of raw reads from 14 samples sequenced in this study. After conducting quality control measures, numbers of clean reads obtained at each of the 6 time points ranged from 36.90-48.38 million and each library had a clean Q30 Bases Rate >92.32% (Table 1). Thus, these sequences should allow us to identify almost every gene involved in the plant response to Sc. Infection. After transcriptome sequencing using the Illumina sequencing platform, final contents of G versus C bases and A versus T bases were almost equal for each sequencing cycle as evidence that the entire sequencing process was stable and reliable ( Figure S1).

Differential expression analysis
Based on transcriptome data, we constructed digital gene expression (DGE) libraries of plant leaves infected with pathogenic fungi then mapped all clean reads to our transcriptome reference database. For each library, numbers of mapped reads ranged from 35.43-45.33 Mb, with mapped reads exhibiting reference genome alignment rates greater than 92.15% (Table S1). Due to the relatively comprehensive status of annotated information of genes across species, more than 85% of sequences for each sample could be aligned to exon regions ( Figure S2). Together, each library generated clean read numbers ranging from 5.54-7.11 billion sequences, with proportions of clean reads of total reads exceeded 96% for all libraries, which together indicate that transcriptional data were reliable (Table 1).
A total of 314 up-regulated genes were obtained for all five sampled post-inoculation time points (Figure 2), of which Gene11695, Gene5008, Gene15928, and Gene14079 were up-regulated significantly at all five time points. A total of 249 unigenes elevated expression after Sc. infection for 2 h, suggesting these genes were up-regulated at the very start of infection. Gene3360 (LOC111476496) was exhibiting the greatest expression level, which was an apyrase 2-like gene. Subsequently, 173 unigenes were up-regulated from 6 h to 96 h ( Figure 2). Notably, a total of 248 unigenes were down-regulated sometime after the 2-h time point, of which Gene5348, Gene11224, and Gene5716 were downregulated during all five sampled time points. Only 169 down-regulated genes were detected at the 2-h post-infection time point, while 171 unigenes were down-regulated after 2 h of infection (detected in 6-h to 96-h time points) (Figure 2).

Pathways of DEGs in pumpkin response to Sc. infection
To explore biochemical pathways associated with up-regulated DEGs identified during the five post-infection timepoints, pathway analysis was conducted with significance based on an E-value cutoff of <0.05. Interestingly, the results of KEGG pathway analysis detected 16, 36, 26, 18, and 26 significantly enriched pathways during the five post-infection time points of 2, 6, 12, 24, and 96 h, respectively (Table  S2). These up-regulated DEGs were found to be associated with several biological processes, including alpha-linolenic acid metabolism, pentose phosphate pathway, valine/leucine/isoleucine degradation, and Carbon fixation in photosynthetic organisms, with enrichment of all of these pathways observed at 2 h, 6 h, 12 h, 24 h and 96 h of infection. Thus, we speculate that these genes involved in these four pathways play important roles in pumpkin resistance to pathogen infection (Table S2). Common pathways were identified for DEGs from all five post-infection time points, while some DEGs acting within these pathways were also identified. Among the five time points, seven genes were involved in the pathway of alpha-Linolenic acid metabolism, two genes (Gene16420 and Gene30102) were involved in the Pentose phosphate pathway, nine genes were involved in the Valine, leucine and isoleucine degradation pathway, and three genes (Gene11367, Gene7505, and Gene7369) were involved in Carbon fixation in photosynthetic organisms (Table S2). Among them Gene6555 (AcaA1) was the only gene that was associated with α-linolenic acid metablism and Valine, leucine and isoleucine degradation pathways ( Figure S4). Eight pathways were significantly enriched at 6 h, 12 h, 24 h, and 96 h time points, including Alanine, aspartate and glutamate metabolism, Plant-pathogen interaction, Plant hormone signal transduction, MAPK signaling pathway-plant, Stilbenoid, diarylheptanoid and gingerol biosynthesis, Biosynthesis of amino acids, Phenylalanine, tyrosine and tryptophan biosynthesis, and Glycerolipid metabolism. At 6 h, 12 h, 24 h, and 96 h time points, we found eight up-regulated genes that were related to plant-conduction interaction, including Gene21473 (LOC111496628), Gene23057 (LOC111497667), Gene26916 (LOC111467281), Gene14342 (LOC111488147), Gene4395 (LOC111476413), Gene13960 (LOC111488479), Gene28857 (LOC111469333), and Gene25778 (LOC111465983) ( Table S2). In addition, two significantly enriched pathways, Protein processing in endoplasmic reticulum and Monoterpenoid biosynthesis, were found to be  enriched at 2 h post-infection, suggesting that these pathways play important roles during the early stage of infection. Meanwhile, eight up-regulated DEGs were linked to the MAPK signaling pathway-plant pathway at the other four time points, including Gene25585 (LOC111465470), Gene7418 (LOC111480444), Gene21262 (LOC111496162), Gene18962 (LOC111493622), Gene4055 (LOC111476613), Gene1086 (LOC111489244), Gene6486 (LOC111480113), Gene2411 (LOC111466615) ( Table S2).  (Table S3, Figure 3).
We found that most DEGs were concentrated within the 'biological process' category, with the next highest number of DEGs concentrated within the category of 'molecular function'. By contrast, the lowest number of enriched terms were in the 'cellular component' category ( Figure 3). Only the 'molecular function' category had entries that were commonly enriched at all five time points simultaneously, including GO:0016491 (oxidoreductase activity) and GO:0003824 (catalytic activity). In the 'biological process' category, at 6 h, 12 h, 24 h, and 96 h only GO:0006855 (drug transmembrane transport) was commonly enriched (Table S3).
Based on the GO and KEGG pathway enrichment analysis results, we speculate that five DEGs may play key roles during Sc. leaf infection for terms relating to transferase activity and the alanine, aspartate and glutamate metabolism pathway. Among them, Gene4395 (LOC111476413) and Gene25778 (LOC111465983) were enriched for GO terms calmodulin binding and transferase activity, while Gene798 (LOC111479964), Gene616 (LOC111479758), and Gene7505 (LOC111480790) were enriched for the GO term of transferase activity and KEGG term of alanine, aspartate and glutamate metabolism pathway, respectively.

Comparison of digital gene expression values with qRT-PCR analysis results
To evaluate the reliability of our RNA-seq and DGE analysis, three up-regulated genes (Gene30329, Gene20445, Gene33233) and three down-regulated genes (Gene5716, Gene10310, and Gene1275), which exhibited a wide range of expression levels and patterns during GSB infection of Sc., were selected for qRT-PCR analysis. Corresponding primers with two pairs that were designed are listed in Table S4. As shown in Figure 4, the qRT-PCR results agreed well with DGE profile-based results.

Discussion
The material selection of this study was designed based on the previous research results, which ensures the correct implementation of the research direction. In previous studies, we have determined that GSB of pumpkin crops was mainly caused by Sc. in Heilongjiang Province of China (Zhao et al. 2018a(Zhao et al. , 2018b. To ensure the accuracy of the experimental data, we performed the subsequent analyses only in two duplicate with well repeatability for each time point, attributed to the unsatisfactory extraction results for four samples (CKA, T3A, T4B, T5B), which may due to some irregularities in the experimental procedures, such as prolonged extraction and grinding of samples lead to degradation, improper control of freezing treatment effectiveness and so on. Therefore, we emphasize the importance of reproducibility in transcriptomic analysis for further data processing and analysis.
After pathogen infection, activities of many plant metabolic pathways were affected, with most alterations of activities found for genes within transcription networks (Bozkurt et al. 2010). In our study, we found a series of genes that were differentially expressed during fungal infection of pumpkin. Gene798 was shown to encode glyoxylate aminotransferase 2 (AGT2), a plant peroxisomal glyoxylate aminotransferase, which contain a putative type 1 peroxisomal targeting signal (PTS1) (Neugebauer et al. 2018). Gene616, was encoded an aspartate aminotransferase involved in amino acid metabolism, might interact with plant defence responses for increasing disease resistance (Park et al. 2010). Gene7505 was found to encode a gamma aminobutyrate transaminase and thus may be involved in Sc.-plant interactions. Overall, our results indicated that alterations of some unigenes were transient, with some unigenes only altered at one-time point during Sc. infection.
Whether up-regulated or down-regulated, the highest numbers of DEGs with the most significant fold changes of expression were found at 96 h (786 up-regulated genes, 1079 down-regulated genes) and the lowest numbers of DEGs with most significant fold changes were found at 2 h (249 up-regulated genes and 169 down-regulated genes, Figure 2). The most prominent up-regulated gene, Gene3360 (LOC111476496), was annotated as an apyraselike gene, which plays a very important role in controlling extracellular ATP concentration and regulating ATP signaling according to the reports in leguminous plants (Takahashi et al. 2006;Veerappa et al. 2018;Wang et al. 2018), wheat (Liu et al. 2019), and Arabidopsis thaliana (Corratgé-Faillie and Lacombe 2017; Tripathy et al. 2017). Moreover, stressinducible apyrases may have potential roles in the regulation of wheat environmental stress adaptations (Liu et al. 2019). ATPase activities of cell-wall-bound apyrases were enhanced by a glycoprotein elicitor and were inhibited in a speciesspecific manner by mucin-type glycopeptide suppressors secreted from a pea pathogenic fungus (Takahashi et al. 2006). The addition of the same or related apyrase inhibitors could potentiate the ability of different fungicides to inhibit the growth of different pathogenic fungi in plate growth assays (Tripathy et al. 2017). Thus, apyrases play important roles in plant interactions with microorganisms and in plant stress resistance.
Differentially expressed genes play an important role in the plant-pathogen interaction pathway. Eight up-regulated genes were commonly expressed at the post-infection time points of 6 h, 12 h, 24 h and 96 h, which may related to plant-pathogen interactions with possible involvement in plant responses to Sc. infection. Two of these genes, gene4395 and gene25778, encoded calcium-dependent protein kinases (CDPKs), which can regulate calciummediated effects of hormone responses, participate in regulation of plant responses to a variety of stresses, induce differential expression of other CDPKs, and appear to play important roles in plant disease resistance responses Wei et al. 2016;Shuaifeng et al. 2013).
Here, three calmodulin (CaM) encoded genes, including gene21473, gene 26916, and gene28857, were expressed significantly increased during Sc. infection. And gene23057 was encoded a putative serine/threonine protein kinase (resembling PBS1), which serves as a 'decoy' by activating resistance to pseudomonas syringae (Sun et al. 2017b). These results provide insights into the mechanism whereby these genes participate in host responses to Sc. infection and in plant innate immunity in general.
The plant hormone ethylene regulates a wide range of developmental processes and responses of plants to stress and pathogens. Ethylene signal transduction begins with ethylene binding to a family of ethylene receptors and terminates in a transcription cascade involving EIN3/EIL plantspecific transcription factors (Alonso et al. 2003). Ethylene insensitive 3 (EIN3) belongs to a family of regulatory proteins of transcriptional activators that are post-transcriptionally regulated, with ethylene binding to and stabilizing the  EIN3 structure. MAPK signal transduction pathways are involved in almost all aspects of plant growth and development, as well as plant responses to various environmental stimuli, including pathogen invasion (Martins et al. 2020).
Here we found six up-regulated genes (Gene2411, Gene21262, Gene25585, Gene7418, Gene18962 and Gene2411) encoding EIN3-binding F-box protein that were associated with the KEGG pathway MAPK signaling pathway-plant at post-infection time points of 6, 12, 24, and 96 h. These abovementioned genes, acting as signaling components of ethylene signal transduction in MAPK pathway, may be an essential regulatory mechanism regulates responses of plants to stress and pathogens.

Conclusions
RNA-seq technology was used to conduct transcriptome sequencing analysis on pumpkin leaves at different time points after Sc. infection. A total of 28,570 DEGs were obtained, which including 2351, 4892, 5285, 4698, and 4213 DEGs for 2, 6, 12, 24, and 96 h treatments, respectively. 314 up genes and 248 down genes were obtained for all five sampled post-inoculation time points. DEGs involved in alpha-Linolenic acid metabolism, Pentose phosphate pathway, Valine / leucine / isoleucine degradation, and Carbon fixation in photosynthetic organisms were found to be associated with pumpkin responses to Sc. infection. These results will help enhance understanding of plant genes involved in pumpkin-Sc. interactions and will provide a foundation for future functional genomics-based studies on cucurbit resistance to GSB.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The authors gratefully acknowledge the financial support from Natural  , 2019JCQN003, HNK2019CX08-05, 202007, HNK2019CX14]. The funders had no role in study design, the collection, analysis, interpretation of data, writing of the manuscript, the preparation or decision of the manuscript to publication.

Notes on contributors
Qian Zhao, PhD, mainly engaged in plant disease control technology and mechanism research. Zhenping Gong, PhD, Professor, mainly engaged in crop cultivation research.
Shukun Jiang, PhD, Professor, mainly engaged in crop genetics and breeding research.
Zhugang Li, PhD, Professor, mainly engaged in crop genetics and breeding research.
Liyan Zhang, PhD, mainly engaged in plant genetic pathology and breeding research.
Lizhi Wang, Associate Researcher, mainly engaged in crop genetics and breeding research.
Xianli Yang, PhD, mainly engaged in crop genetics and breeding research.
Mingxian Li, mainly engaged in crop genetics and breeding research.
Zhongjie Li, mainly engaged in crop genetics and breeding research.
Liyong Chi, mainly engaged in crop genetics and breeding research.
Rui Li, mainly engaged in crop genetics and breeding research.
Chao Yan, PhD, mainly engaged in crop genetics and cultivation research.
Yongcai Lai, PhD, Professor, mainly focused on crop genetics and breeding research.
Jianzhong Wu, PhD, Associated Professor, mainly focused on crop genetic improvement and innovative research.