Transcriptome analysis of strawberry (Fragaria × ananasa) responsive to Colletotrichum gloeosporioides inoculation and mining of resistance genes

Abstract Strawberry has high nutritional and economic value and is an important horticultural crop widely cultivated worldwide. Strawberry anthracnose is the most harmful disease in strawberry production, which causes huge production losses every year. Mining disease resistance genes and breeding high-quality strawberry varieties are the fundamental way to solve the problem of strawberry anthracnose. In this study, transcriptome analysis and disease resistance-related genes mining were targeted on strawberry anthracnose. The main results are as follows: (1) A total of 100.45 Gb clean reads were obtained, with no less than 6.26 Gb for each sample; (2) The clean reads for each sample were aligned to the reference genome efficiently at 91.30% − 92.27%; (3) 93,042 genes were annotated; (4) 16 anti-anthracnose-related genes were identified, including FaAUX1, FaARF18, FaSAUR50, FaGH3.6, FaAHP1, FaARR (3, 5 and 11), FaPYR1, FaPYL11, FaPP2C (6, 16, 24, 37 and 51) and FaPR1. The above results can provide theoretical guidance for the molecular breeding of disease resistance of strawberry.


Introduction
Strawberry is a perennial herb in the family Rosaceae. It is loved for its color, strong flavor, high nutrition, good benefits and other advantages, and is known as the 'fruit queen' . Strawberries are distributed around the world, and the cultivation area and production rank first in all kinds of berries [1,2]. Strawberry anthracnose is the most serious disease in strawberry production. It is mainly caused by Colletotrichum gloeosporioides [3][4][5][6][7]. The most affected plant parts are the stolon, petiole, leaf, petals, calyx and fruit. The obvious feature after infection is that the leaves develop local spots and the whole plant wilts. This disease is spread worldwide. It occurs in major strawberry producing areas in China, which can cause annual output losses of 25%~30%, and reduce production by more than 50% in serious years. This seriously threatens the sustainable development of strawberry production [8].
At present, the prevention and control measures of strawberry anthracnose mainly include some chemical prevention and control methods. Pathogens are prone to drug resistance due to the long-term use of chemical pesticides. In addition, these methods can easily lead to strawberry pollution and threaten the health of consumers. To solve the above problems, mining disease resistance genes and cultivating high-quality disease resistance varieties have become the focus of anthracnose prevention and control research. Li [9] constructed a molecular genetic map containing 34 amplified fragment length polymorphism (AFLP) markers and 109 simple sequence repeat (SSR) markers by using 210 F2-generation population crosses between 'Hokowase' and 'Sweet Charlie' .
In addition, Li [9] also cloned 36 nucleotide-binding site and leucine-rich-repeat (nBS-LRR) genes and analyzed the expression of 16 nBS-LRR genes at three time points after strawberry anthracnose inoculation. quantitative real-time (qRT-PCR) confirmed that the expression of 16 genes was markedly altered, in which three genes were upregulated, and the rest were down-regulated following Colletotrichum gloeosporioides inoculation. Zhang et al. [10] found that the FaNBS20 gene may be related to resistance to anthracnose, and different resistance of strawberry varieties to strawberry anthracnose after SA treatment; and its resistance promotion effect can persist for more than 10 d. Zhang [11] first isolated the BAG6 gene in the forest and cultivated strawberry using homologous cloning techniques. Furthermore, he found that overexpression of the FvBAG6 gene enhances resistance to anthracnose in Arabidopsis.
In this study, we performed a transcriptome sequencing analysis of strawberry leaves inoculated with C. gloeosporioides at 0 d, 1 d, 3 d, 5 d and 7 d, from which we identified some related metabolic pathways and disease resistance regulatory genes. These can provide valuable data resources for the future application of plant disease resistance genes.

Strawberry growth and C. gloeosporioides inoculation
The plant material used in the experiment is 'Maria' strawberry. 'Maria' strawberries, also known as 'quartes 1′ strawberries, are a medium ripe strawberry variety in Spain [12]. The fruit is conical, bright red on the surface, shiny, with light yellow meat, fragrant, sweet, with good hardness and transportation resistance. The average weight of first-grade fruit is about 35 g, and the maximum single fruit weight is about 70 g. They are stored in the Institute of Rural Revitalization Science and Technology of Heilongjiang Academy of Agricultural Sciences (126°35′E, 45°40′n). The methods of strawberry growth and C. gloeosporioides inoculation were the same as previously described by Liang et al. [5]. Strawberries were cultivated in a plastic greenhouse under a photoperiod of 16/8 h (light/dark) and air temperatures of 25/18 °C (day/night). The strain used for the test was C. gloeosporioides (Penz.) Penz. & Sacc, which was purchased from Beijing Bena Biotechnology Co., Ltd. The pathogenic fungi used for the experiments were grown in Potato Dextrose (PD) shake-flask cultures for 7-10 days at 25 °C and then were filtered through four layers of gauze into a diluent and adjusted to 10 6 spore mL −1 using a microbial counting chamber. Then, the conidial suspension was transferred to a watering can, and the plant leaves were sprayed to cover their surfaces with a uniform layer of the suspension. Leaf tissues of strawberry were sampled at 0 d, 1 d, 3 d, 5 d and 7 d post inoculation. Plants of 0 d were negative controls with only sprayed diluent. Each treatment had 3 plants. All samples were frozen in liquid nitrogen and stored at −80 °C, which was used as materials for transcriptome sequencing and subsequent quantitative real-time PCR (qRT-PCR) verification.

RNA extraction, library preparation and sequencing
Total RnA from strawberry leaves was extracted by Beijing Bio Marker Co., Ltd. using an RnA prep plant kit (TIAnGEn, China). First-strand cDnA was synthesized using a Prime Script First-Strand cDnA Synthesis kit (TaKaRa, Japan). RnA concentration and purity were measured using nanoDrop 2,000 (Thermo Fisher, Scientific, Wilmington, DE). RnA integrity was assessed using the RnA nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, uSA). Total RnA was used to construct cDnA libraries using a nEBnext® ultra™ RnA Library Prep Kit for Illumina® (nEB, uSA) following the manufacturer's recommendations. The cDnA library was sequenced on an Illumina HiSeq TM 2000 platform, and paired-end reads in 240 bp length were yielded.

Data quality control
Raw data (raw reads) of fastq format were first processed through in-house Perl scripts. In this step, clean reads were obtained by removing reads containing adapter, reads containing ploy-n and low-quality reads from raw reads. At the same time, q20, q30, GC-content and sequence duplication level of the clean data were calculated.

Comparative analysis
The adaptor sequences and low-quality sequence reads were removed from the datasets. Raw sequences were transformed into clean reads after data processing. These clean reads were then mapped to the reference genome sequence (Fragaria_x_ananassa_Camarosa_ Genome_v1. 0. a1; GDR, https://www.rosaceae.org/ rosaceae). only reads with a perfect match or one mismatch were further analyzed and annotated based on the reference genome. Hisat2 tools software was used to map with reference genome.

Quantification of gene expression levels
quantification of gene expression levels Gene expression levels were estimated by fragments per kilobase of transcript per million fragments mapped. The formula is shown as follows:

Differential expression analysis
Differential expression analysis of two conditions/ groups was performed using the DESeq2 R package. DESeq2 provides statistical routines for determining differential expression in digital gene expression data using a model based on the negative binomial distribution. The resulting P values were adjusted using the Benjamini and Hochberg's approach for controlling the false discovery rate. Genes with an adjusted P-value ≤ 0.01 found by DESeq2 were assigned as differentially expressed.

GO enrichment analysis of DEGs
Go enrichment analysis of the DEGs was implemented by the Goseq R packages based Wallenius non-central hyper-geometric distribution [14].

KEGG pathway enrichment analysis of DEGs
We used KEGG [15] as a well-established database resource for analysis of high-level functions and utilities of the molecular dataset (http://www.genome.jp/ kegg/). We used KoBAS [20] software to test the statistical enrichment of DEGs in KEGG pathways.

qRT-PCR validation of the DEGs
To verify the accuracy and reliability of the gene expression levels obtained from this transcriptome sequencing, qRT-PCR verified the expression patterns

Quality analysis of sequencing data
In recent years, transcriptome sequencing has developed into one of the important techniques to study stress response, explore new genes and determine metabolic pathways [21]. In this study, extensive transcript information was obtained by sequencing strawberry leaf tissue at 0 d, 1 d, 3 d, 5 d and 7 d after inoculation with C. gloeosporioides using high-throughput transcriptome sequencing. Specifically, a total of 100.45 Gb clean reads were obtained, with no less than 6.26 Gb for each sample. The percentage of bases with quality values greater than or equal to 30 was not less than 92.70% (Supplemental Table S2).
These results indicate that the sequencing and assembly of RnA meets the required quality and can be used for later analysis.

Comparative analysis
The clean reads were aligned to the reference genome sequence using Hisat2; it was found that 91.30% to 92.27% of the data were able to be aligned to the reference genome (Supplemental Table S3). From 76.18% to 77.76% of the reads were aligned to a unique location of the reference genome. Between 14.00% and 15.77% of the reads were aligned to multiple locations in the reference genome. The reads aligned to the positive strand of the reference genome accounted for 40.00% to 40.97% of the total reads. The reads aligned to the negative chain of the reference genome accounted for 41.91% to 42.45%. The reads aligned to the specified reference gene progenitor are called Mapped Reads. In this study, 88.83% of the mapped reads were in the exonic region of the reference genome, 5.83% were in the intronic region and 5.34% in the intergenic regions ( Figure 1).

Quality analysis of database construction
To assess the adequacy of the data, saturation tests were performed on the transcriptome sequencing data.
The results showed that with the increasing number of sequencing, the number of genes detected at different expression levels tended to reach saturation ( Figure 2), indicating that the sequencing data covers the vast majority of genes in the reference annotation and can better support our subsequent analysis.  Table S4). There were 26,661 genes between 300 bp and 1,000 bp and 62,034 genes with lengths greater than 1,000 bp.

Screening of DEGs
The DEGs were selected by using the differential gene screening criteria Fold ≥ Change 2 and FDR ≤ 0.01. The overall distribution of gene expression levels and magnitude of differences in two samples can be quickly viewed by Volcano and MA Plots. In this study, it can be seen from the Volcano Plot ( Figure 3) and the MA Plot (Figure 4) that the expression levels of most genes tended to be consistent, and the expression levels of a few genes were significantly different between samples.

Statistics of the number of expressed genes
The DEGs were obtained by using the differential gene testing criteria. The results showed that compared with the control, the number  Table S5).

GO enrichment analysis of DEGs
After Among them, 20 subclasses were biological processes, including metabolic processes, cellular processes and single biological processes, etc; 15 subclasses were cellular composition, including cell, cell part and membrane, etc.; 14 subclasses were molecular functions, including catalytic activity and binding protein, etc.

KEGG pathway enrichment analysis of DEGs
Plants suffer from various biological and abiotic stresses throughout their growth and development, and plant endogenous hormones play an important role in plant biotic and abiotic stresses [22]. The KEGG pathway enrichment analysis of DEGs in this study showed that 873, 1,573, 978 and 949 genes were annotated and assigned to 124 KEGG pathways, 1 d, 3 d, 5 d and 7 d after inoculation, compared with the   control, respectively. Among them, the top 20 pathways with high enrichment significance include Photosynthesis-antenna proteins, Plant hormone signal transduction, Galactose metabolism, Fatty acid degradation, Cyanoamino acid metabolism, Glycosphingolipid biosynthesis-globo series, nitrogen metabolism, Flavonoid biosynthesis, Glutathione metabolism, Glycerolipid metabolism, Cutin, suberine and wax biosynthesis, Alpha-Linolenic acid metabolism, Tyrosine metabolism, Starch and sucrose metabolism, Photosynthesis, Phenylalanine, tyrosine and tryptophan biosynthesis, Brassinosteroid biosynthesis, Glycerophospholipid metabolism, Alanine, aspartate and glutamate metabolism, and Linoleic acid metabolism ( Figure 9). of these 20 pathways, the largest number of DEGs was enriched in the phytohormone signal transduction pathways, indicating that the phytohormones play a very important role in the anti-anthracnose process in strawberry.

Analysis of the DEGs in the plant hormone signal transduction pathway
In the auxin (IAA) signal transduction pathway, FaAUX1, FaARF18 and FaSAUR50 genes were downregulated compared with 0 d, with the lowest expression at 3 d. The FaGH3.6 gene showed a tendency to be upregulated first and then downregulated ( Figure 10). In the cytokinin (CTK) signaling transduction pathway, FaAHP1, FaARR3, FaARR5 and FaARR11 genes tended to be upregulated compared with 0 d, with the highest expression at 3 d ( Figure 10). In the abscisic acid (ABA) signal transduction pathway, FaPYR1, FaPYL11, FaPP2C6, FaPP2C16, FaPP2C24, FaPP2C37 and FaPP2C51 genes were upregulated compared with 0 d (Figure 10). In the salicylic acid (SA) signal transduction pathway, the FaPR1 gene was upregulated first and then downregulated compared with 0 d, with the highest expression at 1 d and the lowest expression at 7 d ( Figure 10). The results suggest that these genes may play an important role in the response of strawberry against C. gloeosporioides infection.

qRT-PCR validation of the DEGs
To verify the accuracy and reliability of the gene expression levels obtained from this transcriptome sequencing, the expression patterns of 16 important disease resistance-related genes after 0 d, 1 d, 3 d, 5 d and 7 d infection were confirmed using qRT-PCR. The results showed that the expression trend of these genes at five time points after C. gloeosporioides infection was consistent with transcriptome sequencing (R 2 =0.9376) (Figure 11), indicating that transcriptome sequencing was reliable.

Discussion
Plant pathogens cause billions of dollars in economic losses by reducing or eliminating crop yield or quality [23]. Plant endogenous hormones are activated and initiate defense signals, which is indispensable for plants to resist external stress [24]. The infection by pathogens can affect multiple signaling pathways of plants, and the signaling molecules that induce defense responses are mainly JA, SA, ABA, ET, IAA, GA, CTK and BR. These plant hormones are not only involved in the regulation of pathogen resistance, but also are important regulators in plant disease resistance signaling pathways. Plants activate the corresponding hormone signals to regulate the disease resistance genes and then establish the disease resistance pathway to resist the pathogen infection [24]. SA is an important immune signaling molecule in the plant defense response. After pathogen infection, the SA levels increase significantly, producing locally acquired resistance (LAR) and systematically acquired resistance (SAR) [25][26][27][28][29]. The PR genes of plants are able to respond to biotic and abiotic stresses [30][31][32] and are also often used as the indicator gene for the immune response of the plant hormone SA [22]. The analysis of DEGs in the Plant hormone signal transduction pathway revealed that the FaPR1 gene showed a trend of up-regulated expression before down-regulated expression during the process of C. gloeosporioides infecting strawberry leaves. of these, the highest expression was observed on 1 d and down-regulation on 7 d compared with 0 d. The plant produced a strong defense response in the early stage of the C. gloeosporioides infection, and the plant defense system was broken to some extent after 7 d of infection. The SA signaling pathway was involved in the strawberry anti-anthracnose response, where the FaPR1 gene was a key gene in the disease resistance response. In addition, the IAA, ABA and CTK have also been reported to be involved in regulating the plant resistance to pathogens [27]. IAA is the earliest discovered phytohormone found in humans, whose main role is to promote cell elongation and maintain apical advantage; it is involved in almost all plant growth and development processes [33]. Recent studies indicate that IAA promotes the ubiquitination and degradation of the AuX/IAA repressors by binding to the F-box protein TIR1 (Transport inhibitor response 1) of the ubiquitin ligase complex. The IAA response factor ARF gene family, due to loss of inhibition of AuX/IAA repressor expression, generate numerous ARF proteins and activate the IAA signaling pathway [34][35][36]. Analysis of the DEGs in the Plant hormone signal transduction pathway in this study revealed that the FaAUX1, FaARF18 and FaSAUR50 genes were down-regulated in the IAA signaling pathway, and that the FaGH3.6 genes were first up-regulated before being down-regulated. It indicates that genes in IAA signaling play a very important role in strawberry in response to C. gloeosporioides infection. Blocking IAA signaling enhances strawberry resistance to C. gloeosporioides, probably because IAA can induce the production of expansin (a protein that eases the cell wall), leading to the relaxation of the natural disease resistance barrier cell wall, thus making plants susceptible to infection. Moreover, pathogen infection causes unbalanced auxin levels and thus changes the expression of genes involved in the signaling pathway. The early IAA-response gene FaGH3.6 encodes an IAA-binding enzyme that binds excess IAA to amino acids to regulate IAA homeostasis to enhance strawberry resistance to C. gloeosporioides. CTK can promote cell division, assist in nutrient transport as well as extend leaf lifespan [37]. CTK is also involved in the disease resistance response in plants. Improving the content of tobacco endogenous CTK can induce the expression of the SA pathway PR gene and the production of extracellular chitinase, and enhance tobacco resistance to TMV [38]. In Arabidopsis, CTK is able to activate ARR2 transcription factor expression and enhance its resistance to P. syringae [39]. Analysis of the DEGs in the Plant hormone signal transduction pathway in this study revealed that the FaAHP1, FaARR3, FaARR5 and FaARR11 genes were up-regulated in the CTK signaling pathway. It was shown that infection of C. gloeosporioides activates the CTK signaling pathway to fight against pathogens. The ABA can be regulated after plants are stressed by adverse factors in the environment and downstream genes to help plants adapt to complex environments. Recent studies demonstrate the important role of ABA in the plant disease resistance response. Abscisic acid deficiency causes changes in cuticle permeability and pectin composition that influence tomato resistance to Botrytis cinerea [40]. Spray of exogenous ABA can inhibit the proliferation of Cochliobolus miyabeanus in the rice mesophyll [41]. Analysis of DEGs in the Plant hormone signal transduction in this study revealed that many key genes were up-reregulated in the ABA signaling pathway, including FaPYR1, FaPYL11, FaPP2C6, FaPP2C16, FaPP2C24, FaPP2C37 and FaPP2C51. This indicates that the ABA signaling pathway is involved in the strawberry anti-anthracnose response.

Conclusions
A large amount of transcript information was obtained by sequencing strawberry leaf tissue at 0 d, 1 d, 3 d, 5 d and 7 d after inoculation with C. gloeosporioides using high-throughput transcriptome sequencing. A total of 93,042 genes were annotated to the CoG, Go, KEGG, KoG, Swissprot, noG and nR databases, as determined by a sequence alignment with the specified reference genome. Analysis of DEGs revealed that the number of DEGs was 4,494, 8,034, 4,791 and 4,753 at 1 d, 3 d, 5 d and 7 d compared with the control, respectively. Go enrichment analysis of DEGs revealed that DEGs were mainly enriched in metabolic processes, cellular processes, single biological processes, catalytic activity and binding proteins, etc. KEGG pathway enrichment analysis of DEGs revealed that 873, 1,573, 978 and 949 genes were annotated and assigned to 124 KEGG pathways at 1 d, 3 d, 5 d and 7 d compared with the 0 d, respectively. Among the top 20 pathways with high enrichment significance, the largest number of genes were enriched in the Plant hormone signal transduction pathways. Analysis of the DEGs in the Plant hormone signal transduction pathway identified a total of 16 anti-anthracnose-related genes, including : FaPR1, FaAUX1, FaARF18, FaSAUR50,  FaGH3.6, FaAHP1, FaARR3, FaARR5, FaARR11, FaPYR1,  FaPYL11, FaPP2C6, FaPP2C16, FaPP2C24, FaPP2C37 and FaPP2C51.     Science and Technology Innovation in Heilongjiang Province (no. 2021YYYF032).