Transcriptome profiling revealed heat stress-responsive genes in Arabidopsis through integrated bioinformatics analysis

ABSTRACT Heat stress is an environmental challenge that reduces plant productivity and growth. Plants have developed corresponding mechanisms to survive this adverse environmental stress. To demonstrate the mechanisms of how plants adapt to the environmental challenge, the heat response experiments involving Arabidopsis thaliana were retrieved from the GEO database. After quantile normalization of the GEO raw data, the differentially expressed genes (DEGs) in response to heat stress were identified by robust rank aggregation (RRA) algorithm, including 384 up-regulated and 302 down-regulated genes. Then, systematic bioinformatics analyses disclosed that the up-regulated DEGs were mainly related to protein refolding, abscisic acid catabolic process, potassium ion import, response to hydrogen peroxide, cytochrome complex assembly, and apoptotic process, and the down-regulated DEGs were involved in microsporocyte differentiation, syncytium formation, adventitious development, glutathione metabolism, and glycine metabolic process. The up-regulated DEGs also had more complicated PPI interactions than the down-regulated DEGs, and potential core genes of heat stress tolerance were provided. Furthermore, the mRNA expression of core genes Hsp70-2 (At5g02490), Hsp70-3 (At3g09440), and Hsp70-4 (At3g12580) has been measured to validate the outcome of integrated bioinformatics analysis. Our work will extend our understanding of heat-responsive mechanisms and these DEGs might be potential markers for plant heat stress-resistance studies.


Introduction
Heat is an extreme climate condition, which threatens plant growth and agricultural production. Plants have evolved various systems to respond to environmental stresses. To better understand how plants adapt to heat stress and increase crop yield, many studies have been performed to disclose the differentially expressed genes (DEGs) in response to heat stress in Arabidopsis (Gong et al. 2020).
DREB2A is an important transcriptional activator regulating heat-responsive genes. DREB2A can be activated by dephosphorylation of NRD (negative regulatory domain) and improve thermotolerance of Arabidopsis (Mizoi et al. 2019). Heat stress transcription factor A-6b (HSFA6b) plays an essential role in thermotolerance and can be induced by proteins responsive to ABA signaling through promotor activation. HSFA6b can bind to the promoter of dehydration-responsive protein and increase its expression (Huang et al. 2016). As a result, the activated ABA-signaling pathway enhances the plant thermotolerance. The ABA receptors RCAR12 and RCAR13 also play vital roles in increasing plant adaptability to extreme temperature. Under heat stress, overexpression of RCAR12 or RCAR13 in Arabidopsis can greatly increase the rates of germination and survival. RCAR12 or RCAR13 also can induce the expressions of heat-responsive proteins HSP18.2 and HSP70 and improve plant thermotolerance . WRKY25, WRKY26, and WRKY33 are other genes involved in heat tolerance and participate in ethylene and HSP-related signaling pathways. Knock-down of WRKY25, WRKY26, and WRKY33 significantly decreased the tolerance of Arabidopsis to high temperature, whereas overexpressing these genes greatly enhanced thermotolerance (Li et al. 2011). Additionally, WRKY39 and LIHSFA1 are both heat-responsive transcription factors. WRKY39 positively regulates the expression of PR1 and MBF1c, and participated in the salicylic acid (SA)-and jasmonate (JA)activated signaling pathways (Li et al. 2010), whereas overexpression of LIHSFA1 improved the expression of HSP proteins and enhanced plant thermotolerance in Arabidopsis (Gong et al. 2014).
In this work, to present a comprehensive overview of heatresponsive DEGs in Arabidopsis, we retrieved microarray data from the gene expression omnibus database (GEO). Based on the robust rank aggregation (RRA) algorithm, 384 up-regulated and 302 down-regulated DEGs were screened out from about 30,000 Arabidopsis genes. Then, the molecular function, KEGG pathways, signaling pathways, protein-protein interaction (PPI) interaction networks, and potential hub DEGs were analyzed by an integrated bioinformatics strategy. Our findings facilitated the demonstration of the mechanisms of plant heat-stress response and might provide potential targets for future research in plant thermotolerance.

Retrieval of heat-responsive genes in Arabidopsis from GEO database
The microarray datasets of the expression profile of heat stressrelated genes in Arabidopsis were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) (Table 1).

Data normalization and integration
The microarray data were downloaded from GEO database (https://www.ncbi.nlm.nih.gov/geo/). The file is in TXT format. The dataset details were shown in Table 1. The R software package was applied to process the matrix files and to filter the low-quality data. The resultant data was log2 transformed and processed with the limma package (http://www.bioconductor. org/) to retain the DEGs with the criterion of p-value < 0.05 and |log2fold change (FC)|> 1. Furthermore, the DEGs identified above were integrated with the robust rank aggregation software package (RRA) (https://cran.r-project.org/web/ packages/RobustRankAggreg/index.html). Using a null hypothesis of uncorrelated inputs, the RRA algorithm defines genes ranked consistently better than expected and then assigns a significance score to each gene. The hypothesis of the RRA method is that each gene is in each experiment with random order. If a gene is highly ranked in all experiments, its p-value is smaller (Niu et al. 2019).

Cluster analysis of heat-responsive DEGs in Arabidopsis
The resultant heat-responsive DEGs were clustered by the software Gene Cluster 3.0 using the C Clustering Library version 1.24 created by Michael de Hoon, Seiya Imoto, and Satoru Miyano, (http://bonsai.hgc.jp/~mdehoon/software/ cluster/software.htm#ctv) and the results were plotted using the software Java TreeView created by Alok (Version 1.0.4; http://jtreeview.sourceforge.net).

Gene ontology classification and KEGG Pathway analysis
The online bioinformatics tools PANTHER (Protein Analysis Through Evolutionary Relationships) classification system (Version 15.0, released 2019_04) (http://pantherdb. org/) and DAVID (Database for annotation, visualization, and integrated discovery, Version 6.8) (https://david. ncifcrf.gov/) were applied for gene ontology enrichment analysis of the DEGs. Each gene was classified into one category.

Protein-protein interaction (PPI) networks
The analysis of PPI interaction networks was performed by the online STRING (search tool for recurring instances of neighboring genes) database (Version 11.0, released 19 January 2019) (http://string-db.org/). Subsequently, the MCC app in Cytoscape software (https://cytoscape.org/) was used to screen modules within the PPI network with the default parameters.

Growth conditions and harvest of Arabidopsis
In accordance with previous publications (Guo et al. 2020b), the ecotype Col-0 Arabidopsis thaliana seeds were germinated on the normal medium plate with 22/20°C day/ night, 8/16 h light/dark cycle, 60 μmol·m −2 s −1 light intensity. After seven days, the A. thaliana seedlings underwent heat stress by exposure to a temperature of 37°C for 3 h. Then the Arabidopsis roots and leaves were harvested and stored at -80°C.

qRT-PCR analysis
In accordance with our previous method of RNA isolation and qRT-PCR analysis (Guo et al. 2020b), the relative expression levels of the Arabidopsis genes Hsp70-2 (At5g02490), Hsp70-3 (At3g09440), and Hsp70-4 (At3g12580) were determined and the Arabidopsis gene actin 2 (AT3G18780) was used as an endogenous reference in Arabidopsis roots and leaves ( Table 2). All experiments were performed in triplicate.

Statistical analysis
All data were reported as mean ± standard deviation (SD), and statistically analyzed using the unpaired t-test by Graph-Pad Prism 7 (GraphPad Prism, La Jolla, CA). A p-value less than 0.05 was considered statistically significant. Gene names Forward primers (5 ′ to 3 ′ ) Reverse primers (5 ′ to 3 ′ )

Results
Microarray data of Arabidopsis retrieved from GEO and raw data normalization In the present work, eight heat-responsive microarray datasets were retrieved from the GEO database (Table 1). To integrate the experimental data from different labs and platforms, the data were normalized using quantile normalization in R (Supplementary Figure S1).

Quantification of heat stress-responsive DEGs in Arabidopsis
Then, these heat-responsive gene datasets were screened by the limma package with the criteria of p-value <0.05 and | log 2 FC| > 1 (Figure 1). Eight heat stress studies on Arabidopsis were included in this study. The DEGs identified in each study were summarized in Table 3. These heat-responsive DEGs were further clustered by the software Cluster 3.0, and visualized by the software Java TreeView (Figure 2).

Heat-responsive DEGs in Arabidopsis screened by RRA algorithm
After quantification of the eight heat stress-response experiments, the RRA method was used to integrate the above DEGs ( Figure 3). As a result, 384 up-regulated and 302 down-regulated genes were pooled out as the heat stressresponsive DEGs of Arabidopsis (Supplementary Table S1).

Bioinformatics enrichment of heat stress-responsive DEGs in Arabidopsis
Bioinformatics tools were further used to analyze the 384 upregulated and 302 down-regulated genes. Figure 4 showed GO classification of heat stress-responding up-regulated DEGs (Figure 4(A)) and down-regulated DEGs (Figure 4 (B)) in Arabidopsis. For biological process (BP), the top three terms of up-regulated DEGs were related to protein refolding (fold-enrichment: 23.9), response to high light intensity (fold-enrichment: 21.5), and abscisic acid catabolic process (fold-enrichment: 21.5) (Supplementary

KEGG pathway of heat-responsive DEGs in Arabidopsis
Based on the analysis of KEGG pathway, the top three processes of up-regulated DEGs were involved in protein processing in endoplasmic reticulum (fold-enrichment: 6.84), galactose metabolism (fold-enrichment: 4.19), and spliceosome (fold-enrichment: 3.60), whereas the top three processes of down-regulated DEGs were related to C5-Branched dibasic acid metabolism (fold-enrichment: 17.1), glucosinolate biosynthesis (fold-enrichment: 14.2), and one carbon pool by folate (fold-enrichment: 8.52) ( Figure 5). The details of KEGG pathway of heat-responsive DEGs in Arabidopsis can be found in Supplementary  Table S3.

PPI network of core heat-responsive genes in Arabidopsis
Protein interactions of the identified DEGs were investigated by the online software STRING, the 384 up-regulated heat stress-responsive DEGs were connected with 2097 edges (PPI enrichment p-value: <1.0E-16), and the 302 down-regulated heat stress-responsive DEGs were connected with 591 edges (PPI enrichment p-value: <1.0E-16) ( Figure 6) (Supplementary Table S4).

qRT-PCR analysis of core heat-responsive genes in Arabidopsis
Our results showed that the core heat-responsive Arabidopsis genes Hsp70-2 (At5g02490), Hsp70-3 (At3g09440), and Hsp70-4 (At3g12580) were all up-regulated under heat stress. Therefore, we validated the bioinformatics analysis results by assaying the relative mRNA expressions of Hsp70-2 (At5g02490), Hsp70-3 (At3g09440), and Hsp70-4 (At3g12580) in both Arabidopsis roots and leaves that had been exposed to heat stress. All three genes showed increased mRNA expression after heat shock in Arabidopsis roots and leaves (Figure 7). The results verified the confidence of bioinformatics analysis.

Discussion
High temperature negatively affects plant growth and crop yield. In the past decades, to explore the mechanisms of  GSE4062  2931  2707  GSE12619  723  743  GSE16222  1310  1196  GSE19603  1095  1148  GSE43937  2119  1909  GSE44053  1231  1081  GSE63128  3754  3386  GSE63372 3106 3027 heat tolerance in plants, Arabidopsis has been used as the ideal model plant (Zhang et al. 2018), and several heatresponsive DEGs have been identified in the previous studies (Gong et al. 2020). But more details are still needed for our further understanding of plant response to heat stress. Our previous work indicated that GEO-based identification of DEGs would facilitate the investigation of the comprehensive pathways (Guo et al. 2014), biological processes (Guo et al. 2019), and protein interaction networks (Guo et al. 2020a) under different stresses in Arabidopsis.
Additionally, our results also showed DEGs responding to heat stress also cross-talked with other DEGs associated with wounding stress, abscisic acid stress, salt stress, water deprivation stress, and cold stress, which could improve our understanding of heat-stress DEGs (Guo et al. 2020b).
In the present work, we retrieved more DEGs responding to heat stress from the GEO database. As a result, 384 up-regulated and 302 down-regulated genes were identified  by the RRA screening method. The up-regulated DEGs were mainly associated with protein refolding, abscisic acid catabolic process, response to hydrogen peroxide and apoptotic process, while the down-regulated DEGs were related to adventitious development, glutathione metabolism, and glycine metabolic process. Among the up-regulated DEGs, L-ascorbate peroxidase 2 (APX2) is a reactive oxygen species (ROS)-scavenging enzyme that was necessary to maintain the balance of ROS. Knockdown of APX2 in Arabidopsis would reduce the capacity of heat stress tolerance. In addition, the Zinc finger proteins ZAT10 and ZAT12, which belong to the ZAT family, were both up-regulated under heat stress. ZAT12 can interact with and positively regulate the expression of APX2. ZAT10 is also involved in the heat-stress response (Davletova et al. 2005;Vanderauwera et al. 2011). BZIP transcription factor 28 (BZIP28) is a membrane protein located in the endoplasmic reticulum (ER) membrane. Heat stress can induce its cleavage by protease, and the resultant cleaved-BZIP28 can transfer from ER to the nucleus for enhancing the transcription of heat-responsive genes (Gao et al. 2008;Sun et al. 2013).
Heat stress causes the accumulation of immature/ unfolded proteins, which is toxic and leads to apoptosis. To avoid this, plants have evolved corresponding strategies to maturate or degrade these unfolded proteins. Therefore, several kinds of heat stress transcription factors, chaperone proteins, and heat shock proteins play important roles in renaturation of immature proteins under high temperature (Baniwal et al. 2007;Kotak et al. 2007;Zhong et al. 2013). In the present work, 9 heat stress transcription factors, chaperone proteins, or heat shock proteins, 11 heat shock 70 kDa proteins, and 5 heat shock 90 kDa proteins were all up-regulated under heat stress, which all played vital roles in renaturation of denatured proteins (Supplementary Table S5). To validate the alterations of core heat-responsive Arabidopsis Hsp genes, qRT-PCR experiments were performed to assay the relative mRNA expressions of Hsp70-2 (At5g02490), Hsp70-3 (At3g09440), and Hsp70-4 (At3g12580) in Arabidopsis under heat stress. The results indicated all three genes in Arabidopsis roots and leaves all increased mRNA expression after heat shock, which were consistent with the bioinformatics analysis. Therefore, Hsp genes might play a more important role in responding to heat stress. Furthermore, the detailed investigations of the regulation of protein homeostasis during heat stress are still needed in the future.

Conclusion
This work shed light on the mechanisms of heat stress-tolerance in Arabidopsis and identified potential DEGs, which enhanced the understanding of how the plants adapt to and survive under heat stress. These findings might provide targets for research on heat-tolerance and increase the crop output in the future.