Prognostic significance of ferroptosis-related genes and their methylation in AML

ABSTRACT Background Ferroptosis involves in the development and therapeutic response of various types of tumors. This study aims to explore ferroptosis-related prognostic genes that could further accurately stratify AML patients. Methods We investigated the prognosis significance of ferroptosis-related genes in AML by Univariate and multivariate Cox proportional hazards regression analyses. With the methylation data of TCGA samples, we looked for methylation sites associated with prognostic genes and compared the correlation between methylation and mRNA expression. R software and ‘edgeR’ packages were used to identify the DEGs between the high-and-low-risk groups divided by the FRPGs prognosis model and then run GO enrichment, KEGG pathway, and PPI network. Results We found a prognostic risk model that included AKR1C2 and SOCS1 predicted outcomes in AML patients. Methylation analysis showed that AKR1C2 and SOCS1 are negatively regulated by their methylation, leading to their low expression in AML patients. Besides, both decreased SOCS1 expression and hypermethylation predicted favorable OS and PFS in AML patients. Finally, this prognostic risk model exhibited a close correlation with several clinical features, especially with age (P=0.005), cytogenetic type (P=0.031), risk_cytogenetic (P=0.001), and risk_molecular (P<0.001). Functional enrichment analysis showed that DEGs are most enriched in the regulation of cell death and the PI3K-Akt signaling pathway. Conclusion AKR1C2 and SOCS1 are promising biomarkers for predicting prognosis in patients with AML.


Introduction
Acute myeloid leukemia (AML) is a group of molecularly heterogeneous hematologic malignancies. Its initiation mainly attributes to changes in pivotal cellular regulatory factors, such as transcription and epigenetic factors [1]. Currently, cytogenetic outcomes and molecular abnormalities at diagnosis consider being the most important prognostic factors for AML and highly predictive of complete response rate(CR), disease-free survival(DFS), overall survival(OS), and recurrence risk [2,3].
Current clinical guidelines classify AML into three groups (favorable, intermediate, and adverse risk) according to cytogenetic and molecular risks of AML patients. The intermediate-risk group is the most subgroups, accounting for approximately 45% of adult AML patients. AML patients with normal karyotypes (NK-AML) and patients without adverse or favorable molecular abnormalities are in this subgroup [2]. The ideal therapy strategy for these patients endures unclear, and treatment outcomes are heterogeneous. Several studies have found molecular alterations to further refine risk stratification (such as KIT, AF1q, MN1, ERG, MLL, and BAALC) in this subgroup of patients [4][5][6]. It is necessary to further perfect the risk stratification of AML patients and provide more accurate and appropriate therapy for the patients with intermediate-risk.
Epigenetic modification abnormalities appear in a large proportion of AML patients [7]. As a critical regulator of gene expression, DNA methylation study may also help to the more precise grouping of AML patients. DNA methylation is a key epigenetic mechanism involved in transcriptional regulation and cell development and function [8]. Hypomethylation rules in introns and repeats elements, while hypermethylation usually happens in so-called CpG islands and is correlated with gene promoter regions. Numerous studies have established a link between abnormal DNA methylation and gene silencing in disorders [9][10][11]. These suggest the potential importance of DNA methylation as a prognostic and risk marker in AML.
Ferroptosis is an iron-dependent form of regulatory cell death caused by excessive lipid peroxidation, associates with the development and therapeutic response of various types of tumors [12,13]. In this study, we first selected the ferroptosis-related genes (FRGs) of AML in the TCGA database, explored the ferroptosis-related prognostic genes (FRPGs) and their DNA methylation in AML. Then, we validated the prognostic-related genes in patients at intermediate risk classified by the NCCN AML classification according to their molecular and cytogenetic risk status. Finally, according to the prognostic model, patients were divided into low-and high-risk groups. We calculated the differentially expressed genes (DEGs) between the two groups and examined the biological processes of DEGs through gene enrichment analysis to study the functional mechanism of FRPGs in AML.

Acquisition of ferroptosis-related genes in AML
First, we downloaded the clinical data, transcriptional data, along with methylation profiles of patients with AML from the TCGA database via the Cbioportal website (https://www.cbioportal.org/study/summary? id=laml_tcga_pub). The inclusion criteria were patients with complete transcriptional, methylation, and clinical data. 154 AML samples were retrieved for subsequent analysis. Then, we downloaded the FRGs data from the FerrDb website (http://www.zhounan.org/ferrdb/ operations/download.html). FerrDb is the database of ferroptosis regulators and markers and ferroptosisdisease associations [14]. The genes in AML of the TCGA database were intersected with FRGs to obtain FRGs in AML.

Construction of a prognostic signature
We performed the univariate Cox regression analysis of FRGs in AML by using the 'survival' package in R software to obtain the FRPGs (P<0.05). Then, we searched an online website for gene expression profiling interactive analysis (GEPIA) to investigate the DEGs of the FRPGs in AML samples and healthy control samples. Moreover, the DEGs of prognostic significance were further put into multivariate Cox regression analysis to obtain the FRPGs for constructing the prognosis model and generated the regression coefficients of these crucial genes. To reflect the effect of the prognostic model on the survival status of AML patients, we calculated the risk score of each patient by multiplying the normalized gene expression of each FRPG with its corresponding regression coefficient and divided patients into low-and-high-risk groups according to the mean risk score. We further conducted the Kaplan-Meier curve analysis to estimate the relationship between the risk score, overall survival, and disease-free survival. We also performed time-dependent receiver operating characteristic (ROC) curve analysis to evaluate the 3-year and 5year predictive value of the os status and calculated the area under the curve (AUC) of the ROC curve to examine the classifier performance. Besides, we analyzed the relationship between the different clinical parameters and the risk score based on FRPGs. To explore whether these prognostic genes can use as supplements to the current AML molecular risk stratification, we further applied the prognosis model in patients at intermediate risk classified by the NCCN AML classification according to their molecular and cytogenetic risk status.

DNA methylation analysis of prognostic genes
With the methylation data of TCGA samples, we looked for methylation sites associated with prognostic genes and compared the correlation between methylation and mRNA expression. MethSurv (https://biit.cs.ut.ee/ methsurv/) contains the clinical and methylation data  on AML in the TCGA dataset and provides online analysis [15], was used to draw the heat maps of methylation sites and to do survival analysis of methylation sites. We also investigated the detailed association of FRPGs and their methylation with a range of clinical features. Graph of methylation and gene expression, survival curves, and density figures of the methylation sites were plotted.

GO and KEGG functional enrichment analysis
In the part of construction of a prognostic signature, the TCGA cohorts were divided into high-and-low-risk groups by multivariate regression analysis. Then, the 'edgeR' package of R software was used to compare the gene expression profiles of patients in high-andlow-risk groups and the DEGs between these two groups were obtained. Then, analyzed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses based on the DEGs (P-value < 0.05, | log2FC| ≥ 1.5) between the high-risk and low-risk groups using the 'clusterProfiler' R package.

Protein-protein interaction network construction and module analysis
The protein-protein interaction (PPI) network of DEGs was constructed using the gene/protein retrieval tool (https://string-db.org/), and the results were visualized by Cytoscape software (version 3.8.0). Cytoscape MCODE plug-in provided access to select the significant modular of PPI network [16,17]. The patterns default parameters are as follows: degree cutoff=3, node score cutoff=0.2, k-core=2, and max. depth=100. For genes in the hub modules, we use ClueGO + CluePedia for functional enrichment analysis.

Statistical analysis
The OS and DFS between different groupswere compared by Kaplan-Meier analysis with the Nonparametric test. The student's t-testwas used to compare gene expression between high-and-low groups. Pearson correlation and Spearman's rho wereused to analyzing the correlation between methylation and mRNA expression. A chi-square test wasused to investigate the detailed association of prognostic-related genes, AKR1C2 expression, SOCS1 expression, and their methylation with a set of clinical features. All statistical analyses were performed with R software and SPSS software, and statistical significance was set at probability values of p < 0.05.

Ferroptosis-related gene extraction
In this study, we used three FRGs data sets provided by the FerrDb database, including driver genes that promote ferroptosis, suppressor genes that prevent ferroptosis, and marker genes that indicate the occurrence of ferroptosis. Among them, four genes were listed as drivers and inhibitors simultaneously and were excluded from our study. Excluding the multiannotated genes, 130 overlapping ferroptosis-related genes from the TCGA dataset and FRGs were selected for further analysis. The Wien diagram isshown in Figure 1.

Construction of a prognostic signature
Populations of 154 patients from the TCGA dataset were identified and included in this study. The univariate Cox proportional hazards regression analysis showed that 20 genes had significant prognostic relevance (p < 0.05) ( Table 1). GEPIA identified 6 genes that were differentially expressed between AML samples and normal samples (Figure 2A-2F). Multivariate Cox proportional hazards regression analysis was further performed on the 6 genes, which screened AKR1C2 and SOCS1. The risk score for predicting overall survival was calculated as follows: Risk . Survival analysis showed that the patients in the high-risk group had worse overall survival than those in low-risk group in TCGA cohort. C. DFS analysis showed that the patients in the high-risk group had shorter DFS than those in low-risk group in TCGA cohort D. ROC analysis was performed to calculate the most optimal cutoff value to divide the AML patients into high risk and low risk group.
score=0.160* AKR1C2 + 0.383* SOCS1 ( Figure 3A). According to the risk score, patients were divided into low-and high-risk groups. Survival analysis showed that low-risk patients had longer OS and DFS than high-risk patients (median OS: 33.56 months vs. 14.04 months, p<0.0001; median DFS:26.79 months vs. 9.55 months, p<0.0001) ( Figure 3B、3C). The AUC of 3-year and 5-year os survival time roc curve analysis were 0.73 and 0.76, respectively ( Figure 3D). We then used age as a control variable to compare the risk score based on the 2 FRPGs with different clinical factors, such as BM blast percentage, PB blast percentage, cytogenetic risk, molecular risk, WBC count, and sex on survival. The results showed that risk scores based on 2 genes, cytogenetic stratification, and molecular stratification were more common than age factors and increased the occurrence of survival. The occurrence of BM blast percentage, PB blast percentage, WBC count, and sex were lower than that of age and did not affect the occurrence of survival ( Figure 4).  Figure  5C,5D).

DNA methylation analysis of FRPGs
We explored the methylation status of AKR1C2 and SOCS1 genes and found that AKR1C2 has 4 methylation sites and SOCS1 has 24 methylation sites. Figure 6A,6B clearly shows the distribution of CpG sites of each gene. Then Pearson correlation and Spearman's rho were used to analyzing the correlation between methylation and mRNA expression. We observed that two methylation sites were negatively correlated with gene expression. The relationship between methylation and mRNA expression is shown in Figure 7A We used the chi-square test to investigate the detailed association of FRPGs, AKR1C2, SOCS1 expression, and their methylation with a range of clinical features. As shown in Table 2, the FRPGs were correlated with age (P=0.005), cytogenetic type (P=0.031), risk_cytogenetic (P=0.001), and risk_molecular (P<0.001). The expression of SOCS1 was correlated

GO and KEGG functional enrichment analysis
240 DEGs were found between high-riskand low-risk groups. The volcanic map of DEGs is shown in Figure 9A. The results of functional and pathway enrichment analysis are shown in Figure 9B, 9C. Gene Ontology (GO) analysis of DEGs showed that regulation of cell population proliferation, cell death, protein phosphorylation, cell migration, NMDA glutamate receptor clustering, and engulfment of the apoptotic cell, were mostly enriched. Among the functional and pathway enrichment analysis, ECM-receptor interaction, protein digestion, and absorption, PI3K-Akt signaling pathway, enzymelinked receptor protein signaling pathway, transmembrane receptor protein tyrosine kinase signaling pathway, cGMP-mediated signaling, positive regulation of peroxisome proliferator-activated receptor signaling pathway, and cAMP-mediated signaling pathway were enriched most.

PPI network and analysis on clusters
STRING mapped 240 DEGs into PPI network containing 154 nodes and 287 edges ( Figure 10A). Then MCODE was used to discover clusters in the network. 3 clusters were selected for GO term and KEGG pathway enrichment analysis. Among them, cluster 1 contained 8 nodes and 28 edges, with the highest score ( Figure 10B), cluster 2 contained 6 nodes and 14 edges ( Figure 10C), cluster3 contained 5 nodes and 10 edges ( Figure 10D). We then performed a functional analysis for the 3 important clusters using ClueGO. The results showed that chloride channel complex, negative regulation of chemotaxis, and extracellular matrix structural constituent conferring tensile strength were most enriched ( Figure 10E).

Discussion
In this study, we screened out the FRGs in AML based on TCGA and FerrDb datasets. We found the expression of AKR1C2 and SOCS1 were significantly different in AML samples and healthy control samples and were negatively related to the OS and DFS of the AML patients. We used age as a control variable to compare the risk score based on these 2 genes with different clinical factors and found that as the same as cytogenetic stratification and molecular stratification, risk based on these 2 genes was more common than age factors and increased the occurrence of survival. Moreover, we also verified the prognostic role of AKR1C2 and SOCS1 expression in patients with intermediate molecular and cytogenetic risk status in the NCCN AML classification. The results all emphasized the promising prognostic value of AKR1C2 and SOCS1 expression in AML patients. We compared the correlation between methylation and mRNA expression of AKR1C2 and SOCS1 genes and identified methylation sites that were negatively correlated with mRNA expression of these two genes. We also explored the prognostic role of methylation sites.
Aldo-keto reductases family 1 member C2 (AKR1C2), a member of the aldo/keto reductase superfamily. Previous studies have shown that its expression is selectively deleted in prostate and breast cancer and leads to clonal expansion of tumor cells. In addition, some studies have shown that its expression is negatively correlated with apoptosis-inducing factor (AIF), and is involved in the metastasis of liver cancer and the occurrence of NSCLC drug resistance [18][19][20]. Suppressor of cytokine signaling 1 (SOCS1) is identified as a tumor suppressor gene. Silencing of SOCS1 expression due to promoter methylation has been found in several malignancies such as pancreatic cancer, breast cancer, lymphoma, and leukemia [21][22][23][24][25]. There is a study mentioned that SOCS1 methylation leads to SOCS1 gene silencing, which promotes AML cell growth and proliferation by inhibiting the downstream JAK2/STAT signaling pathway [26]. However, in our study, low expression of SOCS1 suggested a better prognosis. The specific role of SOCS1 methylation in AML still needs to be further investigated. Besides, no studies have been conducted on the prognostic effect of AKR1C2 and SOCS1 on AML. In this study, we systematically investigated the relationship between AKR1C2 and SOCS1 expression and survival using the TCGA database and found that low AKR1C2 and SOCS1 expression was highly correlated with more favorable OS and DFS in AML patients.  Increasing evidence has proved that abnormal DNA methylation plays a crucial role in the occurrence and development of cancers [27,28]. In our analysis, we first determined whether the methylation status of AKR1C2 and SOCS1 would affect their mRNA expression through Pearson correlation and Spearman's rho. There was a significant negative correlation between cg18841653 and cg03014241 CpG sites methylation and ARK1C2 and SOCS1 mRNA expression in AML. This negative correlation can well explain the low expression of AKR1C2 and SOCS1 in AML. We then investigated the prognostic significance of two selected CpG sites in AKR1C2 and SOCS1 DNA methylation and found that SOCS1 hypermethylation was strongly associated with better OS and PFS in AML patients. In summary, AKR1C2 and SOCS1 are negatively regulated by their methylation, and SOCS1 methylation status may be a strong indicator of good OS and PFS.
Ferroptosis is a new type of iron-dependent programed cell death. Its main mechanism is catalyzing the highly expressed unsaturated fatty acids on the cell membrane under the action of ferric divalent or esteroxylase, resulting in lipid peroxidation, thus inducing cell death [12]. Ferroptosis involves in a variety of diseases, including inhibition of tumor cell growth [29][30][31]. In our study, we divided patients into low and high-risk groups based on the expression of FRPGs and identified DEGs of the two groups. Functional enrichment analysis showed that DEGs enriched in regulation of cell death, unsaturated fatty acid metabolic process, PI3K-Akt signaling pathway, ECM-receptor interaction, regulation of cAMP-mediated signaling, and cGMP-mediated signaling pathway. Significantly, the metabolism of unsaturated fatty acids is associated with ferroptosis. PI3 K signaling pathway and ECMreceptor interaction are associated with cell differentiation, proliferation, and apoptosis [32,33]. The cAMP and cGMP-mediated signaling pathways are involved in regulating a variety of cellular physiological processes. However, although we have explored the biological processes involved in FRPGs in AML through enrichment analysis, how the expression of FRPGs and their methylation affect the prognosis of AML still needs to be further explored in biomedical experiments. Nevertheless, the role of FRPGs in the stratification of prognosis in AML is now clear. The more precise prognostic stratification, especially in patients with intermediate-risk is encouraging, and FRPGs are prognosis biomarkers worthy of further exploration.

Availability of data and materials
The clinical data, transcriptional data, along with methylation profiles of patients with AML are available in the TCGA database via the Cbioportal website (https://www.cbioportal.org/study/summary?id=laml_ tcga_pub). The FRGs data are available in the FerrDb website (http://www.zhounan.org/ferrdb/index.html).