Somatic mutation-associated risk index based on lncRNA expression for predicting prognosis in acute myeloid leukemia

ABSTRACT Objectives: Genomic instability has several implications for acute myeloid leukemia (AML) prognosis. This article aims to construct a somatic mutation-associated risk index (SMRI) of genomic instability for AML to predict prognosis and explore the potential determinants of AML prognosis. Methods: We obtained differentially expressed lncRNAs from genomic instability subtypes and selected six lncRNAs to construct the SMRI through multivariate Cox regression analysis. The median SMRI classified patients into high and low SMRI groups. Kaplan–Meier survival analysis was used to clarify the prognostic differences of SMRI subtypes. Receiver operating characteristic curve analysis was performed to elucidate the value of SMRI as a prognostic indicator. Gene set variation analysis, tumor mutation burden (TMB) analysis, immune infiltration, and immune checkpoint expression analysis were performed to investigate possible causes for the differences in prognosis of SMRI subtypes. Results: The high SMRI group exhibited a poor prognosis, which was characterized by elevated levels of TMB, mutation counts (TP53, NPM1, DNMT3A, and FLT3-TKD), CD8+ T cell infiltration, and immune checkpoint (PD-1, PD-L2, CTLA4, LAG3) expression. The SMRI was still associated with prognosis, even after adjustment for age, sex, cytogenetic risk, DNMT3A status, FLT3 status, and NPM1 status. Gene set variation analysis showed that AML with FLT3-ITD mutation, CEBPA mutation, and LSCs (leukemia stem cells) were enriched in the high SMRI group. Conclusion: Our research suggests that the SMRI derived from genomic instability subtypes is a useful biomarker for predicting prognosis and may be beneficial for improving the clinical outcome of patients with AML.


Introduction
Acute myeloid leukemia (AML) is the most common leukemia in adults and the deadliest of all leukemias [1]. In recent years, understanding of the pathobiology and genetic basis of AML has improved, leading to substantial progress in its treatment [2]. However, primary and secondary drug resistance remains a problem. AML is a heterogeneous disease that comprises a variety of cytogenetic and molecular abnormalities, which can affect clinical outcomes and may offer potential therapeutic targets.
Genomic instability refers to high-frequency mutations within the genome of a cell lineage and is a common phenomenon in almost all cancers, including AML [3,4]. Large-scale genomic analyses have identified many types of genetic aberrations in AML patients [5]. Importantly, genomic instability also has prominent value in cancer prognosis [6][7][8]. Some studies showed that patients with stable genomes had significantly higher survival rates than patients with unstable genomes [9,10]. Studies have found that long noncoding RNA (lncRNA), a type of endogenous RNA longer than 200 nt, is associated with recurrent mutations in AML and can thereby help predict treatment response and survival [11,12]. Dysregulated expression of lncRNA may affect cell differentiation, proliferation, and tumor progression [13][14][15]. Some research has suggested that the lncRNAs GUARDIN [16] and NORAD [17] are essential for maintaining genome stability. However, the function of lncRNA is still largely unknown, especially in genome instability [18,19]. Recently, Pu et al. proposed that there are multiple links between genomic instability and antitumor immunity, and suggested predicting the response to immune checkpoint inhibition (ICI) therapy from the status of genomic stress [20]. ICI therapy aims to slow tumor recurrence by eliciting specific immune responses that inhibit and kill tumor cells [21]. However, few patients benefit from ICI therapy.
In this analysis, we hypothesized that a somatic mutation-associated risk index (SMRI) would be useful for predicting AML survival outcomes and ICI therapy response. For this purpose, lncRNA expression profiles, clinical data, and somatic mutation profiles from The Cancer Genome Atlas (TCGA), University of California Santa Cruz (UCSC) Xena, and Gene Expression Omnibus (GEO) databases were subjected to multiple bioinformatics analyses. In addition, given that a combined prognostic analysis of multiple lncRNAs is more informative than the predictive value of a single lncRNA, we chose to construct the SMRI as an integrative risk index model consisting of multiple lncRNAs associated with somatic mutations. Our goal is to use the SMRI for risk stratification and thereby offer a reference for improving the prognosis of patients with AML.

Data acquisition and preprocessing
In this analysis, 151 AML samples were acquired from the TCGA database (https://tcga-data.nci.nih.gov/tcga/), and 246 AML samples were obtained from GSE146173 in the GEO database (https://www.ncbi.nlm.nih.gov/geo/). Samples in which the key clinical information (survival time, survival status) was incomplete were excluded. Information about mutation profiles of TCGA AML samples was taken from UCSC Xena (https://xenabrowser.net/). All the above data are freely available online.

Selection of lncRNAs related to genomic instability in TCGA dataset
Before data processing, lncRNAs with median absolute deviation (MAD) < 0.5 were excluded from the analysis. The somatic mutation counts of samples were obtained from UCSC Xena, and samples with missing transcriptome data were excluded. The 138 included samples were sorted in ascending order of mutation count. The first 25% of the samples (lowest mutation counts) were classified as the genomic stable (GS) group, and the last 25% (highest mutation counts) were classified as the genomic unstable (GU) group. Differential expression analysis with the 'limma' package was performed between the GS and GU groups to identify somatic mutation-associated lncRNAs (SMlncs; |log2 FC (fold-change)| > 0.6 and false discovery rate adjusted p < 0.05).
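The filtering and grouping steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R workflow: the function names, the toy expression values, and the toy mutation counts are hypothetical, and the way the 25% cut-off is rounded (and how ties are handled) is an assumption, since the text does not specify it.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)


def filter_by_mad(expr, threshold=0.5):
    """Keep lncRNAs whose MAD is at least `threshold` (MAD < 0.5 is dropped)."""
    return {name: vals for name, vals in expr.items() if mad(vals) >= threshold}


def split_by_mutation_count(counts, fraction=0.25):
    """Sort samples by ascending mutation count and return (GS, GU) sample lists.

    Assumption: the group size is rounded down and ties are not treated specially.
    """
    ordered = sorted(counts, key=counts.get)
    k = max(1, int(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]


# Hypothetical toy data, for illustration only.
expr = {"lncA": [1.0, 2.0, 4.0, 8.0], "lncB": [3.0, 3.1, 3.0, 3.2]}
counts = {"s1": 5, "s2": 12, "s3": 3, "s4": 20, "s5": 9, "s6": 15, "s7": 7, "s8": 30}

kept = filter_by_mad(expr)
gs_group, gu_group = split_by_mutation_count(counts)
print(sorted(kept), gs_group, gu_group)
```

In this toy input, the nearly constant lncRNA is removed by the MAD filter, and the two samples with the fewest and the most mutations form the GS and GU groups, respectively.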

Identification of AML subclasses and functional enrichment analysis
The 'hclust' function was used to perform hierarchical clustering of samples based on 254 SMlncs to identify TCGA AML subclasses. We performed Spearman correlation analysis to quantify the correlation between lncRNAs and protein-coding genes, and the ten most strongly correlated protein-coding genes were regarded as the co-expressed partners of each lncRNA. To identify the biological functions involved in these 254 SMlncs, we conducted functional enrichment analysis of the lncRNA-related protein-coding genes to recognize enriched Gene Ontology (GO) terms. The R package 'clusterProfiler' was used for GO enrichment analysis.
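As a rough illustration of the co-expression step, the following Python sketch computes Spearman correlations between one lncRNA and a set of protein-coding genes and keeps the top-ranked partners. The function names and toy values are hypothetical, and ranking partners by the absolute value of the correlation is an assumption; the text does not state whether the sign was considered.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def top_partners(lnc_values, genes, k=10):
    """Return the k genes with the largest |Spearman rho| to the lncRNA."""
    scored = [(name, spearman(lnc_values, vals)) for name, vals in genes.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]


# Hypothetical toy data, for illustration only.
lnc = [1, 2, 3, 4, 5]
genes = {"G1": [2, 4, 6, 8, 10], "G2": [5, 4, 3, 2, 1], "G3": [1, 3, 2, 5, 4]}
print(top_partners(lnc, genes, k=2))
```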

Construction and validation of SMRI
Candidate SMlncs were first screened by Kaplan-Meier analysis and univariate Cox proportional hazards regression analysis in the TCGA dataset; the least absolute shrinkage and selection operator (LASSO) Cox regression analysis was then applied to further narrow down the pivotal candidate SMlncs. Briefly, LASSO is an estimation method for linear models that imposes penalties on parameter estimates to prevent the model from overfitting [22]. These penalties shrink some parameter estimates to zero, which reduces overfitting across the candidate genes. The LASSO penalty term was tuned through ten-fold cross-validation. In this work, we regarded the 142 TCGA samples with survival time and survival status as the training cohort. Because the clinical information of TCGA is complete, we also randomly divided the TCGA dataset into cohorts A and B at a ratio of 1:1. The 246 GEO samples served as an external verification cohort to test the robustness of SMRI in risk stratification. The clinical characteristics of the TCGA and GEO samples are shown in Table 1. Before constructing the SMRI, we applied the 'ComBat' function to eliminate potential batch effects between data from different platforms. SMlncs for constructing the SMRI were then selected by multivariate Cox proportional hazards regression analysis with bidirectional stepwise selection based on the Akaike Information Criterion (AIC). The expression values and Cox regression coefficients of the selected SMlncs were used to calculate the SMRI of each sample. The median SMRI of the training cohort was used to distinguish the high and low SMRI groups in all four cohorts (training, A, B, and verification). The prognostic difference between the high SMRI and low SMRI groups was assessed by Kaplan-Meier survival analysis.
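The scoring rule described above is commonly written as a linear combination, where $\beta_i$ is the multivariate Cox coefficient and $\mathrm{Expr}_i$ the expression value of the $i$-th selected lncRNA; this form is an assumption based on the text, which does not print the formula:

$$\mathrm{SMRI} = \sum_{i=1}^{n} \beta_i \cdot \mathrm{Expr}_i$$

A minimal Python sketch of the scoring and the median split follows. The lncRNA names, coefficients, and expression values are hypothetical placeholders, not the fitted model, and assigning samples exactly at the cut-off to the low group is an assumption.

```python
from statistics import median


def smri(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the model lncRNAs."""
    return sum(coefficients[name] * expression[name] for name in coefficients)


def stratify(scores, cutoff):
    """Label each sample 'high' if its score exceeds the cutoff, else 'low'."""
    return {sample: ("high" if score > cutoff else "low")
            for sample, score in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"lncA": 0.5, "lncB": -0.25}
training = {
    "p1": {"lncA": 2.0, "lncB": 4.0},
    "p2": {"lncA": 4.0, "lncB": 0.0},
    "p3": {"lncA": 0.0, "lncB": 4.0},
}

train_scores = {s: smri(expr, coefficients) for s, expr in training.items()}
cutoff = median(train_scores.values())  # training-cohort median, reused for other cohorts
print(train_scores, cutoff, stratify(train_scores, cutoff))
```

Reusing the training-cohort cut-off for the other cohorts, as the text describes, keeps the group definition fixed rather than re-deriving a median in each dataset.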
Before performing other analyses, we assigned patients to different risk groups (favorable, intermediate, or poor) based on the 2017 European LeukemiaNet (ELN) risk stratification determined by cytogenetic abnormalities [23]. Receiver operating characteristic (ROC) curves were generated for each prognostic factor to obtain the area under the ROC curve (AUC) and to determine whether SMRI was more informative than other clinical variables (age, sex, cytogenetic risk, and 2017 ELN risk factors). Univariate and multivariate analyses were used to explore the independence of SMRI from other clinical variables. We defined subgroups by age, sex, NPM1 mutation status, DNMT3A mutation status, FLT3 internal tandem duplication (ITD) mutation status, TP53 mutation status, and cytogenetic risk, and tested the differences in SMRI between these clinical subgroups. We collected the main clinical information and the status of the highly mutated genes of each TCGA sample and then used a heatmap to visualize the characteristics of the risk subgroups (high SMRI and low SMRI groups). The differences in the distribution of clinical variables between risk subgroups of the training cohort were assessed by the chi-square test. Gene set variation analysis (GSVA) is an effective method for estimating pathway activity scores from gene expression data in an unsupervised and non-parametric manner. After collecting the activity estimates of the pathways of interest, a differential analysis was performed between TCGA risk subgroups based on limma's t-test.
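To make the AUC comparison concrete, the sketch below computes the area under the ROC curve as the probability that a randomly chosen patient with the event has a higher score than a randomly chosen patient without it. This is a simplified, hypothetical illustration: it treats the outcome as binary and ignores follow-up time and censoring, whereas the analysis in the text concerns survival data and may have used a time-dependent ROC method.

```python
def auc(scores, events):
    """AUC via pairwise comparison; ties between scores count as 0.5."""
    positives = [s for s, e in zip(scores, events) if e]
    negatives = [s for s, e in zip(scores, events) if not e]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical scores and outcomes (1 = event, 0 = no event), for illustration only.
risk_scores = [0.9, 0.8, 0.3, 0.1]
outcomes = [1, 0, 1, 0]
print(auc(risk_scores, outcomes))  # 0.75
```

An AUC of 0.5 corresponds to a marker with no discriminative value, and values closer to 1 indicate better separation of the two outcome groups.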

Mutation and immune landscapes of risk subgroups of the training cohort
The mutation activity of cancer cells can produce tumor-specific neoantigens, which may be used as biomarkers in cancer immunotherapy [24]. The subtypes of immune cells (T lymphocytes, macrophages, and dendritic cells) enriched in the tumor microenvironment have an important impact on tumor outcome [25]. The differences in prognosis between risk subgroups may stem from the extent of TMB and immune infiltration and from the types of immune cells involved. With this in mind, we compared TMB levels between risk subgroups of the training cohort. TMB was defined as the number of mutations per million bases, based on the total number of somatic mutations in each sample. The immune score from the ESTIMATE algorithm was used to evaluate the immune status of the risk subgroups. For immune infiltration, we used CIBERSORT (https://cibersort.stanford.edu/) to estimate the relative abundance of 22 tumor-infiltrating immune cells (TIICs) in AML. The Wilcoxon rank-sum test was used to compare TIIC abundance between risk subgroups. In addition, we examined differences in the expression levels of immune checkpoint molecules (such as PD-1, PD-L2, CTLA4, and LAG3) between the risk subgroups. Immune checkpoint proteins have implications for tumor immune escape and patient outcomes [26].
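The TMB definition above amounts to a simple ratio. The sketch below is a hypothetical illustration; in particular, the default of 38 million covered bases is an assumed exome size that is not stated in the text, and the mutation counts are made up.

```python
def tumor_mutation_burden(n_mutations, covered_bases=38_000_000):
    """Mutations per million bases.

    `covered_bases` is an assumption (an approximate exome size); the text
    does not report the denominator that was used.
    """
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


# Hypothetical mutation counts, for illustration only.
for sample, count in {"s1": 19, "s2": 76}.items():
    print(sample, tumor_mutation_burden(count))
```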

Statistical analysis
All statistical analyses were implemented in R software version 4.0.0. Survival analysis between groups was performed by the Kaplan-Meier method. Differences in clinical variables between risk subgroups were compared by the chi-square test. Limma's t-test was used to determine differential pathway activities and differentially expressed lncRNAs and genes. For continuous variables, comparisons between two groups used the Wilcoxon rank-sum test, and comparisons between multiple groups used the Kruskal-Wallis test. The correlation between protein-coding genes and lncRNAs was tested by Spearman's correlation analysis.
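For readers unfamiliar with the Kaplan-Meier method, the following Python sketch shows the product-limit estimator on toy data. It is an illustrative re-implementation with hypothetical names and values, not the R code used in this study, and it omits confidence intervals and the log-rank test used to compare groups.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time per patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up data, for illustration only.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a step in the curve, which is why the final censored observation at time 4 adds no point.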

Somatic mutation-associated lncRNAs (SMlncs) in TCGA AML patients
We identified 2633 lncRNAs after filtering by median absolute deviation. Based on the somatic mutation counts of the TCGA samples, we selected the 35 samples with the lowest mutation counts (defined as the GS group) and the 43 samples with the highest mutation counts (defined as the GU group) (Table S1). The Kaplan-Meier survival curve revealed that the overall survival (OS) of the GU group was lower than that of the GS group (p = 0.029; Figure 1a). Using the 'limma' R package, we found that 205 lncRNAs were upregulated whereas 49 lncRNAs were downregulated in the GU group (Table S2). Subsequently, we excluded samples with missing prognostic information and clustered the TCGA samples based on the 254 SMlncs. As shown in Figure 1b, the samples were classified into two groups based on their lncRNA expression profiles. The subgroup with higher somatic mutation counts was designated as the GU-like group, while the other group was designated as the GS-like group. Figure 1c shows that the median somatic mutation count of the GU-like group was higher than that of the GS-like group (p < 0.001). To explore the biological functions involved in these 254 SMlncs, we performed functional enrichment analysis on the protein-coding genes related to the SMlncs. To this end, we conducted an expression correlation analysis between SMlncs and protein-coding genes and extracted the ten protein-coding genes most strongly correlated with each SMlnc. We visualized the connections between them in a co-expression network diagram (Figure 1d). A lncRNA and a protein-coding gene are linked by a line if they are correlated, and protein-coding genes and lncRNAs are shown as nodes of different colors.
GO analysis showed that these protein-coding genes were involved in negative regulation of DNA binding, nucleobase metabolic process, chromatin-mediated maintenance of transcription, regulation of telomere maintenance, regulation of chromatin silencing, and DNA-binding transcription activator activity (Figure 1e). Subsequently, the 254 SMlncs were used as seed genes for the construction of SMRI. From the seed genes, we selected 18 lncRNAs with significant prognostic value using Kaplan-Meier survival analysis and univariate Cox proportional hazards regression analysis (Table S3). To reduce overfitting, we then applied LASSO regression analysis and selected ten lncRNAs for further identification (Figure S1).

Somatic mutation-associated risk index (SMRI) for AML
As mentioned above, the TCGA samples were randomly divided into cohorts A and B at a ratio of 1:1. The chi-square test showed that there were no differences between the two cohorts in terms of age The median SMRI of the training cohort was the cut-off for distinguishing high SMRI and low SMRI patients. For the TCGA data, we found that the high SMRI group tended to have a poorer prognosis than the low SMRI group in the training cohort (p < 0.001; Figure 2a), cohort A (p < 0.001; Figure 2d), and cohort B (p < 0.001; Figure 2g). The ROC curves of the three AML cohorts indicated that SMRI has a higher prognostic value than age, cytogenetic risk, and other 2017 ELN risk factors (Figure 2b, e, h; Figure S2a-c). The expression characteristics of the six lncRNAs were also visualized, and the high SMRI group was accompanied by more deaths (Figure 2c, f, i). For the external verification cohort, the OS of the high SMRI group was worse than that of the low SMRI group (p = 0.011; Figure 3a). The area under the ROC curve suggested that the predictive value of SMRI was slightly inferior to that of age and cytogenetic risk (Figure 3b; Figure S2d). We also visualized the gene expression, SMRI, and survival status of each sample in the risk subgroups of the verification cohort (Figure 3c). Data from the training (p < 0.001), A (p = 0.009), B (p < 0.001), and verification (p = 0.030) cohorts showed that SMRI remained significantly correlated with overall survival after adjusting for clinicopathological characteristics such as age, sex, and cytogenetic risk (Table 2; Figure 3d).

Exploring the prognostic prediction potential of SMRI based on TCGA training cohort
We further examined the prognostic differences between SMRI subtypes within strata defined by age (age < 60 and age ≥ 60), sex (female and male), and cytogenetic risk (poor, favorable, and intermediate). Kaplan-Meier plots showed lower survival rates in the high SMRI group within these strata, which indicated that age, cytogenetic risk, and sex do not disturb the robustness of SMRI in AML prognostic prediction (Figure 4a-c). Subsequently, we compared the differences in SMRI between clinical subgroups (Figure 4d-i).
Notably, the SMRI was elevated in patients of advanced age (p < 0.001) but was not correlated with sex (p = 0.162). The group with favorable cytogenetic risk had the lowest SMRI, whereas the intermediate- and poor-risk groups showed no difference in SMRI. Patients with DNMT3A (p < 0.001) or TP53 (p = 0.024) mutations tended to have higher SMRI. In addition, patients harboring both FLT3-ITD and NPM1 mutations had higher SMRI than patients with only NPM1 mutations (p = 0.047).

Clinical characteristics and differential pathways in risk subgroups
We displayed the distribution of clinical characteristics and top-ranked mutated genes in the TCGA risk subgroups in a heatmap (Figure 5a). The chi-square test revealed several distinct differences between the AML subgroups. Compared with the low SMRI group, patients in the high SMRI group tended to have unfavorable clinicopathological factors, such as advanced age (p = 7.55e-04) and poor cytogenetic risk (p = 2.52e-09). In addition, the high SMRI group included more patients who had died (p = 1.35e-04). The chi-square test showed that mutations in FLT3-TKD (p = 4.88e-02), NPM1 (p = 6.87e-05), TP53 (p = 3.16e-03), and DNMT3A (p = 3.18e-03) were associated with the high SMRI group. However, no differences in the distribution of sex (p = 1.29e-01), FLT3-ITD status (p = 5.37e-01), NRAS status (p = 4.38e-01), CEBPA status (p = 1), IDH1 status (p = 1), IDH2 status (p = 1), TET2 status (p = 1), and RUNX1 status (p = 3.98e-01) were observed between risk subgroups. Gene set variation analysis showed that gene sets for AML with FLT3-ITD mutation, CEBPA mutation, and LSCs were related to the high SMRI group (Figure 5b). Starch and sucrose metabolism, ascorbate and aldarate metabolism, pentose and glucuronate interconversions, linoleic acid metabolism, and drug metabolism (other enzymes) were related to the low SMRI group (Figure 5b). Figure 6a shows the higher TMB of the high SMRI group (p = 0.043). As shown in Figure 6b, the median SMRI still separated patients into survival groups with statistical significance within both the high TMB group and the low TMB group. The immune score of the high SMRI group was higher than that of the low SMRI group (p = 0.012; Figure 6c). The high SMRI group had a higher abundance of CD8+ T cells (p = 0.039) and a lower abundance of resting mast cells (p = 0.026) than the low SMRI group (Figure 6d).
No differential infiltration of regulatory T cells (Tregs) was observed between risk subgroups (p = 0.159; Figure 6d). We also determined the correlation between risk subgroups and the expression of multiple immune checkpoint molecules. With the exception of PD-L1, IDO2, and TIM-3, the checkpoint molecules examined (PD-1, PD-L2, CTLA4, LAG3, and IDO1) showed higher expression in the high SMRI group than in the low SMRI group (Figure 6e).

Prognostic analysis of lncRNAs
As described in the section 'Materials and methods', we screened the differentially expressed lncRNAs between the GS group and the GU group based on the somatic mutation counts in the TCGA dataset. FAM30A, CACNA1C-AS1, and LINC02595 were found to be upregulated in the GU group; LINC00926, AL589863.1, and AP000919.3 were found to be upregulated in the GS group (Figure S3). For the TCGA dataset, elevated expression of CACNA1C-AS1 (p = 0.044), LINC02595 (p = 0.021), LINC00926 (p = 0.017), AL589863.1 (p = 0.012), and AP000919.3 (p = 0.047) predicted a better prognosis, while elevated expression of FAM30A (p = 0.014) indicated a poor prognosis (Figure S4a). For the GEO dataset, we only found the prognostic value of FAM30A (p = 0.049),

Discussion
Although some breakthroughs have been made in clinical and basic research on AML, the available treatment methods are still relatively limited. Age, performance status, cytogenetics, and gene mutations (such as FLT3, NPM1, DNMT3A, and CEBPA) are considered to be important prognostic factors at initial diagnosis [27]. With the advancement of genome-wide sequencing technology, a consensus has gradually emerged that personalized targeted therapies, chosen according to the cellular and molecular genetic characteristics of individual AML patients, offer the best therapeutic effect [28]. However, the high degree of heterogeneity of AML contributes to its poor prognosis [29]. Genomic instability is considered to be one of the driving factors in tumorigenesis and is important for patient survival [30,31]. LncRNA expression is commonly altered in various cancer types and correlates with patient outcomes [32][33][34]. Although lncRNAs have been shown to be related to patient prognosis, the somatic mutation-associated lncRNAs of genomic instability and their clinical value in AML remain unknown. Therefore, we combined the lncRNA expression profile and the AML mutation profile to identify lncRNAs and biomarkers related to genomic instability. In this analysis, we identified 254 SMlncs related to AML. Functional enrichment analysis of the related targets of these SMlncs revealed enrichment in GO terms for negative regulation of DNA binding, nucleobase metabolic process, chromatin-mediated maintenance of transcription, regulation of telomere maintenance, regulation of chromatin silencing, and DNA-binding transcription activator activity. These results suggest a connection between SMlncs and genomic instability [35,36]. We then examined the prognostic significance of the SMlncs and constructed the SMRI based on six SMlncs.
We verified the application value of SMRI in clinical prognosis prediction. The low SMRI group showed significantly more favorable OS on the Kaplan-Meier survival curve. We further evaluated whether the median SMRI can still separate patients into survival groups with statistical significance within the same clinical subgroups (such as a younger group, an older group, a female group, a male group, a poor cytogenetic risk group, etc.). Stratified survival analysis revealed that the median SMRI can still identify high-risk and low-risk patients in these clinical subgroups. More importantly, both the TCGA and GEO cohorts showed that the SMRI is still significantly associated with clinical prognosis, even after adjustment for clinical factors (age, sex, cytogenetic risk, FLT3-ITD, FLT3-TKD, DNMT3A, and NPM1 mutation status). ROC curve analysis of the TCGA cohort showed that the predictive value of SMRI is higher than that of age and cytogenetic risk. These results indicate that the SMRI we established has good application value in clinical prognosis prediction.
We then attempted to identify the clinical characteristics of patients with high SMRI and found that high SMRI was correlated with advanced age, unfavorable cytogenetic risk, DNMT3A mutations, and TP53 mutations, all of which indicate an adverse prognosis in AML patients [27,37,38]. We also found that the high SMRI group showed a higher frequency of FLT3-TKD, but its prognostic value in AML is still controversial [38]. As reported, NPM1 mutation is treated as a favorable prognostic marker in the absence of FLT3-ITD mutation. However, patients with NPM1 mutation frequently also have FLT3-ITD mutation, and FLT3-ITD mutation often exerts a negative impact in NPM1-mutated patients [39]. Consistent with this report, we noticed that the SMRI of AML patients with both FLT3-ITD and NPM1 mutations was higher than that of patients with only NPM1 mutation, and our results above show that high SMRI was associated with poor prognosis.
To understand the potential reasons for the differential prognosis of SMRI subtypes, we further compared the mutation and immune characteristics of the risk subgroups of the training cohort. We found that high TMB levels mainly appeared in the high SMRI group. High levels of TMB were reported to be associated with a favorable immunotherapy response in certain solid tumors such as non-small-cell lung cancer [40,41]. However, AML is typically a disease with a low mutational burden [42], and the role of TMB for immunotherapeutic approaches in AML has not been fully elucidated. Increasing evidence shows that AML can activate different immune pathways, induce immunosuppression, shape the tumor immune microenvironment, and reduce overall survival [43,44]. The roles of TIICs in cancers have gradually been disclosed, especially in shaping tumor outcome [25]. In our prediction data, high SMRI populations showed less infiltration of resting mast cells and greater infiltration of CD8+ T cells than low SMRI populations; in addition, although the difference was not statistically significant, higher Treg infiltration mainly appeared in the high SMRI group. The risk subgroups defined by SMRI had differing immune checkpoint expression patterns, with higher expression of PD-1, PD-L2, CTLA4, LAG3, and IDO1 in high SMRI populations. Tregs play a significant role in the immunosuppressive networks that lead to a disabled antileukemic immune response [45]. Although CD8+ T cells play an essential role in anti-tumor immune responses, their tumor-killing function is impaired in AML [46].
Programmed cell death 1 (PD-1), which is highly expressed on the surface of activated T lymphocytes, is considered to be a T cell brake, and its binding to the PD-L1 ligand expressed on tumor cells leads to a decrease of tumor suppressors, thus restraining the anti-tumor immune effect and tumor clearance ability mediated by cytotoxic T cells [47,48]. AML patients with high PD-1 expression tend to develop an exhausted T cell phenotype, which leads to immune escape and poor outcomes [49]. Similarly, cytotoxic T lymphocyte antigen 4 (CTLA4) expressed by CD4+ and CD8+ T cells is a key co-inhibitory molecule that can inhibit T cell activation and T cell response [50]. Hence, we speculated that a hypermutation phenotype (TP53 and DNMT3A mutations), advanced age, poor cytogenetic risk, weak CD8+ T cell killing function, and overexpressed immunosuppressive targets (PD-1, PD-L2, CTLA4, LAG3, and IDO1) might be responsible for the unfavorable prognosis observed in patients with high SMRI. Immune checkpoint inhibitors block checkpoints on immune cells, thereby activating the anti-tumor response of T cells [21]. The role of PD-1 and PD-L1 in the immunosuppression of cancers makes them potential targets of ICI therapy [51]. A recent phase 2 study reported that ICI therapy (nivolumab) given concurrently with azacytidine resulted in a higher objective response rate, longer median overall survival, and longer event-free survival compared with chemotherapy alone [52]. While ICI therapy activates T cells and triggers antitumor immunity in many patients, a large proportion of AML patients do not respond to ICI therapy [53]. Based on the prediction results of this analysis, we speculated that higher immune checkpoint expression and CD8+ T cell infiltration make the high SMRI group more likely to benefit from ICI therapy.
Also, the high SMRI group showed a higher NPM1 mutation rate than the low SMRI group, and the mutated NPM1 is considered to be a possible target of immunotherapy [54].
GSVA revealed that LSCs were associated with high SMRI patients. Leukemia stem cells (LSCs) have the ability to trigger leukemia and continue generating leukemia cells and also show immune resistance characteristics [55].
Although our analysis provides a potential reference for the prognosis prediction of AML, some limitations remain. First, this study is based on retrospective data from the TCGA and GEO databases, and it is not easy to obtain additional prognostic information that affects patient outcomes. Second, the characteristics of immune infiltration between risk subgroups were predicted with bioinformatics methods, and their actual significance needs to be verified experimentally.

Conclusion
In conclusion, we constructed and tested SMRI based on somatic mutation-associated lncRNAs. SMRI was predictive of overall survival independent of age and cytogenetic risk. The current SMRI might provide more information for risk stratification and ICI therapy for AML.

Disclosure statement
The authors report no conflict of interest.

Consent to participate
Not applicable.

Authors' contributions
QX was responsible for study design, data analysis, figure visualization, and manuscript; TG contributed to the writing and review of the paper.