Identification of an eleven-miRNA signature to predict the prognosis of endometrial cancer

ABSTRACT Endometrial cancer (EC) is the most common gynecological malignancy. Recent studies have uncovered miRNA acted a striking role in predicting the prognosis of multiple tumors. Over 500 EC samples were selected from the Cancer Genome Atlas (TCGA) database. Univariate, LASSO and multivariate Cox regression analysis were employed to screen out the prognosis-involved miRNAs. Kaplan-Meier (K-M) and time-dependent receiver operation characteristic (ROC) curves were conducted to reveal survival analysis and assess the accuracy of the signature. The independence of the model was verified via univariate and multivariate Cox regression analysis. Besides, qRT-PCR was conducted to testified the expression of 11 miRNAs in 16 paired tissues. A total of 514 specimens were randomly divided into the training set and the testing set, then an 11 miRNAs-based signature were determined which divided the patients into high-risk group and low-risk group. The survival was markedly different and the ROC curve exhibited a precise prediction. Meanwhile, the univariate and multivariate Cox regression analysis verified the miRNAs-based model was an independent indicator of EC. Moreove, the prediction ability of this model with clinicopathological features was more efficient. Finally, functional enrichment analysis demonstrated these miRNAs were associated with the occurrence and progression of cancer. Additionally, hsa-mir-216b, hsa-mir-363, hsa-mir-940 and hsa-mir-1301 were highly expressed in EC tissues in contrast to normal tissues through qRT-PCR. Importantly, the eleven-miRNA signature was full of robust ability to predict the prognosis of EC.


INTRODUCTION
Endometrial cancer (EC) is one of the most malignant tumors derived from epithelial tissue that seriously threatens women's health, the morbidity of it among females ranked the fourth [1]. There are multiple factors that influenced morbidity and mortality, such as the histology (serous) and the grade (grade 3) impacted the 5-year survival rate, which was the lowest [2]. Besides, intentional weight loss was verified to contribute to an apparent low risk of type I endometrial cancer [3]. Although there was a prominent decline in postoperative mortality from 2000-2001 (0.70%) to 2008-2009 (0.48%) [4], and the adjuvant radiation treatment (ART) significantly improved diseasefree survival (DFS), overall survival (OS) in patients with early-stage EC [5], patients with metastasis had a poor prognosis, only with 17% of 5-year survival [6]. In recent years, various biomarkers were discovered associated with the tumorigenesis [7], progression [8], prognosis [9][10][11] of EC, but still tough to apply. Therefore, it is pressing to construct a systemic model to estimate prognosis by integrating these biomarkers or some unknown.
MicroRNAs which can target mRNAs mostly through interaction with the 3 UTR [12] and regulate the expression of genes are short RNA molecules 19 to 25 nucleotides in size [13]. Moreover, the miRNA acts as an adaptor for miRISC to recognize or regulate mRNA and the product (miRNA-mRNA) was a key determinant of the regulatory mechanism [14]. Recent studies hinted miRNAs were involved in farranging diseases, such as cancer, diabetes, viral infections and others [15]. Among them, the cancerrelated miRNAs had sprung up. For example, overexpression of miR24-2 accelerates the progression of liver cancer cells through Pim1 activation [16], while downregulated miR-28-5p in prostate cancer acts as a tumor suppressor by altering SREBF2, which is a vital mediator of miR-28-5p [17]. For endometrial cancer, plenty of relevant microRNAs and their mechanisms were disclosed. Some typical examples as following: highly-expressed miRNA-486-5p induced proliferation and migration of EC cells through targeting MARK1 [18]; high expression of miR-137 was considered as a tumor suppressor to inhibit the cell proliferation by targeting EZH2 and LSD1 [19]; Similarly, highly-expressed miR-449a attenuated cell growth and metastasis based on the regulatory axis of NDRG1/PTEN/AKT [20]. Despite these miRNAs were explored in EC, their reliability and stability were controversial. Hence, to establish a reliable prognosis-related signature was urgent to address so as to guide the subsequent treatment for the heterogeneous individual and promote survival after the operation or receiving comprehensive therapy.
In this study, we obtained miRNAs data and clinical information from TCGA database and construct a miRNA-based signature to explore its role in prediction of EC prognosis. Our hypothesis is that miRNA-based model has the robust ability to predict EC prognosis. The aim of our study is to establish a novel model that integrated biomarkers with clinicopathologic features to evaluate the prognosis of EC patients accurately. Our goal is to lay the theoretical basis for further pursuing a new therapeutic strategy to enhance the survival rate.

Selection of patient datasets
In this study, a total of 579 samples, which contained 546 EC patients and 33 normal tissues, were obtained from The Cancer Genome Atlas (TCGA) database (https://TCGAData.nci.nih.gov/TCGA/) [21]. First, according to the integrity of the miRNA expression profile and clinicopathological features, 514 samples were ensured for subsequent analysis. Secondly, we acquired the clinical information including prognostic information, age, grade, stage and so on. Finally, 514 samples were randomly divided into training sets (n = 258) and testing sets (n = 256).

Prognosis-involved miRNAs screening
EdgeR package [22] in R language was executed to screen out all differentially expressed miRNAs (DEMs) between 546 EC patients and 33 adjacent normal samples. The volcano plot based on the inclusion criteria (|log2FC|>1 and FDR<0.05) was emerged, then cluster analysis was conducted to demonstrate the evident difference between two antagonistic sets intuitively. After that, Univariate, LASSO, and multivariate Cox regression analysis were applied to select the prognosisinvolved miRNAs orderly in the training sets. Notably, based on the candidate miRNAs, which were acceptable to P < 0.05 in univariate Cox regression analysis, the OS-related miRNAs could be further screened out via LASSO, multivariate Cox regression analysis.

Establishment of a multi-miRNA-based prognostic model
After using the LASSO and multivariate Cox regression analysis, the most valuable miRNAs were ascertained. Based on these miRNAs, the prognostic risk stratification was revealed by calculating the risk score in the training sets. Risk score, where Coef i represented the coefficient of miRNA i , Exp i denoted the expression lever of miRNA i . The median score was served as optimal cutoff value to separate the cohort into high-and low-risk group, then we compared survival rate between the two groups through Kaplan-Meier (K-M) survival curves and verified the predictive efficiency via time-dependent receiver operation characteristic (ROC) curves. To validate the validity of the results, the multi-miRNA-based risk score was applied in the testing sets, and the K-M curves and ROC curves were also performed to investigate whether the results were steady.

Survival analysis and comparison of the miRNAs expression profile
Firstly, we performed the different distribution of clinicopathological features between the high-risk group and the low-risk group via the Chisq-test. Then we divided those samples into different subgroups according to clinicopathological variables. K-M curves were built to analyze the overall survival of the high-risk group and the low-risk group of every subgroup in the training cohort and testing cohort respectively. Subsequently, the t-test was utilized to manifest the significant difference in the expression profile of DEMs in different subgroups.

Verification of model's dependence and assessment of its prediction accuracy
Univariate and multivariate Cox regression analyses were employed to verify the miRNAbased prognostic model was independent and estimate the hazard ratio of each risk factor in the training set and testing set. The area under the curve (AUC) of time-dependent ROC was calculated to compare the accuracy of the different factors or their combinations in predicting 1-, 3-, 5-year survival in the training cohort. Naturally, the AUC of time-dependent ROC was executed in the testing cohort to validate the verdict obtained from the training cohort likewise.

Functional enrichment analysis
We utilized the TargetScan, miRTarBase and miRDB to infer the target genes regulated by those miRNA. After that, the Enrichr database (http://amp.pharm.mssm.edu/Enrichr/) [23] was applied to elaborate GO and KEGG pathways closely implicated in those target genes. Through functional enrichment analysis, it implied the internal biological mechanisms of these target genes. Besides, the GO analysis contained three categories: biological process (BP), cellular component (CC) and molecular function (MF).

RNA extraction and quantitative reverse transcription PCR (qRT-PCR)
Trizol reagent (Invitrogen) was applied to extract total RNA from 16 paired tissues (16 endometrial cancer tissues and 16 normal tissues) which were obtained from the First Affiliated Hospital of Nanjing Medical University (Nanjing, China). PrimeScript TM RT Reagent Kit (Takara Bio, Inc., Otsu, Japan) was used for reverse transcription (RT). SYBR PrimeScript RT PCR kit (Jijia, Suzhou, China) was employed for qRT-PCR. Additionally, U6 was seved as an internal reference. The primer sequences of 11 miRNAs were predented in Table 1. The comparative cycle threshold (2 −ΔΔCt ) were analyzed to acquired the final results via GraphPad Prism 7. Notably, our study was approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University.

RESULTS
In this study, we construct an miRNA-based signature to explore its role in prediction of EC prognosis based on TCGA database. We hypothesized that miRNA-based model has the robust ability to predict EC prognosis. Our goal is to construct a miRNA-based signature to predict prognosis of EC and to lay the theoretical basis for further pursuing a new therapeutic strategy to enhance the survival rate. In the present study, we first obtained over 500 EC samples were selected from the Cancer Genome Atlas (TCGA) database. Then we screened out most significantly expressed miRNAs and constructed a signature. The predictive ability was estimated and validated. Besides, we combined miRNA-based model with clinicopathological features to assess the prediction ability of different combinations. Finally, the biological functions were predicted by GO and KEGG.

Differentially expressed miRNAs screening
A total of 579 samples in this study were composed of 546 endometrial cancer (EC) patients and 33 adjacent normal samples. It was apparent that a total of 196 significantly differentially expressed miRNAs comprised of 151 upregulated miRNAs and 45 down-regulated miRNAs (inclusion criteria: |log2FC|>1 and FDR<0.05) between EC patients and normal samples were displayed in the volcano plot ( Figure 1a). From the heatmap, we could perceive all DEMs and the clustering analysis among all specimens ( Figure 1b).

Establishment of EC prognostic model in the training set
We integrated miRNA expression profiles and clinical information so as to identify 514 eligible EC samples, then we divided these samples into the training set (n = 258) and the testing set (n = 256) randomly. According to the formula: Risk score, the specific risk score for individuals were calculated as following: Risk score = (0. From the above, we acknowledged that coefficient < 0 mean a better prognosis, while coefficient > 0 indicated a poor prognosis. The median risk score among risk scores of all samples in the training cohort was served as the optimal cutoff value, which separated the patients into high-and low-risk group in the training cohort. The expression level of 11 miRNAs, risk score, and survival status were displayed in Figure 2a-2c. The survival rate of the high-risk group was conspicuously poorer than that of the low-risk group via the K-M analysis (P = 2.367e −5 ) (Figure 3a). ROC analysis was conducted to assess the accuracy of the prognostic ability (Figure 3c). The area under the ROC curve (AUC) was 0.863 at 1-year survival, 0.79 at 3-year survival, 0.804 at 5-year survival. To assess whether the model was an independent indicator of EC, univariate and multivariate Cox regression analysis in the training set were executed, including clinical factors (age, stage, histological type and grade) and risk score. The Table 2 was exhibited after using multivariate Cox regression analysis. we got that high-risk group VS lowrisk group: HR = 1.026, 95%CI1.015-1.037, P < 0.001, the results underscored this prognostic model possessed moderate independent prognostic value.

Validation of EC prognostic model in the testing set
To validate the predictive ability of this eleven-miRNA signature, the optimal cutoff value from the training cohort was also applied in the testing set. Samples in the testing cohort were divided into a high-risk group and a low-risk group likewise. It revealed that the survival time was decreased and the figure of dead people was up along with the elevated risk score by survival state plot (Figure 2d-2f). Therefore, the survival status was compliant with the previous inference. The K-M curve elicited the survival rate of the highrisk group was significantly inferior to that of the low-risk group. (P = 2.163e −3 ) (Figure 3b). The area under the ROC curve (AUC) was 0.721, 0.668, 0.713 at 1-year survival, 3-year survival, 5-year survival respectively (Figure 3d). Univariate and multivariate Cox regression analysis after adjusting for clinical characteristics were conducted to validate the independence of the model's prediction in the testing cohort. The multivariate Cox regression analysis obtained from the Table 3 testified high-risk group VS low-risk

Survival analysis of subgroups between eleven-miRNA signature and clinicopathological features
The clinicopathological features included grade (G3&G4, G1&G2), histological type (endometrioid, Mix&serous), stage (stageI&stageII, stageIII&stageIV) and age (≤60, >60). The distribution of these clinicopathological characteristics and the expression level of 11 miRNAs between the high-risk group and the low-risk group in the training cohort and testing cohort were presented in Figure 4a and 4b respectively. Subsequently, Kaplan-Meier curves were conducted for each subgroup, we could obtain the difference of OS for each clinicopathological variable through comparing risk stratification in the training set. Obviously, patients with low-risk had strikingly better OS than those with high-risk in endometrioid (P = 5.4e −5 ), grade G3 & G4 (P = 0.003192), stage I & stage II (P = 0.00198), stage III & stage IV (P = 0.006937), age>60 (P = 0.001877), age≤60 (P = 0.001002) (Figure 5a-5f). The robust predictive ability of this eleven-miRNA signature was also validated in the testing cohort. What's more, the OS of patients with high-risk was still poorer than that with low-risk in age>60, age≤60, grade G3 & G4 and stageIII & stageIV, corresponding to P = 0.04774, 0.034936, 0.03156, 0.011903 respectively (Figure 5g-5j).

Comparison of the expression profile of DEMs in different clinicopathological signatures
To further investigate the gene expression profile, we applied t-test and P < 0.05 was the exclusion criterion. Consequently, the expression patterns of DEMs in each clinicopathological parameter were demonstrated in Table 4 and Table 5, corresponding to the training set and testing set. Similarly, the result of the training set was mainly in line with that of the testing set. Through in-depth thinking, among those clinicopathological signatures, histological type played a critical role in the expression pattern of DEMs.

Assessment of the predictive value of the miRNA-based model with clinical parameters
In order to acquire a better prediction capacity, we imagine combining the prognostic model with different clinicopathological features to investigate and compare the prediction ability. Then timeindependent ROC curves were delineated to evaluate. The AUC of the risk score was better than that of clinical factor (stage) (risk score-AUC: 0.858, 0.779, 0.775 at 1, 3, 5 years; clinical factor-AUC: 0.710,      (Figure 6g-6i). Taken together, the combination of the model and the clinical factor had more advantages in predicting the prognosis of endometrial cancer.

Functional enrichment analysis
To further research the molecular biological mechanisms of the eleven miRNAs, 94 target genes of them were emerged by using miRDB, miRTarBase and TargetScan. Subsequently, enrichr database was employed to unfold the GO biological function and KEGG pathway enrichment analysis. The GO analysis constituted of biological process (BP), cellular component (CC) and molecular function (MF) were shown in   target genes remarkably participated in Ras GTPase binding (GO:0017016), oxidoreductase activity (GO:0016742), RNA polymerase II intronic transcription regulatory region sequence-specific DNA binding (GO:0001162). On the other hand, the KEGG enrichment analysis indicated target genes were involved in the signaling pathway of glyoxylate and dicarboxylate metabolism, cell cycle, acute myeloid leukemia, hepatitis B and other pathways (Figure 7d).

The expression levels of prognosis-related miRNAs in paired tissues
To verified the expression tendency of 11miRNAs in EC compared to normal tissues, we applied qRT-PCR to assess the expression level of them between 16 endometrial cancer tissues and paired 16 normal tissues. The results were exhibited in Figure 8.
Analyzing with the coefficient of each miRNA, we acknowledged that the expressions of hsa-mir-216b (P = 0.0038), hsa-mir-363 (P = 0.0075), hsa-mir-940 (P = 0.0066) and hsa-mir-1301 (P = 0.0062) were prominently high-expressed in EC tissues, which was satisfied with our inference. However, the expressions of hsa-mir-3614 (P = 0.027) and hsa-mir-4687 (P = 0.0053) were also highly expressed in the tumor tissues compared with normal tissues which was contrary to our bioinformatics analysis.

DISCUSSION
Endometrial cancer is the most common gynecological tumor with a high degree of malignancy. On account of the limited therapy strategies for Figure 6. The AUC of risk factor and clinical characteristics to predict 1, 3, 5-year survival. A-C, The AUC of risk score, clinical factor and risk score+clinical factor was calculated via ROC curve in training cohort to predict 1, 3, 5-year survival. D-F, The AUC of risk score, age, stage and grade was acquired to assess the predictive value in 1, 3, 5-year survival in testing cohort. G-I, The AUC of risk score, clinical factor and risk score+clinical factor was compared to ensure the most optimal model to predict the prognosis in testing cohort.
metastasis in patients with EC, there was a 15%-20% recurrence rate after surgery [24]. Therefore, establishing a prognostic signature with high specificity and sensitivity is desired and urgent, which could assist the selection of treatment strategy, promote the survival rate and contribute to better long-term prognosis. Current studies manifested miRNAs functioned multiple roles in EC initiation and progression. Zhou et al. demonstrated plasma-derived exosomal miR-15a-5p was a effective biomarker in EC early diagnosis [33,781,255]. Chang et al. testified metapristone could inhibit EC cell proliferation via targeting miR-492/Klf5/Nrf1 axis [33,413,440]. About drug resistance, Megumi Yanokura et al. verified microRNA-34b enhanced EC cell sensitivity to paclitaxel [33,300,049]. Recently, cutting-edge studies have certified miRNA was related to tumorigenesis and progression and could be considered as a potential biomarker in diagnosis, recurrence, treatment and prognosis of tumors. According to previous reports, miRNA was associated with diagnosis, prognosis, recurrence and therapy in breast cancer and papillary thyroid carcinoma [25][26][27][28], prognosis and therapy in ovarian cancer and colorectal cancer [29,30], diagnosis and progression in pancreatic cancer [31], prognosis in hepatocellular carcinoma and lung cancer [32,33]. As far as endometrial cancer, we listed several reports as following: Multivariate analysis demonstrated the decreased expression of miRNA-152 and miRNA-101 lead to poor prognosis in endometrial serous adenocarcinomas [32]. High expression of miRNA-34a was an indicator of positive prognosis in EC [34]. MiRNA-200 c maintained low expression levels in the advanced stage endometrioid EC that might generate inferior survival compared to the early stage [35]. These previous studies were incomplete lacking a comprehensive predictive model.
To further conduct a better model to improve the prognostic prediction ability of EC patients, we comprehensively analyzed and identified the novel biomarkers. First, 514 samples were finally selected from the TCGA database and were divided into a training set and a testing set. Secondly, by using the univariate, LASSO and multivariate Cox regression analysis, OS-related eleven miRNAs were ensured. Thirdly, The survival analysis in the training cohort based on the median risk score demonstrated the high-risk group was prominently poorer than that of the low-risk group (P = 2.367e −5 ). ROC curves were presented to evaluate the accuracy of the model's prediction. After that, we verified the miRNA-based model was an independent indicator to predict the prognosis of EC patients through univariate and multivariate Cox regression analysis in two cohorts respectively. All the results were validated in the testing cohort. Next, we divided the clinicopathological features into different subgroups, for each subgroup in the training set and testing set, it revealed that the OS of the low-risk group was full of superiority compared with the high-risk group. Finally, we intriguingly attested the combination of the model and clinical factor had a more precise presentation in predicting the prognosis of endometrial cancer than the single model or clinical factor via ROC curves in the training set, the result was also confirmed in the testing set.
We expect to discover the therapy target so as to provide a novel strategy. The miRDB, miRTarBase, TargetScan and Enrichr database were executed to explore downstream genes and their potential molecular biological mechanism. The GO terms implied these target genes mainly enriched in regulation of cell proliferation, negative regulation of cell motility, lysosomal membrane, Ras GTPase binding, RNA polymerase II intronic transcription regulatory region sequence-specific DNA binding and so on. While the KEGG pathway enrichment analysis hinted the target genes were implicated in cell cycle, acute myeloid leukemia and so on.
Accumulating evidence testified miRNAs acted significant roles in cell fate determination, proliferation, and cell death [36]. The miRNAs that we identified in this model were testified in other researches mostly. The over-expressed miR-216b restrained the cell proliferation, migration and invasion in HCC through the HBx-miRNA-216b-IGF2BP2 signaling pathway [37], a similar phenomenon was observed in colorectal cancer via targeting SRPK1 [38]. Whereas downregulation of miRNA-215 significantly inhibited cell proliferation, migration and invasion via Yin-Yang 1 in colon cancer [39]. Over-expressed miRNA-940 promoted proliferation, migration, and invasion of bladder cancer cell and inhibited cell apoptosis by mediating INPP4A or GSK3β and activating the Wnt/βcatenin pathway [40]. IGF2BP3 served as an antagonism could inhibit miRNA-3614 maturation, mediately resulted in preventing breast cancer cell growth through downregulating TRIM25 [41]. Upregulated miRNA-363 increased glioma cell viability and proliferation by adjusting the expression level of GAP-43, AKT,cyclin-D1 and other factors [42]. The miRNA-96 served as a tumor suppressor gene in pancreatic cancer, decreased cell invasion, migration and growth through inhibiting KRAS [43]. To sum up, the regulatory function of miRNA-215, miRNA-940 and miRNA-363 in former pieces of research were parallel to the results what we drew, while the miRNA-216b, miRNA-592, miRNA-96, and miRNA-3614 in other tumors functioned the opposite effects in contrast to our data. This reminded us these miRNAs might function as a two-edged sword in different tumor tissues and involved in variant mechanisms. Therefore, it is meaningful to further investigate the tumorigenesis in EC. It is a pity that miRNA-1301, miRNA-7110, miRNA-3170 and miRNA-4687 were lacking in reports, but as a challenge, it indicated we could catch these underlying targets to conduct an in-depth study to uncover their potential functions and complex regulatory mechanisms. Moreover, all these previous studies indicated the miRNA-based biomarker was effective in tumor prognostic prediction and possibly offer novel diagnostic and therapeutic strategies.
Compared with former research, this study performed superiority as we integrated the clinicopathological features with predictive miRNAs to give rise to more accuracy in evaluating survival. What's more, the prognostic model might also serve as an indicator of the selection of treatment strategies. However, there are several limitations. Firstly, our study lacks experiment verification, it is based on the TCGA database, so the molecular mechanism should be proved in the future. After that, the results of it need to be further validated before it is permitted to apply. Thirdly, explicit mechanisms should be elaborated in the further study.

Conclusion
In conclusion, an eleven-miRNA-based model could be regarded as an independent indicator for the prognosis of EC patients and harbored the forceful predictive capacity or potential treatment value.