A novel long non-coding RNA-based prognostic signature for renal cell carcinoma patients with stage IV and histological grade G4

ABSTRACT This study aimed to establish a lncRNA-based signature for predicting the prognosis of patients with high stage and grade renal cell carcinoma (RCC). According to the Surveillance, Epidemiology, and End Results (SEER) database, sex, age, grade, stage, surgery, chemotherapy, radiation, tumor size, and marital status were the independent prognostic factors for RCC and also had significant correlations with the overall survival through Cox univariate and multivariate analyses. Noticeably, among these influencing factors, the histological classification of undifferentiated group and pathological stage IV had the greatest prognostic risks for RCC patients. Furthermore, based on the samples at stage IV and histological grade G4 from The Cancer Genome Atlas (TCGA) portal, 9 key lncRNAs, including KIAA2012, CCNT2-AS1, ITPKB-AS1, TBX2-AS1, NUTM2A-AS1, LINC02522, LINC02384, LINC01559, and LINC00865 were identified and a prognostic signature was constructed by Lasso analysis and Cox regression model. The Kaplan-Meier analysis suggested that patients at stage IV and histological grade of G4 in high risk score group had a worse overall survival than that in low risk score group. The following receiver operating characteristic curve (ROC) curves also showed that this signature possesses a better predictive power performance. Pathway enrichment analysis discovered that 9 lncRNAs held potential roles in cell division, cell cycle, DNA damage and cytokines levels in RCC. This work indicates that the established 9-lncRNA signature has a good capacity in predicting the prognosis of RCC patients with stage IV and histological grade of G4, and may be helpful for guiding the treatment strategies for RCC patients.


Introduction
Renal cell carcinoma (RCC) is recognized as a common and deadly disease with a worldwide estimate of ~400,000 new cases and ~175,000 deaths [1]. Despite progressions in cancer control and survival improvement, locally advanced disease and distant metastases are still diagnosed in a notable proportion of patients, leading to its treatment remaining a challenge for oncologists [2]. Currently, surgery is considered to be the most important and effective method for treating the early stage of RCC [3]. However, 20%~30% patients still have recurrence or distant metastasis after surgery [4]. Additionally, RCC is not sensitive to chemotherapy or radiotherapy, resulting in the un-satisfactorily clinical outcomes for patients. Hence, identifying usefully prognostic biomarkers is of great significance for ameliorating the survival benefits of RCC patients.
Long non-coding RNA (lncRNA) refers to RNA transcripts ≥200 bp that are not equipped with protein-coding capacity [5]. Accumulating evidence demonstrates that lncRNAs play extensive regulatory roles in a variety of cellular processes, including transcriptional and posttranscriptional regulation [6,7]. Due to their strong tissue specific, they are potentially effective biomarkers for the survival of numerous cancers [8]. Identification of specific lncRNAs biomarkers may therefore be of clinical significance for the prognosis of RCC patients. Indeed, numerous lncRNAs have been identified to be related to the progression and prognosis of RCC. Qu et al. constructed a 4-lncRNA signature, including ENSG00000255774, ENSG00000248323, ENSG00000260911, and ENSG00000231666 to improve postoperative risk stratification for patients with localized clear cell RCC using the LASSO Cox regression model [9]. Zhang et al. established an 11-lncRNA signature that was clearly associated with the overall survival rates in clear cell RCC using the data from The Cancer Genome Atlas (TCGA) database [10]. At present, studies on the mechanism and function of lncRNAs in RCC are still under exploration. Therefore, identification of more effective lncRNA signatures undoubtedly provides more possible references for the clinical study of RCC.
In our work, to discriminate the sinnvoll prognostic factors, a large number data of RCC patients were retrieved from the Surveillance, Epidemiology, and End Results (SEER) database to analyze the significant prognostic factors. Moreover, to further ascertain the specific lncRNAs biomarkers, the RNA sequencing data of 38 tumor samples of RCC with the greatest prognostic factors of stage IV and histological grade G4 and 128 adjacent normal samples were extracted to construct a high stage and graderelated lncRNA signature associated with the prognosis of RCC patients. Ulteriorly, to explore the underlying mechanism behind this signature modulating the progression of RCC, pathway enrichment analyses were applied to identify cell cycle and cell division-related signaling. The established lncRNA signature may serve as a potential prognosticator the RCC, holding promise for acting as a potential therapeutic target for the therapy of RCC.

Data source from SEER
SEER database is programmed by US National Cancer Institute, which collects, processes, and provides data on approximately 10% of the US population [11]. Data of the present study were obtained from SEER database using the SEER*Stat software program (version 8.3.3) under a user agreement. The patients' demographic variables such as age of diagnosis, gender, marry, insure, surgery, chemotherapy, radiation, laterality, tumor size, grade, stage, race, histology, survival months, and vital status recode were extracted. Furthermore, we also excluded patients with missing information on the survival time/status, chemoradiotherapy information, or on the tumor stage.

Acquisition of gene expression data of RCC from TCGA database
The RNA-Seq expression data and the corresponding clinical data were downloaded from TCGA-KICH, TCGA-KIRC, and TCGA-KIRP datasets [12]. A total of 38 RCC patients and 128 normal patients were screened with pathological stage IV and histological grade of G4.

Construction of a prognostic signature for high stage and grade patients
Using the Cox univariate regression analysis, the survival-associated lncRNAs were identified. Next, Lasso regression analysis was used to obtain the most strongly survival-associated lncRNAs in RCC, further fitting in Cox multivariate regression analysis to generate the regression coefficient for each lncRNA. The following formula based on a combination of Cox coefficient and the expression of lncRNA was used to calculate the risk score: Risk score = (coefficient lncRNA1 × lncRNA1 expression) + (coefficient lncRNA2 × lncRNA2 expression) + . . . + (coefficient lncRNAn ×lncRNAn expression). The median risk score was used as a cutoff to divide the RCC patients into highand low-risk groups. The receiver operating characteristic (ROC) curves were plotted and the area under the curve (AUC) values were calculated with 'SurvivalROC' R package, which was used to evaluate the predictive power [13].

Gene set enrichment analysis (GSEA)
GSEA was conducted between the high and low-risk score groups to explore the relationship between risk score and lncRNA function. GSEA version 4.0.3 software (Cambridge, MA, USA) was used to perform the GSEA, and p values <0.05 with false discovery rate (FDR) <0.05 were considered statistically significant [14].

Pathway enticement for protein-protein interaction (PPI) network of co-expressed genes of 9-lncRNAs
The co-expressed genes of nine lncRNAs were analyzed through calculating Pearson correlation values between the nine lncRNAs and 854 differentially genes RCC and normal patients. Coexpressed genes were screened out according to Pearson coefficient greater than 0.4 and P less than 0.01 and were performed to establish protein-protein interaction (PPI) network by STRING dataset [15] and the most important module was identified by MCODE plugin in Cytoscape software [16]. The key genes contained in key module were incorporated to perform pathway enrichment analysis by metascape [1718].

Statistical analyses
All the data were analyzed using the SPSS 22.0 software (IBM SPSS, Armonk, NY, USA). Overall survival curves were plotted using the Kaplan-Meier approach. Cox proportional hazards model was used to determine the influences of the collected clinicopathological factors on the overall survival or disease-specific survival of RCC patients. Differences in patients' characteristics were compared by student t test for continuous variables and chi-square analysis for categorical variables. A two-tailed P value less than 0.05 was considered as statistical significance.

Results
In this current study, we intended to uncover the significantly independent prognosticators for RCC and to construct a lncRNA-based signature for patients with these distinguished features. Clinical characteristics of RCC patients were incorporated into analyzing the independent prognostic factors for RCC. In accordance with the significant value, the greatest prognosticators were identified, combined with RNA sequence, and thus the prognostic-associated lncRNA signature was built. Enrichment analysis was performed to explore the underlying mechanism for RCC progression.

Demographics of patients
A total of 27,435 individuals with a diagnosis of RCC were identified according to the SEER database. The clinical characteristics of these RCC patients are summarized in Table 1. In this population, the mean age of RCC patients was 60.1 ± 12.3 years. Most of them were white, male, married, and with insure. Histologically, adenocarcinoma was the main histologic type of RCC, accounting for 98.8%. Approximately 64.4% of patients were in stage I, and 96.5% underwent surgery, however, few patients received chemotherapy and radiation, which account for 6.5% and 2.6%, respectively.

Identification of independent prognosticator for RCC based on SEER database
Cox univariate and multivariate analyses were used to identify the independent predictive factors affecting the overall survival of RCC. We found that sex, age, grade, stage, surgery, chemotherapy, radiation, tumor size, and marital status can all be used as independent prognostic factors for the overall survival of RCC (p < 0.05, Table 2). Noticeably, among other influencing factors, the histological classification of undifferentiated group and pathological stage IV had the greatest prognostic risks, with the hazard ratio (HR) values of 3.05 and 6.515, respectively. Furthermore, Cox univariate and multivariate analyses for specific survival were also performed. Results shown in Table 3 indicated that age, grade, stage, surgery, chemotherapy, radiation, tumor size, and marital status can be used as independent prognostic factors for disease-specific survival of RCC (p < 0.05, Table 3). Importantly, the hazard ratio (HR) of patients who underwent surgery was 0.254 times that of those who did not, suggesting that surgery can significantly reduce patients' risk. However, the HR values of patients who received radiotherapy and chemotherapy were much higher than those who did not receive radiotherapy and chemotherapy (13.858 and 13.7058 times, respectively), indicating that radiotherapy and chemotherapy might lead to a worse prognosis. Besides, we analyzed the effects of the clinical parameters on the overall survival of patients of RCC using Kaplan-Meier approach. The results showed that age, race, grade, stage, surgery, chemotherapy, radiation, tumor size, and marital status had significant actions on the survival (p < 0.05, Figure 1).

Construction of a prognostic-associated lncRNA signature in RCC
Analysis of the SEER database indicated that undifferentiated histological grade and pathological stage IV had the greatest prognostic risk for RCC patients. Thus, in order to explore the relevant key molecules, we extracted samples of RCC with pathological stage IV and histological grade of G4 from the TCGA database for analysis.
According to the RNA sequencing expression files of RCC cases from TCGA database, we obtained all expressions of lncRNAs associated with the stage IV and histological grade of G4. These lncRNAs were then analyzed by Cox univariate analysis to identify prognosis-related lncRNAs. To further screen out the key lncRNAs for RCC patients with stage IV and histological grade of G4, Lasso regression and Cox multivariate analyses were performed (Figure 2a-c). The results showed that nine lncRNAs, including KIAA2012, CCNT2-AS1, ITPKB-AS1, TBX2-AS1, NUTM2A-AS1, LINC02522, LINC02384, LINC01559, and LINC00865, were selected to construct the prognostic model.
According to the expression of these nine lncRNAs for the overall survival prediction of RCC patients with stage IV and histological grade of G4, we established a risk score of the 9-lncRNA signature with the following formula: Risk score = (KIAA2012-AS1 × 95.28813027) + (CCNT2-AS1 × 2.637995925)+ (ITPKB-AS1× -51.44559677)+ (TBX2-AS1 × 1.102500797)+ (NUTM2A-AS1 × 3.340712729)+ (LINC02522 × 7.500647683)+ (LINC02384×-0.515463704)+ (LINC01559×-0.284528539)+ (LINC00865 × 2.265026437). We further validated the predictive power and stability of the 9-lncRNA signature in predicting the overall survival of RCC patients with stage IV and histological grade of G4 based on the TCGA cohort. Patients were divided into high-and low-risk score groups according to the median score. The Kaplan-Meier survival curve revealed that the survival time of the lowrisk score group was significantly higher than that of the high-risk group (p < 0.01, Figure 3a). Moreover, the time-dependent ROC curves of the 9-lncRNA prognostic signature in the first two years were plotted. Surprisingly, the AUC of the 9-lncRNA prognostic signature risk score in the first and second year was 0.991 ( Figure 3b) and 0.99 (Figure 3c), respectively, indicating that this signature had a powerful capacity in predicting the overall survival of RCC patients with stage IV and histological grade of G4. The distribution of the risk scores and survival status of each patients were shown in Figure 3d, e, and demonstrated that as the risk score increased, the number of patient deaths also increased. In addition, the heatmap of the expressions levels of nine lncRNAs was presented in Figure 3f, and revealed that patients with high-risk score tended to have high expressions of risk lncRNAs, and patients with low-risk score tended to have high expressions of protective lncRNAs.

Pathway enrichment for nine-lncRNA signature
To investigate the signaling pathways associated with the risk score signature, GSEA was conducted between high and low-risk groups based on the TCGA cohort. The results showed that GLYCOSAMINOGLYCAN_BIOSYNTHESIS_H-EPARAN_SULFATE, RNA_POLYMERASE, HOMOLOGOUS_RECOMBINATION, CELL_CYCLE were positively correlated with the risk score (Figure 4a). In the low-risk score groups, the enriched pathways focused on the KEGG_ARACHIDONIC_ACID_METABOLISM, KEGG_CYTOKINE_CYTOKINE_RECEPTOR_I-NTERACTION, KEGG_DRUG_METABOLIS M_CYTOCHROME_P450 (Figure 4b). These enriched pathways implied that the 9-lncRNA signature was correlated with the progression of RCC.
To further clarify the mechanism of this signature regulating the progression of RCC, 408 coexpressed genes of the nine lncRNAs were filtered and minimum required interaction score greater than 0.7 were executed to build PPI network by STRING (Figure 4c). By using the MCODE plugin in Cytoscape software, the most important module comprising 23 genes was identified (Figure 4d). These key gene were enriched to crucial signaling pathway though metascape and were significantly enriched in cell division and cell cycle, cytokines, and DNA damage, embodying in mitotic cell cycle process and cell cycle-phase transition, cell division, mitotic spindle organization, cell cycle, cytokines, meiotic chromosome segregation, DNA replication, signal transduction in response to DNA damage and so on (Figure 4e). These data indicated that the nine lncRNAs might exert vital roles in participating in progression of RCC through regulating cell division, cell cycle, DNA damage, and cytokines levels.

Discussion
RCC is one of the most common malignancies of the urinary system. It originates from the urinary tubular epithelial system of renal parenchyma and accounts for 80%-90% of renal malignancies [18].
Recently, due to smoking, changes in diet and other reasons, the incidence of RCC has gradually increased [2]. Despite the improvements in RCC diagnosis and management observed during the last two decades, the survival of RCC is still unsatisfactory. Herein, we summarized the clinical factors of RCC and identifying the predictive prognosticator, which were conducive to improving the treatment of such diseases. Besides, an accurate prognostic signature was constructed by this study, which will help clinicians to make better and more objective judgment on the prognosis of RCC patients. Initial analysis of RCC data from SEER database revealed that sex, age, grade, stage, surgery, chemotherapy, radiation, tumor size, and marital status were the independent prognostic factors for RCC patients. In addition, we also found that high and low levels of these factors were significantly correlated with survivals of patients with RCC, suggesting that these factors play important roles in clinician's prediction of patient survival. Moreover, because RCC itself was not sensitive to radiotherapy and chemotherapy, thus few patients received the two treatment approaches. We also noted that tumor grading and staging were significantly associated with the prognosis, supporting the previous literature [19]. Furthermore, the histological classification of undifferentiated group and pathological stage IV had the greatest prognostic risks, hinting us that constructing a high stage and grade-related model would play  [20]. Some lncRNAs and lncRNA signatures have been considered as prognostic indicator in a variety of cancers. In RCC, although numerous lncRNA signatures were constructed to be associated with the prognosis [21][22][23], few of them correlated an lncRNA signature with the clinical outcome in the high stage and grade of RCC. In present study, pathological stage IV and histological grade of G4 associated lncRNAs were obtained according to the TCGA cohort. By Lasso analysis and Cox univariate and multivariate analysis, finally, a 9-lncRNA signature, including, KIAA2012, CCNT2-AS1, ITPKB-AS1, TBX2-AS1, NUTM2A-AS1, LINC02522, LINC02384, LINC01559, and LINC00865, was constructed with the potential to predict the prognosis for patients with high stage and grade of RCC. Thus, the risk score model was established based on the signature of these nine lncRNAs. What's more, the Kaplan-Meier survival curve showed that patients in high-risk score group had a worse overall survival than that in low-risk score group. ROC curves also indicated that this 9-lncRNA signature may be served as a better prognostic biomarker. All these results confirmed the predictive power of the prognostic signature for patients with high stage and grade of RCC. Among the nine lncRNAs, LINC01559 was reported as a member of a 6-lncRNA signature that is an noninvasive biomarker of RCC [24]. NUTM2A-AS1 was found to be highly expressed and linked to the prognosis and prognosis in gastric cancer, non-small cell lung carcinoma, hepatocellular carcinoma, suggesting its potential role in other cancers [25][26][27]. These reports pointed the important roles of LINC01559 and NUTM2A-AS1 in tumor progression and are consistent with our present results. The remaining lncRNAs are poorly reported in tumors and deserves to be analyzed further.
Regarding the results of GSEA, we found that several pathways, such as cell cycle, glycosaminoglycan biosynthesis heparin sulfate, homologous recombination, RNA polymerase were positively associated with risk scores whereas arachidonic acid metabolism, cytokine receptor interaction, drug metabolism cytochrome P450 were negatively correlated with risk scores. In addition, PPI and metascape analyses found that cell cycle, cell division, DNA damage, and cytokines levels were enriched by analyzing the pathway enrichment of nine lncRNAs-coexpressed genes. Cell cycle is one of the most important regulatory mechanisms of cellular growth and proliferation. Generally, dysregulation of this pathway is thought to be the first step in carcinogenesis of RCC [28]. A previous study also revealed that glycosaminoglycans may serve as potential biomarkers in RCC [29]. Besides, the metabolism of arachidonic acid by either the cyclooxygenase or lipoxygenase pathway is believed to play an important role in tumor promotion [30]. Microtubule Interacting and Trafficking Domain containing one has been reported to participate in cytokinesis of cell division, and high level of it has been found to trigger a worse survival of RCC, correlate with CD8 + T cell and mediate immune response and cytokine- cytokine receptor signalings [31]. Joon Chae et al. have determined that alteration in the DNA damage response pathway correlates with increased recurrence in locally advanced clear-cell RCC, hinting exploration of therapeutic agents targeting this pathway for treating these patients [32]. Therefore, the activation of the above mechanisms and pathways may lead to tumorigenesis and progression of RCC. These data further emphasized the importance of the identified lncRNAs in the present study.
However, there were some limitations to the present study. Firstly, incorporated cases were a small number that were insufficient for establishing an ideal signature model for predicting survival in patients with pathological stage IV and histological grade of G4 of RCC. Secondly, this signature is warranted to be verified in the further study through our institute collected cases or experimental analyses in vitro or in vivo. Further, mechanism of action behind this signature regulating RCC progression will be worth depth exploring.

Conclusion
Taken together, the present study analyzed the effect of clinical features on the survival of RCC based on a large population samples that obtained from SEER database. Furthermore, a 9-lncRNA signature for predicting survival in patients with pathological stage IV and histological grade of G4 of RCC was constructed by analyzing the RNA-Seq expression profile in TCGA cohort. The risk score model was a potential prognostic biomarker for patients with high stage and grade of RCC. The results of pathway enrichment reflecting cell division and cell cycle-related signaling also deciphered the potential mechanism of its regulating the progression of RCC. Our results may provide a good predictive performance in the further treatment of RCC with pathological stage IV and histological grade of G4.

Disclosure statement
No potential conflict of interest was reported by the author(s). Highlights 1. A nine-lncRNA signature was built to predict survival in patients with pathological stage IV and histological grade of G4 of RCC. 2. Independent prognosticators were identified to provide favorable evidence for clinical prediction. 3. Discriminating prognostic-associated lncRNA signature and enrichment analysis contributes to uncover the pathogenesis for RCC patients with stage IV and grade of G4.