Identification of lncRNA biomarkers in lung squamous cell carcinoma using comprehensive analysis of lncRNA mediated ceRNA network

Abstract Long non-coding RNAs (lncRNAs) act as a member of competing endogenous RNAs (ceRNAs) and plays a significant role in tumorigenesis. The aim of this study was to identify potential lncRNA biomarkers for predicting the prognosis of lung squamous cell carcinoma (LUSC) using a comprehensive analysis of lncRNA mediated ceRNA network. Differentially expressed RNAs datasets were obtained using edge R package in 502 LUSC tissues and 49 adjacent non-LUSC tissues from the Cancer Genome Atlas (TCGA). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to identify functional enrichment implication of lncRNA related differentially expressed mRNAs. Survival analysis was used Kaplan-Meier curve method. Univariate and multivariate Cox regression analysis were performed to construct a predictive model with lncRNA biomarkers. A total of 2185 lncRNAs, 170 miRNAs and 2053 mRNAs were differentially expressed between LUSC tissues and adjacent non-LUSC tissues. The novel constructed ceRNA network incorporated 184 LUSC-specific lncRNAs, 18 miRNAs, and 49 mRNAs. About 11 of 184 differentially expressed lncRNAs and 1 of 18 differentially expressed miRNAs and 5 of 49 differentially expressed mRNAs were conspicuously related to overall survival (p < .05). Univariate and multivariate cox regression analysis showed that 6 lncRNAs were retrieved to construct a predictive model to predict the overall survival in LUSC patients. In conclusion, CeRNAs contributed to the progression of LUSC and a model with 6 lncRNAs might be potential biomarker for predicting the prognosis of LUSC.


Introduction
Lung cancer is the most leading cause of cancer-related death worldwide [1]. Lung squamous cell carcinoma (LUSC) accounts for about 30% of non-small cell lung cancer (NSCLC) and causes 400,000 deaths annually worldwide [2]. LUSC still has a poor prognosis, with 40% of 5-year survival rate for TNM stage II and <5% for TNM stage IV [3]. Recently, the prognostic biomarkers for LUSC patients is still lacking [4,5]. Therefore, it is urgently needed to identify effective and promising biomarkers in predicting the prognosis of LUSC.
Long non-coding RNAs (lncRNAs) are defined as transcripts longer than 200 bp without protein-coding potential [6]. lncRNAs may play significant roles in varieties of biological functions during the tumorigenesis and metastasis of cancer [7]. In recent years, lncRNA-miRNA-mRNA ceRNA network has been demonstrated to involve in lung cancer [8], cervical squamous cell carcinoma [9], pancreatic adenocarcinoma [10], head and neck squamous carcinoma [11], breast cancer [12], colon adenocarcinoma [13], clear cell renal cell carcinoma [14], ovarian cancer [15], and so on. MiRNAs have been highly evolutionarily conserved and can interact with the target mRNA to silence the gene [16,17]. In molecular biology, competing endogenous RNAs (ceRNAs) regulate other RNA transcripts by competing for shared miRNAs. Because the lncRNAs and mRNAs share the common miRNA response elements (MREs), a complex network of lncRNA-miRNA-mRNA is becoming essential for the regulation of RNA expression [18]. In detail, lncRNAs act as "molecular sponge" directly or indirectly bind to miRNAs competitively, which would eventually result in the weakness of interaction between miRNAs and mRNAs [19].
Recently, six ceRNAs (PLAU, miR-31-5P, miR-455-3P, FAM83A-AS1, MIR31HG and MIR99HG) have been demonstrated to be significantly associated with overall survival of LUSC using the Cancer Genome Atlas (TCGA) database [8]. However, they only obtained prognostic ceRNAs and were not performed to construct a predictive model in their study. In this present study, we firstly constructed a ceRNA network to identify LUSC-specific ceRNAs using RNA sequencing data from the TCGA database. Thereafter, we established and validated a novel combined model to predict the prognosis of LUSC patients.

Patients and TCGA data collection
We used the TCGA data portal (https://portal.gdc.cancer.gov/) to retrieve RNA transcription data of 551 tissues patients with LUSC including cases of 502 tumor tissues and cases of 49 adjacent non tumor tissues. The clinical feature of LUSC patients is shown in Table 1. The clinical feature of LUSC patients including age, gender, race, pathology stage and pathology T, N, M stage. The TCGA dataset is publically available, including more than two types of gene expression data, which intend to improve the early diagnosis and therapy of cancers. Our present study was consistent with the publication guidelines of TCGA database (http://cancergenome.nih. gov/publications/publicationguidelines). Because the RNA sequencing data were obtained directly from the TCGA, no ratification was required from the Ethics Committee.

RNA sequence data retrieval
We used the TCGA data portal GDC (https://portal.gdc.cancer. gov/cart) to download the RNA transcription data (level 3) of 551 LUSC patients. The RNA and miRNA transcription data from 551 LUSC patients were obtained from the Illumina HiSeq_RNA-Seq and Illumina HiSeq_miRNA-Seq sequencing platforms; all the data were retrieved freely. Five hundred and fifty-one patients were divided into two cohorts, including cases of 502 tumor tissues and cases of 49 adjacent non tumor tissues. In this research, we used the Perl and R language for analyzing and coping with RNA sequencing data.

Identification of differential expression mRNAs, miRNAs and lncRNAs in LUSC
We used the Ensembl database (http://www.ensembl.org/ index.html, version 89) to identify mRNAs and lncRNAs. In this research, we excluded mRNAs and lncRNAs that were not included in the Ensembl database. We obtained the differentially expressed mRNAs, miRNAs and lncRNAs by using the "edge R" packages in R software with absolute fold change (log 2) >2 and adjusted the false discovery rate (FDR) to a p values <.01 to correct the statistical significance of multiple experiments. For the obtained DEmRNAs, DEmiRNAs and DElncRNAs, we used the gplots and heat map packages to draw volcano maps and heat maps.
Construction of lncRNA-miRNA-mRNA regulatory network in LUSC To better comprehend the relationships between differentially expressed mRNAs, miRNAs and lncRNAs, the lncRNA mediated ceRNA network in LUSC was constructed. The ceRNA network was based on the theory that by using shared microRNA response elements, lncRNA can act as miRNA sponges to bind miRNAs competitively to modulate the expression of the mRNAs. We constructed the ceRNA network by following the steps below. First, we used the miRCode database (http://www.mircode.org/) to predict relationships between lncRNAs and miRNAs. Next, miRTarBase [20] (http://mirtarbase.mbc.nctu.edu.tw/), miRDB [21] (http:// www.mirdb.org/) and TargetScan [22] (http://www.targetscan. org/) databases were performed to obtain miRNA targeted mRNAs. We showed miRNA-targeted mRNA in only three databases to establish a lncRNA-miRNA-mRNA network to improve the effectiveness of our results. Finally, Cytoscape (http://www.cytoscape.org/) 3.6.1 software was used for visualizing map the results.

Survival analysis and construction of lncRNA related predictive model
The survival R packages (https://CRAN.Rproject.org/package= survival, Version:3.51-3) were performed for survival analysis of all differentially expressed RNAs in the ceRNA network. We identified the lncRNAs associated with overall survival (p < . 05) to act as prognostic lncRNA signature candidates to subject to multivariate Cox regression analysis. On the basis of the median risk score, LUSC patients were divided into two cohorts, including high risk cohorts and low risk cohorts. Receiver operating curve were used for testing the effect of the lncRNA signature (high risk vs. low risk) on overall survival. We performed the receiver operating curve by calculating the area under the curve (AUC) under the binomial exact confidence interval to reveal prognostic biomarkers for predicting survival in LUSC.

Functional enrichment analysis of differentially expressed mRNAs in LUSC
To figure out function represented in the gene profile, the annotation, visualization and integrated discovery (DAVID) database was used for analyzing the functional enrichment of lncRNA related differentially expressed mRNAs in ceRNA network by GO and KEGG pathway analysis. We used KOBAS database to carry out for those differentially expressed mRNAs in the network. In the GO analysis, a p values of <.05 was considered as statistically significant. Furthermore, The GOCircle and GOChord plotting functions of the GOplot R package are used to allow data including analysis from expression analysis and data obtained from functional annotation enrichment analysis. In the KEGG pathway analysis, a p values of <.05 was used regarded as statistically significant. We use "cluster profile R" package to conduct KEGG pathway analysis. Meanwhile, in order to provide a visible graphic that represented the interactions of DEmRNAs and relative KEGG pathway, the gene pathway network was established by using Cytoscape3.6.1.
The prognostic lncRNA-miRNA-mRNA ceRNA Sub-network in LUSC To figure out the regulatory mechanism between prognostic lncRNA, miRNA and mRNA, we extracted prognostic differentially expressed RNAs to construct a sub-network using Cytoscape3.6.1.

Statistical analysis
All statistical analysis was carried out R software (version 3.5.1). We used the survival R packages (https://CRAN.R-project.org/package=survival,Version:3.51-3) to conduct survival analysis. The multivariate cox regression analysis was used for identifying DElncRNA in ceRNA network who can act as a prognostic factor to predict the early diagnosis and prognosis of LUSC. p < .05 was considered statistically significant unless otherwise stated.

Construction of a lncRNA mediated ceRNA network in LUSC
To better comprehend the role of differentially expressed lncRNAs, miRNAs and mRNAs in LUSC, we constructed a lncRNA mediated ceRNA regulatory network to elucidate the interaction between these differentially expressed RNAs. First, we applied 4012 differentially expressed lncRNAs searched from the miRCode database and displayed 699 pairs of interplay lncRNAs and miRNAs using the Perl language. We identified that 18 of the 170 retrieved DEmiRNAs could interplay with 184 differential expression lncRNAs. The representative interactions between lncRNAs and miRNAs in LUSC patients is shown in Table 5. Subsequently, the 820 mRNAs which were targeted by 18 DEmiRNAs were predicted performing miRTarBase [20] (http://mirtarbase.mbc.nctu.edu.tw/), miRDB [21] (http://www.mirdb.org/) and TargetScan [22] (http:// www.targetscan.org/). The representative relationships between miRNAs and mRNAs in LUSC patients is shown in Table 6. The results showed that 49 mRNAs were incorporated in the construction of lncRNA mediated ceRNA network (Figure 2(A)). Eventually, we constructed the lncRNA mediated ceRNA network of LUSC, including 184 DElncRNAs, 18 DEmiRNAs, 49 DEmRNAs, as shown in Figure 2(B). A flow diagram of ceRNA network construction in LUSC is shown in Figure 3.

Identification of survival-related RNAs in ceRNA network in LUSC
To elucidate the lncRNAs, miRNAs and mRNAs with prognostic characteristic, we evaluated the association between lncRNAs, miRNAs, mRNAs expression and overall survival in LUSC patients by using survival R packages. The results showed that a total of 11 out of 184 DElncRNAs were significantly associated with overall survival (p < 0.05) (

Construction of predictive model of six-lncRNAs in LUSC patients
Univariate cox regression analysis was first performed to identify prognosis associated with lncRNAs in LUSC, incorporating 18 lncRNAs that were conspicuously associated with overall survival (p < 0.05). Next, multivariate cox regression was used, and showed that six lncRNAs were eventually  selected to construct a predictive model. We used the linear combination of the expression of the 6 lncRNAs to construct the predictive model. The relative coefficient which were weighted in the multivariate cox regression were as follows: survival risk score ¼ (0.1200 Â expression value of ADAMTS9-AS2 þ 0.2166 Â expression value of AC011483.1 þ 0.1142 Â expression value of TTTY16 þ 0.0988 Â expression value of AC006238.1 þ (À0.0699) Â expression value of LINC00462 þ (À0.0940) Â expression value of CACNA2D3-AS1). Multivariate cox analysis in Table 7.

Risk groupings and ROC curve analysis
As is shown in the heat-map, expression of six prognostic lncRNAs were profiled ( Figure 5(A)). Based on the median risk scores, a total of 495 samples of complete survival information were divided into a high-risk group (n ¼ 247) and a low risk group (n ¼ 248). We used Kaplan-Meier curve with a Log-rank statistical examine to perform survival analysis. As is shown in the Figure 5(B), patients in the low risk group had conspicuously better overall survival than in the high-risk group ( Figure 5(B)). Receiver operating characteristic (ROC) curve was analyzed for testing the influence on the 6-lncRNA signature associated with overall survival in LUSC ( Figure 5(C)).

Enrichment analysis of functional implication of differentially expressed mRNAs signature in LUSC
To elucidate the biological function of lncRNA related differentially expressed mRNAs, we performed enrichment analysis. Gene ontology analysis showed that there were 5 GOs with statistical difference (p < 0.05) and the highest GO biological process was:" GO:0005515 protein binding" (Figure 6(A)). Figure 6(B) shows the interaction between statistically difference top 30 mRNAs and their associated GO terms. The top 5 GO terms were shown in Figure 6(C). Furthermore, as we can see in the Figure 7, The KEGG pathway analysis was used the "cluster profile R" packages. Figure 7(A,B) showed that the top 10 pathways that mRNAs were most enriched in KEGG database. The KEGG pathways enriched by mRNAs are outlined in Table 8. The top 14 enriched KEGG pathways were used to construct "pathway-gene" network for KEGG visualized analysis (Figure 7(C)). p Values <0.05 was considered as statistically significant (Figure 7).

Prognostic DElncRNA mediated ceRNA in sub-network
To figure out the interaction between prognostic lncRNAs, miRNAs and mRNAs in LUSC, we analyzed the prognostic DElncRNA mediated ceRNA network by using cytoscape3.6.1.
As is shown in the Figure 8, only 6 prognostics differentially expressed lncRNAs, 15 differentially expressed miRNAs and 23 differentially expressed mRNAs were showed in the DElncRNA-mediated ceRNA network.

Discussion
In recent years, with the increasing numbers of advanced diagnosis and poor prognosis in lung squamous cell carcinoma, it is pivotal to find more effective prognostic biomarkers to predict survival in LUSC. LncRNA related studies have attracted the attention of varieties of cancer fields. Accumulating researches show that cancer-related lncRNAs may serve as diagnostic or predictive biomarkers of cancer and also have a significant effect on the therapeutic of cancer [23]. However, the RNA sequencing data analysis associated with lncRNA profiles in LUSC is still lacking [24]. Jing et al. have established lncRNA-miRNA-mRNA network in LUSC by using differentially expressed lncRNAs, miRNAs and mRNAs, and the ceRNA hypothesis [25]. Only a few researches have showed that the lncRNA mediated ceRNA network can identify prognostic lncRNA biomarkers in LUSC [26]. Accumulating evidence showed that lncRNAs act as a pivotal component ceRNA family to regulate other RNA transcripts in lncRNA mediated ceRNA network [27]. LncRNA XIST functioned as a ceRNA by regulating miR-367/141-ZEB2 axis to promote TGF-B induced epithelial-mesenchymal transition lncRNA-miRNA interaction was predicted by miRcode; lncRNAs that were not associated with DEmiRNAs were removed. mRNAs targeted by miRNA using miRDB, miRTarBase and TargetScan databases; mRNAs that were not DEmRNAs were removed; ceRNA network was constructed.
in non-small cell carcinoma [28]. LncRNA XIST was identified to act as ceRNA directly bind to miR-137 to facilitate NSCLC cell proliferation and invasion [29]. lncRNA OGFRP1 modulated the expression of LYPD3 by sponging miR-124-3p to contribute to the progression of NSCLC [30]. The upregulation of SNHG6 regulated E2F7 expression by sponging miR-26a-5p to facilitate the development of lung adenocarcinoma. The upregulation of long non-coding RNA SNHG6 was associated with overall survival in lung cancer progression [31]. The lncRNA SNHG15 acts as a ceRNA to regulate YAP1-Hippo signaling pathway through binding to miR-200a-3p in papillary thyroid carcinoma [32].
Recent studies showed that the roles of lncRNA in tumorigenesis and metastasis can indicate the lncRNA may function as a novel biomarker for the diagnosis and prognosis of cancer [33]. lncRNA TUBA4B has been reported that it may serves as a newly predictor for prognosis and modulate the cell viability in non-small cell lung cancer [34]. lncRNA AFAP1-AS1 may acts as an oncogenic to facilitate the migration of non-small cell lung cancer (NSCLC) [35]. LncRNA SNHG1 may acts as an oncogene through ZEB1 signaling pathway by suppressing Tap63 in LUSC [36]. lncRNA SNHG1 functions as an oncogene by inhibiting miR-101-3p to activate the Wnt/ß catenin signaling pathway in NSCLC [37]. The ceRNA hypothesis has been raised as a newly regulatory mechanism functioning through miRNA competition [17]. However, an integrated analysis of LUSC associated lncRNA-miRNA-mRNA network in a whole genome-wide, especially based on high throughput sequencing with a large-scale patient size, is still lacking. Therefore, it is pivotal to find the regulatory mechanism and biologic function of lncRNA-miRNA-mRNA ceRNA network in LUSC.
In the ceRNA network constructed in this article, we found that the low expression of the lncRNA ADAMTS9-AS2 is closely related to 10 miRNAs, including hsa-mir-96, hsa-mir-31, hsa-mir-182, hsa-mir-183, hsa-mir-205, hsa-mir-137, hsamir-372, hsa-mir-301b, hsa-mir-373 and hsa-mir-122. Our study found that a potential mechanism by which ADAMTS9-AS2 competes with hsa-mir-96 to regulate the expression of PRKCE, SLC1A1, DDAH1, PRDM16. However, several researches found that lncRNA ADAMTS9-AS2 was associated with the development of gliomas [38], colorectal cancer [39], gastric cancer [40], ovarian cancer [41]. ADAMTS9-AS2/miR-223-3P/TGFBR3 axis suppresses development of lung cancer [42]. Based on these findings, ADAMTS9-AS2 may be involved in the invasion and migration of LUSC. Our research also found that the lncRNA AC011483.1 may act as a sponge to bind to hsa-mir-338 competitively to regulate the expression of ZWINT. Hsa-mir-338 may act as a cancer suppressor gene to inhibit the epithelial-mesenchymal transition and metastasis in human non small cell lung carcinoma [43][44][45], which is in agreement with our results. In the present study, ADAMTS9-AS2/has-mir-182/RECK axis and ADAMTS9-AS2/hsamir-182/CITED2 axis may construct a ceRNA network were closely to play a significant role in the progression of lung squamous cell carcinoma. Hsa-mir-182 upregulation greatly contributes to lung squamous cell carcinoma [46], which is the same as our results. A previous study shows that TTTY16  may act as a prognostic signature to cluster the patients with LUSC [47]. In addition to ADAMTS9-AS2, TTTY16, we also found that several lncRNAs in the ceRNA network were closely associated with overall survival in LUSC patients. Six lncRNAs had a significant effect on survival. With the risk increased, some lncRNAs expression were changed obviously. From the survival analysis, ADAMTS9-AS2 and TTTY16 had been previously described as a novel prognostic biomarker and a therapeutic target for predicting the prognosis of NSCLC [42,47]. The remaining four lncRNAs have been identified to be associated with the diagnosis and prognosis of LUSC at the first time, which can provide a significant insight to explore novel prognostic biomarkers in the treatment of LUSC in the future.
Because of heterogeneity between patients, conventional prognostic systems often make inadequate predictions of risk groupings and estimates of clinical outcomes. Therefore, in recent years, many studies have been performed to construct a novel predictive model to predict survival in LUSC [26,48]. Compared with previous studies, our study for the first time obtained the differentially expressed mRNAs, miRNAs and lncRNAs using edge R package and used the miRcode database to predict the interactions between lncRNA and miRNA; Next, we performed the miRTarbase, Targetscan and miRDB to obtain miRNA targeted mRNAs. Finally, the lncRNA mediated ceRNA (lncRNA-miRNA-mRNA) network was constructed in LUSC. Then, univariate and multivariate cox regression analysis was performed to obtain prognostic lncRNA biomarkers in LUSC. In this study, we obtained 6 lncRNA signature that predicted survival in LUSC. These 6 lncRNAs were identified to construct a predictive model to predict the early diagnosis and prognosis of LUSC. Survival analysis based on linear combination of 6-lncRNA signature demonstrated that the high-risk cohorts had worse survival rate than the low risk cohorts. p ¼ 0 was considered as most statistical significance compared to p < .05. The 6 lncRNA signature may performed well in the progression of LUSC.
In the present study, the GO analysis showed that the differential expression mRNAs were mostly enriched in protein binding, cytoskeleton, vasculogenesis, plasma membrane and lens fiber cell apoptotic process. The KEGG pathway analysis demonstrated that mostly differentially expressed mRNAs were enriched in cell adhesion molecules, cell cycle, complement and coagulation cascades, ECM-receptor interaction, protein digestion and absorption, p53 signaling pathway, drug metabolism-cytochrome P450, Arachidonic acid metabolism, Malaria and ABC transporters.  Our study subjects were retrieved from TCGA database, which indicate a significant tool for analyzing prognostic biomarkers, it is not known whether our results are applicable to our groups. The predictive prognostic lncRNA signature needs to be verified by molecular biologic experiments on clinical samples in future studies. Eventually, a large-scale samples and experiments studies are supposed to validate the biologic function of the prognostic six lncRNA in LUSC.
In conclusion, our study has identified differentially expressed lncRNAs, miRNAs and mRNAs that potentially predict the progression of LUSC by using the lncRNA, miRNA, and mRNA transcriptome sequencing data from TCGA database to construct a ceRNA network. The lncRNA identified in our constructed ceRNA network may play a pivotal role in the diagnosis and prognosis of LUSC patient.