Single nucleotide polymorphism mutation related genes in bladder cancer for the treatment of patients: a study based on the TCGA database

Abstract Bladder cancer (BLCA) is a common malignancy and has a poor prognosis. Single nucleotide polymorphisms (SNPs) in genes are closely associated with the tumorigenesis and tumor development and yet are not well illustrated in BLCA. In this study, we downloaded SNP‐related data and transcriptome profiling of BLCA from the Cancer Genome Atlas (TCGA) database and identified high frequency mutation genes using Maftools package and differentially expressed genes (DEGs) between BLCA and normal bladder tissues by the Limma package. Then, the overlapping genes between high frequency mutation genes and DEGs were obtained by the Venn diagram tool. These overlapping genes were analyzed by Gene Ontology (GO) and Kyoto Gene and Genome Encyclopedia (KEGG) pathway enrichment analysis, protein–protein interaction (PPI) network construction, survival analysis and drug–gene interaction analysis. As a result, 33 overlapping mutant genes were obtained, which were enriched in multiple KEGG pathways (e.g. cellular senescence) and GO items (e.g. muscle organ development, costamere and actin binding). A significant gene module was constructed by PPI network analysis and included 12 hub genes (SYNE1, DMD, ATM, EP300, ANK2, LAMA2, FAT1, SRRM2, MACF1, TSC1, VCAN and RXRA). Moreover, ATM, RXRA, TSC1, DMD, EP300, LAMA2 and VCAN were drug targets. These findings provide important bioinformatics and theoretical basis for understanding the pathogenesis of BLCA and exploring the detailed mechanisms of mutant genes in the disease.


Introduction
Bladder cancer (BLCA) ranks as the 10th most common malignancy worldwide, with 549,393 new cases diagnosed and 199,922 deaths in 2018 [1] . The incidence and mortality rates in men are 9.6 and 3.2 per 100,000 respectively, and the rates are 4-fold higher in men than in women [1,2]. Thus, BLCA ranks higher among men, and it is the sixth most common cancer and ranks as the ninth leading cause of cancer death [1]. Cigarette smoking is proved to be the major risk factor for BLCA [3]. Most BLCA is diagnosed after the appearance of macroscopic hematuria in patients [4]. Macroscopic hematuria is correlated with the advanced stage of BLCA [4,5]. However, there are no effective methods for BLCA diagnosed at the early stage [6]. In addition, current therapies, mainly including transurethral resection, immunotherapy and chemotherapy, can improve the outcome of BLCA patients diagnosed at the early stage but have no significant effect on improving the prognosis of BLCA patients at the advanced stage [7,8]. Hence, it is necessary to fully understand the molecular mechanisms of the pathogenesis of BLCA.
Single nucleotide polymorphisms (SNPs) are the most common type of variation in genetic sequences [9]. Accumulating evidence suggests that SNPs are closely related to the occurrence and development of tumors [10]. SNPs in promoter regions can alter transcription factor binding, histone modifications, promoter activity and DNA methylation to affect gene expression [11,12]. For instance, SNP of TERT in the promoter region frequently occurs in the progression of BLCA [13]. Mutations in the RB1, FGFR3, TP53 and PIK3CA genes have been observed in BLCA [14].
Variants in the CASP9 and EPHX1 genes are related to BLCA survival [15]. Therefore, SNPs have been regarded as potential biomarkers for tumor targeted therapy and early diagnosis [16]. However, SNPs in BLCA have not been extensively elucidated [17].
High-throughput sequencing technique provides an important method to investigate the molecular mechanisms of the pathogenesis of tumors and to search for potential therapeutic targets and diagnostic biomarkers [18]. The Cancer Genome Atlas (TCGA) is a comprehensive and important database for cancer genomics research [19]. The datasets in TCGA include genetic variants, miRNA, DNA methylation, transcriptome-wide expression and clinical information [19]. Thus, our study used TCGA database to perform highthroughput sequencing analysis to identify SNPs in BLCA. We downloaded the SNPs data and transcriptome-wide expression of BLCA from TCGA, identified the significant SNPs and differentially expressed genes (DEGs), carried out functional enrichment analysis, protein-protein interaction (PPI) network construction, survival analysis as well as drug-gene interaction analysis. These analysis processes facilitate finding gene mutations associated with the occurrence and development of BLCA, and provide new insights for BLCA treatments.

Data resources and analysis
The SNP data of BLCA were downloaded from TCGA database (https://portal.gdc.cancer.gov/). 'Masked Somatic Mutation' data were chosen from four types of data files and performed using the VarScan software (version 2.4.0, http://varscan.sourceforge.net/). Based on the TCGA database, the raw data of mRNA expression in FPKM (fragment per kilobase of exon model per million mapped reads) format and the corresponding clinical information were obtained using the UCSC Xena website (https://xenabrowser.net/datapages/), including 404 BLCA tissues and 19 normal bladder tissues. All data from TCGA in the present study are public and available, and consequently there was no ethical conflict to declare.

High frequency mutation genes analysis
The SNP-related data in maf (mutation annotation format) were processed and analyzed by the Maftools package (version 2.6.0, https://bioconductor.org/packages/release/bioc/html/maftools.html) [20] in R language to obtain the mutated genes. The genes with more than 10% of mutation frequency were chosen to draw a waterfall plot.
Adjusted p value <0.01 and j log 2 FC (fold change) j > 1 were set as the cut-off criterion to identify DEGs. p-Values were adjusted to reduce the false positive rate using Benjamini and Hochberg false discovery rate method by default, which is the reason for choosing adjusted p value.
Next, the overlapping genes between DEGs and high frequency mutation genes were analyzed using Venn diagram tool (http://bioinfogp.cnb.csic.es/tools/ venny/), which were used for further analysis.

Functional enrichment analysis
The above overlapping genes were performed by the cluster Profiler package (Version 3.16.0, https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) [21] in R language to perform GO (Gene Ontology, http://geneontology.org/) and KEGG (Kyoto Gene and Genome Encyclopedia, https://www.genome.jp/kegg/ kegg_ja.html) pathway analysis. The top 10 items in MF (molecular function), CC (cellular component), and BP (biological process) categories of GO as well as KEGG pathways were selected, which were used to plot bubble maps. Adjusted p value <0.05 and count !2 were designated as cut-off criteria. Count represents the number of genes that are enriched in each item.

PPI network construction and module analysis
PPI network construction was used to illustrate the interaction networks of protein complexes. The STRING (Search Tool for the Retrieval of Interacting Genes, version 3.2.0, https://string-db.org/) online tool was utilized to construct PPI network of the above overlapping genes. An interaction score of PPI pairs !0.15 was considered statistically significant. Thereinto, the significant gene module was visualized by the MCODE (Molecular Complex Detection, Version 1.4.2, http://apps.cytoscape.org/apps/MCODE) [22] plug-in of Cytoscape software with MCODE scores !5 as a selection criterion.

Validation of the hub genes and survival analysis
Based on the transcriptome profiling of BLCA from TCGA database, the expression levels of the genes in the hub module were validated by the GEPIA website (Gene Expression Profiling Interactive Analysis, http:// gepia.cancer-pku.cn/). In addition, the BLCA patients were divided into a low-expression group and a highexpression group based on the median of the gene expression. The corresponding survival analysis of the genes was also processed by the GEPIA website.

Drug-gene interaction
The genes in the hub module were analyzed by the DGIdb (Drug-Gene Interaction database, version 2.0, http://www.dgidb.org/) to identify the interaction between drug and gene. The drugs approved by the FDA (Food and Drug Administration) were chosen as the potential drugs. The drug-gene interaction network was visualized by the Cytoscape software.

Analysis of mutation genes in BLCA
After the mutated genes in the SNP data were identified by the Maftools package, the obtained mutations were classified based on different protein-coding situations, among which missense-mutations had the largest fraction ( Figure 1A). Figure 1B indicated that the most frequently occurring variation type was SNP compared with deletion (DEL) or insertion (INS). In addition, the most common SNV (single nucleotide variants) was C > T in BLCA ( Figure 1C). In each sample, the altered base number of different mutation types was shown in Figure 1D. In each mutation type, the number of the altered bases in different samples was listed in Figure  1E. Next, the top 10 mutated genes, including TTN, TP53, MUC16, KMT2D, ARID1A, KDM6A, SYNE1, PIK3CA, RB1 and HMCN1, are shown in Figure 1F.
Furthermore, we obtained 197 mutated genes with more than 5% of mutation frequency. Besides, 34 mutated genes with more than 10% of mutation frequency were chosen to draw a waterfall plot ( Figure 2).

Identification of DEGs and overlapping of DEGs with mutated genes
After the differential expression analysis, a total of 3180 DEGs were obtained between BLCA and normal bladder tissues, including 579 up-regulated DEGs and 2601 down-regulated DEGs. The expression of DEGs was shown in a heatmap and a volcano map (Figure 3), and their information is listed in Supplemental Table S1. Next, the overlapping genes between DEGs and genes with more than 5% of mutation frequency was analyzed using Venn diagram tool. We obtained 33 overlapping genes for further analysis (Figure 4).

Functional enrichment analysis
The GO and KEGG pathway analysis of the overlapping genes indicated that 8 KEGG pathways were enriched, including cellular senescence, non-small cell lung  cancer, platinum drug resistance, cell cycle, microRNAs in cancer, bladder cancer, Cushing syndrome, and human papillomavirus infection ( Figure 5A). The results obtained 108 significant BP terms (e.g. muscle organ development, regulation of release of sequestered calcium ion into cytosol by sarcoplasmic reticulum, and lymphocyte differentiation), 23 significant CC terms (e.g. costamere, contractile fiber part, and myofibril), and five significant MF terms (e.g. histone methyltransferase activity (H3-K4 specific), actin filament binding, and actin binding), and the top 10 significant items in BP and CC as well as five significant MF items are shown in Figure 5B-D.

PPI network construction and module analysis
PPI network was applied to illustrate the protein interaction of the overlapping genes by using the STRING online tool, containing 33 nodes and 153 edges ( Figure 6A). Moreover, the MCODE was used to visualize the significant gene module, which contained 12 nodes and 34 edges ( Figure 6B). The hub genes in the module consisted of SYNE1, DMD, ATM, EP300, ANK2, LAMA2, FAT1, SRRM2, MACF1, TSC1, VCAN and RXRA (Table 1). Among them, SYNE1 was closely associated with other node proteins (degree ¼ 19), and had the highest mutation percent.

Hub gene expression and survival analysis
The expression and survival analysis of the above hub genes were validated by the GEPIA website using the data and clinical information of BLCA from TCGA database. The results confirmed that SYNE1, DMD, ANK2 and LAMA2 were down-regulated in BLCA tissues compared with normal control tissues ( Figure 7A-D). Additionally, BLCA patients with low expression of LAMA2 showed a relatively high survival rate as compared to the patients with high expression of LAMA2 ( Figure 7E). The down-regulated expression of LAMA2 in BLCA seems to contradict the worse prognosis of patients with high expression of LAMA2, and therefore we speculated that LAMA2 might combine other factors to affect the overall survival of BLCA.

Comparative analysis
BLCA patients are frequently diagnosed at the advanced stage due to lack of effective screening methods [4][5][6]. Despite the development of therapies, the patients still have a poor prognosis [7,8]. A comprehensive understanding of the molecular pathogenesis of BLCA is essential for finding new insights in the diagnosis and treatment of BLCA. Since SNPs play an important role in tumorigenesis and tumor development [10], identification of SNPs is a good direction for providing potential biomarkers for the diagnosis and treatment of BLCA [16]. Thus, we attempted to identify potential biomarkers related to SNPs using bioinformatics methods based on TCGA database.
In the present study, we first downloaded SNP data of BLCA from the TCGA database and identified high frequency mutation genes using Maftools package. Then, the mRNA expression data from TCGA database were processed to obtain DEGs between BLCA and normal bladder tissues by the Limma package. Next, Venn diagram tool was used to identify the overlapping genes between DEGs and high frequency mutation genes. The overlapping genes suggested that SNPs in 33 genes were related to their expression at the transcription level. Subsequently, these overlapping mutant genes were analyzed by GO and KEGG pathway enrichment analysis. These genes were  enriched in eight KEGG pathways, including Cellular senescence, Non-small cell lung cancer, platinum drug resistance, cell cycle, microRNAs in cancer, bladder cancer, Cushing syndrome, and human papillomavirus infection. For example, cellular senescence is a state of irreversible cellular proliferation arrest in response to different stresses [23]. The key proteins (e.g. p16Ink4a/ retinoblastoma and p53/p21) in the senescence pathway play critical roles in senescent-associated processes and cell-cycle arrest [24]. Thus, the overlapping mutant genes can mediate various pathways to regulate the progression of BLCA. In addition, the genes are also enriched in BP (e.g. muscle organ development and lymphocyte differentiation), CC (e.g. costamere, contractile fiber part and myofibril), and MF (e.g. actin filament binding and actin binding) items of GO. BLCA is often classified as non-muscle-invasive bladder cancer and muscle-invasive BLCA [7], which indicates that the muscle organ development of BP is important for the development of BLCA. Costameres are critical for the myofibril's cytoskeletal anchor as well as mechanoregulated signaling [25]. Actin-binding proteins are closely related to tumor metastasis [26]. Hence, the overlapping mutant genes can be involved in different cellular processes, functions and molecules of BLCA.
To further explore the potential roles of the overlapping mutant genes, the protein interaction of the genes was established by PPI network analysis, and then the hub genes were selected to construct the significant gene module by MCODE. The hub genes contain SYNE1, DMD, ATM, EP300, ANK2, LAMA2, FAT1, SRRM2, MACF1, TSC1, VCAN and RXRA. Then, the GEPIA website was used to confirm the down-regulated expression of SYNE1, DMD, ANK2, LAMA2 in BLCA. Moreover, BLCA patients with low expression of LAMA2 showed a relatively high survival rate compared to patients with high expression of LAMA2. The down-regulated expression of LAMA2 in BLCA seems to contradict the worse prognosis of patients with high expression of LAMA2. Therefore, we speculated that LAMA2 might combine other factors to affect the overall survival of BLCA, which is one of our future research directions. Previous reports have also indicated that SYNE1, ATM, EP300, FAT1, MACF1, TSC1, VCAN and RXRA mutations occur in BLCA [27][28][29][30][31][32][33][34]. SYNE1 mutations have been reported to be implicated in motor neuron disease and muscular disorder [35]. Mutated ATM gene in cancer may lead to poor prognosis and chemotherapy resistance [36]. EP300 mutation in BLCA contributes to antitumor immunity [29]. FAT1 mutations suppress the growth and metastasis of diverse tumor cells [37,38]. Mutation in MACF1 can lead to myopathy genetic disease and its dysregulation is associated with cancer, schizophrenia and osteoporosis [39]. TSC1 mutations can regulate mTOR signaling pathway to be involved in diverse cancers [40,41]. VCAN has been shown to be a potential prognostic biomarker in cancers [33,42]. Mutant RXRA in BLCA is found to stimulate urothelial proliferation [34]. This study for the first time identified DMD, ANK2, LAMA2 and SRRM2 mutations in BLCA. DMD in human genome is the largest genetic locus [43], and its deletions characterize meningiomas patients with poor outcome [44]. ANK2 variant is mainly related to arrhythmia phenotypes [45]. LAMA2 is linked with muscular dystrophy [46]. SRRM2 is associated with Parkinson's disease and papillary thyroid carcinoma [47,48]. However, the specific roles and mechanisms of these hub genes in BLCA are not well illustrated, and therefore this is a good research direction for studying the pathogenesis of BLCA.
To investigate the interaction between drugs and hub genes, DGIdb was utilized to identify the potential drug targets, including ATM, RXRA, TSC1, DMD, EP300, LAMA2 and VCAN. It is reported that ATM mutations in BLCA have the potential to respond to platinum-based chemotherapies [28,36]. In addition, 23 drugs were chosen as potential candidates, such as EVEROLIMUS, ETEPLIRSEN, and ACITRETIN. EVEROLIMUS as an oral mTOR-inhibitor has widely applied in various cancers such as BLCA [49]. ETEPLIRSEN can restore DMD's translational reading frame to promote dystrophin production, which can be used for the treatment of Duchenne muscular dystrophy [50]. ACITRETIN is known as a drug for severe psoriasis treatment [51]. The targeting relationship of hub genes and drugs will provide the appropriate therapeutic methods for BLCA patients. Our work has some limitations. Particularly, we did not stratify the SNP to the subtypes of BLCA, and yet this is one of our research directions in the future.

Conclusions
Our study respectively identified the mutant genes and DEGs in BLCA and obtained the overlapping genes of mutant genes with DEGs. The overlapping genes indicated that SNPs in 33 genes were associated with their expression at the transcription level. In addition, the overlapping genes were enriched in multiple KEGG pathways and GO items to affect the progression of BLCA. Moreover, PPI network construction found that 12 hub genes (SYNE1, DMD, ATM, EP300, ANK2, LAMA2, FAT1, SRRM2, MACF1, TSC1, VCAN and RXRA) showed an obvious interaction. Furthermore, ATM, RXRA, TSC1, DMD, EP300, LAMA2 and VCAN could be considered as the drug targets. These findings provide important bioinformatics and theoretical basis for understanding the pathogenesis of BLCA and exploring the detailed mechanisms of mutant genes in the disease.

Disclosure statement
The authors state that there are no conflicts of interest to disclose.

Authors' contributions
Tengyun Long and Xiaoni Li designed the study, supervised the data collection, analyzed the data, Guofei Zhang and Chunming Qiu interpreted the data and prepared the manuscript for publication, Ouyang Huang, Canbiao Sun and Yong Yang supervised the data collection, analyzed the data and reviewed the draft of the manuscript. All authors have read and approved the manuscript.

Data availability statement
The data support the findings of this study are available on reasonable request from the corresponding author.