Identification of hub genes with prognostic values in multiple myeloma by bioinformatics analysis

ABSTRACT
 Purpose Multiple myeloma (MM) is a malignant disease with abnormal proliferation of clonal plasma cells. Hypoxia is an important factor in the pathogenesis and development of MM. However, the underlying mechanisms are not fully understood. Material & Methods To determine hub genes related to hypoxia in MM, this study took integrated bioinformatics analysis with two expression datasets (GSE80140 and GSE80545) downloaded from Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were filtrated under the condition of both p-value < 0.05 and [log2FoldChange (log2FC)] > 1. Then, gene ontology (GO) and Kyoto encyclopedia of genes and genomes enrichment (KEGG) analysis, and protein–protein interaction (PPI) network construction were utilized to further explore these DEGs. PrognoScan evaluated all the candidate hub genes for survival analysis. Results In total, three hub genes, including FH, TSTA3, and POLR3G, were screened out to be related to hypoxia in MM. Patients with the lower expression level of FH, TSTA3, and POLR3G have statistically significantly longer disease- specific survival (Cox p < 0.05). Conclusion We identified FH, TSTA3, and POLR3G as hub genes which can affect MM patients’outcome and new biomarkers for diagnosis and prognosis of MM. Further functional and mechanistic studies are need to develop in order to make them as potential target for clinical treatment.


Introduction
Multiple myeloma (MM) is a hematological malignancy with about 1% morbidity and mortality of all tumors. In many countries, it has been the second most common malignant disease of the blood system [1]. Recently, there are data showing that the incidence of MM in Asian countries is increasing rapidly [2]. The characteristics of MM mainly include anemia, bone pain, and renal failure, leading to death eventually [3]. With the great progress of treatments including immunoregulatory drugs, proteasome inhibitors, histone deacetylase inhibitors, and monoclonal antibodies in the past few decades, the survival period of MM has been significantly extended, but it still cannot be completely cured [4,5]. As a result, identifying new biomarkers is particularly essential for early diagnosis, effective treatment, and better prognosis of MM.
The hypoxic bone marrow (BM) microenvironment is an important factor for the occurrence and progression of MM. It is mainly manifested in the following aspects: (1) hypoxia reduces the adhesion of tumor cells and accelerates tumor development process [6,7]; (2) hypoxia promotes the expression of vascular endothelial growth factor A, and then neovascularization in the BM is accelerated [8]; (3) hypoxia upregulates the expression of glycolysis-related genes such as lactate dehydrogenase A in MM cells, and the process of cellular glycolysis is significantly enhanced [9]; (4) MM cells interact with BM stromal cells in the hypoxic microenvironment, and some clones enter a dormant state and then have drug resistance [10].
In recent years, high-throughput platforms in gene expression (GE) is more and more important as a method to explore pathogenesis, prognosis prediction, and potential new target drugs of various diseases, including tumor [11]. To date, several chip analyses have revealed the pathogenesis of MM [6,12,13]. However, no chip analysis on hypoxia in MM is conducted, and related critical genes and signaling pathways remain to be clarified. So integrated bioinformatics analysis, including various methods, is utilized to explore hub genes related to hypoxia in MM.
In this study, two microarray datasets (GSE80140 and GSE80545) were downloaded from National Center for Biotechnology Information-Gene Expression Omnibus database (NCBI-GEO) (https://www.ncbi.nlm.nih.gov/ geo/). We picked out differentially expressed genes (DEGs) and further analyzed these DEGs' functions and pathway enrichment. Survival analysis was conducted with the online clinical data, and the results showed that expression levels of FH, TSTA3, and POLR3G were statistically significantly related to MM patients' outcomes. These analysis results provided important information for further research in MM.

Microarray data information and DEG identification
The two GE datasets GSE80140 and GSE80545 were downloaded from NCBI-GEO [14]. Both GSE80140 and GSE80545 were generated with Agilent-039494 Sure-Print G3 Human GE v2 8 × 60K Microarray 039381. GSE80140 compared four myeloma cell lines (RPMI8226, KMS-11, KMS-12BM, and MM.1S) cultured under hypoxia (1% O 2 ) with those cultured under normoxia (20% O 2 ) for 48 h, respectively. GSE80545 profiled the GE in four primary myeloma patient samples exposed to normoxia versus those exposed to normoxia for 48 h. R package 'limma' was utilized to screen out DEGs between normoxic and hypoxic groups [15]. Genes satisfied with p-value < 0.05 and [log2FoldChange (log2FC)] > 1 were picked out as DEGs. Then, all DEGs were visualized by volcanic maps, and the top 50 dramatically altered genes were selected to draw heatmaps by R package 'ggplot2' [16]. Wayne diagram produced by the Venny 2.1.0 (http://bioinfogp.cnb.csic.es/tools/venny/index.html) online tool was used to show the common genes.

Gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) analysis
DAVID (https://david-d.ncifcrf.gov/) was used to conduct GO enrichment analysis and KEGG pathway analysis. GO enrichment includes a biological process (BP), molecular function (MF), and cellular component (CC) as three subontologies. Analysis results were displayed with the 'GOplot' package of R [17]. Pathway analysis results were further analyzed with ClueGO plug-ins of Cytoscape software 3.7.2 [18]. A p-value less than 0.05 was considered the cut-off criterion.

Protein-protein interaction (PPI) analysis
A PPI network was established using the STRING database (https://string-db.org) in order to evaluate the proteins' interrelation encoded by selected genes. Cytoscape software 3.7.2 was used to visualize the genes with a minimum interaction score of more than 0.4.

Survival analysis
The correlation between DEG expression and survival rate in MM was analyzed with the PrognoScan database (http://dna00.bio.kyutech.ac.jp/PrognoScan/) [19]. We researched the relationship between GE and MM disease-specific survival (DSS) based on the GSE2658 dataset (n = 559) provided by Zhan [20]. Cox p-value < 0.05 was treated as statistically significant.

Identification of DEGs in MM
According to the standard of p-value < 0.05 and [log2FC] > 1, 760 and 1499 DEGs were identified using 'limma' R package from GSE80140 and GSE80545 (Figure 1(a)), respectively. The top 50 significant DEGs were used to depicted the heatmaps ( Figure  1(b)). We next utilized Wayne diagram and identified 119 up-regulated genes and 67 down-regulated genes as common to the two datasets, respectively ( Figure 1(c)).

GO and KEGG enrichment analysis
We performed GO and KEGG analysis with candidate DEGs using a functional annotation tools in DAVID in order to further explore their functions. As shown in Figure 2(a), the DEGs were examined according to three subontologies, including BP, MF, and CC, in GO enrichment analysis. Histone H3-K9 dimethylation pathway (p = 1.78E-04, FDR = 0.071474), catalytic activity pathway (p = 6.95E-04, FDR = 0.09272), and nucleoplasm pathway (p = 5.47E-06, FDR = 9.74E-04) were selected as the most significant pathway in each subontologies, respectively (Table 1). Fortyeight candidate DEGs were enriched in these three terms for further analysis. KEGG pathway enrichment analysis result showed that metabolic pathways (p = 1.73E-04, FDR = 0.016285) were the top enriched pathway according to their p-values, and there were 25 DEGs involved in this pathway (Figure 2(b)).
As a result, 48 and 25 DEGs were selected in the GO and KEGG enrichment analysis, respectively (Table 2). Then, we screened out eight common genes with overlapping DEGs selected in the two analyses above. They were TSTA3, FH, FLAD1, PNPO, CAD, POLR3G, SCLY, and POLR3K.

PPI network construction
In order to further understand the correlation of candidate DEGs and pick out the hub genes, we constructed the PPI network involving all the DEGs from the GO and KEGG enrichment analysis based on the STRING online database, respectively (Figure 3(a,b)). Then, we used Cytoscape plug-ins cytoHubba to screen all the candidate hub genes according to nodes rank ( Figure  3(c,d)). All the candidate genes were listed in Table 3. Then, we identified FH, POLR3K, TSTA3, POLR3G, and PNPO as common candidate hub genes in the two sets.

Discussion
Due to the complex unknown pathogenesis of MM, it remains an incurable hematologic malignancy which leads to high rates of mortality. The hypoxic microenvironment in BM is closely related to the occurrence, development, treatment, and prognosis of MM. Therefore, it is essential to explore hypoxia-related hub genes with prognostic values to provide potential biosignatures for diagnosis and follow-up treatment of MM.
In this study, we analyzed the GE in two microarray datasets based on myeloma cell lines and primary MM patient samples. About 119 up-regulated and 67 down-regulated common DEGs were identified with p < 0.05 and [log2FC] > 1 standard. Then, we utilized GO term enrichment, signaling pathway enrichment, and PPI network establishment to further explore these DEGs, and FH, POLR3K, TSTA3, POLR3G, and PNPO were identified as candidate hub genes.
Subsequently, survival analysis with PrognoScan based on clinical data of GSE2658 dataset revealed FH, TSTA3, and POLR3G were the hub genes associated with DSS of MM patients.
Fumarate hydratase (FH) is an integral enzyme component of the Krebs cycle and tricarboxylic acid cycle, and it can also act in different kinds of cancer [21]. Ha YS et al. [22] evaluated FH mRNA levels in 140 tumor specimens from the patients with primary renal cell cancer and in 62 specimens of corresponding normal-appearing kidney tissue. The results showed that FH was down-regulated in tumor specimens. Sudarshan S [23], with his team, demonstrated that reduced expression of FH was associated with HIF-2α accumulation and tumor progression. However, another study found that FH inhibition, together with elevated intracellular fumarate, coincides with HIF upregulation. Besides, fumarate could increase the expression level of HIF-regulated genes [24]. TSTA3, also known as GFUS (GDP-L-fucose synthase), is the substrate of several fucosyltransferases involved in the expression of many glycoconjugates [25]. Changes in glycoprotein structure were common in cancer cells that can affect the adhesion, invasion, and metastasis of cells. Yang J et al. [25] found high expression of TSTA3 was a potential biomarker for poor prognosis of esophageal squamous cell carcinoma patients. The same result was found in breast cancer, and the study further revealed that miR-125a-5p and miR-125b could regulate the expression of TSTA3 [26]. POLR3G (RNA polymerase III subunit G), a version of RNA polymerase III, which is enriched in stem and cancer cells [27]. Recently, Petrie JL et al. [28] demonstrated the inhibition effect of POLR3G on cell differentiation and down-regulated POLR3G significantly reduced in the proliferation, invasiveness, tumor-initiating activity, and viability of prostate cancer cells.

Conclusion
In summary, we identified DEGs by integrated bioinformatics analysis to find hypoxia-related hub genes with prognostic value in MM. The study results provide several crucial DEGs for further research on the molecular pathogenic mechanisms and biomarkers of MM so that patients can get target therapy and a longer survival period in the future. It's worth noting that different research results mentioned above showed that the same gene might express at different levels in different tumors, especially in hematologic malignancies. More molecular biological researches are still required to verify the DEGs' function in MM.

Disclosure statement
The authors declare that there is no conflict of interests.