Expression profile analysis reveals hub genes that are associated with immune system dysregulation in primary myelofibrosis

ABSTRACT Objection Primary myelofibrosis (PMF) is a familiar chronic myeloproliferative disease with an unfavorable prognosis. The effect of infection on the prognosis of patients with PMF is crucial. Immune system dysregulation plays a central role in the pathophysiology of PMF. To date, very little research has been conducted on the molecular mechanism of immune compromise in patients with PMF. Methods To explore potential candidate genes, microarray datasets GSE61629 and 26049 were obtained from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) between PMF patients and normal individuals were evaluated, gene function was measured and a series of hub genes were identified. Several significant immune cells were selected via cell type enrichment analysis. The correlation between hub genes and significant immune cells was determined. Results A total of 282 DEGs were found, involving 217 upregulated genes and 65 downregulated genes. Several immune cells were found to be reduced in PMF, such as CD4+ T cells, CD4+ Tems, CD4+ memory T cells. Gene Ontology (GO) enrichment analysis of DEGs reflected that most biological processes were associated with immune processes. Six hub genes, namely, HP, MPO, MMP9, EPB42, SLC4A1, and ALAS2, were identified, and correlation analysis revealed that these hub genes have a negative correlation with immune cell abundance. Conclusions Taken together, the gene expression profile of whole blood cells in PMF patients indicated a battery of immune events, and the DEGs and hub genes might contribute to immune system dysregulation.


Introduction
Primary myelofibrosis (PMF) is a chronic myeloproliferative disease (Ph-negative) that is a myeloproliferative neoplasm (MPN) with the main characteristics of a significantly enlarged spleen, enhanced oxidative stress, advanced bone marrow fibrosis and extramedullary haematopoiesis [1]. PMF usually occurs in middle-aged and elderly individuals, and the median age is approximately 67 [2]; its annual incidence is approximately 0.6∼1.3/ 100,000. Although PMF is uncommon, the median survival time of intermediate-risk patients is only approximately 5 years, and there is still a lack of specific treatments. The most important factors related to the prognosis of PMF are complications, such as thrombosis, haemorrhage, and infection [2][3][4]. Among them, infections are one of the main causes of unfavorable prognosis in myelofibrosis, representing the cause of death in approximately 10% of cases [6,7]. Autoimmune symptoms are a main characteristic of patients with PMF [5].
In general, patients with PMF have a pro-inflammatory milieu and dysregulated immune system phenomena. Immune function loss and dysregulation are potential reasons for the increased infection risk in PMF [8]. Currently, much progress has been made against the immune function of PMF patients. Researchers found that myelofibrosis is closely associated with high circulating levels of inflammatory cytokines [9,10]. A decreased frequency of Tregs was observed in the blood of PMF patients [11]; Clodagh et al. found that CD4 + CD127 low CD25 high FOXP3 + T regulatory cells were reduced in MPN patients [12]. These previous studies provide a deep understanding of the key factors of immunosuppression in PMF patients.
A clear pathogenesis is one basic condition of the clinical protocol for precise treatment. An example is the discovery of JAK2 (V617F) or MPL (W515) mutations. This mechanism has provided inspiration for novel therapeutics for PMF patients, such as JAK2 inhibitors (although most JAK2 inhibitors are effective in ameliorating some symptoms, they have little impact on bone marrow fibrosis [13]). However, to date, the molecular mechanism of infection for PMF patients is still difficult to pin down precisely. In this regard, exploring and elucidating the in-depth mechanisms of immune responses may offer some inspiration to reduce the occurrence of complications, ensure the quality of prognosis and develop targeted medicine for PMF. Rapidly developed omics and bioinformatics technologies have offered comprehensive insight into the molecular mechanisms of various clinical issues, and some satisfactory results may be obtained through bioinformatics analysis based on public databases [14].
In this study, comprehensive bioinformatics analysis based on two Gene Expression Omnibus (GEO) datasets was applied to explore the potential molecular factors in PMF. We hypothesize that the mechanism of immunosuppression may be associated with the proliferation and differentiation of haematopoietic stem cells; thus, expression profiles from the whole blood cells of PMF patients were chosen as the analytic target. The purpose of this study was to identify the hub genes with immunological effects in PMF to provide a potential research strategy of the molecular mechanism.

Affymetrix microarray data acquisition and quality control
Fig. S1 performs the whole workflow of this study. Expression profiles of GSE61629 and GSE26049 were downloaded from NCBI-GEO (https://www.ncbi.nlm. nih.gov/geo/). Pre-treatment and quality control procedures, including probe annotation, duplicates removing, data normalization and principal component analysis (PCA) were all executed by using R software (3.6.1).

Immune cell infiltration analysis
xCell [15], an online tool that can achieve cell type enrichment analysis based on gene expression data, was chosen to analyze the immune cell infiltration of PMF and control groups in GSE61629 and GSE26049 datasets. Results of xCell, that is, xCell scores, were obtained and carried out significance analysis to measure the remarkable immune genes, heatmap was used to visualized the cell type enrichment scores d. The comparison of common cell types from different datasets were performed through Venn diagrams.

Identification of DEGs
A normalized expression matrix was used to identify the DEGs. DEGs between PMF and control were identified via Limma package in R, the cut-offs of statistically significant DEGs were set to |LogFC| > 1 and P < 0.05. If LogFC > 1, DEGs can be regard as up regulated; if LogFC < −1, DEGs should be deemed to be down regulated.

Gene ontology (GO) and pathway enrichment analysis of DEGs
GO enrichment and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis were chosen to explore the DEGs at the functional level. The Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/summary.jsp) was used as the functional analysis tool. The cut-off criterion was set as P < 0.05. The result of GO with three terms was performed through enrich circle plot, and KEGG analysis was showed in a bubble plot.

Construction of PPI network
Search Tool for the Retrieval of Interacting Genes (STRING; http://string-db.org)(11.0b) was used to predict the PPI network of the common DEGs. The result of STRING was stored as TSV format and visualized in Cytoscape (3.7.2), an open source bioinformatic platform, to achieve the visualization of PPI network. MCODE, a plugin of Cytoscape, was used to excavate modules in the whole network.

Screening of hub genes
CytoHubba with a series of topological analysis methods was used to select hub genes, the rank of nodes was decided by their attributes in the whole network. In this study, total five selection methods of CytoHubba, involving MNC, stress, closeness, degree and radiality, were used to screen hub genes. The hub genes were selected with degrees ≥10. Venn diagram was used to obtain the intersection result of five sets that based on various taxonomy methods. The expression of six hub genes was performed by NetworkAnalyst 3.0 (https://www. networkanalyst.ca/). Human Protein Atlas was explored to ensure the specific expression tissue of hub genes.

Determination of the relationship between the immune cells and hub genes
Gene expression and xCell score matrixes were used to calculate correlation coefficients. The correlation heat map was based on the result of a built-in function in R, corr.test. The detail results of Pearson's correlation coefficient were performed by using scatter diagrams.
2.8. Construction of hub genes' miRNA regulatory and potential drug network The miRNet (https://www.mirnet.ca/miRNet/home. xhtml), a miRNA-centric network visual analytics platform, was chosen to predict the potential miRNAs of hub genes [16]. The latent drugs for hub genes were measured through the protein-drugs interaction analysis function of NetworkAnalyst 3.0, which based on Drugbank online database (https://go.drugbank.com/). All the prediction results were visualized by using Cytoscape software.

Affymetrix microarray data quality control
A total of 30 samples from GEO database were divided into control group (n = 17) and PMF group (n = 13). To remove the batch effect, each dataset was normalized. Box plots show that direct data become neat after normalization ( Fig. S2a, b, c and d). 3D-PCA results reflect that the PMF group and the control group are discriminated well; the greater the distance is between two symbol points (blue triangle and red circle), the more significant the differences between the samples (Fig. S2e, f).

Identification of significant immune cells in PMF
A total of 64 cell types were explored, and approximately 40% of them were immune cells. There was a markedly disparate pattern of infiltrated cell types between PMF and controls. The enrichment scores of CD4 + T cells, CD4 + Tem cells, CD4 + memory cells, and CD8 + Tem cells were decreased in the PMF group compared to the controls (Figure 1a). In total, 32 and 11 significant cell types were selected from the GSE26049 and GSE61629 datasets (t-test, P<0.05), respectively. Furthermore, we found eight common significant cell types from the two datasets (Figure 1b), including CD4 + T cells, CD4 + Tems, CD4 + memory T cells, CD8 + Tems, melanocytes, natural killer (NK) cells, preadipocytes and classical dendritic cells (cDCs).

Identification of DEGs
In the GSE61629 dataset, compared with the control group, a total of 383 DEGs were identified in the PMF group, involving 274 upregulated genes and 109 downregulated genes (Figure 1c). There were 674 DEGs in the GSE26049 dataset, consisting of 436 upregulated genes and 238 downregulated genes ( Figure  1d). The distinct DEG expression pattern successfully grouped the PMF and control samples (Figure 1e and f).
There were 282 DEGs shared by the two datasets, involving 217 upregulated DEGs and 65 downregulated DEGs (Figure 2a). The top 10 genes with the most significant expression changes are listed in Table S1.

Functional enrichment analysis of DEGs
Potential biological pathways of DEGs were illustrated through GO analysis and KEGG pathway enrichment. Three terms were analysed in GO enrichment, including biological processes, cell composition and molecular function. Figure 2b presents the top 20 most markedly enriched GO terms. The top 10 biological processes are marked as yellow modules, which are primarily associated with defence responses to bacteria, defence responses, haem biosynthetic processes, protoporphyrinogen IX biosynthetic processes, antibacterial humoural responses, cellular iron ion homeostasis, defence responses to fungi and so on. The most significant cell composition and molecular function terms were extracellular space and 2 iron, 2 sulfur cluster binding. Other GO analysis details are shown in Table S2.
KEGG pathway enrichment analysis was demonstrated by bubble diagram (Figure 2c). The analysis shows that the major pathway of DEGs is associated with the haematopoietic cell lineage (Fig. S3a), involving a series of genes, such as IL1A, GYPA, TFRC, ITGA2B, CD3G, CD24, IL7R and MS4A1 (Fig. S3b).

Protein-protein interaction (PPI) network analysis and hub gene identification
To explore the most significant clusters of DEGs, STRING was used to establish a PPI network of DEGs. Ten major modules were recognized by the MCODE plug-in of Cytoscape; among them, module 1, with 18 nodes and 144 edges, was the most significant ( Figure 3a). Five ranking methods of CytoHubba were chosen to identify hub genes in the whole network, and all related results were intersected via a Venn diagram. Finally, a total of 6 common hub genes (HP, MPO, MMP9, EPB42, SLC4A1 and ALAS2) were selected (Figure 3b and c). Hub gene details are listed in Table S3.
As shown in Figure 4a, the expression of HP was upregulated in the PMF groups compared to the control groups. A similar differential expression pattern was identified for MPO, MMP9, EPB42, SLC4A1 and ALAS2 between these two groups; that is, their expression was upregulated in the PMF groups. Basically, these hub genes were specifically expressed in bone marrow and lymphoid tissues (Table S3). A chordal graph was used to reflect a positive association between hub genes, and the most prominent relationship was found between MPO and HP, SLC4A1 and ALAS2, and SLC4A1 and

Association between hub genes and significant immune cells
We found a series of negative correlations between hub genes and significant immune cells (Figure 4d). The most remarkable correlations emerged among HP, MPO, and MMP9 with CD4 + T cells, with Pearson correlation coefficients reaching −0.8848, −0.8141 and −0.7452 (P<0.0001), respectively. The threshold of the Pearson correlation coefficient is set as 0.6. Other hub genes also showed certain negative correlations with immune cells, such as SLC4A1 and memory CD4 + cells; however, their relationship was not statistically significant ( Figure 5).

Construction of hub gene miRNA regulatory and potential drug networks
The miRNA regulatory network of the six hub genes was constructed using miRNet. Each hub gene has multiple relevant miRNAs, which form a complex communication network (Figure 6a). Some miRNAs are related to two or more targeted hub genes, including hsa-miR-143-3p, hsa-miR-34a-5p, hsa-miR-129-5p, hsa-miR-8055, hsa-miR-4716-5p and hsa-miR-147a. For the protein drug interaction analysis, eighteen potential drugs related to the hub genes were screened. Among the explored drugs, five were associated with MPO, eleven drugs targeted MMP, and two drugs were associated with ALAS2 (Figure 6b).

Discussion
In clinical practice, PMF has diverse clinical presentations, which reflect a complex regulatory network. Immune system dysregulation plays a central role in the pathophysiology of PMF [17], but the in-depth mechanism remains unknown. Fortunately, integrated bioinformatics tools have become incredibly powerful in extracting large-scale clinical data and gene expression profiles via public databases [18]. In this study, we found 282 common DEGs and identified six hub genes. To our knowledge, these six hub genes are the first to be reported in the pathogenesis of PMF.
GO enrichment analysis indicated that most biological processes were associated with immune processes (e.g. the defence response), involving the majority of identified DEGs. Infection is one of the most important complications that affects the prognosis of PMF patients. KEGG analysis of DEGs showed that the most significant pathway was the haematopoietic cell lineage. It is known that PMF is mainly caused by clonal haematopoietic stem cell abnormalities, and the most obvious characteristics of blood are anaemia (moderate to significant) and normal or abnormal numbers of white blood cells and platelets. The morphological manifestations of peripheral blood include leuco-erythroblastosis, a significant increase in abnormal red blood cells and teardropshaped red blood cells.
Blood cell development initiates from haematopoietic stem cells (HSCs). HSCs can differentiate into a multi-lineage committed progenitor cells, such as common lymphoid progenitors (CLPs) and common myeloid progenitors (CMPs). CMPs give rise to the myeloid lineage, including erythrocytes, some leukocytes and megakaryocytes that produce platelets; cells belonging to the lymphoid lineage (white blood cells, leukocytes, NK cells, T and B lymphocytes) are differentiated from CLPs. At each stage, multiple genes perform different functions within the differentiation procedure (Figure 6a). Haematopoiesis is orchestrated via a tightly regulated network. Haematopoiesis homeostasis is important, and the imbalance between cells, genes and the microenvironment could induce the relevant disruption to contribute to some haematological diseases, more specifically, myeloproliferative disorders [19,20].
Furthermore, we identified six hub genes from the PPI network, including HP, MPO, MMP9, EPB42, SLC4A1 and ALAS2. HP is an acute phase plasma protein that participates primarily in inflammatory reactions and has various functions, such as antibacterial, antioxidant, carbon monoxide inhibition, and angiogenesis functions [21]. HP is closely related to coronary heart disease, diabetes, hypertension, blood diseases, autoimmune diseases and malignant tumors [22], and it is regarded as a marker of inflammation in clinical cases. In general, an elevated HP level occurs during infections, injuries, and malignancies [23]. Our analysis shows that the expression of HP was increased in PMF patients, which could reflect the level of infection.
MPO is a key mediator of the function of immune and non-immune cells through its non-enzymatic effects. MPO binds to the cell surface of multiple cells, such as platelets [24], neutrophils [25,26], and erythrocytes [27,28], which can activate intracellular signaling processes and change cellular structural and functional characteristics. In this study, the expression profile showed an upregulated trend of MPO, indicating an increased level of halogen-containing compounds. However, the overproduction of halogen-containing compounds could lead to host cell and tissue damage, causing oxidative/halogenative stress responses and resulting in numerous inflammatory diseases [29,30]. The increased level of MPO could make a greater contribution to the progression of infection in PMF. Additionally, pathologic thrombus formation can be caused by an increased activity of blood platelets [31]. A previous study proved that an increased activity of platelets was affected by elevated levels of MPO [32]. Thus, thrombus, a common complication for PMF patients, is also related to the overexpression of MPO.
MMP9 belongs to the matrix metalloproteinase (MMP) family, and MMPs regulate tissue remodeling and repair, inflammatory responses, cellular proliferation, migration, differentiation, angiogenesis, defence response and so on [33]. Similar to other MMPs, MMP9 is mainly secreted by neutrophils and macrophages and regulates inflammation in tissues and diseases [34][35][36][37]. The expression of MMP9 is upregulated in some diseases associated with inflammatory processes, such as arthritis and cancer [38]. Gene expression of peripheral blood MMP9 was found to be upregulated in PMF patients in this study. Coincidentally, researchers also found that the plasma levels of total and active MMP9 were increased in patients with idiopathic myelofibrosis [39], and Jensen et al. found that the level of blood MMP9 was increased in polycythaemia vera patients [40]. The proteolytic properties of MMP9 may exacerbate disease progression under pathophysiological conditions [41], which may also indicate the role of MMP9 in the prognosis of PMF. Moreover, MMP9 affects blood cell development. Youn et al. reported that MMP9 is a negative regulator of erythroid development, and the expression level of MMP9 above the activation threshold is antagonistic to erythropoiesis [42]. We propose that the overexpression of MMP9 in patients with PMF may cause the abnormal erythrocyte development. The EPB42 gene encodes protein 4.2, which is a key protein of the erythrocyte membrane and an ATPbinding protein that regulates band 3 and ankyrin. A previous study found that EPB42 regulates the shape and mechanical properties of erythrocytes [43]. We found that the expression of EPB42 was markedly enhanced in the PMF group, which perhaps implies a potential effect of EBP42 on the erythrocyte function of PMF patients. This result could even reflect an association with immune function due to the negative correlation between EPB42 and some immune cells. However, there is still less research focusing on the in-depth function of EPB42, and whether EPB42 interferes with the immune function of patients with PMF needs future study.
Cl-/HCO 3 -anionic exchange transporter 1 in erythrocytes (band 3/eAE1) and anionic exchange transporter 1 in α-intercalated kidney cells (kAE1) are encoded by SLC4A1 [44]. The encoded proteins of SLC4A1 play an important role in the maintenance of red blood cell structure and the transport of anions [45]. A previous study reported that SLC4A1 has a direct effect on erythroid development [46]. Peripheral blood of patients with PMF may exhibit immature redness, increased myeloblasts, and abnormal red blood cell morphology due to the destruction of bone marrow fibrous sclerosis and microenvironment structure. We suspect that erythroid hyperplasia and abnormal red blood cell morphology are associated with the upregulated expression of SLC4A1 and EPB42 in PMF patients. Furthermore, according to several recent studies, SLC4A1 is reported to be closely associated with some immune-related diseases. Kawamoto et al. found that the production of auto-antibodies that bind to erythrocyte membranes could be stimulated through band 3 expression in colorectal cancer cells, which could cause immune-related anaemia [47,48]. Researchers also suggested that SLC4A1 has the potential to be a candidate gene in the pathogenesis of lupus and lupus nephritis, and SLC4A1 could stimulate innate immune cell functional activity [49].
5-Aminolevulinic acid synthase (ALAS) is encoded by ALAS2 and can catalyze the formation of 5-aminolevulinic acid from succinyl-coenzyme A and glycine, which is the first step in haemoglobin synthesis [50]. Increased haemoglobin synthesis could promote haematopoietic cell differentiation [51]. The upregulated expression of ALAS2 could cause enhanced haem production, haemoglobinization, and erythropoiesis [52]. In this study, we found that the expression of ALAS2 was enhanced in patients, reflecting that ALAS2, similar to EPB42 and SLC4A1, could affect the abnormally increased erythropoiesis in PMF [53], which may be related to ineffective erythropoiesis. Ineffective erythropoiesis is one of the main causes of anaemia and organomegaly. In addition, several studies revealed that haem-or haemoglobin-related genes were also expressed in nonerythrocyte cells, such as cervicovaginal epithelial cells and murine macrophages, in response to different stress conditions [54,55]. It is well established that haem is an important molecular factor for cellular physiological and metabolic functions. However, excessive free haem has toxic effects on cells [56]. In this study, we conjecture that the enhanced expression of ALAS2 reflects an increased level of haem, indicating a feasible reason for the immunological response in patients with PMF.
To prove our conjecture, the cell proportion of the dataset was measured, and the correlativity between hub genes and significant immune cells was determined. Most stromal cells were not significant, according to the t-test. We found that the proportions of some immune cells, such as CD4 + T cells, CD4 + Tem cells, CD4 + memory cells and CD8 + Tem cells, were markedly reduced in the PMF group. This situation suggests a common phenomenon in patients with chronic infections, that is, T cell exhaustion. Due to long-term exposure to persistent antigens and inflammation, exhausted T cells gradually lose their effector functions, and memory T cell characteristics begin to be lost [57]. Previous studies also support our findings; for instance, the frequency of Tregs was decreased in PMF patients [11], and CD4 + CD127 low CD25 high FOXP3 + T regulatory cells were reduced in MPN patients [12]. All of these findings suggest that T cell exhaustion, or T cell loss, plays a key role in the chronic inflammation process of PMF. However, cellular proportion analysis based on bioinformatics tools is still limited, and relevant experimental research is lacking. This hypothesis needs further verification. In addition, we think that some hub genes contribute to immune cell loss because there is a negative correlation between other hub genes and immune cells. A study proved that HP suppresses T lymphocyte responses to PHA and concanavalin A as well as B cell mitogenesis in response to LPS [58]. HP could prevent Langerhans cells from undergoing functional transformation and from activating autologous T cells [59]. Arredouani et al. offered evidence of the effect of HP on T lymphocyte functions and found that HP could specifically result in strong suppression-induced T cell proliferation. In addition, HP also inhibits Th2 cytokine release in vitro [60]. MPO is intimately related to T cell proliferation and responses. Researchers found that the enhanced generation of T cell responses was attributable to MPO deletion in experimental glomerulonephritis and multiple sclerosis [61,62]; MPO could also affect adaptive immunity via the inhibition of murine and human DC activation [63]. In tumors, MMP9 can support the development of myeloidderived suppressor cells (MDSCs) via c-kit and osteopontin cleavage [64], while MDSCs are negative regulatory cells that can inhibit the T cell immune response in cancer patients [65]. For the other three hub genes, there is still a lack of evidence about the relationship between them and T cells; thus, further research is needed.
To measure the regulatory relationships of hub genes at a deeper level, we produced a complicated miRNA network. Certain miRNAs with multiple targets were identified, such as hsa-miR-34a-5p. Hsamir-34a-5p could regulate MMP9, EPB42 and ALAS2, which implies that it could be regarded as a potential target. Interestingly, according to the hypothesis of Lu, hsa-miR-34a-5p could regulate T cell differentiation by affecting epigenetic progress [66]. Additionally, the T cell activation regulation function whereby hsa-miR-34a-5p mediates the WNT, Ras-ERK and NF-κB pathways has already been proven [67]. Therefore, the regulatory relationship between MMP9, EPB42, ALAS2 and hsa-miR-34a-5p may play a part in the immunological process of PMF patients.
In summary, the gene expression profile of whole blood cells in patients with PMF indicates a battery of immune events. The haematopoietic cell lineage could be a pivotal pathway for erythroid and immune cell differentiation in the progression of PMF. The decrease in immune cells could be related to the upregulation of hub genes. All of these findings suggest that the identified DEGs and six hub genes may contribute to immune system dysregulation.

Disclosure statement
No potential conflict of interest was reported by the author(s).