Identification of candidate blood biomarkers for the diagnosis of septicaemic melioidosis based on WGCNA

Abstract Melioidosis is an infectious disease caused by Burkholderia pseudomallei (Bp), a gram-negative bacillus. Sepsis is the most prevalent type of melioidosis. Due to factors such as lack of precision and slow presentation of bacterial culture tests, the misdiagnosis rate could exceed 100 per cent. Therefore, more reliable, and adaptable diagnostic methods are urgently needed. Weighted gene co-expression network analysis (WGCNA) was employed to screen the featured modules specially expressed in sepsis patients caused by Bp. Two representative co-expression modules were selected to perform gene ontology(GO) and KEGG analysis using ClusterProfiler package based on R language. We found that antigen processing and presentation of exogenous peptide antigen via MHC class I pathway, cytosol to ER transport and cell killing related pathways enriched in darkmagenta module which significantly correlated with the sepsis caused by Bp. Eventually, a diagnostic 6-mRNA signature consisting of ASPHD2, LAP3, SEPT4, FAM26F, WARS and LGALS3BP was identified, which could discern the sepsis caused by Bp compared with other organisms. This will provide a new insight in screening markers for early detection of sepsis caused by Bp, and the interaction between pathogens and hosts. This should shed light on the early detection of Bp-caused infectious diseases.


Introduction
Melioidosis is an infectious disease caused by Burkholderia pseudomallei, a kind of Gram-negative bacillus. Cases have been reported in different regions around the world, including Thailand, China, and Australia etc. In China, it is mainly distributed in provinces of Hainan, Guangdong and Guangxi. People infected with B.pseudomallei are those who have been exposed mainly through inhalation or skin wounds to contaminated soil and water. Sepsis is the most prevalent type of melioidosis and MiRNAs are extensively explored in the diagnosis of sepsis as serum sepsis biomarkers [1]. The misdiagnosis rate could reach 80-90% due to factors such as lack of specificity and slow presentation of bacterial culture results [2]. The patients could be misdiagnosed as pulmonary infection, pulmonary tuberculosis, hepatitis, malaria, and general sepsis, etc. Compared with other infections, B. pseudomallei-induced sepsis has a high mortality rate. Therefore, better diagnostic tests are needed to improve the earlier diagnosis, initiation of appropriate therapy and increase the survival rate of sepsis caused by B. pseudomallei. The ideal biomarkers should be able to rapidly and specifically screen patients with sepsis caused by B. pseudomallei from those patients caused by other pathogens.
Microarray has been proved a useful biomedical tool in the identification of biomarkers in many aspects [3,4]. The immune system in the peripheral blood of people infected by pathogenic bacteria will change [5]. So the gene expression profile based on microarray could be applied to explore the dysregulation of genes caused by pathogen.
Weighted gene co-expression network (WGCNA) could be used to find the modules which include highly correlated genes, and the modules could be related to external traits [6]. WGCNA has already shown its advantage in screening tumour-related hub genes [7][8][9].
In order to explore the genes changed in the peripheral blood of the septicaemia patients caused by B. pseudomallei, 138 samples divided into four groups were performed bioinformatics analysis.

Data processing
The dataset (GSE:69528) used in this manuscript was downloaded from the NCBI Gene Expression Omnibus(GEO) with a total of 138 samples [10]. The labelled cRNA of all samples were hybridised to Illumina HumanHT-12 V4.0 expression BeadChip array. All the 138 samples were divided into four groups including Uninfected type 2 diabetes mellitus (U_DM), Uninfected healthy(U_H), patients with sepsis caused by B. pseudomallei(SB) and patients with sepsis caused by other pathogens(OS). The processed data was downloaded for the next analysis.

Construction of weighted gene co-expression network and identification of modules related to external traits
The genes with similar expression profiles could be grouped in one module by WGCNA, and the genes commonly participated in one biological process or pathway. WGCNA is designed to be an unsupervised analysis method which could group genes based on their expression profiles [11,12]. WGCNA package is accessible for free which can be used to find the modules of highly correlated genes [6].

Functional enrichment analysis
To explore the enrichment results of the genes within each module identified by WGCNA, the functional enrichment analysis of the biological process (GO:BP) and the Kyoto Encyclopaedia of Genes and Genomes (KEGG) analysis [13][14][15] was done using the Clusterprofiler package [15]. In our analysis, the cut-off value for p-value and q-value was set to 0.05 and 0.2, respectively. Because of the redundancy of enriched GO terms, the simRel method was used to do the redundancy analysis between GO terms [16]. The final result was visualised by R language based on REVIOG [16][17][18][19].

Data processing
The series expression matrix data of 138 samples were downloaded from NCBI. The four groups of U_DM, U_H, SB and OS contain 27, 28, 29 and 54 samples, respectively. According to the request of input files for WGCNA, the expression file and corresponding traits file was generated respectively.

Construction of weighted gene co-expression network and identification of modules related to external traits
The adjacency matrix for a scale-free topology network was defined by choosing the soft threshold power 9 which is performed using the pickSoftThreshold function in WGCNA as shown in Figure 1. This value directly affected the construction of the module and the division of genes within the module. This was the lowest value closest to the scale-free network. In our study, the dynamic tree cut method was employed to identify these kinds of genes with similar expression patterns as well as their relevant biological processes and pathways. Next, the modules were clustered according to their eigengenes which stands for their correlations. Here, the cut height of 0.2 is chosen to merge the similar modules (Figure 2(A)). The dendrogram was generated using hierarchical clustering, in which the short vertical line corresponded a gene and the branches corresponded the coexpressed genes (Figure 2(B)).

Correlation between modules and clinic traits
Genes sharing the similar expression pattern (co-expressed gene) were clustered into the same module. The modules which highly correlates with sepsis caused by B. pseudomallei mainly were picked out based on the correlation between MEs (module eigengenes) and external traits as shown in Figure 3. Darkmagenta and darkgrey modules were identified significantly associated with septicaemia caused by B. pseudomallei. Module darkmagenta and darkgrey were positively and negatively correlated with SB respectively.

Functional enrichment analysis
The genes in dark magenta and dark grey modules were selected to perform the GO:BP and KEGG enrichment analysis using ClusterProfiler package, which was shown in Figure 4. According to the similarity of the GO terms got from ClusterProfiler, the GO term ID and p-value was selected to perform the REVIGO analysis with the SimRel method. Then the results were visualised using R language as shown in Figure 5. The child items belonging to the same parent item   are merged and represented by the most significant item. For dark magenta module, the enriched GO terms included antigen processing and presentation of exogenous peptide antigen via MHC class I, cytosol to ER transport, regulation of amino acid metabolism and multi-organism process, canonical Wnt signalling pathway, cell killing and cytolysis (Figures 4(A) and 5(A)). And the enriched KEGG pathway in dark magenta module included NOD-like receptor signalling pathway, Epstein-Barr virus infection, herpes simplex infection, antigen processing and presentation, primary immunodeficiency, allograft rejection (Figure 4(C)). For dark grey module, the enriched GO:BP terms included gas transport, the regulation of protein catabolism and haemoglobin metabolism (Figures 4(B) and 5(B)). The enriched KEGG pathways included mitophagy (Figure 4(D)). Response to interferon-gamma pathway was enriched from the redundancy analysis results ( Figure 5(A)).
The selection of candidate biomarkers related to sepsis caused by B.pseudomallei R. Pankla et al. identified 37 transcript candidate diagnostic biomarkers from sepsis caused by melioidosis compared with by other organisms [20]. In our study, WGCNA was used to find the genes related closely to sepsis caused by B. pseudomallei. Two modules were identified. Six genes including ASPHD2, LAP3, SEPT4, FAM26F, WARS and LGALS3BP which has been identified in both methods as the candidate biomarkers as shown in Figure 6.

Discussion
The burden of disease caused by melioidosis is seriously underestimated all over the world. There are approximately 165,000 cases of melioidosis in the world each year, of which 89,000 are fatal, resulting in an annual death toll that is comparable to measles and much higher than dengue fever [21]. Melioidosis is a leading cause of community-acquired sepsis [22]. The rate of positive cultures of B. pseudomallei isolated from clinical specimens was low and time-consuming, which could delay diagnosis and treatment [23][24][25]. So, fast and accurate diagnostic methods of B. pseudomallei are urgently needed. Microarray data have demonstrated its application value in the peripheral blood of many diseases [26][27][28][29].
In order to identify the early blood biomarkers differentiating melioidosis from sepsis caused by pathogenes, the whole blood signature was used to perform WGCNA analysis in our study. Two modules were identified which closely correlated with sepsis caused by B. psedomallei. Next, the genes in dark magenta and dark grey module were selected to perform GO:BP and KEGG enrichment analysis. The enrichment results were integrated based on semantic similarity due to the redundancy of GO terms.
As one of two classes of Major histocompatibility complex(MHC), MHC class I present the peptide site including 8-10 amino acids for recognition by the appropriate T-cells. Contrary to MHC class I, MHC class II present antigens derived from extracellular proteins. According to the previous study, elevated mRNA expression related to MHC class II could discriminate septicaemic melioidosis from B. psedomallei [30][31][32]. So the detecting of the expression of the molecules may be conducive for the diagnosis of melioidosis [20]. In the present study, antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent (GO:0002479) enriched in dark magenta module. This process indicates an exogenous peptide antigen which has been degraded within the cell on its cell surface. In this process, the peptide transports from the cytosol to ER relying on TAP. Exogenous recombinant influenza A virus nucleoprotein has been shown to be processed for presentation of MHC class I antigens to CD8þ cells [32]. Although several mechanisms have been proposed to elucidate the processing of foreign proteins and MHC class I antigen presentation, the details remain unclear [30,31,33]. At the same time, cytosol to ER transport and ubiquitin-dependent protein catabolic process enriched in sepsis caused by B. psedomallei. All of the above showed the importance of MHC class I antigen presentation in the host response to melioidosis. Cell killing pathway enriched in sepsis caused by B. psedomallei too. Different from cell-autonomous death, cell killing refers to the induction of death of one cell by another cell, which is very important to the human defense system. This process is mediated and performed by natural killer cells (NK cells). NK cells are prone to kill the vulnerable cell which loses their MHC I due to infection or being cancer cells. The role of MHC class I in the host-pathogen interaction should be explored deeply which could promote vaccine development.
In our study, the response to interferon-gamma (IFNc) pathway enriched too. IFNc plays a vital in innate and adaptive immunity against bacterial and viral infections by inhibiting viral replication directly. Among the 6 genes overlapping with 37 candidate signatures screened by Damien Chaussabel, LAP3 and WARS participated in IFNc pathway [20]. Recently, it was shown that the galectin 3 binding protein (LGALS3BP, also known as 90 K) is a virus-induced protein. It leads to IFN and pro-inflammatory production by binding with TRAF6 and TRAF3 complex as a scaffold [34]. In addition, it could suppress virus replication. It remains unclear how LGALS3BP works during bacterial infections. As far as we know, there has been no reports of ASPHD2 gene. But it has been suggested that it is one of the components of NK cell membrane protein.
Early diagnosis of sepsis infected by B. psedomallei could save treatment time. The host's response to pathogens can be obtained by analysing the blood genomic transcripts. And the specific biomarkers could be identified.
Our research provides a new perspective for the screening of markers for early diagnosis of sepsis caused by B. pseudomallei, and the interaction between pathogens and hosts. It will provide a basis for the early diagnosis of diseases infected by B. pseudomallei.