Blood transcriptomics to characterize key biological pathways and identify biomarkers for predicting mortality in melioidosis

ABSTRACT Melioidosis is an often lethal tropical disease caused by the Gram-negative bacillus, Burkholderia pseudomallei. The study objective was to characterize transcriptomes in melioidosis patients and identify genes associated with outcome. Whole blood RNA-seq was performed in a discovery set of 29 melioidosis patients and 3 healthy controls. Transcriptomic profiles of patients who did not survive to 28 days were compared with patients who survived and healthy controls, showing 65 genes were significantly up-regulated and 218 were down-regulated in non-survivors compared to survivors. Up-regulated genes were involved in myeloid leukocyte activation, Toll-like receptor cascades and reactive oxygen species metabolic processes. Down-regulated genes were hematopoietic cell lineage, adaptive immune system and lymphocyte activation pathways. RT-qPCR was performed for 28 genes in a validation set of 60 melioidosis patients and 20 healthy controls, confirming differential expression. IL1R2, GAS7, S100A9, IRAK3, and NFKBIA were significantly higher in non-survivors compared with survivors (P < 0.005) and healthy controls (P < 0.0001). The AUROCC of these genes for mortality discrimination ranged from 0.80-0.88. In survivors, expression of IL1R2, S100A9 and IRAK3 genes decreased significantly over 28 days (P < 0.05). These findings augment our understanding of this severe infection, showing expression levels of specific genes are potential biomarkers to predict melioidosis outcomes.


Introduction
Melioidosis is a severe infectious disease caused by Burkholderia pseudomallei, a Gram-negative bacterium and biothreat agent [1]. The disease is highly endemic in the tropics, particularly in Southeast Asia and northern Australia but reported cases are increasing globally. Melioidosis carries a mortality rate of 40% or higher in many endemic regions where resources are limited. This poor outcome from melioidosis has remained unchanged for many years [2,3]. Melioidosis is associated with several host factors, but diabetes is the major risk [4,5]. Pneumonia and bacteremia are the most common manifestations of disease; infections of these systems are frequently associated with septic shock and contribute to high mortality [2].
A comprehensive understanding of the individual response to infection is necessary to develop effective and targeted therapies. Additionally, biomarkers that predict outcome may be useful to guide patient management. Evaluation of the entire transcriptome of cells offers both the possibilities of characterizing pathways activated in disease and identifying potential biomarkers. In murine melioidosis, blood transcriptomic profiling reveals the regulation of many immune pathways, which reflect severity of disease [6] and can be used to identify a potential marker of acute lung infection [7]. Transcriptomic changes have been reported in human melioidosis during acute infection, highlighting the involvement of host immunity against infection [8]. Recent studies based on microarrays showed that blood transcriptional profiles can distinguish B. pseudomallei infection from sepsis caused by other microorganisms [9,10]. These studies suggest that these transcriptomic profiles may be useful in understanding the immune response during infection and serve as informative biomarkers of infection. RNA-sequencing (RNA-seq) is a unbiased approach and powerful tool to define the transcriptome [11]. However, to date, RNA-seq has not been used extensively to characterize human melioidosis. The aims of this study were to use RNA-seq (i) to analyze whole blood transcriptomic profiles of acute melioidosis patients to define biological pathways associated with death, and (ii) to identify host prognostic gene biomarkers that are associated with mortality.

Study design and patients
A prospective study of whole blood transcriptomic analyses in 97 individuals with melioidosis was conducted at seven hospitals in Northeast of Thailand: Udon Thani Hospital, Nakhon Phanom Hospital, Mukdahan Hospital, Roi Et Hospital, Buriram Hospital, Surin Hospial, and Sisaket Hospital. This study was part of a multi-centre study of patients aged ≥15 years who were culture-positive for B. pseudomallei from any type of clinical samples and admitted to the hospitals between January 2015 and December 2019. The inclusion and exclusion criteria were described previously [12]. B. pseudomallei were identified by biochemical tests and latex agglutination [13] at the microbiology laboratories of the hospitals and further confirmed by Matrix-Assisted Laser Desorption Ionization Mass Spectrometry (MALDI-TOF MS) as previously described [14]. Whole blood samples were collected at the time of enrolment (within 24 h of culture results, defined as day 0) and day 5, day 12, and day 28 after enrolment. Clinical information was obtained from the medical records. Mortality of patients was recorded at the hospitals or by phone calls for 28 days of follow-up.
Twenty-three healthy individuals aged ≥ 18 years were recruited from Udon Thani Hospital and Mukdahan hospital as baseline controls for discovery and validation data sets. Inclusion and exclusion criteria for these controls were previously described [15].
This study was designed by the process of 3 data sets as follows: discovery set, validation set, and follow-up set as described in Supplementary Figure 1.

Ethical approval
The study was approved by the ethical committees of Faculty of Tropical Medicine, Mahidol University, Udon Thani Hospital, Nakhon Phanom Hospital, Mukdahan Hospital, Roi Et Hospital, Buriram Hospital, Surin Hospial, and Sisaket Hospital. Written informed consent was obtained from all participants or their representatives.

Sample collection
Three milliliters of whole blood were collected from melioidosis patients and healthy controls into Tempus TM Blood RNA Tubes (Thermo Fisher Scientific) and stored at −20°C or −80°C at the hospitals. The frozen samples were transported on dry ice to the laboratory in Bangkok for RNA extraction.

RNA extraction
Total RNA was extracted from Tempus-stabilized blood using the MagMAX™ for Stabilized Blood Tubes RNA Isolation Kit (Life technologies). Total RNA concentration and its purity were assessed by determining the A260/280 and A260/230 ratios, respectively on the NanoDrop Spectrophotometer (Thermo Fisher Scientific). RNA integrity number (RIN) was assessed with the Agilent RNA 6000 Pico kit on 2100 Bioanalyzer (Agilent Technologies). Genomic DNA contamination was checked by RT-qPCR using primers for the Peptidylprolyl isomerase A (PPIA) gene [16].

Library preparation for RNA-seq
Libraries were prepared from 50 ng of RNA per sample using Ion AmpliSeq™ Transcriptome Human Gene Expression Kit (Thermo Fisher Scientific). Targets of 20,802 genes were amplified with Ion AmpliSeq™ Transcriptome Human Gene Expression core panel (Life Technologies). The primer sequences were then digested, and DNA adaptors (Ion P1 Adaptor and Ion Xpress Barcode Adaptor, Life Technologies) were ligated to the targets. Adaptor ligated targets were purified using the Agencourt AMPure XP reagent (Beckman Coulter) and eluted into an amplification mix containing Platinum PCR SuperMix High Fidelity and Library Amplification Primer Mix (Life Technologies) for further amplification. Size-selection purification was performed using Agencourt AMPure XP reagent (Beckman Coulter). Amplicons were quantified using a Fragment Analyzer TM instrument with a DNF-474 High Sensitivity NGS Fragment Analysis Kit (Advanced Analytical Technologies, INC.). Samples were then pooled together with four samples per pool and performed an emulsion PCR on the Ion Chef System using the Ion PI Hi-Q Chef Kit (Life Technologies). The emulsion PCR samples were loaded on Ion PI v3 chips and sequenced on an Ion Proton System using an Ion PI Hi-Q Sequencing 200 Kit chemistry (Life Technologies) to obtain approximately 200 bp read length.

Transcriptomic data analysis
Sequencing data were generated using Torrent Suite Software version 5.4.0 with AmpliSeq RNA plugin (Thermo Fisher Scientific) and normalized using reads per million mapped reads (RPM) method. The normalized transcripts were analyzed using Gene-Spring GX software version 14.9 (Agilent Technologies) to identify differentially expressed genes (DEGs) within the 10th -100 th percentile. One-way ANOVA was used to compare DEGs among non-survivors, survivors, and healthy controls. Moderated ttest was used to compare DEGs between non-survivors and survivors. An adjusted P value < 0.05 was deemed significant (Benjamini-Hochberg correction method). Functional analysis was derived using Metascape tool (http://metascape.org). Area under the receiver operating characteristic curves (AUR-OCC) were plotted using GraphPad Prism version 6.0.
DEGs were initially selected for validation based on fold change ≥ 2 and adjusted P value ≤ 0.05 between non-survivors and survivors.

Quantitative reverse-transcriptase PCR (RT-qPCR)
Two-step RT-qPCR was used to quantitatively validate gene expression. Total RNA from whole blood was converted into cDNA using the iScript TM cDNA Synthesis Kit (Bio-Rad). The amplification was performed in duplicate in a total volume of 10 μl containing 5 μl of iTaq Universal SYBR Green (Bio-Rad), 2 μl of 4 ng cDNA, 0.4 μl of 10 mM forward primer, 0.4 μl of 10 mM reverse primer, 2.2 μl of distilled water. The cycle conditions were as follows: 1 cycle of 95°C for 30s followed by 40 cycles of 95°C for 10s and 60°C for 30s. After amplification, melting curve analysis was carried out from 65°C to 95°C. Primers were designed using NCBI PrimerBlast (https://www.ncbi. nlm.nih.gov/tools/primer-blast/). All primer pairs are listed in Supplementary Table 1. Peptidylprolyl isomerase A (PPIA), Human large ribosomal protein P0 (RPLP0), and Tata-box binding protein (TBP) were used as reference genes for calculating the relative expression levels of other genes [16] The expression levels were calculated by using the 2 −ΔCt method, where ΔCt = mean Ct of target genemean Ct of the three reference genes.

Statistical analysis
Mann-Whitney or Kruskal-Wallis tests followed by Dunn's multiple comparison tests correction were used to test the difference in gene expressions among subject groups. Mean, median, interquartile range (IQR), standard deviation (SD), area under the receiver operating characteristic curve (AUROCC) values and 95% confidence intervals (CI) were assessed using Prism 6 (GraphPad Software). The classification accuracy of the 12 gene signature was determined using the randomForest machine learning R package (v. 4.16) [17] applied to the qRT-PCR data. The AUROCC curve was visualised using the pROC package (v. 1.10).

Whole blood transcriptomic profiles of survivors and non-survivors
To identify genes associated with mortality, we performed whole blood transcriptomic analysis of a discovery set consisting of 29 Thai melioidosis patients, fourteen of whom survived and fifteen of whom died within 28 days, and 3 healthy controls. The clinical characteristics of the patients are shown in Table 1. The quality of 32 RNA samples were analyzed for integrity and read count/mapped read numbers. Overall average RNA integrity numbers (RIN) of 6.0-8.6, average OD ratios 260/280 > 1.8, 260/230 < 1, and average of 22 million reads with mapping rate of >58% were achieved from each cDNA library. Out of 20,802 genes, 18,713 genes with expression values between 10th -100th percentiles were further analyzed using one-way ANOVA and 5,189 genes were statistically different among groups as shown in three dimensional principal component analysis (3D-PCA) plots ( Figure 1).
Analysis of differentially expressed genes (DEGs) between non-survivors and survivors performed using the moderated t-test method identified 283 DEGs. Hierarchical cluster analysis of these genes was generated by GeneSpring ( Figure 2). Whole blood of non-survivors presented more down-regulated genes compared to survivors (fold change ≥ 2). RNA-seq data of 65 up-regulated genes and 218 down-regulated genes with P value ≤ 0.05 and fold change ≥ 2 are shown in Supplementary Table 2. In comparison to melioidosis patients who survived, the fold changes of up-regulated genes in non-survivors ranged between 2.00-15.72 and P value = 1.70 × 10 −3 to 5.47 × 10 −9 . The fold change of down-regulated genes ranged between 2.00-9.42 and P value = 9.50 × 10 −5 to 2.54 × 10 −9 . The volcano plot in Figure 3 shows the distribution and relationship between fold change and P value of 65 up-regulated genes and 218 down-regulated genes in non-survivors in relation to survivors.

Functional enrichment analysis of DEGs between survivors and non-survivors
In order to gain insight into the biological function of DEGs, the genes found significantly differential expressed (65 up-regulated and 218 down-regulated) between survivors and non-survivors were analyzed using the Metascape tool. The analysis was based on combined datasets for enrichment analysis, including gene ontology, KEGG pathways, reactome gene sets, canonical pathways, and CORUM complexes. The data in Figure 4 shows that the significant DEGs were involved in functions of host immune response (n = 7), stress response (n = 6), cell development (n = 35), signaling transduction (n = 23), catabolic process (n = 16), and metabolic process (n = 24). The   Table 3.

Pathway analysis of DEGs between melioidosis survivors and non-survivors
To gain better understanding of the underlying mechanisms of the 283 altered genes in non-survivors compared to survivors, we performed KEGG pathway analysis. Interestingly, KEGG identified six pathways in immunological response that were associated with 65 up-regulated genes (Supplementary Table 4). These included pathways of Toll-like receptor signalling, Th17 cell differentiation, MAPK, IL-17 signalling, FoxO signalling, HIF-1 signalling. Moreover, KEGG identified seven pathways in immunological response that were associated with 218 down-regulated genes. These included hematopoietic cell lineage, cell adhesion molecules (CAMs), intestinal immune network for IgA production, Th1 and Th2 cell differentiation, Th17 cell differentiation, antigen processing and presentation and B cell receptor signalling pathway.

RT-qPCR validation of DEGs to predict mortality in melioidosis
Twenty-eight DEGs were manually selected to confirm the expression by RT-qPCR in a validation set of 30 non-survivors, 30 survivors and 20 healthy controls. The DEGs were selected according to (i) their degree of alteration (fold changes and P value) (Supplementary Table 2   with survivors (P = 0.016) and healthy controls (P < 0.0001) for 1 of 8 down-regulated genes: CD160.

ROC assessment of gene expression as predictive markers for mortality
Receiver operating characteristic (ROC) curves were constructed based on the RT-qPCR results from the validation set of melioidosis patients to examine the classification accuracy of each DEG for distinguishing between non-survivors and survivors ( Figure 6A-C). The highest area under the ROC (AUROCC) were obtained from the genes listed in Supplementary  Table 6. Among these, S100A9 showed the highest AUROCC value (0.88) followed by IL1R2 (0.87) and TLR4 (0.86). The down-regulated gene with the highest AUROCC was CD160 (0.77). A combined signature of the expression of the 12 genes with best individual discriminatory ability was able to classify the non-survivors from the survivors in a Random Forest model (AUROCC 0.85, CI = 0.74-0.94), and completely discriminated the melioidosis patients from the healthy controls ( Figure 6D).

Trajectory of gene expression profiles in survivors after enrolment
Five up-regulated DEGs (S100A9, IL1R2, IRAK3, NFKBIA and GAS7) were selected based on AUROCC ≥ 0.82 and whether the genes have secretory functions of proteins as they may be better suited to a point-of-care assay. Gene expression was measured by RT-qPCR in survivors (n = 8) at day 0, day 5, day 12, and day 28 to test whether expression decreases as patients recovered. The trend of gene expression at day 0, day 5, day 12, and day 28 were determined by calculating the fold change reduction. None of the five genes had major changes in expression at day 5 but S100A9, IRAK3 and IL1R2 subsequently had decreased expression over time as patients recovered (Figure 7 and Supplementary Table 7). Expression of S100A9, IRAK3, IL1R2 and NFKBIA significantly decreased at day 28 relative to day 5. Expression of S100A9, IRAK3, and NFKBIA in patients decreased at day 28 but did not reach to the expression level of healthy controls (P < 0.0001). However, expression of IL1R2 and GAS7 rapidly decreased to the same level of healthy controls and did not change further after day 12 (P < 0.05). The mean fold changes (day 28/day 5) for gene expression of 8 individual patients and 95% CI are shown in Supplementary Table 8.

Discussion
Our study demonstrated that the whole blood transcriptome of melioidosis patients who survived was distinguishable from non-survivors, with 283 DEGs significantly associated with mortality. The majority of these DEGs were related to the immune response, cellular functions and metabolism. Twenty-eight DEGs were selected by functional enrichment and pathway analyses and RT-qPCR of these genes in a validation cohort confirmed 16 up-regulated and 1 down-regulated gene associated with mortality. ROC analyses of the validation set identified the 15 most predictive genes. Subsequent RT-qPCR of four selected genes (S100A9, IRAK3, IL1R2, and NFKBIA) in surviving patients followed over time demonstrated a trajectory expression profile with decreased differential expression by day 12 and day 28 after enrolment. Genes of melioidosis patients associated with death include IL1R2, IRAK3, IL18RAP, MGAM, LPL, HGMB2, S100A9, GAS7, NFKBIA, TLR2, TLR4, MAPK14, GPR27, HIF1A, and ITGAM. Many of these genes or their proteins have been reported in related studies. Elevation of IL1R2 expression and soluble IL1R2 concentrations are correlated with severity of Escherichia coli and Staphylococcus aureus infections [18]. Increased expression levels of the IRAK3 Figure 7. One month follow-up of S100A9, IRAK3, IL1R2, GAP7, and NFKBIA in surviving melioidosis patients over the course of illness. Whole blood samples from melioidosis survivors (n = 8) were collected at the various times from diagnosis (day 0, day 5, day 12, and day 28). The P-values were calculated by Mann-Whitney test. Data of healthy individuals were plotted as the controls.
gene are correlated with the development of acute lung injury in patients with severe sepsis [19]. In melioidosis, Wiersinga et al. reported up-regulation of IRAK3 is related to attenuated capacity of monocytes to respond to B. pseudomallei stimulation and this coincided with mortality [20]. In parallel to our study, a recent study reported that extracellular S100A8 and S100A9 (S100A8/A9), a Ca 2+ sensor in cytoskeleton rearrangement and arachidonic acid metabolism, are the key mediators of sepsis secreted from neutrophils and monocytes during inflammation [21]. The S100A9 serve as damage associated molecular patterns and induce pro-inflammatory cytokine expression and secretion via toll-like receptor 4 (TLR4) activation [22,23]. Increasing evidence supports that NFKBIAmediated inflammation is linked to susceptibility to infectious and inflammatory diseases [24][25][26]. A report demonstrated an up-regulation of NFKBIA expression in mouse macrophages in response to B. pseudomallei infection [27] and our data confirmed that increased NFKBIA expression is associated with fatality in melioidosis patients.
A recent study suggests that HLA-DPA1 and -DRB3 are under-expressed in whole blood of sepsis patients caused by B. pseudomallei, which distinguished melioidosis from sepsis caused by other organisms [9]. In addition, we found HLA-DPB1 was down-regulated in non-survivors in our discovery cohort. Our data also revealed that non-survivors had reduced expression of HLA-DPB1, HLA-DOA, HLA-DOB, and HLA-DRA representing MHC class II molecules, which are important for antigen presentation. Our results in melioidosis are similar to the results of other studies [28][29][30] suggesting that non-surviving patients with severe sepsis from melioidosis or other infections exhibit decreased MHC class II expression and that can contribute to persistent failure of T cell activation [31,32]. We did not observe the changes of these MHC class I at transcriptional levels. However, Dunachie et al. showed the presence of MHC class I genes, HLA-B46, and HLA-C*01 was associated with an increased mortality in an acute melioidosis cohort [8].
Enrichment analysis demonstrated a number of GO terms, including the up-regulation of myeloid leukocyte activation and down-regulation of lymphocyte activation in non-survivors compared with survivors. KEGG pathway analysis revealed many upregulated genes involved in signal transduction pathways associated with severe melioidosis. Among these, TLRs are known to recognize B. pseudomallei LPS and initiate inflammation [33][34][35][36] and acute septic melioidosis patients had increased expression of many TLRs in leukocytes [34]. The activation of MAPK signaling and Th17 pathway in melioidosis patients have also been demonstrated in previous studies [37][38][39][40]. Multiple signaling pathways were down-regulated in severe melioidosis suggesting that prolonged bacterial persistence exacerbates inflammatory responses that may lead to immune exhaustion, immune suppression, and poor outcome of the disease.
Expression of several genes, assayed on day 0, had high mortality discrimination, including S100A9 and IL1R2. Notably, expression of these genes decreased significantly in surviving patients by day 12, suggesting that the gene expression tracks with clinical condition. Therefore, these genes and their encoded proteins could be considered as candidate biomarkers for predicting clinical outcomes in patients with melioidosis, and deserve further study in comparison to other clinical and biological prediction tools.
Strengths of our study were the multi-center design, prospective subject enrolment and sample collection, serial sampling over time in a subset of patients, and validation of selected findings. Some limitations are the relatively small number of samples in the discovery cohort, enrolment into our study only after the diagnosis of melioidosis was confirmed (rather than at the time of admission to hospital), and validation of only a subset of genes.
In conclusion, our findings provide new knowledge about transcriptional host responses in circulating leukocytes from hospitalized melioidosis patients and suggest several candidate biomarkers for further study. These data are important to ongoing efforts to reduce the burden of this often severe infection.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by The Royal Society, UK and the Office of the Higher Education Commission of Thailand under Newton Mobility Grant Award (grant number NI170152), and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health, USA (grant number U01AI115520). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders.