Consensus gene modules related to levels of bone mineral density (BMD) among smokers and nonsmokers

ABSTRACT Osteoporosis, as a common metabolic disorder characterized by the decrease of bone mass, can cause fractures, thereby threatening the life quality of females, especially postmenopausal women. Thus, it is necessary to reveal the genes involved in osteoporosis and explore biomarkers for osteoporosis. In this study, two groups, smokers and nonsmokers with different bone mineral density (BMD) levels, were collected from the Gene Expression Omnibus (GEO) database GSE13850. Consensus modules of the two groups were identified; the variety of gene modules between smokers and nonsmokers with different BMD levels was observed; and a consensus module, including 390 genes significantly correlated with different BMD levels, was identified. Function analysis revealed the significantly enriched osteoporosis-related pathways, such as the PI3K-Akt signaling pathway. Hub genes analysis revealed the critical role of CXCL12 and CHRM2 in modules related to BMD levels. Based on the support vector machine recursive feature elimination (SVM-RFE) analysis, the model containing 10 genes (TNS4, IRF2, BSG, GZMM, ARRB2, COX15, RALY, TP53, RPS6KA3, and SYNPO) with good performance in identifying people with different BMD levels was constructed. Among them, the roles of RALY and SYNPO in the osteogenic differentiation of hBMSCs were verified experimentally. Overall, this study provides a strategy to explore the biomarkers for osteoporosis through analysis of consensus modules.


Introduction
As people get older, the bone mass decreases, and the risk of fractures increases significantly, especially for women after menopause [1]. Osteoporosis is a common metabolic disorder often characterized by low bone mineral density (BMD). It places thousands of premenopausal and postmenopausal women as victims, causing fractures of their spine, hip, and wrist, with some even worse fractures directly leading to mortality, such as hip fractures [1,2]. The association between smoking and the loss of bone mineral density has been reported. Insufficient bone mineral density, especially at the hip, is considered significantly positively associated with smoking [3,4]. Therefore, it is of great significance to reveal the pathophysiology of osteoporosis, explore its novel diagnostic features and analyze the relations between smoking and BMD related features.
Assessing the bone mineral density with dualenergy X-ray absorptiometry is the most widely used method for the diagnosis of osteoporosis [5]. Diagnostic features such as biomarkers involving the formation of bones in the aspects of serum procollagen type I N-terminal propeptide (s-PINP), urinary N-telopeptide (NTX), and serum C-terminal telopeptide type I collagen (s-CTX), etc. were developed for predicting the risks of fractures and other skeletal pathologies [6]. Markers such as s-PINP and s-CTX still have limitations, for they lack specificity and fail to reflect the osteocyte activity or the bone tissue quality [6,7]. It is a widely acknowledged fact that the process of bone loss can't be tracked by these molecular markers, especially at the early stage [6,7]. So, the exploration of associated biomarkers is undoubtedly necessary. With the development of sequencing, multiple molecular biomarkers including the circulating miRNAs, have been explored [6,7].
Circulating monocytes, also known as peripheral blood monocytes, play important roles in the response to inflammation [8]. Researchers summarized the benefits and reasons for using peripheral blood monocytes (PBMs) containing precursors of osteoclasts, as study models for bone-related diseases [9][10][11]. For example, PBMs could be used as possible precursors of osteoclasts and affect the differentiation of osteoclast by producing potent cytokines [12][13][14]. The significant role of peripheral blood monocyte-expressed genes in osteoporosis has already been reported by multiple publications. Deng's team found that the expression of ANXA2 was significantly upregulated in patients with low BMD compared with those healthy people with high BMD [15]. The expression of long noncoding RNAs in PBMs was confirmed to be involved in the regulation of osteoporosis [16]. Zhou, et al. revealed five critical independent pathways related to BMD [17]. Recently, the weighted gene co-expression network analysis (WGCNA), as an efficient tool for locating the highly correlated gene clusters (modules), has been widely used to analyze the network of genes or hub genes in multiple diseases. Researchers has constructed a scoring system for the prediction of BMD and then found the significant role of ribonucleoprotein complex biogenesis in osteoporosis [18]. Smoking is reported as a common behavior that can lead to impaired osteogenesis [19]. A recent study revealed six genes (HNRNPC, PFDN2, PSMC5, RPS16, TCEB2, and UBE2V2) associated with smoking-related postmenopausal osteoporosis [20]. At present, studies on the comparative analysis of multiple groups with osteoporosis (such as smokers and nonsmokers) are limited. However, they are essential in exploring behavior-related osteoporosis and revealing critical osteoporosis-related gene clusters (also called consensus modules in WGCNA).
In this study, the analysis of consensus modules was performed among smokers and nonsmokers with different BMD levels to reveal the association between smoking-related modules and osteoporosis-related consensus modules. Two groups of samples (including 20 smokers and 20 nonsmokers) with low or high BMD, were collected for weighted gene co-expression network analysis to identify consensus modules. The gene modules associated with smoking were constructed and then mapped to consensus modules to analyze the smoking-specific modules. Consensus modules were related to clinical conditions (different levels of BMD). Function analysis and protein-protein interactions (PPI) analysis were performed to explore the functional processes and interactions of genes involved in the smoking-specific modules and consensus modules significantly negatively correlated with BMD. Hub genes were revealed based on the interactions of genes. Support vector machine recursive feature elimination (SVM-RFE) was employed to construct the prediction model for people with low BMD and the constructed model was validated with independent data, GSE56815.

Data collection and preparation
To analyze osteoporosis-related gene modules, gene expression profiles associated with osteoporosis were collected and downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) by searching for the key words 'osteoporosis' and 'smoking'. To perform WGCNA analysis, experiments with the sample size larger than 15 were collected. Finally, GSE13850, containing 20 postmenopausal female smokers and 20 nonsmoking samples with low or high BMD, was used for analysis (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? acc=GSE13850). GSE56815 was adopted as validation data, with 80 female samples -40 with high BMD (20 premenopausal samples and 20 postmenopausal samples) and 40 with low BMD (20 premenopausal samples and 20 postmenopausal samples) [17]. The expression profiles were annotated with annotation files. The expression matrix in this analysis was performed from the same platform, [HG-U133A] Affymetrix Human Genome U133A Array. The average expression level was calculated and considered as the gene expression for genes with multiple probes.

Network construction and consensus module detection
Before the network construction, genes and samples were examined with the WGCNA package, and those containing excessive numbers of missing values were removed [21]. Clustering analysis was performed with R software and the outliers were identified and removed based on the expression of genes. Then, clustering trees were depicted. The network topology was analyzed to choose a proper soft-thresholding power used for the network construction. To make the data sets comparable, the topological overlap matrices (TOM) of the two datasets (the expression set of smokers and nonsmokers) were scaled to mitigate the effect between different statistical properties of TOM and the results of scaling was depicted with the Q-Q plot function in the WGCNA package [21]. The consensus TOM was used as input to perform hierarchical clustering analysis to identify modules.
To obtain larger modules, the 'minModuleSize' of 30 was considered as cutoff and modules with higher similarity and correlation, 0.75, were merged.

Identifying the smoking-related specific modules
To identify the gene modules specifically related to smoking, the modules were constructed based on the expression profiles of smokers with proper soft power value. The correlations between modules and BMD statuses were analyzed and the smokingrelated modules were, then, mapped to consensus modules. The overlaps of smoking-consensus modules were calculated and the P-value of each pairwise overlap was analyzed with the Fisher's exact test [21,22]. The overlapped counts and the significance of overlaps were depicted with the color-coded table.

Relating consensus modules to the clinical traits
The correlation analysis between consensus modules and the BMD statuses (low or high BMD) in smokers and nonsmokers was conducted, individually. The correlations and the corresponding significance values (P-value) were displayed with a color-coded table [21]. To explore the similarities and differences of the module-trait relationships between different groups (smokers and nonsmokers), the relationships in these two sets were integrated and depicted within one color-code table: taking the lower absolute value as correlation values with similar signs and signing zero relationships as correlation values with opposite signs [21,22].

GO and KEGG analysis
Two gene sets were collected for downstream analysis (one with genes involved in the smoking-related specific modules and the other with genes in consensus modules negatively correlated with BMD levels in smokers and nonsmokers). Function analysis was performed with clusterProfiler software to explore the KEGG pathways and function processes involved, respectively. The P-value cutoff and q-value cutoff were set to 0.05 and 'BH' method was used to adjust the P-value. The minimal number of genes annotated by Ontology term was set to 10 [23].

Protein-protein interactions analysis
To analyze the interactions of genes in modules, the genes were collected and predicted with STRING database (https://string-db.org/). 'Cytohubba' (https:// apps.cytoscape.org/apps/cytohubba), a plugin of Cytoscape software, was employed to analyze the critical genes with default parameters. The top 10 hub genes were precited and depicted with Cytoscape (https:// apps.cytoscape.org/).

Osteoporosis-related signatures analysis
To explore osteoporosis-associated genes, the samples of smokers were randomly and averagely divided into two groups: the train group and the test group. The SVM-RFE method was applied with the 'e1071ʹ package (https://cran.r-project.org/web/packages/ e1071/index.html) to construct a model based on the train group. The model was depicted based on the test group to analyze the potential in separating samples with different BMD levels [24]. The nonsmokers in GSE13850 and the premenopausal and postmenopausal samples in another independent experiment, GSE56815, were used for validation.

Cell culture
Primary hBMSCs (Cyagen Biosciences Inc., US) were cultured in an incubator with 5% CO 2 at 37°C, and DMEM medium (Gibco, US) with 10% fetal bovine serum (Gibco, US) was used. When the cell density reached 70%-80%, the cells were subcultured. hBMSCs were transfected with siRALY or siSYNPO to downregulate the gene expression, and were then divided into four groups: hBMSCs, siRALY, siSYNPO, siRALY + siSYNPO. The cells in hBMSCs group were not treated; the cells in siRALY group were transfected with siRALY; the cells in siSYNPO group were transfected with siSYNPO; and the cells in siRALY + siSYNPO group were transfected with siRALY and siSYNPO.

Western blot
After siRALY or siSYNPO was transfected into hBMSCs, hBMSCs were collected and proteins were extracted. The expression of RALY and SYNPO was detected by Western blotting analysis according to the conventional method [25].

Construction of consensus modules
Revealing consensus modules or gene clusters related to BMD provides basis for exploring critical BMDrelated genes. Based on clustering analysis, 3 samples of nonsmokers in GSE13850, and 1 sample of smokers in GSE13850 were identified as outliers and were then removed. 36 samples (including 17 nonsmokers and 19 smokers) and 12,402 genes were used for this study (Figure 1a). Topology analysis results were depicted and shown in Figure 1b, indicating that the soft power of 9 attaining the minimal approximate scale-free topology of 0.8 was suitable for network construction because the summary connectivity measures declined significantly around the soft power of 9. Compared with the raw TOM of the two sets, the scaling process made the TOM of the two groups more uniform (Figure 1c). A total of 9 consensus modules were identified (module 'greenyellow' with 106 genes, module 'blue' with 469 genes, module 'purple' with 721 genes, module 'red' with 425 genes, module 'pink' with 237 genes, module 'turquoise' with 484 genes, module 'brown' with 917 genes, module 'black' with 390 genes and module 'grey' with 8653 genes) (Figure 1d).

Smoking-related specific modules
To explore smoking-related specific modules, modules constructed with smokers were mapped to consensus modules. Based on the correlation analysis of the modules of the smoking samples and consensus modules, relations of modules among the smoking samples and nonsmokers were observed. Few modules in the smoking samples were significantly overlapped by consensus modules (Figure 2a). Multiple modules in the smoking samples showed specificity, including module 'brown', module 'magenta', module 'yellow', and module 'greenyellow' (Figure 2a). Functional analysis of genes was performed using the smoking-related specific module 'brown' with the largest specific genes (671 genes significantly overlapped with genes in the consensus module gray). No significant enriched pathways and GO terms was observed. PPI analysis indicated significant interactions among genes in the module 'brown'. Hub gene analysis revealed 10 critical genes, CCR7, FPR2, CXCR5, GPR183, CXCL10, OPRL1, GPR37L1, ACKR3, P2RY14, and TAS2R14 (Figure 2b, c). Through relating modules to the clinical traits of the smoking samples, a significant correlation was shown between the four modules (module 'green', module 'tan', module 'pink', and module 'turquoise') and the levels of BMD (Figure 2d).

Relationships between consensus modules and the clinical traits
Correlation analysis was performed to analyze the relationships between consensus modules and BMD statuses in smokers and nonsmokers. Based on the correlation analysis, multiple consensus modules, including module 'blue', module 'purple', module 'red', module 'pink', module 'turquoise', module 'brown', and module 'black', were significantly correlated with different BMD statuses (Figure 3a). On the contrast, only one consensus module, the module 'black', indicated considerable correlation with BMD with a correlation score of 0.53 and a P-value of 0.03 (Figure 3b). Six of nine consensus modules showed opposite correlation with BMD in the two groups of smokers and nonsmokers (Figure 3c).

Function analysis of genes in module 'black' with opposite relationship
Analyzing the consensus relationships between modules and the BMD of both smokers and nonsmokers indicated that though the module 'black' showed a significant correlation with the BMD, they represented opposite relationships in smokers and nonsmokers (Figure 3). To explore the critical genes involved in the opposite module, proteinprotein interactions analysis was performed with genes in module 'black', and the top 10 hub genes were revealed, including CXCL12, APP, CHRM2, CCR5, LPAR1, APLNR, CXCL9, GRM7, CXCR6, and SSTR2 (Figure 4a, b). Genes involved in the module 'black' were collected and used for function analysis. Enriched GO terms were not observed significantly. KEGG pathways, such as

Critical features associated with osteoporosis
To explore biomarkers associated with osteoporosis, SVM-RFE analysis was performed based on the consensus module 'black', which showed a significant correlation between BMD statuses in smokers and nonsmokers ( Figure 3). Based on the SVM-RFE analysis of genes in consensus module 'black', including 390 genes, a model of the features related to BMD levels was constructed and the top 50 features were displayed. The SVM-RFE model performed well in distinguishing smokers with low BMD from those with high BMD, both in the train set and the test set (Figures 2a, 5a). Moreover, the property of the model was tested with nonsmokers in GSE13850 as well as an independent experiment of osteoporosis, GSE56815, including 40 premenopausal samples and 40 postmenopausal samples (4 samples, 2 samples in each group, were identified as outliers and removed through clustering analysis). Considering the variety of genes between premenopausal and postmenopausal females, the samples were divided into two groups according to the menopausal status, and a test analysis was performed for each group, respectively. The testing analysis presented that the features showed pretty stable and great performance in identifying the females with both low and high BMD. The AUC of the model containing 10 genes (TNS4, IRF2

Validation of osteoporosis-related genes
Osteoporosis is mainly caused by the disorder of bone homeostasis, that is, the balance between osteogenesis and bone resorption is broken. Osteogenesis is mainly regulated by osteoblasts, which are generally differentiated from bone-derived mesenchymal stem cells The relationship between consensus modules and BMD levels in the two groups were analyzed. The correlations between consensus modules and the BMD in smokers (a) and nonsmokers (b) and the consensus relationship between consensus modules and the clinical traits of smokers and nonsmokers (c) were depicted. The correlation scores were depicted with colors (blue for negative correlation, red for positive correlation); the lower absolute scores were used to denote the relationships with similar signs between the two groups, while 'NA' refers to those with opposite scores.
(BMSCs). Considering the limitations of experiments (lacking critical clinical BMD-related features, such as the loss of body weight, physical inactivity, alcohol, etc.) used for bioinformatic analysis, the role of RALY and SYNPO in the osteogenic differentiation of hBMSCs was verified with cell experimental method. Western blot results showed that after transfection with siRALY or siSYNPO, the expression of RALY or SYNPO was significantly down-regulated (Figure 6a). Then osteogenic induction was performed on hBMSCs, and the osteogenic differentiation ability of hBMSCs was detected by ARS staining assay. The results showed that there was no significant change in the osteogenic differentiation ability of siRALY, the osteogenic differentiation ability of siSYNPO was significantly weakened, and the osteogenic differe- ntiation ability of siRALY + siSYNPO was further weakened (Figure 6b). qRT-PCR was used to detect the relative expression level of osteogenic marker gene (OPN, RUNX2 and ALP) mRNA. The relative expression level of osteogenic marker genes in siRALY did not change significantly, the relative expression level in siSYNPO decreased significantly, and the relative expression level in siRALY + siSYNPO further decreased (Figure 6c).

Discussion
Before scaling analysis, a great difference was shown in the topological overlap matrices (TOM) of the two groups, indicating the significance of the scaling process before analysis (Figure 1c). Various gene patterns of both smokers and nonsmokers were revealed by means of relating the constructed modules with the gene expression of 19 smokers to consensus modules (Figure 2a). Few modules of the smokers were significantly overlapped with consensus modules, except consensus module 'grey' (Figure 2a). No significantly enriched pathways were observed with genes in smoking-specific modules (module 'greenyellow', 'yellow', 'magenta', and 'brown') ( Figure 2d). Smoking-related genes such as CCR7 were revealed through hub genes analysis [26]. Cigarette smoking has been considered as one of the negative factors influencing osteoporosis, though its role was still unclear [19,27]. In our study, there was no significant correlation between smokingrelated specific modules and different BMD levels.
Considering the limitations in our study, such as the variety of sequencing between the two groups, more works are need to be done to further explore the smoking-specific modules associated with osteoporosis. Nevertheless, our study provides a strategy for exploring behavior-related gene clusters associated with osteoporosis.
Relating consensus modules to the levels of BMD revealed that most consensus modules showed significant correlation with the BMD levels in smokers (Figure 3a). This is not unexpected because most consensus modules, including module 'blue', 'purple', 'red', 'pink', 'turquoise', and 'black', were significantly overlapped by the largest smoking-related module 'turquoise' containing 4563 genes with significant association with the BMD levels of smokers (Figures 2a, d,  3a). Most consensus modules showed no significant correlation with different levels of BMD in nonsmokers (Figure 3b) Interestingly, consensus module 'black', with 390 genes, indicated a statistically significant correlation with different BMD levels (Figure 3a, b, c), reflecting its critical role in the regulation of osteoporosis. Function analysis showed a statistically significantly enriched PI3K-Akt signaling pathway (Figure 4c). The PI3K/Akt signaling pathways were widely reported in regulating the process of osteoporosis [28][29][30]. It was reported by previous publications that the differential gene expressions of circulating monocytes were involved in neuroactive ligandreceptor interaction pathways [31]. According to our analysis, the significantly enriched neuroactive ligand-receptor interaction was observed, showing its potential role in response to the osteoporosis. The top 10 critical genes based on interactions of genes were revealed by hub gene analysis, including CXCL12, APP, CHRM2, CCR5, LPAR1, APLNR, CXCL9, GRM7, CXCR6, and SSTR2 (Figure 4a, b). CXCL12 signaling was considered as an important component in regulating the development and activity of osteoblasts [32]. The increase of the level of plasma CXCL12 was reportedly associated with the severity of postmenopausal osteoporosis patients [33]. CHRM2 was a critical gene regulating the osteogenic differentiation of adipose stem cells [34]. And, based on our analysis, the critical roles of CXCL12 and CHRM2 were shown in the interaction with genes of the BMD-related module in smokers and nonsmokers. The SVM-RFE model with 10 genes (TNS4,  IRF2, BSG, GZMM, ARRB2, COX15, RALY,  TP53, RPS6KA3, and SYNPO) presented good performance in predicting the BMD levels of people, especially for those without smoking behavior and with AUC > 0.9 (Figure 5a, b). The correlation between seven genes (IRF2, BSG (CD147), GZMM, ARRB2, COX15, TP53, and RPS6KA3) and the process of osteoporosis has already been reported [35][36][37][38]. The expression of TNS4 was widely analyzed in cancers. The risky SNPs of RALY and SYNPO associated with osteoporosis were reported as well [39]. The RALY expression was significantly downregulated in non-small cell lung cancer patients with bone metastasis [40]. Considering the limitation of experiments (lacking critical clinical BMD-related features, such as the loss of body weight, physical inactivity, alcohol, etc.) used for bioinformatic analysis, the role of RALY and SYNPO in the osteogenic differentiation of hBMSCs was verified. It was found that the osteogenic differentiation ability did not change significantly when down-regulating RALY expression, the osteogenic differentiation ability was significantly weakened when down-regulating SYNPO expression, and the osteogenic differentiation ability was further weakened when downregulating RALY and SYNPO expression ( Figure 6). The relative expression level of osteogenic marker genes (OPN, RUNX2, and ALP) mRNA also changed accordingly. These results indicated the potential role of SYNPO in regulating osteoporosis. Additionally, a model containing 25 genes (TNS4, IRF2, BSG (CD147), GZMM, ARRB2, COX15, RALY, TP53, RPS6KA3, SYNPO, TRPV2, WNT10B, NUBPL, CHRM5, PPP1R7, SH3TC1, PCOLCE, HCFC1R1, RPL14, MEN1, TAF10, MTA2, SULT1B1, ADAM11, and ZNF185) showed great performance for predicting BMD levels, which were both validated by nonsmokers of GSE13850 and premenopausal and postmenopausal females. More experiments and sequencing dates are needed to validate the potential of the model in the prediction of BMD levels.

Availability of data and material
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by grants from the Natural Science Foundation of Zhejiang Province [LY20H270002].