Identification of core gene in obese type 2 diabetes patients using bioinformatics analysis

ABSTRACT Objectives Adipocytes and adipocyte lipid metabolism are closely related with obesity and type 2 diabetes, but the molecular mechanism still needs further investigation. The aim of this study is to discover the adipocyte genes and pathways involved in obesity and type 2 diabetes using bioinformatics analysis. Methods The GSE27951 gene expression profile was obtained. Software and online tools (STRING, Cytoscape, BioGPS, CTD, and FunRich) were used to identify core genes.21 human subcutaneous adipose samples, with 10 from type 2 diabetic patients and 11 from normal controls, were included in these analyses. Results 184 differentially expressed genes (DEGs) including 42 up-regulated genes and 142 down-regulated genes were found to be enriched in metabolism, receptor activity, collagen type IV and glutamine biosynthesis I pathway by using the enrichment analysis. Seven hub genes were identified from the PPI network using various software (Cytoscape, STRING, BioGPS, and CTD). Four core genes (COL4A2, ACACB, GLUL, and CD36) were found to be highly expressed in subcutaneous adipose tissue of obese patients accompanying type 2 diabetes. Conclusion COL4A2, ACACB, GLUL and CD36 might be the core molecular biomarkers of obesity in patients with or without type 2 diabetes.


Introduction
Obesity is a serious chronic medical condition where patients have accumulated excess adipose tissue that could cause serious complications such as metabolic syndrome, type 2 diabetes mellitus (T2DM) and hypertension [1]. Studies have shown that obesity is increasing every year worldwide, with the number of adults with obesity increasing from 100 (81-122) million in 1975 to 671 (620-725) million in 2016 [2]. According to the statistics released by the International Diabetes Federation (IDF) in December 2015, there were about 450 million diabetic patients worldwide, who mainly suffered from T2DM; 75% were from developing countries [3,4]. Obesity and T2DM have become a burden globally as healthcare costs have increased with a growing population. Thus, determination of target genes promoting the pathogenesis of obesity and T2DMis vital for providing the future personalized treatment plans and offering timely preventative measures.
Obesity leads to accumulation of excess fatty tissues, such as the subcutaneous fat. Adipocytes are the major cell type in the fat tissue. Adipocytes and their metabolism are involved in the development of obesity and T2DM [5,6]. Research suggests that adiponectin is an insulin sensitizing hormone secreted by adipocytes that may promote the absorption of glucose, inhibit glycogen metabolism, stimulate the oxidation of adipocytes, and improve glycolipid metabolism disorders by binding to receptors of liver and skeletal muscle cells [7][8][9]. It has been reported that leptin, visfatin, and FGF21 play important roles in obesity and T2DM [10][11][12]. Moreover, Klimcakova E. et al. used mRNA and protein expression profiling techniques to confirm that adipose tissue macrophage-specific genes (e.g., TRAP, GP110, and M130) are markers of insulin resistance and the metabolic syndrome in human subcutaneous and visceral adipose tissue and thus play leading roles in adiposity [13]. However, it is still uncertain whether adipose tissue preferentially expresses core target genes of obesity and/or T2DM. The vast amount of the gene expression data deposited in public databases provides an opportunity to search for potential key genes of obesity and T2DM in adipose tissue. NCBI-Gene Expression Omnibus (GEO) is such a public database containing high-quantity gene expression data, such as gene chips, microarrays, and next generation sequencebased data [14]. With the gene data torrent from high throughput research in biology, the identification and analysis of differentially expressed genes (DEGs) is an important albeit massive undertaking. An integrated bioinformatics analysis, however, can be used to analyse these data for the discovery of human disease related DEGs.
In this study, we found gene datasets from human adipose tissue associated with obesity and/or T2DM in the GEO database and utilized the GEO2R web tool to define the groups and detect DEGs. We then used the Retrieval of Interacting Genes (STRING) online tool and Cytoscape software to analyse the DEGs and determine hub genes with a combined score of >0.4. Furthermore, the FunRich software was used to perform Gene Ontology (GO) and the pathway enrichment analysis. BioGPS, Comparative Toxicogenomic Database (CTD), and GEO2R were used to determine any correlation between diseases and expression of hub genes. Using all of these strategies, we found the potential gene markers associated with obese patients at risk for T2DM which may be useful for the personalized treatment of those epidemic diseases.

Microarray data and patient sample selection
GEO (https://www.ncbi.nlm.nih.gov/geo/) is a public functional genomics data repository supporting array and next generation sequence-based data (including RNAseq, DNAseq, and ChIPseq) submissions. We searched the GEO DataSets using the keywords 'diabetes' AND 'obesity' AND 'adipose' to find related gene expression profiles. The gene expression profile of GSE27951 including 21 human subcutaneous adipose tissues regardless of T2DM status was obtained from the GEO database [14].

Analysis and the identification of differentially expressed genes (DEGs)
GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) is an interactive net software tool that allows operators to compare two or more different groups of samples in a GEO series to identify DEGs across experimental conditions via the GEO query and limma R packages (Bioconductor project). GEO2Rwas used to detect DEGs from different groups. We performed a t-test to identify DEGs with a P value <0.05. Log2-fold change (|logFC|) ≥2 cut-off was considered as statistical significance, with logFC≥2 considered as an up-regulated gene and LogFC ≤-2 considered down-regulated. The adjusted P values were used for multiple testing and genes with the smallest adjusted P-values were considered to be the most reliable. We also performed the distribution to see whether the selected samples were suitable for comparison by GEO2R where the mediancentred values suggested that the data is normalized and cross-comparable. If not, samples were omitted.

Integration of protein-protein interaction (PPI) network and modular analysis
STRINGv10.5 (https://string-db.org/) is a web tool used to calculate and evaluate the PPI information in order to expose the potential correlation among DEGs indifferent patient groups [15,16]. Following STRING analysis, we further utilized the Cytoscape software (V3.5.1, integrating biomolecular interaction networks with highthroughput expression data and other molecular states) to identify hub genes with k-core = 2, degree cut-off = 2and cut-off = 0.2. Confidence score of v ≥ 0.4 was considered significant. The related proteins in the central nodes were thought of as the core genes that have important biological regulatory functions [17].

Comparison of the expression level of core genes on normal human adipose tissues by BioGPS
The BioGPS (http://biogps.org/#goto=welcome) is a free online extensible and customizable gene annotation portal tool, supplying a complete resource about gene and protein function. Based on HG_U133A/GNF1H and GNF1M Gene Atlas Data sets [18], we used BioGPS to identify the expression of newly identified core genes on subcutaneous and omental adipose tissues (E-GEOD-15,773 data, http://biogps.org/dataset/E-GEOD-15773/ expression-data-from-human-adipose-tissue/). The barcode function of the R package 'frma' (http://www.bio conductor.org/packages/2.6/bioc/html/frma.html) was used to determine z-scores. A z-score of >5 implies that the gene was expressed in the tissue [19].

Demonstration of the core genes associated with potential diseases
Comparative Toxicogenomic Database (CTD) (http:// ctdbase.org/) is a public database that provides manually curated information about gene-disease relationships [20]. Based on the NCBI gene database (NCBI's RefSeq project), we identified the diseases which were associated with each core gene by CTD.

Functional enrichment and pathway analysis of hub genes
FunRich 3.0 (FR) is a software used to establish the functional enrichment of GO and pathways, gene expression, and interaction network analysis of genes and proteins [21]. FunRich software utilizes several databases (see: http://funrich.org/forum/faq). The data are integrated with functional and pathway data from GO. Pathway analysis via FR. P < 0.05 was considered as the cut-off criterion. We performed the GO analysis using FR with biological processes (BP), molecular functions (MF), cellular components (CC), and pathways among the hub genes [22]. The heat map used to show gene expression was generated by FunRich.

Patient and public involvement
This bioinformatics' research was conducted without patient and public involvement.

Identification of DEGs
The distribution of samples was viewed by GEO2R, the median-centred values are shown in Figure 1. It suggests that the data are normalized and cross-comparable, so the sample quality is reliable and suitable for comparison. These samples were cross-compared in BMI ≥35 kg/ m 2 , BMI 25-34.9 kg/m 2 , and BMI 18-24.9 kg/m 2 of NDM with T2DM patients (nine comparison groups in total), and the DEGs analysis was conducted in every group. The results showed that a total of 184 DEGs, including 42 up-regulated genes and 142 downregulated genes, were found using GEO2R analysis. Based on BMI, T2DM and NDM patients were further divided into BMI ≥35 kg/m 2 , BMI 25-34.9 kg/m 2 and BMI 18-24.9 kg/m 2 (normal weight) group, respectively. We found 14 DEGs inT2DM with BMI ≥35 kg/m 2 versus NDM with BMI ≥35 kg/m 2 group, one DEG in T2DM with BMI 25-34.9 kg/m 2 versus NDM with BMI 25-34.9 kg/m 2 group, and 46 DEGs in T2DM-normal weight versus NDM-normal weight group. Within T2DM patients, we found two DEGs in BMI ≥35 kg/ m 2 versus BMI 24.9-35 kg/m 2 group, 78 DEGs in BMI ≥35 kg/m 2 versus normal weight group, and 16 DEGs in  Table 2 showed the detailed information of DEGs in nine comparison groups.

Module screening from the PPI network
We entered the DEGs of each comparison group and searched through the STRING database. The data was imported from the network as a simple tabular text using Cytoscape software to detect the hub genes.
A combined score of>0.4 was considered statistically significant.

The potential hub gene expression levels in normal tissues
We used BioGPS to identify the hub gene expression in normal tissues (Dataset: Barcode in 262 normal tissue samples; Probe set: 242194_×_at). This data set demonstrated a survey across various normal human tissues (platform: U133plus2 Affymetrix microarray). The median, 3X median, 10X median, and 30X are defaults of the BioGPS presentation sketched by the lines and are not central to this analysis. The results showed that COL4A2, GLUL and CD36 were highly expressed both in insulin resistant omental adipose tissue and subcutaneous adipose tissue, and PLN, ACACB, and ELOVL7 were highly expressed in insulin resistant subcutaneous adipose tissue (Figure 3). Figure 4 showed the expression level of hub genes in normal human organs by FunRich. Table 3 illustrates the expression level of hub genes in normal human adipose tissues.

Demonstration of the potential hub genes associated with potential diseases
To identify the diseases that are most associated with hub genes, we input the potential hub genes to CTD for defining the greater the inference score of the individual gene, which means the more relevance with disease. The CD36, GLUL, MYH11, CD163, and COL4A2 genes were closely associated with weight loss, while the ACACB gene was closely related to insulin resistance ( Table 3). The results also showed that PLN was less relevant to glucose intolerance. The heat-map generated by EnRich showed the expression level of the potential hub genes in normal human organs, with MYH11 having the highest expression in every normal organ. However, MYH11 is a non-specific gene and thus was not considered as a hub gene in this study (Figure 4). Taken together, CD36, GLUL, COL4A2, and ACACB were considered as core genes in human adipose tissue with obesity or T2DM.

The outcome of function and pathway enrichment analysis
In order to identify the function and pathway of the potential hub genes, GO function and pathway enrichment analyses were applied by using FunRich. CD36, GLUL, COL4A2, and ACACB were put into FR software, the gene enrichment analyses showed that hub genes enriched in biological processes (BP), including metabolism, cell growth and/or maintenance, and cell communication. Molecular function (MF) includes receptor activity, ion channel activity, and transaminase activity. Cell component (CC) includes collagen type IV, Golgi apparatus, and platelet alpha granule membrane). The significantly enriched biological pathways include neurotransmitter uptake and metabolism in glial cells, astrocytic glutamate-glutamine uptake and metabolism, and glutamine biosynthesis I (Table 4).

Discussion
It is estimated that 493 million people will be either obese or overweight by 2030, and 629 million people  will be diabetic by 2045 worldwide [23]. It will consume enormous resources and cause immense challenges to maintain well-being in a population with the majority being obese or overweight [24]. Dietary adjustment, medication, and/or surgical intervention can help to obtain certain weight loss achievements, but personalized treatment by identifying key genes or pathways relating obesity and diabetes may provide major improvements in the treatment of obesity and T2DM. Obesity and T2DM are closely related to similar genetic, epigenetic, adipose, metabolism, and endocrine aberrations [25]. Understanding the molecular mechanism behind obesity and T2DM is critically important for early diagnosis and precise treatment. Therefore, it is vital to discover these sensitive and specific gene biomarkers of obesity or T2DM. In this analysis, 21 subcutaneous adipose tissues were extracted from the GEO database of GSE27951. Then, bioinformatics analysis was performed to identify candidate genes from biological database. It is useful for understanding the differences between expressed genes, potential hub genes, pathways, genetics, or unique adaptations amongst samples. Forty-two up-regulated and 142 down-regulated DEGs were selected. Then, TRDN, PLN, GEFL6, MYH11, COL4A2, COL6A6, ACACB, GLUL, ELOVL7, MYH2, CD36, FCGR2A, PTPRC, CD163, CD36, MS4A6A, TNNC2, and MYH2 were considered as hub genes by bioinformatics analysis. It was shown that no hub genes were found in NDM groups (BMI ≥35 kg/m 2 versus BMI 25-34.9 kg/ m2, BMI ≥35 kg/m 2 versus normal weight, and BMI 25-34.9 kg/m 2 versus normal weight), suggesting that there were no differences among those NDM samples with different BMIs [26]. The lack of DEGs in the BMI ≥35 kg/m2 group (DM versus NDM) indicated that if there are more DEGs between DM and the NDM, the greater differential BMI values will be. Obesity is a risk for diabetes as the incidence of diabetics with obesity is more than those with diabetes who are overweight and those with diabetes who are normal weight [23]. Here BioGPS, CTD, FunRich, GO and pathway analysis were used to identify the potential hub genes. As a result, CD36, COL4A2, GLUL, and ACACB were selected and considered as hub genes that may provide novel targets for further studying obese diabetic patients.
The GO and pathway analysis showed that CD36 enriched biological function (FP, MF, CC) in metabolism, receptor activity, and platelet alpha granule membrane where the enriched pathway exists due to platelet adhesion to exposed collagen. CD36 is member 3 of the scavenger receptor class B family of cell surface proteins and is located on the long arm of chromosome 7 at band 11.2 in humans. It may play a key role in the development of glucose intolerance and diabetes [27,28]. Tahar et al. reported that, in a low fat and high starch diet, CD36 deficiency enhanced insulin responsiveness in CD36-null mice [29]. Pravenec et al. reported that transgenic expression of CD36 was closely associated with reduced serum fatty acids as well as improvement of insulin resistance and glucose intolerance in the spontaneously hypertensive transgenic rat models [30]. Drover et al. indicated that CD36 deficiency is a possible risk factor for diet-induced T2DM  in both the postprandial and fasting states in humans [31]. Our results also showed that CD36 was more highly expressed in T2DM with normal weight than T2DM with obesity. In addition, CD36 was closely correlated with weight loss (analysed by GEO2R), suggesting that a decreased CD36 levels may lead to an increase in body weight.
The GO and pathway analysis showed that GLUL enriched the biological functions (FP, MF, CC) of metabolism and transaminase activity in the Golgi apparatus, and its closest related pathway is glutamine biosynthesis I. GLUL (ligase) is an enzyme-coding gene that belongs to the glutamine synthetase family [32,33]. GLUL plays a crucial role in the metabolism of the nitrogen form of glutamine in an ATP-dependent reaction [32]. Our study also showed that GLUL had a higher expression in non-diabetic normal weight patients than in diabetic normal weight patients; GLUL was also closely correlated with weight loss. Petrus et al. also reported that GLUL expression was lower in the obese group with human white adipose tissue than non-obese group [34]. This observation suggests that increasing GLUL levels may help to reduce diabetes. Prudente et al. reported that the rs10911021 polymorphism of the GLUL genies an independent modulator of mortality in T2DM patients [35]. The Look AHEAD Research Group performed genetic analyses in 3,845 overweight/obese participants with T2DM over a median of 9.6 years. The results showed that the risk (C) allele for GLUL rs10911021 was  significantly associated with morbidity and mortality from cardiovascular disease among T2DM patients [36].
COL4A2 was found to enrich the biological functions (FP, MF, CC) of cell growth and/or maintenance as well as the expression of extracellular matrix structural constituents and collagen type IV (COL4). However, no pathways were enriched. COL4 Alpha 2 (COL4A2) chain is the encoded basement membranes by the COL4A2 gene that exist in humans [37]. Du et al. suggested that the reduction of miR-29a caused by high glucose may increase the risk of excess COL4A2 in proximal tubule cells [38]. This study showed that COL4A2 is highly expressed in adipose tissue, which is related to weight loss. We plan to examine the relationship between COL4A2 expression and the pathogenesis and progression of T2DM and obesity in future studies.
ACACB (Acetyl-CoA carboxylase β) was found to enrich the biological functions (FP, MF, CC) of metabolism, ligase activity, and the Golgi apparatus and its closest related pathway is the import of palmitoyl-CoA into the mitochondrial matrix. ACACB is an enzyme that associated with diseases including Acetyl-Coa Carboxylase-Beta Deficiency and Biotin deficiency and it is related with the glucagon signalling pathway. ACACB is thought to control fatty acid oxidation through malonyl-CoA to inhibit carnitine palmitoyl transferase I, which is the rate-limiting step in fatty acid uptake and oxidation by mitochondria [39]. Our study also showed that ACACB had higher expression in non-diabetic normal weight patients compared to diabetic patients with normal weight. It also showed that ACACB was closely correlated to insulin resistance. This denotes that increasing ACACB levels may help to reduce diabetes. MA et al. showed that ACACB plays a role in obesity-altered lipid metabolism in susceptibility to T2DM [40]. An et al. analysed 12 case-control studies containing 3273 cases and 3242 controls, which indicated that there are significant associations between the ACACB gene, rs2268388 polymorphism, and diabetic nephropathy among Caucasian patients with diabetes. Some case reports also suggested that ACACB expression may be related to obesity [41]. For example, the Corbett et al. study used ACACB as a target in the design of isozyme-nonselective acetyl-CoA carboxylase inhibitors in obese mice [42]. After analysing the array dataset, GSE29718 [43], it is concluded that immune system pathways may have a significant role in child obesity. In contrast, Wei et al. reported that ACACB polymorphisms were associated with blood pressure in an analysis of Han Chinese T2DM populations from 1975 [44]. In a twin study, ACACB polymorphisms in biotin-dependent carboxylases were found to be downregulated in adipose tissue and adipocytes of the obese twin in comparison with the non-obese twin. Taken together, CD36, COL4A2, GLUL, and ACACB were considered as core genes closely related to obesity and T2DM.

Limitation
First, the sample size of each group in this study was limited, and regarding the inevitably bias from the heterogeneity of clinical tissues, more other databases containing tissues from obesity or T2DM should be included further. Second, the gender and age should be considered as well. Third, the adipose tissues from different parts of the body, including subcutaneous adipose, the omentum, jejunal mesentery, and ileal mesentery, as well as the adipose near to the lesser curvature and greater curvature of stomach of obese or overweight patients, should be utilized to confirm those core genes.

Conclusions
CD36, COL4A2, GLUL, and ACACB that are significantly enriched in metabolism and glutamine biosynthesis I, astrocytic glutamate-glutamine uptake, and neurotransmitter uptake and metabolism in glial cells. They are also related to weight loss and insulin resistance. These genes might be the core molecular biomarkers for obesity or T2DM. Furthermore, these results provided strong evidence for forthcoming therapeutic research involving precise gene targets in T2DM and NDM obese patients. We anticipate identifying the genes closely related to obesity and T2DM and further investigate their relationship with the CD36, COL4A2, GLUL, and ACACB as they may be target genes for future therapies.