DNA methylation biomarkers for head and neck squamous cell carcinoma

ABSTRACT DNA methylation plays an important role in the etiology and pathogenesis of head and neck squamous cell carcinoma (HNSCC). The current study aimed to identify aberrantly methylated-differentially expressed genes (DEGs) by a comprehensive bioinformatics analysis. In addition, we screened for DEGs affected by DNA methylation modification and further investigated their prognostic values for HNSCC. We included microarray data of DNA methylation (GSE25093 and GSE33202) and gene expression (GSE23036 and GSE58911) from Gene Expression Omnibus. Aberrantly methylated-DEGs were analyzed with R software. The Cancer Genome Atlas (TCGA) RNA sequencing and DNA methylation (Illumina HumanMethylation450) databases were utilized for validation. In total, 27 aberrantly methylated genes accompanied by altered expression were identified. After confirmation by The Cancer Genome Atlas (TCGA) database, 2 hypermethylated-low-expression genes (FAM135B and ZNF610) and 2 hypomethylated-high-expression genes (HOXA9 and DCC) were identified. A receiver operating characteristic (ROC) curve confirmed the diagnostic value of these four methylated genes for HNSCC. Multivariate Cox proportional hazards analysis showed that FAM135B methylation was a favorable independent prognostic biomarker for overall survival of HNSCC patients.


Introduction
Head and neck squamous cell carcinoma (HNSCC) is a collective term for malignancies originating from the oral and nasal cavities, pharynx, or larynx, which show microscopic evidence of squamous differentiation [1]. HNSCC is one of the most common malignant tumors, ranking sixth in incidence worldwide [2]. Approximately 50,000 new cases and 10,000 deaths from HNSCC were recorded in 2017 in the US. The incident rates increase by about 1% per year among whites, and are more than twice as high in men as in women [3].
Early diagnosis of cancer is one of the most important factors contributing to less extensive and more successful treatment and better prognosis for patients [4]. It has been reported that the 5-year relative survival rate for HNSCC is about 47% for blacks, while only about 30% of cases are diagnosed at an early local stage, for which the 5-year survival is about 79% [3]. Unfortunately, the majority of patients are diagnosed with advanced stage HNSCC due to the asymptomatic nature of early stage disease and lack of effective screening modalities, resulting in very low 5-year survival rates [5][6][7]. Therefore, it is essential to identify useful early diagnostic and prognostic molecular biomarkers for HNSCC.
Multiple factors contribute to the incidence of HNSCC. These factors include cigarette-smoking behavior, alcohol consumption [8][9][10], and human papillomavirus (HPV) infection [11][12][13], which might induce accumulation of various genetic alterations in normal squamous cells regarded as the processes driving the initiation and progression of HNSCC [14,15]. In recent years, increasing evidence of epigenetic modifications as a cause of HNSCC has been documented [16,17]. DNA methylation, a heritable modification known to alter gene expression that is not mediated by changes in the DNA sequence [18,19], is the most widely investigated epigenetic modification. Aberrant methylation in the promoter region of tumor suppressor genes could induce their abnormal expression to influence some crucial biological processes, such as proliferation, apoptosis, cell cycle, DNA repair, tumor invasion, and metastasis [20][21][22][23]. Thus, identification of DNA methylation biomarkers has emerged as one of the most promising approaches to improve cancer diagnosis, as it presents several advantages compared to other markers [24,25].
In previous studies, a variety of epigenetic biomarkers has been evaluated in HNSCC for early detection and prognostic prediction; however, most of them focused on single genes. Abnormal hypermethylation of P16, MGMT, and DAPK promoters were considered as biomarkers for HNSCC detection and prognostic prediction [1,26,27]. In recent years, the introduction of high-throughput technologies enabled the simultaneous analysis of thousands of genes in single experiments, which could serve as promising tools to screen for useful genetic and epigenetic biomarkers for diagnosis and prognosis of cancer [28]. A number of gene expression profiling studies have been conducted to find various differentially expressed genes (DEGs) in HNSCC [29][30][31]. Aberrant methylation profiling has also been analyzed in HNSCC [16,[32][33][34]. Although numerous studies have demonstrated abnormal methylation and expression of certain genes in HNSCC, few studies have comprehensively investigated gene methylation and expression profiling in HNSCC and their potential prognostic values.
In the present study, genomic methylation microarrays (GSE25093 and GSE33202) and gene expression profiling (GSE23036 and GSE58911) datasets were integrated and analyzed using a series of bioinformatics tools. These four independent databases from Gene Expression Omnibus (GEO) were produced by different researchers from different countries. However, all of them possess limited numbers of subjects. Therefore, we identified overlapping aberrantly methylated-DEGs profiled in the four databases and then validated them by TCGA database, which has the characteristic of owing large-size samples. Using this approach, we aimed to find novel aberrantly methylated genes with small regions of heterogeneity, and then explore their prognostic value by taking advantage of TCGA database (with large sample size and detailed clinical characteristics).

Microarray datasets
In the current study, we used methylation profiling (GSE25093 and GSE33202) and gene expression (GSE23036 and GSE58911) datasets from Gene Expression Omnibus (GEO, https://www. ncbi.nlm.nih.gov/geo/) of The National Center for Biotechnology Information (NCBI). Specimens from a total of 50 cases and 25 controls were enrolled in the GSE33202 program, while 91 head and neck cancer and 18 normal control specimens were in the GSE25093 program. Both of these two methylation array datasets utilized GPL8490 platform (Illumina HumanMethylation27 BeadChip). For the expression microarray datasets, the profiling of GSE58911 included 15-paired cases and the corresponding normal control specimens (platform: Affymetrix Human Gene 1.0 ST Array), while GSE23036 included 63 cases and 5 controls (platform: Affymetrix Human Genome U133A 2.0 Array).
In addition, in order to confirm the results, hypomethylated-high-expression and hypermethylated-low-expression hub genes were further validated using other datasets from TCGA database. This database includes numerous microarray data of various human cancers genomes and has generated comprehensive, multi-dimensional maps of genomic changes in human cancers.

Data processing
The microarray data of methylation and expression were analyzed by R 3.1.2 software (https://www.r-pro ject.org/). Generally, all the microarray data after normalization were analyzed by R software. Aberrant methylation and expression between cases and controls were identified using Bioconductor minfi and limma packages [35,36], respectively. Benjamini-Hochberg false-discovery rate (FDR) method of adjusted P value of each CpG site and gene was calculated. An FDR-adjusted P value less than 0.05 was used as the cut-off criteria to identify differentially methylated CpG sites (DMCs) and DEGs in the microarray datasets. Similarly, FDR adjusted P value less than 0.05 and delta β value either more than 0.2 (termed hypermethylated gene) or less than −0.2 (termed hypomethylated gene) were used as cut-off values for methylation microarray datasets. Adjusted P value less than 0.05 and delta expression value either more than 1 (termed upregulated gene) or less than −1 (termed downregulated gene) were used as cut-off values for expression microarray data. All the DMCs were annotated into corresponding differentially methylated genes (DMGs) based on the platform annotation file. Subsequently, matched function was adopted to identify overlapping genes in the two methylation (GSE25093 and GSE33202) and expression (GSE23036 and GSE58911) datasets. Eventually, hypermethylated-low-expression hub genes were identified from the overlapping hypermethylated and downregulated genes, while hypomethylated-highexpression hub genes were obtained from the overlapping hypomethylated and upregulated genes.

Validation of the hub genes in TCGA database
In order to confirm the results, we also downloaded the methylation (platform: Illumina HumanMethylation450 BeadChip) and expression (IlluminaHiSeq) microarray data from TCGA database for validation. Based on the annotation file of HumanMehtylation450, there are more than 450,000 CpG sites in the human genome. These include several CpG sites located in the hub gene body that do not participate in regulating hub gene expression. Therefore, the target CpG sites used for further analysis had to meet the following two criteria: i) the target CpG site must negatively correlate with the expression of the hub gene; ii) the Pearson's correlation among target CpG sites of each of the hub genes must be more than 0.4. Subsequently, we extracted the clinical characteristics including gender, age, cigarette smoking, alcohol consumption and clinical stage of each HNSCC patient. Survival data including overall survival (OS) and relapse-free survival (RFS) were also obtained. Finally, we constructed a logistic model using the methylation level of identified CpG sites, gender, age, clinical stage, and survival data to uncover any differences in survival duration with different methylation levels.

Statistical analysis
The methylation and expression data from GEO and TCGA were analyzed with R 3.1.2 software (https://www.r-project.org/) using Bioconductor minfi and limma packages [35,36]. Both methylation and expression data were normalized before further analysis. The DMCs and expressed genes between HNSCC and normal controls were identified. Benjamini-Hochberg FDR method of adjusted P value of each CpG site and gene was calculated. Both DMCs and DEGs are highlighted by red points in the volcano plots ( Figure 1). Pearson's correlation analysis was applied to estimate the association of CpG sites with the expression level of the corresponding gene.
The diagnostic value of validated genes was assessed using receiver operating characteristic (ROC) curve and area under the curve (AUC). A multivariate Cox proportional hazards analysis was performed to evaluate OS and relapse-free survival (RFS) of HNSCC patients with validated gene methylation. The Pearson's correlation, diagnostic, and survival analyses were executed with Statistical Program for Social Sciences (SPSS) software 13.0 (SPSS, Inc., Chicago, IL, USA).

Identification of aberrantly methylated genes in HNSCC
Methylation data from GSE25093 and GSE33202 were downloaded from GEO. There were 50 HNSCCs and 25 controls in GSE33202, 91 HNSCC and 18 control specimens in GSE25093. The DMCs of these two methylation datasets were analyzed separately. A total of 9,894 DMCs were identified in GSE33202, among which a total of 359 were hypermethylated and 350 hypomethylated CpG sites. Similarly, a total of 20,041 DMCs were found in GSE25093, of which there were 796 hypermethylated and 275 hypomethylated CpG sites. Subsequently, after annotation of these DMCs in the two methylation datasets, there were 587 DMGs consisting of 321 hypermethylated and 266 hypomethylated genes in GSE33202, while a total of 925 DMGs consisting of 690 hypermethylated and 235 hypomethylated genes were identified in GSE25093. The differentially methylated CpG sites in the methylation microarray data are shown as volcano plots in Figure 1.

Identification of degs in HNSCC
The expression data from GSE58911 and GSE23036 were utilized to analyze for differential expression of genes in HNSCC. These two-microarray datasets included 78 HNSCC specimens and 30 controls. The platforms used for these two expression datasets were different; therefore, DEGs were screened separately and annotated in their corresponding annotation files. Ultimately, a total of 1,646 and 3,384 DEGs were identified in GSE23036 and GSE58911, respectively. Among the DEGs, 176 upregulated and 445 downregulated genes were found in GSE58911, while a total of 257 upregulated and 327 downregulated genes were identified in GSE23036. The volcano plots of DEGs are shown in Figure 1.

Identification of two series of hub genes
After identification of multiple DMCs and DEGs from these microarray datasets, a matching function was performed to identify overlapping genes with hypermethylated-low-expression and hypomethylatedhigh-expression genes in HNSCC. A total of 14 hypermethylated and downregulated genes were obtained by overlapping 164 hypermethylated and 141 lowexpression genes, while a total of 13 hypomethylated and upregulated genes were found by overlapping 115 hypomethylated and 68 high-expression genes ( Figure 2). The hypermethylated-low-expression and hypomethylated-high-expression genes are listed in Table 1. The heat maps of all included datasets and of the two series of hub genes are shown in Figure 3(a,  b), respectively.

Validation of the hub genes in TCGA cohort
The hypermethylated-low-expression and hypomethylated-high-expression genes were validated in another database (TCGA) to confirm the results. TCGA database had registered 530 HNSCC patient and 74 normal control specimens from HNSCC patients. Among these samples, a total of 520 HNSCC and 44 normal control specimens were analyzed by genomic expression and methylation microarray. We first used the expression data to validate our results.
Approximately half of the genes in GEO datasets were confirmed by TCGA cohort. Six genes, DEFB118, FAM135B, CYP21A2, ZNF610, FCN2, and CEACAM7, were confirmed to be downregulated, while six other genes, NID2, HOXA9, DCC, TBX20, WT1, and GATA4, were confirmed to be upregulated in HNSCC (Figure 4). The comparison among the expression of our hub genes is shown in Figure S1.
Subsequently, we downloaded all the methylation microarray data of DEFB118, FAM135B, CYP21A2, ZNF610, FCN2, CEACAM7, NID2, HOXA9, DCC, TBX20, WT1, and GATA4 from TCGA database. All CpG sites located in the gene body region or 3'untranslated region (3'UTR) were removed. Pearson's correlation was performed on the rest of the CpG sites to identify sites closely related to each other. These closely associated CpG sites were measured and the mean methylation used to assess the methylation difference between HNSCC and normal control. The methylation status of four genes, FAM135B, ZNF610, HOXA9, and DCC, in TCGA cohort were consistent with that in the GEO cohort. Hypermethylation of FAM135B and ZNF610 and hypomethylation of HOXA9 and DCC were validated in TCGA cohort ( Figure 5). A comparison among the methylation status of our hub genes is shown in Figure S2 Diagnostic and prognostic value of the four validated genes in TCGA cohort A ROC curve was plotted to evaluate the diagnostic value of FAM135B, ZNF610, HOXA9, and DCC methylation ( Figure 6). The AUCs for the 4 genes were 0.792, 0.934, 0.939, and 0.939, respectively. The sensitivities and specificities with the maximum Youden indices were, respectively, 74.2% and 80.0% for FAM135B, 86.2% and 98.0% for ZNF610, 91.5% and 96.0% for HOXA9, and 88.9% and 90.0% for DCC (Table 2). In addition, we also performed multivariate Cox proportional hazards analysis to evaluate the prognostic value of FAM135B, ZNF610, HOXA9, and DCC methylation in OS and RFS of HNSCC by  integration of clinical characteristics of HNSCC patients from TCGA. As shown in Table 3, our results confirmed a significant favorable OS in HNSCC patients with hypermethylation of FAM135B (hazard ratio: 0.121; 95% confidence interval (CI): 0.021-0.694). However, there was no correlation between ZNF610, HOXA9, and DCC methylation and OS or RFS of HNSCC.

Discussion
Elucidating the underlying mechanisms of the initiation and development of HNSCC could greatly benefit diagnostic and prognostic evaluation. HNSCC is a collective term used to describe carcinomas that originate from the oral cavity, pharynx, and larynx and are characterized by various activated pro-oncogenes and inactivated tumor suppressor genes [37][38][39]. Epigenetic changes alter the expression of critical genes responsible for the initiation and development of cancers [40][41][42]. Multiple types of epigenetic modifications have been identified, including methylation, acetylation, phosphorylation, ubiquitination, and sumolyation [43,44]. However, the best well known epigenetic change is DNA methylation, which modifies the expression of key genes involved in various biological processes. Previous studies have reported hypermethylation of genes including CDKN2A [45], DAPK [46], RASSF1A [22], RARβ 2 [47], MGMT [22], and E-cadherin [48] in HNSCC, and have proposed that these aberrantly methylated genes are associated with cell proliferation, cell-cell adhesion [48], and apoptosis [49]. Hypermethylationassociated silencing in multiple biological processes serves an important role throughout the progression of HNSCC [50]. Worsham et al. have delineated an epigenetic continuum in HNSCC by identification of a series of abnormally methylated genes from benign papillomas to HNSCC and concluded that DNA methylation plays an important role in the development continuum from benign neoplasm to malignancy [51]. In addition, the methylation profile of gene promoters was different for each type of tumor [46], suggesting that the detection of aberrant gene promoter methylation could serve as a potential molecular biomarker for tumors. The evaluation of methylation profiles as a tool for the diagnosis or prognosis of tumors has been described for many cancers, such as colorectal cancer [52], nasopharyngeal carcinoma [53], laryngeal squamous cell carcinoma [54], and HNSCC [55].
In the present study, we investigated genomewide DNA methylation patterns of 141 HNSCC patients and 43 normal controls by microarray and identified 29,935 probes showing significant differential DNA methylation in cancer tissues. By integrating DNA methylation and RNA sequencing microarray data, 27 aberrantly methylated genes accompanied by altered expression were identified. To verify the results of GEO microarray dataset, an independent validation of these aberrantly methylated-DEGs was performed using TCGA database. In total, two hypermethylated-low-expression (FAM135B and ZNF610) and two hypomethylatedhigh-expression (HOXA9 and DCC) genes were validated. The diagnostic value of these genes was evaluated, and the sensitivity and specificity of the four methylated genes for diagnosis of HNSCC were >70% and >80%, respectively, indicating that these aberrantly methylated genes could serve as a useful tool for early diagnosis of HNSCC. Furthermore, the four methylated genes were evaluated for their potential patient prognosis of HNSCC; only FAM135B hypermethylation was associated with favorable patient survival.
Hypermethylation of ZNF610 with impaired transcriptional capacity was reported in previous studies [56,57] and also identified in our study. As a member of the Krüppel C2H2-type zinc finger protein family, ZNF610 encodes zinc figure protein 610 that consists of 9 C2H2-type zinc fingers and 1 Krüppel-associated box (KRAB) domain. The KRAB domain could behave as a transcriptional repressor by recruiting corepressor proteins [58], and has been implicated in the maintenance of the nucleolus, cell proliferation, cell differentiation, and apoptosis [59]. In gastric carcinomas, approximately 70% of patients were shown to harbor hypermethylated and decreased expression of ZNF610 [60]. However, hypermethylated FAM135B has been less investigated and its  association with cancer has been less reported. Only one study has implicated FAM135B methylation with cancer, and this study elaborated on its ability to promote esophageal squamous cell carcinoma (ESCC). This study further concluded that FAM135B mutation was associated with poor survival of ESCC patients [61]. Deletion of FAM135B could attenuate cellular malignant phenotype, such as cell growth, colony formation, migration, and invasion [61]. From this perspective, downregulated expression of FAM135B caused by promoter methylation might be a favorable prognostic factor for cancer patients. In the current study, our results show that hypermethylation of FAM135B could be considered a favorable prognostic factor of HNSCC.
In summary, we have identified four aberrantly methylated genes (FAM135B, ZNF610, HOXA9, and DDC) that could serve as useful biomarkers for diagnosis and prognosis of HNSCC. Additionally, we have uncovered hypermethylation of FAM135B to be associated with favorable survival of HNSCC patients.

Disclosure statement
No potential conflict of interest was reported by the authors.