Gene expression and prognosis of X-ray repair cross-complementing family members in non-small cell lung cancer

ABSTRACT The X-ray repair cross-complementing gene (XRCC) family participates in DNA damage repair, and its dysregulation is associated with the development and progression of a variety of cancers. However, XRCCs have not been systematically studied in non-small cell lung cancer (NSCLC). Using The Cancer Genome Atlas (TCGA) and Oncomine databases, we compared the expression levels of XRCCs between NSCLC and normal tissues and performed survival analysis using the data from TCGA. The correlations of XRCCs with the clinical parameters were then analyzed using UCSC Xena. Genetic alterations in XRCCs in NSCLC and their effects on the prognosis of patients were presented using cBioPortal. SurvivalMeth was used to explore the differentially methylated sites associated with NSCLC and their effect on prognosis. Next, the immunological correlations of XRCC expression levels were analyzed using TIMER 2.0. Finally, GeneMANIA was used to visualize and analyze the functionally relevant genes, while Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) were used for functional and pathway enrichment analyses of prognostic genes. Our results revealed that XRCCs were overexpressed in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Univariate and multivariate Cox analyses showed that XRCC4/5/6 were independent risk factors for LUAD. Additionally, analyses of genetic alterations, methylation, and immune cell infiltration supported an association between XRCC4/5/6 and poor prognosis in LUAD. Finally, KEGG enrichment analysis showed that the non-homologous end-joining (NHEJ) pathway was associated with XRCC4/5/6. In conclusion, our study demonstrated that XRCC4/5/6 could be used as diagnostic and prognostic biomarkers for LUAD.


Introduction
Lung cancer remains the leading cause of cancer-related deaths globally, with an estimated 2 million new diagnoses and 1.76 million deaths each year [1]. Despite worldwide efforts to control smoking, which is the most prominent factor causing lung cancer [2,3], the number of patients diagnosed with lung cancer is expected to increase further as computed tomography (CT) screening becomes more widespread [4]. Approximately 85% of lung cancer patients are diagnosed with non-small cell lung cancer (NSCLC), with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) being the most common subtypes of NSCLC [5]. Of the two, LUAD is more common; it is predominantly a peripheral lung cancer that mostly originates from the bronchial mucosal epithelium, whereas LUSC mainly originates from the bronchial mucosal columnar epithelium and is predominantly a central lung cancer. Although screening high-risk groups using low-dose CT can reduce lung cancer mortality by 20% [6], there is no standard method to predict the survival of patients with NSCLC [7]. Therefore, to develop individualized treatment plans, it is essential to identify prognostic biomarkers and study their oncological characteristics in lung cancer, thereby improving the prognosis of patients with NSCLC.
An underlying hallmark of cancers is their genomic instability, which may be the combined effect of DNA damage, tumor-specific DNA repair defects, and failure to stop or block the cell cycle before the damaged DNA is passed on to the daughter cells [8]. The X-ray repair cross-complementing (XRCC) gene family mainly consists of six members (XRCC1/2/3/4/5/6), which are primarily involved in maintaining chromosome stability through DNA single-strand break repair [9,10] and through homologous recombination and non-homologous end-joining, which repair DNA double-strand breaks [11][12][13][14]. However, whether the protein kinase, DNA-activated, catalytic subunit (PRKDC), Fanconi anemia (FA) complementation group G (FANCG), breast cancer gene 2 (BRCA2), and others belong to the XRCC family remains partially controversial. Studies have demonstrated that dysregulation of the XRCC family may disrupt DNA repair processes, leading to tumor development [13,15,16]. Although genomic instability promotes the development of cancer, it also offers therapeutic opportunities [17]. Consequently, our research focused on the expression levels and prognostic values of XRCCs in NSCLC.
RNA- and DNA-based studies are a significant part of biomedical research, which has been developing rapidly due to advancements in microarray technologies [18]. Moreover, an increasing number of tumor-related findings are being uncovered due to the improved efficiency of data analysis on online platforms built on The Cancer Genome Atlas (TCGA) data, such as UCSC Xena [19] and cBioPortal [20]. In this study, bioinformatics analysis was used to comprehensively explore the expression and prognosis of XRCC family members in NSCLC and to search for biomarkers that can be used as diagnostic and prognostic markers for NSCLC.

Differential expression of XRCC family members
Multiple methods were utilized to determine the expression levels of XRCC family members in patients with NSCLC. TCGA database was used to evaluate the differential expression of XRCC members between the NSCLC (n = 1037) and normal tissues (n = 108). We then compared the expression levels of XRCCs in LUAD (n = 535) and LUSC tissues (n = 502) with those in normal tissues (LUAD, n = 59; LUSC, n = 49). Furthermore, we performed the differential expression analysis of XRCC family members in patients with LUAD and LUSC using several datasets from Oncomine.

Correlation of the expression levels of XRCC family members with clinical parameters in NSCLC
A correlation analysis of the expression levels of XRCC family members with the different clinical stages of cancer (LUAD: stage I, n = 410; stage II, n = 176; stage III, n = 118; stage IV, n = 38; LUSC: stage I, n = 381; stage II, n = 253; stage III, n = 131; stage IV, n = 12) and gender of patients (LUAD: Female, n = 409; Male, n = 343; LUSC: Female, n = 207; Male, n = 559) was performed using the UCSC Xena database (https://xenabrowser.net/). The correlation between the expression levels of XRCCs and the gender of patients was analyzed using the Wilcoxon test, which identifies significant differences between two groups, while the correlation between the expression levels of XRCCs and the clinical stages of cancer was analyzed using the Kruskal-Wallis test, which identifies significant differences among three or more groups.
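The two rank-based tests named above can be sketched in a few lines. The following is a minimal illustration, not the procedure run inside UCSC Xena: it assumes the two-group Wilcoxon test is the rank-sum (Mann-Whitney) form with a normal approximation, and all expression values, group labels, and function names are hypothetical.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    k = 2 if df % 2 == 0 else 1
    sf = math.exp(-x / 2) if k == 2 else math.erfc(math.sqrt(x / 2))
    while k < df:  # recurrence from df = k to df = k + 2
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf

def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction; returns (H, p)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)

def rank_sum_test(x, y):
    """Wilcoxon rank-sum test, normal approximation without tie or
    continuity correction; returns (U statistic of x, two-sided p)."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical log-scale expression values, for illustration only.
stage_i, stage_ii, stage_iii = [5.1, 5.4, 5.0], [5.9, 6.1, 5.8], [6.5, 6.9, 6.6]
print("Kruskal-Wallis across stages:", kruskal_wallis(stage_i, stage_ii, stage_iii))
print("Rank-sum, female vs male:", rank_sum_test([5.0, 5.2, 5.3], [5.8, 6.0, 6.4]))
```

With cohort sizes like those reported above, an exact or tie-corrected implementation from a statistics library would normally be preferred; the sketch only makes the ranking logic explicit.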

Genetic alterations in XRCCs and their prognosis
First, we chose six datasets (n = 2558) (LUAD: TCGA Firehose Legacy; TCGA PanCancer Atlas; TCGA Nature 2014; LUSC: TCGA Firehose Legacy; TCGA PanCancer Atlas; TCGA Nature 2014) from cBioPortal (http://www.cbioportal.org/), an open-source website for interactive exploration of multidimensional cancer genomics datasets that aids in the analysis, visualization, and download of a large number of cancer datasets [20,21], to analyze the genetic alterations of XRCCs in LUAD and LUSC. We then used Kaplan-Meier (KM) analysis to explore the association of genetic alterations in XRCCs with the overall survival (OS) and disease-free survival (DFS) rates of patients. Within each of LUAD and LUSC, the survival rates of patients with and without the genetic alterations were compared by log-rank test.
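The Kaplan-Meier estimate and the two-group log-rank comparison described above can be sketched as follows. This is a minimal illustration rather than the cBioPortal implementation; the follow-up times, event flags, and function names are hypothetical.

```python
import math

def km_curve(times, events):
    """Kaplan-Meier estimate: (time, survival probability) at each event time.
    events[i] is 1 if the event was observed, 0 if the sample was censored."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    all_t = list(times_a) + list(times_b)
    all_e = list(events_a) + list(events_b)
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(all_t, all_e) if e}):
        n_a = sum(1 for u in times_a if u >= t)          # at risk in group A
        n = n_a + sum(1 for u in times_b if u >= t)      # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = d_a + sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        obs_minus_exp += d_a - d * n_a / n               # observed minus expected
        if n > 1:                                        # hypergeometric variance
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical follow-up (months) for altered vs. unaltered groups.
altered_t, altered_e = [5, 8, 12, 20], [1, 1, 1, 0]
unaltered_t, unaltered_e = [15, 25, 30, 40], [1, 0, 1, 0]
print("KM, altered:", km_curve(altered_t, altered_e))
print("Log-rank:", logrank(altered_t, altered_e, unaltered_t, unaltered_e))
```

The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so no external statistics library is needed for this sketch.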

Prognostic values of XRCC family members in LUAD and LUSC
The prognostic values of the expression levels of XRCC family members in LUAD and LUSC were estimated using the data downloaded from TCGA in December 2020 (LUAD: OS, n = 504). The OS rates of LUAD and LUSC patients were analyzed by dividing the patients into low- and high-expression groups based on their median mRNA levels. We then evaluated the differences in the OS rates of the high- and low-expression groups using the KM survival curve. A P-value < 0.05 was considered statistically significant. We further explored the prognostic value of XRCCs by performing univariate and multivariate Cox analyses to identify the genes that can be considered independent prognostic factors.
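The median split and a univariate Cox fit can be illustrated with a short sketch. It is a simplified stand-in for the analysis described above, assuming a single covariate, Newton-Raphson optimization of the partial likelihood, and Breslow handling of ties; the multivariate model is not shown, and all data and function names are hypothetical.

```python
import math
from statistics import median

def median_split(expression):
    """Label each sample 'high' (> median) or 'low' (<= median)."""
    cut = median(expression)
    return ["high" if v > cut else "low" for v in expression]

def cox_univariate(times, events, x, iters=25):
    """One-covariate Cox model fitted by Newton-Raphson (Breslow ties).
    Returns (beta, hazard ratio, approximate standard error of beta)."""
    beta, info = 0.0, 0.0
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored samples contribute only through risk sets
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            grad += x_i - s1 / s0                  # score contribution
            info += s2 / s0 - (s1 / s0) ** 2       # information contribution
        step = grad / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, math.exp(beta), 1 / math.sqrt(info)

# Hypothetical mRNA levels and survival data, for illustration only.
expression = [9.1, 3.2, 8.7, 2.9, 7.5, 4.0]
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = median_split(expression)
high = [1 if g == "high" else 0 for g in groups]
beta, hr, se = cox_univariate(times, events, high)
print(groups)
print(f"beta={beta:.3f}, HR={hr:.3f}, SE={se:.3f}")
```

A hazard ratio above 1 for the high-expression indicator corresponds to shorter survival in that group. If one group contains all of the earliest events, the estimate diverges, so real analyses rely on a dedicated survival package.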

Analysis of XRCC4/5/6 DNA methylation sites and their prognosis
We performed the differential methylation analysis of XRCC4/5/6 promoter regions in patients with LUAD (P < 0.05) using the SurvivalMeth database (http://bio-bigdata.hrbmu.edu.cn/survivalmeth/), a web server for investigating the effects of DNA methylation-related functional elements on prognosis. Additionally, we used the t-test to examine the data. The methylation sites associated with LUAD were classified into high- and low-risk groups for survival analysis using the KM method.

Tumor purity and immune cell infiltration of XRCC4/5/6
We used TIMER 2.0 (http://timer.cistrome.org/), a comprehensive resource for systematic analysis of immune infiltrates across diverse cancer types [22], to investigate the correlation of expression levels of XRCC4/5/6 genes with tumor purity and immune cell infiltration in LUAD.

Genetic interaction analysis
We used GeneMANIA (http://www.genemania.org), a prediction website that serves as a biological network integrator for gene prioritization and function prediction [23], to construct gene networks associated with XRCC4/5/6 and visualize the functional correlation between these genes.

Functional and pathway enrichment analyses
Functional enrichment analysis was performed using Gene Ontology (GO), while pathway enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Both GO and KEGG analyses were performed using the R package 'enrichplot' (v.3.12), which implements several visualization methods to interpret the functional enrichment results obtained from over-representation analysis (ORA) or gene set enrichment analysis (GSEA). Next, we used the KEGG database (https://www.genome.jp/kegg/), a database resource for understanding high-level functions and utilities of the biological system from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies, to plot pathway maps associated with the target genes [24,25].
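Over-representation analysis is commonly formulated as a one-sided hypergeometric test: given a gene universe, a pathway of known size, and a query gene list, it asks how likely an overlap at least as large as the observed one would be by chance. The sketch below illustrates that calculation only; it is not the R workflow used here, and the universe size, pathway size, and function name are hypothetical.

```python
from math import comb

def ora_pvalue(universe, pathway, hits, overlap):
    """P(X >= overlap) when `hits` genes are drawn without replacement from
    `universe` genes, of which `pathway` belong to the gene set."""
    total = comb(universe, hits)
    upper = min(pathway, hits)
    tail = sum(comb(pathway, i) * comb(universe - pathway, hits - i)
               for i in range(overlap, upper + 1))
    return tail / total

# Hypothetical sizes: 20,000 background genes, a 13-gene pathway,
# and a 3-gene query list that falls entirely inside the pathway.
p = ora_pvalue(universe=20000, pathway=13, hits=3, overlap=3)
print(f"ORA p-value: {p:.3e}")
```

In practice the p-values for all tested gene sets are adjusted for multiple testing before a threshold such as P < 0.05 is applied.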

Statistical Analysis
Bioinformatics statistical analysis was performed using 'R x64 4.0.5' software and open online websites. Differences in the expression of XRCCs between NSCLC and normal tissues were analyzed by Student's t-test. Survival curves from the genetic alteration and Kaplan-Meier analyses were compared by log-rank test. In all analyses, differences were considered statistically significant if the P value was less than 0.05.
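For reference, the two-sample Student's t statistic with pooled variance can be written out directly. This is a minimal sketch with hypothetical expression values and a hypothetical function name; it returns the statistic and degrees of freedom only, and the corresponding P value would be read from the t distribution using a statistics package.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic (equal-variance form); returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Hypothetical log2 expression values in tumor and normal samples.
tumor = [7.9, 8.4, 8.1, 8.6, 8.0]
normal = [6.8, 7.1, 7.0, 6.9, 7.3]
t, df = student_t(tumor, normal)
print(f"t = {t:.3f} on {df} degrees of freedom")
```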

Results
In this study, we analyzed the expression and prognosis of the XRCC family from multiple biological perspectives using bioinformatics approaches, and we identified XRCC4/5/6 as potential new biomarkers for LUAD diagnosis and prognosis. This finding may benefit patients with lung cancer.

XRCC family members are significantly overexpressed in NSCLC
Using TCGA database, we compared the expression levels of XRCC family members between the NSCLC tumor samples (n = 1037) and normal tissue samples (n = 108). The results indicated that XRCC1/2/3/4/5/6 were expressed at higher levels in NSCLC tissues than in the normal tissues (P < 0.001) (Figure 1(A)). We then analyzed the mRNA levels of XRCCs in LUAD and LUSC using TCGA database and found that the expression levels of XRCC1/2/3/4/5/6 were significantly upregulated in both LUAD and LUSC tissues compared to those in normal tissues (P < 0.05) (Figure 1(B, C)). In addition, we validated these findings by analyzing the datasets of Hou [26], Selamat [27], Su [28], Bhattacharjee [29], Garber [30], Talbot [31], and Yamagata [32] from the Oncomine database (Table 1). Taken together, these results indicate that the expression levels of XRCC family members are significantly upregulated in both subtypes of NSCLC (LUAD and LUSC).

Correlation of the expression levels of XRCCs with the clinical parameters of LUAD and LUSC
We used the UCSC Xena database to further explore the expression levels of XRCCs with regard to the tumor stages and genders of patients with LUAD and LUSC. In patients with LUAD, statistically significant differences across tumor stages were observed for XRCC5 and XRCC6 (P < 0.05), with a positive correlation between the tumor stage and gene expression (Figure 2(A)). In patients with LUSC, XRCC2 expression was correlated with tumor stage (P < 0.05), with the highest gene expression being observed in stage II (Figure 2(B)). In addition, the mRNA levels of XRCC2 and XRCC5 in LUAD patients and XRCC2 in LUSC patients were higher in men than in women (P < 0.05; Figure 2(C, D)). Overall, these findings imply that the expression levels of XRCCs are partially correlated with the clinical parameters in NSCLC patients.
Next, we explored the relationship between these genetic alterations in XRCCs and the survival rates of patients with LUAD and LUSC. KM analysis showed that the presence of altered XRCC family members was associated with reduced OS and DFS in LUAD patients compared to that in patients with unaltered XRCC family members (P < 0.05) (Figure 4(A)). In contrast, the analysis of genetic alterations in XRCCs in patients with LUSC did not reveal any significant correlation with the OS and DFS (Figure 4(B)). Nevertheless, the curve trend suggested that LUSC patients with altered XRCCs tended to exhibit better OS than those with unaltered XRCCs (Figure 4(B)). In summary, these results suggest that genetic alterations in XRCC family members are associated with a worse prognosis in patients with LUAD, whereas no significant association was observed in patients with LUSC.

Prognostic values of XRCC family members in LUAD and LUSC
We analyzed the prognostic values of XRCCs in patients with LUAD and LUSC using TCGA database. The results suggested that high expression levels of XRCC2/3/4/5/6 were significantly associated with poor OS in patients with LUAD (Figure 5(A)). Additionally, upregulation of the mRNA levels of XRCC1/2/6 was significantly associated with longer OS in patients with LUSC (Figure 5(B)). These results are consistent with those shown in Figures 4A and 4B.
Subsequently, we performed univariate and multivariate Cox analyses combining the target genes with age, sex, and tumor stage to identify genes that can be used as prognostic indicators independent of these clinical factors. Univariate Cox analysis showed that the tumor stage, XRCC4, XRCC5, and XRCC6 were potential risk factors for the OS in patients with LUAD (P < 0.05), while no statistically significant genes were found in patients with LUSC (Table 2; Table 3). In addition, multivariate Cox analysis demonstrated that the tumor stage, XRCC4, XRCC5, and XRCC6 could predict OS in patients with LUAD independent of other factors (P < 0.05; Figure 6). Taken together, our results illustrated the prognostic relevance of XRCC family members in NSCLC patients and showed that XRCC4/5/6 were potential independent risk factors for OS in patients with LUAD.

Correlation of methylation of XRCC4/5/6 with survival analysis in LUAD
Based on our finding that XRCC4/5/6 can be used as potential independent risk factors for patients with LUAD, we further analyzed the methylation sites of XRCC4/5/6 using SurvivalMeth. Comparison of the methylation levels in LUAD tumor tissues with those in normal tissues identified three relevant methylation sites each in XRCC4 and XRCC5 (P < 0.05), whereas no such sites were found in XRCC6 (Table 4). Then, we divided the samples with differentially methylated sites into high- and low-risk groups and performed survival analysis using the KM method. The results showed that the XRCC4/5 high-risk groups were associated with a poor prognosis (Figure 7), which is consistent with the conclusions of our previous survival analysis.

Correlation analysis of expression levels of XRCC4/5/6 and immune cell infiltration
Using TIMER 2.0, we analyzed the correlation between the expression levels of XRCC4/5/6 and the level of immune cell infiltration in patients with LUAD. Our results indicated that the expression levels of XRCC5/6 were positively correlated, while those of XRCC4 were negatively correlated, with tumor purity (Figure 8). High expression levels of XRCC4/5/6 showed a significant negative correlation with the infiltration of cluster of differentiation 4-positive (CD4+) T cells and a positive correlation with the infiltration of cluster of differentiation 8-positive (CD8+) T cells (Figure 8). The immune cell composition of NSCLC tissues was dominated by T cells (47%), with CD4+ T cells being the most abundant T cell population (26%), followed closely by CD8+ T cells (22%) [33]. Furthermore, previous studies demonstrated that low-infiltrating CD4+ T cells, high-infiltrating CD8+ T cells, and the combination of high-infiltrating CD8+ and low-infiltrating CD4+ T cells were associated with a poor prognosis in patients with NSCLC [34,35]. Taken together, the pattern of CD4+ and CD8+ T cell infiltration associated with high XRCC4/5/6 expression is consistent with a poor prognosis.

Gene-function interaction analysis of XRCC4/5/6
To identify the genes associated with XRCC4/5/6 functions, we performed a visual analysis using the GeneMANIA online tool and identified 20 genes that closely interacted with XRCC4/5/6 (Figure 9). Among them, DNA ligase IV (LIG4), PRKDC, barrier-to-autointegration factor 1 (BANF1), non-homologous end-joining factor 1 (NHEJ1), and aprataxin and polynucleotide kinase 3'-phosphatase (PNKP)-like factor (APLF) were the five genes that interacted most significantly with XRCC4/5/6 (Table 5). XRCC4/5/6 all had multiple interactions with LIG4 and PRKDC, and the most significant interaction types were physical interaction and shared pathway. However, BANF1 only interacted with XRCC4/5, and the most relevant interaction type was shared pathway (Figure 9). Therefore, identifying genes that are functionally similar to XRCC4/5/6 will aid in the discovery of other DNA repair target genes, thereby expanding the therapeutic options for tumor treatment.

Predicting the function and pathway of XRCC4/5/6 in LUAD
The functions of XRCC4/5/6 and the correlations among their functions were explored by GO enrichment analysis in terms of biological processes (BP), cellular components (CC), and molecular functions (MF) [36]. For BP, the terms most significantly associated with XRCC4/5/6 were viral latency, response to X-ray, double-strand break repair via non-homologous end-joining, non-recombinational repair, response to ionizing radiation, double-strand break repair, DNA recombination, and response to radiation (Figure 10(A)). Moreover, these eight terms had the most significant functional correlation with one another (Figure 10(A)). For CC, the DNA repair complex was the term most significantly associated with XRCC4/5/6 and was most significantly correlated with the other enriched terms (Figure 10(B)). Among the MFs, the term most significantly associated with XRCC4/5/6 was protein C-terminus binding, which was also most significantly correlated with the other enriched terms (Figure 10(C)). KEGG enrichment analysis was used to identify pathways related to the functions of XRCC4/5/6, and the pathway maps were drawn using the KEGG website. By KEGG enrichment analysis, only the NHEJ pathway was found to be associated with XRCC4/5/6 in LUAD (P < 0.05) (Figure 11(A)). Furthermore, we mapped the NHEJ pathway using the KEGG website, which can directly reflect the changes in genes in the pathway. The results showed that Rad27 was highly expressed in the Saccharomyces cerevisiae pathway, while DNA polymerase μ (Polμ) was expressed at low levels in the mammalian pathway (Figure 11(B)).

Discussion
Environmental factors play critical roles in tumorigenesis by affecting the stability of target genes [37]. Therefore, the ability of the genes to repair the damaged DNA is closely related to tumorigenesis and determines the difference in the susceptibility to cancer in different individuals [38]. Although the XRCC family members are essential components of DNA repair mechanisms and are reported to play potential roles in the tumorigenesis and prognosis of multiple cancers, their oncological and prognostic values in NSCLC have not been systematically investigated.
Figure 6. Multifactorial Cox analysis for independent prognostic analysis.
XRCC1, a single-strand break repair factor [39], has been reported to play essential roles in various cancers. Miao et al. demonstrated that the XRCC1 Arg399Gln polymorphism is a genetic susceptibility factor for the development of gastric cardia cancer [40]. The XRCC1 399Gln allele is a potentially important determinant of susceptibility to smoking-induced pancreatic cancer [41]. Moreover, the presence of the single nucleotide polymorphism (SNP) -77 T > C in the 5'-untranslated region (UTR) of XRCC1 is associated with an increased risk of developing lung cancer [42]. In our study, XRCC1 was highly expressed in both LUAD and LUSC tissues. Moreover, high expression of XRCC1 in LUSC was associated with longer OS, whereas the association was not statistically significant in LUAD. The frequency of XRCC1 alterations was 2.6% in LUAD and 2.7% in LUSC, and in both subtypes gene amplification accounted for a substantial share of these alterations.
A variety of cancers have been associated with XRCC2, such as breast [43], lung [44], pancreatic [45], and head and neck cancers [46]. Zienolddiny et al. reported that there are associations between a set of genetic polymorphisms in DNA repair genes and the risk of lung cancer [44]. Furthermore, in vivo studies using a viral vector containing XRCC2 promoter indicated that XRCC2 is a promising target for the diagnosis and treatment of various types of cancers [47]. In this study, XRCC2 was found to be overexpressed in the LUAD and LUSC tissues compared with the normal tissues. Moreover, high XRCC2 expression was significantly associated with poor OS in LUAD, while it was associated with better OS in LUSC. In terms of clinical characteristics, the expression of this gene was correlated with the clinical stage of LUSC and the gender of patients with LUAD and LUSC. Moreover, XRCC2 was predominantly amplified in LUAD exhibiting the highest frequency (4%), while it was predominantly deleted in LUSC (2%).
XRCC3 is required to repair double-strand breaks via homologous recombination repair pathways for accurate chromosomal segregation and repair of DNA cross links [11,48]. XRCC3 IVS6 C1571T and the associated haplotype AAC are associated with a relatively high risk of lung cancer [14]. In the current study, XRCC3 expression was higher in the LUAD and LUSC tissues than in the normal tissues. In terms of survival, high XRCC3 expression in LUAD was significantly associated with the poor OS of patients, while no such relation was observed in LUSC. In addition, XRCC3 was the most frequently altered gene of the XRCC family in LUAD (3%) and amplification was the dominant genetic alteration in both LUAD and LUSC.
XRCC4/5/6 are essential components of the NHEJ pathway, in which XRCC4 acts in conjunction with ligase IV and the Ku70-Ku80 heterodimer (encoded by XRCC6 and XRCC5, respectively) for precise end-joining of blunt DNA double-strand breaks [12,13,49]. Moreover, NHEJ is the primary pathway for the repair of double-strand breaks in mammalian cells [13]. These conclusions are consistent with our pathway enrichment analysis results shown in Figures 11A and 11B. Our study also indicated that Rad27 was highly expressed in the Saccharomyces cerevisiae pathway, while Polμ was downregulated in the mammalian pathway. The association between the polymorphisms of genes in the NHEJ pathway and the susceptibility and prognosis of lung cancer was first proposed by Tseng et al. [50]. Furthermore, XRCC4/5/6 have been widely studied in the field of oncology [51][52][53][54][55].
In our study, XRCC4/5/6 were overexpressed in LUAD and LUSC tissues as compared to the normal tissues, which is consistent with the results of Ye et al. [56] and Ma et al. [57], who reported that XRCC5 was overexpressed in LUAD and LUSC. Survival analysis showed that high expression levels of XRCC4/5/6 were associated with poor OS in LUAD patients. Multivariate Cox analysis showed that XRCC4/5/6 were potential independent risk factors for LUAD. Moreover, XRCC4/5/6 DNA methylation and immune cell infiltration analysis in LUAD also indicated a poor prognosis. Currently, biomarkers are used for the clinical diagnosis and prognosis in a variety of cancers, such as lung [58], breast [59,60], and gastric cancers [61,62]. Therefore, XRCC4/5/6 can be used as novel diagnostic and prognostic biomarkers for LUAD. Notably, this study has some limitations. First, the conclusions lack experimental validation; however, we applied a multi-omics approach and used multiple databases to support our conclusions. Second, the data in this study were mainly obtained from publicly available databases; therefore, the results depend heavily on the quality of the data in those databases.

Conclusion
In summary, we comprehensively analyzed the role of XRCC family members in NSCLC from multiple oncological perspectives using various online authoritative databases. Our study demonstrated that XRCC4/5/6 could be used as diagnostic and prognostic biomarkers for LUAD.

Disclosure statement
No potential conflict of interest was reported by the author(s).