Identification of a prognostic model based on immune and hypoxia-related gene expressions in cervical cancer

Abstract Background Tumour immune microenvironment (TIME) has long been a key direction of tumour research. Understanding the occurrence, metastasis and other processes of cervical cancer (CC) is of great significance in the diagnosis and prognosis of tumours. Methods Here, this study applied the univariate Cox regression model to determine the prognostic association of immune and hypoxia signature genes in CC, and used Least Absolute Shrinkage and Selection Operator (LASSO) Cox method to build immune and hypoxia related risk score model to uncover the immune signature of the TIME of CC. Moreover, we used in vitro experiment to validate the expression level of signature genes. Notably, we assessed the predictive effect of anti-PD1/PDL1 immunotherapy using risk score model. Results Through the LASSO Cox regression model, we obtained 12 characteristic genes associated with the prognosis of CC, and also associated with immunity and hypoxia. Interestingly, the high-risk group had the properties of high hypoxia and low immunity, while the low-risk group had the properties of low hypoxia and high immunity. In the low-risk group, patients lived longer and had a significant therapeutic advantage of anti-PD-1 immunotherapy. Conclusions Established risk scores model can help predict response to anti-PD-1 immunotherapy of CC. Plain Language Summary The survival rate of cervical cancer (CC) is still low. A prognostic model for CC is urgently needed to improve the prognosis and survival. This study constructed a risk scoring models based on 12 characteristic gene related to hypoxia and immunity, including CX3CL1, CXCL3, GHSR, DLL4, FGFR2, PDF, KLRK1, MAP3K14, RETNLB, PRDX2, P4HA1 and PGK1, which can help predict the prognosis of PD-1 immunotherapy in CC patients. The high-risk group may have the properties of high hypoxia and low immunity, while the low-risk group patients live longer and have obvious therapeutic advantages in anti-PD-1 immunotherapy. Our findings suggest a potential link between hypoxia, immunity, prognosis, tumour immune microenvironment and response to immunotherapy in CC patients.


Introduction
Current studies elucidated that cervical cancer (CC) ranks fourth among women in terms of morbidity and mortality (Bray et al. 2018, Ferrall et al. 2021).The incidence of CC varies between countries and regions, and there are risk factors associated with social conditions within specific regions (Gelband et al. 2015).Unfortunately, the prognostic survival rate of CC is still low so far (Ferrall et al. 2021).In view of this, a prognostic model for CC is urgently needed to improve the prognosis and survival of CC.
Computed tomography (CT) and magnetic resonance imaging (MRI) are regarded as the most effective tools for clinical evaluation of invasive CC (Scheidler et al. 1997).
Lymph node metastasis (LNM) is present in a minority of patients with early stages of CC.Positron emission tomography/computed tomography (PET/CT) is increasingly common in the preoperative examination and diagnosis of metastatic CC.However, as conventional imaging including PET/CT has shown limited sensitivity, and the limited spatial resolution of the PET system limits its inherent ability to detect small lesions, the metastatic CC is usually not detected until late (Groheux et al. 2009, Weissinger et al. 2021).Overall prognosis remains poor for women with metastatic or recurrent disease (Pourhanifeh et al. 2020).Since the prognosis of CC varies significantly due to the genetic heterogeneity of patients, the identification of effective genetic biomarkers will remarkably improve the ability to predict the prognosis of CC (Nahand et al. 2020, Hashemipour et al. 2021, Huang et al. 2021).
There is a general consensus is that persistent human papillomavirus (HPV) infection is a key cause of CC (Bruni et al. 2016).The immune system plays a crucial role in preventing HPV infection, the development of cervical lesions, and ultimately tumour progression (Naumann and Leath 2020).Therefore, exploring the immune signature after CC infection is of great significance for improving the prognosis of CC.Hypoxia-related mechanisms is recognised as one of the hallmarks of cancer signalling pathways (Gilkes et al. 2014), related to stem cell characteristics, extracellular matrix organisation and cancer cell metastasis (Mathieu et al. 2011, Semenza 2017).Furthermore, epithelial-mesenchymal transition, induced by hypoxia plays key roles in invading surrounding tissues.Previous studies reported that hypoxic response in T lymphocytes can enhance the expression of CD137 in immunotherapy (Palaz� on et al. 2012).Moreover, hypoxia also drives migration and effector functions of CD8 þ T cells, suggesting that hypoxia plays different roles in immune and tumour cells (Graham et al. 2015).
Here, we are committed to constructing an immune-and hypoxia-related model using the Least Absolute Shrinkage and Selection Operator (LASSO) Cox method based on the expression of immune and hypoxia related genes to effectively predict the prognosis of CC.Our findings suggest a potential link between hypoxia, immunity, prognosis, tumour immune microenvironment (TIME) and response to immunotherapy in CC patients.

Construction of risk scoring model related to immunity and hypoxia
First, the data of the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were intersected with the gene sets related to immunity and hypoxia, and genes that overlap in three gene sets were selected.Then, the obtained gene sets were put into the data of TCGA and GEO for single-factor Cox analysis.Genes with a p value <.05 were selected as the intersection in the two data sets.LASSO method was applied to screen variables to reduce the number of genes in risk models.Finally, the multivariate Cox regression analysis was utilised to construct the risk score model of immunity and HRGs.

Survival and expression analysis of signature genes in CC and paracancerous tissue
Based on the expression values of the signature genes in each sample of the TCGA dataset, the samples were divided into high and low expression groups based on median.The effect of the signature genes on survival was analysed using the Kaplan-Meier method in R (R Foundation for Statistical Computing, Vienna, Austria).The Wilcoxon test method was utilised in R to analyse the expression differences of genes.

Mutation analysis and transcription factor correlation analysis of signature genes
The 'maftools' in R was utilised to analyse the mutation data of CC and visualised in waterfall plot.The subset of transcription factors (TFs) was obtained from Citrome (http://cistrome.org/), and corresponding TF expression values were obtained from the TCGA database.Through correlation analysis, the TF in the risk score associated with signature genes were identified.

Survival analysis of risk scores
To judge the effect of risk scores on survival, the samples were grouped based on the median of risk scores.Samples above the median were assigned to the high-risk group, while samples below the median were assigned to the lowrisk group.The TCGA dataset was used to perform survival analysis via the Kaplan-Meier method in R, and validated in the CEO Dataset.

Association between risk score and clinical features
To further judge the association between risk scores and clinical features, the multivariate Cox analysis was first used to determine whether risk score was the independent prognostic factor different from other clinical factors.After testing for collinearity, a prognostic nomogram was constructed to predict 1-, 3-and 5-year overall survival (OS) in CC patients in the TCGA dataset.A nomogram calibration curve was drawn to compare predicted versus observed OS.In addition, the differences in risk scores in different clinical subgroups were also compared.

Estimation of the tumour immune microenvironment cell infiltration
The single sample GSEA (ssGSEA) algorithm was used to quantify the relative abundance of each cell infiltration in CC.A gene set marking each TIME infiltrating immune cell type was identified from Charoentong's study (Charoentong et al. 2017).Enrichment scores were utilised to represent the relative abundance of each TIME infiltrating cells in each sample to observe the differences in immune cell infiltration.

Gene set variation analysis (GSVA)
Aiming to investigate the differences in biological processes, GSVA enrichment analysis was performed by the GSVA R package.The R package limma was utilised to calculate differentially expressed pathways.False discovery rate (FDR) <0.05 was considered statistically significant for differences.

Differential expression and enrichment analysis of differentially expressed genes (DEGs)
To further evaluate the gene expression between high-risk and low-risk groups, DEG analysis was performed through the limma package of R, with a screening thresholds of FDR <0.01 and jlog 2 fold change (FC)j >0.3.Then, the David database was used to perform enrichment analysis of DEGs.

Tumour mutational burden (TMB) analysis
First, the maftools package in R was utilised to calculate the TMB score.The tumour samples in the experimental group were divided into two groups with high and low TMB scores based on the median of TMB.The high and low scores were analysed by Kaplan-Meier survival curves.Subsequently, the TMB scores of patients were compared.

Analysis of immune checkpoints
To assess the status of immune checkpoints, the expression of three immune checkpoints was assessed, including programmed cell death-ligand 1 (PD-L1), programmed death 1 (PD-1) and cytotoxic T-lymphocyte-associated antigen 4 (CTLA-4).Correlation analysis between risk scores, signature genes and immune checkpoints was then performed.

Prediction of tumour risk scores in the context of immunotherapy benefit
In order to explore the predictability of tumour risk score in patients with benefit from immunotherapy, the response of patients was analysed according to the immunotherapy cohort, and the difference of risk score in response and nonresponse groups was analysed.

Quantitative real-time polymerase chain reaction (QRT-PCR)
A total of 13 tumour tissues and 13 para-carcinoma tissues were collected from patients with CC.Aidlab's Easy Tissue/ Cellular RNA Rapid Extraction Kit was used for the total RNA extraction of tissues.Each sample contained three technical replicates, and the primers are shown in Supplementary Table 1.This study was approved by the Ethics Committee of The First Affiliated Hospital of Bengbu Medical College.All patients provided the informed consent.

Statistical analysis
Statistical analysis was performed using R (R Foundation for Statistical Computing, Vienna, Austria).Differences in expression levels of signature genes were compared between normal and cancer samples using Wilcoxon's test.Survival curves were generated using the Kaplan-Meier method.Differential expression between subgroups was performed using the limma package.Univariate and multivariate analyses were performed through Cox regression models, combined with other clinical features to determine the independent prognostic value of risk scores.The predictive efficiency of risk models for 1-, 3-and 5-year OS was estimated using ROC curves.p Value <.05 was considered statistically significant.

Construction of a risk scoring model related to immunity and hypoxia
This study included 293 patients from the TCGA and cervical squamous cell carcinoma (CESC) database and 300 patients from GSE44001 (Lee et al. 2013).Totally, 1811 IRGs and 200 HRGs were screened, 1437 immune and hypoxia related genes were obtained (Supplementary Figure 1A).Forty-three prognosis related genes were finally obtained (Supplementary Figure 1B).Taking the hypoxic microenvironment may affect the activation state of infiltrating immune cells and the immune response of tumour cells, 43 prognostic-related genes were used to the LASSO Cox regression model to construct a prediction model for the OS rate of patients with CC.The optimal k value was obtained correspondingly using the smallest partial likelihood deviation, 12 signature genes were finally identified.Among which, P4HA1 and PGK1 were HRGs, and the rest were IRGs, including CX3CL1, CXCL3, GHSR, DLL4, FGFR2, PDF, KLRK1, MAP3K14, RETNLB and PRDX2.A risk score model for hypoxia and immunity was constructed using multivariate Cox regression (Supplementary Figure 1C-E).

The effect of signature genes on CC
The Wilcoxon test was used to analyse the differences in the gene expression level of 12 signature genes.The expression of PDF, PGK1 and KLRK1 was significantly up-regulated, and DLL4 was significantly down-regulated in cancer tissues (Supplementary Figure 2A).The expression of these genes was detected by QRT-PCR (Supplementary Figure 2B).

Survival analysis of risk score model
The impact of the risk scores constructed by these 12 genes on the OS of CC was further determined in the TCGA and GEO datasets.First, based on the median risk score, the samples were divided into high-risk and low-risk groups.The proportion of death samples in the high-risk group is higher than that in the low-risk group (Supplementary Figure 3A).Prognostic analysis of risk score showed that patients with higher risk score have poor OS (Supplementary Figure 3B).
In addition, the risk score had a well survival prediction value for both TCGA data and GEO data (Supplementary Figure 3C).

The relationship of risk scores with clinical features and immune infiltration
The univariate and multivariate Cox analysis confirmed that age, N stage and risk score can be used as independent prognostic factors (Figure 1(A)).A nomogram was constructed to predict the probability of 1-, 3-and 5-year OS.Each factor was assigned in proportion to its risk contribution to survival (Figure 1(B)).Calibration curves indicated that the nomogram shows high accuracy at 1-year and 3-year OS (Figure 1(C)).In addition, differences in risk scores across clinical subgroups were compared and the results showed significant differences in risk scores across T stage, M stage, stage and age (Figure 1(D)).
To explore the relationship between the risk score constructed by the combination of tumour hypoxia and IRGs and the TIME, the ssGSEA method was used to evaluate the infiltration status of 23 immune cells in the TCGA CESC dataset.Differences in infiltration were hypothesis tested and found that the infiltration of most immune cells was significantly higher in the low-risk group than in the high-risk group (Figure 2(A)).In the low-risk group, the immune score, stromal score and ESTIMATE score were significantly higher than those in the high-risk group, while the tumour purity was lower than that in the high-risk group (Figure 2(B-E)).A total of 65 metabolic pathways were screened by GSVA.In the high-risk score group, the metabolism of galactose, fructose and mannose was mainly metabolic pathways, such as energy metabolism, that were more active.In the low-risk group, arachidonic acid metabolism, B cell receptor signalling and the immune related metabolic were more active (Figure 2(F)).

Differential expression analysis between high-risk and low-risk groups
The cancer tissue samples in the TCGA dataset were grouped according to high-and low-risk scores and subjected to gene differential expression analysis using the limma package of the R software.Totally, 784 DEGs were obtained, including 203 up-regulated genes and 581 down-regulated genes.Interestingly, the HRGs, P4HA1 and PGK1, were highly expressed in the high-risk group, while the immune related genes, MP3K14, CX3CL1 and PRDX2, were lowly expressed in the high-risk group, indicating that the high-risk group may have the properties of high hypoxia and low immunity, while the low-risk group may have the properties of low hypoxia and high immunity.
Among the 16 hypoxia signature genes, 10 genes were significantly highly expressed in the high-risk group, and the remaining six genes showed no significant changes (Supplementary Figure 4).However, the expression of 30 immune related genes was more complex in the low-risk group, but the general trend was to show high expression levels in the low-risk group (Supplementary Figure 5).

The relationship between risk score and immune checkpoint
The PD-1 and CTLA4 were significantly higher in the low-risk group and there was no significant difference in the expression of PD-L1 (Supplementary Figure 6A).Moreover, the relationship between signature genes KLRK1 and CTLA4 was 0.61 and p value <2.2e-16, KLRK1 and PD-1 was 0.63, p < 2.2e-16.Interestingly, risk scores were negatively correlated with immune checkpoints (Supplementary Figure 6B).

Evaluation of the efficiency of the risk model for prognosis in cancer immunotherapy
There was no significant difference in risk scores between the different anti-PD-L1 clinical response groups, and there was no significant correlation between anti-PD-L1 treatment and survival in the high-risk and low-risk groups (Supplementary Figure 7A,B).There was also no significant difference in PD-L1 expression in the high-risk and low-risk groups, while the expression of PD-1 was significantly higher in the low-risk group than that in the high-risk group (Supplementary Figure 7C,D).Anti-PD-1 therapy was significantly correlated with the survival of patients in high-risk group and low-risk group, and the survival of patients in low-risk group was longer than that in high-risk group (Figure 3(A)).Moreover, significant therapeutic advantage and clinical response to anti-PD-1 immunotherapy were demonstrated in low-risk patients compared with patients with high-risk groups (Figure 3(B,C)).Meanwhile, by comparing the expression of PD-L1/1 in the high-risk and low-risk groups, it was found that there was no significant difference in the expression of PD-L1, while PD-1 was highly expressed in the low-risk group (Figure 3(D,E)).Finally, ROC analysis validated the accuracy of risk scores in differentiating disease progression (PD/SD) and disease remission (CR/PR) groups after receiving anti-PD-L1 immunotherapy (Figure 3

Discussion
Currently, it is a sad fact that CC remains the fourth most common gynaecological malignancy, and its incidence tends to be younger in recent years (Lee et al. 2013, Ferrall et al. 2021).The preservation of fertility in women with CC is a demanding but constantly evolving issue (Somigliana et al. 2020).Fertility preservation approaches should be given due consideration in the decision-making process regarding treatment of gynaecologic malignancies, especially in light of the apparent detrimental psychological effects of radical approaches that deprive women of their reproductive potential (Zaami et al. 2022, Gullo et al. 2023).Researchers have studied the oncology and reproductive outcomes of fertility preserving therapy, and assisted reproductive technologies available to cancer patients and survivors are constantly evolving (Floyd et al. 2021, Tanos et al. 2022).Cryobiology is the cornerstone of preserving fertility, and vitrification as a human oocyte cryopreservation technique can be used for female fertility preservation (Gullo et al. 2022).Furthermore, studies have shown that biomarkers with good prognosis can help clinical doctors make management decisions for patients who wish to maintain fertility (Giampaolino et al. 2022, Tanos et al. 2022).Previous studies reported that the survival rate of CC was still not high at present (Siegel et al. 2018).Encouragingly, the emergence of PD-1 immunotherapy offers hope for treating CC.However, there is no effective assessment of its prognosis, it is urgent to develop a predictive model to predict PD-1 immunotherapy (Wang and Li 2019).In this study, we constructed a risk scoring models based on 12 characteristic gene related to hypoxia and immunity, including CX3CL1, CXCL3, GHSR, DLL4, FGFR2, PDF, KLRK1, MAP3K14, RETNLB, PRDX2, P4HA1 and PGK1, which can help predict the prognosis of PD-1 immunotherapy in CC patients.The high-risk group may have the properties of high hypoxia and low immunity, while the low-risk group patients live longer and have obvious therapeutic advantages in anti-PD-1 immunotherapy.Our findings reveal the relationship between TIME, immunity and hypoxia in CC, which may help improve the prognosis of patients with CC.
More recently, Yu et al. established an immune-related model containing four key genes (CHIT1, GTSF1L, PLA2G2D and GNG8) as a supplementary criterion for effectively predicting survival outcomes in CC patients (Yu et al. 2021).Nie et al. discovered a hypoxia-related prediction model of nine key genes (EFNA1, IER3, ISG20, KLF7, LDHC, P4HA2, PGM1, RBPJ and STC1), which provided a better understanding of the hypoxic tumour microenvironment in CC (Nie et al. 2022).Nevertheless, the genes in these models are only related to one type of immune or hypoxia, which cannot reveal the association between CC tumour microenvironment, immunity and hypoxia.This study constructed a risk scoring model based on hypoxia and immune genes to uncover the immune and hypoxia signature of CC tumour microenvironment.To our knowledge, our model is the first risk scoring model established to date that includes prognostic signature genes of CC microenvironment, hypoxia and immunity.Moreover, Kaplan-Meier's survival analysis was used to determine the effect of a single signature gene on survival (Zheng et al. 2021).Our results suggested that except RETNLB had no significant effect on survival, other single characteristic genes had significant effects on survival (p < .05)(Figure 2), which indicated that signature genes were potential biomarkers for CC diagnosis and prognosis.Moreover, the expression level analysis of above signature genes in TIME showed that PDF, PGK1, DLL4 and KLRK1 were highly expressed in CC (Figure 3).Likewise, signature genes used in risk models in reported studies have the potential for biomarker (Li et al. 2021, Liang et al. 2021).Currently, a study reported that PGK1 was involved in CC (Mhaidly and Mechta-Grigoriou 2021), which supported PGK1 serving as biomarkers for CC.Interestingly, it was reported that DLL4 served as a novel prognostic biomarker in patients with early-stage CC (Yang et al. 2016).Taken together, the biomarkers in our prediction model are consistent with previous studies and have favourable predictive function.Survival analysis is important for judging the predictiveness of the risk score model (Lu et al. 2022).High-risk group had high mortality rate, which indicated that the risk score model had a good predictive effect on the prognosis of CC.Moreover, the immune infiltration is one of the important indicators to evaluate the prognosis of tumour therapy (Mhaidly and Mechta-Grigoriou 2021).In this study, the method of ssGSEA was used to evaluate the TIME of 23 immune cell infiltrations in the TCGA CESC dataset, and the results indicated that the infiltration degree of most immune cells in the low-risk group was significantly higher than that in the high-risk group, but the purity of tumour cells was lower than that in the high-risk group (Supplementary Figure 7).Similarly, previous studies reported that higher levels of tumour infiltrating CD4 þ T, CD8 þ T were significantly related to longer survival (Ino et al. 2013).Furthermore, there were significant differences in risk scores across T stage, M stage, stage and age, while no significant difference in the risk score between N0 and N1 disease, which may be due to the insufficient number of TCGA CC cases (Chen et al. 2021).Overall, our risk score model is able to accurately stratify patients according to their immune microenvironment.
It is noted that the emergency of immunotherapy represented by PD-1/PD-L1 blockade brings good news to CC (Meng et al. 2018, Zhang et al. 2021).Similar to previously published studies, risk score model is able to predict the prognostic effect of immunotherapy (Song et al. 2021, Zhao et al. 2021).Notably, our hypoxia and immune-related risk score model had no effect on response to anti-PD-L1 immunotherapy, but was significantly related to response to anti-PD-1 immunotherapy.It is of importance for the prognosis of CC patients undergoing anti-PD-1 therapy.
This study also has several limitations.First, the lack of detailed information for a few patients in the TCGA hinders a more comprehensive analysis of the clinical and pathological characteristics of CESC patients.Second, due to the fact that number of CC cases in TCGA was not large enough, some statistical differences were not significant.Finally, due to the different characteristics of the internal and external microenvironment of the tumour, the whole tumour may not be able to distinguish the hypoxia and immune status of different tumour sites.The single-cell RNA sequencing combined with spatial transcriptome analysis can be considered as a potential solution to this problem.

Conclusions
In this work, we constructed the risk score model related to immunity and hypoxia to predict the prognosis of patients with CC on PD-1 treatment, and found that the model had a good predictive effect on PD-1 treatment regimens.

Figure 1 .
Figure 1.Relationship between risk scores and clinical characteristics.(A) Multivariate Cox analysis of risk scores and clinical characteristics.(B) Nomogram of risk scores and clinical features.(C) 1, 3 and 5 years of verification in the nomogram.(D) Differences in risk scores among different clinical subtypes.

Figure 2 .
Figure 2. Comparison of the tumour microenvironment in different risk score groups.(A) Differences in immune infiltration of different immune cells in different risk score groups.(B) Differences in immune scores among different risk score groups.(C) Matrix score differences in different risk score groups.(D) ESTIMATE score differences in different risk score groups.(E) Differences in tumour purity among different risk score groups.(F) Metabolic pathway differences in different risk score groups.

Figure 3 .
Figure 3. Predictive efficiency of risk score model in anti-PD-1 immunotherapy.(A) Survival curves for different risk scores in PDL-1 anti-PD-1 immunotherapy.(B) Differences in risk scores between different anti-PD-1 clinical response groups.(C) The proportion of patients in the high-risk and low-risk patient group who responded to PD-1 blockade immunotherapy.(D) Gene expression level of PD-L1 among different risk score groups.(E) Gene expression level of PD-1 among different risk score groups.(F) Accuracy of risk scores in distinguishing between PD/SD and CR/PR groups.(G) Correlation of risk scores with clinical response to anti-PD-1 immunotherapy.SD: stable disease; PD: progressive disease; CR: complete response; PR: partial response.