Construction and validation of nomograms for predicting the prognosis of grade 3 endometrial endometrioid adenocarcinoma cancers: a SEER-based study

ABSTRACT Most cases of endometrial adenocarcinoma (EAC) are diagnosed early and have a good prognosis; however, grade 3 (G3) EACs have poor outcomes. We retrospectively analyzed the data of 11,519 patients with G3 EACs registered between 2004 and 2015 in the Surveillance, Epidemiology, and End Results Program database and constructed a nomogram to guide clinicians in decision-making and accurate prediction of the prognosis. The caret package was used to divide samples into a training set and a validation set. Univariate and multivariate Cox regression analyses were performed, and a nomogram was constructed. A calibration curve was plotted, and a decision curve analysis was performed to verify the accuracy and clinical utility in both cohorts. The Cox regression analysis revealed that age, race, tumor size, number of lymph nodes resected, International Federation of Gynecology and Obstetrics stage, tumor/node stage, and adjuvant therapy were prognostic factors for G3 EAC, and these were included in the nomogram. The 1-, 3-, and 5-year area under the curve values in the training cohort were 0.832, 0.798, and 0.784, respectively, for overall survival (OS), and 0.858, 0.812, and 0.799, respectively, for cancer-specific survival (CSS). A nomogram was constructed to predict the survival rate of patients with G3 EACs more accurately. The predictive nomogram will help clinicians manage patients with G3 EACs more effectively in terms of clinical prognosis.


Introduction
Endometrial carcinomas (ECs) are some of the most common malignant tumors of the female reproductive system. ECs had an estimated incidence of 65,620 new cases and 12,590 deaths in 2020 in the United States [1]. The main clinical presentation of early EC is abnormal vaginal bleeding. While patients with ECs that are diagnosed early have good prognoses, with a 5-year overall survival (OS) of about 90%, the prognoses of patients with advanced or high-grade ECs remain poor [2]. High-grade ECs account for 10-20% of all ECs and 40% of all mortality due to ECs. The pathological types of high-grade EC mainly include grade 3 (G3) endometrial adenocarcinoma (EAC), dedifferentiated carcinoma, undifferentiated carcinoma, clear cell carcinoma, serous carcinoma, mixed adenocarcinoma, and carcinosarcoma [3]. The factors suggested by the International Federation of Gynecology and Obstetrics (FIGO) staging system are currently followed to determine the prognosis of patients with ECs; however, their performance in predicting the individual survival risk is poor due to the low accuracy and omission of independent risk factors, such as age, for the patients' survival outcomes [4][5][6]. Therefore, an individualized clinical prediction model for patients with G3 EACs is necessary.
In this study, the clinical characteristics and prognostic factors of patients with G3 EACs were obtained from the Surveillance, Epidemiology, and End Results (SEER) Program database from 2004 to 2015. We aimed to construct G3 EAC nomogram models based on the SEER data and predict survival to meet current clinical requirements. Our hypothesis is that a predictive nomogram will help clinicians manage patients with G3 EACs more effectively by providing an accurate clinical prognosis, which is important for guiding decision-making and predicting the outcomes of these patients.

Statistical analysis
R 3.6.1 was used for statistical analysis. X-tile software (Yale University, New Haven, USA) is a bioinformatics tool for biomarker evaluation and outcome-based cut-point optimization. It uses different candidate values as cutoffs to group patients for statistical testing, and the cutoff that yields the smallest p-value is considered the best. Based on the X-tile results, in the OS group the prognostic differences between subgroups were most significant when age was classified into three subgroups: <64 years, 64-77 years, and >77 years (Supplementary Figure 1(a-c)). When tumor size was classified as <48 mm and ≥48 mm (Supplementary Figure 1(d-f)), there was a significant prognostic difference between the two groups. Similarly, in the CSS group, the best cutoff values for age were <59 years, 59-69 years, and >69 years (Supplementary Figure 1(g-i)), and the best cutoff values for tumor size were <60 mm and ≥60 mm (Supplementary Figure 1(j-l)). Univariate Cox regression was used to determine the risk factors related to OS and CSS. Variables with a P-value <0.05 in the univariate Cox regression analysis were included in the multivariate Cox regression analysis (variable screening method: bidirectional) to determine the independent prognostic factors, and nomograms for predicting OS or CSS were established based on these independent factors. Univariate and multivariate analyses were performed using the survival package, and receiver operating characteristic (ROC) curves were plotted with the survivalROC package. Nomograms were drawn using the nomogram function in the rms package.
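The minimum p-value idea behind the cutoff search can be illustrated with a short sketch. The study itself used X-tile and R; the Python code below is only an illustration under stated assumptions: it uses a plain two-group log-rank test, a single cut point, and made-up toy data, and the function names (`logrank_chi2`, `best_cutoff`) are hypothetical. X-tile's actual procedure may differ, for example in how it handles multiple cut points.

```python
import math

def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk
                 if times[i] == t and events[i] and in_group[i])
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0

def chi2_p_value(chi2):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def best_cutoff(values, times, events, candidates):
    """Return (p_value, cutoff) for the candidate with the smallest p-value."""
    results = []
    for c in candidates:
        group = [v >= c for v in values]
        if all(group) or not any(group):
            continue  # skip cutoffs that leave one group empty
        results.append((chi2_p_value(logrank_chi2(times, events, group)), c))
    return min(results)

# Toy data (NOT from the study): tumor size in mm, follow-up in months,
# event indicator (1 = death, 0 = censored).
sizes = [20, 25, 30, 35, 40, 50, 55, 60, 65, 70]
months = [60, 58, 55, 57, 59, 12, 10, 8, 15, 9]
died = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]

p_value, cutoff = best_cutoff(sizes, months, died, candidates=[30, 50, 65])
print(f"best cutoff: {cutoff} mm (log-rank p = {p_value:.4f})")
```

Because the same data are tested at every candidate cutoff, the smallest p-value is optimistic, so a cutoff chosen this way is usually worth checking on held-out data.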

Flowchart of the analysis
In order to facilitate the understanding of the research, a methodology flowchart is provided in Figure 1. A total of 11,519 eligible patients with G3 EACs between 2004 and 2015 were enrolled from the SEER Program database. Patients were divided into a training set (OS: n = 3721; CSS: n = 3192) and a validation set (OS: n = 2480; CSS: n = 2126). There were no statistically significant differences in demographic or clinical characteristics between the two groups. Tables 1 and 2 list the demographic data and tumor characteristics of the patients in the OS and CSS groups, respectively. Subsequently, univariate and multivariate Cox regression analyses were performed to construct a nomogram and predict the 1-, 3-, and 5-year survival of the patients. The calibration plot and decision curve analysis (DCA) curve were used to evaluate the accuracy and clinical applicability of the model.
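The split into training and validation sets can be sketched as follows. The source states only that the caret package was used; the reported OS set sizes (3721 of 6201 patients in the training set) suggest a split of roughly 60/40, and stratifying on the outcome is an assumption here, as are the function name `stratified_split` and the toy labels. This Python sketch is an illustration, not the authors' R code.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.6, seed=42):
    """Split indices so each label keeps about the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, valid = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # random assignment within each outcome class
        k = round(len(idxs) * train_fraction)
        train.extend(idxs[:k])
        valid.extend(idxs[k:])
    return sorted(train), sorted(valid)

# Toy outcome labels (NOT study data): 1 = event, 0 = censored.
status = [1] * 40 + [0] * 60
train_idx, valid_idx = stratified_split(status)

train_rate = sum(status[i] for i in train_idx) / len(train_idx)
valid_rate = sum(status[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), train_rate, valid_rate)
```

Stratifying keeps the event rate nearly identical in the two sets, which is one way to obtain the balanced groups described above.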

Independent prognostic factors for G3 EAC
In the OS training set, univariate and multivariate Cox regression analyses of all the variables revealed that age, race, tumor size, number of lymph nodes resected, FIGO stage, tumor/node (T/N) stage, and adjuvant therapy were independent prognostic factors (P < 0.05, Table 3). In the CSS training set, univariate and multivariate Cox analyses were also performed for all variables (P < 0.05, Table 4). The results showed that age, race, tumor size, number of lymph nodes resected, FIGO stage, T/N stage, and adjuvant therapy were independent prognostic factors. In addition, we plotted the survival curve of each significant variable in the univariate Cox regression analysis of the OS data set. We found that all the variables were related to prognosis except the age at diagnosis (Figure 2). The survival rate of patients treated with postoperative radiotherapy alone was significantly better than that of patients treated with postoperative chemotherapy alone, and the survival benefit of combined postoperative chemoradiotherapy was greater than that of postoperative chemotherapy (Figure 2(a)). The survival rate of the patients was affected not only by age, race, tumor size, number of lymph nodes resected, FIGO stage, and T/N stage (Figure 2(b-e,g,i,j)), but also by marital status. The survival rate of married patients was better than that of unmarried patients (Figure 2(f)). The prognoses of black patients with G3 EACs were generally worse than those of white patients (Figure 2(h)). Consistent results were observed in the CSS univariate survival curves (Figure 3).

Construction of nomograms related to OS and CSS
We constructed nomogram models using the clinical characteristics of age, race, tumor size, number of lymph nodes resected, FIGO stage, T/N stage, and adjuvant therapy to predict OS and CSS based on the multivariate Cox regression analysis results.
In the nomograms, the length of the line corresponds to the influence of the different variables, and the different values of the variables correspond to outcomes. The OS nomogram showed that FIGO stage had the maximum impact on prognosis, followed by age, number of lymph nodes resected, and N stage (Figure 4(a)). The CSS nomogram showed that age and the FIGO stage had a considerable influence on CSS, followed by the number of lymph nodes resected, T stage, and the N stage (Figure 4(b)). Each number/category of these variables was assigned a score on a points scale. After the total score was calculated and it was located on the total points scale, a straight line drawn to the 1-, 3-, and 5-year survival probability scale showed the estimated OS or CSS at each time point.
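How a nomogram turns Cox regression coefficients into points and a survival probability can be sketched as below. All numbers are HYPOTHETICAL: the log hazard ratios and baseline survival values are invented for illustration and are not the coefficients of the published model, and only three variables are shown. The sketch assumes the usual convention that the variable with the widest coefficient range spans 0 to 100 points, and that survival follows S(t | x) = S0(t) ^ exp(linear predictor).

```python
import math

# HYPOTHETICAL log hazard ratios relative to each reference level.
COEFS = {
    "age": {"<64": 0.0, "64-77": 0.5, ">77": 1.2},
    "figo_stage": {"I": 0.0, "II": 0.4, "III": 1.0, "IV": 2.0},
    "tumor_size": {"<48 mm": 0.0, ">=48 mm": 0.3},
}
# HYPOTHETICAL baseline survival S0(t) at 1, 3, and 5 years.
BASELINE_SURVIVAL = {1: 0.97, 3: 0.90, 5: 0.85}

def points_table(coefs):
    """Scale coefficients so the most influential variable spans 0-100 points."""
    max_range = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return {
        var: {lvl: 100.0 * (b - min(levels.values())) / max_range
              for lvl, b in levels.items()}
        for var, levels in coefs.items()
    }

def total_points(patient, table):
    """Sum the points of the patient's level for every variable."""
    return sum(table[var][lvl] for var, lvl in patient.items())

def survival_probability(patient, coefs, years):
    """Cox model prediction: baseline survival raised to exp(linear predictor)."""
    lp = sum(coefs[var][lvl] for var, lvl in patient.items())
    return BASELINE_SURVIVAL[years] ** math.exp(lp)

table = points_table(COEFS)
patient = {"age": "64-77", "figo_stage": "III", "tumor_size": ">=48 mm"}
score = total_points(patient, table)
for years in (1, 3, 5):
    prob = survival_probability(patient, COEFS, years)
    print(f"{years}-year survival: {prob:.3f} (total points: {score:.0f})")
```

In this toy table, FIGO stage has the longest scale because its coefficient range is the largest, which mirrors how line length reflects a variable's influence in the printed nomogram.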

Clinical evaluation of the nomogram models
To evaluate the accuracy of the nomograms, we plotted calibration curves for each nomogram. The calibration plots showed good agreement between the predicted values (x-axis) and observed values (y-axis) at 1, 3, and 5 years for OS and CSS in both the training (Figure 5(a,b)) and validation sets (Figure 5(c,d)), indicating that the models had high accuracy.
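The comparison behind a calibration plot can be sketched in a few lines: patients are grouped by predicted survival, and the mean prediction in each group is compared with the Kaplan-Meier estimate observed in that group. This Python sketch is an illustration only; the grouping scheme, the function names, and the toy data are assumptions, and the study's R workflow may have used a different procedure (for example, resampling-based calibration).

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier survival estimate at `horizon` (same unit as `times`)."""
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - deaths / at_risk
    return surv

def calibration_points(predicted, times, events, horizon, n_groups=2):
    """Return (mean predicted, observed KM) survival per risk group."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    size = len(order) // n_groups
    points = []
    for g in range(n_groups):
        last = g == n_groups - 1
        idx = order[g * size:] if last else order[g * size:(g + 1) * size]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier([times[i] for i in idx],
                                [events[i] for i in idx], horizon)
        points.append((mean_pred, observed))
    return points

# Toy data (NOT study data): predicted 5-year survival, follow-up in months,
# event indicator (1 = death, 0 = censored).
predicted = [0.30, 0.35, 0.40, 0.45, 0.80, 0.85, 0.90, 0.95]
months = [10, 20, 70, 30, 80, 75, 40, 90]
died = [1, 1, 0, 1, 0, 0, 1, 0]

for mean_pred, observed in calibration_points(predicted, months, died, horizon=60):
    print(f"predicted {mean_pred:.3f} vs observed {observed:.3f}")
```

Points lying close to the diagonal (predicted equal to observed) indicate good calibration; in this toy example both groups fall somewhat below it.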

Clinical utility evaluation of the nomograms
A DCA is a simple method to evaluate clinical prediction models, diagnostic tests, and molecular markers. To demonstrate the advantages of the   (Figures 8(a-f), 9a-f).
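The quantity plotted in a decision curve is the net benefit at a given threshold probability, which can be sketched as below. This is a simplified illustration for a binary outcome without censoring; the function names and toy data are assumptions, and the study's survival-based DCA in R would additionally need to account for censored follow-up.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(risks)
    true_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    false_pos = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - false_pos / n * weight

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy data (NOT study data): predicted risk of death and observed outcome.
risks = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.05]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for pt in (0.2, 0.5):
    model_nb = net_benefit(risks, outcomes, pt)
    all_nb = net_benefit_treat_all(outcomes, pt)
    print(f"threshold {pt}: model {model_nb:.3f}, treat-all {all_nb:.3f}, treat-none 0.000")
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all and the treat-none (net benefit of zero) strategies.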

Discussion
With the continuous progress of medical diagnosis and treatment, the survival outcome of EC has been significantly improved, and almost 75% of ECs can be diagnosed in the early stages (FIGO stages I or II) [7,8]; however, some patients with ECs still have poor prognoses. The prognoses of patients with ECs are closely related to independent risk factors such as stage, grade, histological type, and lymph node metastasis [9]. However, these pathological parameters are still inadequate to predict the rates of survival and recurrence in patients with EC. Recently, many studies have addressed the accurate prediction of the prognoses of patients with EC. In an analysis of the data of 63,729 patients with ECs in the SEER Program database from 1988 to 2015, a CSS nomogram was constructed using age, race, histological grade, clinical stage, and tumor size, and an OS nomogram was constructed using histological grade, clinical stage, tumor size, and race, with C-indices of 0.859 and 0.782, respectively [10]. However, the survival outcomes of patients with ECs of different stages, grades, and histological types are considerably different due to their high heterogeneity and different molecular characteristics.
Type I EACs account for about 65% of EACs and are associated with favorable prognoses. Type II mainly includes serous carcinoma, clear cell carcinoma, and carcinosarcoma, which are associated with low incidence, high grade of malignancy, and poor prognosis. The Bokhman dichotomy has certain limitations in predicting the prognoses of patients with ECs. In 2020, the inclusion of The Cancer Genome Atlas (TCGA)    [11]. Although most tumors with high copy numbers and p53 abnormalities are usually serous ECs, quite a few of them are G3 EACs [12,13]. In 2016, the European Society for Medical Oncology-ESGO-ESTRO guidelines defined the high-risk of recurrence group of EC as (I) endometrial carcinoma (type 1), FIGO stage IB, grade 3 tumor (T1/G3 EAC); (II) nonendometrioid carcinoma (type 2); and (III) advanced endometrial cancer, regardless of the pathological type [14]. The prognoses of patients in the high-risk recurrence group are poor, and the risk of recurrence and metastasis is high [15,16]. Therefore, it is important to evaluate the high-risk recurrence of EC to improve the overall survival rate of the patients. At present, there are still some controversies regarding the G3 EAC pathogenesis, prognosis, and treatment.
The C-indices of the training and validation sets in a previous study were 0.814 and 0.837, respectively, and the AUC was 0.7. These were derived by analyzing the prognostic risk factors of 1172 patients with low-grade endometrial carcinosarcoma, indicating that the constructed nomogram had good predictive ability [17]. Another study showed that the mortality of patients with G3 EACs was 45% lower than that of patients with endometrial carcinosarcoma [18]. The 5-year OS of patients with G3 stage I, II, and III ECs were 77.5%, 62.7%, and 49.6%, respectively, indicating poorer prognoses than those of G1 and G2 ECs, while being slightly better than those of endometrial papillary serous carcinoma and clear cell carcinoma [19]. Therefore, some scholars have proposed that G3 EACs should be classified as type II. However, there has been no large-scale retrospective study about G3 EACs.
In this study, we retrospectively analyzed the data of 11,519 patients with G3 EACs registered in the SEER Program database between 2004 and 2015. The univariate and multivariate Cox regression analyses showed that age, race, tumor size, number of lymph nodes resected, FIGO stage, T/N stage, and adjuvant therapy were independent prognostic factors in terms of either OS or CSS. Previous studies have identified some of these variables as being associated with the survival of patients with ECs [20][21][22]. These clinical features were used to construct nomograms to predict the prognoses. The accuracy and clinical utility of the nomograms were tested by calibration plots and DCA, respectively. DCA is a relatively new method to evaluate diagnostic tests and prediction models. The nomograms showed good clinical application value. For OS, the 1-, 3-, and 5-year AUC values of the training set were 0.832, 0.798, and 0.784, respectively. For CSS, the 1-, 3-, and 5-year AUC values were 0.858, 0.812, and 0.799, respectively. This is the first large-scale retrospective study on G3 EACs and the first time that nomograms and websites for patients with G3 EACs based on SEER data have been created. Compared with previous nomograms, our research provides clinicians with a more convenient and accurate prediction method. The variables required for our nomograms are common and easy to obtain in clinical practice, making them more cost-effective than other prediction methods that use TCGA molecular typing or biomarkers [23].
However, our study has some limitations. First, the SEER database still lacks some clinical information that is significant for the prognosis, such as invasion of the lymphovascular space. Additionally, there is no molecular profile information, which is likely to be the future trend for precision cancer therapies. Second, the nomograms were based on data retrospectively obtained from the SEER database. A robust nomogram needs to be verified externally in multi-center clinical trials and prospective studies. In the future, we plan to explore the possibility of including more predictors to further improve the performance of the nomograms.

Conclusions
Our prognostic nomograms will provide a new method to accurately predict the survival of individual patients with G3 EACs.

Highlights
(1) Marital status affects the survival of patients with G3 EAC. (2) FIGO stage has a great influence on the prognosis of G3 EAC patients. (3) The nomogram has good clinical applicability in predicting the survival of G3 EAC patients.

Data availability statement
Some or all data, or code generated or used during the study are available from the corresponding author by request. https://seer.cancer.gov/

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.