Construction of an immune-related genes nomogram for the preoperative prediction of axillary lymph node metastasis in triple-negative breast cancer

Abstract Immune system disorder is associated with metastasis of triple-negative breast cancers (TNBCs). A robust, individualized immune-related genes (IRGs)-based classifier was aimed to develop and validate in our study to precisely estimate the axillary lymph node (ALN) status preoperatively in patients with early-stage TNBC. We first analyzed RNA sequencing profiles in TNBC patients from The Cancer Genome Atlas database by using bioinformatics approaches, and screened 23 differentially expressed IRGs. A 9-gene panel was generated with an area under the curve (AUC) of 0.77 [95% confidence interval (95% CI), 0.68–0.87]. We detected the 9 ALN-status-related IRGs in the training set (n = 133) and developed a reduced and optimized five-IRGs signature, which effectively distinguished TNBC patients with ALN metastasis (AUC, 0.80; 95% CI, 0.65–0.86), and was superior to preoperative ultrasound-based ALN status (AUC, 0.73; 95% CI, 0.53–0.93). Predictive efficiency (AUC, 0.77; 95% CI 0.61–0.93) of this five-IRGs signature was validated in the validation set (n = 81). Furthermore, IRGs nomogram incorporated IRGs signature with US-based ALN status showed higher ALN status prediction efficacy than US-based ALN status and five-IRGs signature alone in both training and validation sets. IRGs nomogram may aid in identifying patients who can be exempted from axillary surgery. Novelty and impact: An immune-related genes (IRGs) nomogram was first developed and externally validated in our study, which incorporated the IRGs signature with ultrasound (US)-based axillary lymph nodes (ALN) status. IRGs nomogram is superior to IRGs signature alone for preoperative estimation of ALN metastasis in patients with triple-negative breast cancer (TNBC). It is a favourable biomarker for preoperatively predicting ALN metastasis risk and may aid in clinical decision-making in early-stage TNBCs.


Introduction
Breast cancer (BC) is one of the most common malignancies worldwide and is the first leading cause of cancer-related mortality among women with an estimated 620,000 deaths every year [1,2]. Four major molecular subtypes of BCs were luminal A and B subtypes, HER-2 amplification subtype, and triple-negative subtype [3,4]. Among these, triple-negative breast cancer (TNBC) accounts for approximately 15% of all BCs, and the patients with TNBCs have worse prognosis than those with other subtypes [5,6]. Currently, the diagnosis and clinical staging of BCs is mainly achieved by physical examination, combined with ultrasonography (US), mammography, computed tomography (CT), and magnetic resonance (MR) [7][8][9][10]. However, these aforementioned procedures are inadequate for screening and estimating the axillary lymph node (ALN) status of TNBC patients.
Compared with patients suffering from other breast cancer subtypes, patients with TNBC are more likely to suffer ALN metastasis at the time of initial diagnosis [11]. ALN metastasis is confirmed as the most critical independent risk factor at the time of initial diagnosis of TNBCs [12,13]. Currently, a modified radical mastectomy with ALN dissection is still one of the most effective surgical treatments for patients with local advanced BCs without distant metastasis, especially early-TNBCs [14]. However, many patients acknowledged that after surgery their dissected ALNs were non-metastasis pathologically, and ALN dissection before might be unnecessary and excessive, which made patients suffer from many unnecessary risks of postoperative morbidity, such as lymphoedema, shoulder stiffness, paraesthesia, and so on [15][16][17]. Therefore, it is necessary to investigate a robust diagnostic marker for predicting ALN metastasis before surgery, and which will make significant improvement in precision treatment for patients and effectively avoid excessive ALN dissection, which often results in severe complications [18,19].
ALN status prediction has been brought to the forefront in clinical practice. Several biomarkers such as biological moleculars, classifiers and models have been identified to predict ALN status, nevertheless, limitations are always inevitable [20,21]. As reported, more and more immune-related moleculars have been proven as key factors in the carcinogenesis and development of cancers, such as breast cancer [22]. Presently, immunotherapy by targeting the immune checkpoints has widely used in the clinical treatment of immunogenic breast cancer [23][24][25]. Moreover, the prognostic value of the immune-related moleculars in TNBCs has been demonstrated in several studies [26][27][28][29]. Therefore, in order to gradually establish and complete ALN status prediction, we supplement and perfect clinicopathological and biological prediction with immune-related genomic prediction, and further construct immune-based prognostic signature for TNBCs.
In this study, a systematic analysis of the immune-related genes (IRGs) characteristics of ALN status was conducted, and an IRGs signature was developed and validated subsequently for the preoperative prediction of ALN metastasis in patients with TNBC. Moreover, an IRGs nomogram, incorporating the IRGs signature and clinical risk factors, was constructed and can provide an individual, non-invasive preoperative assessment of the risk of ALN metastasis in TNBC patients.

Public dataset and RNA-Seq details
To identify a gene-expression signature for detecting ALN metastasis of TNBCs, we first investigated genome-wide gene-expression profiles of primary tumours with or without ALN metastasis in The Cancer Genome Atlas (TCGA) database. RNA-sequencing data of 132 TNBC patients with pathological T1-T2 stage from TCGA was analyzed, among them, 88 patients were ALN metastasis negative and 44 patients were ALN metastasis positive. Clinical characteristics of these 132 patients are shown in Supplementary Table S1.
The normalized TCGA RNA-Seq gene expression data and clinical information for TNBC were downloaded from Firehose Broad GDAC portal (http://gdac.broadinstitute.org/, accessed on 1 June 2018). The TCGA normalized read counts for the RNA-seq data were filtered by assigning all values less than three and log2-transformed to avoid noise. Genes were mapped to the coordinates of human genome using the UCSC Human Genome probe Map.
The Wilcoxon signed-rank test was used to selected differential gene expression between ALN-positive and ALNnegative patients in the TCGA TNBC RNA-Seq dataset. A least absolute shrinkage and selection operator (LASSO) logistic regression model was constructed by using selected gene features. And then, the area under the curve (AUC) values of the receiver operator characteristic (ROC) plots was used to assess the diagnostic performance of each gene panel.

IRGs definition
The immune-related genes (IRGs) were downloaded from the ImmPort database (http://www.immport.org) [30], including 17 immune categories according to their different molecular function, such as T-cell receptor signalling pathway, B-cell receptor signalling pathway, Natural Killer Cell Cytotoxicity, antimicrobials, cytokine, TNF family receptors, and so on. TCGA platform was used to measure 1811 IRGs from the ImmPort database.

Patients
Two independent cohorts with 214 operable TNBC patients who were pathologically diagnosed as T1-T2 were enrolled in this study. Among them, 133 patients and their specimens after surgeries for training set were selected from the Sun Yat-sen University Cancer Centre (SYSUCC), Guangzhou, China, between January 2012 and December 2014. Other 81 TNBC patients and their specimens after surgeries were consecutively enrolled as validation sets from the First Affiliated Hospital of Guangzhou Medical University (GZMUFAH), Guangzhou, China, between January 2010 and August 2016.
The inclusion criteria of patients in our study were as follows: (a) all tumour tissue specimens were histologically confirmed for invasive breast carcinoma; (b) TNBC was diagnosed if immunohistochemical tests showed that oestrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) were all negative; (c) tumours were scored as ER/PR negative if they had less than 1% nuclear staining; (d) available clinicopathological characteristics (age, tumour size, tumour location, histologic grade, Ki67, US-based ALN status) and enough tumour tissues for RNA extraction and quantitative real-time polymerase chain reaction (qRT-PCR) assay; (e) tumour size were pathologically diagnosed as no more than 5 cm; (f) patients should have received sentinel lymph node biopsy or ALN dissection. The exclusion criteria of our study were as follows: (a) patients with distant metastatic disease before operation; (b) patients received chemotherapy or other anticancer treatments before surgery.
In addition, all patients were treated with postoperative systemic treatments according to the clinical practice guidelines. All procedures associated with the study were approved by the Institutional Review Boards of all participating institutions, and ethical approval and written informed consent were obtained for the retrospective analysis.

qRT-PCR analysis of IRGs expression
Total RNAs from 214 formalin-fixed paraffin-embedded (FFPE) TNBC tumour tissues was extracted as described previously [18,31], and the expression of TNBC ALN status-related mRNAs (selected in the discovery stage) was further examined via qRT-PCR. We used PrimeScript TM RT Master Mix (TaKaRa, Dalian, China) to synthesize first-strand cDNA. The qRT-PCR was carried out using SYBR-Green PCR Master Mix (Applied Biosystems, American). Homo sapiens glyceraldehyde 3-phosphate dehydrogenase was used as an internal reference gene to normalize mRNA levels. Expression levels of each mRNA were calculated using the comparative Ct method.

Statistical analysis
Volcano plot analysis was performed based on adjusted t-test p values, false discovery rate (FDR) and absolute fold change (FC). To minimize the risk of over-fitting, we applied a logistic regression model combined with the LASSO (glmnet, version 2.0-5) that is a method of automatic variable selection, which can be used to select predictors X of a target variable Y from a larger set of potential or candidate predictors X. The LASSO punishes the absolute size of the regression coefficient according to the value of the tuning parameter k. When there are many possible predictors, many of which have an effect on the target variable that is actually zero to very small, then LASSO is particularly useful in variable selection by setting zero for some variables. The penalty parameter was estimated by 10-fold cross-validation (CV) in the training data set at one standard error beyond the minimum partial likelihood deviance. Significant clinicopathologic risk factors were selected by multivariable logistic regression and incorporated into construct an IRGs nomogram. The backward step-wise selection method was used for the likelihood ratio test.
All the statistical analysis was performed with the SPSS software (version 22, SPSS Inc., Chicago, IL) and R statistical software (version 3.5.1; http://www.Rproject.org). The packages of R used in this study are as follows: "glmnet," "rms," "Hmisc," "pROC," and "Dca.R". The conventional two-sided p value < .05 was considered significant in all analyses.

Clinical characteristics
We developed the model for the preoperative prediction of ALN metastasis in three stages, including the discovery, training and validation stages. The whole workflow for the discovery of IRGs and the development of an IRGs nomogram is presented in Figure 1. In total, we recruited 214 TNBC patients who underwent axillary surgery without preoperative therapy from two centres with the exception of TCGA data (Supplementary Table S1). We quantitated the expression of the identified mRNAs in the TNBC patients using qRT-PCR. The patients' clinicopathologic characteristics are summarized in Table 1. The baseline clinical and pathological characteristics were similar between the training and validation cohorts (all p > .05).

Selection of candidate IRGs to predict ALN metastasis
In the discovery stage, we identified 524 differentially expressed mRNAs in discovery set by volcano plot analysis (FC > 2. 0, FDR 0.001 and p < 0.05) (Figure 2(a)). IRGs in TNBCs which can promote tumour metastasis were downloaded from ImmPort database (http://immport.niaid. nih.gov) and 36 IRGs were found associated with ALN metastasis from the 524 mRNAs. To optimization the genes signature for a clinically viable and practical assay, we then filtered out all candidate genes with low expression levels (average expression level < 3), which subsequently resulted in a panel of 23 IRGs (Supplementary Table S2). To avoid colinearity, we then used a LASSO logistic regression algorithm to screen ALN status-correlated mRNAs and selected nine TNBC-associated IRGs from the above 23 IRGs (Figure 2(b,c)); The optimal tuning parameter k estimation in the LASSO model corresponded to the minimum prediction errors via the 10-fold CV method (Figure 2(d)). Using risk scores based on the detection of logistic regression, we evaluated the performance of the constructed 9-gene panel [AUC ¼ 0.77, 95% confidence interval (95% CI) ¼ 0.68-0.87] (Figure 3). Compared with other clinical factors such as age (AUC ¼ 0.54, 95%CI ¼ 0.39-0.69), tumour stage (AUC ¼ 0.51, 95% CI ¼ 0.25-0.77), and the anatomical location of the primary tumour (AUC ¼ 0.55, 95% CI ¼ 0.25-0.85), our model achieved a superior diagnostic power (all p < .01 compared with each factor) ( Figure 3). To obtain biological information of the IRGs signature, we carried out enrichment analysis of the 9 unique component IRGs with DAVID tool (https://david.ncifcrf.gov/). Biological processes of gene ontology (GO) with false discovery rate (FDR) value less than 0.05 were selected and 15 over-represented biological processes in GO were identified (Supplementary Figure 1). Most biological processes were epidermal growth factor receptor signalling pathway, phagocytosis, glucose metabolic process, and leukocyte migration.

Development and validation of an IRGs signature to predict ALN status of TNBC
In the training stage, a LASSO logistic regression analysis was used to construct a five-IRGs signature for predicting ALN status in the training set (the SYSUCC set, n ¼ 133) based on the findings in the discovery stage (Figure 4(a,b)). The qRT-PCR expression values of the five candidate IRGs in ALN positive and negative TNBC patients are shown in Supplementary  (Figure 4(d)). When stratified by US-based ALN status, there was a potential benefit for our IRGs signature to identifying the high-risk population among those who were US-based ALN negative and the low-risk population among those who were US-based ALN positive, thus avoiding understaging or overstaging in clinical practice (Supplementary Figure 3).

Development and validation of an IRGs nomogram for individualized prediction
To construct a clinical nomogram, we first evaluated the risk of clinicopathological factors with univariate analysis ( Table 2) and selected only US-based ALN status using a multivariable logistic regression model (Table 3). By combining the selected US-based ALN status with the IRGs signature, we constructed an IRGs nomogram, which can predict the individual risk of ALN metastasis in TNBC patients preoperatively ( Figure 5(a)). To compare the discrimination ability of the IRGs nomogram with the US-based ALN status, we conducted a ROC analysis of the TNBC patients. The results suggested that the IRGs nomogram had higher prediction efficacy (AUC ¼ 0.87, 95% CI ¼ 0.76-0.95) compared with the model containing US-based ALN status (AUC ¼ 0.73, 95% CI ¼ 0.53-0.93) alone in training set ( Figure 5(b)). The corresponding calibration curve indicated good discrimination for the nomogram, which is shown in Figure 5( (Figure 5(c)). Calibration curves were also plotted to assess the calibration of the nomogram (Figure 5(e)) and the Hosmer-Lemeshow test also indicated a satisfactory goodness-of-fit in the GZMUFAH set (p > .05). Moreover, the DCA results for the nomogram are presented in Figure 6(a,b). The DCA showed that if the threshold probability for patients or clinicians is 14% or more, ALN statusrelated treatment decision-making based on the IRGs nomogram adds more net benefit than treating either all patients or none scheme in each independent data set in this study. It also showed higher net benefit than IRGs signature alone.

Discussion
In our study, we performed a systematic and comprehensive RNA sequencing-based IRGs expression profiling analysis in specimens of TNBC patients with/without ALN metastasis, and developed a five IRGs signature for identifying ALN metastasis, which showed robust performance signature in both training and validation cohorts. Furthermore, we demonstrated that our five IRGs signature was superior to currently used clinicopathological risk factors in diagnosing ALN metastasis. In addition, the ALN status prediction performance of a combined classifier which incorporated five IRGs signature with US-based ALN status was significantly superior to the IRGs signature or single clinicopathological feature in TNBC patients.
ALN status is vitally important for therapeutic decision-making for early T stage TNBC patients in clinical [32]. Currently, sentinel lymph node biopsy and lymphadenectomy following pathologic diagnosis is the best approach for assessing ALN status of TNBC patients. According to the American Joint Committee on Cancer and the Union for International Cancer Control, N staging for TNBC is classified based on the number of metastatic ALNs. It is obvious that the number of identified metastatic ALNs improve with increase in the total number of surgically dissected ALNs. Thus, inadequate number of dissected ALNs could potentially lead to stage migration and subsequent underestimation of the disease severity. However, sufficient dissection of ALN radical lymphadenectomy may lead to dysfunction, and so on. Therefore, availability of robust biomarkers that facilitate categorization of patients with TNBC before surgery based on the ALN status will permit individualization of the surgical treatments and improve overall mortality and morbidity of TNBC patients by avoiding excessive lymphadenectomy. The subgroup of patients with a very low risk of ALN metastasis identified with the our nomogram might be candidate for observation strategy and no axillary surgery is indeed required. While ALN dissection should be conducted directly and sentinel lymph node biopsy is no longer necessary for the TNBC patients with high risk of ALN metastasis. To our knowledge, our study is the largest multimolecular markers analysis of primary tumour of early invasive TNBC with known ALN status to date and reveals an association between IRGs expression and ALN status. Although the accuracy of individual genes was relatively modest, the combination of multiple genes revealed remarkable diagnostic accuracy. Collectively, we demonstrated that the potential of a multigene panel for identifying ALN metastasis in TNBC is realistic, and can be used in the clinic in not so distance future.
With the rapid development of whole genome and transcriptome sequencing in recent years, the molecular characteristics of most cancer types are becoming clearer and clearer, including TNBC [33][34][35][36][37]. Primary tumours may produce the premetastatic niche before subsequent metastasis in secondary organ, including ALN [38]. In patients with TNBC, the immune system and tumour immuno-microenvironment are the key to premetastatic niche formation and are related to the prognosis and survival of patients [18,[38][39][40]. In addition, the continuous stimulation of tumour-derived factors leads to the formation of immunosuppressive microenvironments in lymph nodes, including cytotoxic T-cells. In addition, as the main cell component of lymph nodes, B cells have been shown to be essential for lymphangiogenesis, which indicates that B cells have a certain role in lymph node metastasis. Mechanistically, new roles of B cells in tumour-promoting humoral immunity and resistance to cancer therapy have been found [38]. Although in lymph nodes, T cells are the main component of this environment, there is a significant accumulation of B-cells, rather than T-cells, in lymph nodes shortly after breast cancer cell inoculation. And the IgG production of these B-cells increased significantly in lymph nodes, suggesting the B-cells promote the lymph node premetastatic niche formation through the systemic humoral immune response to tumours [38,41]. From the ImmPort database, we found that more genes were involved in B-cells rather than T-cells receptors processes, which is consistent with the latest research finding [38]. As far as we know, our study is the first to develop an immune systembased multigene expreesion classifier for determination of TNBC ALN status. Among the remaining genes in our IRGs signature, NRAS and PIK3CA have been reported to involve in cancer cell proliferation, invasion, and angiogenesis [42,43]; however, the function of IGHG2, IGHJ2 and IGKV1D_39 in cancer is not completely clear and deserves further study.
Although our current results show that our IRGs signature can distinguish TNBC ALN status preoperatively, there were some inherent limitations to the current study. First, the retrospective nature of our study is the inherent limitation, although we have tried to include different data cohorts for   training and validation of our biomarker. Second, the source of tumour samples for gene expression detection is from paraffin tissue samples, not from fresh tumour coarse needle puncture tissues before axillary surgery. Considering that breast punctual biopsy is a common diagnosed practice to collect biopsy  samples before the surgery, it would have been ideal to validate our gene signature in punctual biopsy specimens. In addition, we did not evaluate the role of our model in other molecular subtypes and BC patients treated with neoadjuvant chemotherapy because our study was initially conducted based on operable TNBC. Moreover, diverse biological processes that may provide more complete molecular features of the tumours will be integrated in our future studies.

Conclusions
In this study, we developed a novel IRGs signature to predict ALN metastasis of patients with TNBC. IRGs nomogram, which incorporated IRGs signature and US-based ALN status showed clinical usefulness in stratifying patients with or without ALN metastasis in TNBCs and identifying patients who can be exempted from axillary surgery. These results may improve current treatment standards. Solid red or blue line: net benefit when all breast cancer patients are considered according to the developed IRGs signature or nomogram model. In our current study, the decision curves in both the training and validation sets showed that if the threshold probability is 14% or more, then using the IRGs signature or nomogram to predict ALN metastases adds more benefit than treating either all or no patients, while the ideal model is the model with the highest net benefit at any given threshold.