A gene expression based predictor for high risk myeloma treated with intensive therapy and autologous stem cell rescue

Myeloma is characterized by a highly variable clinical outcome. Despite the effectiveness of high-dose therapy, 15% of patients relapse within 1 year. We show that these cases also have a significantly shorter post-relapse survival compared to the others (median 14.9 months vs. 40 months, p = 8.03 × 10− 14). There are no effective approaches to define this potentially distinct biological group such that treatment could be altered. In this work a series of uniformly treated patients with myeloma were used to develop a gene expression profiling (GEP)-based signature to identify this high risk clinical behavior. Gene enrichment analyses applied to the top differentially expressed genes showed a significant enrichment of epigenetic regulators as well as “stem cell” myeloma genes. A derived 17-gene signature effectively identifies patients at high risk of early relapse as well as impaired overall survival. Integrative genomic analyses showed that epigenetic mechanisms may play an important role on transcription of these genes.


Introduction
Multiple myeloma (MM) is a malignancy of plasma cells which accumulate in the bone marrow, causing clinical symptoms as a consequence of myelosuppression, osteolysis and the production of monoclonal protein [1]. Although the advent of new agents during the last decade has made therapies in MM more diverse, high-dose therapy (HDT) followed by autologous stem cell transplant (ASCT) remains a standard treatment for newly diagnosed patients with myeloma who are considered to be able to tolerate the procedure. Compared to prior standard treatments it has been shown to increase the median overall survival (OS) by a year to 4 -5 years [2,3]. Th e incorporation of novel agents as part of induction and maintenance has improved outcomes further [4 -6]. However, virtually all patients with transplant eventually relapse, and the duration of remission is highly variable, ranging from a few months to more than 10 years. Th e diff erence in outcome is thought to be mediated via tumor acquired genetic diff erences [7]. A diagnostic test able to identify patients at high risk of early relapse based on such genetic diff erences would be of great clinical utility, as it would allow clinicians to design and implement trials investigating new therapeutic strategies in this high risk subset. Any test used in such a setting should have high specifi city for the correct identifi cation of high risk behavior as well as have good sensitivity to be able to identify such cases.
Until now a number of approaches have been used to determine risk status. Th e initial approach used β 2microglobulin ( β 2M) level, which was subsequently incorporated into the international staging system (ISS) [8]. While the ISS is generally applicable, it works best for classifying populations entered into clinical trials and lacks biological relevance. Genetic variables associated with poor outcome have been identifi ed, and fl uorescence in situ hybridization (FISH) studies have identifi ed the presence of del(17p13), gain(1q21) and del(1p) as well as adverse translocations t (4;14), t(14;16) and t (14;20) as high risk factors [9 -13]. However, despite the fact that this approach can identify distinct biological groups, the sensitivity and specifi city for identifying high risk behavior are too poor to be used as a diagnostic tool, and developing alternative approaches is a priority to reliably identify patients at high risk.
Gene expression profi ling (GEP) off ers a potential solution to identify high risk behavior, and gene signatures based on global GEP of presenting samples have been

Abstract
Myeloma is characterized by a highly variable clinical outcome. Despite the eff ectiveness of high-dose therapy, 15% of patients relapse within 1 year. We show that these cases also have a signifi cantly shorter post-relapse survival compared to the others (median 14.9 months vs. 40 months, p ‫؍‬ 8.03 ؋ 10 ؊ 14 ). There are no eff ective approaches to defi ne this potentially distinct biological group such that treatment could be altered. In this work a series of uniformly treated patients with myeloma were used to develop a gene expression profi ling (GEP)-based signature to identify this high risk clinical behavior. Gene enrichment analyses applied to the top diff erentially expressed genes showed a signifi cant enrichment of epigenetic regulators as well as " stem cell " myeloma genes. A derived 17-gene signature eff ectively identifi es patients at high risk of early relapse as well as impaired overall survival. Integrative genomic analyses showed that epigenetic mechanisms may play an important role on transcription of these genes.

Keywords: Myeloma , gene expression profi ling , risk , predictor
High risk myeloma treated with HDT 595 explored [14 -16]. However, they still have a number of drawbacks, including a failure to take account of known cytogenetic subgroups with distinct clinical behaviors. In order to develop a novel risk predictor for patients treated with HDT, we have utilized a series of uniformly treated patients with myeloma and driven the analysis using early relapse as the endpoint to develop a signature able to identify high risk cases.

Patients
Th e Medical Research Council (MRC) Myeloma IX trial (ISRCTN68454111) enrolled 1960 patients with newly diagnosed symptomatic myeloma who were allocated to two main treatment pathways, intensive ( n ϭ 1111) or nonintensive ( n ϭ 849), at the discretion of the treating physician, taking account of age and performance status [4]. For the purpose of reliably defi ning subjects with early relapse following the HDT procedure, cases used in this study were all based on per protocol rather than intention to treat. Of the 1111 patients in the intensive arm, 747 cases who actually received HDT were used for subsequent analyses [Supplementary Figure 1(A) available online at http:// informahealthcare. com/doi/abs /10.3109/10428194.2014.91 1863]. Early/late relapse subgroups were defi ned by calculating the duration from the time of HDT to subsequent relapse.
Gene expression profi les from CD138-selected bone marrow plasma cells (to a purity of more than 90%) of 261 patients from Myeloma IX (GSE15695) were collected as previously described [17]. In addition, two other similar publicly available gene expression sets were used in this study for development of a GEP-based predictor. Ninety-seven evaluable cases from Myeloma IX and 82 evaluable cases from the Hemato-Oncology Foundation for Adults in the Netherlands (HOVON)-65 trial (GSE19784; ISRCTN64455289), 18 being treated with single HDT, were combined to form a training set of 179 samples, and batch eff ect was removed using Bioconductor package Combat 19. In general, cases who died from causes other than progressive myeloma (mostly other cancers, heart disease, stroke and infection) within 1 year post-HDT were excluded, as the relapse status at 1 year post-HDT could not be assessed. As there was partial overlap of samples between HOVON-65 and GMMG-HD4 datasets, the subjects present in both datasets were excluded from the training set to ensure the independence of the test set. Following the same selection criteria, 155 patients from the German-Speaking Myeloma Multicenter Group (GMMG)-HD3/HD4 trial (E-MTAB-362; ISRCTN06413384) [20], of whom 56% were treated with double HDT, were used as a validation set for the GEP-based predictor. A summary of samples used in this study, therapy schedule and analysis fl ow is outlined in Supplementary Figure 1(B) available online at http://informahealthcare. com/doi/abs/ 10.3109 /10428194.2014.911863. Th e detailed designs of these trials have been reported previously [4,6,21]. A further independent dataset (GSE24080) was used to validate its eff ect on PFS and OS.
Genes identifi ed as being diff erentially expressed were correlated with matching SNP-based mapping ( n ϭ 99) and DNA methylation profi ling ( n ϭ 118) data from the Myeloma IX trial to explore possible mechanisms underlying deregulation.

Bioinformatics and statistical analysis
GEP of all samples from the training and test sets was carried out on the Aff ymetrix Human Genome U133 Plus 2.0 platform, and gene expression signals were quantifi ed using robust multi-array average (RMA) normalization. All analyses were performed in R 2.10.1 and Bioconductor. Diff erentially expressed genes between patient groups were selected using signifi cance analysis of microarray (SAM) (Bioconductor package samr ) with a 1000-permutation adjustment. Th e LASSO algorithm (Bioconductor package glmnet ) was used to further refi ne the selection to a subset of non-correlated genes with strongest discriminative power for early relapse. Th e selected genes were fi tted in a logistic regression model to obtain an optimal model, and a risk score (z) for early relapse was calculated by a linear combination of the expression levels of the 17 selected genes at presentation, weighted by their estimated regression coeffi cients; subsequently the probability for early relapse could be calculated accordingly (Table III). Th e predictive power (sensitivity/specifi city) of a model was tested using a receiver operating characteristic (ROC) method and the corresponding area under the curve (AUC) was calculated, which can be interpreted as the chance of getting the prediction correct.
Th e associations between early and late relapse groups and various clinical parameters were investigated using Fisher ' s exact test for categorical parameters and Wilcoxon test for continuous variables. Any parameters statistically associated with early relapse were combined in a multivariate logistic regression model to test their independence. Performance of the predictive models was compared using the likelihood-ratio test (R package anova.glm ). Either Wilcoxon or Kruskal -Wallis test was used to correlate the gene expression level and single nucleotide polymorphism (SNP)-mapping as well as DNA methylation data where appropriate. As the impact of DNA methylation on gene expression is thought to result from a discrete methylation pattern (hyper-or hypo-methylation), we used the unsupervised k -means method to defi ne the high/low methylation groups for each gene. Th e binary distribution of promoter methylation of the genes was visualized by Kernal density plot. Th e distribution of OS and progression-free survival (PFS) between risk groups was estimated using the Kaplan -Meier method (log-rank test). Th e performance of the derived high-risk gene signature (REL-17) was compared with another gene signature using multivariate logistic/Cox regression analysis. Pathway analyses were performed using GeneGo ' s MetaCore (www.genego.com).

Defi ning a patient group with high risk clinical behavior following HDT
In order to identify a group of patients who have poor outcome post-HDT, 423 out of the 747 patients from Myeloma IX who relapsed after HDT were split into subgroups based on time to progression. Th e results showed that patients who relapsed within 6 months and those who relapsed between 6 months and 1 year had signifi cantly shorter post-relapse survival compared to the others [ Figure 1(A)]. When combined, these cases had a median post-relapse survival of 14.9 months, in contrast to 40 months for those who relapsed after 1 year [ Figure 1(B), log-rank test p ϭ 8.03 ϫ 10 Ϫ 14 ], suggesting that they may represent two biologically distinct groups.

Clinical and FISH parameters associated with high risk behavior
Among the 747 patients treated with upfront HDT, relapse status at 1 year post-HDT was available in 718 cases, among whom 18.2% (131 out of 718) relapsed within 1 year post-HDT. In order to compare the associated clinical and FISH parameters of this group with the rest of cases ( n ϭ 587), univariate logistic regression analyses were performed on parameters including gender, age, World Health Organization (WHO) performance status, hemoglobin (Hb), platelets (Plt), albumin (Alb), β 2 M, creatinine (Cr), calcium (Ca), lactate dehydrogenase (LDH) and C-reactive protein (CRP), paraprotein type (immunoglobulin A [IgA] vs. non-IgA), light chain type, duration from diagnosis to HDT, type of induction therapy, thalidomide maintenance, adverse immunoglobulin heavy chain (IgH) translocations [including t(4;14), t(14;16) and t(14;20)], del(17p), gain(1q) and del(1p). Th e results show that only low Hb, low Plt, presence of adverse IgH translocations and gain(1q) at presentation were statistically associated with early relapse ( p Ͻ 0.05). When tested together in a multivariate logistic regression model, only adverse IgH translocations and gain(1q) retained statistical signifi cance (Table I)

Development of a GEP-based predictor for early relapse
We investigated whether a GEP-derived predictor could improve or outperform the FISH-based predictor. Among the 179 patients in the training set, 22.3% (40 patients) relapsed within 1 year post-HDT, which is comparable to the complete dataset. Th e GEP of these two groups of patients was compared using signifi cance analysis of microarray (SAM) [22], and 207 genes were identifi ed as being diff erentially expressed at 5% false discovery rate (FDR), among which 173 were up-regulated in early relapsed patients and 34 were down-regulated (Supplementary Table I    High risk myeloma treated with HDT 597 Each gene ' s expression values were dichotomized, using the 75th percentile as a threshold between high and low expression for the up-regulated genes and the 25th percentile as a threshold for the down-regulated genes. Th ese genes underwent further shrinkage and selection using the LASSO algorithm [24], yielding 17 genes with the strongest discriminative power for early relapse. Th e selected genes were fi tted in a logistic regression model to generate an optimal geneexpression based predictor for likelihood of relapse within 1 year post-HDT (Table III).
Th e 17-gene signature (REL-17) had an AUC of 0.917 with an optimal sensitivity and specifi city of 87% and 83%, respectively [ Supplementary Figure 3 poor outcome associated with gain(1q) by FISH. Notably WHSC1 , FGFR3 and MAF were among the top diff erentially expressed genes, known to be deregulated via t (4;14) translocation and t(14;16), respectively.
In order to develop a robust yet manageable GEP-based predictor for early relapse, we repeated the analysis with a more stringent FDR of 0% and obtained 37 diff erentially expressed genes, among which 30 were up-regulated in early relapsed patients while seven were down-regulated (Table II). Th e 37 genes still showed marked overrepresentation of genes from chromosome X ( p ϭ 0.0001), while the overrepresentation of genes from chromosome 1 was no longer seen. Among these 37 genes, pathway analyses identifi ed signifi cant enrichment of epigenetic regulators, including genes involved in histone modifi cation ( WHSC1 , HIST1H4H ) as well as other chromatin modifi cators NAP1L3 and HMGN5 (p Ͻ 0.05, adjusted for Benjamini -Hochberg multiple testing). Notably, six of the top deregulated genes ( NUDT11 , PKP2 , ROBO1 , AGAP1 , NAP1L3 and EPDR1 ) were recently identifi ed as " stem cell " genes in myeloma [23]. were also compared with those derived from the Erasmus University Medical Center (EMC)-92 signature [25] in multivariate analyses for their performance of predicting relapse within 1 year post-HDT, PFS and OS, respectively, in the independent test set. Th e results showed that the REL-17 signature performed best for predicting relapse within 1 year and PFS (Table IV), although was also associated with OS [ Figure 2(D)]. We applied the REL-17 signature to a further independent dataset (GSE24080), either as a whole ( p -value 2.03 ϫ 10 Ϫ 9 and 3.73 ϫ 10 Ϫ 11 for PFS and OS respectively) or in two subsets from separate trials (Supplementary Figure 4 available online at http://informahealthcare. com/doi/abs/1 0.3109/10428194.2014.911863), where its eff ects on PFS and OS were also validated.

Mechanisms of gene deregulation
We explored the possible mechanisms underlying the deregulation of the diff erentially expressed genes by carrying out integrative analyses of GEP, DNA methylation and SNP-mapping array data. Among the top 37 genes, expression levels of MSTO1/MSTO2P (on 1q22), RSBN1 (on 1p13.2), EEF1A1 (on 6q14.1) and REEP5 (on 5q22) were positively correlated with copy number changes (Supplementary Figure 5 available Table III. Th e 17 selected genes were fi tted in a logistic regression model to generate an optimal GEP-based predictor for likelihood of early relapse post-HDT, and the probability for each case could be calculated accordingly. Probability (early relapse) ϭ 1/1 ϩ e Ϫ z GEP, gene expression profi ling; HDT, high-dose therapy. Figure 2. Eff ect of risk groups derived from the REL-17 signature on PFS and OS. In the training set, 15.1% cases being identifi ed as having more than 60% chance to relapse within 1 year had signifi cantly shorter PFS (A; median 13.8 vs. 34.8 months, p Ͻ 10 Ϫ 16 ) and OS (B; median 29.9 vs. 88.1 months, p ϭ 2.39 ϫ 10 Ϫ 14 ) in contrast to those at lower risk. Using the same criteria 12.3% patients being identifi ed at high risk in the test set also had signifi cantly shorter PFS (C; median 15.9 vs. 40.5 months, p ϭ 10 Ϫ 7 ) and OS (D; median 53 months vs. not reached, p ϭ 0.0003) compared to rest of the cases. a clear diff erential eff ect on post-relapse survival between patients who relapse within 1 year in contrast to those relapsing later. We used this observation as a tool to derive a GEP-based predictor for outcome that eff ectively identifi es cases at high risk of early relapse. When tested in the training set, cases considered as being at high risk, having a Ͼ 60% probability of early relapse, were shown to have signifi cantly shorter PFS and OS compared to the rest of the cases. Th e eff ects on both OS and PFS were validated in two independent datasets; interestingly, the prognostic merit of this predictor was also seen in the Total Th erapy 3 (TT3) cohort, which comprises the most intensively treated cases so far (bortezomib/thalidomide-combined induction followed by double autograft with bortezomib/lenalidomide-combined maintenance) [26]. Th erefore this signature seems to be applicable for all cases receiving HDT-ASCT regardless of the type of induction and maintenance therapy received.
We show that the presence of known FISH-based abnormalities, including adverse IgH translocations and gain(1q), are strongly associated with early relapse following HDT. However, in our analysis this FISH-based predictor only has a modest predictive performance, and therefore lacks the sensitivity and specifi city to be used as a prognostic test. Furthermore these FISH abnormalities do not statistically improve the predictive capability of the REL-17 signature for high risk clinical behavior. Th is is consistent with observations from the EMC-92 [25] and University of Arkansas for Medical Sciences (UAMS) [14] signatures, suggesting that additional biological features, defi ned by the GEP, interact with behavior induced by the FISH variables to determine high risk behavior. As the number of genes comprising this signature is low and the risk score is calculated based on the binary expression status of each gene (high/low), it could be transformed into a reverse transcription-polymerase chain reaction (RT-PCR)-based test.
Among the 37 genes most diff erentially expressed between the two risk groups, the expressions of FGFR3 , WHSC1 and DSG2 are known to be deregulated via t (4;14). Th e association of t(4;14) myeloma with aggressive relapse has been reported in a number of studies [27,28]. Th ese genes, together with another fi ve signifi cantly deregulated genes, NGFRAP1 , NAP1L3 , TEAD1 , LRP12 , AGAP1 ( CENTG2 ), are among the overexpressed genes previously seen within this molecular subgroup [23]. MAF , another gene associated with early relapse, is a transcriptional activator of key target genes and is mainly deregulated via t(14;16) [29].
Gene enrichment analysis by chromosome location shows that the 207 deregulated genes (FDR Ͻ 0.05) were signifi cantly overrepresented on chromosome 1 and X; notably the overrepresentation of genes on chromosome 1 was no longer seen among the top 37 deregulated genes (FDR Ͻ 0). Th ese fi ndings support the importance of genes in the 1q12-q23 region as outcome predictors, but also that there are other genes playing more important roles in determining high risk behavior, which are not able to be identifi ed by the FISH approach. Despite the over-representation of genes from chromosome X, the risk of early relapse was not associated with gender (data not shown). Among the genes located on chromosome X, CTAG1 , CTAG2 and MAGEB2 online at http://informahealthcare. com/doi/abs /10.3109/ 10428194.2014.911863, p Ͻ 0.05).
We also looked at the correlation between the expression level of these top deregulated genes and the methylation status of their promoter, except for NUDT11 and MAL2 , for which there were no corresponding methylation probes. Th e methylation probes for these genes were all located in CpG dense areas and mapped to promoters or transcription start site, with their DNA methylation status following a binary distribution [Supplementary Figure 6(A) available online at http://informahealthcare. com/doi/abs/ 10.3109/10428194 .2014.911863]. Th e expression levels of nine genes ( CTAG1 , EPDR1 , CTAG2 , PKP2 , NGFRAP1 , LRIG1 , NAP1L3 , MAF , LRP12) were statistically correlated with the methylation status ( p Ͻ 0.05). Scatter plots for these genes show a typical " L " pattern, indicating that the hyper-methylation status prevents the genes being transcribed, which suggests that promoter DNA methylation is likely to make an important contribution to the transcription of these genes [Supplementary Figure 6(B) available online at http://informahealthcare. com/doi/abs/10.3109/10428194.2014.911863]. As MAF has been shown to be deregulated via t (14;16) and can also be induced via WHSC1 translocation, the correlation for MAF was only analyzed in cases negative for both t(14;16) and t (4;14) by FISH, and a correlation with methylation status was seen. For the genes located on chromosome X, the correlations were also confi rmed in each gender group separately (data not shown).

Discussion
MM therapeutic schemes are changing rapidly with the constant introduction of new drugs. Although regimens containing novel agents may produce comparable response rates to those of HDT, prospective data based on a head-tohead comparison are limited, especially for the long-term eff ect. Th erefore these agents are currently incorporated prior to and following the HDT rather than replacing the procedure as fi rst-line regimen. Th e GEP signature in this study was developed and validated in a series of datasets using novel agents as induction and maintenance, which refl ects the current treatment settings.
In this study we show that one of the most signifi cant predictors for long-term survival following HDT-ASCT at presentation is the time to fi rst relapse. Our analyses show belong to the cancer testis gene family (CTAGs), which have been previously demonstrated to have prognostic value in patients with myeloma [30]. It is also noteworthy that the 37 top deregulated genes were enriched for the myeloma " stem cell " gene set [23], including NUDT11 , PKP2 , ROBO1 , AGAP1 , NAP1L3 and EPDR1 , indicating that " stemness " may play an important role in determining high risk behavior. It is interesting that one of these genes, ROBO1 , together with another deregulated gene, HMGN5, is also present in the UAMS-70 gene signature [14]. Integrative analysis of GEP and SNP-mapping array data shows that only four of the 37 top diff erentially expressed genes are possibly deregulated via copy number variation. Two of them are located on 1q22 and 1p13.2, respectively, which may refl ect the association of gain(1q) and del(1p) with inferior outcome. Th e majority of the top diff erentially expressed genes do not appear to be deregulated via mechanisms which could be detected by FISH, such as translocations or gains/losses. Th e exploratory analysis integrating GEP and DNA-methylation profi ling shows that nine of these genes were statistically associated with the DNA methylation level at the promoter, suggesting an epigenetic mechanism being involved in their transcription, among which CTAG genes have been previously shown to be silenced by DNA methylation during normal cellular diff erentiation [31]. We also found signifi cant evidence showing that three " stem cell " genes, PKP2 , NAP1L3 and EPDR1 , might be modulated via DNA methylation. Although MAF is normally deregulated via t(14;16) and t (4;14) in MM, there are still cases that express this gene while lacking these translocations, consistent with additional unknown mechanisms driving its transcription. In this analysis the association between MAF expression and the promoter methylation status suggests a possible epigenetic mechanism in its transcription, as is frequently seen in diff use large B-cell lymphomas [32].
In conclusion, in this work we have developed a GEPbased predictor for high risk myeloma treated with HDT-ASCT. Th e signature is biologically relevant and can identify individuals, who constitute up to 20% of newly diagnosed patients with myeloma, whose remission is not sustainable, with a high risk of relapsing within 1 year post-HDT. Patients identifi ed via such an approach could have their treatment modifi ed to improve outcomes. Th e future development of predictive signatures is likely to focus on the use of biologically relevant genes which are deregulated via specifi c mechanisms.