Can we well assess the relative efficacy and tolerability of a new drug versus others at the time of marketing authorization using mixed treatment comparisons? A detailed illustration with escitalopram

Objective To assess the variation of relative efficacy and tolerability of an antidepressant versus others based on both pre-marketing (registration studies) and post-marketing studies versus pre-marketing studies only in patients with major depressive disorder. Methods The relative efficacy and tolerability of antidepressants was assessed by mixed treatment comparisons (MTCs) using data acquired over two time periods: before registration of the reference drug escitalopram (1989–2002) and up to 5 years later (1989–2007). Ranking probability outputs were presented for efficacy, using change from baseline to 8 weeks on Montgomery–Åsberg Depression Rating Scale total score, and tolerability, using withdrawals due to adverse events. Results The relative efficacy and tolerability of some selected antidepressants, including escitalopram, varied considerably over the two time periods. The improved relative efficacy and tolerability of escitalopram over time, compared with citalopram, was demonstrated by greater separation of ranking probability curves for efficacy and tolerability. In 2002, escitalopram ranked low with 13.9% and 5.1% probability of being in the top four antidepressants’ relative efficacy and tolerability, respectively. In 2007, ranking probabilities for relative efficacy and tolerability of escitalopram increased to 52.5% and 82.1%, respectively. Conclusions Time of marketing authorization may not be the most appropriate time to evaluate the relative efficacy and tolerability of a new antidepressant based on MTC approach due to the asymmetry of information between new and older compounds. However, the first evaluation of relative effect of a new drug for health technology assessment recommendations is commonly done at this time. Re-evaluation of a drug several years after its launch is likely to provide a more accurate indication of its relative efficacy and tolerability.

Clinical data obtained post-marketing authorization (MA) are intended to supplement or expand registration trial data. Such post-marketing studies are critical to update clinical benefit risk evaluation throughout the life cycle of the drug and to record the long-term safety of the drug. Therefore, post-registration studies are expected to differ from pre-approval studies in design, population enrolled, and so on.
To determine the relative efficacy of a therapeutic class, network meta-analyses (NMAs) can be conducted to generate comparative evidence from multiple randomized clinical trials (RCTs) assessing the same efficacy outcomes and indication, but involving different interventions (4,5). However, these evaluations usually include both direct and indirect comparisons based on different types of studies (preand post-MA studies). The results of these analyses often have to be interpreted with caution as the direct and indirect evidence are not always consistent one with each other.
The objective of this study is to assess the variation of relative efficacy and tolerability of an antidepressant, escitalopram, versus others based on both pre-marketing (registration studies) and post-marketing studies versus pre-marketing studies only in patients with MDD. The present study reports the results of a comparison between the selected antidepressants over two time periods using a mixed treatment comparison (MTC) method (5,6).

Methods
The first analyses incorporated only registration studies up to the MA date of escitalopram (an allosteric selective serotonin reuptake inhibitor [SSRI]) which was chosen as the reference antidepressant.
The second analyses included both registration and post-marketing studies in the same indication that were published up to 5 years later. Therefore, the current settings were similar to the conditions of a health technology assessment (HTA) evaluation.

Time periods
Two time periods, from 1989 to 2002 and from 1989 to 2007, were used for the analysis of the relative efficacy and tolerability of antidepressants. The first time period included studies up to 2002, the year of the MA for escitalopram in both the United States and Europe. In practice this means that only registration studies were considered for escitalopram on this first time period, while both registration and post-marketing studies were included for the other antidepressants as they received a marketing authorization earlier. The second time period included additional studies that were published up to 5 years post-approval and ended in 2007, as reassessment of drugs by HTA authorities are often conducted after this time period.

Study selection and antidepressants evaluation
In order to define a homogeneous pool of treatments between the two time periods only the following new generation antidepressants for which the MA was obtained before 2002 were included: citalopram, escitalopram, fluoxetine, fluvoxamine, milnacipran, mirtazapine, paroxetine, reboxetine, sertraline, and venlafaxine (pooled IR and XR) Placebo was also considered. Clinical studies were retrieved from a systematic review performed on MDD. Studies were identified by searching MEDLINE, Embase, Cochrane Central Register of Controlled Trials, and PsychINFO. The following conference proceedings were hand searched: American Psychiatry Association (APA), European College of Neuropsychopharmacology (ECNP), International College of Neuropsychopharmacology (CINP), New Clinical Drug Evaluation Unit (NCDEU) with the following public and company registries: Clinicaltrials.gov, Clinicalstudyresults.org, Lundbecktrials.com, Forestclinicaltrials. com, Lillytrials.com, Novartis register, EMEA websites for retrieving the European Public Assessment Reports (EPAR), and FDA websites.
Non-English publications were excluded from the systematic review. Studies were also excluded if they did not report subgroup data for depression in an adult population (in case of larger patient selection) or did not report any of the outcomes of interest, or if they examined the continued use of antidepressants for relapse or recurrence prevention, or if they included B30 patients per treatment group.
Only treatment arms including approved doses of the selected antidepressants and placebo were included in the analyses (Table 1). If several approved doses of an antidepressant were included in a given RCT, doses were pooled together and treated as a single treatment arm (4).

Outcomes
Efficacy was evaluated as the change from baseline to 8 weeks (time window for assessment visit from 6 to 12 weeks) in the MADRS total score, which was the primary outcome in most of the escitalopram and citalopram studies considered above. Any studies without MADRS outputs were not used, including those that only reported outcomes based on the HAM-D. Tolerability was defined as the proportion of patients who withdrew from the study due to adverse events (AEs) (study duration between 6 and 12 weeks), out of the total number of patients randomly assigned to each antidepressant or placebo and who received at least one dose of IMP.

Imputation methods
If any values were not reported in the original data source, queries were made to find the data. In addition, imputation rules were defined for cases where data were missing for the year of publication or the standard deviation (SD) of the change from baseline to 8 weeks on the MADRS. Missing years of publication data were imputed by the date of the clinical report plus 1 year or the date of study completion plus 2 years. Missing SD values were either recalculated from the standard error (SE) or 95% Credible Interval (CrI) values, or imputed by 10 (value defined by the observed mean of the SD across studies and treatment arms). When necessary, the change from baseline was recalculated as the difference between endpoint and baseline values (SD imputed by 10, corresponding to the rounded value of the average of the SDs observed across all studies and treatment arms). This imputation was performed before the pooling of doses so that differences between doses results in additional variability.

Data analysis
MTCs (5, 6) were performed to assess the relative efficacy and tolerability of the selected antidepressants based on available RCT data from the two time periods. A Bayesian NMA with random effect was performed using a linear model with normal likelihood distribution for the MADRS change from baseline and a logistic model with binomial distribution for withdrawals due to AEs. The random effect model specification was chosen as it provides conservative estimates of variance and as such reduces the risk of spurious findings compared to the fixed effect specification. Vague priors were assigned to the parameters of interest: normal distributions with mean 0 and variance 10,000 for the baseline and treatment effects of the trials, and uniform distribution between 0 and 5 for the between-trial SD. Initial values were randomly drawn independently for three parallel chains. As this process is iterative, a thin rate of two was applied in order to reduce auto-correlation between consecutive draws. The Bayesian analysis consisted of 40,000 draws with a burn-in of 10,000 in order to ensure convergence (so that the draws used in the analyses actually inform on the uncertainty of parameters of interest and not on convergence). Outcomes were entered for every treatment group of each study and relations between the relative efficacy indicators (difference between treatments or odds ratios [OR]) across studies were specified in order to estimate all possible pairwise comparisons between treatments. This method combined direct and indirect evidence for any given pair of treatments. Based on the rank of treatments in each Bayesian draw, the probability for each treatment to have a given rank and the mean ranks were estimated.
An analysis of the relative efficacy and tolerability of the antidepressants was performed for the two embedded time periods (1989Á2002 and 1989Á2007). Outputs generated from these analyses were presented as ranking probabilities for MADRS change from baseline and for withdrawal rate due to AEs, respectively. Ranking probability curves for escitalopram (the active S-enantiomer of citalopram) and citalopram (a racemic mixture of S-and R-citalopram, originally launched in 1989) were examined more thoroughly (7).
A comparison of the relative efficacy and tolerability was also made for all antidepressants meeting the inclusion criteria for analysis and placebo. Difference to placebo, mean ranks, and probability of being ranked among the top four antidepressants are also presented.
Software: analyses for the MTCs were run with WinBUGS † version 1.4, using standard programs developed by Bristol university. Programs can be provided upon request.
All analyses were performed by one statistician and quality control was conducted by another.
Results from analyses were then compared and any discrepancies in the analyses were resolved by program examination by the statisticians.

Results
The number of RCTs selected from the database and included in the MTC analysis of efficacy and tolerability for each antidepressant and each time period (1989Á2002 and 1989Á2007, respectively) is detailed in Table 1 and a flowchart is presented in Fig. 1.

Rankings for efficacy outcomes over the two time periods
Fifty-four studies were selected for the efficacy evaluation (as assessed by the change from baseline in MADRS reported at 8 weeks) for the time period 1989Á2002 and 82 studies for 1989Á2007 (Fig. 2). Within the first time period, mirtazapine, venlafaxine, and paroxetine ranked highest overall (mean ranks of 2.00, 2.64, and 3.38, respectively), while fluvoxamine ranked lowest among active treatments (mean rank 9.98), just above placebo (mean rank 10.69). A comparison of findings for the two time periods revealed some ranking probability changes between 2002 and 2007. However, mirtazapine and venlafaxine still ranked highest in terms of efficacy (mean ranks of 1.86 and 2.71, respectively) in 2007. Fluvoxamine continued to rank lowest for efficacy (mean rank 9.97), with a mean rank value for placebo of 10.64. Focusing more precisely on escitalopram and citalopram, their MADRS change from baseline ranking probability curves were mostly overlapping in 2002 (Fig. 3a), while a much clearer separation in favor of escitalopram was observed in 2007 (Fig. 3b). MADRS change from

Rankings for tolerability outcomes over the two time periods
Eighty-four studies were selected for the tolerability evaluation (as assessed by the withdrawals rates due to AEs reported at 8 weeks) for the time period 1989Á2002 and 135 for 1989Á2007 for the antidepressants of interest (Fig. 4). In 2002, milnacipran ranked highest for tolerability with a mean rank of 2.12 compared with 1.39 for placebo, while escitalopram ranked lowest at 9.21. However, differences between drug tolerability rankings were seen for 1989Á2007 data, with both milnacipran and escitalopram ranked high. Mean ranks were 3.14 for milnacipran, 3.42 for escitalopram, and 1.11 for placebo.
When escitalopram and citalopram were compared over the two time periods, ranking probability curves showed a clear differentiation between escitalopram and citalopram tolerability in 2007, in favor of escitalopram (Fig. 5b). In contrast, the ranking probability curve for escitalopram based on data up to 2002 provided clear evidence of non-separation of escitalopram's improved tolerability relative to citalopram (Fig. 5a). Differences for escitalopram compared to placebo in the log-OR for dropout rates decreased from 1.    Relative efficacy and tolerability of all antidepressants Relative efficacy varied considerably according to the time of the evaluation for several of the other drugs (Fig. 6). Paroxetine exhibited reduced relative efficacy in 2007 compared to 2002, while the relative efficacy of sertraline was enhanced. The tolerability of both of these antidepressants did not change substantially between the two time periods. However, milnacipran, fluvoxamine, and venlafaxine showed little change in efficacy over the two time periods. Milnacipran exhibited a very high tolerability profile and a low efficacy profile for both time periods. Venlafaxine maintained a good efficacy for both time periods, with poor tolerability. In contrast, fluvoxamine presented a low efficacy and tolerability profile during both time periods.

Discussion and conclusions
This paper reports the results of MTCs that were conducted to examine the relative efficacy and tolerability of a number of new-generation antidepressants in MDD patients between 1989Á2002 and 1989Á2007. The first time period included registration studies up to time of launch of the reference antidepressant (escitalopram), while the second time period included additional studies that were published up to 5 years later, to coincide with the time of reassessment by HTA authorities several years after the drug was first approved. Despite using the same antidepressants and the same inclusion/exclusion criteria, considerable variation was observed in relative efficacy and tolerability over the two time periods.
In the case of citalopram and escitalopram, both antidepressants had similar efficacy (mean rank 6.14 vs. 6.78) and tolerability (mean rank 7.70 vs. 9.21) around the time of launch for escitalopram (2002). However, greater ability to differentiate between citalopram and escitalopram was observed 5 years later (in 2007) for both efficacy (mean rank 7.27 vs. 4.42) and tolerability (mean rank 7.14 vs. 3.42), with a markedly improved ranking for escitalopram.
Such variability in the relative ranking of antidepressants can arise after a drug has been authorized for the market as the study design of post-marketing trials can differ greatly from the design of the registration trials. Further profiling studies may be conducted post-marketing in patient subgroups in which a new drug is expected to demonstrate a better effect. Subsequent inclusion of data reporting the higher benefits seen in specific patient subgroups for antidepressants may therefore improve ranking of this drug several years after the initial launch date. This is highlighted by the stability of relative efficacy observed for old compounds such as fluvoxamine, for which a low number of studies were conducted after 2002 ( Table 1).
The results obtained can also be influenced by shift in treatment guidance and practice, therefore such an overtime comparison requires precautions.
Post-hoc analysis has demonstrated that the benefits of citalopram over placebo are relatively constant, regardless of disease baseline severity, while escitalopram separation from placebo increases with severity of depression (7). As a result, several RCTs designed and published after the approval of escitalopram have shown that escitalopram 20 mg is particularly beneficial in severely depressed patients (baseline MADRS ]30) (7Á9), with superior effects over citalopram or paroxetine reported in this patient subgroup (baseline MADRS ]30); with the benefits improving further as baseline MADRS score gets higher (7Á9). Moreover, while escitalopram 10 mg is the optimal dose for patients with moderate depression (baseline MADRS 22Á29), its efficacy decreases in severely depressed patients (10). These results demonstrate the importance of reevaluation of a drug over time and have important consequences for HTA agencies and other decisionmakers who need to evaluate a drug at its time of launch and make a decision on whether to reimburse the drug.
Data presented here on the relative efficacy of a selection of antidepressants during the second time period support a landmark study by Cipriani and colleagues that evaluated the relative efficacy and tolerability of SSRIs and serotonin-noradrenaline reuptake inhibitors (SNRIs) in MDD using NMA based on publications up to 2007 (11). The outcomes of the Cipriani report have since been used by HTA agencies in drug evaluations and clinical guidelines (11,12). However, the methods used in the Cipriani report have limitations that may influence the outcomes and should be taken into consideration, including bias in study selection (data from placebo-controlled comparisons were excluded if they did not include an active comparator) (13,14), choice of outcome (the definition of the response may be inconsistent and general withdrawal was considered instead of withdrawal due to AE which is closer to tolerability), and differential drug exposure (further profiling studies in specific patient subpopulations may improve the response of a more established drug, patients that are non-responders/intolerant to a comparator may be excluded, and there may be bias towards response with a more established drug[functional unblinding]) (15). In contrast to the Cipriani study, which included only acute phase studies and head-to-head trials with active comparators, study selection for the MTC analyses reported here also included studies with placebo as the only comparator. Furthermore, the database used for the MTC data reported here included all publications up to 2007 and excluded studies presenting only HAM-D outcomes. Such differences in study selection may explain changes in the ranking of specific antidepressants, such as the improved ranking of reboxetine, compared with the Cipriani report (11).
Typically, any variation in the tolerability of drugs should not be as prominent as for efficacy as it is expected that drugs will have similar tolerability profiles across slightly varying patients' populations or disease's characteristics. For illustration, variations in tolerability observed in the current analysis may arise from the possibility that patients with improved symptoms, particularly those with severe MDD, will choose not to withdraw from the study despite the onset of AEs. Variations in tolerability may also be associated with important limitations of the MTC methods used, such as untested comparisons between study populations (due to selection/ publication bias), pooling of data from several approved drug doses, or the choice of comparator (active reference or placebo) selected for direct and indirect comparisons between antidepressants of interest. These elements can potentially hinder the assumptions of homogeneity and consistency underlying the MTC methods. Thus, although MTCs combining direct and indirect drug comparisons can provide useful information for decision-makers, care must be taken when interpreting findings from such analyses.
No quantitative evaluation of consistency within the presented treatment network has been conducted. However, it is worth noting that no clear sign of inconsistency was identified in the original Cipriani evaluation, and that the results of this are very consistent with what was obtained here on the full time period (up to 2007).
In conclusion, the time of MA may not be the most appropriate time to evaluate the relative efficacy and tolerability of therapies for a number of reasons. First, the comparison of a new drug versus more established agents may be biased due to the limited evidence that has accumulated for newer drugs and the use of potentially sub-optimal doses prior to drug launch. Second, at launch, there may be insufficient time to capture all of the benefits of a new drug and these will not be fully understood until additional data are collected postregistration. Third, current methods of comparison may be limited due to publication or selection bias. As it is important for HTA agencies and decision makers to evaluate a new drug against other alternatives at the time of launch, the recommendation is to compare drugs from studies with very similar experimental conditions in terms of patient populations and study design. One way to ensure this would be to evaluate a new drug against more established drugs based on the data from pivotal studies submitted as part of the registration package at the time of launch (16). Nevertheless, it is clearly important to reassess the drug relative effect several years after its launch date. In this case, appropriate MTCs should be conducted 5 years after drug launch based on all relevant RCTs, when clinical development of the drug is often completed and there is likely to be less variation of drug relative efficacy and relative tolerability over time.