Introduction

Multiple sclerosis (MS) is a chronic inflammatory and demyelinating disease of the CNS, mostly affecting young adults, who invariably develop serious and irreversible neurological disability over a period of 10–30 years.1 Although no cure currently exists, the range of available treatments is rapidly broadening, at least for the relapsing–remitting form of the disease.2,3 The use of injectable therapies with various formulations of IFN-β and glatiramer acetate (GA) has changed the course of the disease, reducing the relapse rate as well as the development of new lesions as detected by MRI.4 Long-term experience with these drugs attests to their safety, combined with an efficacy of about 30% in reducing relapses.5,6,7,8 These benefits have to be weighed, however, against the burden of regular injections.

In recent years, natalizumab, a monoclonal antibody administered intravenously once per month, has been shown to greatly reduce the frequency of relapses and MRI lesion formation in patients with relapsing–remitting MS (RRMS), with additional significant effects on the risk of disability progression.9,10 In many countries, however, the use of this drug is limited to RRMS patients who respond poorly to first-line therapies. Fingolimod, an oral sphingosine-1-phosphate receptor modulator, has demonstrated superior efficacy compared with placebo and low-dose IFN-β1a in phase III studies, with profound suppression of inflammatory activity (in terms of both MRI lesions and relapses) and effects on brain atrophy and disability progression.11,12 Owing to its safety profile, however, patients receiving fingolimod require intensive monitoring both before the first dose and during the course of treatment.13,14 Two new oral drugs, teriflunomide15,16 and dimethyl fumarate,17,18 have recently been approved in the USA for the treatment of patients with RRMS. In addition, a number of new molecules have shown promising results in phase III trials19,20,21,22 and could potentially be used in clinical practice in the near future.

This complex scenario makes it imperative for clinicians to have the tools to make prompt treatment decisions in patients with a suboptimal response, so as to prevent MS from progressing to the point at which any treatment adjustment would no longer be effective. However, detecting early markers of response to any treatment is very challenging in MS, possibly owing, at least in part, to the inherent complexity of defining response and non-response to therapy in this chronic disease.

Despite these hurdles, several studies have tried to provide rules and strategies to aid the classification of patients on the basis of response to treatment. Here, we will review these studies, focusing predominantly on response to IFN-β treatment. We will outline the many definitions of clinical response to IFN-β provided to date, and explore the markers that can predict such a response. We will then review the use, in MS clinical studies, of 'scoring systems' that combine different clinical and MRI markers to improve the definition of an early response, highlighting advantages and limitations of these approaches. The overall objective is to address, on the basis of previously identified markers of response to IFN-β treatment, the possibility of developing a common stratification strategy to enable personalized use of disease-modifying therapies in MS.

Definitions of response to IFN-β

In general, a response to a given therapy is deemed to have occurred when the treatment induces a benefit that would not have happened in its absence. Such events are difficult to identify in MS, for several reasons. First, MS has an unpredictable clinical course, which can vary greatly among patients regardless of whether they are treated with disease-modifying therapies. Second, patients often show spontaneous almost-complete remission following acute relapses (particularly in the relapsing–remitting phase), which makes it difficult to attribute even a drastic change in disease activity exclusively to treatment. Third, the available therapies are only partially effective, so patients who do respond may still show residual disease activity. Last, we lack a standardized definition of the clinical outcomes that signify improvement or worsening of the disease course.

Currently, the trajectory of the disease course is evaluated largely on the basis of three outcomes—progression of disability, incidence of relapses, and presence of brain lesions on MRI—individually or in combination. Progression of disability is usually measured by the Expanded Disability Status Scale (EDSS),23 and is defined as an increase of 1 EDSS point (or 0.5 points if baseline EDSS is >5.5), confirmed on subsequent visits to exclude transient changes associated with relapses. Clinical relapses are defined as new or worsening neurological symptoms lasting more than 24 h, preceded by a minimum of 30 days of clinical stability or improvement and confirmed by objective findings on a neurological examination.24 Changes in MS brain lesion patterns that are visible on MRI reflect changes in the underlying disease pathology, which provides the rationale for using MRI lesions (usually quantified as the number of gadolinium [Gd]-enhancing lesions on T1-weighted imaging or new lesions on T2-weighted scans) as measures of disease activity.25
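The progression rule above can be made concrete as a small function. The sketch below is a hypothetical illustration: the 1.0-point and 0.5-point thresholds come from the text, but the function names and the choice to require the increase at every visit up to the first visit at least 6 months after onset are assumptions, since confirmation rules differ between studies.

```python
from typing import List, Optional, Tuple


def edss_threshold(baseline: float) -> float:
    """EDSS increase counted as progression: 1.0 point, or 0.5 if baseline > 5.5."""
    return 0.5 if baseline > 5.5 else 1.0


def confirmed_progression(
    baseline: float,
    visits: List[Tuple[int, float]],
    confirm_months: int = 6,
) -> Optional[int]:
    """Return the month of the first confirmed EDSS progression, or None.

    `visits` is a list of (month, EDSS) pairs sorted by month. An increase is
    treated as confirmed only if it is still present at every later visit up to
    and including the first visit at least `confirm_months` after onset
    (an assumed operationalization of "confirmed at 6 months").
    """
    threshold = edss_threshold(baseline)
    for i, (onset, score) in enumerate(visits):
        if score - baseline < threshold:
            continue
        window = []
        for month, later_score in visits[i + 1:]:
            window.append(later_score)
            if month >= onset + confirm_months:
                break
        else:
            continue  # no visit far enough after onset to confirm the increase
        if all(s - baseline >= threshold for s in window):
            return onset
    return None


# Hypothetical patient: a relapse-related rise at month 3 recovers by month 6,
# whereas the rise at month 12 is sustained at month 18.
visits = [(3, 3.0), (6, 2.0), (12, 3.0), (18, 3.5)]
print(confirmed_progression(2.0, visits))
```

The transient rise at month 3 is rejected because EDSS returns to baseline at month 6, which is exactly the relapse-related fluctuation that the confirmation requirement is meant to exclude.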

A review of MS studies confirms wide heterogeneity in the concepts of 'response' and 'non-response', and in the time frames over which they are assessed. In many studies, non-response to IFN-β treatment has been defined as an increase of 1 EDSS point confirmed at 6 months.26,27,28,29,30,31 However, this change in disability status was often measured without regard to the duration of clinical follow-up, which varied greatly across studies (from 1 to 6 years).26,27,28,29,30,31 The timing of the event is a crucial component of the definition, since 1 point of EDSS progression has a very different clinical meaning if it occurs over 1 year, 2 years or longer. Parameters that have been used to define non-responders in long-term studies (15–16 years of follow-up) include the net change in EDSS score,32 as well as the milestones of reaching an EDSS score of ≥6 or converting to secondary progressive MS.33 In other studies, non-response has been defined as EDSS progression and/or relapses26,28,31 or as a change in the relapse rate relative to the pre-treatment relapse frequency.34,35,36,37,38 Moreover, the absence or presence of active lesions on MRI during treatment has been included in the definitions of response and non-response, respectively, owing to the common interpretation of MRI activity as an expression of disease progression.34 Finally, even switching from one drug to another has been considered a criterion of non-response.30,39 A recent review of studies of treatment switching in patients with a poor response to MS therapies40 highlighted the variety of definitions of suboptimal treatment response based on MRI findings, relapses and disability progression.

In summary, the definitions of response to IFN-β are essentially based on measures of disability progression, clinical relapses and MRI activity, although a common definition is still lacking. A standardized definition of treatment response is of paramount importance to allow comparisons between studies and validation of markers of response. To date, the largest effort to devise an evidence-based, homogeneous definition of response versus non-response to IFN-β therapy has come from the work of Río and co-workers.41,42,43 These authors evaluated different definitions of response to IFN-β over 2 years of therapy, and showed that the proportion of responders can range greatly (7–49%; Table 1) according to the stringency of the criteria used. Río et al. assessed the validity of the different criteria against the 'hard' outcome of reaching an EDSS score of 6 or showing an increase in EDSS score of at least 3 points 6 years after the start of treatment. The criterion based on EDSS progression (criterion A in Table 1), which classed as non-responders the patients with an increase of 1 EDSS point confirmed at 6 months within 2 years of starting treatment, had the highest sensitivity for predicting this outcome.42 The combination of EDSS progression and the presence of relapses over the same 2-year period had the highest accuracy (criterion J in Table 1).42
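Sensitivity and accuracy, the two properties used above to compare criteria, come from cross-tabulating the 2-year classification against the 6-year outcome. A minimal sketch follows; the function and the six patients are hypothetical, not the Río et al. data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CriterionPerformance:
    sensitivity: float  # share of patients with the hard outcome that the criterion flagged
    specificity: float  # share of patients without the outcome that it did not flag
    accuracy: float     # share of all patients classified correctly


def evaluate_criterion(flagged: Sequence[bool], outcome: Sequence[bool]) -> CriterionPerformance:
    """Compare a 2-year non-response flag with a later hard outcome."""
    pairs = list(zip(flagged, outcome))
    tp = sum(1 for f, o in pairs if f and o)
    fn = sum(1 for f, o in pairs if not f and o)
    tn = sum(1 for f, o in pairs if not f and not o)
    fp = sum(1 for f, o in pairs if f and not o)
    return CriterionPerformance(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / len(pairs),
    )


# Six hypothetical patients (not data from the cited studies).
flagged = [True, True, False, False, True, False]
outcome = [True, False, False, True, True, False]
example = evaluate_criterion(flagged, outcome)
print(example)
```

A lenient criterion flags more patients, which tends to raise sensitivity at the cost of specificity; that trade-off is why the most sensitive and the most accurate criteria can differ.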

Table 1 Definitions of non-response to IFN-β in patients with MS

Markers of response to IFN-β therapy

The definition of a 'marker of response' to a therapy encompasses two classes of factors. The first class, termed treatment effect modifiers, includes baseline variables that can identify subgroups of patients who show different treatment effects. The second class includes variables that can be measured earlier or more conveniently than the actual clinical end point of interest once treatment has commenced. The latter factors, also known as surrogate markers, can predict the effects of therapy on the relevant clinical end point, thereby identifying patients who are responding to the therapy. Assessment of both effect modifiers and surrogate markers requires the presence of a control group for comparison with the treatment group, so as to rule out the effects of variables that are simply prognostic factors (Box 1).
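The role of the control group can be shown with two hypothetical trials split by a binary baseline factor. In the first, the factor doubles the risk of progression in both arms (a prognostic factor) and the treatment risk ratio is the same in each subgroup; in the second, the risk ratio differs between subgroups (an effect modifier). The counts and function names below are invented for illustration.

```python
from typing import Dict, Tuple

Arm = Tuple[int, int]  # (patients with the event, patients in the arm)


def risk_ratio(treated: Arm, control: Arm) -> float:
    """Risk of the event on treatment divided by the risk on control."""
    return (treated[0] / treated[1]) / (control[0] / control[1])


def subgroup_risk_ratios(table: Dict[str, Tuple[Arm, Arm]]) -> Dict[str, float]:
    """Treatment risk ratio within each subgroup of a (treated, control) table."""
    return {group: risk_ratio(treated, control) for group, (treated, control) in table.items()}


# Hypothetical trials, split by a binary baseline factor: (treated arm, control arm).
prognostic_only = {
    "factor present": ((20, 100), (40, 100)),  # higher risk in both arms
    "factor absent": ((10, 100), (20, 100)),
}
effect_modifier = {
    "factor present": ((10, 100), (40, 100)),  # treatment works much better here
    "factor absent": ((18, 100), (20, 100)),
}

print(subgroup_risk_ratios(prognostic_only))
print(subgroup_risk_ratios(effect_modifier))
```

In the first table, treated patients with the factor progress twice as often as treated patients without it (20% versus 10%), so an analysis restricted to treated patients would wrongly label the factor a marker of non-response even though the treatment effect is identical in both subgroups.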

Baseline factors as effect modifiers

In the MS field, good examples of analyses of treatment effect modifiers have come from subgroup analyses of clinical trials evaluating different formulations of IFN-β in clinically isolated syndrome (CIS).44,45,46,47 In these studies, however, the effects of IFN-β were homogeneous across all the examined subgroups, so it was not possible to identify any baseline factors associated with a better response to IFN-β.

In other studies, correlations between baseline factors and clinical outcomes in IFN-β-treated patients were evaluated in the absence of a proper control group.34,35,36,37,38 In such studies, baseline factors were often found to be associated with response to IFN-β simply because they were correlated with the variable used to define response. For example, pre-treatment relapse rate was associated with non-response when the definition of non-response was based on a change in relapse rate during treatment, as compared with the years before treatment commenced.35,36,37 Also, high baseline MRI activity was associated with non-response if the definition of non-response was based on relapses and MRI activity.34 In all of these cases, it is impossible to determine whether the baseline factors associated with the response are real markers of response, whether they are simply prognostic factors, or whether the observed response reflects regression to the mean.48
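Regression to the mean can produce an apparent 'response' with no treatment at all. The simulation below is a hypothetical sketch: it assumes each untreated patient has a stable annual relapse rate drawn from a gamma distribution (mean 1 relapse per year) and Poisson-distributed relapse counts in two consecutive 2-year periods; all parameter values and function names are arbitrary.

```python
import math
import random
from typing import Tuple


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson count with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_untreated(n: int = 10_000, min_prior: int = 4, seed: int = 1) -> Tuple[float, float, float, float]:
    """Relapse counts in two consecutive 2-year periods with NO treatment.

    Each hypothetical patient keeps the same annual relapse rate in both
    periods. Returns (mean before, mean after) for all patients, then for
    those with at least `min_prior` relapses in the first period.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n):
        annual_rate = rng.gammavariate(2.0, 0.5)  # shape 2, scale 0.5 -> mean 1
        before.append(poisson(rng, 2 * annual_rate))
        after.append(poisson(rng, 2 * annual_rate))
    selected = [(b, a) for b, a in zip(before, after) if b >= min_prior]
    return (
        sum(before) / n,
        sum(after) / n,
        sum(b for b, _ in selected) / len(selected),
        sum(a for _, a in selected) / len(selected),
    )


all_before, all_after, high_before, high_after = simulate_untreated()
print(f"all patients: {all_before:.2f} -> {all_after:.2f}")
print(f"high prior activity: {high_before:.2f} -> {high_after:.2f}")
```

Across the whole cohort the two means are close, but patients selected for four or more prior relapses have markedly fewer relapses in the second period, purely because high initial counts partly reflect chance.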

Surrogate markers and predictors of response

To use a marker as a predictor of response to a given treatment, its validity as a surrogate marker—that is, its capacity to mediate, in the short term, the effects seen on the clinical (true) outcome in the long term—must be assessed. This assessment must be performed with a control group and, therefore, a randomized clinical trial represents the optimal setting.49

The MS literature reports several examples of validation of surrogate markers for clinical outcomes during IFN-β therapies.50,51,52 A recent summary of all the evidence indicates that, at the individual patient level, the effect of IFN-β on MRI active lesions mediates more than 60% of the effect on relapses,52 while another study has shown that MRI lesions mediate about 57% of the IFN-β effect on disability progression.51 The role of relapses as a valid surrogate for disability progression over a 2-year timeframe has been studied in IFN-β-treated53 and natalizumab-treated54 patients, in whom relapses accounted for 62% and over 80%, respectively, of the treatment effect on disability. Taken together, these observations provide a solid basis for considering MRI activity and relapses as potential markers of response to IFN-β, and they suggest that patients with MS who are likely to derive an IFN-β-induced benefit with regard to disability progression can be identified early by their MRI lesion and relapse activity.
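Statements such as 'mediates more than 60% of the effect' express the proportion of the treatment effect explained (PTE) by the surrogate. One common estimator, attributed to Freedman, compares the treatment coefficient for the clinical endpoint before and after adjusting for the surrogate. The text does not say which estimator the cited analyses used, so the sketch below, with hypothetical coefficients, is illustrative only.

```python
def proportion_treatment_explained(beta_unadjusted: float, beta_adjusted: float) -> float:
    """Freedman-style proportion of treatment effect explained by a surrogate.

    beta_unadjusted: treatment coefficient (e.g. log odds ratio) for the clinical
                     endpoint, from a model without the surrogate.
    beta_adjusted:   the same coefficient after adding the surrogate to the model.
    """
    if beta_unadjusted == 0:
        raise ValueError("treatment has no effect on the endpoint; PTE undefined")
    return 1.0 - beta_adjusted / beta_unadjusted


# Hypothetical coefficients: adjusting for 1-year MRI activity shrinks the
# treatment log odds ratio for 2-year EDSS worsening from -0.50 to -0.20.
print(round(proportion_treatment_explained(-0.50, -0.20), 2))
```

A PTE of 1 would mean that, once the surrogate is accounted for, no residual treatment effect on the clinical endpoint remains.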

The observation that both clinical relapses and active lesions on MRI are good surrogate markers of clinical disability in MS supports the use of short-term changes in these markers as predictors of clinical outcome during the subsequent years.26,28,29,30,31,55 However, the MS literature has provided heterogeneous results on this topic. When the impact of relapse numbers on response to IFN-β during treatment was evaluated, some studies found that relapses were not significantly related to response to IFN-β,28,31 while others showed a close correlation between short-term relapses and long-term disability progression during IFN-β treatment.29,30,32,33

Reports of the relationship between MRI lesion activity and patient response to IFN-β have also varied. In some studies,26,31 the presence of lesion activity over the first year of treatment did not influence the probability of disability progression over the follow-up period. By contrast, the development of one new T2 lesion greatly increased the risk of disability progression over 4 years in another study.29 Most studies did show that MRI lesion activity was associated with high relapse frequency over the follow-up period.26,28,29,30,31

In this context, it must be stressed that while the definition of clinical relapses is usually quite homogeneous across studies,56 the same cannot be said for the definition of MRI activity. This parameter has been defined as the presence of new T2 and Gd-enhancing lesions during the first year of IFN-β treatment in some studies,26,28,55 and as the number of new T2 lesions29,30 or new Gd-enhancing lesions31 in others. Recently, the concept of 'complete or transient MRI remission'—defined as the total or transient absence of new Gd-enhancing or T2 lesions on frequent MRI scans—was introduced.27 In general, given that T1-weighted Gd-enhancing lesions are transient (with a time course of about 1 month57), they should be used only in studies involving very frequent MRI scans (monthly or bimonthly) to allow meaningful counting of acute lesions. By contrast, the T2 signal alteration that signifies new lesion formation, although technically demanding and time-consuming to detect, remains stable over time, thereby allowing accurate counting of accumulated new lesions even when MRI scans are performed with low frequency (for example, yearly).
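The argument about scan frequency can be checked with a back-of-the-envelope calculation. Assuming (hypothetically) that each lesion enhances for a fixed period of about 1 month and appears at a random time between scans, a lesion is counted only if a scan falls within its enhancement window, so the expected fraction detected is the enhancement duration divided by the scan interval, capped at 1. The function name is illustrative.

```python
def fraction_enhancing_detected(enhancement_months: float, scan_interval_months: float) -> float:
    """Expected share of new Gd-enhancing lesions caught by regularly spaced scans.

    Assumes each lesion enhances for a fixed period and appears at a uniformly
    random time between scans, so it is seen only if a scan falls inside its
    enhancement window.
    """
    return min(1.0, enhancement_months / scan_interval_months)


for interval in (1, 2, 6, 12):
    share = fraction_enhancing_detected(1.0, interval)
    print(f"scan every {interval:>2} months: {share:.0%} detected")
```

Under these assumptions yearly scans would catch roughly one enhancing lesion in 12, whereas a persistent new T2 lesion is counted at the next scan whatever the interval.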

Heterogeneity across studies also exists in the number of MRI lesions taken as the cut-off for significant disease activity. This value is usually fairly arbitrary, and tends to be based more on common sense28,31 than on scientific evidence. Two studies based on a 2-year placebo-controlled trial32,55 set the cut-offs for MRI variables at the median values observed in the placebo group over the trial period: two or more new Gd-enhancing lesions and three or more new T2 lesions. When the best cut-off for new T2 lesions was estimated with an optimization procedure, however, the result was four to five new T2 lesions.30
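The text does not specify the optimization procedure used in the cited study. One common choice, assumed here, is to pick the cut-off that maximizes Youden's J (sensitivity plus specificity minus 1) for predicting progression. The function name is hypothetical, and the lesion counts are toy data constructed so that the result is easy to verify by hand; they are not trial data.

```python
from typing import Iterable, Optional, Sequence, Tuple


def youden_best_cutoff(
    counts: Sequence[int], progressed: Sequence[bool], candidates: Iterable[int]
) -> Tuple[Optional[int], float]:
    """Choose the cut-off c (rule: count > c) that maximizes Youden's J."""
    positives = [c for c, p in zip(counts, progressed) if p]
    negatives = [c for c, p in zip(counts, progressed) if not p]
    best_cut, best_j = None, float("-inf")
    for cut in candidates:
        sensitivity = sum(c > cut for c in positives) / len(positives)
        specificity = sum(c <= cut for c in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if j > best_j:  # strict '>' keeps the lowest cut-off among ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy data: first-year new T2 lesion counts and later progression status.
counts = [0, 2, 5, 6, 8, 9] + [0, 0, 1, 1, 2, 3, 4, 5, 7]
progressed = [True] * 6 + [False] * 9
print(youden_best_cutoff(counts, progressed, range(10)))
```

With these toy data the rule "more than four new T2 lesions" gives the best balance (sensitivity 4/6, specificity 7/9); real data would, of course, give their own answer.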

Interestingly, it was recently suggested that the presence of new T2 lesions should be interpreted in relation to the timing of the reference scan and the pharmacodynamics of the therapy.58 Since some drugs may take up to 6 months to become effective, the MRI scan to use as a reference to assess new T2 lesion formation could be obtained 6 months after initiating therapy, so that new T2 lesions on subsequent follow-up scans could be interpreted without the uncertainty of whether they developed before the drug became effective.

Scoring the response to IFN-β therapy

As discussed above, both clinical relapses and active lesions on MRI have been shown to represent good surrogate markers of clinical disability in MS. Indeed, a 2011 analysis53 showed that, according to the Prentice criteria for surrogate marker validation,49 both 1-year MRI active lesions and clinical relapses independently accounted for more than 60% of the effects of IFN-β on 2-year EDSS worsening. Most importantly, when 1-year MRI active lesions and relapses were used as markers in combination, the effects of IFN-β on 2-year disability progression seemed to be entirely mediated by the treatment-induced reduction in the numbers of MRI active lesions and relapses during the first year of treatment. Overall, the difficulty of finding homogeneous results when using the different markers of response in isolation, together with the observation that the effects of IFN-β can be predicted at the individual-patient level by the combined effects on 1-year lesion activity and relapses, suggests that IFN-β-induced benefits on disability progression in patients with MS can be better (and more promptly) identified by the use of composite clinical and MRI scores.

Development of scoring systems

The combined use of disease activity parameters to predict response to therapy is the basis of the Treatment Optimization Recommendations published by the Canadian MS Working Group (CMSWG)40,58 and subsequently tested on data from the PRISMS (Prevention of Relapses and disability by IFN-β1a Subcutaneously in Multiple Sclerosis) trial.59 The CMSWG describes a model, derived from expert consensus, that grades disability progression, relapses and MRI activity during treatment into levels classified as 'notable', 'worrisome' and 'actionable'. Although this scheme does not provide quantitative rules for deciding when to switch treatment, when applied to the PRISMS data (using relapses and disability progression only) it identified a group of suboptimal responders, 89% of whom had continued breakthrough disease in terms of relapses and progression. In a recent revision of the recommendations,58 the CMSWG proposed an updated approach to assessing suboptimal response to treatment, suggesting that a change in treatment may be considered in any patient with RRMS if there is a high level of concern in any one domain (relapses, progression or MRI findings), a medium level of concern in any two domains, or a low level of concern in all three domains.
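The revised CMSWG rule quoted above is simple enough to encode directly. In this hypothetical sketch the level of concern in each domain is supplied by the clinician (how the levels are graded is not reproduced here), and it is assumed that a higher level of concern also satisfies any lower-level condition.

```python
from enum import IntEnum


class Concern(IntEnum):
    """Level of concern in one domain (relapses, progression or MRI)."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def consider_switch(relapses: Concern, progression: Concern, mri: Concern) -> bool:
    """Apply the revised CMSWG rule as paraphrased in the text.

    Assumes a higher level also satisfies any lower-level condition
    (e.g. HIGH counts toward "medium in any two domains").
    """
    levels = (relapses, progression, mri)
    if any(level >= Concern.HIGH for level in levels):
        return True
    if sum(level >= Concern.MEDIUM for level in levels) >= 2:
        return True
    return all(level >= Concern.LOW for level in levels)


print(consider_switch(Concern.MEDIUM, Concern.NONE, Concern.MEDIUM))
print(consider_switch(Concern.MEDIUM, Concern.LOW, Concern.NONE))
```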

In a study published in 2008, Río and colleagues analysed a clinical data set of 222 patients with RRMS, each of whom had been treated with one of several formulations of IFN-β for more than 1 year.28 On the basis of their findings, the authors proposed a more quantitative version of a composite score. The new scoring system involved the combined assessment, at 1 year from the start of treatment, of the presence of clinical relapses, disability progression (as measured by an increase of 1 EDSS point confirmed at 6 months) and active MRI lesions (that is, more than two new T2 or Gd-enhancing lesions) to identify patients with a poor outcome during the subsequent 2 years (Table 2). Patients who were positive for at least two of the three criteria analysed after the first year of IFN-β therapy were found to have a higher probability of experiencing disability progression or showing relapse activity at follow-up. These individuals would, therefore, be strong candidates for a switch of treatment. Of note, the isolated presence of relapses or MRI activity after 1 year of treatment did not significantly predict the risk of new clinical activity or disease progression in the ensuing 2 years. In addition, an increase in disability alone during the first year of treatment was a poor predictor of subsequent disability progression.
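The Rio Score described above reduces to counting positive criteria at 1 year. The sketch below is illustrative: the function names are hypothetical, and how new T2 and Gd-enhancing lesions are combined into a single count of active lesions is left to the caller.

```python
def rio_score(relapse: bool, edss_progression: bool, active_lesions: int) -> int:
    """Rio Score after 1 year of IFN-beta: one point per positive criterion.

    relapse:          any clinical relapse during year 1
    edss_progression: 1-point EDSS increase confirmed at 6 months
    active_lesions:   number of active (new T2 or Gd-enhancing) lesions at 1 year
    """
    mri_positive = active_lesions > 2  # "more than two" active lesions
    return int(relapse) + int(edss_progression) + int(mri_positive)


def rio_poor_responder(score: int) -> bool:
    """Positive for at least two of the three criteria."""
    return score >= 2


score = rio_score(relapse=True, edss_progression=False, active_lesions=3)
print(score, rio_poor_responder(score))
```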

Table 2 The Rio and Modified Rio Scores

Prompted by these seminal papers, a recent study30 proposed a simplified version of the Rio Score (the so-called Modified Rio Score). The new score was based on an analysis of the treatment arms of the PRISMS study,8 which included 365 patients with RRMS treated with two doses of subcutaneous IFN-β1a (training set). In light of the observation that active MRI lesions and relapses over the first year of IFN-β treatment mediated the full effect on disability progression,53 only these markers were used as score components. Cut-off values for the number of relapses and the number of new T2 lesions counted over the first year of treatment were established by statistical modelling (Table 2). To check for overfitting, the score was then validated on an independent data set, namely the one used to develop the original Rio Score.28
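Table 2 is not reproduced here, so the cut-offs in the following sketch are assumptions based on the published description of the Modified Rio Score: 1 point for more than four new T2 lesions, 1 point for one relapse and 2 points for two or more relapses in the first year, with a total of 0 classed as low risk, 1 as medium risk and 2 or 3 as high risk. Function names are hypothetical.

```python
def modified_rio_score(relapses: int, new_t2: int) -> int:
    """Modified Rio Score after the first year of IFN-beta (assumed cut-offs).

    relapses: number of clinical relapses during year 1
    new_t2:   number of new T2 lesions on the 1-year MRI
    """
    mri_points = 1 if new_t2 > 4 else 0
    relapse_points = min(relapses, 2)  # 0 relapses -> 0, 1 -> 1, 2 or more -> 2
    return mri_points + relapse_points


def modified_rio_risk(score: int) -> str:
    """Map the total score to the three risk groups."""
    if score == 0:
        return "low"
    if score == 1:
        return "medium"
    return "high"


print(modified_rio_risk(modified_rio_score(relapses=1, new_t2=2)))
```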

The Modified Rio Score grouped patients into three risk groups (Table 2). The validation exercise established a probability of disability progression of 24% in the low-risk group, 33% in the medium-risk group and 65% in the high-risk group. Patients in the medium-risk group are the most difficult to interpret in terms of treatment response and planning. A study has shown that further evaluation with an MRI scan and clinical visit 6 months after the first year of therapy could allow better classification of these patients.59 On the basis of these findings, an evidence-based quantitative algorithm to monitor response to IFN-β can be proposed (Figure 1).

Figure 1: An evidence-based quantitative algorithm to monitor response to IFN-β.

This proposed algorithm is based on the Modified Rio Score for the assessment of the risk of progression over 4 years in patients with multiple sclerosis treated for 1.5 years with IFN-β therapy. *Substantial new T2 activity is defined as >4–5 new T2 lesions in 1 year of treatment,30 or >1–2 new T2 lesions if the reference MRI scan to assess new T2 lesion formation is obtained at least 6 months after initiating therapy.58
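The decision flow of Figure 1 can be sketched as follows. Because the figure itself is not reproduced here, this is one reading of it (low risk: continue; high risk: consider switching; medium risk: reassess 6 months later and consider switching if a relapse or substantial new T2 activity occurs). The thresholds take the lower end of each range in the footnote, which is an assumption, and all names are hypothetical.

```python
from typing import Optional


def substantial_new_t2(new_t2: int, reference_scan_month: int) -> bool:
    """'Substantial new T2 activity' per the Figure 1 footnote.

    Uses the lower end of each range given there (assumption): more than 4 new
    T2 lesions against a scan at treatment start, or more than 1 if the
    reference scan was taken at least 6 months after starting therapy.
    """
    cutoff = 1 if reference_scan_month >= 6 else 4
    return new_t2 > cutoff


def monitoring_decision(
    risk_at_1_year: str,
    relapse_by_18_months: Optional[bool] = None,
    substantial_t2_by_18_months: Optional[bool] = None,
) -> str:
    """Hypothetical encoding of one reading of the Figure 1 algorithm."""
    if risk_at_1_year == "low":
        return "continue IFN-beta"
    if risk_at_1_year == "high":
        return "consider switching therapy"
    if risk_at_1_year != "medium":
        raise ValueError(f"unknown risk group: {risk_at_1_year!r}")
    if relapse_by_18_months is None or substantial_t2_by_18_months is None:
        return "repeat MRI and clinical visit 6 months later"
    if relapse_by_18_months or substantial_t2_by_18_months:
        return "consider switching therapy"
    return "continue IFN-beta"


print(monitoring_decision("medium"))
print(monitoring_decision("medium", False, substantial_new_t2(2, reference_scan_month=12)))
```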

In a preliminary study, both the Rio and Modified Rio scores were further validated in an independent large data set of 516 IFN-β-treated patients with at least 5 years of follow-up in a clinical setting.39 Both score systems were confirmed to provide good discrimination of patients at risk of disease progression at 1 year from treatment initiation.

Provided that they are homogeneously defined and consistently applied, the components of a scoring system need not be limited to the 'classic' markers such as MRI activity, clinical relapses and disability progression. In a 2-year study, the combination of MRI activity and positivity for anti-IFN-β neutralizing antibodies (NAbs) during the first 6 months of IFN-β therapy correlated with the probability of non-response better than did the two components alone, and showed good sensitivity in identifying patients with relapses or disability progression in the subsequent 2 years.26 In addition, a recently published study in patients with CIS showed that a score based on quantitative MRI markers (baseline T2-lesion volume and changes in the area of the corpus callosum in the first 6 months of IFN-β therapy) could identify patients at high risk of clinical conversion in the 2 years that followed.61 These studies suggest that new markers have the potential to improve the prediction of response to IFN-β when added to the more conventional criteria, and warrant validation in future prospective studies.

Advantages and limitations of scoring systems

In their daily clinical practice, clinicians must decide on the best therapeutic approach for a given patient. In MS, these decisions are mainly based on clinical and MRI activity, although, as discussed above, evidence-based procedures are lacking. The main relevance of the scoring systems is to provide evidence-based, quantitative decision rules that integrate different parameters of disease activity.

We should stress that the combined score systems perform suboptimally when evaluated as classifiers. Taking the Modified Rio Score as an example, this system classifies patients enrolled in the PRISMS study into three groups with probabilities of progression of about 30%, 40% and 50%, compared with a probability of around 40% for the whole group.30 The gain in classification afforded by this score, therefore, seems modest. However, these limitations could be at least partly attributable to specific characteristics of MS, such as its clinical variability (for example, non-responding patients can have periods of stable disease, and patients classed as responders can experience disease activity), and its imperfect response to therapy (for example, 'true' responders to IFN-β are not cured and cannot, therefore, be expected to have a probability of progression equal or close to zero). In this context, the proposed scores should not be judged on their performance as classifiers; rather, their discriminant ability should be weighed against the small differences in disability progression that are detectable between groups of patients with MS.

To clarify this point, in Figure 2, disability progression over 2 years in patients enrolled in the PRISMS trial is shown; the placebo group is compared with the IFN-β-treated group, which is split into responders and non-responders as classified by the Modified Rio Score.30,59 Of note, the disability progression curve of the group of non-responders (Figure 2) overlaps substantially with the progression curve of the placebo group, while the group of responders shows a substantially lower rate of disability progression over 2 years. The difference between the responder and placebo groups equates to a hazard ratio (HR) of 0.48; that is, a reduction in the progression rate of about 52%. Seen in this way, the difference is large: larger, for example, than the difference in disability progression induced by two of the most effective drugs, fingolimod and natalizumab, versus placebo. Data from clinical trials show that fingolimod reduces the rate of progression of disability over 2 years by about 30% (HR = 0.70)11 and natalizumab by about 42% (HR = 0.58).9
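The percentages quoted above follow from the relative reduction in hazard, 1 - HR. A minimal check of that arithmetic, using only the hazard ratios given in the text (the function name is illustrative):

```python
def percent_reduction(hazard_ratio: float) -> int:
    """Relative reduction in the hazard of progression, as a whole percentage."""
    return round((1.0 - hazard_ratio) * 100)


for label, hr in [
    ("responders vs placebo (PRISMS)", 0.48),
    ("fingolimod vs placebo", 0.70),
    ("natalizumab vs placebo", 0.58),
]:
    print(f"{label}: HR = {hr:.2f} -> {percent_reduction(hr)}% reduction")
```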

Figure 2: Probability of disability progression over 2 years in patients with multiple sclerosis enrolled in the PRISMS trial.

The placebo group (blue line) is compared with the IFN-β treated group, which is split into responders (pink line) and non-responders (green line) as classified by the Modified Rio Score.30,59 Note that the disability progression curve of the group of non-responders overlaps substantially with the progression curve of the placebo group, whereas the group of responders shows a substantially lower rate of disability progression over 2 years (HR = 0.48).

On the basis of these data, one can state that the use of scoring systems in MS, though open to improvement through integration of new and more disease-specific components, does provide good discrimination between responders and non-responders to IFN-β treatment among patients with MS. However, the proposed scoring systems all require further validation in large cohorts of patients with RRMS undergoing IFN-β treatment, ideally collected from clinical practice databases, since the close monitoring performed in clinical trials may cause the sensitivity of the scores to be overestimated relative to routine clinical practice.

Conclusions and future directions

The advent of a large number of new therapies for MS warrants the development of tools to select the best treatment for each new patient, and to identify factors that can predict whether that patient will respond to the selected therapy. Such tools would enable early, evidence-based and individualized decisions to be made on this crucial clinical issue. An integrated approach is required, whereby clinical and paraclinical biomarkers are combined to accurately quantify individual risk and guide patient-specific treatment strategies. In the future, this approach should incorporate new imaging and laboratory biomarkers that have shown potential as predictors of treatment response.

A large number of imaging studies have provided evidence that measurement of focal T2 lesions that accumulate during the disease course is not sufficient to properly profile MS clinical heterogeneity and monitor its progression.62 Among the different MRI measures, brain atrophy is considered by many investigators to be the most promising for the future.63 Likewise, genetic factors have been evaluated in transcriptional profiling studies,64 and changes in gene expression profiles have been detected in patients exhibiting differential treatment responses.65,66,67 A review of biomarker approaches at the protein and mRNA level, involving measurements of NAb titres and IFN-β biological activity,68 concluded that the presence of NAbs is associated with a reduction in the clinical efficacy of IFN-β. Finally, a biological association between levels of IL-17F (a cytokine produced by T helper 17 cells) and IFN-β has been hypothesized, on the basis of the observation that MS patients who did not respond to IFN-β had higher levels of IL-17F than did patients who responded to this treatment,69 although more-recent publications using larger cohorts could not replicate these observations.70,71

The quest for new and relevant biomarkers must be coupled with a search for a shared, standardized and validated definition of response to treatment in patients with MS. Without such a standardized classification method, studies aimed at determining the underlying mechanistic basis for treatment response (for example, pharmacogenetic or genomic studies, for which treatment response is the dependent variable) cannot be effective. This issue is just part of the broader problem of the inherent imprecision of the measures used in most of the MS studies referenced in this Review.

The future challenges for a personalized approach to the treatment of MS based on combined scores will be threefold. First, more-precise and meaningful measures of disease progression, together with standardized definitions of response to therapies, must be defined and acknowledged by the MS community. Second, new studies are needed to clarify the value of new and promising biomarkers that can be integrated with paraclinical and clinical variables into predictive scores. Last, the applicability to clinical practice should be taken into account, so as to generate scoring systems that are simple enough to be implemented in any clinical setting.

Review criteria

A PubMed search was done for English-language articles focusing on human studies. The search was run using the following terms: “multiple sclerosis” AND “interferon beta” AND “response” in Title/Abstract, which generated over 395 citations. The search was subsequently refined by screening the abstracts for relevant content.