Physicist Niels Bohr famously said that “prediction is difficult, especially about the future”. Perhaps even more than in physics, this statement holds true in biology. Nevertheless, biologists have repeatedly attempted to predict how their systems will fare in the future. When it comes to systems that evolve rapidly in response to changes in their environment, prediction becomes harder still. Despite this formidable challenge, on page 57 of this issue, Łuksza and Lässig1 tackle two problems: predicting how the influenza virus will evolve from one year to the next, and determining how such prediction can improve seasonal-flu vaccines.

Certain aspects of influenza's evolutionary dynamics are, if not predictable, at least highly repetitive. Because the virus infects up to 15% of the human population each year2, most individuals have some degree of immunity. However, new strains carrying mutations in epitopes (protein regions that are recognized by human antibodies) regularly arise. These strains initially have a fitness advantage over previously dominant strains because they can more effectively escape a host's immune response. As a result, they rise in frequency and, in doing so, deplete their own supply of susceptible hosts, so that still newer strains gain the advantage. This continuous process of evolution, known as antigenic drift, results in rapid turnover of the viral population and thus makes it possible for an individual to be reinfected with flu within a few years. It also means that the composition of seasonal-flu vaccines must be updated regularly3.

Although flu's antigenic drift is well characterized, predicting exactly which strains, carrying which antigenic mutations, will circulate in the future remains problematic. This is largely because the mutational process is stochastic, so it is uncertain which mutations will arise. Even setting this aside, predicting the fate of strains already circulating in the population is a formidable challenge, because multiple strains carrying different combinations of mutations co-circulate and, to some extent, compete with one another for susceptible hosts. The predictive model that Łuksza and Lässig present addresses this problem by targeting a somewhat more manageable question: can one predict changes in the frequencies of groups of viral strains (clades) from one year to the next? The answer seems to be yes, and with considerable accuracy.

At its core, Łuksza and Lässig's model predicts viral clade frequencies in a given year using strain frequencies and fitness values from the preceding year (Fig. 1). Its effectiveness therefore depends on how accurately the model assigns fitness values to strains, which is difficult to do well given our limited understanding of how any individual mutation affects fitness. To make this task feasible, the authors consider the fitness effects of only two classes of mutation, epitope and non-epitope mutations, in the virus's haemagglutinin surface protein.
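To make the forecasting step concrete, the sketch below assumes that each strain's frequency grows or shrinks exponentially at a rate set by its fitness, with all frequencies then renormalized to sum to one; a clade's predicted frequency is the sum over the strains it contains. This is one simple form such a model can take, written for illustration only: the function names and the interval `dt` (one season by default) are hypothetical, not taken from the authors' code.

```python
import math


def predict_strain_frequencies(freqs, fitness, dt=1.0):
    """Propagate strain frequencies one interval forward.

    Assumes each strain grows as exp(fitness * dt) and that the resulting
    weights are renormalized so the predicted frequencies sum to 1.
    """
    weights = {s: x * math.exp(fitness[s] * dt) for s, x in freqs.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def predict_clade_frequencies(freqs, fitness, clades, dt=1.0):
    """Sum predicted strain frequencies within each clade.

    `clades` maps a (hypothetical) clade name to the list of strain IDs in it.
    """
    future = predict_strain_frequencies(freqs, fitness, dt)
    return {c: sum(future[s] for s in members) for c, members in clades.items()}
```

For example, `predict_clade_frequencies({"A": 0.5, "B": 0.3, "C": 0.2}, {"A": 0.0, "B": 1.0, "C": -1.0}, {"blue": ["A"], "green": ["B", "C"]})` raises the green clade's share from 0.5 to about 0.64, because the gain from the fitter strain B outweighs the loss from C.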

Figure 1: Forecasting changes in influenza-clade frequencies.

Łuksza and Lässig's model1 predicts the frequencies of viral clades (blue, green and purple) in one season from the frequencies and fitness values of individual strains in the previous season. An individual strain's fitness is inferred by incorporating the beneficial effects of mutations in epitope regions of the viral haemagglutinin protein (inset, black dots) and the deleterious effects of mutations in non-epitope regions (inset, red dots). (Haemagglutinin image: David S. Goodsell/http://doi.org/c3nqkh.)

Although simple, this approach has a clear biological rationale. Mutations at epitopes are likely to be beneficial to the virus, because they alter the structural features targeted by host antibodies. Thus, a strain can have higher fitness than its competitors by being antigenically more distinct from previously circulating strains. By contrast, mutations outside epitope regions are often deleterious because they reduce protein stability or upset evolutionarily conserved viral functions. By training the model on the evolutionary dynamics of historical strains, the authors were able to estimate the fitness effects of these two classes of mutation, and thereby to quantify the fitness of currently circulating strains on the basis of the mutations they carried.
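Under the simplifying assumption that the two classes of mutation contribute additively, a strain's fitness can be sketched as a weighted count of the mutations it carries relative to some reference sequence. The weights `sigma_plus` and `sigma_minus` below are hypothetical stand-ins for the effects the authors estimated from historical data, and the `Strain` type is likewise invented for illustration. In particular, this sketch ignores the fact that the benefit of an epitope mutation depends on how distinct a strain is from previously circulating strains.

```python
from dataclasses import dataclass


@dataclass
class Strain:
    name: str
    epitope_mutations: int       # mutations at haemagglutinin epitope sites
    non_epitope_mutations: int   # mutations elsewhere in haemagglutinin


def strain_fitness(strain, sigma_plus, sigma_minus):
    """Additive fitness sketch (hypothetical coefficients).

    Each epitope mutation adds sigma_plus (antigenic benefit); each
    non-epitope mutation subtracts sigma_minus (e.g. loss of stability).
    """
    return (sigma_plus * strain.epitope_mutations
            - sigma_minus * strain.non_epitope_mutations)
```

With positive weights, a strain carrying many non-epitope mutations can end up less fit than a less antigenically novel competitor, which is the trade-off the authors' two-class scheme is designed to capture.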

Using these estimates, Łuksza and Lässig projected clade frequencies a year into the future and assessed their model's accuracy by comparing the magnitude of the predicted clade-frequency changes with that of the observed changes. The model correctly predicted growth of viral clades 93% of the time and correctly predicted decline 76% of the time.
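The two accuracy figures can be read as rates conditioned on what actually happened: of the clades that grew, the fraction the model predicted would grow, and likewise for decline. A minimal sketch of that bookkeeping follows, assuming frequency changes are given as future minus current frequency. The function name is hypothetical, and the authors' exact criteria (for example, any minimum frequency a clade must reach to be counted) may differ.

```python
def growth_decline_accuracy(predicted_change, observed_change):
    """Return (growth accuracy, decline accuracy).

    Both inputs map clade name -> frequency change (future minus current).
    Growth accuracy is the fraction of clades that actually grew which were
    predicted to grow; decline accuracy is defined analogously.
    """
    growing = [c for c, d in observed_change.items() if d > 0]
    declining = [c for c, d in observed_change.items() if d < 0]
    hit_grow = sum(1 for c in growing if predicted_change[c] > 0)
    hit_decline = sum(1 for c in declining if predicted_change[c] < 0)
    grow_acc = hit_grow / len(growing) if growing else float("nan")
    decline_acc = hit_decline / len(declining) if declining else float("nan")
    return grow_acc, decline_acc
```

Reporting the two rates separately, rather than a single overall score, shows whether a model is better at spotting rising clades than falling ones, as the figures above suggest.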

The model's predictive accuracy may be very useful to the network of researchers who determine which strains to include in the seasonal-flu vaccine. Currently, vaccine strains are chosen using assays that quantify antigenic differences between circulating strains4. This approach is highly effective in some years, but antigenic mismatches between the vaccine strain and the strain that ends up dominating the next flu season do occur. Łuksza and Lässig's model provides insight into why such mismatches might arise: deleterious mutations in non-epitope regions might suppress the most antigenically distinct strains, which are prime vaccine candidates. The model also offers an alternative way of choosing vaccine strains, one that combines estimates of cross-immunity between strains with the inferred fitness and frequency of current strains.
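One way to picture this selection idea is to score each candidate strain by the protection it would be expected to confer against next season's predicted strain mix, weighting cross-immunity to each strain by that strain's predicted frequency, and then to pick the highest-scoring candidate. The sketch below makes that concrete under loudly stated assumptions: the exponential decay of cross-immunity with amino-acid distance, the decay scale `d0` and both function names are hypothetical choices for illustration, not the authors' formulation.

```python
import math


def cross_immunity(distance, d0):
    """Hypothetical cross-immunity between two strains: decays exponentially
    with the number of amino-acid differences; d0 sets the decay scale."""
    return math.exp(-distance / d0)


def choose_vaccine_strain(candidates, predicted_freqs, distance, d0=10.0):
    """Return the candidate with the highest expected coverage.

    predicted_freqs maps strain -> predicted frequency next season;
    distance(a, b) returns the number of amino-acid differences.
    """
    def coverage(v):
        # Frequency-weighted cross-immunity against the predicted strain mix.
        return sum(x * cross_immunity(distance(v, s), d0)
                   for s, x in predicted_freqs.items())

    return max(candidates, key=coverage)
```

Because the predicted frequencies already incorporate inferred fitness, a candidate that closely resembles only strains expected to decline will score poorly, which mirrors the mismatch scenario described above.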

Although Łuksza and Lässig's model offers a new perspective on what contributes to viral fitness and on which aspects of flu evolution can feasibly be predicted, it also highlights work that remains to be done. First, the model assumes a simple relationship between the genetic distance separating two strains and the extent of cross-immunity that they induce in a host. However, antigenic analysis has shown that some amino-acid changes in epitope regions have only a slight antigenic effect, whereas others have a pronounced one5. Incorporating a more empirically informed 'map' of the associations between viral genotypes and their antigenic characteristics might increase the model's predictive power. Second, although the authors have already taken the important step of extending predictive models to include the effects of non-epitope mutations in the haemagglutinin protein, viral fitness surely also depends on the virus's seven other gene segments. Incorporating whole-genome analysis into the model may therefore improve prediction further, especially if mutations interact in a non-additive manner (epistatically) across gene segments.

Finally, in terms of strain selection for vaccines, it is worth bearing in mind that the ultimate goal of vaccination might not be to reduce the number of flu infections, but rather to minimize the number of flu deaths or the overall economic cost of infections6. Luckily, modifying the model to incorporate such aims seems relatively straightforward, provided that sufficient data are available to quantify how viral strains differ in virulence or other relevant properties. Thus, although further work is needed, it is clear that Łuksza and Lässig have significantly advanced the difficult task of flu prediction, especially about the future.