Public health strategies aimed at disease prevention or early detection and intervention have the potential to advance human health worldwide. However, their success depends on the identification of risk factors that underlie disease burden in the general population. Genome-wide association studies (GWAS) have implicated thousands of single-nucleotide polymorphisms (SNPs) in common complex diseases or traits. By calculating a weighted sum of the number of trait-associated alleles harboured by an individual, a polygenic score (PGS), also called a polygenic risk score (PRS), can be constructed that reflects an individual’s estimated genetic predisposition for a given phenotype. Here, we ask six experts to give their opinions on the utility of these probabilistic tools, their strengths and limitations, and the remaining barriers that need to be overcome for their equitable use.
What are the applications of polygenic scores in biomedical research? Are some more suitable than others?
Iftikhar J. Kullo. The study of complex trait genetics builds on nearly a century of work in agricultural genetics, in which the concepts of trait variation and heritability emerged from the study of phenotypes in related individuals, without actually measuring genetic variants1. The availability of genotyping arrays enabled genome-wide association studies (GWAS) and subsequently, calculation of polygenic scores (PGSs; also called polygenic risk scores, PRSs), which provide a measure of genetic liability to disease and explain (to varying degrees) inter-individual variation in biomedically relevant quantitative traits. PGSs have become an intensely active area of research, with a major focus on their use for estimating disease risk, as they are only modestly correlated with conventional risk factors and family history of disease and provide incremental predictive information. Consequently, methods that integrate PGSs and other genetic risk information, family history and clinical variables, are being developed to generate comprehensive risk scores with greater accuracy.
Beyond risk prediction, PGSs provide insights into disease prognosis, mechanisms and subtypes, genetic architecture of disease, causality through Mendelian randomization studies and pleiotropy through phenome-wide association studies. PGSs may also help to explain variability in penetrance and expressivity of monogenic disorders. Additionally, PGSs could be used to enrich clinical trials in patients with desired risk profiles, thereby reducing the cost and time to undertake such trials, predict variation in drug response and susceptibility to adverse drug reactions and potentially inform targeted therapy on the basis of the disease pathways that are found to be activated.
Cathryn M. Lewis. PGSs have transformed genetic research studies, allowing the estimation of risk, dissection of within-disorder heterogeneity and exploration of shared genetic components between distinct disorders. Furthermore, we have shown that multiple scores from different traits better predict outcomes, in general, than scores from a single trait2,3.
Michael Inouye. PGSs have a wide range of potential applications. These include enhancing disease risk prediction and population screening (for example, in breast cancer and cardiovascular disease), refining diagnoses (for example, in type 2 diabetes mellitus (T2DM), coeliac disease and ankylosing spondylitis), slowing disease progression and recurrence (for example, through targeting therapeutics such as statins or PCSK9 inhibitors) and improving clinical trials either by focusing on those with higher polygenic risk and thus rates of incident disease, or by identifying subgroups in whom there is increased or decreased drug efficacy. There is also the potential for PGSs to prompt risk-reducing behaviours; however, behaviour change is difficult in general and strongly depends on the manner in which information is communicated — and we are still fairly naive about how to communicate polygenic risk. The International Common Disease Alliance (ICDA) PRS Task Force has recently done a deep dive on the potential benefits, risks and gaps to enable responsible clinical use of PGSs4.
Beyond clinical applications, it has been shown that PGSs can be used to dissect disease biology. By integrating PGS with proteomic or other multi-omic data, we can identify the molecules through which polygenic disease risk ‘flows’ and thus identify which modifiable (potentially druggable) targets we could target to disrupt the effect of a PGS5.
Alicia R. Martin. There is a huge range of applications of PRS in research contexts. The simplest predicts risk of a particular disease directly studied by GWAS. Researchers studying cardiovascular disease and some cancers, such as breast, prostate and colorectal cancer, have been leading the charge on clinical implementation and therefore maximizing PRS accuracy in those areas. Beyond using PRS to predict and understand coarse phenotypes typically ascertained through GWAS, there is great interest in deeper phenotyping studies. For example, researchers are interested in using PRS to understand how likely it is that an individual will be hospitalized or respond to classes of pharmaceuticals, which are hard to measure at GWAS scales. This has opened up a massive cascade of correlation studies in genetic epidemiology that vary enormously in rigour. For example, a very interesting and practical application has been to investigate the fact that cases identified by GWAS are often more extreme than those who typically have a diagnosis listed in their electronic health record (EHR). Specifically, PRS studies have compared predictive utility in a GWAS cohort against biobanks, which can offer practical insights about quantitative differences in translational value6. However, reminiscent of the candidate gene era, in which investigations often found only what was looked for and often without reproducibility, many other studies that correlate PRS from stratified GWAS with phenotypes that are known to differ as a function of geography through no clear genetic link add little or no value. More rigorous studies have reinvestigated signals from polygenic selection studies, describing in detail why such findings may be false and interpretations can be fraught7,8,9.
Another interesting area of PRS and genetic profiling studies is in better understanding disease subtypes, such as in breast cancer10 and T2DM11.
Samuli Ripatti. A PRS estimates an individual’s risk of a disease due to genetic factors. PRSs are in a way the flip side of heritability: whereas narrow-sense heritability (h²) estimates the role of additive genetic variation in a trait or a disease in a population, a PRS projects this variation onto an individual by summing the effects of genetic variants across the genome.
PRS has turned out to be a quite powerful way to capture an individual’s genetic risk. This is particularly the case in diseases with large-scale GWAS, which provide accurate weights to the effects of individual single-nucleotide polymorphisms (SNPs), such as coronary heart disease, T2DM or prostate cancer12,13. Even if we currently capture only a fraction of the heritability of a disease with significantly associated loci, PRSs using the genetic variation on a genome-wide scale show considerable predictive power in many diseases. Therefore, developing PRSs has become a common follow-up analysis for disease-specific GWAS.
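The weighted-sum construction described above can be illustrated with a minimal sketch. It assumes that each variant contributes its effect-allele count (0, 1 or 2) multiplied by a per-allele weight taken from GWAS summary statistics; the function name, the three weights and the genotype are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical per-allele weights (for example, log odds ratios from a GWAS).
weights = [0.10, -0.05, 0.20]
# Hypothetical individual carrying 2, 1 and 0 copies of the three effect alleles.
dosages = [2, 1, 0]

score = polygenic_score(dosages, weights)
print(f"PGS = {score:.2f}")
```

In practice the sum runs over thousands to millions of variants, and the raw score is usually standardized against a reference population before it is interpreted.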
Two key developments in disease genetics have allowed for the derivation of powerful PRSs over the past 10 years or so. First, many large-scale GWAS have been making full summary statistics available to the research community for downstream analyses such as polygenic risk estimation. Second, large-scale biobank projects, such as the UK Biobank and FinnGen, have allowed systematic evaluation of PRSs for many diseases. The availability of well-profiled biobanks with genome-wide data has also facilitated statistical algorithmic development, testing and method comparison to find the best-behaving models for various genetic architectures that underlie diseases. Moreover, having key clinical risk factor measures such as smoking status and cholesterol levels available in biobanks has created opportunities to test the additional value of PRSs over routinely tested risk markers, which is the first step on the way to their implementation for preventive health care. However, it is important to keep in mind that the clinical applications of PRSs differ between diseases. PRSs will likely be easiest to implement in a clinical setting for diseases that have existing actionable risk-lowering or screening strategies, such as coronary artery disease and breast cancer.
Nilanjan Chatterjee. On the clinical side, the main application of the PGS is to enhance risk stratification of a population so that preventive interventions that have risks and benefits can be targeted in a more effective manner14. Whereas risk stratification of populations needs to take into consideration many other factors, including lifestyle factors and other biomarkers, PGSs have the unique advantage that they can be applied early in life to simultaneously assess long-term risk of many individual diseases and conditions, as well as to provide an outlook for broad measures of health such as life expectancy15 and disability-adjusted life-years16. Another potential clinical application is the use of PGSs to adjust diagnostic biomarkers, such as prostate-specific antigen levels, for underlying genetic determinants that contribute to variation in the level of the biomarker across individuals unrelated to disease states17. Finally, beyond clinical applications, PGSs can be a powerful tool for research to establish genetic links between complex traits and biomarkers, providing insight into shared biological pathways and sometimes into directions of causality.
Should it be polygenic score or polygenic risk score? Does the name matter?
M.I. The history of nomenclature in this space is extensive and can be so confusing for newcomers! A name matters insofar as it communicates as much information as possible in the fewest words. For PGS and PRS, the difference of course is ‘risk’. For quantitative traits, clearly the concept of risk does not fit well. For discrete phenotypes (such as heart attack case or control), risk, and thus PRS, would fit that setting. However, the use of the term still needs consideration because ‘risk’ has fairly negative connotations for lay people. ‘Polygenic score’ is a more generic term (all PRSs being PGSs), thus we tend to use that as a default. As PGSs are combined with high-penetrance, rare variants, we may indeed go back to the future and refer to them as genomic (or genetic) scores.
C.M.L. Of the three words in the phrase ‘polygenic risk score’, only the first word ‘polygenic’ is fully appropriate. ‘Risk’ is inappropriate when calculating scores for continuous measures such as body mass index (BMI) or blood pressure. It also implies that the measure predicts a clinical outcome well, although most scores are only modestly predictive. When PGSs were first introduced, the term ‘score alleles’ was used in place of risk alleles, as not all variants included in the PGS calculation will be associated with the disorder18. The term ‘score’ is also not intuitive, given the large number of genome-wide variants often included and the complex weighting applied in many PGS methods19. My preferred term is ‘polygenic score’. It is simple, applicable to any trait (continuous, ordinal or binary) and can be abbreviated to PGS, if necessary. But there are more important issues with PGSs than the terminology used.
I.J.K. Adoption of standards can facilitate scientific communication and progress in the field. The Clinical Genome Resource (ClinGen) Complex Disease Working Group has proposed a reporting guideline to facilitate benchmarking and evaluation of PGSs20. Typically, the term PGS is applied to quantitative traits whereas PRS is used in the context of disease susceptibility. PGS or the term ‘polygenic index’, however, could be applied in both contexts.
A.R.M. PGS and PRS both refer to our prediction of a phenotype from multiple genetic loci in an individual’s genome using various algorithms. Proponents of PRS might argue that it helps orient the interpretation of the score (that is, a higher PRS correlates with increased risk of disease) whereas proponents of PGS might argue about the sensitivity of calling natural trait variation ‘risky’. The semantic difference of whether we say ‘risk’ does not matter when predicting risk of disease phenotypes from alleles polarized to those that increase the likelihood or risk of developing a disease.
What are the major strengths and limitations of PGS? What are some of the common misconceptions?
I.J.K. A major strength of PGSs is the orthogonal and incremental predictive information captured in these scores, with the potential to improve risk prediction for common (‘complex’) disorders, an area that has remained relatively stagnant for decades. Because common diseases pose an enormous health-care burden, even modest improvements in the accuracy of risk estimates could have a substantial impact. An example is coronary heart disease, for which, nearly 60 years after the original Framingham risk equation was developed, no new biomarker has made it into the equation21. This is in stark contrast to spectacular advances in new drug development during this time. PGSs are most useful for diseases for which algorithms are available to estimate absolute risk, which in turn informs preventive strategies. For example, absolute risk of coronary heart disease and breast cancer, estimated using validated algorithms, guides management based on expert society recommendations.
Current PGSs explain only a fraction of genetic liability to disease and heritability. Some of this gap is because SNPs on genotyping arrays tag causal variants and the actual causal variants are often unknown. It is possible that with very large sample sizes of source GWAS, fine mapping of causal variants and functional annotation of genetic variants, PGSs could eventually explain most of the heritability in a trait. Another limitation is the ‘imprecision’ around a PGS that results when ‘average’ effect sizes derived from large cohorts are applied to an individual. In one study, only a modest proportion of individuals with PGS point estimates in the top decile had corresponding 95% credible intervals within that decile22. A common misconception about PGSs is that they are a diagnostic/screening test, when in fact a PGS is just one more probabilistic risk variable. Health-care providers and patients may perceive polygenic risk as deterministic despite evidence that such risk can be mitigated by lifestyle changes as well as drug therapy23,24.
N.C. There is widespread misunderstanding around the utility of PGS for population risk stratification versus individualized risk prediction. PGSs for several common diseases, such as breast cancer25,26, now have the ability to stratify a population into distinct categories of risk for which the risk–benefit balance for existing interventions is sufficiently different. However, these same PGSs have modest utility when it comes to accuracy of individual-level prediction or risk discrimination. Although this may seem contradictory, the fact is that a model need not have very high individual-level prediction accuracy to have a meaningful population-level impact for guiding interventions. The popular NCI BCRAT model for breast cancer (also known as Gail model), which includes only a handful of risk factors and has modest discriminatory utility (area under the curve (AUC) <60%), has been shown to have the ability to stratify a population into risk categories for which the risk–benefit balance of chemoprevention can differ substantially27. Similarly, several studies have shown that PGSs, despite modest discriminatory ability, can be useful to identify small high-risk populations who would benefit from certain interventions that may not be recommended for the population at large because of risk–benefit balance. It is, however, also important to remember that modest discriminatory ability of PGSs, or any other risk score, implies that most cases of a given disease will occur outside small high-risk groups, and thus broader efforts for more benign interventions need to continue. At the other end, PGS can also be useful to identify low-risk population subgroups who could avoid or delay interventions that have substantial cost or harm associated with them.
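The distinction between modest discrimination and useful stratification can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that the score is normally distributed with unit variance in both cases and controls, so that AUC = Φ(d/√2) for a mean shift d, and it uses a hypothetical AUC of 0.65 and a hypothetical disease prevalence of 10%. All function names are invented for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def separation_for_auc(auc: float) -> float:
    """Mean shift d between cases and controls giving this AUC,
    assuming unit-variance normal scores: AUC = Phi(d / sqrt(2))."""
    return sqrt(2) * STD_NORMAL.inv_cdf(auc)


def fraction_above(threshold: float, d: float, prevalence: float) -> float:
    """Fraction of the whole population scoring above the threshold."""
    cases = 1 - STD_NORMAL.cdf(threshold - d)
    controls = 1 - STD_NORMAL.cdf(threshold)
    return prevalence * cases + (1 - prevalence) * controls


def top_group_summary(auc: float, prevalence: float, top_fraction: float = 0.10):
    """Return (risk within the top group, share of all cases in that group)."""
    d = separation_for_auc(auc)
    lo, hi = -10.0, 10.0 + d
    for _ in range(100):  # bisection: fraction_above decreases with the threshold
        mid = (lo + hi) / 2
        if fraction_above(mid, d, prevalence) > top_fraction:
            lo = mid
        else:
            hi = mid
    threshold = (lo + hi) / 2
    share_of_cases = 1 - STD_NORMAL.cdf(threshold - d)
    risk_in_group = prevalence * share_of_cases / fraction_above(threshold, d, prevalence)
    return risk_in_group, share_of_cases


# Hypothetical inputs: AUC of 0.65 and a 10% disease prevalence.
risk_top, share_top = top_group_summary(auc=0.65, prevalence=0.10)
print(f"risk in top decile: {risk_top:.1%}; share of cases captured: {share_top:.1%}")
```

Under these assumptions the top decile carries a risk clearly above the population average, yet most cases still arise outside it, which is the pattern described in the paragraph above.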
C.M.L. The major strength of a PGS is that it captures genetic loading across the genome in a single number. This moves genetics from data on a sequence of 3 billion base pairs to one measure. This simplicity is beguiling and enables us to start to think of a genetic predisposition alongside other single risk factor variables such as cholesterol levels or BMI. But the simplicity is also a compromise, as PGSs alone have low predictive ability. Their predictive ability increases when multiple scores are used and when they are combined with other clinical or lifestyle risk factors.
A strength of PGSs is that they remain constant throughout life: one genetic test can be used to calculate a single PGS, or multiple scores, and they can be interrogated when relevant for a clinical problem or at an appropriate age for risk prediction. PGSs could highlight someone at high genetic risk before other risk factors would raise concerns.
One misconception is that PGSs are instantly interpretable and can be applied to reduce risk. In fact, we have little information on how people interpret PGSs, or whether a personal risk assessment based on a PGS will have more impact than population-level risk guidance. For example, does being told you have a high PGS for coronary artery disease encourage a healthier lifestyle? Fortunately, some evidence is emerging that careful communication of risk from genetics and clinical risk factors may motivate study participants to change behaviour28.
A.R.M. A major strength of PRSs, evident when their predictive value is compared with that of other predictors, is that they are already as predictive as, or even more predictive than, biomarkers and other predictors used in clinical risk models currently implemented for several disease areas, such as cardiovascular disease29 and breast cancer30. This fact suggests that we could considerably improve the disease risk prediction models that clinicians are already using, which could in turn yield better preventive medicine more generally, at least for some populations. However, to me, the biggest technological limitation to genetics in precision medicine is that PRSs are far more accurate in populations of European ancestry owing to Eurocentric genetic study biases31. This has been a clear issue for a long time, for which proposed large-scale solutions have not yet been meaningfully implemented32.
A common misconception with PRS relates to genetic determinism, the idea that because risk is genetic, high scores are destined to manifest. Depending on the disease phenotype, environment is often as important as or more important than genetic risk factors.
M.I. There are many strengths and limitations, and I would defer to more comprehensive assessments of those4,33. In terms of misconceptions, it is quite common to see the (re)emergence of extreme views after major advances; in the case of PGSs, this seems to be a renewed, albeit mistaken, interest in genetic determinism. PGSs are, to some degree, causative of the phenotype they predict (as, of course, the germline genome does not change), but they do not determine it. Most of the public do not have the tools to understand the probabilistic nature of PGSs and that other more modifiable factors (for example, weight, exercise and blood pressure) can still be modified to minimize absolute risk. In a clinical setting, there is an argument for whether lay individuals need to understand their PGSs; if they do, there’s a need for broader education in risk, statistics and genetics.
What are current concerns surrounding the clinical implementation of PGSs? How can PGSs lead to actionable and cost-effective measures?
C.M.L. Two components are needed for PGSs to be implemented in clinical services: first, a score that shows sufficient predictive ability for the disorder or the trait above currently used prediction models, and secondly, an intervention that can be applied, be it pharmacological, entry to a screening programme or a behavioural intervention.
Clinical implementation of PGSs is currently unproven. In contrast to the many research studies showing that PGSs predict risk of disease across the age span, we have little evidence of how they could be implemented clinically, how people interpret their scores and whether they are cost-saving. Some of the earliest implementations are likely to be in conjunction with genetic counselling for rare pathogenic variants for breast cancer or familial hypercholesterolaemia34, as PGSs modify the penetrance of the rare variant. In carriers of a pathogenic BRCA1 variant, having a PGS for breast cancer in the top 10% rather than the bottom 10% increases the risk of breast cancer by age 50 years from 21% to 39%35. These results imply a liability threshold model that encompasses both monogenic and polygenic contributions to disease risk.
M.I. In many use cases, PGSs are at the stage of assessing clinical utility. For the vast majority, we do not yet have strong evidence for or against clinical implementation, which is largely because PGSs are still new to medicine, and they need time to be assessed by the medical community. People also need to remember that PGSs are still rapidly improving in performance (mostly because of larger, more ancestrally diverse sample sizes and better analytical methods for construction), so it is difficult to know what the state of the art is at any point in time. Despite that, my view is that we will likely first see widespread and cost-effective implementation of PGSs for breast cancer (for example, via CanRisk36) and cardiovascular disease37.
As with any new technology that reaches the stage of clinical implementation, there are many concerns, which range from abuses, such as genetic discrimination, to operational issues, such as training a sufficient number of genetic counsellors. The ICDA PRS Task Force covers these well4, and one I would highlight, and which has been the subject of intense research, is the issue of European ancestry bias in the performance of many PGSs (discussed below).
I.J.K. PGSs can be used to improve accuracy of risk estimates for common diseases, thereby bringing genomics to primary care. A major concern is that PGSs perform less well, and may not have been validated, in individuals of non-European ancestry. Patients and providers should be aware that PGSs are probabilistic, not deterministic, and that those with high polygenic risk can reduce adverse outcomes, for example, by lifestyle changes and drug therapy. Given the limited genetic counsellor workforce, disclosure of PGSs at scale will require innovations such as chatbots, videos and pictograms. Shared decision-making aids can facilitate the choice of screening and treatment options38. EHRs should be configured to incorporate structured genomic data to generate absolute risk estimates that can be linked to decision support, helping providers to understand and act on PGSs39. For example, care providers, although willing to initiate measures for a high PGS, may feel less comfortable de-escalating interventions in those with a low PGS38. Other domains in which providers may need help include direct-to-consumer test results and revision of risk estimates if a PGS is updated.
Using data from a single genotyping array to generate PGSs for many diseases or traits, targeting screening and therapy to those at greatest risk and ‘de-implementing’ such measures in those at low risk, could be highly cost-effective, a hypothesis that needs confirmation in appropriately designed studies. It is likely that, in the near future, whole-genome sequencing will be performed as a one-time assay early in life to generate PGSs as well as to identify monogenic predisposition to disease. There are multiple conditions for which PGS could lead to actionable and cost-effective preventive measures, coronary heart disease and breast cancer being the foremost, as validated equations are available to estimate absolute risk and guide preventive strategies.
S.R. In the clinical setting, genetic information is typically captured by asking for a family history of the disease and in some cases such as cancers also with genetic testing for a set of high-risk genetic variants. Neither of these two strategies measures genetic risk very well. Family history has well-documented recall and other biases40, and individual ‘driver’ mutations with varying penetrance typically explain only a small fraction of cases of cancer41. PRSs provide a new tool to better capture an individual’s genetic risk but also modify the risk in individuals carrying high-impact variants or a positive family history42.
Introducing PRSs in the clinical setting is a game changer because, unlike individual high-impact mutations, PRSs behave like a quantitative risk factor and affect us all. This means that new and different automated interpretation and communication tools are needed compared with the mostly manual approach of interpreting individual variants.
N.C. One crucial concern is that premature implementation of PGS could exacerbate existing inequity in health-care systems. Eurocentric bias in existing GWAS has led to PGSs with performance that is subpar in populations of non-European ancestry31. Furthermore, inequity will arise from differential access and/or uptake of the technology, influenced by factors such as ethnicity, education and income level of individuals. Several studies have indicated cost-effectiveness for PGSs for specific interventions, such as cancer screening43, but there is uncertainty regarding future cost of necessary data collection and how it will be paid for. Nevertheless, I feel that the cost-effectiveness of PGS as a technology overall is likely to be high as it can be applied across many health outcomes simultaneously, and it can be evaluated fairly accurately using inexpensive technologies, such as low-pass genome sequencing. Another major concern is that misguided and/or unethical applications of PGS, such as the use of it for embryo screening44, can create public mistrust about the role of PGSs in shaping our society.
A.R.M. One issue is timing; clinical trials are already underway and some guidelines for implementing PRS, for example, in National Health Service (NHS) health checks in the UK37, have been recommended. However, fairly few absolute risk models are used in clinical medicine, actionability thresholds are often unclear (particularly when PRS accuracies vary by population) and communicating risk from PRSs remains challenging and understudied. Relatedly, patients may be more likely to view PRSs than other risk factors through a lens of genetic determinism, as described above. A further issue is the staggering disparity in PRS accuracy across ancestries, which arises simply from human history over hundreds of thousands of years and from who has been studied to date. A final concern is the use of PRSs far beyond well-studied biomedical use cases, in applications with unclear or minimal benefits and real potential risks, such as embryo selection44,45 and targeted education programmes.
Many diseases involve environmental or lifestyle factors as well as genetic susceptibility. How does this affect PGS, and what can be done to improve risk prediction and help define clinical action thresholds?
N.C. The utility of PGS for clinical applications will depend to a large extent on how it is combined with other information, including that from rare germline mutations, family history, sociodemographic indicators, other biomarkers, and environmental and lifestyle factors. Ideally, we need models that incorporate a variety of different types of risk factor and can produce dynamic assessment of absolute risk of diseases in individuals over time46. Although many factors can contribute to inform risk, different components may be more or less important depending on the life stages and the span of time for which the risk is being predicted. As mentioned earlier, a unique advantage of PGS is that it can be determined early in life and risk associated with it tends to persist over the lifetime, irrespective of other risk factors. By contrast, in middle age, major lifestyle-related factors, such as smoking habit, obesity and physical activity, are going to be more important determinants of risks of many common chronic diseases, such as heart disease and T2DM. Nevertheless, information on PGS can still be relevant, as lifestyle factors and PGS tend to have multiplicative effects on risks, and individuals with poor lifestyle profile and high PGS are expected to experience the highest levels of risk of these diseases. In fact, several studies have suggested that individuals with the highest level of genetic risk will have the most to gain by making better lifestyle choices23. Additionally, various types of existing and emerging biomarkers, such as those based on imaging or proteomic technology, will also have major potential for enhancing short-term risk prediction, but they are likely to need potentially expensive repeated measurements to contribute to risk assessments over longer time spans.
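A short worked example may help to show why multiplicative effects imply that those at highest genetic risk have the most to gain in absolute terms. The sketch below is purely illustrative: the baseline risk and both relative risks are hypothetical numbers, not estimates from any study, and the model simply multiplies relative risks on the risk scale.

```python
# Hypothetical inputs, chosen only to illustrate the arithmetic.
BASELINE_RISK = 0.05            # 10-year risk: low PGS, favourable lifestyle
RR_HIGH_PGS = 2.0               # relative risk, high versus low PGS
RR_UNFAVOURABLE_LIFESTYLE = 1.8  # relative risk, unfavourable versus favourable


def absolute_risk(high_pgs: bool, unfavourable_lifestyle: bool) -> float:
    """Absolute risk under a simple multiplicative model (capped at 1)."""
    risk = BASELINE_RISK
    if high_pgs:
        risk *= RR_HIGH_PGS
    if unfavourable_lifestyle:
        risk *= RR_UNFAVOURABLE_LIFESTYLE
    return min(risk, 1.0)


def gain_from_lifestyle_change(high_pgs: bool) -> float:
    """Absolute risk reduction from moving to a favourable lifestyle."""
    return absolute_risk(high_pgs, True) - absolute_risk(high_pgs, False)


gain_low_pgs = gain_from_lifestyle_change(high_pgs=False)
gain_high_pgs = gain_from_lifestyle_change(high_pgs=True)
print(f"absolute risk reduction, low PGS:  {gain_low_pgs:.3f}")
print(f"absolute risk reduction, high PGS: {gain_high_pgs:.3f}")
```

The relative benefit of the lifestyle change is identical in both groups, but because it acts on a higher starting risk in the high-PGS group, the absolute reduction there is larger by the factor RR_HIGH_PGS.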
A.R.M. Both genetic and environmental risk factors are important to a varying extent depending on the biomedical domain, so it only makes sense to model both classes of risk factor together. Genetic risk factors are rarely sufficient except in Mendelian diseases or those with extremely simple and well-characterized genetic architectures. Unlike genetic risk factors, identifying environmental risk factors with unbiased approaches can be notoriously tricky. Coupling these in genetic epidemiology studies empowers our dissection of causes from correlations or consequences of disease. Conversely, some genetic risk factors are relevant only in specific environments. For example, genetic variants in CHRNA5 are only likely to have an impact on lung disease risk in smokers. Taken together, these considerations mean that genetic and environmental studies are complementary, with genetic risk factors unlikely to subsume existing clinical models but instead to be considered jointly.
S.R. In diseases such as coronary artery disease, atrial fibrillation and T2DM, there are routinely used clinical risk scores consisting of lifestyle measures such as smoking, laboratory biomarkers and other established risk factors. Interestingly, PRSs are mostly independent of these clinical risk scores, thus providing complementary information to the risk evaluation. Therefore, to get a comprehensive view of a risk for an individual, it is beneficial to measure all sources of risk.
This is particularly the case when evaluating the risk for young individuals aged in their 30s or 40s, where individuals at high risk are typically missed by routine clinical risk scores. This is partly explained by the fact that the clinical risk scores evaluate the risk for a relatively short age window of the next 5–10 years. As most young individuals have very low baseline risk, they are not picked up well by clinical scores even when their relative risk is high compared with others. PRSs not only help to identify high-risk individuals earlier in life but also provide a basis for estimating risk over the whole lifetime, thus allowing better early intervention in slowly developing late-onset diseases.
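The contrast between a short risk window and lifetime risk can be sketched numerically. The example below assumes a piecewise-constant annual hazard by age band, a relative risk for high PRS that stays constant with age, and no competing causes of death; the hazards and the relative risk of 3 are hypothetical values chosen only to illustrate the point.

```python
from math import exp

# Hypothetical annual disease hazards, keyed by the first year of each 10-year age band.
ANNUAL_HAZARD = {35: 0.0005, 45: 0.002, 55: 0.006, 65: 0.015}


def cumulative_risk(start_age: int, end_age: int, relative_risk: float = 1.0) -> float:
    """Risk of disease between two ages under a piecewise-constant hazard."""
    cumulative_hazard = 0.0
    for band_start, hazard in ANNUAL_HAZARD.items():
        band_end = band_start + 10
        # Years of this age band that fall inside the requested window.
        years = max(0, min(end_age, band_end) - max(start_age, band_start))
        cumulative_hazard += hazard * relative_risk * years
    return 1 - exp(-cumulative_hazard)


RR_HIGH_PRS = 3.0  # hypothetical relative risk for a high-PRS individual

ten_year_high = cumulative_risk(35, 45, RR_HIGH_PRS)
to_age_75_high = cumulative_risk(35, 75, RR_HIGH_PRS)
to_age_75_average = cumulative_risk(35, 75)
print(f"10-year risk at age 35, high PRS: {ten_year_high:.1%}")
print(f"risk to age 75, high PRS:         {to_age_75_high:.1%}")
print(f"risk to age 75, average PRS:      {to_age_75_average:.1%}")
```

With these made-up numbers, a 35-year-old with a threefold relative risk still has a small 10-year risk, whereas the same relative risk accumulated to age 75 produces a large difference from the average individual.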
C.M.L. Most diseases that are a burden to society and the individual are partly genetically determined, and a PGS captures part of that genetic contribution. But that leaves an important contribution from non-genetic risk factors, many of which are known. It would be naive to expect PGSs alone to be sufficient to provide good risk estimation. The crucial question is whether PGS can be a useful component of risk prediction and how that can be implemented.
In one of the first planned clinical studies for PRSs, the Heart study is being piloted in the north of England by Genomics plc and the UK NHS. The study will combine PGSs with QRisk, a prediction tool used in primary care for assessing cardiovascular disease risk, to test whether the combination improves prediction over QRisk alone. This real-life implementation of PGSs will provide valuable information on possible clinical utility and acceptability. Further studies showing that PGSs are cost-saving will be important for introducing these novel technologies to the NHS.
I.J.K. When viewed through an evolutionary genetics lens, most common (that is, ‘modern’) diseases can be considered a maladaptation of our ancient genetic background to contemporary environments and lifestyle47. This has important implications for PGSs. First, incorporating signals of selection may enhance the performance of such scores. Second, it is necessary to include environmental and lifestyle factors along with PGSs in risk models. Work in these domains will also advance our knowledge about the basis of variability in disease susceptibility across ancestry groups. Environmental factors are poorly documented in the EHR. We need to move beyond the use of geocodes to wearables and sensors that ‘sync’ with the EHR to provide granular assessment of environmental factors (including water, food and air quality, access to walking paths or bike trails, and social determinants of health) and lifestyle factors (including diet, physical activity, and sleep duration and quality).
To facilitate actionability, PGSs should be incorporated into existing clinical frameworks to generate absolute risk estimates that enable patients and providers to engage in shared decision-making. In the absence of validated risk equations, absolute risk can be derived from epidemiological indices (preferably ancestry specific) such as incidence and competing causes of mortality. The definition of clinical action thresholds could be based on existing guidelines, risk of adverse events in prior clinical trials (as was done in the coronary heart disease prevention guideline)48 or even simulation and modelling.
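As a rough illustration of how an absolute risk estimate can be derived from incidence and competing mortality, the following Python sketch accumulates yearly disease risk while accounting for people who die of other causes first. It is a minimal sketch, not a validated risk equation: the function name, the hazard curves and the relative risk of 3 are all hypothetical, and it assumes that the relative risk for a PGS stratum is constant with age and can be applied directly to the population-average incidence.

```python
import math

def absolute_risk(start_age, end_age, incidence, mortality, relative_risk=1.0):
    """Cumulative probability of developing the disease between two ages.

    incidence(age) and mortality(age) return annual hazards (per person-year)
    for the disease and for competing causes of death, respectively.
    relative_risk scales the disease hazard for a given PGS stratum.
    """
    risk = 0.0
    event_free = 1.0  # probability of being alive and disease-free so far
    for age in range(start_age, end_age):
        h_disease = relative_risk * incidence(age)
        h_other = mortality(age)
        total = h_disease + h_other
        if total > 0:
            # Probability of any event this year, apportioned to the disease.
            p_any = 1.0 - math.exp(-total)
            risk += event_free * p_any * h_disease / total
        event_free *= math.exp(-total)
    return risk

# Hypothetical hazard curves, for illustration only (not real epidemiological data).
def incidence(age):
    return 0.0005 * math.exp(0.08 * (age - 40))

def mortality(age):
    return 0.001 * math.exp(0.09 * (age - 40))

# A 40-year-old in a high-PGS stratum (assumed relative risk of 3).
ten_year = absolute_risk(40, 50, incidence, mortality, relative_risk=3.0)
to_age_80 = absolute_risk(40, 80, incidence, mortality, relative_risk=3.0)
print(f"10-year risk: {ten_year:.3f}, risk to age 80: {to_age_80:.3f}")
```

Under these assumed inputs, the same relative risk translates into a small 10-year absolute risk but a much larger risk over the remaining lifetime, which is why the choice of time window matters when setting action thresholds.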
M.I. There is still a mountain of research to be done for PGSs and environment or lifestyle. In clinical risk prediction, there is always a balance between performance and practicality. Throughout the life course, we should maximize the amount of information that can reasonably be gathered and design risk prediction models that are flexible enough to extract useful information out of both complete and incomplete data. As a general principle, whenever possible, genetic information should be combined with non-genetic information to offer the greatest clinical utility. However, we may find ourselves in situations where the only available data are a PGS, sex and age, and we should think more about how to handle such a scenario.
For clinical action thresholds, this will depend on the use case. In general, PGSs improve individual-level risk predictions, and this may (or may not) improve the classification performance of a particular clinical action threshold. Given that PGSs are a broadly applicable technology with potential clinical utility for many use cases, it would make sense to revisit many clinical thresholds after incorporation of polygenic risk to ensure that we have the most effective models and decision points possible. Furthermore, PGSs enable and enhance primordial prevention, which focuses on prediction and prevention of the antecedents of disease as early as possible.
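To make the point about thresholds concrete, the Python sketch below updates a clinical risk estimate with a PGS on the odds scale and then checks who crosses a fixed action threshold. It is a simplified illustration: the function names, the odds ratio of 1.6 per standard deviation, the 7.5% threshold and the four example individuals are hypothetical, and the update assumes that the PGS is standardized (mean 0, standard deviation 1) and independent of the clinical score.

```python
def combine_risk(clinical_risk, pgs_z, odds_ratio_per_sd):
    """Update a clinical risk (a probability) with a standardized PGS on the odds scale."""
    odds = clinical_risk / (1.0 - clinical_risk)
    odds *= odds_ratio_per_sd ** pgs_z
    return odds / (1.0 + odds)

def reclassified(people, threshold, odds_ratio_per_sd):
    """Return (moved_up, moved_down): ids crossing the threshold once the PGS is added.

    people is an iterable of (id, clinical_risk, pgs_z) tuples.
    """
    moved_up, moved_down = [], []
    for person_id, clinical_risk, pgs_z in people:
        updated = combine_risk(clinical_risk, pgs_z, odds_ratio_per_sd)
        before = clinical_risk >= threshold
        after = updated >= threshold
        if after and not before:
            moved_up.append(person_id)
        elif before and not after:
            moved_down.append(person_id)
    return moved_up, moved_down

# Hypothetical individuals: (id, clinical risk, PGS z-score).
people = [("a", 0.06, 1.5), ("b", 0.08, -1.0), ("c", 0.03, 0.5), ("d", 0.09, 0.0)]
up, down = reclassified(people, threshold=0.075, odds_ratio_per_sd=1.6)
print("moved above threshold:", up)
print("moved below threshold:", down)
```

Whether such reclassification improves decisions depends on the use case, as it moves some individuals above the threshold and others below it; the sketch only shows the mechanics of revisiting a threshold once polygenic risk is included.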
The scores have largely been calculated from populations of European ancestry. What does this mean for transferability of scores across populations?
A.R.M. We have shown that PRS accuracy decreases with increasing genetic distance from European ancestries owing to vast Eurocentric genetic study biases31,49. These are not small differences across populations. For example, we showed that PRSs across a range of phenotypes are about twice as accurate in individuals of European ancestry as in those of East Asian ancestry, and four to five times as accurate as in those of African ancestry. In the USA, PRS accuracy often tracks with health disparities and health-care coverage, so it is easy to see how incorporating PRSs into clinical risk models now would worsen disparities across populations. This is a major issue that differs quite substantially from other biomarkers in some key ways. For example, say you go to your primary care provider for an annual physical and they order a complete blood count. They read that panel on the same scale for everyone, regardless of your ethnic background. With genetic data, we have to take your ancestry into consideration because PRSs are fundamentally more or less useful right now depending on who is in your family tree and how closely related they are to people who have traditionally been studied in genetics. The only way to address this issue is to enrol more diverse participants in our genetic studies, which is much easier said than done. Right now, participants are not reflective of the global population or even national diversity in the USA. Instead, they correlate rather well with the make-up of NIH-funded investigators, suggesting that an upheaval of the incentive system in genetics is crucial to advance the mission of diverse genomic studies. This goes against foundational roots in the field of genetics as well as ingrained practices in genomics. My colleagues and I have recently outlined a roadmap for increasing diversity in genetic studies32.
M.I. Indeed, the issue of European ancestry bias in the performances of many PGSs may lead to increasingly inequitable health care31. This is a pressing issue for many PGSs, but it is important to remember that no medical tool is unbiased; indeed some, such as pulse oximeters, seem to have been initially designed with little consideration for black or brown skin. What the genetics community now has is an opportunity to address historic biases and injustices that have propagated to the present day. With many large and genetically diverse studies recruiting around the world and our data scientists focusing on equitability, we are in a very exciting time, and the genetics community can be a leading light for many other fields as we create more equitable tools using more representative data. Importantly, this extends to improving the diversity of the genetics and genomics workforce so that it reflects our data, thus enriching both our ideas and our approaches.
I.J.K. European ancestry-derived PGSs perform poorly in individuals of African ancestry50 because of differences in linkage disequilibrium, allele frequencies, causal variants and causal variant effect sizes, tempering enthusiasm to use such scores in the clinic and galvanizing efforts to further explore this phenomenon and improve portability. The performance of PGSs in non-African non-European ancestry groups generally lies between that in African and European ancestry groups. Despite being weaker, PGSs in non-European ancestries may yet be useful in the clinical setting if validated effect estimates are available. The US National Human Genome Research Institute (NHGRI) has initiated the PRIMED consortium to develop robust PGSs for common diseases in different population groups. Broadly, work in three areas is needed to improve the accuracy of PGSs in diverse populations: increase the size of GWAS cohorts for diverse ancestry groups; establish collaborations to share extant data sets for meta-analysis; and develop new population and statistical genetics methods to improve the performance of existing PGSs in diverse population groups. The latter includes fine mapping to identify causal variants, functional annotation of genetic variants and examining correlated traits. Measures of local ancestry are being evaluated to apply PGSs to variably admixed groups such as African Americans and Latinos. With globalization and diminishing geographical barriers, the number of individuals who are mosaics of different ancestry groups will continue to increase.
C.M.L. The single most troubling aspect of PGSs, particularly as we move from research towards clinical implementation, is that prediction is highest in populations of European ancestry. The level of prediction drops substantially in other ancestries, to approximately half for people of Asian ancestry, and to one-quarter in people of African ancestry, where the original genetic studies were performed solely in populations of European ancestry31. It is difficult to justify maintaining momentum of a potentially transformative scientific advance knowing the barriers that exist in implementation in many communities that already have weaker health-care provision. It is essential that genetics should not exacerbate existing health-care inequalities. Raising the predictive ability in all population groups to the level attained in those of European ancestry will require substantial investment in primary genetic studies performed worldwide and in minority communities. This is now a priority in genetic research, but belated action will take time to fill the gap.
Other advances in ensuring that PGSs are uniformly applicable will come from in silico approaches. For example, software that integrates discovery genetic studies across populations, such as PRS-CSx51, increases prediction. Cross-population differences in allele frequencies and linkage disequilibrium contribute to prediction differences, and may be reduced by integrating fine-mapping studies, which aim to identify the causal variant. All these approaches will likely be needed to minimize the predictive gap between groups and to build scientific maturity for polygenic scores.
S.R. We need both larger GWAS in individuals of non-European ancestry and better algorithms to capture the rich ancestral backgrounds of varying mixtures of ancestry. Luckily, both are currently being developed and tested in global networks of biobank projects such as the Global Biobank Meta-analysis Initiative (GBMI) and International Consortium for Integrative Genomics Prediction (INTERVENE).
N.C. There are now plenty of empirical studies that have shown that the current PGSs, because of their European ancestry bias, underperform in non-European populations, and the drop in performance is particularly stark for the populations of African and South Asian origin. Clearly, we need to build large GWAS for non-European populations. We also need better statistical methods that will allow borrowing of information across populations to derive optimal PGSs for all populations. It is, however, unclear that even with larger studies and better methods, the disparity in PGS performance across populations will go away anytime in the near future. In particular, even if sample size increases across all populations, the gap in performance of PGS may remain, or even grow, unless the sample size for the underperforming populations grows much more rapidly. One recent study for blood lipid traits has suggested that in spite of disparity in sample sizes it may be possible to have ‘equal’ performance of PGS across populations52, but it remains to be seen whether similar results will hold up for other traits.
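The argument that the gap may persist or grow even as all sample sizes increase can be illustrated with a commonly used approximation for the expected variance explained by a PGS, R² ≈ h² / (1 + M / (N·h²)), where N is the GWAS sample size, h² the SNP heritability and M the effective number of independent markers. The Python sketch below is a deliberate simplification: it assumes that each population’s PGS is trained only on its own GWAS (ignoring any borrowing of information across populations), and the heritability, marker count and sample sizes are hypothetical.

```python
def expected_r2(n, h2, m_eff):
    """Approximate expected R^2 of a PGS trained on a GWAS of n individuals."""
    return h2 / (1.0 + m_eff / (n * h2))

H2 = 0.5        # assumed SNP heritability
M_EFF = 60_000  # assumed effective number of independent markers

def gap(n_large, n_small):
    """Difference in expected R^2 between a well-studied and an under-studied population."""
    return expected_r2(n_large, H2, M_EFF) - expected_r2(n_small, H2, M_EFF)

gap_now = gap(100_000, 5_000)
gap_both_doubled = gap(200_000, 10_000)     # both samples double
gap_small_tenfold = gap(200_000, 50_000)    # under-studied sample grows tenfold

print(f"gap now: {gap_now:.3f}")
print(f"gap after both double: {gap_both_doubled:.3f}")
print(f"gap if the smaller sample grows tenfold: {gap_small_tenfold:.3f}")
```

With these assumed values, doubling both sample sizes widens the absolute gap in expected R², whereas much faster growth of the smaller sample narrows it. The direction of the effect depends on where each population sits on the curve, so this should be read as an illustration of the reasoning rather than a forecast.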
Going forward, we need a better framework, informed by health disparity and inequity considerations, to decide what should be the criterion for deriving and delivering PGS for multi-ancestry populations. Should we aim to generate a single PGS that will have the most ‘equal’ performance across different ancestry groups? What does ‘equal’ even mean if the baseline risks of diseases and/or distribution of other risk factors differ for different ethnic and ancestry groups? Or should the aim be to generate the best possible PGS for each distinct population using all available data? How should PGSs be generated and delivered in populations that show a high degree of population substructure and admixture? Recent efforts such as the NHGRI PRIMED consortium, which aims to bring together GWAS data sets across diverse populations and researchers from various disciplines, including, but not limited to, genetics, epidemiology, statistics, health disparity and bioethics, hold the promise to move the field forwards.
What are some of the imminent challenges that the field needs to address going forward?
M.I. There are of course many challenges for PGSs and, of those not already mentioned, I would flag several. Although PGSs are not expensive to produce (a one-time cost per individual of ~£50 for reagents, array and bioinformatics to calculate 100s to 1,000s of PGSs), we do need cost-effectiveness studies as part of their clinical utility evaluation. There is also a substantial risk that regulatory frameworks internationally are poorly prepared to deal with PGSs, even down to their legal classification, which may hinder their implementation4. Private companies are also offering PGSs for the selection of embryos as part of in vitro fertilization. Although polygenic embryo selection is not widespread, the community should be vocal in calling out this unproven and unethical practice44,53.
Finally, there is a long history to the core values of the genomics community, namely data sharing, open science and FAIR (findability, accessibility, interoperability and reusability) principles. But, recently, this has become noticeably more difficult54. Now and in the future, we need to ensure that PGSs are FAIR and do not become a black box. They should be open for humanity and face rigorous evaluation. To that end, the Polygenic Score Catalogue55 and Polygenic Risk Score Reporting Standards (PRS-RS)20 have been quite helpful, but more needs to be done to ensure the principles of the Human Genome Project are maintained as we move down a winding path to clinical benefit.
S.R. We need to genotype more samples with non-European ancestry and develop more ancestry- and culture-aware risk algorithms. This applies to both genetic and non-genetic risk factors. We need to better utilize the rich health history that is routinely collected in EHRs and health registries, but also by wearables and other automated lifestyle trackers. Similarly, we need to integrate the full allele frequency spectrum into the PRS algorithms and develop new integrated risk communication tools.
We also need to identify the key clinical decision-making events over the life course and how best to integrate the rich genetic information to guide clinical decisions in those moments. For cardiovascular health, we have efficient risk-lowering interventions through statin use and lifestyle changes23,24, and we also know that the knowledge of elevated genetic risk motivates individuals to positive health actions28. Therefore, integrating PRSs into primary prevention will allow us to measure an individual’s baseline risk much more precisely than is currently done and to target interventions earlier in life.
In breast cancer, an earlier start of targeted screening and sometimes also preventive surgery are recommended for women with high lifetime risk. The PRS for breast cancer has been shown to identify 10% of women as having a lifetime cumulative incidence of >30% and to modify the effect of family history and high-impact variants42. As knowledge of elevated genetic risk may cause anxiety, and persons carrying high-impact pathogenic variants are faced with the difficult decision of prophylactic surgery, it is important that the risk assessment is as comprehensive as possible, including polygenic risk.
N.C. Imminent challenges as I see it are: the lack of large disease-specific GWAS as well as large and well-characterized prospective cohort studies, such as the UK Biobank, in many populations of non-European ancestry; the lack of an ‘algorithmic fairness’ framework for developing and implementing PGS for diverse populations, acknowledging the fact that significant disparity in the performance of PGSs by ancestral groups may remain in the foreseeable future; and limited research on data integration methods that allow the building of comprehensive risk prediction models, including PGSs as well as a host of other risk factors, by combining information across diverse and disparate data sets. For the clinical implementation of PGSs, there are also significant barriers related to risk communication and perception, mechanisms of cost-effective delivery and integration with other risk factors, and eventually demonstrating clinical utility of integrated risk models in improving health outcomes through combinations of randomized trials and real-world evidence in health-care delivery settings.
C.M.L. Communication of PGSs will require a conceptual move away from a common understanding of genetic risk as a yes/no binary variable to the framing of a PGS as a continuous measure of genetic loading. This will require careful communication of both relative and absolute risks56 available, in person or with online tools, given the access to genetic data from direct-to-consumer genetic testing and ancestry companies. Clinical genetics may move from a specialist clinical service to be embedded in each clinical speciality, requiring substantial workplace training to ensure it reaches its full potential.
Although the focus of PGSs has been on predicting risk of developing a disorder or disease, this is only helpful where interventions exist to reduce risk. Other end points may be better targets of prediction, such as prognosis or treatment response. PGSs for these outcomes are not widely available and require different discovery studies from case–control GWAS for disease risk.
If access to genetic information increases substantially over the next decade, how will this be regulated? People can calculate their own PGSs from genotypes downloaded from direct-to-consumer genetic testing companies57. Within clinical settings, a genetic test performed for a single indication allows PGSs for all disorders to be calculated. How will this be actioned, and how will access be enabled or protected? These scientific, ethical and practical challenges will need to be dealt with to ensure genetic studies reach their potential for improving health worldwide.
A.R.M. As PRSs are already available to physicians and through direct-to-consumer companies, we urgently need clinical trials that include diverse participants that measure the long-term utility of PRSs in practice. Our clinical models need to adapt and be updated by incorporating PRSs on an absolute risk scale with clinically actionable and reasoned thresholds. Just as importantly, we need to experiment with communicating risk from PRSs to a general audience of non-scientists. Most urgently, genetic studies need to diversify immensely for PRSs to be broadly useful in biomedical contexts, as described above. The NHGRI-funded PRIMED consortium offers one great step forward, but will alone be insufficient to close these massive gaps, so we truly need a field-wide effort to rectify these imbalances.
I.J.K. Imminent challenges include the need to improve the performance of PGSs, make these portable across ancestry groups, and applicable to admixed individuals. Transitioning PGSs to the clinical realm will require experts in population genetics, translational genomics and implementation science, cost and outcomes researchers and ethicists to work together. In parallel, there is a need to educate the public, patients and providers about the numeracy aspects of polygenic risk estimates. Training pathways in genomic medicine, in both primary care and speciality settings, are necessary58. Prospective studies of outcomes following the return of PGSs in the clinical setting are scarce; the MI-GENES clinical trial demonstrated that incorporating a PGS for coronary heart disease enabled shared decision-making related to statin use and lowered low-density lipoprotein cholesterol levels compared with participants who received a conventional risk score38; clinical trials are underway to evaluate the use of a PGS for breast cancer, and in phase IV of the eMERGE Network, outcomes after returning PGSs for ten common diseases will be assessed. Such studies will begin to address the evidence gap related to the use of PGSs in the clinic including clinical and personal utility and cost-effectiveness.
PGSs could transform clinical practice by improving risk prediction for common diseases that are collectively responsible for vast mortality and morbidity worldwide. Progress in refining risk estimates for common diseases has been slow because of their complex and multifactorial aetiology. A grand challenge in genomics is to improve risk stratification of common diseases by boosting the performance of PGSs and by incorporating other measures of genetic risk, multi-omics and environmental variables into comprehensive risk profiles.
References
Wray, N. R., Kemper, K. E., Hayes, B. J., Goddard, M. E. & Visscher, P. M. Complex trait prediction from genome data: contrasting EBV in livestock to PRS in humans: genomic prediction. Genetics 211, 1131–1141 (2019).
Krapohl, E. et al. Multi-polygenic score approach to trait prediction. Mol. Psychiatry 23, 1368–1374 (2018).
Rodriguez, V. et al. Use of multiple polygenic risk scores for distinguishing schizophrenia-spectrum disorder and affective psychosis categories in a first-episode sample; the EU-GEI study. Psychol. Med. https://doi.org/10.1017/S0033291721005456 (2022).
Polygenic Risk Score Task Force of the International Common Disease Alliance. Responsible use of polygenic risk scores in the clinic: potential benefits, risks and gaps. Nat. Med. 27, 1876–1884 (2021).
Ritchie, S. C. et al. Integrative analysis of the plasma proteome and polygenic risk of cardiometabolic diseases. Nat. Metab. 3, 1476–1483 (2021).
Zheutlin, A. B. et al. Penetrance and pleiotropy of polygenic risk scores for schizophrenia in 106,160 patients across four health care systems. Am. J. Psychiatry 176, 846–855 (2019).
Berg, J. J. et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife 8, e39725 (2019).
Sohail, M. et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife 8, e39702 (2019).
Novembre, J. & Barton, N. H. Tread lightly interpreting polygenic tests of selection. Genetics 208, 1351–1355 (2018).
Zhang, H. et al. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat. Genet. 52, 572–581 (2020).
Ahlqvist, E. et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol. 6, 361–369 (2018).
Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
Mars, N. et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat. Med. 26, 549–557 (2020).
Torkamani, A., Wineinger, N. E. & Topol, E. J. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).
Meisner, A. et al. Combined utility of 25 disease and risk factor polygenic risk scores for stratifying risk of all-cause mortality. Am. J. Hum. Genet. 107, 418–431 (2020).
Jukarainen, S. et al. Genetic risk factors have substantial impact on healthy life years. Preprint at medRxiv https://doi.org/10.1101/2022.01.25.22269831 (2022).
Hoffmann, T. et al. Genome-wide association study of prostate-specific antigen levels identifies novel loci independent of prostate cancer. Nat. Commun. 8, 14248 (2017).
International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
Pain, O. et al. Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet. 17, e1009021 (2021).
Wand, H. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021).
Kannel, W. B., Dawber, T. R., Friedman, G. D., Glennon, W. E. & McNamara, P. M. Risk factors in coronary heart disease: the Framingham study. Ann. Int. Med. 61, 888–899 (1964).
Ding, Y. et al. Large uncertainty in individual PRS estimation impacts PRS-based risk stratification. Nat. Genet. 54, 30–39 (2022).
Khera, A. V. et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N. Engl. J. Med. 375, 2349–2358 (2016).
Mega, J. L. et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet 385, 2264–2271 (2015).
Mavaddat, N. et al. Polygenic risk scores for breast cancer and breast cancer subtypes. Am. J. Hum. Genet. 104, 21–34 (2019).
Hudson et al. Prospective validation of breast cancer risk model integrating classical risk-factors and polygenic risk in 15 cohorts and six countries. Int. J. Epidemiol. 50, 1897–1911 (2021).
Gail, M. H. et al. Weighing risks and benefits of tamoxifen treatment for preventing breast cancer. J. Natl Cancer Inst. 91, 1829–1846 (1999).
Widén, E. et al. How communicating polygenic and clinical risk for atherosclerotic cardiovascular disease impacts health behavior: an observational follow-up study. Circ. Genom. Precis. Med. https://doi.org/10.1161/CIRCGEN.121.003459 (2022).
Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults: implications for primary prevention. J. Am. Coll. Cardiol. 72, 1883–1893 (2018).
Lee, A. et al. BOADICEA: a comprehensive breast cancer risk prediction model incorporating genetic and nongenetic risk factors. Genet. Med. 21, 1708–1718 (2019).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Fatumo, S. et al. A roadmap to increase diversity in genomic studies. Nat. Med. 28, 243–250 (2022).
Lambert, S. A., Abraham, G. & Inouye, M. Towards clinical utility of polygenic risk scores. Hum. Mol. Genet. 28, R133–R142 (2019).
Fahed, A. C. et al. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat. Commun. 11, 3635 (2020).
Kuchenbaecker, K. B. et al. Evaluation of polygenic risk scores for breast and ovarian cancer risk prediction in BRCA1 and BRCA2 mutation carriers. J. Natl Cancer Inst. 109, djw302 (2017).
Carver, T. et al. CanRisk Tool–a web interface for the prediction of breast and ovarian cancer risk and the likelihood of carrying genetic pathogenic variants. Cancer Epidemiol. Biomark. Prev. 30, 469–473 (2021).
Brigden, T. et al. Implementing polygenic scores for cardiovascular disease into NHS health checks, PHG Foundation https://www.phgfoundation.org/report/prs-implementation-and-delivery (2021).
Kullo, I. J. et al. Incorporating a genetic risk score into coronary heart disease risk estimates: effect on low-density lipoprotein cholesterol levels (the MI-GENES Clinical Trial). Circulation 133, 1181–1188 (2016).
Kullo, I. J., Jarvik, G. P., Manolio, T. A., Williams, M. S. & Roden, D. M. Leveraging the electronic health record to implement genomic medicine. Genet. Med. 15, 270–271 (2013).
Chang, E. T. et al. Reliability of self-reported family history of cancer in a large case-control study of lymphoma. J. Natl Cancer Inst. 98, 61–68 (2006).
Peto, J. et al. Prevalence of BRCA1 and BRCA2 gene mutations in patients with early-onset breast cancer. J. Natl Cancer Inst. 91, 943–949 (1999).
Mars, N. et al. The role of polygenic risk and susceptibility genes in breast cancer over the course of life. Nat. Commun. 11, 6383 (2020).
Dixon, P., Keeney, E., Taylor, J. C., Wordsworth, S. & Martin, R. M. Can polygenic risk scores contribute to cost-effective cancer screening? A systematic review. Preprint at medRxiv https://doi.org/10.1101/2021.11.26.21266911 (2021).
Turley, P. et al. Problems with using polygenic scores to select embryos. N. Engl. J. Med. 385, 78–86 (2021).
Karavani, E. et al. Screening human embryos for polygenic traits has limited utility. Cell 179, 1424–1435.e8 (2019).
Chatterjee, N., Shi, J. & García-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).
Ding, K. & Kullo, I. J. Evolutionary genetics of coronary heart disease. Circulation 119, 459–467 (2009).
Goff, D. C. Jr. et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 129, S49–S73 (2014).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Dikilitas, O. et al. Predictive utility of polygenic risk scores for coronary heart disease in three major racial and ethnic groups. Am. J. Hum. Genet. 106, 707–716 (2020).
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Preprint at medRxiv https://doi.org/10.1101/2020.12.27.20248738 (2021).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Forzano, F. et al. The use of polygenic risk scores in pre-implantation genetic testing: an unproven, unethical practice. Eur. J. Hum. Genet. https://doi.org/10.1038/s41431-021-01000-x (2021).
Powell, K. The broken promise that undermines human genome research. Nature 590, 198–201 (2021).
Lambert, S. A. et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
Pain, O., Gillett, A. C., Austin, J. C., Folkersen, L. & Lewis, C. M. A tool for translating polygenic scores onto the absolute scale using summary statistics. Eur. J. Hum. Genet. https://doi.org/10.1038/s41431-021-01028-z (2022).
Folkersen, L. et al. Impute.me: an open-source, non-profit tool for using data from direct-to-consumer genetic testing to calculate and interpret polygenic risk scores. Front. Genet. 11, 578 (2020).
Safarova, M. S., Ackerman, M. J. & Kullo, I. J. A call for training programmes in cardiovascular genomics. Nat. Rev. Cardiol. 18, 539–540 (2021).
Acknowledgements
I.J.K. is funded by NIH grants HG-006379, HG-011710 and HL-70710. A.R.M. is supported by funding from the NIH (R00MH117229).
Author information
Contributions
Iftikhar J. Kullo is a Professor of Cardiovascular Medicine, at Mayo Clinic, Rochester, Minnesota, USA. His research laboratory focuses on the genetic epidemiology of coronary heart disease and implementation of genomic medicine. He is a Principal Investigator in the National Human Genome Research Institute’s eMERGE and PRIMED Networks and serves on the US National Advisory Council on Human Genome Research.
Cathryn M. Lewis is Professor of Genetic Epidemiology and Statistics at King’s College London, UK, where she leads the Social, Genetic and Developmental Psychiatry Centre. She co-chairs the Psychiatric Genomics Consortium Major Depressive Disorder Working Group and leads the Biomarkers and Genomics theme in the NIHR Maudsley Biomedical Research Centre, performing translational research to establish the evidence base for genomics in a clinical setting.
Michael Inouye is a computational biologist who has been analysing human genome data for more than 20 years. He is a Professor and Director of Research at the University of Cambridge, UK, Munz Chair of Cardiovascular Prediction and Prevention at the Baker Heart and Diabetes Institute and Director of the Cambridge Baker Systems Genomics Initiative.
Alicia R. Martin is a population and statistical geneticist. Her research examines the role of human history in shaping global genetic and phenotypic diversity. To ensure that vast Eurocentric study biases do not exacerbate health disparities, she is developing statistical methods, genomics resources and research capacity for diverse and under-represented populations.
Samuli Ripatti is Professor and Vice Director at the Institute for Molecular Medicine Finland (FIMM), University of Helsinki, and chair of the Academy of Finland’s Centre of Excellence in Complex Disease Genetics. His research group studies genetic variation and its effects on common disease risks and management. His research uses cardiometabolic diseases and cancers as models to learn about disease mechanisms and genome-based strategies for prevention and prognosis.
Nilanjan Chatterjee is a Bloomberg Distinguished Professor at Johns Hopkins University, USA, and was previously the Chief of the Biostatistics Branch at the National Cancer Institute. He is known for his research on sample size requirements for polygenic prediction, methods for building polygenic scores (PGSs) and integration of PGSs with non-genetic risk factors.
Ethics declarations
Competing interests
C.M.L. is a member of the Scientific Advisory Board for Myriad Neuroscience. A.R.M. has consulted for 23andMe and Illumina and received speaker fees from Genentech, Pfizer and Illumina. The other contributors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Clinical Genome Resource (ClinGen) Complex Disease Working Group: https://www.clinicalgenome.org/working-groups/complex-disease/
eMERGE Network: https://emerge-network.org/
Global Biobank Meta-analysis Initiative: www.globalbiobankmeta.org
International Consortium for Integrative Genomics Prediction: www.interveneproject.eu
Polygenic Score Catalog: https://www.pgscatalog.org/
PRIMED consortium: https://primedconsortium.org/
About this article
Cite this article
Kullo, I.J., Lewis, C.M., Inouye, M. et al. Polygenic scores in biomedical research. Nat Rev Genet 23, 524–532 (2022). https://doi.org/10.1038/s41576-022-00470-z