Genome-wide association studies (GWAS) have revolutionized our ability to understand the genetic underpinnings of biomedical traits; however, their extreme Eurocentric bias has exacerbated health inequities. In this perspective, we highlight recent efforts to address the imbalance of ancestral representation in genomics research, including the formation of large collaborative efforts and the development of novel statistical methods to improve translation of genomic insights across ancestries. Using more ancestrally diverse GWAS samples will improve our understanding of the genetic architecture of complex diseases, not only for the understudied populations, but for individuals of all ancestral backgrounds.

GWAS have yielded a wealth of clues about the molecular basis of many common human diseases [1]. Firstly, GWAS have repeatedly identified associations with genes that are the targets of existing and highly effective pharmacologic agents, such as PCSK9 for cardiovascular health [2], and DRD2 for schizophrenia [3], and may allow for the identification of additional novel biological pathways. Secondly, GWAS have revealed that virtually all traits are influenced by many variants, each with small effect sizes, distributed throughout the genome. Using GWAS results, we can combine the risk conferred by these multiple variants into a single genetic liability score (i.e., polygenic score (PGS)), which may assist in risk prediction and disease stratification [4], potentially contributing to precision medicine. Despite these advances, there is a significant problem with GWAS: most current well-powered GWAS are performed on samples with a drastic overrepresentation of individuals of European ancestry. This is problematic because genomic results often do not fully transfer across ancestries [5]. Although it is presumed that people of all ancestries share the same underlying biological disease mechanisms and causal variants, some variants may be more frequent or more correlated in different genetic backgrounds. As a result, variants may be statistically correlated with a trait in one ancestral group but not another. Similarly, while PGS might perform well within a specific ancestry, their accuracy decreases when applied to others [6,7,8]. This lack of diversity in gene discovery cohort composition alongside a paucity of methods designed for diverse populations has resulted in reduced transferability of findings across ancestries, which may exacerbate existing health disparities and stigmatization [9, 10].
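
As a concrete illustration of how the risk conferred by many variants is combined into a single polygenic score, the following minimal Python sketch computes a PGS as a weighted sum of effect-allele dosages. The function name, the effect sizes, and the genotype values are hypothetical and chosen only for illustration. Real pipelines additionally handle allele alignment, linkage disequilibrium, and variant selection, none of which is shown here.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages.

    dosages: number of effect alleles carried at each variant (0, 1, or 2)
    weights: per-variant effect sizes taken from GWAS summary statistics
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative values only).
example_weights = [0.5, -0.25, 0.125]
# Hypothetical genotype of one individual at the same three variants.
example_dosages = [2, 1, 0]

example_score = polygenic_score(example_dosages, example_weights)
print(example_score)
```

Because the weights are estimated in the discovery sample, a score built from predominantly European-ancestry GWAS carries that sample's allele frequencies and correlation structure, which is one reason its accuracy can drop in other ancestries.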

One way to address this problem is by generating GWAS datasets and reference panels based on individuals from diverse ancestries. Current efforts have been propelled by both academic initiatives and direct-to-consumer genetics companies (e.g., All of Us, Million Veteran Program, China Kadoorie Biobank, Biobank Japan, TOPMed, 23andMe), alongside large-scale data aggregation largely led by consortia (e.g., Latin American Genomics Consortium, H3Africa, PAGE, Qatar Biobank, GenomeAsia 100k, gnomAD). Sustaining these massive efforts requires dedicated funding mechanisms. Of equal importance are targeted outreach and education efforts (e.g., workshops, community engagement, and development and distribution of informational materials) that build trust in genomic research among minority populations, ensuring they benefit maximally from the research.

However, collecting the vast amounts of data needed to diversify our datasets is an immense undertaking. Along with the growing awareness that genomic studies need to include more diverse populations, there is also a push to improve methods for studying data from these populations. Some of the underrepresented populations that may fill this gap are genetically heterogeneous and contain genetic components from multiple continental ancestries, also known as “admixed”. For example, Latin American and African American individuals are typically admixed between two or three different continental ancestries. Admixed populations have generally been excluded from GWAS due to the difficulty of effectively accounting for their complex genomic structure. One promising strategy to account for this structure is the use of local ancestry (i.e., the particular ancestry of each genomic segment of an individual). Early association efforts in admixed cohorts utilized local ancestry via admixture mapping, and novel tools are being developed that build local ancestry into GWAS [11], including Tractor [12]. Other recent works have developed ancestry-aware methods that only require summary statistics (e.g., Multi-Ancestry Meta-Analysis [13]). Applying these multi-ancestry genomics methods to combine more samples will increase power to detect genetic factors for complex traits shared across ancestries, and help localize signals closer to causal variants (e.g., [14]). In addition to GWAS, other polygenic methods are being actively developed to better estimate heritability [15], generate cross-ancestry genetic correlations [16], and improve the transferability of PGS across populations [17,18,19,20]. Along with other sources of omic data (many of which are also currently Eurocentric), novel methods leverage cross-population prediction at the level of gene transcripts [21], cell-type-specific regulatory annotations [22], and even gene network analyses.
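
To make the idea of combining summary statistics across ancestry groups more tangible, the sketch below implements a standard fixed-effect, inverse-variance-weighted meta-analysis for a single variant. This is a generic textbook technique and does not reproduce Multi-Ancestry Meta-Analysis [13] or any other ancestry-aware method cited here. The function name and the per-cohort effect estimates are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis of one variant across cohorts.

    betas: per-cohort effect estimates for the same effect allele
    standard_errors: per-cohort standard errors of those estimates
    Returns the pooled effect estimate and its standard error.
    """
    if len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must have the same length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical estimates for one variant in two ancestry groups.
pooled_beta, pooled_se = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
print(pooled_beta, pooled_se)
```

The pooled standard error is smaller than either input standard error, which is the sense in which combining samples increases power. A fixed-effect model assumes the same true effect in every cohort, so it does not account for differences in allele frequency or linkage disequilibrium between ancestries.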

Despite much progress, many current methods do not account for the complex sociocultural experiences of individuals that may impact health outcomes or disease prevalence [23]. When ancestry-specific results arise, we must be cognizant that these may reflect differences in case ascertainment as well as environmental exposures, societal factors, and demographics (e.g., socioeconomic status, diet) that may be confounded by ancestry. Special attention is needed to ensure that the phenotypes and ancestry categorization in understudied groups are well-defined.

Beyond these efforts to generate more diverse data and enhance methods for its analysis, other challenges remain to increase equity in genomics research [5, 9, 10, 24]. To work toward closing the diversity gap, there must be sufficient and sustained support for efforts to increase the diversity of scientists through training programs and capacity building (e.g., https://gingerprogram.org), and for the promotion and recognition of local researchers. Current and future consortium efforts should maintain equitable and ethical partnerships with low- and middle-income countries (LMICs), ensuring that researchers in these countries are treated as full partners and not merely as data harvesters. There must also be an increased effort to revise current publication and grantmaking policies to ensure that they do not disadvantage researchers from underrepresented communities. The success of these initiatives will require support from funding agencies and scientific journals, for example, by considering studies of all cohort sizes, encouraging replication of findings in distinct ancestries, offering fee waivers for publication and open access to journals, particularly in LMICs, and adopting flexible data sharing policies [25]. These efforts will help ensure that the benefits of genomics research are shared across populations, striving toward global health equity.