Abstract
Uniform manifold approximation and projection (UMAP) has been rapidly adopted by the population genetics community to study population structure. It is now commonly used to visualize the ancestral composition of human genetic datasets, to search for unique clusters of data, and to identify geographic patterns. Here we give an overview of applications of UMAP in population genetics, provide recommendations for best practices, and offer insights on optimal uses for the technique.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Diaz-Papkovich, A., Anderson-Trocmé, L. & Gravel, S. A review of UMAP in population genetics. J Hum Genet 66, 85–91 (2021). https://doi.org/10.1038/s10038-020-00851-4
DOI: https://doi.org/10.1038/s10038-020-00851-4