GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals

Iotchkova, Valentina; Ritchie, Graham R. S.; Geihs, Matthias; Morganella, Sandro; Min, Josine L.; Walter, Klaudia; Timpson, Nicholas John; Dunham, Ian; Birney, Ewan; Soranzo, Nicole

doi:10.1038/s41588-018-0322-6

Technical Report
Published: 28 January 2019

Nature Genetics volume 51, pages 343–353 (2019)

Abstract

Loci discovered by genome-wide association studies (GWAS) predominantly map outside protein-coding genes. Catalogs of regulatory genomic regions in cell lines and primary tissues can greatly aid interpretation of the functional consequences of these non-coding variants. However, robust and readily applicable methods to systematically evaluate the contribution of these regions to genetic variation implicated in diseases or quantitative traits are still lacking. Here we propose a novel approach that integrates GWAS findings with regulatory or functional annotations to classify features relevant to a phenotype of interest. Our framework accounts for major sources of confounding that current methods do not address. We further assess GWAS enrichment for 19 traits within regulatory regions derived from the Encyclopedia of DNA Elements (ENCODE) and Roadmap Epigenomics projects. We characterize enrichment patterns unique to specific traits and annotations that yield novel biological insights. The method is implemented in standalone software and an R package to facilitate its application by the research community.
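As a rough illustration of the threshold-based enrichment idea, the sketch below computes an odds ratio from a 2×2 table: variants passing a GWAS P-value threshold T versus the rest, crossed with whether each variant falls in an annotation. It assumes roughly independent variants (for example, after LD pruning) supplied as hypothetical `(p_value, in_annotation)` pairs, and it adds an optional Haldane-Anscombe pseudocount. It is not GARFIELD's implementation, which per the abstract also accounts for confounding variant features.

```python
from typing import Iterable, Tuple


def enrichment_odds_ratio(
    variants: Iterable[Tuple[float, bool]],
    threshold: float,
    pseudocount: float = 0.5,
) -> float:
    """Odds ratio of annotation overlap for variants with P < threshold vs. the rest.

    Each variant is a hypothetical (p_value, in_annotation) pair. `pseudocount`
    is added to every cell of the 2x2 table (Haldane-Anscombe correction) so the
    ratio stays finite when a cell is empty; pass 0.0 for the raw odds ratio.
    """
    a = b = c = d = 0  # a: P<T & annotated, b: P<T & not, c: P>=T & annotated, d: P>=T & not
    for p_value, annotated in variants:
        if p_value < threshold:
            if annotated:
                a += 1
            else:
                b += 1
        elif annotated:
            c += 1
        else:
            d += 1
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return (a * d) / (b * c)


# Toy input: three variants pass T < 1e-8, two of which are annotated.
toy_variants = [
    (1e-9, True), (1e-10, True), (1e-9, False),
    (0.3, True), (0.5, False), (0.2, False), (0.04, False), (0.9, False),
]
print(enrichment_odds_ratio(toy_variants, 1e-8, pseudocount=0.0))  # 8.0
```

Repeating the call over a list of thresholds gives the kind of OR-by-threshold profile that the enrichment plots summarize.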

**Fig. 1: Outline of the GARFIELD method.**

**Fig. 3: Enrichment of GWA analysis P values in DHS (hotspots).**

**Fig. 4: Method comparison for 21 GWAS datasets in DHS (hotspots) and histone modifications (H3K27ac and H3K4me3) at the T < 10^-8 GWAS significance threshold.**

**Fig. 5: Enrichment levels (log OR) and extent of sharing between traits for 25-state chromatin segmentations of the National Institutes of Health Roadmap and ENCODE projects at the T < 10^-5 GWAS significance threshold.**

Code availability

Custom code is available at http://www.ebi.ac.uk/birney-srv/GARFIELD/.

Data availability

Web links for publicly available GWAS datasets and regulatory information databases are included in the URLs section. Availability restrictions apply to the blood cell indices GWAS from van der Harst et al.³³ and Gieger et al.³⁴, which were obtained from the authors of those studies. Any other data supporting the findings of this study are available from the corresponding authors upon reasonable request.

References

1. Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
2. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
3. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
4. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
5. Bernstein, B. E. et al. The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
6. Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol. 30, 224–226 (2012).
7. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
8. Shen, H. et al. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty-four Caucasians. PLoS ONE 8, e59494 (2013).
9. Chung, D., Yang, C., Li, C., Gelernter, J. & Zhao, H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet. 10, e1004787 (2014).
10. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
11. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
12. Schork, A. J. et al. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs. PLoS Genet. 9, e1003449 (2013).
13. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
14. Trynka, G. et al. Disentangling the effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex-trait loci. Am. J. Hum. Genet. 97, 139–152 (2015).
15. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
16. Schmidt, E. M. et al. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics 31, 2601–2606 (2015).
17. Galwey, N. W. A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genet. Epidemiol. 33, 559–568 (2009).
18. Dunham, I., Kulesha, E., Iotchkova, V., Morganella, S. & Birney, E. FORGE: a tool to discover cell specific enrichments of GWAS associated SNPs in regulatory regions. F1000Res. https://doi.org/10.12688/f1000research.6032.1 (2015).
19. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429.e19 (2016).
20. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
21. Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
22. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
23. Heid, I. M. et al. Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat. Genet. 42, 949–960 (2010).
24. Saxena, R. et al. Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge. Nat. Genet. 42, 142–148 (2010).
25. Dupuis, J. et al. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat. Genet. 42, 105–116 (2010).
26. Strawbridge, R. J. et al. Genome-wide association identifies nine common variants associated with fasting proinsulin levels and provides new insights into the pathophysiology of type 2 diabetes. Diabetes 60, 2624–2634 (2011).
27. Soranzo, N. et al. Common variants at 10 genomic loci influence hemoglobin A1C levels via glycemic and nonglycemic pathways. Diabetes 59, 3229–3239 (2010).
28. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
29. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
30. International Consortium for Blood Pressure Genome-Wide Association Studies et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109 (2011).
31. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
32. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
33. van der Harst, P. et al. Seventy-five genetic loci influencing the human red blood cell. Nature 492, 369–375 (2012).
34. Gieger, C. et al. New gene functions in megakaryopoiesis and platelet formation. Nature 480, 201–208 (2011).
35. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
36. UK10K Consortium et al. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015).

Acknowledgements

This study made use of data generated by the UK10K Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.UK10K.org/. Funding for UK10K was provided by the Wellcome Trust under award no. WT091310. Research by N.S. is supported by the Wellcome Trust (grants WT098051 and WT091310). N.J.T. is supported as a Wellcome Trust Investigator (no. 202802/Z/16/Z), is supported as the principal investigator of the Avon Longitudinal Study of Parents and Children (no. MRC & WT 102215/2/13/2), is supported by the University of Bristol NIHR Biomedical Research Centre (no. BRC-1215-20011) and the MRC Integrative Epidemiology Unit (no. MC_UU_12013/3), and works within the CRUK Integrative Cancer Epidemiology Programme (no. C18281/A19169). G.R.S.R. and V.I. are supported by European Molecular Biology Laboratory-Wellcome Trust Sanger Institute postdoctoral fellowships.

Author information

A full list of members is available in the Supplementary Note.

Authors and Affiliations

Human Genetics, Wellcome Sanger Institute, Hinxton, UK
Valentina Iotchkova, Graham R. S. Ritchie, Matthias Geihs, Klaudia Walter & Nicole Soranzo
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
Valentina Iotchkova, Graham R. S. Ritchie, Sandro Morganella, Ian Dunham & Ewan Birney
MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
Josine L. Min & Nicholas John Timpson
Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
Josine L. Min & Nicholas John Timpson
Department of Haematology, University of Cambridge, Cambridge, UK
Nicole Soranzo
The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics, University of Cambridge, Cambridge, UK
Nicole Soranzo

Consortia

UK10K Consortium

Contributions

G.R.S.R., J.L.M., K.W., N.J.T., I.D. and N.S. contributed data or materials. E.B., G.R.S.R., I.D., J.L.M., N.S. and V.I. developed the method. V.I. analyzed the data. E.B., I.D., N.J.T., N.S. and V.I. provided critical interpretation of the results. M.G. and S.M. designed the tools. E.B., N.S. and V.I. wrote the manuscript. E.B., G.R.S.R., I.D., J.L.M., K.W., M.G., N.J.T., N.S., S.M. and V.I. evaluated the manuscript. E.B. and N.S. designed and managed the project.

Corresponding authors

Correspondence to Ewan Birney or Nicole Soranzo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 GARFIELD method assessment.

a, GARFIELD-estimated false-positive rate (FPR) from 29 real GWAS datasets and n = 1,000 independent simulated annotations. The black horizontal line denotes the 5% FPR threshold. Traits are shown on the x axis. Colored bars denote the GARFIELD thresholds of T < 10^–8 (red) and T < 10^–5 (gray). Error bars denote standard errors. b,c, Effect of feature correction for each of the 29 GWAS at the T < 10^–8 significance threshold when analyzing 424 DHS cell types. The figure shows the proportion of significant annotations with respect to feature correction, where N denotes the number of LD proxies and T denotes distance to the nearest TSS. The y axis shows the corresponding values when no feature correction is employed. d, Difference in the proportion of significant enrichments between a model that accounts for no features and models that account for each combination of the features, applied to 424 DHS cell types.
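Panel a summarizes calibration as the share of null annotations that reach significance, and panels b–d concern correction for variant features (number of LD proxies and distance to the nearest TSS). A minimal sketch of both ideas follows, under stated assumptions: the FPR is taken as the fraction of simulated-annotation P values below 5%, with a binomial standard error, and feature correction is illustrated with a Mantel-Haenszel odds ratio pooled over feature bins. The Mantel-Haenszel estimator is a swapped-in textbook technique chosen for illustration, not GARFIELD's published correction, and the bin layout and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def empirical_fpr(null_pvalues: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """Fraction of null (simulated) annotations called significant, and its binomial SE."""
    n = len(null_pvalues)
    fpr = sum(p < alpha for p in null_pvalues) / n
    se = math.sqrt(fpr * (1.0 - fpr) / n)
    return fpr, se


def mantel_haenszel_or(strata: Sequence[Tuple[int, int, int, int]]) -> float:
    """Mantel-Haenszel pooled odds ratio over per-stratum 2x2 counts (a, b, c, d).

    a: P<T & annotated, b: P<T & not, c: P>=T & annotated, d: P>=T & not.
    Each stratum is meant to be one bin of variant features (e.g. number of LD
    proxies crossed with distance to the nearest TSS), so variants are compared
    only with others that share the same feature bin.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty bins contribute nothing
        num += a * d / n
        den += b * c / n
    return num / den


# Two feature bins with no enrichment inside either bin (OR = 1 in each) ...
strata = [(40, 40, 10, 10), (5, 20, 20, 80)]
# ... but collapsing them gives a spurious crude OR of (45 * 90) / (60 * 30) = 2.25.
print(mantel_haenszel_or(strata))  # 1.0
```

The toy strata show why feature correction matters: the first bin has both more significant variants and more annotated variants, so pooling naively manufactures an enrichment that disappears once comparisons are made within bins.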

Supplementary Fig. 2 GARFIELD enrichment wheel plots.

Enrichment of genome-wide association analysis P-values in DNase I–hypersensitive sites (hotspots) for 27 disease/quantitative traits. Radial lines show OR values at eight GWAS –log₁₀ P-value thresholds (T) for n = 424 ENCODE and Roadmap Epigenomics DHS cell lines, sorted by tissue on the outer circle. Dots on the outer side of the circle denote significant enrichment (if present) at T < 10^–5 (outermost) to T < 10^–8 (innermost).

Supplementary Fig. 3 Comparison of results for 29 real GWAS and 424 open chromatin annotations between the T < 10^–8 and T < 10^–5 GWAS P-value thresholds.

Left, −log₁₀ P-value comparison for all trait–annotation pairs (n = 29 × 424 points). Horizontal and vertical lines denote the threshold for detecting enrichment after multiple-testing correction. The number in each corner denotes the number of points in that quadrant. Right, odds ratio (OR) comparison for trait–annotation pairs with significant enrichment at both the T < 10^–8 and T < 10^–5 GWAS P-value thresholds.
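The caption refers to a threshold for detecting enrichment after multiple-testing correction without saying how it is set. Because annotations such as DHS in related cell types are correlated, a plain Bonferroni count over all 424 annotations would be conservative. One option, consistent with the reference list's inclusion of Galwey's effective number of tests (ref. 17), is to divide α by an effective number of tests computed from the eigenvalues of the annotation correlation matrix. The sketch below assumes those eigenvalues are already available; whether GARFIELD applies the correction exactly this way is an assumption.

```python
import math
from typing import Sequence


def galwey_effective_tests(eigenvalues: Sequence[float]) -> float:
    """Galwey (2009) effective number of tests.

    M_eff = (sum_i sqrt(lambda_i))**2 / sum_i lambda_i, where lambda_i are the
    eigenvalues of the correlation matrix among the tests; negative values
    (numerical noise) are set to zero first.
    """
    lam = [max(x, 0.0) for x in eigenvalues]
    return sum(math.sqrt(x) for x in lam) ** 2 / sum(lam)


def corrected_threshold(eigenvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Bonferroni-style significance cutoff alpha / M_eff."""
    return alpha / galwey_effective_tests(eigenvalues)


# Two annotations correlated at r = 0.8 have eigenvalues 1 + r and 1 - r,
# so they count as about 1.6 independent tests rather than 2.
print(round(galwey_effective_tests([1.8, 0.2]), 6))  # 1.6
```

Independent tests (all eigenvalues equal to 1) recover the ordinary Bonferroni count, while perfectly correlated tests collapse to a single effective test.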

Supplementary Fig. 4 Multiple-annotation enrichment of genome-wide association analysis P-values in DNase I–hypersensitive sites (hotspots) for 15 disease or quantitative GWAS traits.

Cell types/tissues remaining after a heuristic multiple-annotation approach are shown on the y axis for each trait. Odds ratios (log scale) are shown as dots and 95% CIs as lines. Multiple-annotation model estimates are shown in red, and marginal estimates from analyzing each annotation on its own are shown in black. Only phenotypes with at least one detected enrichment are shown.
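Supplementary Fig. 4 plots odds ratios on a log scale with 95% confidence intervals. For a single annotation analyzed on its own, one standard way to obtain such an interval from a 2×2 table is Woolf's log-scale normal approximation, sketched below. This is an assumed stand-in: the preview does not say how GARFIELD derives its intervals, and the red multiple-annotation estimates come from a joint model that this sketch does not reproduce. The counts in the example are made up.

```python
import math
from typing import Tuple


def log_or_with_ci(a: int, b: int, c: int, d: int, z: float = 1.959964) -> Tuple[float, float, float]:
    """Log odds ratio and Woolf 95% CI from one 2x2 table (all cells must be > 0).

    a: P<T & annotated, b: P<T & not, c: P>=T & annotated, d: P>=T & not.
    The interval is symmetric on the log scale: log(OR) +/- z * SE, with
    SE = sqrt(1/a + 1/b + 1/c + 1/d).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se


est, lo, hi = log_or_with_ci(20, 10, 30, 60)
print(f"log OR = {est:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Working on the log scale keeps the interval symmetric, which is why such plots usually show ORs on a log axis.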

Supplementary Fig. 5 Enrichment levels (log OR) and extent of sharing between traits for 25-state chromatin segmentations of the NIH Roadmap and ENCODE projects at the T < 10^–8 GWAS significance threshold.

a, Distribution of significant OR values across the 29 traits considered, split by segmentation state and colored to highlight the functional elements predicted by Roadmap Epigenomics (see Supplementary Table 9). The number of points n is shown on the x axis below each category. b, Distribution of the pairwise difference between ORs for each enhancer, promoter, transcriptional enhancer and transcriptional regulatory state tested (‘state 1’) and ORs for transcription states (‘state 2’), for significant enrichments only; for example, $\mathrm{OR}^{c,t}_{\mathrm{EnhA1}} - \mathrm{OR}^{c,t}_{\mathrm{Tx}}$ is computed for all cell types $c$ and traits $t$ for which both $P^{c,t}_{\mathrm{EnhA1}}$ and $P^{c,t}_{\mathrm{Tx}}$ are significant. The number of points n is shown on the x axis below each category. c, Sharing of significantly enriched or depleted annotations across 27 phenotypes (excluding CD and UC). The bar plot displays the number of cell types in which an annotation is uniquely enriched in one trait or shared among multiple traits.
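Panel b restricts the comparison to (cell type, trait) pairs in which both states are significantly enriched. A small sketch of that filtering step follows, assuming results are held in a hypothetical dictionary keyed by cell type, trait and state name; the state labels follow the caption, while the cell types, traits and values are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (cell_type, trait, state) -> (odds_ratio, is_significant)
Results = Dict[Tuple[str, str, str], Tuple[float, bool]]


def paired_or_differences(results: Results, state1: str, state2: str) -> List[float]:
    """OR(state1) - OR(state2) for every (cell type, trait) pair where both are significant."""
    diffs: List[float] = []
    for (cell, trait, state), (or1, sig1) in results.items():
        if state != state1 or not sig1:
            continue
        other = results.get((cell, trait, state2))
        if other is not None and other[1]:  # keep only jointly significant pairs
            diffs.append(or1 - other[0])
    return diffs


toy = {
    ("cellA", "height", "EnhA1"): (3.0, True),
    ("cellA", "height", "Tx"): (1.5, True),
    ("cellB", "BMI", "EnhA1"): (2.0, True),
    ("cellB", "BMI", "Tx"): (1.2, False),  # not significant, so this pair is skipped
}
print(paired_or_differences(toy, "EnhA1", "Tx"))  # [1.5]
```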

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Note

Reporting Summary

Supplementary Table 1

Summary of available enrichment analysis approaches

Supplementary Table 2

Summary of overlap of UK10K sequence variants with DNase I–hypersensitive sites

Supplementary Table 3

Summary of the number of variants per disease/quantitative trait

Supplementary Table 4

GARFIELD enrichment of 29 publicly available GWAS studies in DNase I–hypersensitive sites from 424 ENCODE and Roadmap Epigenomics cell types

Supplementary Table 5

Enrichment P values (–log₁₀) from five methods of 21 publicly available GWAS studies in DNase I–hypersensitive sites from 424 ENCODE and Roadmap Epigenomics cell types; in H3K27ac peaks from 127 ENCODE and Roadmap Epigenomics cell types; and in H3K4me3 peaks from 127 ENCODE and Roadmap Epigenomics cell types

Supplementary Table 6

Method running time and memory usage for 21 traits and 424 DNase I–hypersensitive site annotations

Supplementary Table 7

Roadmap Epigenomics and ENCODE cell lines used for segmentations

Supplementary Table 8

GARFIELD enrichment of 29 publicly available GWAS studies in 25-state genome segmentations in 127 cell types

Supplementary Table 9

Segmentation state summary

About this article

Cite this article

Iotchkova, V., Ritchie, G.R.S., Geihs, M. et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat Genet 51, 343–353 (2019). https://doi.org/10.1038/s41588-018-0322-6

Received: 19 January 2016
Accepted: 29 November 2018
Published: 28 January 2019
Issue Date: February 2019
DOI: https://doi.org/10.1038/s41588-018-0322-6
