Abstract
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods in tests of association with binary and quantitative traits, and we have made software implementing them available as the R package CNVtools.
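The idea of likelihood ratio testing on quantitative CNV measurements can be illustrated with a deliberately simplified sketch. It is not the CNVtools implementation or the model described in the paper. It assumes a two-class Gaussian mixture whose component means and standard deviations are already known for each cohort (in practice they would be estimated), and it lets those means differ between cases and controls as one possible way to absorb differential measurement error. The function names (`component_densities`, `fit_proportions`, `lrt_statistic`, `simulate`) and all numeric values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def component_densities(signals, means, sds):
    """For each signal, its density under every copy-number class."""
    return [[normal_pdf(x, m, s) for m, s in zip(means, sds)] for x in signals]


def fit_proportions(dens, n_iter=200):
    """EM for the mixing proportions only (component shapes held fixed).

    Returns (proportions, maximized log-likelihood).
    """
    k = len(dens[0])
    props = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in dens:
            weighted = [p * d for p, d in zip(props, row)]
            total = sum(weighted)
            for j in range(k):
                counts[j] += weighted[j] / total  # posterior class membership
        props = [c / len(dens) for c in counts]
    loglik = sum(math.log(sum(p * d for p, d in zip(props, row))) for row in dens)
    return props, loglik


def lrt_statistic(case_dens, control_dens):
    """2 * (log-lik with separate proportions - log-lik with shared proportions)."""
    _, ll_null = fit_proportions(case_dens + control_dens)
    _, ll_case = fit_proportions(case_dens)
    _, ll_ctrl = fit_proportions(control_dens)
    return 2.0 * (ll_case + ll_ctrl - ll_null)


def simulate(rng, n, freq_high, means, sds):
    """Draw n signals; each sample is in the higher class with probability freq_high."""
    out = []
    for _ in range(n):
        j = 1 if rng.random() < freq_high else 0
        out.append(rng.gauss(means[j], sds[j]))
    return out


# Hypothetical example: case signals are shifted by 0.1 (differential error),
# and the higher copy-number class is more frequent in cases.
rng = random.Random(0)
control_means, case_means, sds = (0.0, 1.0), (0.1, 1.1), (0.2, 0.2)
controls = simulate(rng, 300, 0.3, control_means, sds)
cases = simulate(rng, 300, 0.6, case_means, sds)
stat = lrt_statistic(
    component_densities(cases, case_means, sds),
    component_densities(controls, control_means, sds),
)
print(f"LRT statistic: {stat:.1f} (compare with chi-square, 1 df)")
```

With two classes, the alternative model has one more free proportion than the null, so under the usual asymptotic argument the statistic is compared with a chi-square distribution on one degree of freedom. Because the sketch never assigns hard copy-number calls, uncertainty in class membership is carried into the test through the mixture likelihood.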
Acknowledgements
C.B., T.F., R.R. and M.E.H. are funded by the Wellcome Trust (WT), J.M. is funded by the WT and the National Institute of General Medical Sciences, V.P. is supported by a Juvenile Diabetes Research Foundation (JDRF) fellowship, and D.C. is supported by a JDRF/WT fellowship. The authors would like to thank the Wellcome Trust Case Control Consortium, D. Conrad, A. Moses, N. Carter, M. Dermitzakis, B. Stranger, J. Armour and E. Hollox for data access and helpful discussions.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5, Supplementary Table 1, Supplementary Methods (PDF 334 kb)
About this article
Cite this article
Barnes, C., Plagnol, V., Fitzgerald, T. et al. A robust statistical method for case-control association testing with copy number variation. Nat Genet 40, 1245–1252 (2008). https://doi.org/10.1038/ng.206
DOI: https://doi.org/10.1038/ng.206