Key Points
- Either population-based or family-based designs can be used in gene-association studies. Population-based designs use unrelated individuals; family-based designs use probands and their relatives, typically either parents or siblings.
- Genetic-association studies face the obstacles of population substructures and multiple testing.
- Family-based designs are favoured because they are robust against confounding due to population substructures and test both linkage and association.
- Case–control designs are preferred for the relative ease of data collection. They have modest power advantages, depending on the prevalence of the disease.
- Family-based designs can be extended to incorporate pedigrees and complex phenotypes.
- Screening tools are available for family-based designs that handle the multiple-testing problem, which is an important issue in whole-genome association studies.
Abstract
Both population-based and family-based designs are commonly used in genetic association studies to locate genes that underlie complex diseases. The simplest version of the family-based design (the transmission disequilibrium test) is well known, but the numerous extensions that broaden its scope and power are less widely appreciated. Family-based designs have unique advantages over population-based designs: they are robust against population admixture and stratification, allow testing for both linkage and association, and offer a solution to the problem of model building. Furthermore, the fact that family-based designs contain both within- and between-family information has substantial benefits in terms of multiple-hypothesis testing, especially in the context of whole-genome association studies.
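As a rough illustration of the transmission disequilibrium test mentioned above, the sketch below computes the standard McNemar-type statistic, (b − c)²/(b + c), where b and c count how often heterozygous parents do or do not transmit a given allele to an affected offspring. Under the null hypothesis of no linkage or no association, the statistic is compared with a chi-square distribution with one degree of freedom. The function name `tdt` and the transmission counts are hypothetical and chosen only for illustration; this is a minimal sketch of the basic trio-based test, not the authors' software or any of the extensions discussed in the article.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return the TDT chi-square statistic and its asymptotic p-value.

    transmitted:   times heterozygous parents passed the allele of interest
                   to an affected offspring (b)
    untransmitted: times heterozygous parents passed the other allele (c)
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no heterozygous parents, so no informative transmissions")
    stat = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {stat:.2f}, p = {p:.4f}")
```

Only heterozygous parents enter the calculation, which is why families without such a parent do not contribute to this version of the test.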
Acknowledgements
This work was supported by the National Institute of Mental Health and the National Heart, Lung and Blood Institute, USA. We would like to thank C. Garcia for help with the preparation of this manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- Linkage analysis
-
A method for localizing genes that is based on the co-inheritance of genetic markers and phenotypes in families over several generations.
- Association studies
-
A gene-discovery strategy that compares allele frequencies in cases and controls to assess the contribution of genetic variants to phenotypes in specific populations.
- Candidate gene
-
A gene for which there is evidence, usually functional, for a possible role in a disease or trait of interest.
- Power
-
The probability that a study obtains a significant result when the effect being tested is real in the underlying population from which the study subjects were sampled.
- Multiple-hypothesis testing
-
Many different statistical tests are used on the same sample; for example, many genetic markers might be tested against many different phenotypes. Failure to account for multiple testing inflates the study-wide type-1 error rate.
- Population substructure
-
Characteristics of a population, such as admixture, population stratification and/or inbreeding, which might distort the distribution of the standard association statistics, leading to increased type-1 error and/or decreased power.
- Genome-wide association studies
-
Studies designed to look for association between disease and a dense set of markers covering the entire genome.
- Case–control study
-
An epidemiological study design in which cases with a defined condition and controls without this condition are sampled from the same population. Risk-factor information is compared between the two groups to investigate the potential role of these in the aetiology of the condition.
- Case–cohort study
-
Similar to a case–control study, except both cases and controls are drawn from an existing cohort of subjects who are being followed to study a broad spectrum of diseases and risk factors.
- Proband
-
In a family study, this is the individual who is first identified in the family as having the disease under study.
- Odds ratio
-
The odds of exposure to the susceptible genetic variant in cases compared with controls. If the odds ratio is significantly greater than one, then the genetic variant is associated with the disease.
- Monte Carlo
-
A method for obtaining a p-value for a test statistic by drawing repeated samples from the null distribution of the data, computing the p-value for the same statistic for each sample, and comparing the observed p-value to the distribution of p-values obtained from the samples.
- Likelihood
-
A statistical model for analysing data that requires specifying a particular form for the distribution of the data.
- Admixture
-
This occurs when two or more subpopulations interbreed, so that two randomly chosen individuals in the population might have different degrees of genetic heritage from the original subpopulations.
- Population stratification
-
The presence in a population of distinct strata or groups that show limited interbreeding; they might have different disease rates and distinct allele-frequency distributions. Failure to control for the stratification can invalidate tests of association.
- Linkage disequilibrium
-
(LD). This occurs when alleles at two different loci are associated in a population because of tight linkage.
- Haplotype
-
A set of alleles at different loci that are present together on the same chromosome.
- Phase
-
The arrangement of alleles at multiple loci on homologous chromosomes. For example, in a diploid individual with genotype Aa at one locus and genotype Bb at another locus, possible linkage phases are BA/ba or Ba/bA, where '/' separates the two homologous chromosomes.
- Covariance
-
A measure of association between two variables that characterizes the tendency for the two variables to co-vary around their mean in a systematic way.
- Informative families
-
Families that make a contribution to the family-based association test (FBAT) statistic; that is, those with at least one heterozygous parent, or sibships with at least two distinct genotypes.
- Nuisance parameters
-
Parameters that are not the primary focus of a statistical analysis, but for which misspecification might lead to biased results; for example, allele frequency in association tests.
- Sufficient statistics
-
Data-reduction functions that retain all information about an unknown parameter; they are used to remove the dependence of a test on nuisance parameters that are unknown or difficult to model.
- Confounding
-
A measure of the association between a disease and a risk factor is distorted because other variables, associated with both the disease and the risk factor, are not controlled for in the calculation of the measure of association.
- Likelihood-ratio tests
-
A class of statistical tests obtained by comparing the likelihood under the alternative hypothesis with the likelihood under the null hypothesis.
- Score tests
-
A class of statistical tests that are derived from a likelihood model and are generally easier to compute than likelihood-ratio tests.
- Identity-by-descent
-
(IBD). An allele shared by two related individuals is said to be identical-by-descent if the allele is inherited from the same common ancestor.
- Permutation
-
An approach in which the actual data are randomized many times to generate a distribution of outcomes, so that the fraction of observations with values that are more extreme than the outcome that is observed with the real data reflects the statistical significance.
- Outcome space
-
Set of all possible genotype configurations for a specific pedigree that are plausible under Mendelian transmissions, and consistent with the sufficient statistics for parental genotype.
- Discordant sibs
-
A family design for testing association that uses a case and his/her unaffected sib.
- Nested models
-
A sequence of statistical models, each specifying a different hypothesis, such that each model in the sequence contains one more factor than the preceding model. Nested models are often used to test for the presence of interactions between two or more risk factors.
- Multiplicative genetic model
-
A genetic model for penetrance functions that assumes the relative risk for disease given two alleles is the square of the relative risk for disease given only one allele.
- Linear regression
-
A statistical method used to test and to describe the linear relationship between two or more variables.
- Type-1 error
-
The probability that the null hypothesis is falsely rejected.
- Intermediate phenotypes or endophenotypes
-
Measured biological variables intermediate between genotype and external phenotype that can indicate susceptibility to, or manifest as early signs of, a wide range of diseases or disorders.
- Imputed
-
A statistical method for handling missing data that replaces the missing values with estimated values.
- Bonferroni or Hochberg corrections
-
Statistical methods, proposed by Bonferroni and Hochberg, for controlling type-1 error (false positives) in the presence of multiple testing.
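The two corrections named in the last glossary entry can be sketched as follows. The Bonferroni rule rejects a hypothesis when its p-value is at most α/m for m tests; the Hochberg step-up rule sorts the p-values, finds the largest rank k with p(k) ≤ α/(m − k + 1), and rejects the k hypotheses with the smallest p-values. The function names, the significance level of 0.05 and the p-values are assumptions made for illustration; this is a minimal sketch, not the screening tools described in the article.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def hochberg(p_values, alpha=0.05):
    """Hochberg step-up procedure; returns one reject flag per input p-value."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Start at the largest p-value and stop at the first rank k that
    # satisfies p_(k) <= alpha / (m - k + 1).
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            cutoff_rank = rank
            break
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from four marker tests, for illustration only.
p_values = [0.010, 0.020, 0.030, 0.040]
print("Bonferroni:", bonferroni(p_values))
print("Hochberg:  ", hochberg(p_values))
```

With these hypothetical p-values, Bonferroni rejects only the smallest one, whereas Hochberg rejects all four because the largest p-value already falls below α. This shows why the Hochberg procedure is never less powerful than Bonferroni at the same nominal level.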
Cite this article
Laird, N., Lange, C. Family-based designs in the age of large-scale gene-association studies. Nat Rev Genet 7, 385–394 (2006). https://doi.org/10.1038/nrg1839