Power of QTL detection using association tests with family controls

Hernández-Sánchez, Jules; Haley, Chris S; Visscher, Peter M

doi:10.1038/sj.ejhg.5201042

Article
Published: 22 October 2003

Jules Hernández-Sánchez¹,
Chris S Haley¹ &
Peter M Visscher²

European Journal of Human Genetics volume 11, pages 819–827 (2003)

Abstract

The power of testing for a population-wide association between a biallelic quantitative trait locus and a linked biallelic marker locus is predicted both empirically and deterministically for several tests. The tests were based on the analysis of variance (ANOVA) and on a number of transmission disequilibrium tests (TDT). Deterministic power predictions made use of family information, and were functions of population parameters including linkage disequilibrium, allele frequencies, and recombination rate. Deterministic power predictions were very close to the empirical power from simulations in all scenarios considered in this study. The different TDTs had very similar power, intermediate between one-way and nested ANOVAs. One-way ANOVA was the only test that was not robust against spurious disequilibrium. Our general framework for predicting power deterministically can be used to predict power in other association tests. Deterministic power calculations are a powerful tool for researchers to plan and evaluate experiments and obviate the need for elaborate simulation studies.

Introduction

Geneticists have been successful in mapping genes underlying rare, monogenic disorders with clear patterns of Mendelian inheritance.^1,2,3 However, mapping genes underlying complex traits, such as common multifactorial diseases, has been more difficult.^4,5 Genes can be mapped via linkage or association tests. Although both strategies exploit the cosegregation of markers with phenotypes, there are some striking differences between them. For example, in humans, genome-wide searches may require testing 30 000–500 000 single-nucleotide polymorphisms (SNPs) to detect significant associations, compared to 200–400 microsatellites in a linkage analysis.^6,7,8 Moreover, theoretical work suggests that finding associations between markers and complex diseases is more powerful than searching for linkage, even if many SNPs have to be tested and significance thresholds are raised to compensate for multiple testing.⁹ However, not all association tests are robust to spurious associations.¹⁰ This explains why association tests using family controls, such as the transmission/disequilibrium test (TDT), have been favoured over association tests using random controls, such as case–control designs, because the former are robust to spurious associations caused by population stratification or recent admixture.^11,12,13

Power studies of association tests will help researchers to design appropriate experiments, and to choose the most powerful test for the analysis of data. In this study, we investigate the power of association tests for quantitative traits, with and without family controls. Allison¹⁴ proposed five TDTs (TDT_Q1–Q5) for analysing quantitative traits under different ascertainment conditions, and we have included the most powerful, TDT_Q5, in this study. Long and Langley¹⁵ compared the power of five random controls association tests and the TDT_Q5, and found that TDT_Q5 was always the least powerful test. Nevertheless, these authors acknowledged that under population stratification, the type-I error rate of association tests using random controls can rise above the nominal level set for the experiment. Xiong et al¹⁶ proposed the TDT_G, an extension of Allison's TDT_Q1 that accounts for any number of sibs per family, families with one or two heterozygous parents, and any number of alleles at the marker locus. They found that TDT_G is more powerful than TDT_Q1, the Haseman–Elston linkage test,¹⁷ and an extreme discordant sib pair test.¹⁶ Lastly, Rabinowitz¹⁸ developed a TDT (referred to here as TDT_R) to model explicitly the correlation between a quantitative trait and marker segregation.

In this study, several tests of association have been compared in terms of power, both empirically and deterministically. Our deterministic approximation for predicting power was based on the calculation of noncentrality parameters (NCPs).¹⁹ The accuracy of this methodology, which can be used to predict power of other association tests, was validated via simulation.

Materials and methods

Definition of the evaluated association tests

The power of five tests to detect linkage and/or association between a marker locus and a quantitative trait locus (QTL) was studied empirically (by simulation) and deterministically (by calculating NCPs). Table 1 lists the tests used in this study: the one-way analysis of variance (one-way ANOVA), the nested analysis of variance (nested ANOVA), the TDT_Q5, the TDT_R, and the TDT_G.^14,16,18,20 A general deterministic method for predicting power at a linked marker is proposed in this study, and implementation examples are given for one-way ANOVA, TDT_R, and TDT_Q5. The one-way ANOVA test uses a sample of unrelated individuals who have been both genotyped and phenotyped, whereas the other four tests use the same data together with the genotypes of the parents. Recombination rate and linkage disequilibrium are denoted c and D, respectively.

Table 1 Main features of the tests compared in this study assuming family trios

One-way ANOVA

The one-way ANOVA contrasts marker genotype means among the progeny. This is the simplest and the most powerful test of association, although it is prone to high type-I error rates in the presence of spurious association, viz. disequilibrium without linkage.¹⁵ This is so because the null hypothesis (H₀) being tested by one-way ANOVA is no association, regardless of linkage. Therefore, H₀ could be rejected when testing unlinked marker loci (c = ½) if there was a sufficiently strong population-wide association (D ≠ 0). This lack of robustness is common to tests that do not use family controls, for example, case–control studies.^12,13 The test statistic follows an F_{2,n′−3} distribution under H₀, given a total sample size n′ and three different genotype groups.²⁰ The sum of squares between genotype groups, after subtracting the overall mean effect, reflects differences between marker genotypes. Hence, a significant statistic suggests greater differences between genotypes than would be expected under the assumption of linkage equilibrium between the QTL and the marker. Under the alternative hypothesis (H₁) of association, the distribution of the test statistic is a noncentral F with NCP equal to λ_O, or a noncentral $χ_{2, λ_{O}}^{2} / 2$ for large n′.
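As a minimal sketch of the contrast described above, the following Python function computes the one-way ANOVA F statistic from progeny phenotypes grouped by marker genotype. The function name and the toy phenotypes are illustrative assumptions, not data from this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of phenotype lists,
    one list per marker genotype class (e.g. MM, Mm, mm)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-genotype sum of squares, after removing the overall mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-genotype (residual) sum of squares
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical phenotypes for three marker genotype classes
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df1, df2 = one_way_anova_f(toy)
```

With three genotype classes and n′ records, the degrees of freedom returned are 2 and n′ − 3, matching the null distribution quoted above.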

Nested ANOVA

A way of overcoming the lack of robustness of one-way ANOVA is to contrast marker genotype means of progeny within parental types, using a nested ANOVA design.²⁰ Parental type represents a particular combination of parental marker genotypes, and family type a particular combination of marker genotypes across all family members (Table 2). Thus, the H₀ being tested by nested ANOVA is no association within parental types. There must be at least two progeny with different marker genotypes within each parental type for there to be a contrast; therefore, only those families with at least one heterozygous parent, that is, informative families, are used. This type of family ascertainment increases the degrees of freedom (df) between groups and reduces the df within groups, resulting in a loss of power compared to one-way ANOVA. The appropriate F-test in nested ANOVA is a ratio of between to within genotype mean squares within parental types. The test follows an F_{α,β} distribution under H₀, where α = ∑_{i=1}^{γ} (ng_i − 1), ng_i is the observed number of genotypes within parental type i, γ is the observed number of parental types, and β = n − (α + γ), where n is the number of informative families.
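The degrees of freedom above can be checked with a few lines of Python. This is a sketch only: the function name is hypothetical, and the sample of 150 informative families is an assumed figure chosen for illustration.

```python
def nested_anova_df(genotypes_per_parental_type, n_informative):
    """Degrees of freedom (alpha, beta) of the nested ANOVA F-test.

    genotypes_per_parental_type: observed number of progeny genotypes (ng_i)
    within each observed parental type.
    """
    alpha = sum(ng - 1 for ng in genotypes_per_parental_type)
    gamma = len(genotypes_per_parental_type)
    beta = n_informative - (alpha + gamma)
    return alpha, beta


# Informative parental types at a biallelic marker and their possible progeny:
# MM x Mm -> {MM, Mm}, Mm x Mm -> {MM, Mm, mm}, Mm x mm -> {Mm, mm}
df_num, df_den = nested_anova_df([2, 3, 2], n_informative=150)
```

When all three informative parental types and all their progeny genotypes are observed, the numerator has 4 df, twice that of TDT_Q5, which is one way to see why the nested design loses power.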

Table 2 Probability of each family type given a biallelic marker and assuming Hardy–Weinberg equilibrium

TDT_Q5

The original statistic for TDT_Q5 is [(SS_F − SS_R)/2]/[(SS_T − SS_F)/(n − 5)], where SS_T is the total sum of squares, SS_R is the sum of squares explained by a reduced model that fits an overall mean and two (out of three) informative parental types as fixed factors, and SS_F is the sum of squares explained by a full model that fits, in addition to the reduced model, two more fixed factors to estimate additive and dominant effects.¹⁴ The total number of informative families is n. TDT_Q5 is testing whether a significant amount of phenotypic variation can be explained by marker genotypes in the progeny, over and above the variation already explained by parental type. TDT_Q5 follows an F_{2,n−5} distribution under H₀ if residuals are normally distributed, or χ₂²/2 for large n. Under H₁, TDT_Q5 follows a noncentral $F_{2, n-5, λ_{Q5}}$, or a noncentral $χ_{2, λ_{Q5}}^{2} / 2$ for large n, with NCP λ_Q5. The TDT_Q5 is equivalent to a two-way ANOVA with a cross-classified design where the factors are parental type and progeny marker genotype (Appendix B).

TDT_R

Although Rabinowitz¹⁸ derived a NCP, he used parameters not included in his simulations, leading to some confusion in interpreting and calculating λ_R. Therefore, we developed a neater NCP for TDT_R. The TDT_R is calculated as T/σ_T for a biallelic marker. T measures the strength of the covariance between the transmission of a marker allele, from heterozygous parents to progeny, and the phenotype of progeny, and σ_T is the standard deviation of T. We will next describe TDT_R in detail, as this information will be needed for further statistical developments. The numerator is T = ∑_{i=1}^{n} (y_i − ȳ)w_i, where y_i is the phenotype of the ith child, ȳ is the overall mean (or the mean among informative families), and w_i are weights given to each family type (Table 3). The sum is over n informative and unrelated family trios randomly drawn from a population. The variance of T is σ_T² = ¼ ∑_{i=1}^{n} (y_i − ȳ)²h_i, where h_i is the number of heterozygous parents in the family (Table 3). Under H₀, TDT_R follows a t_{n−1} distribution, so (TDT_R)² follows an F_{1,n−1} distribution. Under the alternative hypothesis, (TDT_R)² follows a noncentral F with NCP λ_R, or a noncentral $χ_{1, λ_{R}}^{2}$ for large n.
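The two sums that define TDT_R translate directly into code. The sketch below takes the family weights w_i and heterozygous-parent counts h_i as inputs rather than deriving them, since they come from Table 3; the function name and the toy values (including the ±½ weights) are assumptions made purely for illustration.

```python
import math


def tdt_r(phenotypes, weights, n_het_parents):
    """TDT_R = T / sigma_T for n informative trios.

    phenotypes:     y_i, phenotype of the child in trio i
    weights:        w_i, family-type weight for trio i (taken as given)
    n_het_parents:  h_i, number of heterozygous parents in trio i (1 or 2)
    """
    n = len(phenotypes)
    y_bar = sum(phenotypes) / n
    # T = sum_i (y_i - y_bar) * w_i
    t = sum((y - y_bar) * w for y, w in zip(phenotypes, weights))
    # sigma_T^2 = (1/4) * sum_i (y_i - y_bar)^2 * h_i
    var_t = 0.25 * sum((y - y_bar) ** 2 * h for y, h in zip(phenotypes, n_het_parents))
    return t / math.sqrt(var_t)


# Two hypothetical trios, each with one heterozygous parent
stat = tdt_r([0.0, 2.0], [-0.5, 0.5], [1, 1])
```

The square of the returned value is the quantity compared against F_{1,n−1} (or χ₁² for large n).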

Table 3 Variables in TDT_R

TDT_G

The last test considered is TDT_G.¹⁶ For a biallelic marker, ȳ_M (ȳ_m) denotes the mean among progeny having inherited allele M (m) from heterozygous parents, and n_M (n_m) is the number of times allele M (m) is transmitted. The variance of (ȳ_M − ȳ_m) is

where

and y_M_k is the phenotype of the child of the kth parent. The latter sum is over the 2n parents in a sample of n family trios. If all family members are Mm heterozygous, then the same information is included in both allele categories. For a normally distributed trait and large n, TDT_G follows a χ₁² distribution under H₀. The asymptotic distribution under H₁ is a noncentral $χ_{1, λ_{G}}^{2}$ with NCP λ_G.

Empirical power

Power was calculated empirically as the proportion of significant results out of 1000 analyses of independent data sets, simulated under specific combinations of parameter values. Each sample consisted of n=200 unrelated family trios (father, mother, and a single child). The frequencies of the positive allele (Q) from a biallelic QTL were p_Q=[0.5, 0.3, 0.1], and the same frequencies were assigned to allele M from a biallelic marker linked to the QTL. The recombination rates between the marker and the QTL were c=[0, 0.1, 0.3, 0.4, 0.5]. QTL and marker genotypes were generated for all individuals. Phenotypes were generated only for the progeny by adding a normally distributed error with variance σ_e² = 1, plus -1/2, 0, or ½ for QTL genotypes qq, qQ, or QQ, respectively. Neither dominance nor polygenic effects were simulated. The level of association between allele Q at the QTL and allele M at the marker was given by the standardised linkage disequilibrium parameter D′ = [0, ½, 1].²¹
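The phenotype model used in the simulations is simple enough to sketch with the Python standard library. The function below draws only the progeny QTL genotype (under Hardy–Weinberg proportions) and the phenotype; it deliberately omits the parents, the marker, recombination, and linkage disequilibrium, all of which a full trio simulation would need. The function name, argument names, and seed are illustrative assumptions.

```python
import random


def simulate_progeny(n, p_q_allele, a=0.5, sigma_e=1.0, seed=None):
    """Draw n unrelated progeny.

    Returns a list of (number_of_Q_alleles, phenotype), where the genetic
    value is -a, 0, or +a for genotypes qq, Qq, or QQ and the residual is
    normal with standard deviation sigma_e.
    """
    rng = random.Random(seed)
    progeny = []
    for _ in range(n):
        # Two independent allele draws give Hardy-Weinberg genotype proportions
        n_q = int(rng.random() < p_q_allele) + int(rng.random() < p_q_allele)
        genetic_value = a * (n_q - 1)
        progeny.append((n_q, genetic_value + rng.gauss(0.0, sigma_e)))
    return progeny


sample = simulate_progeny(200, p_q_allele=0.3, seed=1)
```

Empirical power is then the fraction of replicate samples in which the chosen test statistic exceeds its critical value.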

Deterministic power

We have developed a compound method with two parts for predicting power of association tests deterministically. The first part consisted in calculating the expected effect of marker genotypes as functions of underlying QTL genotypes, conditional on population parameters and family type. This part can be used to predict power in other association tests, in addition to the ones in this study. The second part consisted in calculating the NCP as a function of marker contrasts specific to each test.

Expected marker effects

Consider the 10 different family types at a biallelic marker (Table 2), and let X_j be a vector with the marker genotypes of child, father, and mother in a family of type j, for example, X₁=[MM, MM, MM]. Let G_i denote the ith QTL genotype of the child, that is, G₁=QQ, G₂=Qq, and G₃=qq. The expected phenotype (y) of a child given the jth family type, assuming no dominance, is

where a is the effect of substituting allele q for Q, assumed to be ½. The conditional probabilities P(G₁∣X_j) and P(G₃∣X_j) can be calculated using Tables W1, W2, W3, W4, available on the web (www.nature.com/ejhg/5201042).^22,23 For example, the probability of QTL genotype QQ given X₁ is

where P(G₁ ∩ X₁) is the joint probability of QTL genotype QQ in the child and marker genotype MM in all members of the family, P(X₁) is the probability of family type 1 (Table 2), and h₁ is the probability of drawing haplotype QM from the population which, assuming random mating and no segregation distortion, is h₁ = h_QM = p_Q p_M + D_QM. (Note: D_QM = D′D_max, and if D′ > 0 then D_max = min{p_q p_M, p_Q p_m}.)²⁴ The joint probability P(G₁ ∩ X₁) can be obtained from Table W4 by multiplying the third and the sixth columns and summing the products. The conditional probabilities P[G_i∣X_j], for i=1, 2, 3 and j=1…10, are all summarised in Table 4.
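To make the family-type-1 example concrete, the sketch below computes the four haplotype frequencies from p_Q, p_M, and D′ ≥ 0, and then the conditional QTL genotype probabilities for a trio in which every member is MM. It rests on two assumptions that should be read as a worked interpretation rather than as the published formulae: (i) an MM parent transmits haplotype QM with probability h_QM/p_M, whatever the recombination rate, so that P(QQ∣X₁) = (h_QM/p_M)² under random mating; and (ii) with genotypic values a, 0, −a, the expected phenotype is a[P(G₁∣X_j) − P(G₃∣X_j)], expressed as a deviation from the heterozygote. All function names are hypothetical.

```python
def haplotype_frequencies(p_Q, p_M, d_prime):
    """Frequencies of haplotypes QM, Qm, qM, qm for 0 <= d_prime <= 1."""
    p_q, p_m = 1.0 - p_Q, 1.0 - p_M
    d_max = min(p_q * p_M, p_Q * p_m)  # valid for non-negative disequilibrium
    d = d_prime * d_max
    return {
        "QM": p_Q * p_M + d,
        "Qm": p_Q * p_m - d,
        "qM": p_q * p_M - d,
        "qm": p_q * p_m + d,
    }


def qtl_probs_all_MM(p_Q, p_M, d_prime):
    """P(QQ), P(Qq), P(qq) in the child when child and both parents are MM."""
    h = haplotype_frequencies(p_Q, p_M, d_prime)
    t_Q = h["QM"] / p_M  # chance an MM parent transmits Q
    t_q = h["qM"] / p_M  # chance an MM parent transmits q
    return t_Q ** 2, 2.0 * t_Q * t_q, t_q ** 2


def expected_phenotype(p_G1, p_G3, a=0.5):
    """Expected phenotype as a deviation from the heterozygote (no dominance)."""
    return a * (p_G1 - p_G3)


p_QQ, p_Qq, p_qq = qtl_probs_all_MM(0.5, 0.5, 1.0)
mean_type1 = expected_phenotype(p_QQ, p_qq)
```

With complete disequilibrium and equal allele frequencies the marker genotype determines the QTL genotype, so an all-MM trio has a QQ child with certainty; with D′ = 0 the marker carries no information and the probabilities revert to Hardy–Weinberg proportions.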

Table 7 Probabilities of 4 parental haplotypes and expected frequency of QTL genotypes in progeny given Mm and Mm parents and MM, Mm or mm progeny

Table 8 Probabilities of 4 parental haplotypes and expected frequency of QTL genotypes in progeny given MM and Mm parents and MM or Mm progeny

Table 9 Probabilities of 4 parental haplotypes and expected frequency of QTL genotypes in progeny given mm and Mm parents and Mm or mm progeny

Table 10 Probabilities of 4 parental haplotypes and expected frequency of QTL genotypes in progeny given MM and MM, or mm and mm, or MM and mm parents and the marker genotype in their progeny

Table 4 Conditional QTL genotype probabilities in a child, given the family type (FT), and population parameters D, c, P_M, P_m, P_Q, and P_q

Noncentrality parameters (NCP)

The NCP for the one-way ANOVA (λ_O) can be obtained applying the formula²⁵

The sum in Equation (2) is over all three marker genotype classes, the vector B′ contains the three marker genotype means [μ_MM, μ_Mm, μ_mm], and X′X is a matrix with diagonal elements [n_MM, n_Mm, n_mm] and zeroes elsewhere, where n_i is the sample size corresponding to marker genotype i. Equation (2) represents the sum of squares due to both the marker locus and the sample mean (μ). The appropriate λ_O can be obtained after subtracting from Eq. (2) the sum of squares due to the sample mean, that is, n′μ², where n′ = n_MM + n_Mm + n_mm. When testing the QTL (ie conditioning on c=0, D′=1, and p_Q=p_M), and assuming no dominance, Eq. (2) simplifies to

where σ_QTL² = 2p_Q p_q a².²⁶
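The QTL variance, and what it implies for the one-way ANOVA when the marker is the QTL itself, can be tabulated for the simulated allele frequencies. The sketch below assumes that the NCP at the QTL equals the between-genotype sum of squares about the mean in σ_e² units, that is, n′σ_QTL²/σ_e²; this follows from the description of Eq. (2) with expected genotype counts under Hardy–Weinberg equilibrium, but the exact scaling depends on the NCP convention in use and should be treated as an assumption. Function names are hypothetical.

```python
def qtl_variance(p_Q, a=0.5):
    """Additive QTL variance, 2 * p_Q * p_q * a^2."""
    return 2.0 * p_Q * (1.0 - p_Q) * a ** 2


def ncp_one_way_at_qtl(n_total, p_Q, a=0.5, sigma_e2=1.0):
    """Assumed form of the one-way ANOVA NCP when testing the QTL itself."""
    return n_total * qtl_variance(p_Q, a) / sigma_e2


for p in (0.5, 0.3, 0.1):
    var = qtl_variance(p)
    heritability = var / (var + 1.0)  # residual variance of 1, as simulated
    print(p, round(var, 3), round(heritability, 3), round(ncp_one_way_at_qtl(200, p), 1))
```

With a = ½ the QTL variance is 0.125, 0.105, and 0.045 for p = 0.5, 0.3, and 0.1, so moving from p = 0.3 to p = 0.1 removes far more QTL variance than moving from 0.5 to 0.3.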

In Appendix B, we have shown that TDT_Q5 is equivalent to a two-way ANOVA analysis, where data are modelled fitting parental type and progeny genotype as fixed factors, in addition to μ. Taking this equivalence into account, the NCP λ_Q5, derived in Appendix A, is

where b_i is the expected marker genotype effect in progeny of family type i (Table 3), n_i is the number of type i families, I_i(j) is an indicator variable that takes the value 1 when the family is informative (viz. at least one heterozygous parent), and 0 otherwise, F_j is the mean value of the jth parental type, and f_j the number of j parental types. Eq. (4) measures, in σ_e² units, the amount of total sum of squares explained by the marker, after subtracting the parental type effect. When testing the QTL, Eq. (4) reduces to

The NCP for TDT_R (λ_R) is approximately

(Appendix C). When testing the QTL, Eq. (6) simplifies to

Finally, the NCP for TDT_G (λ_G) is¹⁶

where n is the number of informative families. When testing the QTL in family trios, the appropriate NCP is¹⁶

The differences between the four NCPs λ_O, λ_Q5, λ_G, and λ_R are easily appreciated in Table 5, for both large and small sample sizes. In all cases, the QTL allele frequency and effect size only affect λ through the QTL variance.

Table 5 Noncentrality parameters (λ) given c=0 and D′=1, and distribution under H₀ for small and large sample sizes

Results

Empirical versus deterministic power

We have developed formulae to calculate NCPs for one-way ANOVA (λ_O), TDT_Q5 (λ_Q5), and TDT_R (λ_R), assuming that the sample consists of family trios. Once these λ's are obtained, power can be calculated from the appropriate noncentral distributions. Xiong et al¹⁶ derived the equation for the NCP of TDT_G (λ_G). Figure 1 shows that predictions of power using our deterministic method (lines) match the simulation results (points) very well. Power is shown as a function of c for three different allele frequencies denoted with circles (p=0.5), triangles (p=0.3), and squares (p=0.1), while averaging out D′. The NCP of nested ANOVA can also be calculated following this method; however, simulation results showed that nested ANOVA is the least powerful method by far, and therefore we concentrated on deriving the other NCPs. In addition to the close match between deterministic and empirical power, two other features in Figure 1 are worth mentioning. First, power decayed more when p dropped from 0.3 to 0.1 than when it dropped from 0.5 to 0.3. This is because the loss of information is relatively more important in the former than in the latter drop. Second, TDT_Q5 was less powerful than TDT_R, whereas the contrary was true in Table 6. This can be explained by the fact that, in Figure 1, TDT_Q5 was implemented as described by Allison,¹⁴ that is, using only informative families, and estimating both additive and dominant effects. The NCP λ_Q5 was obtained assuming this model. However, the power of TDT_Q5 increases when the dominant parameter need not be estimated.
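The step from an NCP to power, mentioned at the start of this subsection, can be sketched with the Python standard library for the large-sample χ² approximations: 1 df for (TDT_R)² and TDT_G, and 2 df for one-way ANOVA and TDT_Q5. The sketch assumes a 5% significance level and the common convention in which a noncentral χ² with parameter λ is a Poisson(λ/2) mixture of central χ² variables; some texts define the parameter as half this value, so the convention should be checked before plugging in a λ. Function names are illustrative.

```python
import math
from statistics import NormalDist


def power_chi2_1df(ncp, alpha=0.05):
    """P(noncentral chi-square(1, ncp) > central critical value)."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)  # square root of the chi-square(1) cutoff
    shift = math.sqrt(ncp)
    # The statistic is (Z + shift)^2 with Z standard normal
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


def power_chi2_2df(ncp, alpha=0.05, terms=200):
    """P(noncentral chi-square(2, ncp) > central critical value)."""
    half_crit = -math.log(alpha)     # central chi-square(2) tail is exp(-x/2)
    weight = math.exp(-ncp / 2.0)    # Poisson(j; ncp/2) at j = 0
    term, partial = 1.0, 1.0         # partial = sum_{i<=j} half_crit^i / i!
    power = 0.0
    for j in range(terms):
        # Central chi-square with 2 + 2j df has tail exp(-x/2) * partial
        power += weight * math.exp(-half_crit) * partial
        weight *= (ncp / 2.0) / (j + 1)
        term *= half_crit / (j + 1)
        partial += term
    return power


size_1df = power_chi2_1df(0.0)
size_2df = power_chi2_2df(0.0)
```

At λ = 0 both functions return the nominal type-I error rate, which is a quick sanity check before evaluating power at the NCPs of interest.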

Table 6 Empirical power (%) of tests per single parameter

Power ranking with more powerful models via simulations

The power of TDT_Q5 increases after removing the dominance parameter from the model when it is redundant, that is, the QTL has additive effects only. A further improvement in power, albeit slight, can be achieved by using all six parental types, whether informative or not. In doing so, TDT_Q5 follows an F_{1,n′−4} distribution under H₀, as opposed to an F_{2,n−5}, where n′ (n) is the total number of (informative) families. Likewise, one-way ANOVA can become more powerful, fitting a simple regression line across genotypes to estimate additive QTL effects. Thus, one-way ANOVA will be distributed as F_{1,n′−2} under H₀, as opposed to F_{2,n′−3}. All other tests remained unchanged, and power was estimated for all via simulations.

Table 6 shows empirical power across tests, focusing on one parameter at a time (c, p or D′) and averaging across the other two parameters. The ranking of the tests in terms of power was the same across scenarios: first the one-way ANOVA, followed by TDT_Q5, TDT_G, and TDT_R (the last two with similar power), and lastly nested ANOVA. Table 6(a) shows power of the tests for a given c, averaging across values of D′ and p. The last row in Table 6(a) corresponds to the empirical type-I error for each test, ie, c = ½. The one-way ANOVA was the only test for which the empirical error exceeded the nominal 5%. This is because one-way ANOVA tests whether D′ is significantly different from zero, regardless of c.¹⁵ Power declined steadily as c increased, because the amount of σ_QTL² explained by the marker decreased as interloci distance increased.

Table 6(b) shows power for a given D′, averaging across values of p and c. The power of one-way ANOVA reached ∼72% when D′=1, being approximately twice as powerful as the TDTs. Undoubtedly, if spurious association is not an issue, significant extra power can be obtained by testing genotype differences directly, as opposed to using robust tests. All tests showed ∼5% type-I error when D′=0, even for c=0.

Finally, Table 6(c) shows power for a given p, averaging across values of c and D′. Power decays as allele frequency becomes more extreme because (1) there are fewer informative families, and (2) the proportion of informative families with two heterozygous parents decreases. The first point directly causes a reduction in sample size. The second point means that less σ_QTL² is available to TDTs. TDTs owe their robustness to the fact that they use only within-family genetic variation, which is greater in families with two heterozygous parents. These results contrast with those of Allison,¹⁴ who concluded that power increases as p decreases. However, Allison¹⁴ kept σ_QTL² constant, so as p became more extreme, the QTL effect, and the mean difference between marker genotypes, increased, resulting in more powerful contrasts.

Discussion

A comprehensive review of methodology developed in the 1990s provided more than 60 references to association tests for monogenic diseases with Mendelian inheritance, and only about a dozen references to association tests for complex diseases.²⁷ Nevertheless, complex diseases are by far the commonest human ailments; for example, infectious and parasitic diseases, psychiatric disorders, and cardiovascular diseases affect ∼44% of the world population, compared to just 0.05% of Caucasians being affected by cystic fibrosis, the commonest of the monogenic diseases.^28,29

TDTs are increasingly used to identify QTLs underlying complex diseases because they can be more powerful than other tests, for example, linkage analysis, when markers are tightly linked to responsible QTLs, and because they are robust to spurious associations generated by common demographic events such as population stratification and/or admixture.^8,10

We have developed and verified deterministic power calculations for a range of association tests for quantitative traits, that is, three TDTs and two ANOVAs, and shown how the power depends on the effect of a QTL, the recombination rate between a QTL and a marker, and the amount of linkage disequilibrium between marker and QTL. In this study, we have assumed that both loci were biallelic, and shared the same allele frequencies. Moreover, we considered a continuously distributed trait genetically determined by a single additive QTL, without polygenic component or dominance. This simplistic scenario was chosen to facilitate the derivation of NCPs for predicting power. Nonetheless, we recognise that a more comprehensive picture of the properties of these tests requires analyses of more realistic situations, for example, including dominance and polygenic effects, which is possible within the framework presented here.

The deterministic method proposed in this study consists in deriving NCPs (λ's) as functions of marker genotype contrasts specific to each test. These λ's can subsequently be used to obtain power. A common feature across all λ's was the use of expected marker genotype means, conditional on family information, under the assumptions of random mating and no segregation distortion. The marker effects were functions of the standardised linkage disequilibrium (D'), the recombination rate (c), the allele frequencies (p_Q, p_M), and the size of the QTL (a). Allison¹⁴ derived λ_Q5 for TDT_Q5 when the marker is the trait locus, and we have obtained an alternative prediction of λ_Q5 for any recombination rate, and linkage disequilibrium in the parent population.

Power was also predicted empirically via stochastic simulations, and results confirmed the accuracy of our deterministic predictions. The advantages of deterministic over stochastic methods are (1) ease of implementation, (2) instant predictions, and (3) direct appreciation of the relationship between population parameters and power. However, deriving NCPs becomes cumbersome in complex scenarios. Thus, in these cases, empirical simulations are invaluable.

The tests ranked as follows in terms of power. The one-way ANOVA was the most powerful test of association across all scenarios, but also the only test not robust to spurious disequilibrium. The TDTs had similar, and intermediate, power. However, we showed how to increase the power of TDT_Q5 compared to the original version, if there is no dominance. Lastly, the nested ANOVA was the least powerful test of association.

The power of TDT_Q5 may have been previously overemphasised because complete linkage and linkage disequilibrium between marker and QTL were assumed, and family trios were sampled from a population of informative families.¹⁴ This sampling scheme means that the variance explained by the QTL is larger in the sample of informative trios than in the population at large, which would include both informative and noninformative families, and led to the counter-intuitive conclusion that the more extreme the allele frequency, the higher the power of TDT_Q5 to detect associations. In addition, Allison's¹⁴ comparison between TDT_Q5 and the Haseman–Elston linkage test¹⁷ favours TDT_Q5 because this is a test for association, and a perfect association was assumed, whereas the Haseman–Elston test is for linkage.

In summary, a new and accurate deterministic method has been developed to predict the power of QTL detection for TDTs and ANOVAs, as a function of population parameters. We have obtained specific formulae for the NCPs of the tests, when the marker is the QTL, as functions of sample size and QTL heritability. The method contains a general part (Table 4) that can be used to calculate NCPs for other association tests. Moreover, our method can also model dominant QTL effects, and a polygenic component. Extensions to cope with multiallelic markers are theoretically possible, although future association studies in human populations are more likely to employ vast arrays of SNPs than multiallelic markers.^30,31 Therefore, further developments of these approaches ought to be directed to coping with the problem of simultaneous testing of several loci, and the study of haplotypes.

References

Kerem B, Rommens JM, Buchanan JA et al: Identification of the cystic fibrosis gene: genetic analysis. Science 1989; 245: 1073–1079.
Article CAS PubMed Google Scholar
Hastbäcka J, de la Chapelle A, Kaitila I, Sistonen P, Weaver A, Lander E : Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat Genet 1992; 2: 204–211.
Article PubMed Google Scholar
Hastbäcka J, de la Chapelle A, Mahtani MM et al: The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 1994; 78: 1073–1087.
Article PubMed Google Scholar
Terwilliger JD, Weiss KM : Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotech 1998; 9: 578–594.
Article CAS PubMed Google Scholar
Schork NJ, Cardon LR, Xu X : The future of genetic epidemiology. Trends Genet 1998; 14: 266–272.
Article CAS PubMed Google Scholar
Kruglyak L : Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999; 22: 139–144.
Article CAS PubMed Google Scholar
Ott J : Predicting the range of linkage disequilibrium. Proc Natl Acad Sci USA 2000; 97: 2–3.
Article CAS PubMed PubMed Central Google Scholar
Neale MC, Cherny SS, Sham PC et al: Distinguishing population stratification from genuine allelic effects with MX: association of ADH2 with alcohol consumption. Behav Genet 1999; 29: 233–243.
Article Google Scholar
Risch N, Merikangas K : The future of genetic studies of complex human diseases. Science 1996; 273: 1516–1517.
Article CAS PubMed Google Scholar
Wright AF, Carothers AD, Pirastu M : Population choice in mapping genes for complex diseases. Nat Genet 1999; 23: 397–404.
Article CAS PubMed Google Scholar
Spielman RS, McGinnis RE, Ewens WJ : Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993; 52: 506–516.
CAS PubMed PubMed Central Google Scholar
Clayton D : Population association; in Balding DJ, Bishop M, Cannings C (eds) Handbook of statistical genetics. New York: John Wiley & Sons Ltd., 2001, pp 519–540.
Google Scholar
Schork NJ, Fallin D, Thiel B et al: The future of genetic case–control studies; in Rao DC, Province MA (eds): Genetic dissection of complex traits (Advances in genetics, Vol 42). US: Academic Press, 2000, pp 191–212.
Google Scholar
Allison DB : Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet 1997; 60: 676–690.
CAS PubMed PubMed Central Google Scholar
Long AD, Langley CH : The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res 1999; 9: 720–731.
CAS PubMed PubMed Central Google Scholar
Xiong MM, Krushkal J, Boerwinkle E : TDT statistics for mapping quantitative trait loci. Ann Hum Genet 1998; 62: 431–452.
Article CAS PubMed Google Scholar
Haseman JK, Elston RC : The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 1972; 2: 3–19.
Article CAS PubMed Google Scholar
Rabinowitz D : A transmission disequilibrium test for quantitative trait loci. Hum Hered 1997; 47: 342–350.
Article CAS PubMed Google Scholar
Sham PC, Cherny SS, Purcell S, Hewitt JK : Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 6: 1616–1630.
Article Google Scholar
Sokal RR, Rohlf FJ : Biometry. New York US: WH Freeman and Company, 1995.
Google Scholar
Lewontin RC : On measures of gametic disequilibrium. Genetics 1988; 120: 849–852.
CAS PubMed PubMed Central Google Scholar
Jayakar SD : On the detection and estimation of linkage between a locus influencing a quantitative character and a marker locus. Biometrics 1970; 26: 451–464.
Article CAS PubMed Google Scholar
Hill AP : Quantitative linkage: a statistical procedure for its detection and estimation. Ann Hum Genet 1975; 38: 439–449.
Article CAS PubMed Google Scholar
Weir BS : Genetic data analysis II. Sunderland, US: Sinauer Associates, Inc. 1996.
Google Scholar
Searle SR : Linear models. New York: John Wiley & Sons, 1971.
Google Scholar
Falconer DS, Mackay TFC : Introduction to quantitative genetics. England: Longman Group Ltd, 1996.
Google Scholar
Zhao H : Family-based association studies. Stat Methods Med Res 2000; 9: 563–587.
Article CAS PubMed Google Scholar
The World Health Report. Part three: statistical annex. WHO, 1999, www.who.int/whr/1999/en/report.htm.
Underwood JCE : Genetic and environmental causes of disease; in Underwood JCE (ed): General and systematic pathology. London: Churchill Livingstone, 1996, pp 31–60.
Weiss KM, Terwilliger JD : How many diseases does it take to map a gene with SNPs? Nat Genet 2000; 26: 151–157.
Miller RD, Kwok PY : The birth and death of human single-nucleotide polymorphisms: new experimental evidence and implications for human history and medicine. Hum Mol Genet 2001; 10: 2195–2198.
Lynch M, Walsh B : Genetics and analysis of quantitative traits. Sunderland, US: Sinauer Associates, Inc., 1998.

Acknowledgements

We are grateful to Ian White and Dr O Southwood for helpful comments on earlier versions of this manuscript. This work has been supported by Sygen International and by the Biotechnology and Biological Sciences Research Council of the UK.

Author information

Authors and Affiliations

Roslin Institute (Edinburgh), Roslin, Midlothian, EH25 9PS, Scotland, UK
Jules Hernández-Sánchez & Chris S Haley
Institute of Cell, Animal and Population Biology, University of Edinburgh, West Mains Road, Edinburgh, EH9 3JT, Scotland, UK
Peter M Visscher

Corresponding author

Correspondence to Jules Hernández-Sánchez.

Appendices

Appendix A

The NCP (λ) of two-way ANOVA can be expressed as²⁵

Let σ_e² be unity. Let B′ be the vector [μ, f₁, f₂, f₃, g₁, g₂, g₃] of parameters in the model, where μ is the sample mean, f_i is the mean of the ith parental type, and g_j the mean of the jth marker genotype across all parental types. Let K be a matrix of parameter contrasts reflecting the H₀ being tested; for example, if H₀: g₁=g₂ and g₂=g₃, then the rows of K′ are [0, 0, 0, 0, 1, −1, 0] and [0, 0, 0, 0, 0, 1, −1].

The matrix X′X is

where n_ij is the number of records in the ith family and jth marker genotype class. X′X is a matrix of order 7 and rank 5; hence, there are seven unknowns and only five df. A generalised inverse of X′X can be obtained by deleting the first row and column, hence setting μ=0, and the last row and column, hence setting g₃=0, and then inverting the reduced matrix.²⁵ Let G be the reduced X′X matrix. This G matrix can be partitioned as follows:
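The structure of X′X described above can be checked numerically. The following is a minimal sketch, assuming a complete 3 × 3 layout with hypothetical cell counts n_ij (the counts and the helper names `build_xtx`, `rank` and `drop_first_and_last` are illustrative and do not come from the article). It builds X′X for the additive two-way model, confirms that its rank is 5, and forms the reduced matrix G by deleting the first and last rows and columns.

```python
from fractions import Fraction


def build_xtx(n):
    """Return X'X for y_ij = mu + f_i + g_j + e_ij, given cell counts n[i][j]."""
    rows, cols = len(n), len(n[0])
    row_tot = [sum(r) for r in n]
    col_tot = [sum(n[i][j] for i in range(rows)) for j in range(cols)]
    size = 1 + rows + cols
    m = [[0] * size for _ in range(size)]
    m[0][0] = sum(row_tot)
    for i in range(rows):
        m[0][1 + i] = m[1 + i][0] = row_tot[i]      # mu x f_i
        m[1 + i][1 + i] = row_tot[i]                # f_i x f_i
        for j in range(cols):
            m[1 + i][1 + rows + j] = m[1 + rows + j][1 + i] = n[i][j]  # f_i x g_j
    for j in range(cols):
        m[0][1 + rows + j] = m[1 + rows + j][0] = col_tot[j]           # mu x g_j
        m[1 + rows + j][1 + rows + j] = col_tot[j]                     # g_j x g_j
    return m


def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(a[0])):
        pivot = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(len(a)):
            if i != r and a[i][c] != 0:
                factor = a[i][c] / a[r][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == len(a):
            break
    return r


def drop_first_and_last(matrix):
    """Delete the first and last rows and columns (sets mu = 0 and g3 = 0)."""
    return [row[1:-1] for row in matrix[1:-1]]


# Hypothetical counts: rows are parental types, columns are marker genotypes.
counts = [[10, 20, 10], [5, 10, 5], [8, 4, 8]]
xtx = build_xtx(counts)
G = drop_first_and_last(xtx)
print(rank(xtx), rank(G))
```

With these counts both ranks are 5, so the 5 × 5 matrix G is of full rank and can be inverted even though the 7 × 7 matrix X′X cannot.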

If C=G⁻¹, K*′ is the matrix K′ with the first and last columns deleted, and B* is the vector B with the first and last elements deleted, then

where C₂₂ = K*′CK* = (G₂₂ − G₂₁G₁₁⁻¹G₁₂)⁻¹ and (K*′B*)′ = [g₁ − g₂, g₂]. When testing the QTL, Eq. (A2) gives
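The partitioned-inverse identity used here can be verified on a small example. The sketch below assumes a hypothetical reduced matrix G (parental-type block first, then the g₁ and g₂ block) and hypothetical genotype effects g₁ and g₂; the helpers `inverse`, `matmul` and `block` are illustrative. It checks that the genotype block of C = G⁻¹ equals (G₂₂ − G₂₁G₁₁⁻¹G₁₂)⁻¹, and that the quadratic form in the contrasts [g₁ − g₂, g₂] equals the quadratic form in [g₁, g₂] built directly from G₂₂ − G₂₁G₁₁⁻¹G₁₂. Any scaling constant in Eq. (A2) is deliberately left out.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if a[i][c] != 0)
        a[c], a[pivot] = a[pivot], a[c]
        a[c] = [x / a[c][c] for x in a[c]]
        for i in range(n):
            if i != c:
                a[i] = [x - a[i][c] * y for x, y in zip(a[i], a[c])]
    return [row[n:] for row in a]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def block(m, rows, cols):
    return [[m[i][j] for j in cols] for i in rows]


# Hypothetical reduced matrix G: order of parameters is f1, f2, f3, g1, g2.
G = [[40, 0, 0, 10, 20],
     [0, 20, 0, 5, 10],
     [0, 0, 20, 8, 4],
     [10, 5, 8, 23, 0],
     [20, 10, 4, 0, 34]]
fam, gen = range(3), range(3, 5)
G11, G12 = block(G, fam, fam), block(G, fam, gen)
G21, G22 = block(G, gen, fam), block(G, gen, gen)

# Schur complement G22 - G21 G11^{-1} G12.
correction = matmul(matmul(G21, inverse(G11)), G12)
schur = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(G22, correction)]

C = inverse(G)
C22 = block(C, gen, gen)
identity_holds = C22 == inverse(schur)

# Quadratic form in the contrasts g1 - g2 and g2 (hypothetical effects).
Kt = [[0, 0, 0, 1, -1], [0, 0, 0, 0, 1]]       # K*' with first and last columns removed
K = [list(col) for col in zip(*Kt)]
g1, g2 = Fraction(1, 2), Fraction(1, 4)
d = [[g1 - g2], [g2]]                           # K*'B* as a column vector
middle = inverse(matmul(matmul(Kt, C), K))
q_contrast = matmul(matmul([[d[0][0], d[1][0]]], middle), d)[0][0]
q_direct = matmul(matmul([[g1, g2]], schur), [[g1], [g2]])[0][0]
print(identity_holds, q_contrast == q_direct)
```

Because the contrasts are a nonsingular transformation of [g₁, g₂], the two quadratic forms agree exactly in rational arithmetic.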

where the first part of (A3) corresponds to the sum of squares due to genotype, and the second part of (A3) corresponds to the sum of squares due to parental type.

However, it is usually a linked marker, rather than the QTL itself, that is being tested. Thus, Eq. (A3) needs to accommodate this fact. Using Tables 2 and 3 in the Materials and Methods section, the new λ can be written as

where b_i is the expected marker genotype effect among progeny in the ith trio class, n_i the number of trios in class i, I_i(j) an indicator variable equal to 1 if the trio is informative and 0 otherwise. Table A1 shows F_j, the mean value of the jth parental type, and f_j, the number of these parental types.

Table A1 Family mean (F_i) and number (f_i)

It is also possible to use all trios, thus setting I_i(j)=1 for all i (j), without increasing the type-I error rate. By doing so, power increases slightly, through augmenting the residual df, and ascertainment of informative families becomes unnecessary.

This method of obtaining λ can be applied to derive the NCP for nested ANOVA; however, the algebra becomes more tedious. Finally, the NCP λ_O can be derived through Eq. (3), although a simpler method was described in the Materials and Methods section.

Appendix B

Let us consider two fixed effects, α and β, where α could represent the factor parental type, and β could represent the genotype of the progeny. Thus, the model can be written as y_ij=μ+α_i+β_j+e_ij, which corresponds to a two-way ANOVA model without interaction. We will now show that the original statistic F_2,n−5 for TDT_Q5 ¹⁴ is equivalent to the F-ratio for testing the effects of β after having corrected for the effects due to μ and α, using the previous model.

For a constant k = 2/(n − 5), we can see that

where SS_{μ,α} and SS_{μ,α,β} are the sums of squares explained by a model that fits μ and α, and by a model that fits μ, α, and β, respectively; SS_T is the total sum of squares; and R²_{β∣μ,α} and R²_e are the proportion of the total variance explained by β, after taking into account the effects of μ and α, and the proportion of unexplained variance, respectively. The null hypothesis of interest is whether factor β explains a significant amount of phenotypic variance over and above the amount explained by μ and α jointly. The F-ratio that appropriately reflects this null hypothesis is given in Eq. (B1).
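The equivalence can be illustrated numerically. The sketch below assumes that Eq. (B1) has the usual extra-sum-of-squares form, F = [(SS_{μ,α,β} − SS_{μ,α})/2] / [(SS_T − SS_{μ,α,β})/(n − 5)], which after dividing numerator and denominator by SS_T becomes R²_{β∣μ,α}/(k R²_e) with k = 2/(n − 5). The sums of squares and the sample size are hypothetical values chosen only for illustration.

```python
import math


def f_from_sums_of_squares(ss_mu_alpha, ss_mu_alpha_beta, ss_total, n):
    """F-ratio for beta after mu and alpha, with 2 and n - 5 df."""
    numerator = (ss_mu_alpha_beta - ss_mu_alpha) / 2
    denominator = (ss_total - ss_mu_alpha_beta) / (n - 5)
    return numerator / denominator


def f_from_r_squared(r2_beta_given_mu_alpha, r2_error, n):
    """The same ratio written with proportions of variance and k = 2/(n - 5)."""
    k = 2 / (n - 5)
    return r2_beta_given_mu_alpha / (k * r2_error)


# Hypothetical values, not taken from the article.
n = 105
ss_mu_alpha, ss_mu_alpha_beta, ss_total = 400.0, 460.0, 1000.0

r2_beta = (ss_mu_alpha_beta - ss_mu_alpha) / ss_total   # variance explained by beta
r2_error = (ss_total - ss_mu_alpha_beta) / ss_total     # unexplained variance

f_ss = f_from_sums_of_squares(ss_mu_alpha, ss_mu_alpha_beta, ss_total, n)
f_r2 = f_from_r_squared(r2_beta, r2_error, n)
print(math.isclose(f_ss, f_r2))
```

Both routes give the same value because SS_T cancels between the numerator and the denominator.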

Appendix C

Let us assume that T is a random variable following a t-distribution, and let σ_T be the standard deviation of T. A first-order Taylor approximation for λ is λ = E(T/σ_T) ≈ E[T]/E[σ_T].³² In order to derive E[T] and E[σ_T], we used the probabilities of the 10 different types of trios and the expected effects of marker genotypes in the progeny contained in Tables 2 and 3. Hence, conditional on p_M, p_Q, c, and D′, E[T] = E[∑_iⁿ (y_i − ȳ)w_i], and because all family trios are independent (ie, unrelated), E[T] = N E[(y − ȳ)w], where y, the phenotype, and w, a weighting factor, are expectations for a single trio (Table 3). Thus, the expected value of the numerator of TDT_R is approximately E[T] = N p_M p_m [p_M² (b₂ − b₃) + p_M p_m (b₅ − b₇) + p_m² (b₈ − b₉)]. When analysing the QTL, and assuming no dominance, the previous equation simplifies to E[T] = N p_Q p_q a.
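As a small worked example, the expression for E[T] can be evaluated directly. The function below is a sketch that implements the formula exactly as written above; the trio-class effects b₂, b₃, b₅, b₇, b₈ and b₉ are defined in Table 3 of the article, which is not reproduced here, so the values passed in are purely hypothetical placeholders.

```python
import math


def expected_tdt_numerator(n_trios, p_M, b):
    """E[T] = N p_M p_m [p_M^2 (b2 - b3) + p_M p_m (b5 - b7) + p_m^2 (b8 - b9)].

    b maps a trio class index to its expected marker genotype effect.
    """
    p_m = 1.0 - p_M
    bracket = (p_M ** 2 * (b[2] - b[3])
               + p_M * p_m * (b[5] - b[7])
               + p_m ** 2 * (b[8] - b[9]))
    return n_trios * p_M * p_m * bracket


# Hypothetical effects (placeholders, not the entries of Table 3).
effects = {2: 1.0, 3: 0.0, 5: 1.0, 7: 0.0, 8: 1.0, 9: 0.0}
e_t = expected_tdt_numerator(100, 0.5, effects)

# With no marker effect in any trio class the expected numerator is zero.
null_effects = {k: 0.0 for k in effects}
e_t_null = expected_tdt_numerator(100, 0.5, null_effects)
print(e_t, e_t_null)
```

Under the null hypothesis every difference inside the bracket is zero, so E[T] = 0 and the NCP vanishes, as expected.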

The expected variance of T, E[σ_T²], is the same regardless of whether the locus being tested is the QTL or a marker. Equation (A1.23a) in Reference³² is , which reduces to if the second term can be ignored. Hence, E[σ_T²] = E[1/4 ∑_iⁿ (y_i − ȳ)² H_i] = 1/4 E[(y − ȳ)² H] and, as the expectation of a random variable X can be obtained by conditioning on another random variable Y, E[X] = E[E[X∣Y]], E[(y − ȳ)² H] = ∑_{H = 0}² H P_H E(y − ȳ)² = p_M p_m σ_e² + σ_QTL². Finally, dividing E[T] by we obtain
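The sum over H can be made concrete under stated assumptions. The sketch below assumes that H is the number of heterozygous parents in a trio (0, 1 or 2), that marker genotypes are in Hardy–Weinberg proportions, and that parents mate at random; these assumptions are not restated in this appendix and the function names are illustrative. Under them, ∑ H P_H = 4 p_M p_m, so the factor 1/4 in E[σ_T²] leaves a term proportional to p_M p_m.

```python
import math


def heterozygous_parent_distribution(p_M):
    """P(H = 0, 1, 2) for two unrelated parents under Hardy-Weinberg proportions."""
    h = 2 * p_M * (1 - p_M)          # probability that a single parent is heterozygous
    return {0: (1 - h) ** 2, 1: 2 * h * (1 - h), 2: h ** 2}


def expected_h(p_M):
    """E[H] = sum over H of H * P_H."""
    return sum(k * prob for k, prob in heterozygous_parent_distribution(p_M).items())


p_M = 0.3
dist = heterozygous_parent_distribution(p_M)
total_probability = sum(dist.values())
mean_h = expected_h(p_M)
print(math.isclose(mean_h, 4 * p_M * (1 - p_M)))
```

The check confirms that a quarter of E[H] equals p_M p_m for the chosen allele frequency, which is the heterozygosity term appearing in the expected variance.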

About this article

Cite this article

Hernández-Sánchez, J., Haley, C. & Visscher, P. Power of QTL detection using association tests with family controls. Eur J Hum Genet 11, 819–827 (2003). https://doi.org/10.1038/sj.ejhg.5201042

Received: 22 October 2002
Revised: 17 March 2003
Accepted: 16 April 2003
Published: 22 October 2003
Issue Date: 01 November 2003
DOI: https://doi.org/10.1038/sj.ejhg.5201042

This article is cited by

Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses
- M J Sillanpää
Heredity (2011)
