Abstract
The power of testing for a population-wide association between a biallelic quantitative trait locus and a linked biallelic marker locus is predicted both empirically and deterministically for several tests. The tests were based on the analysis of variance (ANOVA) and on a number of transmission disequilibrium tests (TDT). Deterministic power predictions made use of family information, and were functions of population parameters including linkage disequilibrium, allele frequencies, and recombination rate. Deterministic power predictions were very close to the empirical power from simulations in all scenarios considered in this study. The different TDTs had very similar power, intermediate between one-way and nested ANOVAs. One-way ANOVA was the only test that was not robust against spurious disequilibrium. Our general framework for predicting power deterministically can be used to predict power in other association tests. Deterministic power calculations are a powerful tool for researchers to plan and evaluate experiments and obviate the need for elaborate simulation studies.
Introduction
Geneticists have been successful in mapping genes underlying rare, monogenic disorders with clear patterns of Mendelian inheritance.1,2,3 However, mapping genes underlying complex traits, such as common multifactorial diseases, has been more difficult.4,5 Genes can be mapped via linkage or association tests. Although both strategies exploit the cosegregation of markers with phenotypes, there are some striking differences between them. For example, in humans, genome-wide searches may require testing 30 000–500 000 single-nucleotide polymorphisms (SNPs) to detect significant associations, compared to 200–400 microsatellites in a linkage analysis.6,7,8 Moreover, theoretical work suggests that finding associations between markers and complex diseases is more powerful than searching for linkage, even if many SNPs have to be tested and significance thresholds are raised to compensate for multiple testing.9 However, not all association tests are robust to spurious associations.10 This explains why association tests using family controls, such as the transmission/disequilibrium test (TDT), have been favoured over association tests using random controls, for example, case-control, because the former are robust to spurious associations caused by population stratification, or recent admixture.11,12,13
Power studies of association tests will help researchers to design appropriate experiments, and to choose the most powerful test for the analysis of data. In this study, we investigate the power of association tests for quantitative traits, with and without family controls. Allison14 proposed five TDTs (TDTQ1–Q5) for analysing quantitative traits under different ascertainment conditions, and we have included the most powerful, TDTQ5, in this study. Long and Langley15 compared the power of five random controls association tests and the TDTQ5, and found that TDTQ5 was always the least powerful test. Nevertheless, these authors acknowledged that under population stratification, the type-I error rate of association tests using random controls can rise above the nominal level set for the experiment. Xiong et al16 proposed the TDTG, an extension of Allison's TDTQ1 that accounts for any number of sibs per family, families with one or two heterozygous parents, and any number of alleles at the marker locus. They found that TDTG is more powerful than TDTQ1, the Haseman–Elston linkage test,17 and an extreme discordant sib pair test.16 Lastly, Rabinowitz18 developed a TDT (referred to here as TDTR) to model explicitly the correlation between a quantitative trait and marker segregation.
In this study, several tests of association have been compared in terms of power, both empirically and deterministically. Our deterministic approximation for predicting power was based on the calculation of noncentrality parameters (NCPs).19 The accuracy of this methodology, which can be used to predict power of other association tests, was validated via simulation.
Materials and methods
Definition of the evaluated association tests
The power of five tests to detect linkage and/or association between a marker locus and a quantitative trait locus (QTL) was studied empirically (simulations) and deterministically (calculation of NCPs). Table 1 shows all the tests used in this study: the one-way analysis of variance (one-way ANOVA), the nested analysis of variance (nested ANOVA), the TDTQ5, the TDTR, and the TDTG.14,16,18,20 A general deterministic method for predicting power at a linked marker is proposed in this study, and implementation examples are given for one-way ANOVA, TDTR, and TDTQ5. The one-way ANOVA test uses a sample of unrelated individuals who have been both genotyped and phenotyped, whereas the other four tests additionally use the genotypes of the parents. Recombination rate and linkage disequilibrium are denoted c and D, respectively.
One-way ANOVA
The one-way ANOVA contrasts marker genotype means among the progeny. This is the simplest and the most powerful test of association, although it is prone to high type-I error rates in the presence of spurious association, viz. disequilibrium without linkage.15 This is because the null hypothesis (H0) tested by one-way ANOVA is no association, regardless of linkage. Therefore, H0 could be rejected when testing unlinked marker loci (c = ½) if there was a sufficiently strong population-wide association (D ≠ 0). This lack of robustness is common to tests that do not use family controls, for example, case–control studies.12,13 The test statistic follows an $F_{2,n'-3}$ distribution under H0, given a total sample size n′ and three different genotype groups.20 The sum of squares between genotype groups, after subtracting the overall mean effect, reflects differences between marker genotypes. Hence, a significant statistic suggests greater differences between genotypes than would be expected under the assumption of linkage equilibrium between the QTL and the marker. Under the alternative hypothesis (H1) of association, the distribution of the test statistic is a noncentral F with NCP equal to λO, or a noncentral $\chi^2_2/2$ for large n′.
Nested ANOVA
A way of overcoming the lack of robustness of one-way ANOVA is to contrast marker genotype means of progeny within parental types, using a nested ANOVA design.20 Parental type represents a particular combination of parental marker genotypes, and family type a particular combination of marker genotypes across all family members (Table 2). Thus, the H0 being tested by nested ANOVA is no association within parental types. There must be at least two progeny with different marker genotypes within each parental type for there to be a contrast; therefore, only those families with at least one heterozygous parent, that is, informative families, are used. This type of family ascertainment increases the degrees of freedom (df) between groups and reduces the df within groups, resulting in a loss of power compared to one-way ANOVA. The appropriate F-test in nested ANOVA is a ratio of between to within genotype mean squares within parental types. The test follows an $F_{\alpha,\beta}$ distribution under H0, where $\alpha = \sum_{i=1}^{\gamma}(n_{g_i} - 1)$, $n_{g_i}$ is the observed number of genotypes within parental type i, γ is the observed number of parental types, and $\beta = n - (\alpha + \gamma)$, where n is the number of informative families.
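The degrees of freedom defined above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that all three informative parental types are observed (MM × Mm, Mm × Mm, Mm × mm) and that each produces every progeny genotype it can (two, three, and two genotypes, respectively).

```python
def nested_anova_df(genotypes_per_parental_type, n_families):
    """Return (alpha, beta) for the nested ANOVA F-test.

    genotypes_per_parental_type: observed number of progeny marker
        genotypes within each observed parental type (the n_gi values).
    n_families: number of informative families (n).
    """
    gamma = len(genotypes_per_parental_type)                 # observed parental types
    alpha = sum(g - 1 for g in genotypes_per_parental_type)  # df between genotypes
    beta = n_families - (alpha + gamma)                      # df within genotypes
    return alpha, beta


# Hypothetical sample: 150 informative trios covering all three
# informative parental types (2, 3, and 2 progeny genotypes).
alpha, beta = nested_anova_df([2, 3, 2], n_families=150)
print(alpha, beta)  # 4 143
```

With four numerator df instead of the two used by one-way ANOVA, the same amount of explained variation yields a smaller mean square, which is consistent with the loss of power described above.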
TDTQ5
The original statistic for TDTQ5 is [(SSF − SSR)/2]/[(SST − SSF)/(n − 5)], where SST is the total sum of squares, SSR is the sum of squares explained by a reduced model that fits an overall mean and two (out of three) informative parental types as fixed factors, and SSF is the sum of squares explained by a full model that fits, in addition to the reduced model, two more fixed factors to estimate additive and dominance effects.14 The total number of informative families is n. TDTQ5 tests whether a significant amount of phenotypic variation can be explained by marker genotypes in the progeny, over and above the variation already explained by parental type. TDTQ5 follows an $F_{2,n-5}$ distribution under H0 if residuals are normally distributed, or $\chi^2_2/2$ for large n. Under H1, TDTQ5 follows a noncentral $F_{2,n-5}$, or a noncentral $\chi^2_2/2$ for large n, with NCP λQ5. The TDTQ5 is equivalent to a two-way ANOVA with a cross-classified design where the factors are parental type and progeny marker genotype (Appendix B).
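As a minimal sketch of how the statistic above is assembled, the following function takes the three sums of squares as inputs and refers the result to the $F_{2,n-5}$ distribution. The function name and the numerical values in the usage lines are hypothetical; fitting the reduced and full models to obtain SSR and SSF is not shown.

```python
from scipy import stats


def tdt_q5(sst, ssr, ssf, n):
    """TDTQ5 F-ratio and its p-value under H0.

    sst: total sum of squares.
    ssr: sum of squares explained by the reduced model (mean + parental type).
    ssf: sum of squares explained by the full model (reduced model plus
         additive and dominance terms for the progeny marker genotype).
    n:   number of informative families.
    """
    numerator = (ssf - ssr) / 2.0          # 2 df for additive + dominance
    denominator = (sst - ssf) / (n - 5)    # residual mean square
    f_stat = numerator / denominator
    p_value = stats.f.sf(f_stat, 2, n - 5)  # upper tail of F(2, n-5)
    return f_stat, p_value


# Made-up sums of squares, for illustration only.
f_stat, p_value = tdt_q5(sst=220.0, ssr=10.0, ssf=22.0, n=105)
print(round(f_stat, 4))  # 3.0303
```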
TDTR
Although Rabinowitz18 derived an NCP, he used parameters not included in his simulations, leading to some confusion in interpreting and calculating λR. Therefore, we developed a simpler NCP for TDTR. The TDTR is calculated as $T/\sigma_T$ for a biallelic marker. T measures the strength of the covariance between the transmission of a marker allele, from heterozygous parents to progeny, and the phenotype of progeny, and $\sigma_T$ is the standard deviation of T. We next describe TDTR in detail, as this information is needed for further statistical developments. The numerator is $T = \sum_{i=1}^{n} (y_i - \bar{y})\,w_i$, where $y_i$ is the phenotype of the ith child, $\bar{y}$ is the overall mean (or the mean among informative families), and $w_i$ are weights given to each family type (Table 3). The sum is over n informative and unrelated family trios randomly drawn from a population. The variance of T is $\sigma_T^2 = \tfrac{1}{4}\sum_{i=1}^{n} (y_i - \bar{y})^2 h_i$, where $h_i$ is the number of heterozygous parents in the family (Table 3). Under H0, TDTR follows a $t_{n-1}$ distribution, so (TDTR)² follows an $F_{1,n-1}$ distribution. Under the alternative hypothesis, (TDTR)² follows a noncentral $F_{1,n-1}$ with NCP λR, or a noncentral $\chi^2_1$ for large n.
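The TDTR statistic described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, the weights w and heterozygous-parent counts h are taken as inputs because Table 3 is not reproduced here, and the values in the usage lines are made up.

```python
import numpy as np
from scipy import stats


def tdt_r(y, w, h):
    """TDTR = T / sigma_T and a two-sided p-value from t with n-1 df.

    y: phenotypes of the children of the n informative trios.
    w: weight of each trio's family type (values from Table 3, assumed given).
    h: number of heterozygous parents in each trio (1 or 2).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    h = np.asarray(h, dtype=float)
    dev = y - y.mean()                        # y_i - ybar
    t_num = np.sum(dev * w)                   # T
    var_t = 0.25 * np.sum(dev ** 2 * h)       # sigma_T^2
    stat = t_num / np.sqrt(var_t)
    p_value = 2.0 * stats.t.sf(abs(stat), df=len(y) - 1)
    return stat, p_value


# Made-up data: the weights here are placeholders, not Table 3 values.
stat, p_value = tdt_r(y=[1.0, 2.0, 3.0, 4.0],
                      w=[-0.5, 0.0, 0.0, 0.5],
                      h=[1, 2, 2, 1])
```

The sketch does not guard against a zero variance, which would occur if every child had the same phenotype.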
TDTG
The last test being considered is TDTG.16 For a biallelic marker where ȳM (ȳm) is the mean among progeny having inherited allele M (m) from heterozygous parents, and nM (nm) is the number of times allele M (m) is transmitted. The variance of (ȳM - ȳm) is
where
and yMk is the phenotype of the child of the kth parent. The latter sum is over the 2n parents in a sample of n family trios. If all family members are Mm heterozygous, then the same information is included in both allele categories. For a normally distributed trait and large n, TDTG follows a $\chi^2_1$ distribution under H0. The asymptotic distribution under H1 is a noncentral $\chi^2_1$ with NCP λG.
Empirical power
Power was calculated empirically as the proportion of significant results out of 1000 analyses of independent data sets, simulated under specific combinations of parameter values. Each sample consisted of n = 200 unrelated family trios (father, mother, and a single child). The frequencies of the positive allele (Q) from a biallelic QTL were pQ = [0.5, 0.3, 0.1], and the same frequencies were assigned to allele M from a biallelic marker linked to the QTL. The recombination rates between the marker and the QTL were c = [0, 0.1, 0.3, 0.4, 0.5]. QTL and marker genotypes were generated for all individuals. Phenotypes were generated only for the progeny by adding a normally distributed error with variance $\sigma_e^2 = 1$ to a genotypic value of −½, 0, or ½ for QTL genotypes qq, qQ, or QQ, respectively. Neither dominance nor polygenic effects were simulated. The level of association between allele Q at the QTL and allele M at the marker was given by the standardised linkage disequilibrium parameter D′ = [0, ½, 1].21
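The simulation scheme above can be sketched as follows. This is a simplified illustration, not the authors' code: it covers only the one-way ANOVA on the progeny, draws parental haplotypes independently from the population haplotype frequencies (random mating), handles D′ ≥ 0 only, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def haplotype_freqs(p_q, p_m, d_prime):
    """Frequencies of haplotypes [QM, Qm, qM, qm], assuming D' >= 0."""
    d = d_prime * min(p_q * (1 - p_m), (1 - p_q) * p_m)
    freqs = np.array([p_q * p_m + d, p_q * (1 - p_m) - d,
                      (1 - p_q) * p_m - d, (1 - p_q) * (1 - p_m) + d])
    freqs = np.clip(freqs, 0.0, None)          # guard against rounding below 0
    return freqs / freqs.sum()


def transmit(n, freqs, c, rng):
    """One gamete from each of n random parents: (Q carried, M carried)."""
    haps = rng.choice(4, size=(n, 2), p=freqs)  # two haplotypes per parent
    has_q = (haps < 2).astype(int)              # codes 0 and 1 carry Q
    has_m = (haps % 2 == 0).astype(int)         # codes 0 and 2 carry M
    rows = np.arange(n)
    pick = rng.integers(0, 2, size=n)           # haplotype supplying the marker
    recomb = rng.random(n) < c                  # QTL allele from the other one
    q_src = np.where(recomb, 1 - pick, pick)
    return has_q[rows, q_src], has_m[rows, pick]


def empirical_power(p, d_prime, c, n=200, a=0.5, reps=1000, alpha=0.05, seed=1):
    """Proportion of replicates in which one-way ANOVA rejects H0."""
    rng = np.random.default_rng(seed)
    freqs = haplotype_freqs(p, p, d_prime)      # same frequency at both loci
    hits = 0
    for _ in range(reps):
        q_f, m_f = transmit(n, freqs, c, rng)   # paternal gametes
        q_m, m_m = transmit(n, freqs, c, rng)   # maternal gametes
        # Genotypic values -a, 0, a for qq, Qq, QQ plus N(0, 1) error.
        y = a * (q_f + q_m - 1) + rng.normal(0.0, 1.0, n)
        marker = m_f + m_m                      # number of M alleles in child
        groups = [y[marker == g] for g in (0, 1, 2) if np.any(marker == g)]
        if len(groups) > 1 and stats.f_oneway(*groups).pvalue < alpha:
            hits += 1
    return hits / reps


print(empirical_power(p=0.5, d_prime=1.0, c=0.0))   # marker is the QTL
print(empirical_power(p=0.5, d_prime=0.0, c=0.0))   # no association
```

Extending the sketch to the TDTs would require retaining the parental marker genotypes, which `transmit` discards here.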
Deterministic power
We have developed a compound method with two parts for predicting power of association tests deterministically. The first part consisted in calculating the expected effect of marker genotypes as functions of underlying QTL genotypes, conditional on population parameters and family type. This part can be used to predict power in other association tests, in addition to the ones in this study. The second part consisted in calculating the NCP as a function of marker contrasts specific to each test.
Expected marker effects
Consider the 10 different family types at a biallelic marker (Table 2), and let Xj be a vector with the marker genotypes of child, father, and mother in a family of type j, for example, X1=[MM, MM, MM]. Let Gi denote the ith QTL genotype of the child, that is, G1=QQ, G2=Qq, and G3=qq. The expected phenotype (y) of a child given the jth family type, assuming no dominance, is

$$E[y \mid X_j] = a\,[P(G_1 \mid X_j) - P(G_3 \mid X_j)], \qquad (1)$$

where a is the effect of substituting allele q for Q, assumed to be ½. The conditional probabilities P(G1∣Xj) and P(G3∣Xj) can be calculated using Tables W1, W2, W3, W4, available on the web (www.nature.com/ejhg/5201042).22,23 For example, the probability of QTL genotype QQ given X1 is
$$P(G_1 \mid X_1) = \frac{P(G_1 \cap X_1)}{P(X_1)} = \frac{h_1^2}{p_M^2},$$

where P(G1 ∩ X1) is the joint probability of QTL genotype QQ in the child and marker genotype MM in all members of the family, P(X1) is the probability of family type 1 (Table 2), and h1 is the probability of drawing haplotype QM from the population which, assuming random mating and no segregation distortion, is h1 = hQM = pQ pM + DQM. (Note: DQM = D′Dmax, and if D′>0 then Dmax = min{pq pM, pQ pm}.)24 The joint probability P(G1 ∩ X1) can be obtained from Table W4 by multiplying the third and the sixth columns and summing the products. The conditional probabilities P[Gi∣Xj], for i=1, 2, 3 and j=1…10, are all summarised in Table 4.
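The haplotype frequency hQM and the conditional probability for family type 1 can be evaluated numerically. The sketch below rests on our own working rather than on the web tables: under random mating, an MM parent transmits allele Q with probability hQM/pM whatever the recombination rate, so P(QQ | all MM) = (hQM/pM)². Function names are hypothetical and only D′ ≥ 0 is handled.

```python
def ld_from_d_prime(p_q, p_m, d_prime):
    """D_QM = D' * D_max, with D_max = min(p_q * p_M, p_Q * p_m) for D' >= 0."""
    if d_prime < 0:
        raise ValueError("this sketch only covers D' >= 0")
    d_max = min((1 - p_q) * p_m, p_q * (1 - p_m))
    return d_prime * d_max


def prob_qq_given_all_mm(p_q, p_m, d_prime):
    """P(child is QQ | child, father, and mother are all MM)."""
    h_qm = p_q * p_m + ld_from_d_prime(p_q, p_m, d_prime)  # haplotype QM
    return (h_qm / p_m) ** 2


# Allele frequency 0.3 at both loci, as in one of the simulated scenarios.
for d_prime in (0.0, 0.5, 1.0):
    print(d_prime, round(prob_qq_given_all_mm(0.3, 0.3, d_prime), 4))
# 0.0 0.09
# 0.5 0.4225
# 1.0 1.0
```

At D′ = 0 the marker carries no information and the probability equals the population frequency of QQ, pQ²; at D′ = 1 with equal allele frequencies the marker genotype determines the QTL genotype.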
Noncentrality parameters (NCP)
The NCP for the one-way ANOVA (λO) can be obtained applying the formula25
The sum in Equation (2) is over all three marker genotype classes, the vector B′ contains the three marker genotype means [μMM, μMm, μmm], and X′X is a matrix with diagonal elements [nMM, nMm, nmm] and zeroes elsewhere, where ni is the sample size corresponding to marker genotype i. Equation (2) represents the sum of squares due to both the marker locus and the sample mean (μ). The appropriate λO can be obtained after subtracting from Eq. (2) the sum of squares due to the sample mean, that is, n′μ2, where n′ = nMM + nMm + nmm. When testing the QTL (ie conditioning on c=0, D′=1, and pQ=pM), and assuming no dominance, Eq. (2) simplifies to
where σQTL2 = 2pQ pq a2.26
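For reference, the QTL variance can be evaluated for the three simulated allele frequencies. The sketch assumes a = ½ and an error variance of 1, as in the simulations; the heritability function (the QTL variance as a fraction of the total phenotypic variance) and both function names are our own illustrative additions.

```python
def qtl_variance(p_q, a):
    """Additive variance of a biallelic QTL without dominance: 2 p_Q p_q a^2."""
    return 2.0 * p_q * (1.0 - p_q) * a ** 2


def qtl_heritability(p_q, a, sigma2_e=1.0):
    """Proportion of the phenotypic variance explained by the QTL."""
    v = qtl_variance(p_q, a)
    return v / (v + sigma2_e)


for p in (0.5, 0.3, 0.1):
    print(p, round(qtl_variance(p, 0.5), 3), round(qtl_heritability(p, 0.5), 4))
# 0.5 0.125 0.1111
# 0.3 0.105 0.095
# 0.1 0.045 0.0431
```

The variance falls by 0.02 between p = 0.5 and p = 0.3 but by 0.06 between p = 0.3 and p = 0.1.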
In Appendix B, we have shown that TDTQ5 is equivalent to a two-way ANOVA analysis, where data are modelled fitting parental type and progeny genotype as fixed factors, in addition to μ. Taking this equivalence into account, the NCP λQ5, derived in Appendix A, is
where bi is the expected marker genotype effect in progeny of family type i (Table 3), ni is the number of type i families, Ii(j) is an indicator variable that takes the value 1 when the family is informative (viz. at least one heterozygous parent), and 0 otherwise, Fj is the mean value of the jth parental type, and fj the number of j parental types. Eq. (4) measures, in σe2 units, the amount of total sum of squares explained by the marker, after subtracting the parental type effect. When testing the QTL, Eq. (4) reduces to
The NCP for TDTR (λR) is approximately
(Appendix C). When testing the QTL, Eq. (6) simplifies to
Finally, the NCP for TDTG (λG) is16
where n is the number of informative families. When testing the QTL in family trios, the appropriate NCP is16
The differences between the four NCPs λO, λQ5, λG, and λR are easily appreciated in Table 5, for both large and small sample sizes. In all cases, the QTL allele frequency and effect size only affect λ through the QTL variance.
Results
Empirical versus deterministic power
We have developed formulae to calculate NCPs for one-way ANOVA (λO), TDTQ5 (λQ5), and TDTR (λR), assuming that the sample consists of family trios. Once these λ's are obtained, power can be calculated from the appropriate noncentral distributions. Xiong et al16 derived the equation for the NCP of TDTG (λG). Figure 1 shows that predictions of power using our deterministic method (lines) match very well the simulation results (points). Power is shown as a function of c for three different allele frequencies denoted with circles (p=0.5), triangles (p=0.3), and squares (p=0.1), while averaging out D′. The NCP of nested ANOVA can also be calculated following this method; however, simulation results showed that nested ANOVA is the least powerful method by far, and therefore we concentrated on deriving the other NCPs. In addition to the close match between deterministic and empirical power, two other features in Figure 1 are worth mentioning. First, power decayed more when p dropped from 0.3 to 0.1, than when it dropped from 0.5 to 0.3. This is because the loss of information is relatively more important in the former than in the latter drop. Second, TDTQ5 was less powerful than TDTR, whereas the contrary was true in Table 6. This can be explained by the fact that, in Figure 1, TDTQ5 was implemented as described by Allison,14 that is, using only informative families, and estimating both additive and dominant effects. The NCP λQ5 was obtained assuming this model. However, the power of TDTQ5 increases when the dominant parameter need not be estimated.
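Once an NCP is available, power follows from the tail of the corresponding noncentral distribution beyond the central critical value. The sketch below uses SciPy's `ncf` and `ncx2` distributions; the function names and the NCP value of 10 are illustrative assumptions, not values from this study. SciPy parameterises noncentrality as the sum of squared standardised means, so an NCP defined with a different convention (for example, with an extra factor of ½) would need rescaling first.

```python
from scipy import stats


def power_noncentral_f(ncp, dfn, dfd, alpha=0.05):
    """Power of an F-test with noncentrality ncp at significance level alpha."""
    crit = stats.f.ppf(1.0 - alpha, dfn, dfd)     # critical value under H0
    return stats.ncf.sf(crit, dfn, dfd, ncp)      # P(F' > crit) under H1


def power_noncentral_chi2(ncp, df, alpha=0.05):
    """Large-sample counterpart using the noncentral chi-square."""
    crit = stats.chi2.ppf(1.0 - alpha, df)
    return stats.ncx2.sf(crit, df, ncp)


# Illustrative NCP of 10 with n = 200 trios.
print(power_noncentral_f(10.0, dfn=1, dfd=199))   # e.g. (TDTR)^2 on F(1, n-1)
print(power_noncentral_f(10.0, dfn=2, dfd=195))   # e.g. TDTQ5 on F(2, n-5)
print(power_noncentral_chi2(10.0, df=1))          # large-n approximation
```

For a fixed NCP, the test with two numerator df has lower power than the test with one, which is why dropping a redundant dominance parameter can help.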
Power ranking with more powerful models via simulations
The power of TDTQ5 increases after removing the dominance parameter from the model when it is redundant, that is, when the QTL has additive effects only. A further improvement in power, albeit slight, can be achieved by using all six parental types, whether informative or not. In doing so, TDTQ5 follows an $F_{1,n'-4}$ distribution under H0, as opposed to an $F_{2,n-5}$, where n′ (n) is the total number of (informative) families. Likewise, one-way ANOVA can become more powerful by fitting a simple regression line across genotypes to estimate additive QTL effects. Thus, one-way ANOVA will be distributed as $F_{1,n'-2}$ under H0, as opposed to $F_{2,n'-3}$. All other tests remained unchanged, and power was estimated for all via simulations.
Table 6 shows empirical power across tests, focusing on each parameter at a time (c, p or D′), averaging across the other two parameters. The ranking of the tests in terms of power was the same across scenarios: first the one-way ANOVA, followed by TDTQ5, TDTG, and TDTR (the last two with similar power), and lastly nested ANOVA. Table 6(a) shows power of the tests for a given c, averaging across values of D′ and p. The last row in Table 6(a) corresponds to the empirical type-I error for each test, ie, c = ½. The one-way ANOVA was the only test for which the empirical error exceeded the nominal 5%. This is caused by the fact that one-way ANOVA is testing whether D′ is significantly different from zero, regardless of c.15 Power declined steadily as c increased, because the amount of σQTL2 explained by the marker decreased as interloci distance increased.
Table 6(b) shows power for a given D′, averaging across values of p and c. The power of one-way ANOVA reached ∼72% when D′=1, being approximately twice as powerful as the TDTs. Undoubtedly, if spurious association is not an issue, significant extra power can be obtained by testing genotype differences directly, as opposed to using robust tests. All tests showed ∼5% type-I error when D′=0, even for c=0.
Finally, Table 6(c) shows power for a given p, averaging across values of c and D′. Power decays as allele frequency becomes more extreme because (1) there are fewer informative families, and (2) the proportion of informative families with two heterozygous parents decreases. The first point directly causes a reduction in sample size. The second point means that less $\sigma_{QTL}^2$ is available to TDTs. TDTs owe their robustness to the fact that they use only within-family genetic variation, which is greater in families with two heterozygous parents. These results contrast with those of Allison,14 who concluded that power increases as p decreases. However, Allison14 kept $\sigma_{QTL}^2$ constant, so as p became more extreme, the QTL effect, and the mean difference between marker genotypes, increased, resulting in more powerful contrasts.
Discussion
A comprehensive review of methodology developed in the 1990s provided more than 60 references of association tests for monogenic diseases with Mendelian inheritance, and only about a dozen references of association tests for complex diseases.27 Nevertheless, complex diseases are by far the commonest human ailments; for example, infectious and parasitic diseases, psychiatric disorders, and cardiovascular diseases affect ∼44% of the world population, compared to just 0.05% of Caucasians being affected by cystic fibrosis, the commonest of the monogenic diseases.28,29
TDTs are increasingly used to identify QTLs underlying complex diseases because they can be more powerful than other tests, for example, linkage analysis, when markers are tightly linked to responsible QTLs, and because they are robust to spurious associations generated by common demographic events such as population stratification and/or admixture.8,10
We have developed and verified deterministic power calculations for a range of association tests for quantitative traits, that is, three TDTs and two ANOVAs, and shown how the power depends on the effect of a QTL, the recombination rate between a QTL and a marker, and the amount of linkage disequilibrium between marker and QTL. In this study, we have assumed that both loci were biallelic, and shared the same allele frequencies. Moreover, we considered a continuously distributed trait genetically determined by a single additive QTL, without polygenic component or dominance. This simplistic scenario was chosen to facilitate the derivation of NCPs for predicting power. Nonetheless, we recognise that a more comprehensive picture of the properties of these tests requires analyses of more realistic situations, for example, including dominance and polygenic effects, which is possible within the framework presented here.
The deterministic method proposed in this study consists in deriving NCPs (λ's) as functions of marker genotype contrasts specific to each test. These λ's can subsequently be used to obtain power. A common feature across all λ's was the use of expected marker genotype means, conditional on family information, under the assumptions of random mating and no segregation distortion. The marker effects were functions of the standardised linkage disequilibrium (D'), the recombination rate (c), the allele frequencies (pQ, pM), and the size of the QTL (a). Allison14 derived λQ5 for TDTQ5 when the marker is the trait locus, and we have obtained an alternative prediction of λQ5 for any recombination rate, and linkage disequilibrium in the parent population.
Power was also predicted empirically via stochastic simulations, and results confirmed the accuracy of our deterministic predictions. The advantages of deterministic over stochastic methods are (1) ease of implementation, (2) instant predictions, and (3) direct appreciation of the relationship between population parameters and power. However, deriving NCPs becomes cumbersome in complex scenarios. Thus, in these cases, empirical simulations are invaluable.
The tests ranked as follows in terms of power. The one-way ANOVA was the most powerful test of association across all scenarios, but also the only test not robust to spurious disequilibrium. The TDTs had similar, and intermediate, power. However, we showed how to increase the power of TDTQ5 compared to the original version, if there is no dominance. Lastly, the nested ANOVA was the least powerful test of association.
The power of TDTQ5 may have been previously overemphasised because complete linkage and linkage disequilibrium between marker and QTL were assumed, and family trios were sampled from a population of informative families.14 This sampling scheme means that the variance explained by the QTL is larger in the sample of informative trios than in the population at large, which would include both informative and noninformative families, and led to the counter-intuitive conclusion that the more extreme the allele frequency, the higher the power of TDTQ5 to detect associations. In addition, Allison's14 comparison between TDTQ5 and the Haseman–Elston linkage test17 favours TDTQ5 because this is a test for association, and a perfect association was assumed, whereas the Haseman-Elston test is for linkage.
In summary, a new and accurate deterministic method has been developed to predict the power of QTL detection for TDTs and ANOVAs, as a function of population parameters. We have obtained specific formulae for the NCPs of the tests, when the marker is the QTL, as functions of sample size and QTL heritability. The method contains a general part (Table 4) that can be used to calculate NCPs for other association tests. Moreover, our method can also model dominant QTL effects, and a polygenic component. Extensions to cope with multiallelic markers are theoretically possible, although future association studies in human populations are more likely to employ vast arrays of SNPs than multiallelic markers.30,31 Therefore, further developments of these approaches ought to be directed to coping with the problem of simultaneous testing of several loci, and the study of haplotypes.
References
Kerem B, Rommens JM, Buchanan JA et al: Identification of the cystic fibrosis gene: genetic analysis. Science 1989; 245: 1073–1079.
Hästbacka J, de la Chapelle A, Kaitila I, Sistonen P, Weaver A, Lander E : Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat Genet 1992; 2: 204–211.
Hästbacka J, de la Chapelle A, Mahtani MM et al: The diastrophic dysplasia gene encodes a novel sulfate transporter: positional cloning by fine-structure linkage disequilibrium mapping. Cell 1994; 78: 1073–1087.
Terwilliger JD, Weiss KM : Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotech 1998; 9: 578–594.
Schork NJ, Cardon LR, Xu X : The future of genetic epidemiology. Trends Genet 1998; 14: 266–272.
Kruglyak L : Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999; 22: 139–144.
Ott J : Predicting the range of linkage disequilibrium. Proc Natl Acad Sci USA 2000; 97: 2–3.
Neale MC, Cherny SS, Sham PC et al: Distinguishing population stratification from genuine allelic effects with MX: association of ADH2 with alcohol consumption. Behav Genet 1999; 29: 233–243.
Risch N, Merikangas K : The future of genetic studies of complex human diseases. Science 1996; 273: 1516–1517.
Wright AF, Carothers AD, Pirastu M : Population choice in mapping genes for complex diseases. Nat Genet 1999; 23: 397–404.
Spielman RS, McGinnis RE, Ewens WJ : Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993; 52: 506–516.
Clayton D : Population association; in Balding DJ, Bishop M, Cannings C (eds) Handbook of statistical genetics. New York: John Wiley & Sons Ltd., 2001, pp 519–540.
Schork NJ, Fallin D, Thiel B et al: The future of genetic case–control studies; in Rao DC, Province MA (eds): Genetic dissection of complex traits (Advances in genetics, Vol 42). US: Academic Press, 2000, pp 191–212.
Allison DB : Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet 1997; 60: 676–690.
Long AD, Langley CH : The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res 1999; 9: 720–731.
Xiong MM, Krushkal J, Boerwinkle E : TDT statistics for mapping quantitative trait loci. Ann Hum Genet 1998; 62: 431–452.
Haseman JK, Elston RC : The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 1972; 2: 3–19.
Rabinowitz D : A transmission disequilibrium test for quantitative trait loci. Hum Hered 1997; 47: 342–350.
Sham PC, Cherny SS, Purcell S, Hewitt JK : Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 66: 1616–1630.
Sokal RR, Rohlf FJ : Biometry. New York US: WH Freeman and Company, 1995.
Lewontin RC : On measures of gametic disequilibrium. Genetics 1988; 120: 849–852.
Jayakar SD : On the detection and estimation of linkage between a locus influencing a quantitative character and a marker locus. Biometrics 1970; 26: 451–464.
Hill AP : Quantitative linkage: a statistical procedure for its detection and estimation. Ann Hum Genet 1975; 38: 439–449.
Weir BS : Genetic data analysis II. Sunderland, US: Sinauer Associates, Inc. 1996.
Searle SR : Linear models. New York: John Wiley & Sons, 1971.
Falconer DS, Mackay TFC : Introduction to quantitative genetics. England: Longman Group Ltd, 1996.
Zhao H : Family-based association studies. Stat Methods Med Res 2000; 9: 563–587.
The World Health Report. Part three: statistical annex. WHO, 1999, www.who.int/whr/1999/en/report.htm.
Underwood JCE : Genetic and environmental causes of disease; in Underwood JCE (ed): General and systematic pathology. London Churchill Livingstone, 1996, pp 31–60.
Weiss KM, Terwilliger JD : How many diseases does it take to map a gene with SNPs? Nat Genet 2000; 26: 151–157.
Miller RD, Kwok PY : The birth and death of human single-nucleotide polymorphisms: new experimental evidence and implications for human history and medicine. Hum Mol Genet 2001; 10: 2195–2198.
Lynch M, Walsh B : Genetics and analysis of quantitative traits. Sunderland, US: Sinauer Associates, Inc., 1998.
Acknowledgements
We are grateful to Ian White and Dr O Southwood for helpful comments on earlier versions of this manuscript. This work has been supported by Sygen International, and by the Biotechnology and Biological Sciences Research Council of UK.
Appendices
Appendix A
The NCP (λ) of two-way ANOVA can be expressed as25
Let σe2 be unity. Let B′ be the vector [μ, f1, f2, f3, g1, g2, g3] of parameters in the model, where μ is the sample mean, fi is the mean of the ith parental type, and gj the mean of the jth marker genotype across all parental types. Let K be a matrix of parameter contrasts reflecting the H0 being tested; for example, if H0: g1=g2 and g2=g3, then
The matrix X′X is
where nij is the number of records in the ith family and jth marker genotype class. X′X is a matrix of order 7 and rank 5; hence, there are seven unknowns and only five df. An appropriate generalisation of X′X is obtained deleting the first row and column, hence setting μ=0, and the last row and column, hence setting g3=0.25 Let G be the reduced X′X matrix. This G matrix can be partitioned as follows:
Then, if C=G−1, K*′ is the matrix K′ with the first and last columns deleted, and B* is the vector B with the first and last elements deleted, then
where $C_{22} = K^{*\prime} C K^{*} = (G_{22} - G_{21} G_{11}^{-1} G_{12})^{-1}$ and $(K^{*\prime} B^{*})' = [g_1 - g_2,\ g_2]$. When testing the QTL, Eq. (A2) gives
where the first part of (A3) corresponds to the sum of squares due to genotype, and the second part of (A3) corresponds to the sum of squares due to parental type.
However, it is usually a linked marker, rather than the QTL itself, that is being tested, and Eq. (A3) needs to accommodate this fact. Using Tables 2 and 3 in the Materials and Methods section, the new λ can be written as
where bi is the expected marker genotype effect among progeny in the ith trio class, ni the number of trios in class i, Ii(j) an indicator variable equal to 1 if the trio is informative and 0 otherwise. Table A1 shows Fj, the mean value of the jth parental type, and fj, the number of these parental types.
It is also possible to use all trios, thus setting Ii(j)=1 for all i (j), without increasing the type-I error rate. By doing so, power increases slightly, through augmenting the residual df, and ascertainment of informative families becomes unnecessary.
This method of obtaining λ can be applied to derive the NCP for nested ANOVA; however, the algebra becomes more tedious. Finally, the NCP λO can be derived through Eq. (3), although a simpler method was described in the Materials and Methods section.
Appendix B
Let us consider two fixed effects, α and β, where α could represent the factor parental type, and β could represent the genotype of the progeny. Thus, the model can be written as $y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$, which corresponds to a two-way ANOVA model without interaction. We will now show that the original statistic $F_{2,n-5}$ for TDTQ5 (Allison14) is equivalent to the F-ratio for testing the effects of β after having corrected for the effects due to μ and α, using the previous model.
For a constant $k = 2/(n - 5)$, we can see that
where SSμα and SSμαβ are the sums of squares explained by a model that fits μ and α, and by a model that fits μ, α, and β, respectively; SST is the total sum of squares; and $R^2_{\beta\mid\mu,\alpha}$ and $R^2_e$ are the proportion of the total variance explained by β, after taking into account the effects of μ and α, and the proportion of unexplained variance, respectively. The null hypothesis of interest is that factor β explains no phenotypic variance over and above the amount explained by μ and α jointly. The F-ratio that appropriately reflects this null hypothesis is given in Eq. (B1).
Appendix C
Let us assume that T is a random variable following a t-distribution, and let $\sigma_T$ be the standard deviation of T. A first-order Taylor approximation for λ is $\lambda = E[T/\sigma_T] \approx E[T]/E[\sigma_T]$.32 In order to derive $E[T]$ and $E[\sigma_T]$, we used the probabilities of the 10 different types of trios and the expected effects of marker genotypes in the progeny contained in Tables 2 and 3. Hence, conditional on pM, pQ, c, and D′, $E[T] = E[\sum_{i=1}^{n} (y_i - \bar{y})\,w_i]$, and because all family trios are independent (ie unrelated) $E[T] = N\,E[(y - \bar{y})\,w]$, where y, the phenotype, and w, a weighting factor, are expectations for a single trio (Table 3). Thus, the expected value of the numerator of TDTR is approximately $E[T] = N p_M p_m [p_M^2 (b_2 - b_3) + p_M p_m (b_5 - b_7) + p_m^2 (b_8 - b_9)]$. When analysing the QTL, and assuming no dominance, the previous equation simplifies to $E[T] = N p_Q p_q a$.
The expected variance of T, E[σT2], is the same regardless of whether the locus being tested is the QTL or a marker. Equation (A1.23a) in Reference32 is , which reduces to if the second term can be ignored. Hence, E[σT2] = E[1/4 ∑in (yi - ȳ)2 Hi] = 1/4E[(y - ȳ)2 H] and, as the expectation of a random variable X given another random variable Y is E[X] = E[E[X∣Y]], E[(y - ȳ)2 H] = ∑H = 02 HPHE(y - ȳ)2 = pM pm σe2 + σQTL2 Finally, dividing E[T] by we obtain
Cite this article
Hernández-Sánchez, J., Haley, C. & Visscher, P. Power of QTL detection using association tests with family controls. Eur J Hum Genet 11, 819–827 (2003). https://doi.org/10.1038/sj.ejhg.5201042