Introduction

Association studies have received substantial interest with the advent of the genomic era. In complex diseases, association studies can offer advantages over linkage studies from both a statistical and a practical point of view.1,2,3,4,5 However, problems with population stratification and failures to reproduce results have drawn criticism and have led to several guidelines on how to conduct the statistical analysis of association studies.6,7,8 A major issue in association studies is that of studywide statistical significance (ie the problem of multiple testing9,10,11). Association studies are often performed by screening many loci in patients and controls and evaluating the frequencies for each locus separately. Patients are then stratified for the predisposing allele of a certain locus and analysed for the allele frequencies of a second locus. With this approach, the number of comparisons increases rapidly and has to be taken into account in the statistical analysis. A stringent correction of P-values (to attain a studywide P-value by procedures such as Bonferroni's or Holm's12) can reduce power, since correction procedures tend to be conservative, and may therefore hamper reproducibility.

Single-locus statistics to test for genetic association with disease make use of several statistical models. The choices are comparisons of allele, phenotype (allele carrier) or genotype frequencies. All of these models have the advantage that they can be analysed in contingency tables, to which a χ2 test or Fisher's exact test can be applied. The use of general linear models has also been proposed,13 although most such models have no direct counterpart in classical genetic association studies. We have developed a statistic to test the following null hypothesis: no allele combinations (ACs) comprising alleles from a fixed number of loci are associated with a given disorder. Given such a set of loci, we define an AC of an individual to be a set of alleles that contains exactly one allele from the genotype at each locus. Therefore, if l loci are considered, 2^l different ACs can be formed when the individual is heterozygous at each locus. ACs represent potential haplotypes if the loci are confined to the same chromosomal region. The number of loci considered for these sets is subject to practical considerations that are discussed below. The disease-modifying effects of ACs can be modelled by extending several genetic models to many loci. Rejection of the null hypothesis indicates that at least one AC is associated with disease; it does not designate any particular AC. Descriptive methods are therefore implemented, including tests of individual ACs with the option of Bonferroni correction. Odds ratios (ORs) for all ACs are computed and displayed graphically. These ORs are computed from AC frequencies, which can be defined in several ways in the multilocus case; we define these frequencies in the Material and methods section. The graphical representation of ORs and other measures of association gives a concise summary of complex data sets and has proven useful for both simulated and real data sets.
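To make the definition of an AC concrete, the following Python sketch enumerates the ACs of a single individual. The genotype encoding (one pair of allele labels per locus) and the function name `allele_combinations` are illustrative assumptions, not part of the authors' software.

```python
from itertools import product


def allele_combinations(genotypes):
    """Return all ACs of one individual, one allele taken per locus.

    `genotypes` holds one (allele, allele) pair per locus. ACs that coincide
    because a locus is homozygous are kept, so the result always has
    2 ** len(genotypes) entries.
    """
    return list(product(*genotypes))


heterozygous = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
print(len(allele_combinations(heterozygous)))  # 8, i.e. 2**3

partly_homozygous = [("A1", "A1"), ("B1", "B2")]
print(sorted(set(allele_combinations(partly_homozygous))))
# [('A1', 'B1'), ('A1', 'B2')]: four ACs, only two distinct
```

The second call shows why a weighting rule for identical ACs is needed as soon as an individual is homozygous at one of the loci.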
We normalize the OR of each combination by comparing it with its expectation based on the ORs of the single alleles. We used a simulated data set to illustrate the problems involved and to show that power gains over a conventional analysis can be substantial. Descriptive methods may be effective in revealing interactions between several loci. We then applied the test statistic and descriptive methods to a case–control study of multiple sclerosis (MS).
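The normalization of an AC's OR against its single-allele expectation can be sketched as follows. This is a minimal illustration under a stated assumption: the predicted OR is taken to be the product of the single-allele ORs (a multiplicative model without interaction). The exact prediction used by the authors is defined in Appendix A and may differ; the function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = cases with/without the AC, c/d = controls with/without."""
    return (a * d) / (b * c)


def normalized_or(observed_or, single_allele_ors):
    """nOR: observed OR of an AC divided by the OR predicted from its single alleles.

    ASSUMPTION: the prediction is the product of the single-allele ORs,
    i.e. a multiplicative model with no interaction between loci.
    """
    return observed_or / math.prod(single_allele_ors)


ac_or = odds_ratio(40, 60, 10, 90)         # 6.0
print(normalized_or(ac_or, [2.0, 1.5]))    # 2.0: AC exceeds the no-interaction expectation
```

Under this assumption an nOR near 1 means the AC's association is explained by its component alleles, while an nOR well above 1 points to a possible epistatic effect.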

Material and methods

When ACs are derived for an individual who is homozygous at some of the loci being investigated, several identical ACs are obtained. These identical ACs can be weighted in different ways; the possibilities investigated here are illustrated in Figure 1. In the additive model, all ACs are formed for each individual irrespective of whether they are identical, and are used to compute AC frequencies in patients and controls. In the single-count model, each AC derived for an individual is counted at most once. The two models thus differ in how homozygous individuals are weighted. When evaluated for single loci, the additive and single-count models correspond to allele and phenotype frequencies, respectively. Genotypes are directly analogous in the single-locus and multilocus cases. To quantify epistatic effects, we consider the quotient of the observed and predicted ORs, hereafter called the normalized OR (nOR) (cf Appendix A). The statistical test can be performed for any of the genetic models. Because of multiple interdependencies in the data set, a bootstrap simulation procedure was applied (cf Appendix A). The data set used for re-evaluation contains unrelated MS patients and healthy control subjects. These groups are ethnically matched and have been used in several association studies so far.14,15,16 Details of clinical MS parameters and other group characteristics have been given previously.14 The most basic information is summarized in Table 1. The data set contains 77 markers, 1187 MS patients and 524 control subjects; 60 992 and 55 498 alleles have been typed in patients and controls, respectively.
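The global test described in this section can be sketched in Python as below. This is a hedged illustration, not the procedure of Appendix A. We assume that the global statistic is the sum of per-AC 2×2 χ2 statistics computed under the single-count model. Because the ACs are interdependent, this sum has no simple reference distribution, so significance is assessed here by permuting case/control labels; that permutation scheme is a swapped-in stand-in for the authors' bootstrap. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations, product


def carrier_counts(cohort, size=2):
    """Single-count model: number of individuals carrying each AC of `size` alleles.

    Each individual is a list of (allele, allele) genotypes, one per locus. ACs are
    keyed by (locus indices, alleles) so equal labels at different loci stay distinct.
    """
    counts = Counter()
    for genotypes in cohort:
        acs = set()
        for loci in combinations(range(len(genotypes)), size):
            for alleles in product(*(genotypes[i] for i in loci)):
                acs.add((loci, alleles))
        counts.update(acs)  # each distinct AC counted once per individual
    return counts


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]] without continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def global_statistic(cases, controls, size=2):
    """ASSUMED form of the global statistic: sum of per-AC chi-square values."""
    ca, co = carrier_counts(cases, size), carrier_counts(controls, size)
    total = 0.0
    for ac in set(ca) | set(co):
        a, c = ca[ac], co[ac]
        total += chi_square_2x2(a, len(cases) - a, c, len(controls) - c)
    return total


def permutation_test(cases, controls, size=2, n_perm=200, seed=1):
    """P-value from shuffling case/control labels (stand-in for the bootstrap)."""
    observed = global_statistic(cases, controls, size)
    pooled = list(cases) + list(controls)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        resampled = global_statistic(pooled[:len(cases)], pooled[len(cases):], size)
        if resampled >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, with 20 cases heterozygous at two loci and 20 controls homozygous at both, the statistic is 120 and the permutation P-value takes its smallest attainable value, 1/201. Resampling whole individuals keeps the dependence between ACs intact, which is the reason for using a resampling null distribution rather than a χ2 one.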

Figure 1

Genetic models as applied and extended to multiple loci. The additive model takes into account all possible combinations of alleles. The single-count model counts any particular AC at most once per individual, and the genotype model takes into account the full genotype at each locus.
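The three counting models of Figure 1 can be sketched in Python as follows. The data layout (a cohort as a list of individuals, each a list of per-locus allele pairs) and the function names are hypothetical; the genotype model is interpreted here as the frequency of full multilocus genotypes, which is one possible reading of the caption.

```python
from collections import Counter
from itertools import product


def additive_frequencies(cohort):
    """Additive model: every AC of every individual counts, duplicates included."""
    counts = Counter()
    for genotypes in cohort:
        counts.update(product(*genotypes))
    total = sum(counts.values())
    return {ac: n / total for ac, n in counts.items()}


def single_count_frequencies(cohort):
    """Single-count model: fraction of individuals carrying each AC at least once."""
    counts = Counter()
    for genotypes in cohort:
        counts.update(set(product(*genotypes)))
    return {ac: n / len(cohort) for ac, n in counts.items()}


def genotype_frequencies(cohort):
    """Genotype model (assumed reading): frequency of each full multilocus genotype."""
    counts = Counter(tuple(tuple(sorted(g)) for g in genotypes) for genotypes in cohort)
    return {key: n / len(cohort) for key, n in counts.items()}
```

Evaluated at a single locus, for a cohort of one A1/A1 and one A1/A2 individual, the additive model gives A1 a frequency of 0.75 (the allele frequency) and the single-count model gives 1.0 (the phenotype, or carrier, frequency), matching the correspondence stated in Material and methods.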

Table 1 Properties of the MS patients and the control group

Simulation study

We evaluated the power of our statistic by simulation under several scenarios. Data sets of variable size were generated according to a coalescent process.17,18 A total of 20 unlinked loci were simulated, mimicking the situation of an association study. To consider different alternative hypotheses, we picked an AC comprising one allele from the first locus and one from the second, and divided the data set into patients and controls such that predetermined AC frequencies were attained in the two groups (cf Appendix A). Frequencies were determined according to the single-count model. The difference in AC frequency between the groups and the sizes of the cohorts were varied, giving rise to several simulated situations. In each case, the test was applied to detect association of ACs comprising two alleles.
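One way to divide a simulated pool into patients and controls with predetermined carrier frequencies of the chosen AC is sketched below. The coalescent simulation itself is not shown; `individuals` stands for its output and `is_carrier` for a hypothetical predicate that tests whether an individual carries the chosen AC. This is our reading of the deterministic assignment, not necessarily the scheme in Appendix A.

```python
import random


def split_with_fixed_frequencies(individuals, is_carrier, n_cases, n_controls,
                                 control_freq, difference, seed=0):
    """Assign individuals so that AC carrier frequencies are fixed in both groups.

    Cases receive round(n_cases * (control_freq + difference)) carriers and
    controls round(n_controls * control_freq) carriers; all others are
    non-carriers. Hypothetical helper, not the authors' code.
    """
    rng = random.Random(seed)
    carriers = [x for x in individuals if is_carrier(x)]
    others = [x for x in individuals if not is_carrier(x)]
    rng.shuffle(carriers)
    rng.shuffle(others)
    k_cases = round(n_cases * (control_freq + difference))
    k_controls = round(n_controls * control_freq)
    n_other_cases = n_cases - k_cases
    n_other_controls = n_controls - k_controls
    if k_cases + k_controls > len(carriers) or n_other_cases + n_other_controls > len(others):
        raise ValueError("simulated pool too small for the requested frequencies")
    cases = carriers[:k_cases] + others[:n_other_cases]
    controls = (carriers[k_cases:k_cases + k_controls]
                + others[n_other_cases:n_other_cases + n_other_controls])
    return cases, controls
```

With 200 patients and 200 controls, a control frequency of 0.19 and a difference of 0.21 (the setting of Figure 2a), this places 80 carriers among patients and 38 among controls.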

Results

Simulation study

Results from the simulation study are summarized in Figure 2. Differences in AC frequencies between patients and controls of 0.21 (Figure 2a) and 0.11 (Figure 2b) were evaluated. Power is excellent for a difference of 0.21, for which 400 individuals suffice to reach a power >0.8. For a difference of 0.11, power drops below 0.8 when the sample size is smaller than 800 individuals (Figure 2b).

Figure 2

Power simulations of a global null hypothesis. Power of the test statistic under the alternative that one AC of size 2 is associated with the disease. Difference (either 0.21 or 0.11; a, b) denotes the surplus fraction of cases bearing the associated AC. The frequency of the predisposing AC in controls is fixed at 19% (a, b). (c) Power under misspecification of the inheritance model: a single AC of size 2 is associated with the disease, but a test statistic for single alleles is employed (for further explanations see text).

Furthermore, we evaluated the effect of choosing the wrong formal genetic model. We simulated data as above but then used single-locus statistics to evaluate the simulated data sets. Figure 2c shows that power is reduced dramatically: power >0.8 cannot be achieved with fewer than 1000 individuals when the difference in AC frequency between the groups is 0.11.

Figure 3 shows descriptive analyses of a simulated data set with 60 loci, produced as for the power simulations. In the lower left part, the size of each spot represents the maximal OR >1 among all ACs at the respective locus pair, including only ACs with at least 50 observations; the minimal OR <1 is shown in the upper right part. A strong association between loci 1 and 2 is evident, together with spurious associations of loci 1 and 2 with other loci. A considerable background of false positives is also present (Figure 3a). This background can be reduced by selecting ACs more stringently for inclusion in the descriptive analysis. Figure 3b shows such an analysis, which requires at least 80 observations; here the causative AC from loci 1 and 2 can be readily identified. To address the spurious associations of alleles from other loci with a predisposing allele at locus 1 or 2, ORs were normalized as described in Material and methods. As shown in Figure 3c (cf Appendix A), these spurious associations could be eliminated. In contrast, single-allele analysis shows strong associations for many loci (Figure 3d).
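The quantities plotted in Figure 3 (per locus pair, the extreme ORs among sufficiently frequent ACs) can be sketched as below. Assumptions: ORs are computed from single-count carrier counts, an AC's "observations" are its carriers in patients and controls combined, and ACs with an undefined or infinite OR are simply skipped. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def pair_carrier_counts(cohort):
    """Single-count carriers of each two-allele AC, keyed by ((i, j), (allele_i, allele_j))."""
    counts = Counter()
    for genotypes in cohort:
        acs = set()
        for i, j in combinations(range(len(genotypes)), 2):
            for alleles in product(genotypes[i], genotypes[j]):
                acs.add(((i, j), alleles))
        counts.update(acs)
    return counts


def or_extremes_by_pair(cases, controls, min_obs=50):
    """Return {(i, j): (min OR, max OR)} over ACs observed at least `min_obs` times."""
    ca, co = pair_carrier_counts(cases), pair_carrier_counts(controls)
    extremes = {}
    for key in set(ca) | set(co):
        a, c = ca[key], co[key]
        b, d = len(cases) - a, len(controls) - c
        if a + c < min_obs or b * c == 0:
            continue  # too rare, or OR undefined/infinite: skipped in this sketch
        odds_ratio = (a * d) / (b * c)
        pair = key[0]
        low, high = extremes.get(pair, (odds_ratio, odds_ratio))
        extremes[pair] = (min(low, odds_ratio), max(high, odds_ratio))
    return extremes
```

Raising `min_obs` (50 in Figure 3a, 80 in Figure 3b) drops rare ACs, whose ORs are the most variable, which is how the stricter threshold thins out the background of false positives.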

Figure 3

Two-way interaction in a simulated data set. Maximal (bottom left) and minimal ORs (top right; a, b) of ACs for a simulated data set according to the single-count model. The size of each point corresponds to the value of the OR on a logarithmic scale. The minimal number of observations for an AC to be included is 50 (a, c) and 80 (b), respectively. nORs are shown in part (c). Minimal ORs (top) and maximal ORs (bottom) of single alleles for the data set are shown in part (d).

MS case–control study

Table 2 lists the loci included in this study. Table 3 lists the results from global hypothesis testing, which are clearly highly significant. Descriptive results for two-way interactions are shown in Figure 4. Results for two different genetic models, additive (Figure 4a) and single-count (Figure 4b), are presented. The results for the two models are similar, yet there are noteworthy differences. For example, associations of the markers NFKBIA-2 through NFKBIA-5 with the marker D18S35 seem to be stronger for the additive than for the single-count model. In contrast, the TNF-1/FGF1 association seen with the single-count model is absent for the additive model. In general, several ACs are striking and fit into pathogenetic hypotheses. We draw attention to combinations that include the allele IFNA-1:07. Associations of the single alleles have been reported previously.14,15 ACs including this predisposing allele and a certain allele of another locus seem to be strongly associated with MS. However, the display of nORs (Figure 4c; single-count model) shows that most of these associations can be explained by the individual effect of the IFNA-1:07 allele. ACs with strong disease association whose single alleles have weak effects include TCRB-5/HLA-DRB1, TCRB-5/NFKB1-2, TNF-1/HLA-DRB1 and HLA-DRB1/D18S364. These ACs display presumed epistatic effects, which correspond to large nORs (Figure 4c). The ACs that are most significant when tested individually in contingency tables are listed in Table 4 for the single-count model.
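Ranking ACs by individual contingency-table tests with a Bonferroni-corrected P-value (Pc) can be sketched as follows. Assumptions: each AC is summarized as a 2×2 table of carrier counts, the test is Pearson's χ2 with 1 df and no continuity correction (Fisher's exact test would be an alternative), and the correction multiplies each P-value by the number of ACs tested. Names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]] without continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def chi2_1df_pvalue(stat):
    """Upper tail of chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2))


def rank_acs_bonferroni(tables):
    """Rank ACs by individual P-value, returning (AC, P, Pc) tuples.

    `tables` maps an AC label to its counts
    (cases with, cases without, controls with, controls without).
    """
    m = len(tables)
    rows = []
    for ac, (a, b, c, d) in tables.items():
        p = chi2_1df_pvalue(chi_square_2x2(a, b, c, d))
        rows.append((ac, p, min(1.0, m * p)))
    return sorted(rows, key=lambda row: row[1])
```

Because m grows with every AC included, Bonferroni-corrected values such as those in Table 4 are conservative, which is why they serve here as one descriptive measure among several.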

Table 2 Results of statistical tests to infer genetic association in the MS data set
Table 3 Loci included in this study of genetic MS association
Figure 4

Two-way interaction for MS. Maximal (bottom left) and minimal ORs (top right) of ACs in the MS data set according to the additive model (a) and the single-count model (b). The size of each point corresponds to the value of the OR on a logarithmic scale. The minimal number of observations for an AC to be included is 10. nORs are shown in part (c). ORs for PPMS are shown in part (d) (minimal number of observations is 30).

Table 4 ACs most strongly associated with MS

Results for the single-count model in primary progressive MS (PPMS) are shown in Figure 4d. Differences between the group of all MS patients and the PPMS group are apparent. Comparing Figures 4b and d, HLA-DRB1 appears to be more important in predisposing to PPMS. The locus D18S41 is also more strongly associated in PPMS patients than in all MS patients. However, nOR analysis revealed no epistatic effects for these loci (data not shown). Epistatic effects are demonstrable for the combinations TNFRSF1A-1/NFKBIA-11, TNFRSF1A-2/NFKBIA-11 and TNFRSF1A-2/NFKBIL1-1, which are present only in PPMS. Further descriptive results, including lists of ACs and graphical analyses for other MS groups and parameters, are presented on a supplementary website (http://www.s-boehringer.de/cd). All software used for this paper is available for download.

Discussion

Our simulation study has shown that the statistical power to detect genetic associations can be excellent when a global null hypothesis is considered. This approach circumvents the need for multiple testing at the cost of not readily identifying the presumably causative ACs. Descriptive methods are therefore needed to weigh individual ACs. Several descriptive measures exist, of which we used P-values of individual tests, ORs and nORs. Each measure highlights a different aspect of the data set, which places a caveat on each of them. The simulated data have shown that ACs may be identified by descriptive measures if all differences between cases and controls are concentrated in a single AC. If several ACs each make a small contribution, as is to be expected in a complex disorder, descriptive measures are less likely to give clear evidence for single factors. The test itself is robust against heterogeneity, since a global null hypothesis is tested and all differences between the groups are summed to produce a single test statistic. No exact rules can be given for how to identify single ACs relevant to disease predisposition. However, several patterns can hint at relevant ACs. The pattern exhibited by IFNA-1:07 shows spurious associations with other alleles in the absence of epistatic effects, which arise from the strong association of the single allele. This effect could be revealed by considering nORs.

As shown previously,15 the phenotype frequency for IFNA-1:07 was 6.3% (OR 12.41, Pc<8 × 10−4). The combination TCRB-5/HLA-DRB1, which corresponds to TCRBV6S3*2 and HLA-DRB1*03, has been described previously.14 The OR for TCRBV6S3*2 was 2.72 (Pc<0.006), for HLA-DRB1*03 it was 1.42 (Pc<0.8) and for the AC TCRBV6S3*2/HLA-DRB1*03 it was 22.03 (Pc<5 × 10−3), compared with an OR of 23.91 (Pc=3.64 × 10−2) in Table 4 (HLA-DRB1:03/TCRB-5:02). The published data correspond to the single-count model (phenotype frequencies) and differ slightly because a few extra probands are included here. In this example, the additive model generates similar results, since the number of individuals homozygous for the relevant HLA-DRB1 allele or the relevant TCRB allele is extremely low. In addition, this example demonstrates that the statistical analysis presented here is concordant with previous evaluations but more efficient, since multiple comparisons can be made in a single step. Moreover, allele combinations that may be overlooked because of borderline significance of certain alleles at single loci will be detected by a comprehensive descriptive analysis. Taking P-values, ORs and nORs into account simultaneously may suggest interesting candidates.

In our statistic, a full model of a given complexity is considered (say, all pairs of loci). This is desirable when sizable data sets are under scrutiny (say, up to about a hundred loci). However, the number of ACs increases polynomially with the number of loci and exponentially with the number of alleles contained in each AC, which imposes a practical limit on applying this test in terms of computation time. The number of alleles contained in each AC is therefore a parameter to be chosen carefully. The bootstrap procedures assume that AC frequencies can be estimated with high accuracy (ie the error in AC frequency estimation is only partially accounted for in the test statistic). We have, therefore, limited our analyses to combinations of two alleles. If SNPs are used throughout, ACs of size three could realistically be screened when fewer than about 1000 probands are present in each group.
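The growth in the number of ACs can be made concrete with a short count. Under the simplifying assumption that every locus has the same number of alleles a, the number of distinct ACs of size k over L loci is C(L, k) · a^k: a polynomial of degree k in L and exponential in k. The function name and the allele counts used below are hypothetical.

```python
from math import comb


def n_allele_combinations(n_loci, alleles_per_locus, size):
    """Distinct ACs of `size` alleles drawn from `size` different loci.

    Simplifying assumption: every locus has the same number of alleles, so the
    count is C(n_loci, size) * alleles_per_locus ** size.
    """
    return comb(n_loci, size) * alleles_per_locus ** size


# 77 loci as in the MS data set, but a hypothetical 10 alleles per locus
print(n_allele_combinations(77, 10, 2))   # 292600 two-allele ACs
print(n_allele_combinations(77, 10, 3))   # 73150000 three-allele ACs
# 100 biallelic SNPs
print(n_allele_combinations(100, 2, 3))   # 1293600 three-allele ACs
```

Moving from pairs to triples multiplies the count by roughly (L − 2)/3 · a, which is why multiallelic markers quickly become impractical at size three while biallelic SNPs remain manageable.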

One issue raised in case–control designs is population stratification. Considering several loci simultaneously can accentuate this confounder. In our simulation study we assumed no stratification. The MS group is closely matched ethnically with the control group.14 Nevertheless, population stratification remains an issue, and certain applications may require our statistic to allow for stratification. Several authors have proposed solutions to the stratification problem.19,20,21,22 Among these are general methods that estimate, for each individual, a degree of membership in population substrata.21 These values can be used to test for association in each substratum and to combine the resulting statistics into a global statistic.

Another issue raised by association studies in complex diseases is that of inferential vs descriptive statistics. We have put forward the notion of looking at association studies from a descriptive point of view.11 Although the statistic presented here provides a P-value (ie multiple testing is not involved), the question of stringent inferential statistics is only deferred rather than answered satisfactorily. The issue is that association studies are continuously ongoing efforts, and tests for global null hypotheses are repeated as the data set grows. We therefore consider this test rather as an additional useful descriptive measure for evaluating complex data sets. In our opinion, the burden of using stringent correction procedures for multiple testing (such as Bonferroni's or Holm's) can be relieved, which should allow a more inclusive presentation of large association studies in terms of the number of loci reported. We add that the test presented can be used more directly to evaluate the relative contributions of different loci: by excluding loci from the test, their contribution to the overall test statistic can be assessed (cf Payami et al23).
Descriptive and especially graphical methods can help to give comprehensive yet concise summaries of complex data sets. Data sets that have grown historically, in particular, can be analysed with our methods, allowing the entire data set to be included whenever the study is expanded. This can give more insight into data that were gathered long before. So far, independent replication is the only means to verify any particular finding; the ACs suggested in this paper to be associated with MS are yet to be replicated.

Still, there are very few examples demonstrating that the complexity of a disorder goes beyond single-locus inheritance (eg Gabriel et al,24 Hugot et al,25 Ogura et al26). Clear evidence that more than a couple of loci increase liability to disease through additive or epistatic effects has yet to be established unequivocally. In part, this is due to a lack of appropriate statistical methods, whose development should therefore be considered a topic of paramount interest. In summary, we consider that the statistical methods presented here substantially improve the analysis of complex diseases when reasonable guidelines are respected.

During the preparation of this manuscript, similar methods were developed. A method to identify genotype combinations has been proposed.27 To reduce the complexity of the set of relevant genotype combinations, its authors group genotype combinations with similar effects into partitions, which are then evaluated for their predictive power by cross-validation. In another study, a global test statistic is computed in a two-stage process.28 First, a sum of test statistics is calculated, similar to the present study, with a variable number of loci considered at this stage. The subset that displays the strongest association according to this initial statistic is then chosen and evaluated for significance by bootstrapping. Simultaneous effects of loci are not considered.