Among-environment heteroscedasticity and the estimation and testing of genetic correlation

Dutilleul, Pierre; Carrière, Yves

doi:10.1046/j.1365-2540.1998.00267.x

Original Article
Published: 01 April 1998

Heredity volume 80, pages 403–413 (1998)

Abstract

The genetic correlation between the expressions of a character in two environments is of considerable interest in plant and animal breeding, for the prediction of evolutionary trajectories and for the evaluation of the amount of genetic variance maintained at equilibrium in subdivided populations. The two-way analysis of variance with genotype and environment as crossed factors is the usual basis for estimating this genetic correlation. In plasticity experiments, the genetic variance can differ widely between environments, for instance when the variance component associated with the genotype–environment interaction is not constant over environments. When this is the case, the assumption of homoscedasticity is violated, and the ANOVA method tends to underestimate the absolute value of the genetic correlation. To solve this problem, a variance-stabilizing transformation previously applied in a multivariate ANOVA context was developed further. This development resulted in a new procedure (method 3), in which the genetic correlation is estimated from the transformed data (i.e. after among-environment heteroscedasticity is removed, while the within-environment means are maintained). In a simulation study and an analysis of Chlamydomonas reinhardtii growth rate data, we compared method 3 with two existing methods in which the genetic correlation is estimated from the raw data. Method 1 uses one ‘global’ variance component associated with the genotype–environment interaction, and method 2 uses two variance components associated with the genotype and obtained from one-way ANOVAs conducted separately in the two environments. Under increasing among-environment heteroscedasticity, method 1 produces increasingly biased genetic correlation estimates, whereas method 3 almost always provides accurate estimates; the performance of method 2 is intermediate, but it yields more estimates that are out of range or indeterminate.
This is the first demonstration that a variance-stabilizing transformation of the data removes the bias in the estimation of genetic correlation caused by among-environment heteroscedasticity, while allowing valid statistical testing in an ANOVA-based approach.

Introduction

Organisms allowed to develop in different environments typically show phenotypic plasticity (e.g. Via, 1993; Schlichting & Pigliucci, 1995). A central idea in the evolution of phenotypic plasticity is that a character measured in different environments may represent character states that are more or less correlated genetically (Falconer, 1952; Via & Lande, 1985). Therefore, selection on a trait in a particular environment may affect this trait differently when the selected population is raised in another environment.

Selection gradients and additive genetic variances and covariances have been used in equations to predict the evolutionary trajectory of character states (Via & Lande, 1985). The square root of the heritability of character states and their genetic correlation can be used equivalently in such equations (Falconer, 1989; Grant & Grant, 1995). Moreover, genetic correlation is a dimensionless descriptor of the state of populations (Houle, 1992) that can be used to compare the evolutionary potential of different character states (e.g. Fry et al., 1996). The genetic correlation between character states is therefore of considerable interest in plant and animal breeding, especially when the selective environment differs from the environment in which the improved population will live. It also provides information on the rate at which optimum phenotypes are attained under disruptive selection in a spatially variable environment and on the amount of genetic variance maintained at equilibrium in subdivided populations (Via & Lande, 1985, 1987; Bell, 1992).

The two-way ANOVA with genotype (e.g. clones or sib-groups) and environment as crossed factors is the usual basis for the estimation and significance testing of the genetic correlation (Robertson, 1959; Yamada, 1962; Fry, 1992). The standard method using variance components associated with the genotype and the genotype–environment interaction is based on the assumption that the genetic variance (i.e. the variance computed among the genotypic means) is constant over environments (Yamada, 1962, p. 504). A common finding in plasticity experiments, however, is that the environment affects both the genetic architecture of populations (i.e. the genetic variances and covariances) and the phenotypic expression of genotypes (e.g. Gebhardt & Stearns, 1988; Bell, 1991; Simons & Roff, 1996). Consequently, the usual two-way ANOVA approach tends to underestimate the absolute value of the genetic correlation between character states when the variance component associated with the genotype–environment interaction differs between environments (Yamada, 1962, pp. 504–505). To correct for the bias, the genetic correlation has been estimated with modified formulae (Yamada, 1962, p. 505; Bell, 1990, pp. 306–307). Because the F-ratio test of the correlation is then no longer valid, some authors have recommended, without further justification, running separate one-way ANOVAs in each environment and using the resulting variance components associated with the genotype to estimate the genetic correlation (Via, 1984; Fry, 1992, p. 543).

In the ANOVA approach to the study of phenotypic plasticity, Dutilleul & Potvin (1995) developed various data transformations to remove the statistical nuisance of among-environment heteroscedasticity, of genetic autocorrelation (i.e. when the responses expressed by the same genotype in two different environments are more similar or dissimilar than two randomly associated values), or of both. One of their recommendations was to apply the transformation that removes heteroscedasticity, while taking autocorrelation into account by modified F-testing. The authors suggested (p. 1818) that further investigation was needed before using their transformation in the context of genetic correlation analysis. The present paper develops and validates the variance-stabilizing transformation of Dutilleul & Potvin (1995) in that context. First, we define a new version of the transformation. Secondly, the resulting method of genetic correlation estimation, based on two-way ANOVA of the transformed data, is compared theoretically with the standard method and with the method based on separate one-way ANOVAs in each environment, both performed on the raw data. Thirdly, the accuracy and precision of the three methods are compared in a simulation study, in which the bias and variance of the genetic correlation estimates are analysed in relation to the level of among-environment heteroscedasticity and the number of replicates per genotype and environment. Fourthly, the methods are applied to Chlamydomonas reinhardtii growth rate data published by Bell (1991). Finally, the discussion is extended to other methods that were not studied further because of their lack of generality or their poor performance in preliminary analyses.

The mixed two-way analysis-of-variance model

Following Fry (1992) and Dutilleul & Potvin (1995), the genotype and environment factors are considered random and fixed, respectively; this allows the expected value (i.e. the theoretical mean) of the phenotypic response to differ among environments. The model parameters are those of the ‘SAS model’, in which the genotype variance component represents the variance of the genotype main effects, and not of the ‘Scheffé model’, in which that variance component is the variance of the genotypic means. The SAS model is recommended for its natural application for estimating the genetic correlation and testing whether it differs from zero (Fry, 1992).

In a standard plasticity experiment, the phenotypic response of replicate k (k = 1, ..., r_ij) of genotype i (i = 1, ..., n) in environment j (j = 1, ..., p) can be expressed as

$$Y_{ijk} = \mu + G_i + e_j + Ge_{ij} + \varepsilon_{ijk} \qquad (1)$$

where μ is the intercept; G_i, e_j and Ge_ij are the deviations attributable to genotype i, environment j and the interaction between them, respectively; and the ε_ijk are residual deviations of microenvironmental and individual nature. Whereas μ + e_j = μ_j represents the expected value of the response of an individual in environment j, the other terms are all considered to be normally distributed with an expected value of 0 and a given variance component: σ²_G for G_i, σ²_Ge for Ge_ij under among-environment homoscedasticity, and σ²_ε for the error term. Under among-environment heteroscedasticity, the variance of the interaction term Ge_ij may change from environment to environment, so there may be as many interaction variance components as there are environments: σ²_Ge,j (j = 1, ..., p).

When replicates follow from the replication of the np genotype–environment combinations in different growth chambers, as in the Chlamydomonas example considered here, it is justified to define profile vectors of repeated measures on the same genotype within a growth chamber, y_ik = (Y_i1k, ..., Y_ipk), and to consider the following p-variate model:

$$\mathbf{y}_{ik} = \mathbf{m} + \mathbf{G}_i + \mathbf{e}_{ik} \qquad (2)$$

where m = (μ + e₁, ..., μ + e_p), G_i = (G_i + Ge_i1, ..., G_i + Ge_ip) and e_ik = (ε_i1k, ..., ε_ipk); the last term would incorporate the growth chamber effects. In model (2), m is the mean vector of the profile vectors y_ik, whereas the variances of, and the covariance between, the phenotypic responses of genotype i in environments j and j′ (j ≠ j′) are given by σ²_G + σ²_Ge,j + σ²_ε, σ²_G + σ²_Ge,j′ + σ²_ε and σ²_G, respectively. The variance–covariance structure of the genotypic profiles y_ik can be described by two variance–covariance matrices, Σ_G and Σ_ε. The diagonal entries of Σ_G are given by σ²_G + σ²_Ge,j (j = 1, ..., p), and the off-diagonal ones are the genetic autocovariances (i.e. the genetic correlations multiplied by the square root of the product of the corresponding variances). Matrix Σ_G can therefore be decomposed as $\Sigma_G = D^{0.5}\,\Sigma_{r_{g,0}}\,D^{0.5}$, where D is the diagonal matrix with entries σ²_G + σ²_Ge,j (j = 1, ..., p) and $\Sigma_{r_{g,0}}$ is the genetic autocorrelation matrix, with unit diagonal entries and off-diagonal entries equal to r_g,0. The variance–covariance structure among replicates is assumed to be spherical (i.e. there is independence and homogeneity of variances among replicates): Σ_ε = σ²_ε I_p, where I_p is the p × p identity matrix. The multivariate model (eqn 2) and the assumptions listed above are used in the simulation study.
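
As an illustration of this decomposition, the following minimal NumPy sketch (our own, not part of the PLASTIC program) builds Σ_G from a vector of genetic variances σ²_G + σ²_Ge,j and a genetic autocorrelation matrix whose off-diagonal entries all equal r_g,0, together with the spherical Σ_ε. The function names are hypothetical.

```python
import numpy as np

def genetic_autocorrelation(p, r_g0):
    """Genetic autocorrelation matrix: unit diagonal, r_g0 everywhere else."""
    R = np.full((p, p), float(r_g0))
    np.fill_diagonal(R, 1.0)
    return R

def genetic_covariance(total_var, r_g0):
    """Sigma_G = D^0.5 R D^0.5 with D = diag(sigma2_G + sigma2_Ge_j)."""
    v = np.asarray(total_var, dtype=float)
    d_half = np.diag(np.sqrt(v))
    return d_half @ genetic_autocorrelation(v.size, r_g0) @ d_half

def error_covariance(sigma2_eps, p):
    """Spherical error structure Sigma_eps = sigma2_eps * I_p."""
    return sigma2_eps * np.eye(p)

# Three of the genetic variances that appear in the simulation pattern.
Sigma_G = genetic_covariance([0.21, 1.45, 5.32], r_g0=0.5)
Sigma_eps = error_covariance(0.6, p=3)
```

The diagonal of Sigma_G reproduces the input variances, and every off-diagonal entry divided by the square root of the corresponding diagonal product returns r_g,0.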

When the np genotype–environment combinations are replicated in a completely random way (i.e. there are no ‘blocks’ such as growth chambers), the experimental unit for repeated measurements is the genotype. However, because there is then no link among replicates, the multivariate model applies to the genotypic mean profiles ȳ_i = (Ȳ_i1, ..., Ȳ_ip), where Ȳ_ij denotes the mean response computed across the r_ij replicates available for the genotype–environment combination (i, j) (Dutilleul & Potvin, 1995).
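
For a balanced design (r_ij = r), the genotypic mean profiles and their sample covariance matrix, which later serves as Σ̂ in the transformation, can be sketched as follows. We assume the data are stored as a NumPy array indexed genotype × environment × replicate; that layout and the function names are our own choices.

```python
import numpy as np

def genotypic_mean_profiles(Y):
    """Y: array of shape (n, p, r) = genotype x environment x replicate
    (balanced case). Returns the n x p matrix whose row i is the mean profile."""
    return np.asarray(Y, dtype=float).mean(axis=2)

def covariance_of_mean_profiles(Y):
    """p x p sample covariance matrix (divisor n - 1) of the mean profiles;
    its diagonal holds the among-genotype variances of the means."""
    return np.cov(genotypic_mean_profiles(Y), rowvar=False)

# Two genotypes, two environments, two replicates (made-up numbers).
Y = np.array([[[1.0, 3.0], [4.0, 6.0]],
              [[5.0, 7.0], [8.0, 8.0]]])
means = genotypic_mean_profiles(Y)
S_hat = covariance_of_mean_profiles(Y)
```

With rowvar=False, np.cov treats each row (genotype) as an observation and each column (environment) as a variable.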

The three estimation methods

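All three methods are based on the following equation for the genetic correlation between environments j and j′, written under the SAS model (eqn 1):

$$r_g = \frac{\sigma^2_G}{\sqrt{\left(\sigma^2_G + \sigma^2_{Ge,j}\right)\left(\sigma^2_G + \sigma^2_{Ge,j'}\right)}} \qquad (3)$$
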
To estimate the variance components involved in the calculation of the genetic correlations, we wrote a computer program (PLASTIC) using the SAS/IML language (SAS Institute Inc., 1988). For each of the three methods, the program implements a procedure equivalent to the VARCOMP procedure, option type I (SAS Institute Inc., 1989), in which the variance component estimates are solutions of the system of expected mean squares given in response to the RANDOM statement. The data generated in the simulation study and the real data used in the example are balanced (i.e. the number of replicates is the same for all np genotype–environment combinations: r_ij= r for i=1,..., n; j=1,..., p), so the argument of bias put forward by Fernando et al. (1984) does not apply here.
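
Eqn (3) itself is a one-line computation once the variance components are available. The sketch below (a hypothetical helper, not taken from PLASTIC) returns NaN when either term under the square root is not positive, which is how indeterminate estimates arise in practice.

```python
import math

def genetic_correlation(s2_G, s2_Ge_j, s2_Ge_jp):
    """Eqn (3): r_g = s2_G / sqrt((s2_G + s2_Ge_j) * (s2_G + s2_Ge_jp)).

    The denominator is the geometric mean of the two genetic variances.
    Returns NaN (indeterminate) if either variance is not positive."""
    v_j = s2_G + s2_Ge_j
    v_jp = s2_G + s2_Ge_jp
    if v_j <= 0 or v_jp <= 0:
        return float("nan")
    return s2_G / math.sqrt(v_j * v_jp)

# Homoscedastic case: r_g reduces to s2_G / (s2_G + s2_Ge).
r_homo = genetic_correlation(1.0, 1.0, 1.0)   # 0.5
```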

Method 1

Under model (1) with p = 2, one ‘global’ variance component associated with the genotype–environment interaction is computed over environments j and j′, that is, σ²_Ge,j = σ²_Ge,j′ = σ²_Ge in eqn (3). This method is therefore strictly valid only in the homoscedastic case. The same holds true for testing, when the significance of the genetic correlation is assessed with an F-ratio test based on the genotype mean square divided by the genotype–environment interaction mean square (Fry, 1992, p. 542); with heteroscedastic data, the denominator would tend to be overestimated, resulting in a lack of power of the test (Yamada, 1962).
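
A minimal NumPy sketch of method 1 for balanced data follows. It is an independent illustration, not the SAS/IML code of PLASTIC, and it assumes the unrestricted (SAS) mixed model with genotype random and environment fixed, for which the expected mean squares are E[MS_G] = σ²_ε + rσ²_Ge + rpσ²_G, E[MS_GE] = σ²_ε + rσ²_Ge and E[MS_E] = σ²_ε; solving them gives the variance components.

```python
import numpy as np

def anova_components(Y):
    """Balanced two-way mixed ANOVA with replicates.
    Y: array (n, p, r) = genotype x environment x replicate.
    Returns (s2_G, s2_Ge, MS_E, MS_G, MS_GE)."""
    Y = np.asarray(Y, dtype=float)
    n, p, r = Y.shape
    grand = Y.mean()
    cell = Y.mean(axis=2)                  # n x p cell means
    gen = cell.mean(axis=1)                # genotype means
    env = cell.mean(axis=0)                # environment means
    ss_g = p * r * np.sum((gen - grand) ** 2)
    inter = cell - gen[:, None] - env[None, :] + grand
    ss_ge = r * np.sum(inter ** 2)
    ss_e = np.sum((Y - cell[:, :, None]) ** 2)
    ms_g = ss_g / (n - 1)
    ms_ge = ss_ge / ((n - 1) * (p - 1))
    ms_e = ss_e / (n * p * (r - 1))
    s2_g = (ms_g - ms_ge) / (r * p)        # from E[MS_G] - E[MS_GE]
    s2_ge = (ms_ge - ms_e) / r             # from E[MS_GE] - E[MS_E]
    return s2_g, s2_ge, ms_e, ms_g, ms_ge

def method1(Y):
    """Method 1: r_g = s2_G / (s2_G + s2_Ge), plus the F-ratio MS_G / MS_GE."""
    s2_g, s2_ge, _, ms_g, ms_ge = anova_components(Y)
    denom = s2_g + s2_ge
    r_g = float(s2_g / denom) if denom > 0 else float("nan")
    return r_g, float(ms_g / ms_ge)

# Constructed data: genotypic values g in environment 1 and 3g in environment 2
# (perfectly correlated, ninefold difference in genetic variance),
# with replicates differing by +/-0.01.
g = np.arange(12, dtype=float)
Y = np.empty((12, 2, 2))
Y[:, 0, 0], Y[:, 0, 1] = g + 0.01, g - 0.01
Y[:, 1, 0], Y[:, 1, 1] = 3 * g + 0.01, 3 * g - 0.01
r_g, F = method1(Y)   # r_g is about 0.6, F is about 4.0
```

The constructed example shows the bias discussed above: the genotypic values are perfectly correlated across environments, yet the ‘global’ interaction component absorbs the difference in genetic variance and method 1 returns roughly 0.6.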

Method 2

Fry (1992, p. 543) briefly mentioned that, under among-environment heteroscedasticity (i.e. σ²_Ge,j ≠ σ²_Ge,j′, j ≠ j′), it is preferable first to estimate the variance components associated with the genotype in one-way ANOVAs performed on the raw data in the two environments separately, and then to use the resulting variance component estimates under the square root in eqn (3). Via (1984) had already considered such an estimation procedure, without real justification, and labelled it ‘method 2’.

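The justification for this procedure is as follows. Writing eqn (1) explicitly for environments j and j′ (e.g. j = 1 and j′ = 2) gives

$$Y_{i1k} = \mu + G_i + e_1 + Ge_{i1} + \varepsilon_{i1k}, \qquad Y_{i2k} = \mu + G_i + e_2 + Ge_{i2} + \varepsilon_{i2k}$$
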
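Grouping the terms that do not depend on i (i.e. μ, e₁, e₂) and those that depend on i only (i.e. G_i, Ge_i1, Ge_i2), and leaving the last term, which depends on both i and k (i.e. ε_i1k, ε_i2k), results in

$$Y_{i1k} = \mu_1 + G'_{i1} + \varepsilon_{i1k}, \qquad Y_{i2k} = \mu_2 + G'_{i2} + \varepsilon_{i2k}$$

where μ_j = μ + e_j and G′_ij = G_i + Ge_ij (j = 1, 2).
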
The variance components associated with G′_i1 and G′_i2 are those that need to be estimated (i.e. σ²_G + σ²_Ge,1 and σ²_G + σ²_Ge,2), and each can be obtained from a one-way ANOVA within the corresponding environment. Clearly, negative or zero variance component estimates are a limiting factor for method 2 because of the square root in eqn (3). Furthermore, no test is available, because the denominator of any F-ratio would be a mixture of variance components estimated from correlated data.
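
The one-way ANOVA step of method 2 can be sketched as below. The text does not state where the numerator σ²_G of eqn (3) comes from in this method; here we assume it is the two-way ANOVA estimate and let the caller pass it in. For the constructed data in the example (values g in environment 1 and 3g in environment 2), that estimate is (MS_G − MS_GE)/(rp) = (208 − 52)/4 = 39.

```python
import numpy as np

def one_way_genetic_variance(Yj):
    """One-way ANOVA within a single environment.
    Yj: array (n, r) = genotype x replicate. Returns (MS_G - MS_E) / r,
    which estimates sigma2_G + sigma2_Ge_j."""
    Yj = np.asarray(Yj, dtype=float)
    n, r = Yj.shape
    means = Yj.mean(axis=1)
    ms_g = r * np.sum((means - means.mean()) ** 2) / (n - 1)
    ms_e = np.sum((Yj - means[:, None]) ** 2) / (n * (r - 1))
    return (ms_g - ms_e) / r

def method2(Y, s2_G):
    """Method 2 for the two environments of Y (shape (n, 2, r)).
    s2_G: numerator of eqn (3), assumed to be the two-way ANOVA estimate.
    Returns NaN if either one-way estimate is not positive (indeterminate)."""
    v1 = one_way_genetic_variance(Y[:, 0, :])
    v2 = one_way_genetic_variance(Y[:, 1, :])
    if v1 <= 0 or v2 <= 0:
        return float("nan")
    return float(s2_G / np.sqrt(v1 * v2))

# Constructed data: values g in environment 1, 3g in environment 2, replicates +/-0.01.
g = np.arange(12, dtype=float)
Y = np.empty((12, 2, 2))
Y[:, 0, 0], Y[:, 0, 1] = g + 0.01, g - 0.01
Y[:, 1, 0], Y[:, 1, 1] = 3 * g + 0.01, 3 * g - 0.01
r_g = method2(Y, s2_G=39.0)   # about 1.0
```

Because each environment keeps its own genetic variance in the denominator, the perfect correlation of the constructed data is recovered; the NaN branch corresponds to the indeterminate estimates reported for method 2.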

Method 3

Yamada (1962) stated that ‘the standard two-way analysis [of variance] is no longer valid [for estimating genetic correlations] unless some transformation to make homogeneous variances is made’. Accordingly, Dutilleul & Potvin (1995) proposed a transformation in which the genetic variance of the transformed data was fixed to 1.0 in each environment, while mentioning (p. 1818) that it may be justified to scale the genetic variances to a common value other than 1.0; in all cases, the transformation maintains the within-environment means. The version considered here for the analysis of genetic correlations uses the geometric mean of the genetic variances of environments j and j′ as the common genetic variance after transformation. The basis for that choice is that the denominator in eqn (3) is, by definition, the geometric mean of the variances σ²_G + σ²_Ge,j and σ²_G + σ²_Ge,j′. Using the geometric mean of the two variances in the transformation therefore makes the two terms under the square root in eqn (3) equal, while keeping their product equal to (σ²_G + σ²_Ge,j)(σ²_G + σ²_Ge,j′). The properties of our transformation (i.e. the means are maintained and the variances, made homogeneous, are fixed to an intermediate value) are illustrated in Fig. 1, using the data from the two environments with the most extreme variances in the Chlamydomonas example.

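If Σ̂ denotes the sample covariance matrix estimated from the genotypic mean profiles (Dutilleul & Potvin, 1995, p. 1817), the new transformation can be defined by

$$\mathbf{y}^{*}_{ik} = \bar{\mathbf{y}} + \bar{\sigma}_{\mathrm{geom}}\left[\mathrm{diag}\left(\hat{\Sigma}_{jj'}\right)^{0.5}\right]^{-1}\left(\mathbf{y}_{ik} - \bar{\mathbf{y}}\right) \qquad (4)$$
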
where y_ik is here restricted to environments j and j′ and y*_ik is the corresponding transformed profile vector; ȳ is the overall sample mean vector computed over the y_ik (i = 1, ..., n; k = 1, ..., r); σ̄_geom denotes the positive square root of the geometric mean of the genetic variances of environments j and j′; Σ̂_jj′ is the 2 × 2 submatrix of Σ̂ corresponding to environments j and j′; and diag and ^0.5 denote the diagonal and square root operators of matrix algebra, respectively (Graybill, 1983); other notations are as in eqn (2).
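
A minimal NumPy sketch of eqn (4) for one pair of environments is given below; it is our illustration, not the authors' code. Each deviation from the environment mean is multiplied by σ̄_geom divided by the standard deviation of the genotypic means in that environment, so the environment means are untouched and both genetic variances become the geometric mean of the original ones. The optional `inflation` argument is a hypothetical hook for the small-sample adjustment of method 3; following a literal reading of the text, it is subtracted from the two genetic variances only before the geometric mean is computed.

```python
import numpy as np

def transform_pair(Y, inflation=0.0):
    """Variance-stabilizing transformation of eqn (4) for environments j and j'.

    Y: array (n, 2, r) = genotype x environment x replicate.
    inflation: value subtracted from each genetic variance before the geometric
    mean (e.g. MS_E / r); hypothetical parameter, 0.0 gives eqn (4) as written.
    """
    Y = np.asarray(Y, dtype=float)
    if Y.ndim != 3 or Y.shape[1] != 2:
        raise ValueError("expected an array of shape (n, 2, r)")
    means = Y.mean(axis=2)                     # genotypic mean profiles, n x 2
    v = np.diag(np.cov(means, rowvar=False))   # diagonal of Sigma-hat_jj'
    v_adj = v - inflation
    if np.any(v_adj <= 0):
        raise ValueError("an (adjusted) genetic variance is not positive")
    sigma_geom = np.sqrt(np.sqrt(v_adj[0] * v_adj[1]))  # sqrt of the geometric mean
    ybar = Y.mean(axis=(0, 2))[None, :, None]           # overall mean vector
    scale = (sigma_geom / np.sqrt(v))[None, :, None]    # sigma_geom * diag(.)^-0.5
    return ybar + scale * (Y - ybar)

# Usage on made-up data with very unequal spreads in the two environments.
rng = np.random.default_rng(1)
Y = 5.0 + rng.normal(size=(12, 2, 2)) * np.array([1.0, 3.0])[None, :, None]
Z = transform_pair(Y)
```

The transformation is linear within each environment, so the genotypic correlation between the two environments is unchanged; only the scales are equalized.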

Method 3 uses eqn (3) with the data transformed according to eqn (4), for which σ²_Ge,j = σ²_Ge,j′ = σ²_Ge because the data so transformed are homoscedastic (Fig. 1). Because the genetic variances involved in the geometric mean in eqn (4) are diagonal entries of the Σ̂ matrix, they are inflated by the error variance; this is analogous to the contamination of the product–moment correlation of genotypic means. The contaminating term is σ²_ε divided by r, so the higher the number of replicates, the smaller the contamination (Via, 1984; Roff & Preziosi, 1994). To provide a method that is also valid in small samples, we developed an adjustment for the inflation by subtracting the error mean square divided by r from each of the two genetic variances before computing the geometric mean in eqn (4). The simulation study will show whether this adjustment is effective. Method 3 should then allow valid F-ratio testing based on the genetic correlations estimated from the transformed data, whatever the sample size.

The simulation procedure

Equation (2) was used for simulation with r_ij = r for all (i, j). The simulation parameters were the theoretical genetic correlation, r_g,0 (i.e. the genetic correlation generated in the data and expected from the correct estimation method), the number of replicates per genotype and environment, r, and the level of among-environment heteroscedasticity. The r_g,0-values considered were −1.0, −0.5, 0.0, 0.5 and 1.0, and the numbers of replicates were 2, 4 and 8. Because there are eight environments in the Chlamydomonas example, the heteroscedastic pattern was defined for p = 8 by σ²_G = 0.2 and σ²_Ge,j = 0.01(p + 1 − j)³ (j = 1, ..., 8), so that σ²_G + σ²_Ge,j ranges from 0.21 to 5.32. When the number of environments had to be decreased to three to ensure that $\Sigma_{r_{g,0}}$ was positive semidefinite (so that its square root existed), the three σ²_Ge,j-values considered were 0.01, 0.2263 (the geometric mean of the other two) and 5.12; when p = 2, only the two extreme values were retained. Such ratios of σ²_G + σ²_Ge,j fall within the range of values observed in other studies (e.g. Bell, 1991; A. R. Aldous, P. Dutilleul and M. J. Waterway, unpubl. manuscript).
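
Why the number of environments had to be reduced for negative r_g,0 can be checked numerically. A p × p matrix with unit diagonal and a common off-diagonal value ρ has eigenvalues 1 + (p − 1)ρ and 1 − ρ, so it is positive semidefinite only if ρ ≥ −1/(p − 1): ρ = −0.5 allows at most p = 3 and ρ = −1.0 only p = 2. The sketch below (hypothetical helper names) confirms this with NumPy's symmetric eigenvalue routine and also rebuilds the p = 8 variance pattern.

```python
import numpy as np

def compound_symmetric(p, rho):
    """p x p correlation matrix with unit diagonal and rho elsewhere."""
    R = np.full((p, p), float(rho))
    np.fill_diagonal(R, 1.0)
    return R

def is_psd(R, tol=1e-12):
    """True if the smallest eigenvalue is non-negative (up to rounding)."""
    return bool(np.linalg.eigvalsh(R).min() >= -tol)

def largest_feasible_p(rho, candidates=(2, 3, 8)):
    """Largest candidate number of environments giving a PSD matrix."""
    return max(p for p in candidates if is_psd(compound_symmetric(p, rho)))

# Heteroscedastic pattern for p = 8: sigma2_G = 0.2, sigma2_Ge_j = 0.01 (p + 1 - j)^3
p = 8
j = np.arange(1, p + 1)
total_var = 0.2 + 0.01 * (p + 1 - j) ** 3

for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(rho, largest_feasible_p(rho))
# prints: -1.0 2 / -0.5 3 / 0.0 8 / 0.5 8 / 1.0 8
```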

The multivariate intercept m was the same for all simulation runs, whatever the values of the simulation parameters; it was fixed to 5 + 0.5 exp[0.25(p + 1 − j)] (j = 1, ..., p) in order to mimic the decreasing pattern of the log relative growth rate over environments in the Chlamydomonas example (Dutilleul & Potvin, 1995). Also, the number of genotypes, n, was 12 and the error variance, σ²_ε, was 0.6 (i.e. σ²_ε/σ²_G = 3.0) for all simulation runs.

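Model (2), with the covariance matrices Σ_G and Σ_ε, allows the simulation of genetic correlations of any sign and size for any among-environment variance–covariance pattern. Given an intercept m, a set of σ²_G + σ²_Ge,j (j = 1, ..., p), a genetic autocorrelation matrix $\Sigma_{r_{g,0}}$ and an error variance σ²_ε, a profile vector y_ik can then be simulated as follows:

$$\mathbf{y}_{ik} = \mathbf{m} + \left[\mathrm{diag}\left(\sigma^2_G + \sigma^2_{Ge,j}\right)\right]^{0.5}\Sigma_{r_{g,0}}^{0.5}\,\mathbf{e}_1 + \Sigma_{\varepsilon}^{0.5}\,\mathbf{e}_2 \qquad (5)$$
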
where e₁ and e₂ are two p-variate vectors of pseudorandom numbers from a standard normal distribution with zero mean and unit variance (SAS Institute Inc., 1990; function RANNOR); other notations are as in eqn (4).
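
The simulation step can be sketched in NumPy as below, substituting NumPy's `default_rng().standard_normal` for SAS RANNOR. Two assumptions are ours: e₁ is drawn once per genotype (because G_i in model (2) does not depend on k) and e₂ once per genotype and replicate; and the symmetric square root of $\Sigma_{r_{g,0}}$ is computed from its eigendecomposition, which also works when the matrix is singular but positive semidefinite.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)          # clear tiny negative rounding errors
    return (V * np.sqrt(w)) @ V.T

def simulate(m, total_var, R, s2_eps, n=12, r=2, rng=None):
    """Simulate balanced data (shape n x p x r) following eqn (5).

    m: intercept vector; total_var: sigma2_G + sigma2_Ge_j for j = 1..p;
    R: genetic autocorrelation matrix; s2_eps: error variance.
    Assumption: e1 drawn once per genotype, e2 once per replicate."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(m, dtype=float)
    p = m.size
    # diag(sigma2_G + sigma2_Ge_j)^0.5 times Sigma_r^0.5
    A = np.diag(np.sqrt(np.asarray(total_var, dtype=float))) @ sym_sqrt(R)
    e1 = rng.standard_normal((n, p))         # genetic deviates, one per genotype
    e2 = rng.standard_normal((n, r, p))      # error deviates, one per replicate
    G = e1 @ A.T                             # row i equals A @ e1[i]
    Y = m + G[:, None, :] + np.sqrt(s2_eps) * e2   # Sigma_eps^0.5 = sigma_eps * I_p
    return np.transpose(Y, (0, 2, 1))        # genotype x environment x replicate

# Usage with the intercept and variance patterns described in the text (p = 8).
p = 8
j = np.arange(1, p + 1)
m = 5 + 0.5 * np.exp(0.25 * (p + 1 - j))
total_var = 0.2 + 0.01 * (p + 1 - j) ** 3
Y = simulate(m, total_var, np.eye(p), 0.6, n=12, r=4, rng=np.random.default_rng(42))
```

Here np.eye(p) corresponds to r_g,0 = 0.0; any positive semidefinite autocorrelation matrix can be passed instead.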

For a given set of simulation parameters, the empirical bias of the genetic correlation estimates was calculated for each method as the sample mean of the estimated values minus the theoretical value r_g,0; the empirical variance was given by the sample variance. A standard one-sample t-test was performed to assess the departure of the empirical bias from 0.0; the asymptotic normality of the sample mean was verified empirically. All these outputs are produced by the computer program PLASTIC, which is available from the first author on request and at ftp://gnome.agrenv.mcgill.ca/pub/genetics/software.
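
The bias summary for one set of simulation parameters can be sketched as follows. Two choices are assumptions on our part: indeterminate (NaN) estimates are dropped before averaging, and the two-sided p-value uses a normal approximation to the t distribution, which is reasonable only when the number of simulation runs is large.

```python
import math
import numpy as np

def bias_summary(estimates, r_g0):
    """Empirical bias (mean - r_g0), empirical variance and one-sample
    t statistic for H0: bias = 0, with a normal-approximation p-value."""
    x = np.asarray(estimates, dtype=float)
    x = x[~np.isnan(x)]                    # assumption: drop indeterminate runs
    k = x.size
    bias = float(x.mean() - r_g0)          # empirical bias
    var = float(x.var(ddof=1))             # empirical variance
    t = bias / math.sqrt(var / k)          # t statistic with k - 1 df
    p_normal = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided, normal approx.
    return bias, var, t, p_normal

b, v, t, pv = bias_summary([0.6, 0.7, 0.8, float("nan")], r_g0=0.5)
```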

The Chlamydomonas example

We used part of Bell's (1991) data set to compare the three estimation methods; the same data were used by Dutilleul & Potvin (1995) for illustration. The data set originates from a series of experiments on the ecology and fitness of Chlamydomonas reinhardtii (Bell, 1990, 1991, 1992). The data reanalysed here are log relative growth rates of strain CC-410 (mt⁻) grown in eight environments (i.e. p = 8). Twelve genotypes (i.e. n = 12) were grown in each environment and the design was replicated twice (i.e. r = 2). Experimental and technical details can be found in Bell (1991). We estimated the genetic correlations between the 28 pairs of environments by each method. The estimates obtained by different methods were compared pairwise in bivariate plots using major-axis (model II) regression, which is appropriate because both variables are random; a 95% confidence interval was computed for the slope of the major axis following Sokal & Rohlf (1995, pp. 544–549).
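
The major-axis slope is the direction of the first principal axis of the 2 × 2 sample covariance matrix, b = [s²_y − s²_x + √((s²_y − s²_x)² + 4s²_xy)]/(2s_xy). The sketch below computes only the slope; the confidence interval of Sokal & Rohlf (1995) requires an additional formula that is not reproduced here.

```python
import numpy as np

def major_axis_slope(x, y):
    """Slope of the major axis (model II regression) of y on x."""
    C = np.cov(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    sxx, syy, sxy = C[0, 0], C[1, 1], C[0, 1]
    if sxy == 0:
        raise ValueError("major axis slope is undefined when the covariance is zero")
    d = syy - sxx
    return float((d + np.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy))

slope = major_axis_slope([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0])   # 2.0
```

Unlike ordinary least squares, the major axis treats both variables symmetrically, so swapping x and y gives the reciprocal slope.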

Results and discussion

The simulation study

The primary objective here was to establish the best estimation method using simulated data in which the magnitude and sign of the theoretical genetic correlation r_g,0 are fixed for a given among-environment heteroscedasticity; this will serve as a basis for comparison when the three methods are applied to the Chlamydomonas example. Results are presented in order of decreasing value of the theoretical genetic correlation (Tables 1–5).

Table 1 Empirical bias, ¯r_g− r_g,0, and variance, $S_{r_{g}}^{2}$ , of the genetic correlations estimated by the standard method based on the two-way analysis of variance with replicates of the raw data (method 1), by the method based on two separate one-way analyses of variance with replicates (method 2) and by the new method based on the two-way analysis of variance with replicates of the transformed data (method 3), using heteroscedastic data sets simulated with a theoretical genetic correlation, r_g,0, of 1.0

Table 2 Empirical bias and variance of the genetic correlations estimated by the three methods, using heteroscedastic data sets simulated with a theoretical genetic correlation of 0.5

Table 3 Empirical bias and variance of the genetic correlations estimated by the three methods, using heteroscedastic data sets simulated with a theoretical genetic correlation of 0.0

Table 4 Empirical bias and variance of the genetic correlations estimated by the three methods, using heteroscedastic data sets simulated with a theoretical genetic correlation of −0.5

Table 5 Empirical bias and variance of the genetic correlations estimated by the three methods, using heteroscedastic data sets simulated with a theoretical genetic correlation of −1.0

Under among-environment heteroscedasticity, the following trends are observed. First, the higher the heteroscedasticity and the genetic correlation, the poorer the performance of the standard method (Tables 1, 2, 4 and 5). In fact, when |r_g,0| > 0.0, method 1 is only valid for low heteroscedasticity, whatever the number of replicates, and, as expected (Yamada, 1962; Fry, 1992), the bias is negative for positive r_g,0 and positive for negative r_g,0. Secondly, for a theoretical genetic correlation of 0.0 (Table 3), all three methods perform very well, with no statistically significant bias. Thirdly, as expected on theoretical grounds (see The three estimation methods), method 3 performs well to very well, even when r = 2 and the level of heteroscedasticity is low (see the pair of environments 7–8 in Tables 1 and 2, and the pair 1–2 in Table 4). On the other hand, method 2 gets worse with increasing r, especially for high (negative or positive) r_g,0-values under high heteroscedasticity (Tables 1 and 5). Method 2 also provides less reliable genetic correlation estimates than the other two methods, especially for two replicates, in which case about 35% of the correlation estimates were either out of range or indeterminate because of a negative variance component estimate under the square root in eqn (3). Maximum likelihood estimation of the variance components would not improve the performance of method 2, because negative estimates would be set to zero. Fourthly, the bias of method 1 is almost constant when |r_g,0| > 0.0 under moderate and high heteroscedasticity, whereas there is no evidence of a relationship between bias and number of replicates for methods 2 and 3. The adjustment for inflated genetic variance estimates in method 3 is thus confirmed to be effective; this is reported here for a σ²_ε/σ²_G ratio of 3.0 and was observed for σ²_ε/σ²_G ratios of 4.0 or less (results not reported). Finally, for all methods, the variance tends to decrease when the number of replicates increases.

Under among-environment homoscedasticity (unpubl. results), the three estimation methods behave very similarly in terms of absolute value of the bias and its statistical significance, especially when r_g,0=0.0, with a slight advantage overall for the standard method. In particular, method 1 performs better for high and negative genetic correlation. The most significant biases are for r_g,0=1.0. The lack of reliability mentioned above for method 2 holds true under homoscedasticity.

Overall, the novel method 3 performs better than the other two methods. Fig. 2 illustrates the bias for three non-negative values of r_g,0 when r = 4. When r_g,0 = 0.0, among-environment heteroscedasticity has no effect on the bias, whatever the method. Method 1 is strongly affected by heteroscedasticity when r_g,0 = 0.5 (slope = −0.014, P < 0.001) and 1.0 (slope = −0.030, P < 0.001). In contrast, the slopes for methods 2 and 3 do not differ significantly from zero (P > 0.05) when r_g,0 = 0.5 and 1.0, with a mere tendency to increase for method 2; the intercepts, however, differ significantly from zero (P < 0.001) when r_g,0 = 1.0 (intercept = 0.052 and 0.048 for methods 2 and 3, respectively). Method 2 thus tends to overestimate positive genetic correlations slightly more than method 3.

In conclusion, under among-environment heteroscedasticity, methods 2 and 3 perform better than the standard method 1, except when the theoretical genetic correlation is near zero, in which case all three methods perform very well. Method 3 is recommended in all other cases. Method 2 comes close to it only for moderate genetic correlations under low to moderate heteroscedasticity (e.g. ratios of about 2 to 12 between the genetic variances), and even then it often fails to provide a valid genetic correlation estimate. Increasing the number of replicates per genotype and environment affects the performance of method 2, especially when the genetic correlation is strong, whether positive or negative. Increasing the number of replicates does not affect the performance of method 3, because its adjustment for inflated genetic variance estimates in the data transformation is effective in the range of σ²_ε/σ²_G ratios considered (i.e. 1.0–4.0). This simulation study is the first demonstration that a method of genetic correlation estimation based on data transformation efficiently removes the nuisance effects of among-environment heteroscedasticity in an ANOVA-based approach.

The Chlamydomonas example

Bell's (1991) data are distinctly heteroscedastic, the highest ratio of genetic variances between environments being 32.6 (Dutilleul & Potvin, 1995). Based on the results of the simulation study, we therefore expected method 1 to underestimate the absolute value of the genetic correlation consistently. Indeed, the slope of the major axis between the genetic correlations estimated with methods 1 and 2 is significantly lower than 1.0 (Fig. 3: n = 19, slope = 0.85, 95% confidence interval = [0.76, 0.95]), as is the slope contrasting methods 1 and 3 (Fig. 3: n = 24, slope = 0.84, 95% confidence interval = [0.72, 0.98]). The slope of the major axis between the genetic correlations estimated with methods 2 and 3 is very close to 1.0 (Fig. 3: n = 19, slope = 1.019, 95% confidence interval = [1.001, 1.037]), which indicates that methods 2 and 3 are almost equally unbiased. Nevertheless, method 2 was less reliable than method 3: it provided no estimate of the genetic correlation for seven pairs of environments and yielded two estimates outside the [−1, 1] range, whereas method 3 always produced an estimate, although four of them were out of range. Method 1 also yielded a genetic correlation estimate for all 28 pairs of environments, only one of which was out of range. From the analysis of the Chlamydomonas data, one may conclude that method 1 consistently underestimated the absolute value of the genetic correlation compared with methods 2 and 3, and that method 3 should be preferred to method 2 because of its higher reliability.

Lack of effectiveness of other methods

We complete our discussion by elaborating on other methods that were not retained for the simulation study, either on theoretical grounds or after poor preliminary results. The three methods below are all based on eqn (3) and are applied, respectively, to log-transformed data, to data transformed to a zero mean and a unit variance, and to the raw data within the framework of mixed ANOVA models.

The log transformation is a well-known and very simple variance-stabilizing transformation (e.g. Sokal & Rohlf, 1995). In model (1), it would be applied to all observations Y_ijk indiscriminately, whereas in eqn (4) the data from two environments with unequal genetic variances are transformed differently: the dispersion of the observations from the environment with the higher variance is decreased, whereas that of the observations from the environment with the lower variance is increased (Fig. 1). More importantly, the log transformation modifies the within-environment means and thus the environment main effects in model (1) (i.e. the ‘mean plasticity’; Bell & Lechowicz, 1994), without completely removing the among-environment heteroscedasticity. On that basis, it cannot be recommended.

Transforming all the data from each environment to a zero mean and a unit variance (including the replicates) removes the environment main effects from model (1) (i.e. no term for mean plasticity remains) and imposes a particular among-environment homoscedasticity with a common variance of 1.0 that can sometimes be quite out of range (i.e. much higher or much lower than the variances computed on the raw data). The variance components estimated from such transformed data are of no use per se; two experiments cannot be compared on the basis of their within-environment variances if these are all fixed at 1.0. Furthermore, the common within-environment variance is not a common genetic variance, because it is computed over genotypes and over replicates within genotypes instead of among the genotypic means, and it incorporates the entire error variance, since the variance of observation Y_ijk is σ²_G + σ²_Ge,j + σ²_ε in model (1). This point is particularly important for genetic correlation estimation, in which contamination by the error variance should be absent or at least minimized. Nevertheless, the estimates of the variance components σ²_G and σ²_Ge change after (0, 1) transformation in such a way that the resulting genetic correlation estimates are similar to those provided by method 3 when r = 1, because the genotype–environment interaction is then indistinguishable from the error term in eqn (1). Otherwise, the (0, 1) transformation only approximates method 3 in both estimation and testing. In summary, in the broad framework of plasticity analysis, the (0, 1) transformation is not recommended; it can be used only when an approximation of the genetic correlation is sufficient and no test is required.
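
The (0, 1) transformation described above can be sketched as follows (our illustration; the function name is hypothetical). It makes the two drawbacks visible: every environment mean becomes 0, so mean plasticity disappears, and the unit variance is the total variance over genotypes and replicates, error variance included, not the among-genotype variance of the means.

```python
import numpy as np

def standardize_within_environment(Y):
    """(0, 1) transformation: centre and scale all observations of each
    environment (every genotype and replicate) by that environment's mean
    and standard deviation. Y: array (n, p, r)."""
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=(0, 2), keepdims=True)
    sd = Y.std(axis=(0, 2), ddof=1, keepdims=True)
    return (Y - mu) / sd

# Made-up data: environment means 5 and 8, different spreads.
rng = np.random.default_rng(3)
Y = rng.normal(loc=np.array([5.0, 8.0])[:, None],
               scale=np.array([0.5, 2.0])[:, None], size=(12, 2, 2))
Z = standardize_within_environment(Y)
```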

To recall, the environment main effects, or equivalently the within-environment means, are maintained by the new transformation (eqn 4; Fig. 1; see also Dutilleul & Potvin, 1995). Equation (4) also uses an intermediate common genetic variance, given by the geometric mean of the genetic variances of environments j and j′ (Fig. 1), and method 3 provides the user with an adjustment for the contamination of the genetic variance estimates by the error variance divided by the number of replicates.
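The following Python sketch shows one transformation with the two properties just described. It assumes, as an illustration rather than a transcription of eqn (4) (which is defined earlier in the paper and in Dutilleul & Potvin, 1995), that the deviations from each within-environment mean are multiplied by a constant. For a genetic variance g_j to become the geometric mean sqrt(g_j · g_j′), that constant must be (g_j′ / g_j)^(1/4). The function name `rescale_to_geometric_mean`, the data and the genetic-variance estimates are hypothetical; in practice the genetic variances would first have to be estimated from the data.

```python
import math
from statistics import mean, variance

def rescale_to_geometric_mean(y_j, y_jp, g_j, g_jp):
    """Hypothetical sketch (not eqn 4 verbatim): rescale deviations from each
    within-environment mean so that the genetic variances g_j and g_jp
    (assumed positive, estimated beforehand) both become sqrt(g_j * g_jp).
    Within-environment means are left unchanged."""
    if g_j <= 0 or g_jp <= 0:
        raise ValueError("genetic variance estimates must be positive")
    common = math.sqrt(g_j * g_jp)      # geometric mean of the two genetic variances
    c_j = math.sqrt(common / g_j)       # equals (g_jp / g_j) ** 0.25
    c_jp = math.sqrt(common / g_jp)     # equals (g_j / g_jp) ** 0.25
    m_j, m_jp = mean(y_j), mean(y_jp)
    t_j = [m_j + c_j * (y - m_j) for y in y_j]
    t_jp = [m_jp + c_jp * (y - m_jp) for y in y_jp]
    return t_j, t_jp, c_j, c_jp

# Hypothetical data and genetic-variance estimates for environments j and j'.
y_j = [10.0, 12.0, 14.0, 16.0, 18.0]
y_jp = [20.0, 30.0, 40.0, 50.0, 60.0]
t_j, t_jp, c_j, c_jp = rescale_to_geometric_mean(y_j, y_jp, g_j=4.0, g_jp=100.0)

print(mean(t_j), mean(t_jp))            # within-environment means are kept
print(c_j**2 * 4.0, c_jp**2 * 100.0)    # both approximately 20 = sqrt(4 * 100)
```

Note that this sketch rescales the error component of each observation by the same constant as its genetic component.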

Lastly, PROC MIXED (SAS Institute Inc., 1995) may seem an obvious solution to the problem of among-environment heteroscedasticity in the analysis of genetic correlations. In fact, this procedure carries out a repeated-measures ANOVA [the random vectors y_ik in model (2) are profile vectors of repeated measures on the same genotype within a growth chamber], while estimating one variance component for each random term [genotype main effects and genotype–environment interaction in model (1)] and a variance–covariance matrix for the errors. Unfortunately, when the REPEATED statement of PROC MIXED is used, the variance estimated separately for each environment is an error variance rather than a genotype–environment interaction variance component, and the covariance estimated between environments is computed between the corresponding errors. The correlation derived from this covariance will therefore generally be far from the theoretical genetic correlation.
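The difference between an error correlation and a genetic correlation can be illustrated with a minimal Python simulation. It does not call PROC MIXED or reproduce its REPEATED-statement estimates. It assumes a hypothetical two-environment design in which genotype effects have unequal genetic variances (`v0`, `v1`) and genetic correlation `rho`, while errors are independent across environments. The genetic variance in each environment is estimated as the variance of genotype means minus the pooled error variance divided by the number of replicates `r`; this raw-data estimator is an illustrative assumption, not the authors' method 3. The error correlation is computed between within-genotype residuals paired by replicate across environments. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import mean, variance

def cov(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def simulate(n_geno=2000, r=3, v0=1.0, v1=4.0, rho=0.8, s2=1.0, seed=1):
    """Hypothetical design: genotype effects with variances v0, v1 and
    correlation rho; errors with variance s2, independent across
    environments; r replicates per genotype and environment."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_geno):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        g0 = math.sqrt(v0) * z0
        g1 = math.sqrt(v1) * (rho * z0 + math.sqrt(1.0 - rho**2) * z1)
        y0 = [10.0 + g0 + rng.gauss(0.0, math.sqrt(s2)) for _ in range(r)]
        y1 = [15.0 + g1 + rng.gauss(0.0, math.sqrt(s2)) for _ in range(r)]
        data.append((y0, y1))
    return data

def genetic_correlation(data, r):
    """Covariance of genotype means divided by the square root of the product
    of error-adjusted genetic variances, var(means) - MSE / r."""
    m0 = [mean(y0) for y0, _ in data]
    m1 = [mean(y1) for _, y1 in data]
    mse0 = mean([variance(y0) for y0, _ in data])  # pooled within-genotype variance
    mse1 = mean([variance(y1) for _, y1 in data])
    gv0 = variance(m0) - mse0 / r
    gv1 = variance(m1) - mse1 / r
    if gv0 <= 0 or gv1 <= 0:
        return float("nan")  # estimate indeterminate
    return cov(m0, m1) / math.sqrt(gv0 * gv1)

def error_correlation(data):
    """Correlation between within-genotype residuals, paired by replicate
    across the two environments."""
    e0, e1 = [], []
    for y0, y1 in data:
        m0, m1 = mean(y0), mean(y1)
        for a, b in zip(y0, y1):
            e0.append(a - m0)
            e1.append(b - m1)
    return cov(e0, e1) / math.sqrt(cov(e0, e0) * cov(e1, e1))

data = simulate()
print("genetic correlation estimate:", round(genetic_correlation(data, r=3), 3))
print("error correlation estimate:  ", round(error_correlation(data), 3))
```

Under these assumptions the error-adjusted estimate should track `rho`, whereas the error correlation reflects only the assumed error structure and stays near zero whatever the value of `rho`.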

References

Bell, G. (1990). The ecology and genetics of fitness in Chlamydomonas. I. Genotype-by-environment interaction among pure strains. Proc R Soc B, 240: 295–321.
Bell, G. (1991). The ecology and genetics of fitness in Chlamydomonas. III. Genotype-by-environment interaction within strains. Evolution, 45: 668–679.
Bell, G. (1992). The ecology and genetics of fitness in Chlamydomonas. V. The relationship between genetic correlation and environmental variance. Evolution, 46: 561–566.
Bell, G. and Lechowicz, M. J. (1994). Spatial heterogeneity at small scales and how plants respond to it. In: Caldwell, M. M. and Pearcy, R. W. (eds) Exploitation of Environmental Heterogeneity by Plants: Ecophysiological Processes Above and Below Ground, pp. 391–414. Academic Press, San Diego, CA.
Dutilleul, P. and Potvin, C. (1995). Among-environment heteroscedasticity and genetic autocorrelation: implications for the study of phenotypic plasticity. Genetics, 139: 1815–1829.
Falconer, D. S. (1952). The problem of environment and selection. Am Nat, 86: 293–298.
Falconer, D. S. (1989). Introduction to Quantitative Genetics, 3rd edn. Longman, Harlow, Essex.
Fernando, R. L., Knights, S. A. and Gianola, D. (1984). On a method of estimating the genetic correlation between characters measured in different environmental units. Theor Appl Genet, 67: 175–178.
Fry, J. D. (1992). The mixed-model analysis of variance applied to quantitative genetics: biological meaning of the parameters. Evolution, 46: 540–550.
Fry, J. D., Heinsohn, S. L. and Mackay, T. F. C. (1996). The contribution of new mutations to genotype–environment interaction for fitness in Drosophila melanogaster. Evolution, 50: 2316–2327.
Gebhardt, M. D. and Stearns, S. C. (1988). Reaction norms for developmental time and weight at eclosion in Drosophila mercatorum. J Evol Biol, 1: 335–354.
Grant, P. R. and Grant, B. R. (1995). Predicting microevolutionary responses to directional selection on heritable variation. Evolution, 49: 241–251.
Graybill, F. A. (1983). Matrices with Applications in Statistics, 2nd edn. Wadsworth, Pacific Grove, CA.
Houle, D. (1992). Comparing evolvability and variability of quantitative traits. Genetics, 130: 195–204.
Robertson, A. (1959). The sampling variance of the genetic correlation coefficient. Biometrics, 15: 469–485.
Roff, D. A. and Preziosi, R. (1994). The estimation of the genetic correlation: the use of the jackknife. Heredity, 73: 544–548.
SAS Institute Inc. (1988). SAS/IML™ User's Guide, Release 6.03. SAS Institute Inc., Cary, NC.
SAS Institute Inc. (1989). SAS/STAT® User's Guide, Version 6, 4th edn. SAS Institute Inc., Cary, NC.
SAS Institute Inc. (1990). SAS® Language: Reference, Version 6. SAS Institute Inc., Cary, NC.
SAS Institute Inc. (1995). Introduction to the MIXED Procedure Course Notes. SAS Institute Inc., Cary, NC.
Schlichting, C. D. and Pigliucci, M. (1995). Gene regulation, quantitative genetics and the evolution of reaction norms. Evol Ecol, 9: 154–168.
Simons, A. M. and Roff, D. A. (1996). The effect of a variable environment on the genetic correlation structure of a field cricket. Evolution, 50: 267–275.
Sokal, R. R. and Rohlf, F. J. (1995). Biometry: the Principles and Practice of Statistics in Biological Research, 3rd edn. Freeman, New York.
Via, S. (1984). The quantitative genetics of polyphagy in an insect herbivore. II. Genetic correlations in larval performance within and among host plants. Evolution, 38: 896–905.
Via, S. (1993). Adaptive phenotypic plasticity: target or byproduct of selection in a variable environment? Am Nat, 142: 352–365.
Via, S. and Lande, R. (1985). Genotype–environment interaction and the evolution of phenotypic plasticity. Evolution, 39: 505–522.
Via, S. and Lande, R. (1987). Evolution of genetic variability in a spatially heterogeneous environment: effects of genotype–environment interaction. Genet Res, 49: 147–156.
Yamada, Y. (1962). Genotype by environment interaction and genetic correlation of the same trait under different environments. Jap J Genet, 37: 498–509.

Acknowledgements

The authors are indebted to Dr G. Bell for permission to reanalyse his data in the present paper. We are grateful to Dr M. J. Kearsey and Dr T. J. Crawford for their editorial work, and to an anonymous referee for his suggestions and comments. The research work of both authors is funded through NSERC grants in Ecology and Evolution. The research work of the first author is also supported by FCAR.

Author information

Authors and Affiliations

Department of Plant Science, McGill University, Macdonald Campus, 21111 Lakeshore Road, Ste-Anne-de-Bellevue, H9X 3V9, Québec, Canada
Pierre Dutilleul
Centre de Recherche en Horticulture, Université Laval, Pavillon Envirotron, Cité universitaire, G1K 7P4, Québec, Canada
Yves Carrière

Corresponding author

Correspondence to Pierre Dutilleul.

About this article

Cite this article

Dutilleul, P., Carrière, Y. Among-environment heteroscedasticity and the estimation and testing of genetic correlation. Heredity 80, 403–413 (1998). https://doi.org/10.1046/j.1365-2540.1998.00267.x

Received: 27 September 1996
Published: 01 April 1998
Issue Date: 01 April 1998
DOI: https://doi.org/10.1046/j.1365-2540.1998.00267.x
