Main

Analysis of individual variation in transcriptional response to drug treatment has successfully identified regulatory genetic variants that interact with treatment in model organisms11 and human tissues12,13,14,15. Cellular transcriptional analysis may be particularly useful for investigating genetic influences on statin efficacy, as statin-induced plasma LDL lowering is controlled through sterol-response element binding protein (SREBP)–mediated transcriptional regulation16. Therefore, to identify novel regulatory variants that interact with statin exposure, we conducted a genome-wide eQTL analysis based on comparing simvastatin exposure versus control exposure of 480 lymphoblastoid cell lines (LCLs) derived from European American participants in the Cholesterol and Pharmacogenetics (CAP) trial (http://www.clinicaltrials.gov/ct2/show/NCT00451828). LCLs have proven to be a useful model system for the study of genetic regulation of gene expression17,18. Although non-genetic sources of variation, if uncontrolled, may limit the utility of LCLs for transcriptional perturbation analyses19,20, there has been increasing use of these cells to screen for genetic variants associated with molecular response to drug intervention20. Furthermore, many features of statin-mediated regulation of cholesterol metabolism are operative in LCLs21.

Simvastatin exposure had a significant effect on gene expression levels for 5,509 of 10,195 expressed genes (54%, false discovery rate (FDR) < 0.0001). The magnitude of change in expression across all responsive genes was small (0.12 ± 0.08 mean absolute log2 change ± s.d., Fig. 1) with 1,952 genes exhibiting ≥10% change in expression and only 21 genes exhibiting ≥50% change in expression. Among the strongest responders were 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR), which encodes the direct target of simvastatin inhibition (0.49 ± 0.29 mean log2 change ± s.d., P < 0.0001, n = 480), and low density lipoprotein receptor (LDLR), which encodes the receptor responsible for internalization of LDL particles (0.50 ± 0.35 mean log2 change ± s.d., P < 0.0001). As expected, surface expression of the LDLR protein was also increased following simvastatin exposure (1.6 ± 0.11 mean log2 change ± s.d., P < 0.0001, n = 474). Gene-set enrichment analysis showed a treatment-dependent increase in expression of genes involved in steroid biosynthesis, consistent with the mechanism responsible for the lipid-lowering response to statin, and a decrease in expression of genes involved in RNA splicing, consistent with evidence for statin regulation of alternative splicing of genes involved in cellular cholesterol homeostasis22 (Supplementary Fig. 1).

Figure 1: Simvastatin treatment alters transcript expression in LCLs.

Log change in expression between simvastatin- and control-exposed lymphoblastoid cell lines (n = 480), displayed as a function of the log sum of expression traits. Grey, genes for which expression was significantly changed in response to simvastatin exposure (n = 5,509 genes, 0.12 ± 0.08 mean absolute log2 change ± s.d., q < 0.0001); black, genes for which expression was not significantly changed (n = 4,686); red, genes in the cholesterol biosynthesis pathway, all of which exhibited significant changes in expression. A.U., arbitrary units.

We first identified eQTLs without considering whether they interact with simvastatin exposure. We computed Bayes factors23 to quantify evidence for the association between every single-nucleotide polymorphism (SNP) and the expression level of each gene, and we used permutations to estimate FDRs (see Methods). This analysis identified 4,590 genes with cis-eQTLs, defined as eQTLs within 1 megabase (Mb) of the gene’s transcription start or end site (FDR = 1%, log10 Bayes factor ≥ 3.24, Supplementary Table 1). Statistical power to detect eQTLs was substantially increased by controlling for known covariates and unknown confounders (represented by principal components of the gene expression data24,25) and by testing for association with expression traits averaged across paired simvastatin- and control-exposed samples to reduce measurement error (Supplementary Table 2 and Supplementary Fig. 2). Our analysis also identified 98 trans-eQTLs at the same stringent FDR (FDR = 1%, log10 Bayes factor ≥ 7.20, Supplementary Table 3).

To identify eQTLs that interact with simvastatin exposure (that is, eQTLs with different effects in control- versus simvastatin-exposed samples, or differential eQTLs), we used two approaches14: first, univariate association mapping of log fold expression change between paired control- and simvastatin-exposed samples; and second, bivariate association mapping of paired control- and simvastatin-exposed samples. This bivariate approach aims to improve power and interpretability by explicitly distinguishing among different modes of interaction (see Methods), which the univariate approach does not distinguish. The univariate approach identified differential cis-eQTLs for four genes: GATM, RSRC1, VPS37D and OR11L1 (FDR = 20%, log10 Bayes factor ≥ 4.9, Supplementary Tables 4 and 5). No differential trans-eQTLs were identified at an FDR of 20%, so trans analyses were not pursued further (see Supplementary Table 6 for top differential trans-eQTLs). The bivariate approach identified differential cis-eQTLs for six genes (FDR = 20%, log10 Bayes factor ≥ 5.1; Supplementary Tables 4 and 7, Supplementary Fig. 3 and Supplementary Data), including two genes that were not identified in the univariate analysis: ATP5SL and ITFG2. Both GATM and VPS37D had significantly stronger eQTL associations under simvastatin-exposed conditions in comparison to control, whereas the other four genes had significantly stronger eQTL associations under control-exposed conditions (Fig. 2a, Supplementary Table 4 and Supplementary Fig. 3). As in similar studies12,13,14,17, we found many fewer differential eQTLs than stable eQTLs, or SNPs with similar effects across both conditions. The finding of relatively few gene by exposure interactions, and of relatively modest effect sizes of those interactions, seems remarkably consistent across studies regardless of method (including family-based comparisons), exposure, sample size, sample source, or the number of stable eQTLs detected. 
We focus further analysis on our most significant differential association from the bivariate model, the GATM locus, for which we observed stronger evidence for eQTL association after statin exposure and for which there was evidence for biological relevance to pathways involved in lipoprotein metabolism and myopathy (see Supplementary data).

Figure 2: Treatment-specific QTL associated with GATM expression.

a, Association of rs9806699 with quantile normalized GATM expression levels following control exposure (left panel, not significant); simvastatin exposure (middle left panel, log10 Bayes factor (BF) = 5.1, effect size = −0.43); fold change (middle right panel, log10 BF = 5.7, effect size = −0.40); control versus simvastatin-exposed GATM expression (right panel; black, GG, n = 225; red, GA, n = 207; green, AA, n = 48). Box height and whiskers are described in Supplementary Methods. b, Top panel, SNPs associated with GATM expression (log10 BF, left y axis); SNPs associated with statin-induced myopathy (red); significance threshold (dotted line); recombination rates in centimorgans (cM) per megabase (Mb) (blue, right y axis). Bottom panel, transcribed genes (green); DNase I hypersensitive (DHS) sites and transcription factor binding sites (TFBS; black); predicted chromosomal enhancers (orange) and promoters (red) as identified in hepatocyte (HepG2), lymphoblastoid (GM12878), and myocyte (HSMM) cell lines, using ChromHMM software (see Methods).

GATM encodes glycine amidinotransferase, an enzyme that is required for the synthesis of creatine. We observed evidence for differential eQTL association with GATM (log10 Bayes factor > 5.1) across a group of 51 SNPs within the GATM locus that are in linkage disequilibrium (chromosome 15: 45627979–45740392, hg19, r2 = 0.85–0.99, n = 587). The most significant differential eQTL association was observed with SNP rs9806699 (minor allele frequency (MAF) = 0.32), for which we observed stronger evidence for an association with GATM expression following simvastatin exposure (log10 Bayes factor = 5.1, effect size = −0.43) than following control exposure (log10 Bayes factor = 0.52, effect size = −0.17, Fig. 2a). SNPs at this locus also had a stable association with expression of a neighbouring gene, SPATA5L1 (differential eQTL rs9806699 log10 Bayes factor = −0.33, stable eQTL rs9806699 log10 Bayes factor = 21.75, Supplementary Fig. 4). This locus has been shown previously to be associated with reduced glomerular filtration rate (GFR)26 with a small effect size (<1%). This association was specific to GFR as estimated from plasma creatinine but not from cystatin C, a second biomarker of renal function, suggesting that the association was related to variation in creatinine production rather than renal elimination. We found evidence for SNP differential association with GATM that spans the GATM coding region and includes multiple SNPs located within DNase I hypersensitive sites, active promoters and several alternative GATM transcription start sites (Fig. 2b).

Phosphorylation of creatine, the primary downstream product of GATM activity, is a major mechanism for energy storage in muscle and is mediated by creatine kinase, the primary plasma biomarker of statin-induced myopathy. To test the relationship of this locus with statin-induced myopathy, we examined the association of the GATM differential eQTL locus with statin-induced myopathy in a population-based cohort comprising 72 cases of myopathy and 220 matched controls (Marshfield cohort)27. In this cohort, we observed that the minor allele at the GATM differential eQTL locus was associated with reduced incidence of statin-induced myopathy (odds ratio = 0.61, 95% confidence interval = 0.39–0.95, P = 0.03; Table 1). This association was replicated in a second cohort consisting of 100 cases of myopathy identified within the Study of Effectiveness of Additional Reductions in Cholesterol and Homocysteine (SEARCH; http://clinicaltrials.gov/ct2/show/NCT00124072)10 (odds ratio for rs1719247 = 0.61, confidence interval = 0.42–0.88, P = 0.01; r2 = 0.70 to rs9806699; Table 1). Meta-analysis of these two cohorts showed an overall odds ratio of 0.60 (confidence interval = 0.45–0.81, P = 6 × 10−4, log10 Bayes factor = 1.5, Table 1). As myopathy is defined in part through elevation in plasma creatine kinase concentrations, we also tested for a direct association of this locus with this enzyme in statin-treated populations in which myopathy was not observed. Within CAP (40 mg per day simvastatin exposure for 6 weeks), no association of rs9806699 was observed with plasma creatine kinase either before simvastatin exposure (n = 575, P = 0.83) or following exposure (n = 574, P = 0.48).
This lack of association was confirmed in a second statin study (Justification for the Use of Statins in Prevention: an Intervention Trial Evaluating Rosuvastatin (JUPITER) trial, 20 mg per day rosuvastatin, median follow-up = 1.9 years; http://clinicaltrials.gov/show/NCT00239681) both before rosuvastatin exposure (n = 8,504, P = 0.54) and after treatment (n = 3,052, P = 0.83)3. These findings suggest that the observed association of the GATM locus with risk for statin-induced myopathy is independent of an association with plasma creatine kinase. Although the present studies do not address the mechanism for the link between reduced GATM expression and protection from statin-induced myopathy, it is thought that diminished capacity for phosphocreatine storage modifies cellular energy storage and adenosine monophosphate-activated protein kinase (AMPK) signalling28,29 in a manner that is protective against cellular stress as induced by glucose deprivation29 or, potentially, by cholesterol depletion. Given that myocellular creatine stores are predominantly derived from renal and hepatic creatine biosynthesis, these results raise the possibility that statins may predispose to muscle toxicity in part through metabolic effects in the liver, the major site of statin's pharmacologic actions (Supplementary Fig. 5). Conversely, the finding of severe myopathy in two cases of extreme genetic GATM deficiency30 suggests that this protective effect may be overcome if creatine synthesis is insufficient to support myocellular energy needs.

Table 1 Associations of SNPs at the GATM locus with statin-induced myopathy.

Given the influence of statin exposure on regulation of GATM expression, we next tested whether GATM may modulate sterol-mediated changes in cholesterol homeostasis. Knockdown of GATM in hepatocyte-derived cell lines (HepG2 and Huh7) resulted in reduced upregulation of SREBP-responsive genes (HMGCR, LDLR and SREBP2) by sterol depletion (Fig. 3a). Moreover, GATM knockdown decreased media accumulation of apolipoprotein B (apoB), the major structural protein of LDL, in both cell lines (P < 0.05; Fig. 3b), but did not alter levels of apoAI, the major structural protein in high density lipoproteins (HDLs, Fig. 3b). An effect of GATM deficiency on cholesterol and lipoprotein metabolism is supported further by a recent study describing reduced plasma cholesterol concentrations in GATM knockout mice28.

Figure 3: GATM knockdown attenuated sterol-mediated induction of expression of SREBP-responsive genes.

a, Changes in transcript concentrations following sterol depletion through 24 h exposure to lipoprotein deficient serum (LPDS)-containing media versus standard FBS-containing media in hepatocyte-derived HepG2 (left, n = 12) and Huh7 (right, n = 12) cell lines. Asterisk indicates P < 0.05 for the comparison of GATM versus non-targeting control (NTC) siRNA treated cells. b, Fold changes in accumulation of apolipoprotein B (ApoB) and apolipoprotein AI (ApoAI) in media after gene knockdown with GATM versus NTC siRNA in HepG2 cells (left, n = 6–10) or Huh7 cells (right, n = 4–6) under standard culture conditions. Experiments repeated 2–3 times with 2–8 biological replicates each. Data presented as average values. Error bars, s.e.m.

In summary, this study has provided evidence that functionally significant genetic effects can be discovered using a novel cell-based screen for gene-by-treatment effects on transcriptional expression. This approach has led to the identification of GATM as a genetic locus associated with statin-induced myopathy, and as a potential link between cellular cholesterol homeostasis and energy metabolism.

Methods Summary

Gene expression levels were measured using the Illumina HumanRef-8v3 beadarray in 480 lymphoblastoid cell lines derived from European American participants in CAP, a 6-week trial of simvastatin (40 mg per day), after 24 h exposure to 2 μM activated simvastatin or control buffer. Treatment-specific effects were modelled after adjustment for known covariates and unknown confounding variables using linear regression, and eQTLs were identified using the BIMBAM software after imputing the available genotypes. Differential eQTLs were identified using the BIMBAM software (univariate test) as well as linear models of differential association (bivariate tests). Associations with myopathy were tested in two cohorts containing cases of statin-induced myopathy (definitions of myopathy in Marshfield and SEARCH are described in the Supplementary Methods), and associations with plasma creatine kinase were tested in two statin trials that did not contain myopathy cases (CAP and JUPITER). Media accumulation of apolipoproteins was measured by enzyme-linked immunosorbent assay (ELISA) and gene expression was measured by quantitative PCR in hepatoma cell lines (HepG2 and Huh7) after GATM knockdown as achieved by 48 h transfection of Ambion Silencer Select short interfering RNA (siRNA) or non-targeting control. See full Methods for complete details.

Online Methods

In vitro simvastatin exposure of lymphoblastoid cell lines

Lymphoblastoid cell lines (LCLs), immortalized by Epstein–Barr virus transformation of lymphocytes isolated from whole blood31, were derived from European American participants in the CAP trial, a 6-week simvastatin trial of 40 mg per day (Supplementary Table 8)2. Simvastatin was provided by Merck, converted to active form (beta-hydroxy simvastatin acid, SVA) and quantified by liquid chromatography–tandem mass spectrometry as described previously21. LCLs were normalized to a uniform cell density and exposed to 2 μM SVA (simvastatin-exposed) or control buffer (control-exposed) for 24 h as described previously21. This concentration was selected by assessing dose–response effects on expression profiles (n = 8 LCLs, 4 doses), wherein a more robust change in expression profiles was observed with 2 μM simvastatin exposure (7.8% of genes, q = 0.001) than lower doses (<0.1% of genes for 0.02 μM or 0.2 μM, q = 0.001, data not shown). Pre-experiment cell density was recorded as a surrogate for cell growth rate. After exposure, cells were lysed in RNAlater (Ambion), and RNA was isolated using the Qiagen miniprep RNA isolation kit with column DNase treatment.

Expression profiling and differential expression analysis

RNA quantity and quality were assessed using a Nanodrop ND-1000 spectrophotometer and an Agilent Bioanalyzer, respectively. Paired RNA samples, selected based on RNA quality and quantity, were amplified and labelled with biotin using the Illumina TotalPrep-96 RNA amplification kit, hybridized to Illumina HumanRef-8v3 beadarrays (Illumina), and scanned using an Illumina BeadXpress reader. Data were read into GenomeStudio and samples were selected for inclusion based on quality-control criteria: signal to noise ratio (95th:5th percentiles); matched gender between sample and data; and average correlation of expression profiles within three standard deviations of the within-group mean (r = 0.99 ± 0.0093 for control-exposed and r = 0.98 ± 0.0071 for simvastatin-exposed beadarrays). In total, viable expression data were obtained from 1,040 beadarrays including 480 sets of paired samples for 10,195 genes. Genes were annotated through biomaRt from Ensembl Build 54 (http://may2009.archive.ensembl.org/biomart/martview). Treatment-specific effects were modelled from the data following adjustment for known covariates using linear regression32. False discovery rates were calculated for differentially expressed transcripts using the qvalue package33. Ontological enrichment in differentially expressed gene sets was measured using GSEA (1,000 permutations by phenotype) using gene sets representing Gene Ontology biological processes as described in the Molecular Signatures v3.0 C5 Database (10–500 genes per set)34.

Expression QTL mapping

For association mapping, we use a Bayesian approach23 implemented using the software package BIMBAM35 that is robust to poor imputation and small minor-allele frequencies36. Gene expression data were normalized as described in the Supplementary Methods for the control-treated (C480) and simvastatin-treated (T480) data and used to compute D480 = T480 − C480 and S480 = T480 + C480, where T480 represents the adjusted simvastatin-treated data and C480 represents the adjusted control-treated data. SNPs were imputed as described in the Supplementary Methods. To identify eQTLs and differential eQTLs, we measured the strength of association between each SNP and gene in each analysis (control-treated, simvastatin-treated, averaged, and difference) using BIMBAM with default parameters35. BIMBAM computes the Bayes factor for an additive or dominant response in expression data as compared with the null, which is that there is no correlation between that gene and that SNP. BIMBAM averages the Bayes factor over four plausible prior distributions on the effect sizes of additive and dominant models. We used a permutation analysis (see Supplementary Methods) to determine cutoffs for eQTLs in the averaged analysis (S480) at an FDR of 1% for cis-eQTLs (log10 Bayes factor > 3.24) and trans-eQTLs (log10 Bayes factor > 7.20). For cis-eQTLs, we considered the largest log10 Bayes factor above the cis-cutoff for any SNP within 1 Mb of the transcription start site or the transcription end site of the gene under consideration. For trans-eQTLs, we considered the largest log10 Bayes factor above the trans-cutoff for any SNP, and if that SNP was in the cis-neighbourhood of the gene being tested, we ignored any potential trans-associations; there were 6,130 genes for which the SNP with the largest log10 Bayes factor was not in cis with the associated gene. Correspondingly, we only considered those 6,130 genes when computing the permutation-based FDR for the trans-associations.

Differential expression QTL mapping

We define cis-SNPs as being within 1 Mb of the transcription start site or end site of that gene. To identify differential eQTLs, we first computed associations between all SNPs and the log fold change using BIMBAM as above.
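The cis definition above can be sketched in a few lines of Python. This is only an illustration of the 1 Mb rule as stated in the text; the `Gene` class, the `is_cis` function and the coordinates in the usage lines are hypothetical and are not taken from the study's analysis code.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # 1 Mb window, as stated in the text


@dataclass(frozen=True)
class Gene:
    chrom: str
    tss: int  # transcription start site (bp)
    tes: int  # transcription end site (bp)


def is_cis(snp_chrom: str, snp_pos: int, gene: Gene,
           window: int = CIS_WINDOW_BP) -> bool:
    """Return True if the SNP lies within `window` bp of the gene's TSS or TES."""
    if snp_chrom != gene.chrom:
        return False
    nearest = min(abs(snp_pos - gene.tss), abs(snp_pos - gene.tes))
    return nearest <= window


# Hypothetical gene and SNP positions, for illustration only.
example_gene = Gene(chrom="1", tss=5_000_000, tes=5_020_000)
print(is_cis("1", 5_900_000, example_gene))  # 880 kb from the end site
print(is_cis("1", 7_500_000, example_gene))  # well outside the window
```

Whether the boundary itself (exactly 1 Mb) counts as cis is not specified in the text; the sketch treats it as inclusive.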

We then considered a larger set of models for differential eQTLs. The associations for the genes in Supplementary Fig. 3 indicate that there are a few possible patterns of differential association. Although these patterns may have different mechanistic or phenotypic interpretations, they are not distinguished by a test of log fold change. We used the interaction models introduced in another paper14 to compute the statistical support (assessed with Bayes factors) for the four alternative eQTL models described below versus the null model (no association with genotype). These methods are based on a bivariate normal model for the simvastatin-treated data (T) and control-treated data (U). Note that simply quantile transforming T and U to a standard normal distribution is not sufficient to ensure that they are jointly bivariate normal, and so we used the following more extensive normalization procedure. Let D = qT − qU and S = qT + qU, where q indicates that the vector following it has been quantile normalized. We then quantile normalize and scale D and S, replacing S with σS·qS and D with σD·qD, where σS and σD are robust estimates of the standard deviations of S and D, respectively (specifically, the median absolute deviation multiplied by 1.4826). Note that this transformation ensures that S and D are univariate normal. Furthermore, they are approximately independent, which ensures that they are also bivariate normal. Finally, let U = 0.5(S − D) and T = 0.5(S + D).
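The normalization steps in the preceding paragraph can be sketched as follows for a single gene. This is a minimal standard-library illustration, not the authors' implementation: the function names are hypothetical, and the rank-to-quantile mapping (rank/(n + 1), with ties broken by input order) is an assumption, since the text does not specify how quantile normalization was carried out.

```python
from statistics import NormalDist, median


def quantile_normalize(values):
    """Map values to standard-normal quantiles by rank.

    Assumption: rank r of n maps to the quantile at r / (n + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


def robust_sd(values):
    """Median absolute deviation multiplied by 1.4826, as in the text."""
    m = median(values)
    return 1.4826 * median([abs(v - m) for v in values])


def normalize_pair(t, u):
    """Return (T, U, S, D) after the procedure described above.

    t, u: treated and control expression values for one gene, paired by sample.
    """
    qt, qu = quantile_normalize(t), quantile_normalize(u)
    d = [a - b for a, b in zip(qt, qu)]
    s = [a + b for a, b in zip(qt, qu)]
    sd_d, sd_s = robust_sd(d), robust_sd(s)
    # Quantile normalize again, then rescale to the robust standard deviation.
    d_new = [sd_d * x for x in quantile_normalize(d)]
    s_new = [sd_s * x for x in quantile_normalize(s)]
    u_new = [0.5 * (a - b) for a, b in zip(s_new, d_new)]
    t_new = [0.5 * (a + b) for a, b in zip(s_new, d_new)]
    return t_new, u_new, s_new, d_new
```

By construction the returned vectors satisfy T − U = D and T + U = S, which is what allows the four interaction models to be written in terms of either pair.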

The Bayes factor when the eQTL effect is identical in the two conditions (model 1) uses the linear model L(S ~ D + g), where g is the vector of genotypes at a single SNP. The Bayes factor when the eQTL is only present in the control-treated samples (model 2) uses the model L(U ~ T + g). The Bayes factor when the eQTL is only present in the simvastatin-treated samples (model 3) uses the model L(T ~ U + g). The Bayes factor when the eQTL effect is in the same direction but unequal in strength (model 4) uses the model L(D ~ S + g). We averaged each Bayes factor for each gene and each cis-SNP over four plausible effect size priors (0.05, 0.1, 0.2 and 0.4).

To find eQTLs that interact with treatment (that is, those that conform best to one of the differential models 2–4, rather than the null model or the stable model) we defined an interaction Bayes factor (IBF) = 2(BF2 + BF3 + BF4) / [3(BF1 + 1)], where BFi denotes the Bayes factor for model i compared with the null model (the 1 in the denominator represents the null-model Bayes factor, BF0). Large values of the IBF represent strong support for at least one interaction model (2–4) compared with the two non-interacting models (0–1), and hence strong support for a differential association.
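The interaction Bayes factor is the mean of the three interaction-model Bayes factors divided by the mean of the two non-interaction Bayes factors (model 1 and the null, whose Bayes factor is 1 by definition). A short sketch of that arithmetic is given below; the function names are hypothetical, and the inputs are assumed to be Bayes factors on the raw scale, already averaged over the effect-size priors.

```python
def interaction_bayes_factor(bf1: float, bf2: float, bf3: float, bf4: float) -> float:
    """IBF = mean(BF2, BF3, BF4) / mean(BF1, BF0), with BF0 = 1 for the null."""
    interaction_mean = (bf2 + bf3 + bf4) / 3.0
    non_interaction_mean = (bf1 + 1.0) / 2.0
    return interaction_mean / non_interaction_mean


def ibf_from_log10(log10_bfs):
    """Convenience wrapper for Bayes factors reported on the log10 scale."""
    bf1, bf2, bf3, bf4 = (10.0 ** x for x in log10_bfs)
    return interaction_bayes_factor(bf1, bf2, bf3, bf4)
```

When every model is no better supported than the null (all Bayes factors equal to 1) the IBF is 1; strong support for the stable model alone drives it below 1, and strong support for any one interaction model drives it well above 1.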

Association with statin-induced myopathy

For the Marshfield Cohort37, cases of myopathy were identified from electronic medical records of patients treated at the Marshfield Clinic (Wisconsin, USA) using a combination of automated natural language processing and manual review as described previously37. Seventy-two cases of incipient myopathy (creatine kinase concentrations greater than 3-fold normal concentrations, with evidence in the charts of muscle complaints) were identified for which patients were not also undergoing treatment with concomitant drugs known to increase incidence of statin-induced myopathy (fibrates or niacin). Controls were matched based on statin exposure, age and gender. This study was approved by the Marshfield Clinic institutional review board. The study population included residents living in Central and Northern Wisconsin, served by the Marshfield Clinic, a large multi-specialty group practice37. For the SEARCH and Heart Protection Study Collaborative Groups10,38, a total of 100 myopathy cases were identified from participants with genotyping data in the SEARCH trial, including 39 definite myopathy cases (creatine kinase > 10 × upper limit of normal (ULN) with muscle symptoms) and 61 incipient myopathy cases (defined as creatine kinase ≥ 5 times baseline value and alanine transaminase ≥1.7 times baseline value and creatine kinase >3 × ULN). Genotypes were available from the Illumina Human610-Quad Beadchip for 25 myopathy cases (12% of which had definite myopathy) and from the Illumina HumanHap300-Duo BeadChip for 75 myopathy cases (48% of which had definite myopathy). Genotypes for rs9806699 were only available for the 25 cases genotyped on the Illumina Human610-Quad Beadchip, so proxy SNPs were used for analyses in Table 1. Analyses of rs9806699 are provided in Supplementary Table 9. All myopathy cases were compliant with statin therapy (95 myopathy cases occurred while the patient was taking simvastatin 80 mg daily, and 5 cases while taking simvastatin 20 mg daily). 
Controls were identified from the SEARCH Study as well as from the Heart Protection Study (where considerably more participants had been genotyped). Controls from the Heart Protection Study had similar baseline characteristics to those in the SEARCH Study and inclusion of this large number of additional controls improved statistical power. Multi-centre ethics approval was obtained from the South East Research Ethics Committee for the SEARCH study, and from the local ethics committees covering each of the 69 UK hospitals involved in the Heart Protection Study. Genetic associations were determined by chi-squared analysis using an additive model. A meta-analysis was performed using a random effects model and, for the Bayesian analysis, we used an expected effect size of 0.2. Associations of rs9806699 with plasma creatine kinase in the CAP2 and JUPITER3 trials were also assessed using linear regression. The CAP trial (ClinicalTrials.gov number NCT00451828) was approved by the institutional review boards located at Children’s Hospital Oakland Research Institute (Oakland, California) and all enrollment sites. The JUPITER trial (ClinicalTrials.gov number NCT00239681) was approved by the Institutional Review Board of Brigham and Women's Hospital. Informed consent was obtained from all participants in all trials.
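As an illustration of the random-effects meta-analysis step, the sketch below pools the two cohort odds ratios quoted in the main text (Marshfield: 0.61, 95% confidence interval 0.39–0.95; SEARCH: 0.61, 0.42–0.88). Several things here are assumptions rather than the authors' procedure: the DerSimonian–Laird estimator is used because the text says only "a random effects model", the standard errors are back-calculated from the rounded confidence intervals, and the two estimates quoted in the text refer to different proxy SNPs. The function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal quantile


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def random_effects_pool(studies):
    """DerSimonian-Laird pooling of (log_or, se) pairs.

    Returns (pooled OR, lower 95% bound, upper 95% bound).
    """
    y = [b for b, _ in studies]
    v = [se ** 2 for _, se in studies]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return tuple(math.exp(x) for x in (pooled, pooled - Z95 * se, pooled + Z95 * se))


marshfield = log_or_and_se(0.61, 0.39, 0.95)
search = log_or_and_se(0.61, 0.42, 0.88)
pooled_or, low, high = random_effects_pool([marshfield, search])
print(f"pooled OR = {pooled_or:.2f} ({low:.2f}-{high:.2f})")
```

With these rounded inputs the pooled estimate comes out at roughly 0.61 (0.46–0.81), close to but not identical with the reported 0.60 (0.45–0.81); a small discrepancy is expected given the rounding and the assumptions listed above.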

Genomic characterization of GATM locus

Cis-regulatory elements were downloaded from the ChromHMM (ref. 39) track of the UCSC Genome Browser (ref. 40) and aggregated manually.

Functional analysis of candidate genes

GATM knockdown was achieved by 48 h transfection of Ambion Silencer Select siRNA or non-targeting control into 80,000 HepG2 or Huh7 cells per well in 12-well plates. To assess the influence of sterol depletion, cell culture medium was replaced with medium containing 10% lipoprotein deficient serum (Hyclone) or fetal bovine serum (Omega Scientific) at 24 h after transfection. All samples were harvested 48 h post transfection. Transcript levels were quantified by quantitative PCR and normalized to CLPTM. Cell culture medium was taken from all samples at the time of collection, and ApoB (MP Biomedicals) and ApoAI (Meridian Life Sciences) were quantified in triplicate by sandwich-style ELISA. Samples with a coefficient of variation greater than 15% were subjected to repeat measurement.
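The replicate quality rule for the ELISA measurements can be expressed as a simple check. The sketch below assumes the coefficient of variation is the sample standard deviation (n − 1 denominator) divided by the mean, expressed as a percentage; the text does not state which form was used, and the function names and example readings are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD_PCT = 15.0  # repeat-measurement threshold stated in the text


def coefficient_of_variation(replicates):
    """Percent CV of replicate readings (sample SD / mean x 100)."""
    return 100.0 * stdev(replicates) / mean(replicates)


def needs_repeat(replicates, threshold_pct=CV_THRESHOLD_PCT):
    """True if the replicate set exceeds the CV threshold."""
    return coefficient_of_variation(replicates) > threshold_pct


# Hypothetical triplicate readings in arbitrary concentration units.
print(needs_repeat([100.0, 105.0, 110.0]))  # tight triplicate
print(needs_repeat([10.0, 12.0, 20.0]))     # widely spread triplicate
```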