Main

Genome-wide association studies (GWAS) of prostate cancer have identified more than 30 variants associated with risk that, in aggregate, are estimated to account for approximately 20% of the familial risk of prostate cancer (refs. 1–12). Aside from admixture and fine-mapping studies that identified multiple independent risk variants at 8q24 (refs. 13,14), and a more recent GWAS among Japanese men that identified five new loci (ref. 9), discoveries in prostate cancer have come from studies in men of European ancestry. However, prostate cancer incidence in men of African ancestry is greater than in non-African populations (ref. 15), with the disparity presumably reflecting both differences in prevalence of environmental risk factors and susceptibility alleles that are shared among men of African descent. For example, the risk variants at 8q24, many of which are more common in men of African ancestry (ref. 14), could contribute partly to the greater incidence of prostate cancer in this population and provide some support for the hypothesis of a genetic contribution underlying racial and ethnic disparities in disease risk.

We assembled a consortium of prostate cancer studies that included men of African ancestry and conducted a GWAS to search for additional risk loci that may be more common in men of African descent. Stage 1 included 3,621 African-American cases with prostate cancer and 3,502 African-American controls drawn from 11 studies (Supplementary Table 1 and Online Methods). We conducted genotyping in stage 1 using the Illumina Infinium 1M-Duo. Following quality-control exclusions (Online Methods), the stage 1 analysis consisted of 1,047,986 SNPs (minor allele frequency ≥0.01) examined in 3,425 cases and 3,290 controls.

In comparing (for all SNPs) the observed with the expected distribution of P values from a 1-degree-of-freedom trend test, there was evidence of inflation in the test statistic (λ = 1.11). Principal components analysis highlighted the high degree of admixture in this population, and the inflation diminished following additional adjustment for ancestry (λ = 1.03; Supplementary Fig. 1 and Online Methods). The association of four SNPs achieved genome-wide significance in the stage 1 sample, with P values between 5.4 × 10⁻⁹ and 5.7 × 10⁻¹³ (Fig. 1). These SNPs are located in known prostate cancer risk regions: three are at 8q24 (rs10505483, rs1456315 and rs7824364 at 128.173–128.205 Mb; NCBI36) and one is at 11q13 (rs7130881 at 67.75 Mb).

Figure 1: A plot of the −log₁₀ P values by chromosome.

We selected 17 SNPs (P < 2 × 10⁻⁵) located outside of known prostate cancer risk regions to examine in a second stage. The associations of these 17 SNPs with prostate cancer risk were not influenced substantially by population stratification in the stage 1 sample as evaluated by principal components analysis (Supplementary Table 2). The stage 2 sample included 1,396 cases and 2,383 controls of African ancestry from seven independent studies: six US-based studies and one study in Ghana. Of the 17 SNPs, only marker rs7210100 at 17q21 was significantly associated with risk in the stage 2 studies (odds ratio (OR) = 1.55, P = 2.5 × 10⁻⁵; Table 1). None of the other SNPs selected in stage 1 were significantly associated with risk in the stage 2 sample (all P values were >0.05); we excluded rs13116912 because it deviated from Hardy-Weinberg equilibrium in the majority of stage 2 studies. The results for all 17 SNPs in stage 1 and stage 2 are presented in Supplementary Table 3.

Table 1 The association of variant rs7210100 at 17q21 with prostate cancer risk in men of African ancestry

We further examined the association with rs7210100 in a third stage that included three studies among men of African descent: a study from the United States (SCORE), a study in Senegal (PROGRÈS) and a study in Barbados (PCBP). We found rs7210100 to be positively associated with risk in all three studies (stage 3, 471 cases and 904 controls; combined OR = 2.07, P = 1.5 × 10⁻⁵; Table 1).

Adjustment for global ancestry or local ancestry (African versus European) in the stage 1 studies did not influence the results for rs7210100 (OR = 1.41 without adjustment for ancestry, OR = 1.40 adjusted for global ancestry and OR = 1.43 adjusted for global and local ancestry). The effect estimate for rs7210100 was also similar in men with <15% global European ancestry (1,251 cases and 1,325 controls; OR = 1.41) as well as in cases and controls estimated to have two chromosomes of African ancestry at this location (2,214 cases and 2,080 controls; OR = 1.47). We observed no evidence of heterogeneity of the association by study for this variant in the stage 1 (Phet = 0.89), stage 2 (Phet = 0.25) or stage 3 studies (Phet = 0.51) or among all studies (Phet = 0.58). Results for all SNPs examined in the replication stages were also unaffected when adjusting for European ancestry in studies in which information on global ancestry was available (Supplementary Tables 4 and 5).

In combining the results across all three stages (5,262 cases and 6,554 controls), rs7210100 was strongly and significantly associated with risk (OR = 1.51, 95% CI 1.35–1.69, P = 3.4 × 10⁻¹³). The ORs for heterozygous and homozygous carriers of the risk allele were 1.49 (95% CI 1.32–1.68) and 2.73 (95% CI 1.50–4.96), respectively. We did not find any stronger signal among SNPs imputed from the phase 2 HapMap populations in the surrounding region at chromosome 17q21 (Fig. 2 and Supplementary Fig. 2).

Figure 2: A regional plot of the −log₁₀ P values for genotyped (squares) and imputed (circles) SNPs at the chromosome 17q21 risk locus in the stage 1 African-American sample.

The shading depicts the strength of the correlation (r²) between rs7210100 and the SNPs tested in the region. The correlation is estimated in the YRI population from the 1000 Genomes Project (June 2010). Also shown are human genome build 18 coordinates (Mb), recombination rates in cM per Mb and genes in the region. The plot was generated using LocusZoom.

The association with rs7210100 was similar when stratifying on age (P = 0.72) and first-degree family history of prostate cancer (P = 0.36). We also observed no significant difference in the association of rs7210100 with prostate cancer stage (P = 0.94) or tumor grade (P = 0.11) at diagnosis. However, the association with rs7210100 was greater for non-advanced disease when classified based on stage and grade (Gleason score <8 and localized stage, 2,433 cases and 6,554 controls, OR = 1.67, P = 8.6 × 10⁻¹²) than for advanced disease (Gleason score ≥8 or non-localized disease, 1,719 cases and 6,554 controls, OR = 1.27, P = 5.0 × 10⁻³, Phet = 6.0 × 10⁻³).

Among controls with prostate-specific antigen (PSA) levels measured at ≤4 ng/ml (n = 2,383), we found no significant association between PSA levels and rs7210100 genotype (P = 0.58). Limiting the analysis to controls with PSA levels <4 ng/ml and cases from these studies did not change the association between rs7210100 and prostate cancer risk (n = 3,157 cases and 2,383 controls, OR = 1.62, P = 4.5 × 10⁻⁸).

The variant rs7210100 is located in intron 1 of ZNF652 on chromosome 17q21.32. ZNF652 encodes a zinc-finger protein transcription factor that has been shown to interact with the eight-twenty-one (ETO) protein, CBFA2T3, which acts as a transcriptional repressor by forming complexes with co-repressor proteins and HDACs (ref. 16). Co-expression of ZNF652 and the androgen receptor in prostate tumors has been associated with a decrease in relapse-free survival (ref. 17). A common variant just upstream of ZNF652 has also been associated with blood pressure in a GWAS of men and women of European ancestry (ref. 18). Sequencing of the five coding exons of ZNF652 in 48 subjects (with an oversampling of risk allele carriers; Online Methods) did not reveal a coding variant strongly correlated with rs7210100. Further work is needed to map this locus in order to nominate optimal candidate markers, in addition to rs7210100, for functional studies in pursuit of regulatory effects of one or more variants in the region.

The risk allele of rs7210100 is relatively uncommon in men of African ancestry (4–7%), and is extremely rare (<1%) in non-African populations as reported by the 1000 Genomes Project. The frequency of the risk allele in men of west-African ancestry (Ghana and Senegal) is very similar to that observed in African Americans, as well as in men from east Africa (Uganda; n = 111, risk allele frequency = 0.04). GWAS in populations of European ancestry have not pointed to this region of 17q21 as a risk locus for prostate cancer (Supplementary Fig. 3). Together, these observations suggest that the underlying biologically relevant allele may be limited to populations of African descent. As reported by the National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) Program, prostate cancer incidence in African-American men is 1.56 times higher than the incidence in non-Hispanic individuals of European descent. Because approximately 10% of African-American men carry this variant, which increases their risk 1.50-fold over non-carriers, we estimate that this locus may be responsible for as much as 9% (95% CI 6–12%) of the greater incidence of prostate cancer in African-American men (Online Methods).

In summary, we detected a marker of risk for prostate cancer that appears specific to men of African descent, who have an increased incidence and mortality of this disease. These findings provide strong support for conducting GWAS in diverse populations to identify markers of risk that may be population specific and which could contribute to racial and ethnic disparities in disease incidence. Further work is needed to characterize the 17q21 region and conduct the functional studies required to understand the role of this germ-line variation in prostate cancer susceptibility.

URLs.

SEER, http://seer.cancer.gov/; LocusZoom, http://csg.sph.umich.edu/locuszoom/; PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/; EIGENSTRAT, http://genepath.med.harvard.edu/~reich/Software.htm.

Methods

Studies.

The studies included in stage 1 were drawn from 11 epidemiological studies of prostate cancer among African-American men. These studies included The Multiethnic Cohort (MEC; 1,094 cases and 1,096 controls), The Southern Community Cohort Study (SCCS, 212 cases and 419 controls), The Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO, 286 cases and 269 controls), The Cancer Prevention Study II Nutrition Cohort (CPS-II, 76 cases and 152 controls), Prostate Cancer Case-Control Studies at MD Anderson (MDA, 543 cases and 474 controls), Identifying Prostate Cancer Genes (IPCG, 368 cases and 172 controls), The Los Angeles Study of Aggressive Prostate Cancer (LAAPC, 296 cases and 303 controls), Prostate Cancer Genetics Study (CaP Genes, 75 cases and 85 controls), Case-Control Study of Prostate Cancer among African Americans in Washington, DC (DCPC, 292 cases and 359 controls), King County (Washington) Prostate Cancer Study (KCPCS, 145 cases and 81 controls) and The Gene-Environment Interaction in Prostate Cancer Study (GECAP, 234 cases and 92 controls). These studies provided DNA samples for 3,621 cases and 3,502 controls.

Stage 2 included 1,396 cases and 2,383 controls from seven studies: San Francisco Bay Area Prostate Cancer Study (SFPCS, 86 cases and 37 controls), The Flint Men's Health Study (FMHS, 135 cases and 353 controls), The Multiethnic Cohort/Los Angeles County (MEC-LA, 554 cases and 557 controls), North Carolina Prostate Cancer Study (NCPCS, 214 cases and 249 controls), Wake Forest University Prostate Cancer Study (WFPCS, 59 cases and 66 controls), Washington University Prostate Cancer Study (WUPCS, 75 cases and 153 controls) and The Ghana Men's Health Study (GHS, 271 cases and 968 controls). Stage 3 included 484 cases and 947 controls from three studies: The Study of Clinical Outcomes, Risk and Ethnicity (SCORE, 152 cases and 280 controls), Prostate-Genetique-Recherche-Senegal (PROGRÈS, 86 cases and 414 controls) and Prostate Cancer in a Black Population (PCBP, 246 cases and 253 controls). Detailed information about the design and organization of each study is provided in the Supplementary Note.

Genotyping and quality control.

Genotyping in stage 1 (3,621 cases and 3,502 controls) was conducted using the Illumina Infinium Human1M-Duo. Samples (n = 408) were removed based on the following exclusion criteria: (i) unknown replicates across studies, (ii) call rates <95%, (iii) >10% mean heterozygosity on the X chromosome and/or <10% mean intensity on the Y chromosome, (iv) ancestry outliers and (v) samples that were related (discussed below). The concordance rate for 158 replicate samples was 99.99%. Starting with 1,153,397 SNPs, we removed SNPs with <95% call rate, minor allele frequencies <1% or >1 quality-control mismatch based on sample replicates (n = 105,411). The analysis included 1,047,986 SNPs among 3,425 cases and 3,290 controls.
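As an illustrative sketch only (not the pipeline used in the study), the SNP-level thresholds and the sample and SNP bookkeeping reported above can be written out as follows. The function name `snp_passes_qc` and its argument names are hypothetical; the thresholds and counts are taken from the text.

```python
def snp_passes_qc(call_rate, maf, replicate_mismatches):
    """Apply the three SNP-level thresholds stated in the text.

    A SNP is kept when its call rate is at least 95%, its minor allele
    frequency is at least 1%, and it shows at most one mismatch among
    replicate samples. Argument names are hypothetical.
    """
    return call_rate >= 0.95 and maf >= 0.01 and replicate_mismatches <= 1


# Bookkeeping check using the counts reported in the text.
samples_genotyped = 3621 + 3502                      # cases + controls genotyped
samples_analyzed = 3425 + 3290                       # cases + controls analyzed
samples_removed = samples_genotyped - samples_analyzed

snps_start, snps_removed = 1_153_397, 105_411
snps_analyzed = snps_start - snps_removed

print(samples_removed, snps_analyzed)
```

The two differences reproduce the 408 excluded samples and the 1,047,986 analyzed SNPs stated in the text.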

We used PLINK (see URLs) to calculate the probabilities of sharing 0, 1 and 2 alleles (Z = Z0, Z1, Z2) across all possible pairs of samples to identify individuals who were likely to be related to others within and across studies. We identified 167 pairs of related subjects (monozygotic twin, parent-offspring, full- and half-sibling pairs) based on the values of their observed probability vector Z being within 1 standard deviation of the expected values of Z for their respective relationship. When an individual was connected to more than one related pair, the individual connected to the greater number of pairs was chosen for removal. In all other cases, one of the two members was randomly selected for removal. A total of 141 subjects were removed.
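The removal rule described above can be sketched as a simple greedy procedure. This is one possible reading of the rule, not the authors' code: the function name `prune_related`, the input format (a list of sample-ID pairs) and the fixed random seed are assumptions made for illustration.

```python
import random
from collections import Counter


def prune_related(pairs, seed=0):
    """Return a set of sample IDs to drop so that no related pair remains.

    Samples appearing in the most related pairs are dropped first; once
    every remaining sample appears in exactly one pair, one member of
    each pair is dropped at random. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    remaining = {frozenset(pair) for pair in pairs}
    removed = set()
    while remaining:
        # Count how many unresolved pairs each sample belongs to.
        degree = Counter(sample for pair in remaining for sample in pair)
        sample, n_pairs = max(sorted(degree.items()), key=lambda kv: kv[1])
        if n_pairs == 1:
            # Only disjoint pairs are left: pick one member of each at random.
            for pair in sorted(remaining, key=sorted):
                removed.add(rng.choice(sorted(pair)))
            break
        removed.add(sample)
        remaining = {pair for pair in remaining if sample not in pair}
    return removed


# Toy example: A is related to both B and C; D and E form a separate pair.
print(prune_related([("A", "B"), ("A", "C"), ("D", "E")]))
```

In the toy example, A is always dropped because it is connected to two pairs, and exactly one of D or E is dropped at random. Dropping highly connected samples first tends to resolve several pairs with a single exclusion, which is consistent with fewer subjects (141) than pairs (167) being removed.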

The EIGENSTRAT (see URLs) software was used to calculate eigenvectors that explained genetic differences in ancestry among samples in the study (ref. 19). We included data from both HapMap populations (CEPH (Utah residents with ancestry from northern and western Europe) (CEU), Japanese in Tokyo, Japan (JPT), Yoruba in Ibadan, Nigeria (YRI) and African ancestry in the Southwestern United States (ASW)) and our study so that comparisons to reference populations of known ethnicity could be made. A total of 2,546 ancestry-informative SNPs from the Illumina array were selected based on low inter-marker correlation and ability to differentiate between samples of African and European descent. An individual was subject to filtering from the analysis if his value along eigenvector 1 or 2 was outside of 4 standard deviations from the mean of each respective eigenvector. We identified 108 individuals who met this criterion. Eigenvector 1 was highly correlated (ρ = 0.997, P < 1 × 10⁻¹⁶) with percentage of European ancestry, estimated in HAPMIX (ref. 20). Together, the top ten eigenvectors explain 21% of the global genetic variability among subjects.
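The outlier criterion on eigenvectors 1 and 2 can be illustrated with a short sketch. It assumes the eigenvector scores have already been computed (for example by EIGENSTRAT) and are available as a mapping from sample ID to a pair of scores; the function name `ancestry_outliers`, that input format and the use of the sample standard deviation are assumptions, not details given in the text.

```python
from statistics import mean, stdev


def ancestry_outliers(scores, n_sd=4.0):
    """Flag samples more than n_sd standard deviations from the mean
    along eigenvector 1 or eigenvector 2.

    scores: dict mapping sample ID -> (ev1, ev2). (Hypothetical format.)
    """
    flagged = set()
    for axis in (0, 1):
        values = [score[axis] for score in scores.values()]
        mu, sd = mean(values), stdev(values)
        for sample_id, score in scores.items():
            if abs(score[axis] - mu) > n_sd * sd:
                flagged.add(sample_id)
    return flagged


# Toy data: 50 tightly clustered samples plus one extreme value on eigenvector 1.
toy = {f"s{i}": ((i % 5) * 0.01, (i % 3) * 0.01) for i in range(50)}
toy["out"] = (1.0, 0.0)
print(ancestry_outliers(toy))
```

Because the mean and standard deviation are computed with the outliers included, a very small sample cannot produce a 4-standard-deviation outlier at all; the rule is only meaningful for sample sizes of the order used here.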

Genotyping in the stage 2 and 3 studies was conducted using the TaqMan allelic discrimination assay. In stage 2, we removed samples missing data for greater than three SNPs (n = 36). To assess genotyping reproducibility, each study included replicate samples; the concordance was >98% for each SNP within each study. rs13116912 deviated from Hardy-Weinberg equilibrium in all but one of the stage 2 studies and was removed from the stage 2 analysis. No other SNP deviated from Hardy-Weinberg equilibrium (P < 0.01 in more than two studies) in stage 1 or 2. The call rate for rs7210100 was very high in stage 1 (99.9%) and was similar in cases (99.9%) and controls (99.9%). The call rate for this SNP was also very high in stages 2 (99.8% overall, 99.9% in cases and 99.8% in controls) and 3 (96.1% overall, 97.3% in cases and 95.5% in controls).
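The text does not state which test was used to assess Hardy-Weinberg equilibrium. As a hedged illustration, the sketch below uses the standard 1-degree-of-freedom χ² goodness-of-fit test on genotype counts; the function name `hwe_chi2` is hypothetical, and an exact test may be preferable when expected genotype counts are small, as for a rare allele.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the three genotype counts and returns (statistic, p_value),
    with the P value taken from a 1-degree-of-freedom chi-square
    distribution. (Assumed method; not confirmed by the source.)
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts matching Hardy-Weinberg proportions for an allele frequency of 0.1.
print(hwe_chi2(10, 180, 810))
# Counts with an excess of homozygotes at the same allele frequency.
print(hwe_chi2(50, 100, 850))
```

Under the criterion stated above, a SNP would be flagged when this P value falls below 0.01 in more than two studies.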

Sequencing.

Bi-directional sequencing of rs7210100 and the five coding exons of ZNF652 was performed in 48 subjects (20 homozygous for the risk variant, 20 heterozygous for the risk variant and 8 homozygous for the wild-type allele). Primers were designed at least 50 bases upstream and downstream from each exon.

Statistical analysis.

In stage 1, we tested the association of each SNP and prostate cancer risk using a 1-degree-of-freedom χ² likelihood ratio test from a logistic regression analysis adjusted for age, study and the first ten eigenvectors estimated by principal components analysis (ref. 19). Inflation of the test statistic was examined with and without adjustment for ancestry and was visualized with quantile-quantile plots. Lambdas were estimated as the median of the test statistics divided by 0.456 (the median of the 1-degree-of-freedom χ² null distribution). Age-adjusted ORs and 95% CIs for each SNP were estimated from the same logistic regression model. At each locus and for each participant, local ancestry was defined as the estimated number of European chromosomes (continuous between 0 and 2) carried by the participant estimated using the HAPMIX program (ref. 20). Local ancestry at the 17q21 locus was evaluated as a confounder in the analysis of rs7210100.
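The λ calculation described above can be sketched with the Python standard library alone. The names `genomic_inflation` and `chi2_from_p` are hypothetical helpers; the null median is computed directly (about 0.455) rather than hard-coded as the rounded 0.456 quoted in the text, a difference that changes λ only in the third decimal place.

```python
from statistics import NormalDist, median

# Median of the 1-df chi-square null distribution. If Z is standard normal,
# Z**2 is chi-square with 1 df, so its median is the squared 75th percentile of Z.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_from_p(p):
    """Convert a two-sided 1-df P value (0 < p < 1) back to its chi-square statistic."""
    return NormalDist().inv_cdf(p / 2) ** 2


def genomic_inflation(chi2_stats):
    """Lambda: median observed statistic divided by the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Toy check: evenly spaced P values mimic a null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
null_stats = [chi2_from_p(p) for p in null_p]
print(round(genomic_inflation(null_stats), 3))
print(round(genomic_inflation([1.11 * s for s in null_stats]), 3))
```

In the toy check, the null statistics give λ close to 1, and scaling every statistic by 1.11 gives λ close to 1.11, the unadjusted value reported for stage 1.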

Phased haplotype data from the founders of the CEU and YRI HapMap phase 2 samples were used to infer linkage disequilibrium patterns in order to impute untyped markers. We carried out genome-wide imputation using the software MACH (ref. 21). The Rsq metric was used as a threshold in determining which SNPs to filter from analysis (Rsq < 0.3). Imputed SNPs in the 17q21 risk region, as shown in Figure 2, were examined in association with prostate cancer risk as described for typed SNPs above.

In stage 2, the SNPs were analyzed using logistic regression controlling for age and study (in the pooled analysis). Information regarding European ancestry was available for seven studies included in stages 2 and 3. As observed in stage 1 (Supplementary Table 2), the OR for rs7210100 was similar with and without adjustment for estimated European ancestry in these studies (Supplementary Table 4). The results for rs7210100 in stage 2, stage 3 and stages 1, 2 and 3 combined are presented without adjustment for ancestry. Heterogeneity of the OR across studies was evaluated using a likelihood ratio test.

Effect modification by age and first-degree family history of prostate cancer was assessed in stratified analyses, and significance was determined by comparing the models with and without the cross-product term using a likelihood ratio test. We also examined the association of rs7210100 genotype with stage, Gleason score and the combination of stage and grade, with advanced disease defined as Gleason score ≥8 or stage ≥2 (non-localized disease) and non-advanced disease defined as Gleason score <8 and stage = 1 (localized disease). A case-only analysis was used to test for differences in the association of rs7210100 with disease phenotypes. The association of rs7210100 with least-squares geometric-mean PSA levels was examined using multiple linear regression adjusting for age, body mass index and study.

We estimated the risk ratio between populations of different ancestral origin (African or European) caused by rs7210100 as

$$\mathrm{RR} = \frac{(1 - p_A)^2 + 2p_A(1 - p_A)\,\mathrm{RR}_1 + p_A^2\,\mathrm{RR}_2}{(1 - p_E)^2 + 2p_E(1 - p_E)\,\mathrm{RR}_1 + p_E^2\,\mathrm{RR}_2}.$$

Here $p_A$ is the risk allele frequency in African origin populations, $p_E$ is the risk allele frequency in European populations, $\mathrm{RR}_1$ is the relative risk associated with carrying one copy of the risk allele (compared to none) and $\mathrm{RR}_2$ is the relative risk associated with carrying two copies of the risk allele. We used values $p_A = 0.05$, $p_E = 0$, $\mathrm{RR}_1 = 1.5$ and $\mathrm{RR}_2 = 1.5^2$ so that the risk ratio between populations caused by the influence of this risk allele was estimated to be equal to 1.050625. Using the SEER incidence rates of prostate cancer in African Americans (234.6 per 100,000) and non-Hispanic individuals of European ancestry (150.4 cases per 100,000), we estimated the ratio of risks between these populations as 234.6/150.4 = 1.56. The percentage of greater risk to African Americans that may be associated with rs7210100 was estimated as $\{1 - [(1.56 - 1.050625)/(1.56 - 1)]\} \times 100$, or approximately 9%.
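The arithmetic in this paragraph can be reproduced with a few lines of Python. This is a worked check using only the values quoted in the text; the function name `population_mean_risk` and the variable names are hypothetical.

```python
def population_mean_risk(p, rr1, rr2):
    """Mean relative risk in a population with risk allele frequency p,
    assuming Hardy-Weinberg genotype frequencies."""
    return (1 - p) ** 2 + 2 * p * (1 - p) * rr1 + p ** 2 * rr2


p_a, p_e = 0.05, 0.0          # risk allele frequencies (African, European)
rr1 = 1.5                     # relative risk with one copy
rr2 = rr1 ** 2                # relative risk with two copies (multiplicative)

# Risk ratio between populations attributable to this locus.
rr_locus = population_mean_risk(p_a, rr1, rr2) / population_mean_risk(p_e, rr1, rr2)

# Observed incidence ratio from the SEER rates quoted in the text.
rr_observed = 234.6 / 150.4
rr_rounded = round(rr_observed, 2)   # 1.56, the rounded value used in the text

# Share of the excess incidence that the locus could account for.
pct = (1 - (rr_rounded - rr_locus) / (rr_rounded - 1)) * 100

# Carrier frequency implied by the allele frequency.
carrier_freq = 1 - (1 - p_a) ** 2

print(rr_locus, rr_rounded, pct, carrier_freq)
```

With these inputs the locus-specific risk ratio is 1.050625, the percentage is about 9.04, and the implied carrier frequency is 0.0975, in line with the statement that approximately 10% of African-American men carry the variant.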