Main

To perform a large, family-based recombination study, one challenge is to phase the genotypes of the parents when the grandparents are not genotyped. One solution is to use genotyped nuclear families with two or more offspring, which in essence uses the children to phase the parents. However, resolution can be diminished and difficulties can arise when two or more offspring have recombinations that are close to each other. We capitalized on recent methodological advances that led to the successful determination of parental origins of over 97% of the heterozygous genotypes of 38,167 Icelanders typed on Illumina SNP arrays, many of them with ungenotyped parents9,10. Parental origins provide phase. We used phased haplotypes of 8,850 mother–offspring pairs (6,041 distinct mothers) and 6,407 father–offspring pairs (4,389 distinct fathers) to identify recombinations (Fig. 1) for 15,257 meioses (Supplementary Table 1).

Figure 1: Determining recombination locations.

Here it is assumed that the genotypes of parent and offspring have been phased, with parental origin determined9,10. The parent shown is a father, but the same method applies to a mother–offspring pair. For an SNP that is heterozygous in the parent, it can be determined whether the allele passed on to the offspring came from the parent’s maternal or paternal chromosome. A recombination can hence be localized to the region spanned by the two closest flanking heterozygous markers in the parent between which the origin of the transmitted allele switches. (Details are in Supplementary Information.)
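
The localization rule in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming that the parent's two phased haplotypes and the allele transmitted to the offspring are already known at every SNP of one chromosome; the function name and its inputs are hypothetical, and the quality filtering used for the actual maps (Supplementary Information) is not reproduced.

```python
from typing import List, Optional, Sequence, Tuple


def locate_crossovers(
    positions: Sequence[int],
    parent_maternal: Sequence[str],
    parent_paternal: Sequence[str],
    transmitted: Sequence[str],
) -> List[Tuple[int, int]]:
    """Return (left, right) position pairs that bracket each inferred crossover.

    positions: SNP positions on one chromosome, in increasing order
    parent_maternal / parent_paternal: alleles on the parent's two phased haplotypes
    transmitted: allele the offspring received from this parent
    """
    intervals: List[Tuple[int, int]] = []
    prev_pos: Optional[int] = None
    prev_origin: Optional[str] = None
    for pos, mat, pat, allele in zip(positions, parent_maternal, parent_paternal, transmitted):
        if mat == pat:
            continue  # homozygous in the parent: says nothing about origin
        if allele == mat:
            origin = "maternal"
        elif allele == pat:
            origin = "paternal"
        else:
            continue  # matches neither haplotype (e.g. a genotyping error); skip
        if prev_origin is not None and origin != prev_origin:
            # The transmitted origin switched between two consecutive
            # heterozygous markers, so a crossover lies between them.
            intervals.append((prev_pos, pos))
        prev_pos, prev_origin = pos, origin
    return intervals


print(locate_crossovers([100, 200, 300, 400, 500],
                        list("ACGTA"), list("ATCTG"), list("ACCTG")))
# prints [(200, 300)]
```

Homozygous SNPs (positions 100 and 400 in the toy example) are skipped, which is why a crossover can only be bracketed by heterozygous markers.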

Recombinations were determined using 289,658 and 8,411 SNPs on the autosomes and the X chromosome, respectively. The data allowed us only to assign each recombination to the region spanned by the two closest flanking heterozygous markers in the parent (Fig. 1). Treating this as a missing-data problem, we used the EM algorithm11 to calculate likelihood-based estimates of recombination rates for males and females (Supplementary Information and Supplementary Table 2). Results from the E-step of the EM algorithm were also used to calculate the expected recombination count in each marker interval for each meiosis. In addition to genetic distances between SNPs, maps for various uniformly spaced grids were calculated by linear interpolation.
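
The missing-data idea can be illustrated with a simplified EM. This sketch assumes that crossover counts in different marker intervals are independent Poisson variables, that every meiosis is informative over every interval, and that each observed crossover is known only to lie within a span of consecutive intervals. Under those assumptions the E-step splits each crossover across its span in proportion to the current rates, and the M-step sets each rate to the expected count per meiosis. The function is hypothetical; the model actually used is described in Supplementary Information.

```python
import numpy as np


def em_interval_rates(n_intervals, crossovers, n_meioses, max_iter=1000, tol=1e-12):
    """EM estimate of the per-meiosis crossover rate in each marker interval.

    crossovers: list of (start, end) index pairs; a crossover is known only to
                lie somewhere in marker intervals start..end-1.
    Returns (rates, expected), where expected holds the E-step expected
    crossover counts per interval, summed over meioses.
    """
    rates = np.full(n_intervals, len(crossovers) / (n_intervals * n_meioses))
    expected = np.zeros(n_intervals)
    for _ in range(max_iter):
        # E-step: distribute each crossover over its span in proportion
        # to the current rate estimates of the intervals it covers.
        expected = np.zeros(n_intervals)
        for start, end in crossovers:
            weights = rates[start:end]
            expected[start:end] += weights / weights.sum()
        # M-step: Poisson MLE = expected crossovers per meiosis.
        new_rates = expected / n_meioses
        converged = np.max(np.abs(new_rates - rates)) < tol
        rates = new_rates
        if converged:
            break
    return rates, expected


rates, expected = em_interval_rates(
    3, [(0, 1), (0, 1), (0, 1), (1, 2), (0, 2)], n_meioses=10)
# rates is approximately [0.375, 0.125, 0.0]: the ambiguous (0, 2) crossover
# ends up split 3:1, in line with the unambiguous crossovers.
```

Keeping the per-crossover allocations from the final E-step, rather than only their sum, would give the expected count in each interval for each individual meiosis.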

Existing genetic maps include the 2002 deCode family-based map8 and the most commonly used LD-based maps12 (Methods Summary), referred to here as the CEU, YRI and COMBINED maps. The COMBINED map is essentially the average of the CEU and YRI maps. These maps have similar lengths because the 2002 deCode map was used to scale the LD-based maps, which provide information only about relative recombination rates. By comparison, our newly constructed sex-averaged map is 3% shorter. This is probably because we tabulated only recombinations considered highly reliable, so some recombinations were missed; the 2002 deCode map could also be slightly inflated by genotyping errors. Supporting the assumption that the dropped recombinations are approximately randomly distributed and have minimal impact on estimates of relative recombination rate, the correlation between the sex-averaged map and the 2002 deCode map is 0.945 at 3-megabase (Mb) resolution, roughly the limit of resolution of the older map. This correlation is stronger than those between the 2002 deCode map and the LD-based maps (r = 0.920, 0.914 and 0.927 for the CEU, YRI and COMBINED maps, respectively). The correlation between the sex-averaged map and the COMBINED map is 0.977 (Supplementary Table 3).
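
Comparing maps at a given resolution amounts to summing genetic distances over windows of that size and correlating the window totals. A minimal sketch, assuming both maps are already expressed as genetic distance per bin on the same 10-kb grid of a single chromosome (so that 300 bins make a 3-Mb window and no window spans a chromosome boundary); the function names are hypothetical.

```python
import numpy as np


def aggregate_bins(distances, factor):
    """Sum consecutive fine bins into coarser windows (e.g. 300 x 10 kb = 3 Mb).

    Trailing bins that do not fill a whole window are dropped.
    """
    d = np.asarray(distances, dtype=float)
    n_windows = d.size // factor
    return d[: n_windows * factor].reshape(n_windows, factor).sum(axis=1)


def map_correlation(map_a, map_b, factor=1):
    """Pearson correlation between two per-bin maps after aggregating by `factor`."""
    a = aggregate_bins(map_a, factor)
    b = aggregate_bins(map_b, factor)
    return float(np.corrcoef(a, b)[0, 1])
```

Because aggregation averages out sampling noise within each window, two maps can agree better at 3 Mb than at 10 kb even when their fine-scale estimates differ.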

Recombination maps partitioned into 10-kb bins were calculated for each sex. For subsequent investigations, we excluded the X chromosome and the 5-Mb regions at each end of the autosomes (defined relative to SNP coverage), where the determination of recombinations is less reliable. We also excluded 10,254 bins covering unsequenced regions (human genome build 36), of which 8,891 were centromeric. These bins generally have low recombination rates, and some of them include intervals to which the COMBINED map assigns no recombination rate. Genetic distances of those bins are clearly biased downwards in all three LD-based maps. In total, the studied regions covered 2,444.46 Mb, or 244,446 bins. For these bins, the estimated average genetic distance is 0.0155 cM (sum = 3,790.1 cM) for females and 0.0077 cM (sum = 1,886.7 cM) for males. At this resolution, the correlation between the male and female maps is 0.659.
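
Converting a map defined at SNP positions into uniform bins can be done by linearly interpolating the cumulative genetic position at the bin edges and taking differences. A minimal sketch, assuming a cumulative map in cM at sorted marker positions on one chromosome; the function name is hypothetical. `np.interp` is piecewise linear and clamps to the end values outside the marker span, so a partial last bin only receives distance up to the last marker.

```python
import numpy as np


def bin_distances(marker_pos_bp, marker_cum_cm, bin_size=10_000):
    """Genetic distance (cM) in each uniform bin, by linear interpolation
    of a cumulative map defined at marker positions."""
    pos = np.asarray(marker_pos_bp, dtype=float)
    cum = np.asarray(marker_cum_cm, dtype=float)
    # Bin edges from the first marker past the last marker.
    edges = np.arange(pos[0], pos[-1] + bin_size, bin_size)
    # Cumulative map position at each edge.
    cum_at_edges = np.interp(edges, pos, cum)
    return np.diff(cum_at_edges)


print(bin_distances([0, 20_000, 50_000], [0.0, 0.02, 0.05]))
# five 10-kb bins, each 0.01 cM
```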

A standardized recombination rate (SRR) was calculated for males and females separately by dividing the genetic distance of each bin by the overall average. Defining recombination hotspots as bins with an SRR greater than 10, we observed 4,762 hotspots for males and 4,129 for females, with an overlap of 1,953. The male hotspot bins covered 1.9% of the physical distance of the studied region but accounted for 36.2% of its recombinations; the corresponding numbers for females were 1.7% and 28.0%. Despite these similarities between the sexes, 718 and 125 of the 4,762 male hotspots have an SRR less than 3 and 1, respectively, in females. A permutation test (Supplementary Information) showed that these male-specific hotspots have false-discovery rates of approximately 1.9% and 0%, respectively. Thus approximately 704 of the 718, and all of the 125, identified bins correspond to true sex differences, indicating that about 14.8% (704/4,762) of the male hotspots are sex specific. Correspondingly, of the 4,129 female hotspots, 624 (false-discovery rate 2.8%) and 166 (false-discovery rate 0.7%) have an SRR smaller than 3 and 1, respectively, in males, so about 14.7% (606/4,129) of female hotspots are sex specific.
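
The SRR, the hotspot definition and the false-discovery adjustment above reduce to a few array operations. This sketch assumes equal-sized bins (so the fraction of bins equals the fraction of physical length); the function names are hypothetical.

```python
import numpy as np


def standardized_rate(distances_cm):
    """SRR: genetic distance of each bin divided by the mean over studied bins."""
    d = np.asarray(distances_cm, dtype=float)
    return d / d.mean()


def hotspot_summary(distances_cm, threshold=10.0):
    """Flag hotspot bins (SRR > threshold) and return the fraction of
    physical length and the fraction of recombination they account for."""
    d = np.asarray(distances_cm, dtype=float)
    hot = standardized_rate(d) > threshold
    return hot, hot.mean(), d[hot].sum() / d.sum()


def sex_specific(srr_this_sex, srr_other_sex, hot=10.0, low=3.0):
    """Hotspots in one sex whose SRR in the other sex is below `low`."""
    return (np.asarray(srr_this_sex) > hot) & (np.asarray(srr_other_sex) < low)


def expected_true(n_flagged, fdr):
    """Expected number of genuine calls among n_flagged at a given false-discovery rate."""
    return n_flagged * (1.0 - fdr)


print(expected_true(718, 0.019))  # about 704.4, i.e. roughly 704 true male-specific bins
```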

Sex-specific hotspots tend to occur in clusters. Fig. 2a shows a region harbouring the Basonuclin-2 gene13, where recombinations are dominated by those resulting from male meioses. This region contains five male-specific hotspots, the two most striking being at 16.649 Mb (male SRR = 29.1) and 16.829 Mb (male SRR = 27.7). Although the female SRR is substantially smaller for these two bins (0.5 and 2.6, respectively), they do correspond to local peaks in the female map. This pattern is typical of other male-specific hotspots. The same trend applies to regions where recombinations are dominated by females: male recombination rates show local peaks at female-specific hotspots (Supplementary Fig. 1). Thus, even though hotspots are defined for narrow intervals (and the 10-kb hotspots examined here may often be driven by much shorter intervals), they are determined by interactions between local and regional factors, the latter acting over regions hundreds of kilobases to many megabases in length (Fig. 2b). If the local factors, but not the regional ones, support recombination, the result is a local peak that is not a hotspot. Moreover, the regional forces influencing males and females are only partly correlated. Indeed, the correlation between the male and female maps is lower at 3-Mb resolution (0.649) than at 10-kb resolution (0.659), even though the former is less affected by sampling variation.

Figure 2: Sex differences in recombinations.

a, Male and female SRRs at the Basonuclin-2 gene. Recombinations in this region are dominated by those resulting from male meioses. Note, however, that although female recombination rates are generally low here, male recombination hotspots often coincide with local peaks in female recombination rate. b, Autocorrelations of the difference between male and female SRR as a function of the number of bins separating them. Although small (0.007), the correlation is clearly positive for bins that are 10 Mb (1,000 bins) apart.
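
An autocorrelation curve like the one in panel b can be computed as the Pearson correlation between a per-bin series and itself shifted by k bins. A minimal sketch, assuming the input is the male minus female SRR for ordered bins of a single chromosome (pooling chromosomes would require excluding pairs that span a boundary); the function name is hypothetical.

```python
import numpy as np


def lag_correlations(x, max_lag):
    """Pearson correlation between x[i] and x[i + k] for k = 1..max_lag.

    x: per-bin values along one chromosome, e.g. male SRR minus female SRR.
    """
    x = np.asarray(x, dtype=float)
    return np.array([np.corrcoef(x[:-k], x[k:])[0, 1] for k in range(1, max_lag + 1)])
```

With 10-kb bins, the entry at index 999 (lag 1,000) corresponds to bins 10 Mb apart.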

We classified the 10-kb bins as genic, intergenic or at gene boundaries (Table 1). On average, the recombination rate is lower in genic regions than in intergenic ones, a difference that is greater for females (average SRR = 0.898 and 1.053, respectively) than for males (average SRR = 0.992 and 1.012, respectively). For both sexes, the recombination rate tends to be lower at genic bins containing exons and higher at those containing only introns, particularly where the closest exon is more than three bins away. This latter difference is much greater for males (SRR = 0.868 and 1.284, respectively) than for females (SRR = 0.843 and 1.013, respectively). In fact, among the bin categories studied, intron bins far from exons exhibit the greatest difference between male and female SRR (0.270, P = 2.2 × 10−7). In intergenic regions, for both sexes, the recombination rate first increases with distance from the first or last exon of a gene, peaking approximately three to four bins away, and then decreases. The changes are more dramatic in females than in males (Table 1). For intergenic bins ten bins or less from genes, the average SRR for males and females is 1.119 and 1.256, respectively (P = 1.3 × 10−19). Hence, although male recombinations contribute more to shuffling exons within genes, female meioses contribute more to shuffling genes.
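
Categories such as "intron bins more than three bins from the closest exon" require the distance, in bins, from each bin to the nearest exon-containing bin. A minimal two-pass sketch, assuming a boolean per-bin flag for one chromosome and hypothetical category labels; the exact category definitions behind Table 1 are given in Supplementary Information.

```python
import numpy as np


def bins_to_nearest(flag):
    """Distance, in bins, from each bin to the nearest flagged bin (0 if flagged).

    Bins on a chromosome with no flagged bin get len(flag) + 1 as a sentinel.
    """
    flag = np.asarray(flag, dtype=bool)
    n = flag.size
    dist = np.full(n, n + 1, dtype=int)
    last = None
    for i in range(n):  # forward pass: nearest flagged bin at or to the left
        if flag[i]:
            last = i
        if last is not None:
            dist[i] = i - last
    last = None
    for i in range(n - 1, -1, -1):  # backward pass: nearest flagged bin at or to the right
        if flag[i]:
            last = i
        if last is not None:
            dist[i] = min(dist[i], last - i)
    return dist


def mean_srr_by_group(srr, groups):
    """Average SRR for each category label."""
    srr = np.asarray(srr, dtype=float)
    groups = np.asarray(groups)
    return {str(g): float(srr[groups == g].mean()) for g in np.unique(groups)}
```

For example, with a hypothetical array `labels` of per-bin classes, `bins_to_nearest(labels == "exon") > 3` combined with `labels == "intron"` would select the intron bins far from exons.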

Table 1: Sex-specific standardized recombination rate and genomic regions

For both sexes, similar differences in SRR exist between the 5′ and 3′ ends of genes. For intergenic regions within 100 kb of the nearest gene, the average SRR at the 5′ ends is approximately 0.15 lower than that at the 3′ ends (P = 2.7 × 10−7). The difference disappears at distances greater than 100 kb. In contrast, bins containing the first exon of a gene have a higher average SRR than those containing the last exon (difference 0.11, P = 7.2 × 10−4). This difference, however, does not extend further into the genic regions, which suggests that, in the immediate neighbourhoods of the first and last exons, the first intron has a higher recombination rate than the last intron. Figure 3 summarizes the relationships between sex-specific recombination rate and genes.

Figure 3: Sex-specific recombination rates and genes.

Schematic picture summarizing general trends (see Table 1); it is not meant to reflect the recombination rate pattern around a specific gene. Male SRR, although low at exons, tends to be high in intronic regions distant from exons. Male and female SRRs both tend to be high in intergenic regions around 40 kb from the first or last exon of a gene, but the peak is higher for females. Also, for both sexes, intergenic regions close to 3′ ends tend to have higher recombination rates than those close to 5′ ends.

Differences in recombination rate exist between individuals of the same sex8,14,15. Recently, the PRDM9 gene was shown to be a major determinant of hotspots in humans4,5,6. This gene is highly polymorphic, with most of its sequence variants clustering in its zinc-finger domain. The human genome assembly (hg18) ascribes 13 zinc-finger repeats to PRDM9. These repeats are invariant except at positions −1, 3 and 6 of each zinc-finger α-helix. Variation in the number of repeats within the human population has been described. In the Hutterite population, carriers of a rare version of the gene with 16 zinc-finger repeats were shown to have substantially fewer recombinations in hotspots than non-carriers4. To search comprehensively for variants that could affect hotspots, we performed genome scans separately for the 6,041 mothers and 4,389 fathers studied, testing SNPs on the Illumina 1M chip for association with the fraction of each parent's recombinations that fall in hotspots (henceforth referred to as the hotspot phenotype; Methods Summary).
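
Because recombinations are localized only to intervals, one natural way to compute a parent's hotspot phenotype is from per-bin expected recombination counts (such as E-step output) rather than from exact positions. A minimal sketch under that assumption; the function name and input layout are hypothetical, and the exact definition used for the scans is in Supplementary Information.

```python
import numpy as np


def hotspot_phenotype(expected_counts, is_hotspot):
    """Fraction of a parent's recombinations that fall in hotspot bins.

    expected_counts: array (n_meioses, n_bins) of expected recombination
                     counts per bin for each meiosis of this parent
    is_hotspot:      boolean array (n_bins,) marking hotspot bins
    """
    counts = np.asarray(expected_counts, dtype=float)
    hot = np.asarray(is_hotspot, dtype=bool)
    total = counts.sum()
    if total == 0:
        return float("nan")  # no observed recombinations: phenotype undefined
    return float(counts[:, hot].sum() / total)
```

Pooling all meioses of a parent before taking the ratio weights each recombination equally, so parents with several genotyped offspring get a less noisy phenotype.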

For both sexes, many SNPs around PRDM9 achieved genome-wide significance (P < 5 × 10−8, Supplementary Figs 2 and 3). The most significant was rs2914276, whose minor allele G (frequency = 3.9%) is associated with fewer recombinations in hotspots (P < 10−100 for females and P < 10−50 for males). We determined the zinc-finger repeat number in 575 Icelanders, enriched for carriers of rs2914276-G, and observed repeat numbers from 12 to 15 (Supplementary Information, Supplementary Fig. 4 and Supplementary Table 4). Imputing this polymorphism into the other individuals, we estimated the frequencies of 12, 13, 14 and 15 repeats to be 0.1%, 96.0%, 3.2% and 0.6%, respectively. Rs2914276-G correlates substantially with carrying either 14 or 15 repeats (r = 0.83), whereas the major allele A is correlated with 12 or 13 repeats. No significant difference was seen between 12 and 13 repeats with respect to hotspots. Individuals carrying only 12 or 13 repeats have 28.6% (females) and 37.1% (males) of their recombinations in hotspots. Fourteen and 15 repeats are associated with significantly lower fractions: one copy of 14 repeats lowers the corresponding fractions to 19.1% and 25.4%, and one copy of 15 repeats to 20.5% and 27.3%. Although the higher fractions for 15 repeats than for 14 repeats are only barely significant when results from both sexes are combined (P = 0.018), this suggests that the fraction of recombinations in hotspots does not decrease monotonically with the number of repeats. The number of repeats has a stronger association with the hotspot phenotype than rs2914276, but the latter remains highly significant after accounting for the former. By sequencing the zinc-finger repeats from 55 Icelandic chromosomes covering the 12 to 15 repeat spectrum, we observed a variant, rs6875787, that leads to an amino-acid change in the sixth zinc finger, as also noted previously4 (Supplementary Fig. 4a).
Additional sequencing and further investigations (Supplementary Information) showed that the minor allele of rs6875787 is present on about 5.3% of chromosomes with 13 repeats, and confirmed a previous suggestive finding4 that it lowers the fraction of recombinations in hotspots. However, its effect is only about one-tenth that of 14 or 15 repeats. Thus many polymorphisms in the PRDM9 locus may affect hotspots, but the repeat polymorphism alone captures most (>90%) of the association currently observed between PRDM9 variation and the hotspot phenotype. Because the 14 and 15 repeats do not differ much in their effects, we collapsed them into a single 14/15 allele with a frequency of about 3.9%. We estimated that the difference between the 14/15 and 12/13 alleles alone accounts for 60% and 44% of the total systematic component of the hotspot phenotype for males and females, respectively (Supplementary Information). The 16-repeat allele found in the Hutterites4 was not observed in the Icelandic samples examined, but the differences described here between the 14/15 and 12/13 alleles are novel.

Because few parents carry more than one 14/15 allele, we grouped heterozygous and homozygous carriers together (351 males and 429 females, contributing 502 and 612 meioses, respectively) to construct two sex-specific carrier recombination maps. The same was done for non-carriers, that is, those with only 12 or 13 repeats. Although carriers have fewer recombinations in hotspots, their recombination rates in hotspots remain far above the genome average (average SRR = 13.1 and 11.4 for male and female carriers, respectively, compared with 19.0 and 16.9 for non-carriers). Moreover, the binding motif corresponding to 13 repeats is associated with increased recombination rate and with hotspots in both carriers and non-carriers, although the effect is slightly stronger for non-carriers (Supplementary Information and Supplementary Table 5). The binding motif predicted for 14 repeats is also associated with increased recombination rate in both groups, but here the effect is stronger for carriers. In addition to motif intensities within a bin itself, motif intensities at nearby bins also appear to have an effect (Supplementary Table 6). However, all these correlations are low in magnitude, and the motifs alone provide very modest power for predicting hotspots.

For the 10-kb bins, the correlation between the CEU and YRI maps is 0.716 (Supplementary Table 3). Our overall sex-averaged map correlates more strongly with the COMBINED map (0.729) than with the CEU (0.700) or YRI (0.643) maps, which indicates that a substantial part of the difference between the CEU and YRI maps is noise. Nonetheless, examining PRDM9 variation in the HapMap YRI samples, we found zinc-finger repeat lengths of 12 to 15 and 17 to 19 (Supplementary Table 4 and Supplementary Fig. 4b). Grouping repeat lengths into three composite alleles, the 12/13, 14/15 and 17/18/19 alleles have frequencies of 65.8%, 26.7% and 7.5%, respectively. The 14/15 allele is much rarer in the CEU samples, where only 13 and 14 repeats were found, at frequencies of 96.6% and 3.4%, respectively. We standardized all maps, including sex-averaged maps constructed separately for carriers and non-carriers of the 14/15 allele (referred to as mapC and mapNC, respectively), in the same way as our sex-specific maps. When the difference between the YRI and CEU maps was regressed on mapC and mapNC jointly, the coefficient of mapC was positive (0.089, P < 10−100) and that of mapNC was negative (−0.307, P < 10−300). When the CEU and YRI maps were each regressed on mapC and mapNC jointly, all coefficients were positive. For CEU, the coefficient of mapC was approximately 5.4% of that of mapNC, compared with 33.2% for YRI. Thus, true differences between the European and African maps that are explained by differences in the frequencies of PRDM9 variants can be identified.
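
The joint regressions above are ordinary least squares with an intercept and two predictors. A minimal sketch, assuming all maps have already been standardized to per-bin SRRs on a common grid; the function name is hypothetical. Note that naive OLS standard errors would be anticonservative here because neighbouring bins are correlated, which is why the test statistics were adjusted with a chromosome permutation and flipping procedure (Methods Summary).

```python
import numpy as np


def joint_regression(y, *predictors):
    """OLS of y on an intercept plus the given predictors.

    Returns coefficients in the order [intercept, predictor_1, predictor_2, ...].
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

A call such as `joint_regression(srr_yri - srr_ceu, srr_map_c, srr_map_nc)` (hypothetical array names) would return the intercept followed by the mapC and mapNC coefficients.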

There are 4,006 hotspot bins (10-kb bins with SRR > 10) in our overall sex-averaged map, compared with 4,010 in the COMBINED map, with an overlap of 2,139. Based on mapC and mapNC, 18.3% of the recombinations of carriers of the 14/15 allele fall in hotspots of our overall sex-averaged map, compared with 27.0% for non-carriers. Simulations that adjust for the increased variation of maps estimated from reduced sample sizes show that, genome-wide, mapC is not smoother than mapNC, which suggests that the reduced recombination rate of carriers at hotspots defined by the overall map is compensated for by hotspots elsewhere. Among the 5,034 bins with an SRR greater than 10 in mapC, 1,380 and 371 have an SRR less than 3 and 1, respectively, in mapNC. A permutation test shows that 350 and 38 bins with such properties are expected by chance, suggesting that 1,030 of the 1,380 (74.6%), and 333 of the 371 (89.8%), are true hotspots specific to carriers of the 14/15 allele. Further support comes from the differences in SRR between the YRI and CEU maps: 545 of the 1,380 (39.5%) and 150 of the 371 (40.2%) identified bins fall in the top 5% of all bins studied.
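
The "expected by chance" counts come from a randomization that permutes individuals (Methods Summary). A minimal sketch of that idea, assuming a matrix of per-individual, per-bin recombination counts and a boolean group label (for example carrier versus non-carrier); the function names, the number of permutations and the seed are hypothetical choices, not the study's settings.

```python
import numpy as np


def count_group_specific(counts, in_a, hot=10.0, low=3.0):
    """Number of bins with SRR > hot in group A but SRR < low in group B.

    counts: array (n_individuals, n_bins) of recombination counts
    in_a:   boolean array (n_individuals,), True for members of group A
    """
    a = counts[in_a].sum(axis=0)
    b = counts[~in_a].sum(axis=0)
    srr_a, srr_b = a / a.mean(), b / b.mean()
    return int(np.sum((srr_a > hot) & (srr_b < low)))


def permutation_fdr(counts, in_a, n_perm=200, seed=0, **thresholds):
    """Observed count, count expected by chance (labels shuffled), and FDR."""
    counts = np.asarray(counts, dtype=float)
    in_a = np.asarray(in_a, dtype=bool)
    rng = np.random.default_rng(seed)
    observed = count_group_specific(counts, in_a, **thresholds)
    null = [count_group_specific(counts, rng.permutation(in_a), **thresholds)
            for _ in range(n_perm)]
    expected_by_chance = float(np.mean(null))
    fdr = expected_by_chance / observed if observed else float("nan")
    return observed, expected_by_chance, fdr
```

The estimated number of true group-specific bins is then the observed count minus the chance expectation; with the figures above, 1,380 - 350 = 1,030, or 74.6% of 1,380.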

In the genic regions, our sex-averaged map and the COMBINED map have an average SRR of 0.929 and 0.904, respectively. The difference is significant (P = 3.1 × 10−5) and is mainly accounted for by bins containing exons and intronic bins within 10 kb of an exon. One possibility is that regions around exons are more likely to have been subject to natural selection, resulting in a lower number of recombinations detectable from LD and consequently leading to an underestimation of recombination rate in LD-based maps7. However, for regions (23,614 10-kb bins) that have been proposed as targets of selection by at least two of nine genome scans compiled by Akey16, the difference between the COMBINED map and our map is opposite to that observed for regions around exons. Specifically, although both maps assign low recombination rates to these regions, the average SRR in the COMBINED map (0.650) is significantly higher (P = 5.7 × 10−9) than that in our sex-averaged map (0.593). Whether these differences are a result of some novel bias that affects estimates of LD-based recombination rate at regions under selection, or partly reflect properties exhibited by the statistical methods used to identify regions under selection that are currently poorly understood, warrants further investigation.

Polymorphisms at the RNF gene that influence total genome-wide recombination rates of males and females in opposite directions15 have little impact on the fraction of recombinations in hotspots (Supplementary Information). Variations at the PRDM9 gene influence recombination locations in a similar manner for both sexes, but have little effect on total genome-wide recombination rate. An inversion on chromosome 17 reported to associate with increased fertility and genome-wide recombination rate17 also appears to increase the fraction of recombinations in hotspots, but the effect is limited to females (P = 2.9 × 10−5 and 0.49 for females and males, respectively) and is modest (Supplementary Information). These polymorphisms, together with the systematic regional and local differences in recombination rates between the sexes, provide a glimpse of nature’s ingenuity in building diversity and flexibility into the system. The maps constructed in this study (available at http://www.decode.com/addendum) should serve as a valuable resource for genetics research for years to come.

Methods Summary

Subjects were 20,217 distinct individuals genotyped using various Illumina BeadChips and processed to determine parental origin. When searching for variants associated with the hotspot phenotype, in addition to the SNPs used to determine recombinations, another 497,257 SNPs typed for a subset of the individuals were imputed into the others using previously described methods. Variants at the PRDM9 gene were imputed similarly. Imputations were not used for map construction. The LD-based maps were downloaded from https://mathgen.stats.ox.ac.uk/impute. The entire zinc-finger domain of the PRDM9 gene was amplified with unique primers outside the repetitive region, avoiding homology to chromosome 16, for a total of 575 Icelanders, 30 CEU and YRI trios, and 74 Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT) samples. The amplified product was run on an agarose gel to determine the number of zinc-finger repeats. For further analysis, 55 bands of different repeat lengths from Icelanders and 21 from YRI samples were isolated from the gel, cloned and fully sequenced. The statistical tests used were mainly regression based, for example paired and unpaired t-tests, correlation tests and regressions. Genomic control18 was used for the genome-wide association analysis of the hotspot phenotype. For map comparisons, to handle correlations among nearby bins, a procedure that permutes and flips chromosomes was used to calculate adjustment factors for the test statistics. Another randomization procedure, which permutes individuals, was used to estimate false-discovery rates for sex-specific and new hotspots. See Supplementary Information for details.
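
A per-SNP regression test followed by genomic control can be sketched as follows. This assumes a simple linear regression of the hotspot phenotype on allele dosage with no covariates, uses t squared as a 1-degree-of-freedom chi-square statistic (a large-sample approximation), and applies genomic control by dividing statistics by the inflation factor lambda, estimated as the median statistic over the median of a 1-df chi-square (about 0.455) and not allowed below 1. Function names are hypothetical, and the models actually fitted are described in Supplementary Information.

```python
import math

import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 degree of freedom


def association_chi2(phenotype, dosages):
    """Per-SNP linear-regression chi-square (t squared) for phenotype on dosage.

    dosages: array (n_individuals, n_snps) of allele counts or imputed dosages.
    Monomorphic SNPs give NaN.
    """
    y = np.asarray(phenotype, dtype=float)
    g = np.asarray(dosages, dtype=float)
    n = y.size
    yc = y - y.mean()
    gcen = g - g.mean(axis=0)
    sxx = (gcen ** 2).sum(axis=0)
    beta = gcen.T @ yc / sxx              # OLS slope for each SNP
    resid = yc[:, None] - gcen * beta     # residuals, one column per SNP
    s2 = (resid ** 2).sum(axis=0) / (n - 2)
    t = beta / np.sqrt(s2 / sxx)
    return t ** 2


def genomic_control(chi2):
    """Divide statistics by lambda = median(chi2) / 0.455, never deflating below 1."""
    chi2 = np.asarray(chi2, dtype=float)
    lam = max(float(np.median(chi2)) / CHI2_1DF_MEDIAN, 1.0)
    return chi2 / lam, lam


def chi2_1df_pvalue(x):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))
```

The P-value helper uses the identity that a 1-df chi-square is the square of a standard normal, so a statistic of about 3.84 corresponds to P = 0.05.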