Main

To characterize the chromatin modifications typical of CGIs, we used the methyl-CpG-sensitive restriction endonuclease HinPI (cleavage site GCGC) to release small chromatin fragments from purified brain nuclei, as described previously9. As sites for this enzyme in bulk chromatin are rare and generally uncleavable owing to DNA methylation, the released fraction predominantly contains non-methylated CGIs. Confirming this, further digestion of the deproteinized DNA with HpaII (cleavage site CCGG) specifically collapsed the nucleosomal ladder generated by HinPI, but had little effect on DNA released from bulk chromatin with MseI (cleavage site TTAA; Supplementary Fig. 1)9. Western blotting confirmed that non-methylated CGI chromatin is enriched for histone modifications associated with actively transcribed genes (acetylated histone H3, H3K4me3 and H3K4me2) compared with bulk chromatin (Fig. 1a). In contrast, CGI chromatin was depleted for marks not found at active promoters: H3K36me3, H3K9me3, H3K27me3 and H4K20me3 (Fig. 1a). Agreement between these results and genome-wide studies of chromatin modifications3,4,10,11 indicated that this fraction could be used to identify proteins that preferentially localize to non-methylated CGIs. We first tested CXXC finger protein 1 (Cfp1), which binds to non-methylated CpG dinucleotides in vitro by a CXXC zinc finger domain6,12. The data showed that Cfp1 is enriched within the CGI fraction of the genome (Fig. 1a). Similarly, Kdm2a, an H3K36 demethylase that also contains a CXXC domain13, was enriched in the CGI fraction.

Figure 1: Cfp1 is enriched in non-methylated CpG island chromatin.

a, Western blot analysis of non-methylated CGI and bulk chromatin released from purified nuclei using antibodies against selected histone modifications and CpG-binding proteins (normalized to histone H3 levels). An antibody against histone H3 served as a loading control. Modifications associated with transcriptional activity, including H3 acetylated on amino acids K9 and K14, H3K4me3 and H3K4me2 were enriched in CGI chromatin, whereas the elongation and silencing marks H3K36me3, H3K9me3, H3K27me3 and H4K20me3 were depleted. The CXXC domain proteins Cfp1 and Kdm2a also showed enrichment within the CGI fraction. b, Cfp1 ChIP assayed by qPCR across the X-linked mouse Xist locus. Vertical strokes beneath the plot represent CpGs within the locus and the black bar above demarcates the CGI. The open box below the CpG map shows the region amplified for bisulphite analysis. IP, immunoprecipitation. c, Bisulphite analysis of input chromatin (female brain) and chromatin immunoprecipitated with Cfp1 antibodies and control MeCP2 antibodies. Twelve representative clones are shown from the total number sequenced (number in brackets). Solid and open circles represent methylated and non-methylated CpGs, respectively. Uncharacterized CpGs are represented as gaps.

Focusing on Cfp1, we tested its in vivo binding specificity by chromatin immunoprecipitation (ChIP) at an endogenous CGI that is present in both methylated and non-methylated states. The Xist CGI is mono-allelically methylated in female cells, but fully methylated in males, which only have one X chromosome14. ChIP analysis of mouse brain tissue identified a peak of Cfp1 binding over the Xist CGI in females, but no peak was present in males, suggesting that Cfp1 exclusively binds to the non-methylated allele (Fig. 1b). To test this more stringently, we used bisulphite sequencing across the Xist locus to determine the methylation status of the immunoprecipitated chromatin recovered from females. As expected, input DNA comprised equal numbers of methylated and non-methylated DNA clones. By contrast, DNA immunoprecipitated by the Cfp1 antibody was almost exclusively non-methylated (96%), whereas DNA immunoprecipitated with an antibody against the methyl-CpG-binding protein MeCP2 (refs 15–17) was predominantly methylated (88%; Fig. 1c). We conclude that Cfp1 selectively binds to non-methylated CpGs in vivo.

To test whether Cfp1 is concentrated at non-methylated CpGs within CGIs, we analysed the genome-wide distribution of Cfp1 using high-throughput DNA sequencing of immunoprecipitated DNA (ChIP-Seq). Prominent peaks of Cfp1 binding co-localized with non-methylated CGIs (Fig. 2a), 81% of which were Cfp1-associated. Cfp1 has been identified as part of the Setd1 H3K4 methyltransferase complex8 and ChIP-Seq with H3K4me3 antibodies showed that 93% of Cfp1-bound CGIs also possess this histone modification (Fig. 2b and Supplementary Table 1). Consistent with the possibility that Cfp1 binding is responsible for recruiting the Setd1 complex to these sites, Cfp1-negative non-methylated CGIs (19% of the total) also lack H3K4me3 (Fig. 2b). Despite being rich in non-methylated CpGs, these CGIs are somehow refractory to Cfp1 binding. One potential explanation came from alignment with the published18 distribution of the polycomb-associated mark H3K27me3 (ref. 19) in mouse brain. More than half (58%) of Cfp1-negative and H3K4me3-negative CGIs contained the H3K27 modification (Fig. 2a, b and Supplementary Fig. 2). In these cases H3K27me3 and polycomb binding may render a CpG island refractory to Cfp1 binding and to H3K4 methylation.

Figure 2: Genome-wide ChIP sequencing shows a tight association between Cfp1 and H3K4me3 at CGIs.

a, Typical Cfp1 ChIP-Seq profiles from whole mouse brain. For comparison, we also carried out H3K4me3 ChIP-Seq. The data were aligned with non-methylated CGIs mapped in mouse brain using a CXXC affinity column29. The panel shows a typical region of the genome from chromosome 4 (nucleotides 126,333,759–127,054,849) demonstrating the coincidence of Cfp1 and H3K4me3 peaks with CGIs. A subset of genes is labelled (RefSeq). Two CGIs that lack H3K4me3 and Cfp1 coincide with sites of H3K27me3 binding (red rectangles; data of ref. 30 for mouse brain). b, Venn diagram showing strong overlap between H3K4me3 and Cfp1 peaks in mouse brain chromatin, but minimal overlap with H3K27me3.

To assess the importance of Cfp1 for the recruitment of H3K4me3, we used stably expressed short hairpin RNAs (shRNAs) directed against Cfp1 to reduce its level in NIH3T3 cells. Single shRNAs reduced Cfp1 (Supplementary Fig. 3), but a combination of three gave a greater effect (Fig. 3a). Depleted cells showed altered morphology (Fig. 3b) and retarded growth (Fig. 3c). ChIP analysis revealed a loss of Cfp1 binding compared with vector-only transfected cells accompanied by a precipitous drop in levels of H3K4me3 across CGIs at the brain-derived neurotrophic factor (Bdnf), β-actin (Actb), c-Myc and Dlx5/6 genes (Fig. 3d). The same results were obtained with clones expressing each of two independent shRNA sequences, ruling out off-target effects of shRNA expression (Supplementary Fig. 3). As a further control, H3K27me3 profiles at the same loci were unaffected by depletion of Cfp1 (Fig. 3d and Supplementary Fig. 3b). The loss of H3K4me3 at six randomly selected CGI promoters in Cfp1-depleted cells argues that this modification is dependent on the presence of Cfp1.

Figure 3: Depletion of Cfp1 results in reduced H3K4 trimethylation levels at CpG islands.

a, Expression of three short hairpin RNAs in NIH3T3 fibroblasts reduced Cfp1 messenger RNA levels to 15% compared with vector-only transfected control cells. Expression of Cfp1 relative to Gapdh in control cells is set to 1. The inset shows reduction of Cfp1 by western blotting. Error bars indicate s.d. (n = 3). b, c, Gross morphology (b) and growth rate (c) of Cfp1-depleted versus vector-only transformant cells. Cells were plated at low density and monitored at the times shown using a haemocytometer. Original magnification, ×200. d, ChIP qPCR using Cfp1, H3K4me3 and H3K27me3 antibodies at selected loci in vector-only control and Cfp1-depleted NIH3T3 cells. The results were replicated with an independent clone expressing the same shRNA combination (data not shown) and with each of two individual shRNA constructs (see Supplementary Fig. 3).

Although Cfp1 binds non-methylated CpGs and seems to be required for H3K4 methylation at CGIs, it is possible that this reflects indirect recruitment of Setd1 by RNA polymerase II, which is present at active CGI promoters. Alignment of ChIP-Seq profiles for Cfp1, H3K4me3 and the unphosphorylated form of RNA polymerase II indeed showed co-localization of all three signals at 86% of all Cfp1-bound CGIs (Supplementary Table 1 and Supplementary Fig. 4). In a small proportion (7%) of cases, however, RNA polymerase II was undetectable, despite the presence of robust peaks of H3K4me3 and Cfp1 (Supplementary Fig. 4). This raised the possibility that RNA polymerase II may not be required and that Cfp1 binding is sufficient to direct H3K4 trimethylation. To test this hypothesis, we used embryonic stem (ES) cell lines in which artificial promoterless CpG-rich DNA sequences had been introduced into the genome at sites that normally lack H3K4me3. The DNA insert in ES line TβC44 (ref. 20) comprises a 720-base-pair (bp) enhanced green fluorescent protein (eGFP) coding sequence containing 60 CpGs21 adjacent to a 600-bp puromycin-resistance gene with 93 CpGs (Fig. 4a). The inserted sequence has the typical CpG density of a CGI, but lacks a promoter. Bisulphite analysis showed that the integrated sequence is non-methylated (Fig. 4a). In the targeted cells, prominent domains of Cfp1 and H3K4me3 coincided with the inserted CpG-rich DNA (Fig. 4b). Interestingly, the peaks of H3K4me3 and Cfp1 tracked CpG density, as expected if H3K4me3 is determined by this DNA dinucleotide sequence (Fig. 4b, broken line). No peak of RNA polymerase was detected. An independent ES cell line carrying an eGFP insertion on the X chromosome22 (Fig. 4c) also created a peak of H3K4me3 and Cfp1 (Fig. 4d). In this case, bisulphite sequencing showed that approximately a quarter of the integrated sequences were hypomethylated and the remainder were densely methylated (Fig. 4e, input panel). ChIP-bisulphite analysis demonstrated that Cfp1 and H3K4me3 antibodies significantly enriched the hypomethylated sequences (Fig. 4e). We conclude that clusters of non-methylated CpG are sufficient to recruit Cfp1 and create a peak of H3K4me3 modification, even in the absence of a promoter.
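
As a rough illustration of the CpG content of the TβC44 insert, the counts quoted above (60 CpGs in 720 bp of eGFP and 93 CpGs in 600 bp of the puromycin-resistance gene) can be converted to CpGs per 100 bp. The sketch below is only arithmetic on those published numbers; the function name `cpg_per_100bp` and the per-100-bp unit are illustrative choices and are not taken from the study.

```python
def cpg_per_100bp(cpg_count: int, length_bp: int) -> float:
    """Return the number of CpG dinucleotides per 100 bp of sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return 100.0 * cpg_count / length_bp


# CpG counts and lengths as quoted in the text for the TβC44 insert.
egfp_density = cpg_per_100bp(60, 720)
puro_density = cpg_per_100bp(93, 600)
# Density over the whole insert, treating the two segments as contiguous
# (an assumption made for illustration only).
combined_density = cpg_per_100bp(60 + 93, 720 + 600)

print(f"eGFP: {egfp_density:.1f} CpG per 100 bp")
print(f"Puro: {puro_density:.1f} CpG per 100 bp")
print(f"Combined: {combined_density:.1f} CpG per 100 bp")
```

On these figures the puromycin-resistance segment is nearly twice as CpG-dense as the eGFP segment.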

Figure 4: Artificial promoterless CpG-rich sequences recruit Cfp1 and generate new H3K4me3 peaks in mouse ES cells.

a, The TβC44 ES cell line carries adjacent promoterless eGFP and bacterial puromycin-resistance sequences (black bars) inserted together within the 3′ untranslated region of the Nanog gene20. The positions of CGIs and H3K4me3 peaks4 at this locus in wild-type ES cells are shown below the map. DNA methylation within the insertion was determined by bisulphite sequencing of 306-bp (eGFP) and 275-bp (Puro) segments of the inserted sequence. b, ChIP qPCR across the region containing the insertion using antibodies against Cfp1, RNA polymerase II (Pol2) and H3K4me3. The dotted line plots CpG density in a 500-bp window with a 100-bp step size. Vertical strokes below the graph mark CpG sites within and surrounding the insertion. c, A second ES cell line carried an eGFP coding sequence (black bar) inserted into the 3′ untranslated region of the X-linked Mecp2 gene. CGIs and H3K4me3 peaks4 at this locus in wild-type ES cells are shown below. d, ChIP qPCR across the region containing the insertion (black bar) using antibodies against Cfp1, RNA polymerase II and H3K4me3. e, Bisulphite sequence analysis determined the methylation status of input and DNA immunoprecipitated by the Cfp1 and H3K4me3 antibodies. The percentage of non-methylated CpGs is shown below each panel. H3K4me3 ChIP data in b and d used different commercial antibodies with differing affinities (see Supplementary Table 2).
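
The CpG density trace described in panel b (500-bp window, 100-bp step) could be computed along the following lines. This is a minimal sketch and not the authors' script: the function name `cpg_density_profile`, the per-100-bp scaling and the handling of partial windows at the sequence end (they are skipped) are all assumptions.

```python
from typing import List, Tuple


def cpg_density_profile(
    seq: str, window: int = 500, step: int = 100
) -> List[Tuple[int, float]]:
    """Return (window_start, CpGs per 100 bp) for every full window.

    Windows that would run past the end of the sequence are skipped,
    and a CpG that straddles a window boundary is not counted.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        # "CG" cannot overlap itself, so str.count gives the exact number.
        n_cpg = chunk.count("CG")
        profile.append((start, 100.0 * n_cpg / window))
    return profile


# Toy sequence (hypothetical): a CpG-rich half followed by a CpG-free half.
toy_seq = "CG" * 250 + "AT" * 250
toy_profile = cpg_density_profile(toy_seq)
for start, density in toy_profile:
    print(start, density)
```

On the toy sequence the density falls stepwise as the window slides out of the CpG-rich half, which is the kind of gradient the ChIP signal is compared against in panel b.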

The density of non-methylated CpG is 50-fold higher in CGIs than in bulk genomic DNA, as CpG in the latter is deficient (20% of expected23) and mostly methylated (70%). It is unclear whether this high CpG density arises as a passive consequence of events at promoters and has no functional significance, or whether it has been selected over evolutionary time because it facilitates transcription (or other DNA-related processes). Our results favour selection, as they indicate that CpG density per se can directly influence histone modification status by the recruitment of the Cfp1 protein and its associated Setd1 histone H3K4 methyltransferase complex. The ability of an exogenous promoter-less CpG-rich insertion to create de novo an H3K4me3 focus provides strong support for this notion. An attractive biological rationale for this phenomenon may be simplification of the large mammalian genome by the creation of ‘beacons’ of H3K4me3 that highlight CGI promoters within the genomic landscape1.
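
The statement that bulk genomic CpG is "deficient" relative to expectation is usually quantified as an observed/expected ratio. The sketch below assumes the conventional definition, (number of CpG × sequence length) / (number of C × number of G); the text does not state which definition underlies the 20% figure, so this is an illustration and not the calculation used in ref. 23. The function name `cpg_obs_exp` and the toy sequences are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (n_CpG * N) / (n_C * n_G).

    Returns 0.0 when the sequence has no C or no G, because no CpG
    can occur in that case.
    """
    seq = seq.upper()
    n = len(seq)
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * n / (n_c * n_g)


# Toy sequences (hypothetical, for illustration only).
cpg_rich = "CG" * 50    # every C is followed by a G
cpg_free = "CAGT" * 25  # contains C and G but no CpG dinucleotide

print(cpg_obs_exp(cpg_rich))
print(cpg_obs_exp(cpg_free))
```

Under this definition a ratio near 1 means CpG occurs about as often as base composition predicts, and a ratio of 0.2 would correspond to the roughly fivefold deficit described for bulk DNA.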

Whether CpG clustering is sufficient to create stable non-methylated CGIs is uncertain. There is evidence that H3K4me3 is incompatible with de novo methylation as components of the DNA methyltransferase complex (Dnmt3L) are repelled by this modification24. In theory, therefore, Cfp1-bound CGIs should be intrinsically stable in the non-methylated state. Previous studies suggest, however, that transcription also has a role. Maintenance of non-methylated CGIs through the waves of de novo methylation in the early embryo depends on promoter function, as point mutations that prevent transcription factor binding without significantly reducing CpG density destroy the immunity of a CpG island to DNA methylation25,26. It follows that H3K4 methylation due to CpG clustering may not be sufficient to reliably perpetuate the non-methylated state. Indeed, more than half of cells carrying the promoter-less eGFP insertion at the Mecp2 locus had acquired dense methylation in ES cells despite the presence of a CpG cluster.

Our data suggest that chromatin modification need not arise secondarily as a result of, for example, transcriptional status, but can be determined genetically due to the sequence characteristics of the underlying DNA. In particular CpG, by virtue of its widely varying local densities and alternative modification states, has the properties of a signalling module that locally influences genome function. As shown here, DNA methylation-free CpG clusters can recruit Cfp1 and probably other CXXC domain proteins. Densely methylated CGIs, on the other hand, attract methyl-CpG-binding proteins, which in turn recruit enzymes that can reinforce repressive histone modifications17,27,28. Future studies of proteins that read and interpret CpG signals promise to shed further light on both genetic and epigenetic determinants of chromosome function.

Methods Summary

Mouse brain nuclei were incubated with restriction enzymes and then pelleted to liberate small fragments of CGI chromatin that were then analysed by western blotting. ChIP was performed on mature mouse brain with antibodies against various chromatin proteins. Immunoprecipitated DNA was used for: (1) quantitative PCR (qPCR) analysis of specific loci; (2) bisulphite analysis to determine DNA methylation patterns; and (3) ligation of linkers and Solexa sequencing to identify sites of binding. To knock down Cfp1, mouse NIH3T3 cells were transfected using shRNA targeting Cfp1 or vector alone as a control, and stable clones were selected by puromycin resistance. RNA and protein samples were prepared to verify the knockdown of Cfp1. For Fig. 3, three shRNAs were used in combination, but comparable results were obtained with two individual shRNAs (Supplementary Fig. 3). ChIP with antibodies against Cfp1, H3K4me3 and H3K27me3 determined the effect of Cfp1 knockdown at four loci using qPCR. The eGFP insert was targeted to the Mecp2 locus by homologous recombination of a construct containing a PGK-Neo cassette flanked by loxP sites. Cre-mediated recombination was used to delete the selectable marker before bisulphite and ChIP analysis.

Online Methods

Release of CGI chromatin

Nuclei were prepared from brains of 4-week-old mice as previously described31. Nuclear preparations were digested with a twofold excess of HinP1 or MseI in a buffer containing 50 mM Tris-HCl, pH 8, 100 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA and 1 mM β-mercaptoethanol. The released chromatin was retained in the supernatant after centrifugation at 3,800g for 5 min and the proteins were precipitated using trichloroacetic acid before western blot analysis.

Antibodies

Antibodies used are listed in Supplementary Table 2.

Chromatin immunoprecipitation and bisulphite sequencing

ChIP on brain tissue was performed as described17 using antibodies as shown in Supplementary Table 2. Most ChIP-qPCR profiles were replicated using independent Cfp1 antibodies. Illumina linkers were ligated in-house and Solexa sequencing was carried out on Illumina 2G Solexa sequencers with two replicate lanes per biological sample. ChIP-Seq was analysed using custom bioinformatic tools generated in-house (see Supplementary Table 3 for the parameters used). ChIP using formaldehyde crosslinked NIH3T3 cells was performed as previously described32. Bisulphite sequencing was performed as described29. Real-time PCR was carried out using Quantace Sensimix Plus on a Biorad iCycler according to the manufacturer’s instructions (primer sequences are available on request).

Generation of stable Cfp1-knockdown cells

NIH3T3 cells were transfected using Lipofectamine reagent (Invitrogen) with three independent pSuper vectors containing short hairpin constructs directed against Cfp1 (Oligoengine) or vector alone. Target sequences were as follows: target 986, 5′-GAAGGUGAAGCACGUGAAG-3′; target 1250, 5′-CAGCCAACCGAAUCUAUGA-3′; and target 1920, 5′-CUUCACCAAACGAUCCAAC-3′. Stable clones were selected for puromycin resistance. A combination of the three shRNAs reduced Cfp1 more robustly and was therefore used for the data in Fig. 3. Individual shRNAs gave comparable results by western blotting and ChIP (see Supplementary Fig. 3). RNA was extracted using Tri reagent (Sigma) and complementary DNA was prepared using reverse transcriptase (Promega). Expression levels were determined using real-time PCR analysis (primer sequences available on request).

ES cell lines

ES cell line TβC44 was generated by homologous recombination as described20. A Mecp2-eGFP knock-in targeting vector was constructed by sequential cloning of 5′ (5.3 kb) and 3′ (1.9 kb) regions of Mecp2 homology into peGFP-N1 (Clontech). A PGK-Neo cassette flanked by loxP sites was added to enable selection of transfected cells. Gene targeting was carried out in the ES cell line E14 TG2a to generate an insertion into the Mecp2 gene transcription unit at the junction between the open reading frame and the 3′ untranslated region. This construct was initially designed to create a MeCP2-eGFP fusion protein after transcription and translation. Cells were grown on gelatinized dishes in the presence of recombinant human LIF in Glasgow MEM (Invitrogen) supplemented with 10% FBS (Globepharm), 1× MEM non-essential amino acids, sodium pyruvate (1 mM) and β-mercaptoethanol (50 μM; all Invitrogen). ES cells (5 × 10⁷ cells) were transfected with linearized targeting vector (250 μg DNA in 0.8 ml HEPES buffered saline) by electroporation (800 V, 3 μF, BioRad Gene Pulser) and plated at 5 × 10⁶ cells per dish. Correctly targeted clones were first identified by PCR specific for homologous recombination. The integrity of the targeted locus was confirmed by Southern blot analyses. A single positive clone was transiently transfected with pCAGGS-CRE33 for the Cre-mediated deletion of the selectable marker and a recombinant clone was then used for this study.