
Eukaryotic genomes are dynamically packaged into multiple levels of organization, from nucleosomes to chromatin fibres to large-scale chromosomal domains that occupy defined territories of the nucleus. The three-dimensional interplay of protein–DNA complexes enables the timely execution of complex nuclear functions such as transcription, replication, DNA repair and mitosis1. A combination of microscopy and chromosome conformation capture (3C)-related approaches2 has revealed that CCCTC-binding factor (CTCF) is, in large part, responsible for bridging the gap between nuclear organization and gene expression. CTCF is the main insulator protein described in vertebrates. Initially characterized as a transcription factor that is capable of activating or repressing gene expression in heterologous reporter assays3,4, CTCF was later found to display properties that are characteristic of insulators (that is, the ability to interfere with enhancer–promoter communication or to buffer transgenes from chromosomal position effects caused by heterochromatin spreading). These properties, which were observed using transgenic assays, were interpreted to suggest a role for insulators in restricting enhancer–promoter interactions and in establishing functional domains of gene expression.

In this Review, we discuss recent evidence from the use of 3C-related techniques, which indicates that the diverse properties of CTCF and other insulator proteins are based on their broader role in mediating both interchromosomal and intrachromosomal interactions between distant sites in the genome. As a result of these interactions, CTCF elicits specific functional outcomes that are context dependent, determined by the nature of the two sequences brought together and by the proteins with which they interact. Consequently, CTCF contributes to the establishment of a three-dimensional structure of the chromatin fibre in the nucleus that is both an effector and a consequence of genome function. As the role of CTCF extends well beyond that originally attributed to insulator proteins and its functional effects are based on its ability to mediate interactions between distant sequences, we propose the term 'architectural' rather than 'insulator' to describe this type of protein.

Regulation of CTCF binding to DNA

CTCF is conserved in most bilaterian phyla but is absent in yeast, Caenorhabditis elegans and plants5. It contains a highly conserved DNA-binding domain with 11 zinc fingers6; it is present at 55,000–65,000 sites in mammalian genomes7 and is normally located in linker regions surrounded by well-positioned nucleosomes8. Of these sites, ~5,000 are ultraconserved between mammalian species and tissues, and correspond to high-affinity sites9, whereas 30–60% of CTCF-binding sites show cell-type-specific distribution8,10,11,12. The location of CTCF-binding sites with respect to genomic features provides insights into the possible roles of this protein. Approximately 50% of CTCF-binding sites are found in intergenic regions, ~15% are located near promoters and ~40% are intragenic (that is, within exons and introns)7,12 (Fig. 1). Surprisingly, given the original role attributed to CTCF as an enhancer blocker, enhancer elements are enriched for this protein13,14, which indicates that a subset of CTCF-binding sites may be important in regulating transcription to establish cell-lineage-specific programmes. Experiments using the ChIP-exo technique uncovered a 52-bp CTCF-binding motif that contains four CTCF-binding modules15,16 (Fig. 1).

Figure 1: Features of CTCF-binding sites in the genome.

Binding sites of CCCTC-binding factor (CTCF) are associated with different genetic elements. The majority of these sites are intergenic and colocalize with cohesin. In addition, a proportion of CTCF-binding sites are located near RNA polymerase III (Pol III) type II genes (for example, tRNA genes and short interspersed nuclear elements (SINEs)) and 'extra-TFIIIC' (ETC) loci, which suggests that TFIIIC and CTCF cooperate in some nuclear processes. The 12-bp consensus sequence of CTCF-binding sites is embedded within binding modules 2 and 3 as determined by ChIP-exo experiments. DNA methylation (represented by red circles) of cytosine residues occurs at positions 2 and 12 of the consensus sequence in a subset of CTCF-binding sites.


The presence of CpGs in the DNA consensus sequence of the CTCF-binding site supports the notion that methylation of cytosine residues at carbon 5 of the base to form 5-methylcytosine (5mC) in CpG-containing sites may, at least partly, underlie CTCF target selectivity in different cell types17. Recent studies indicate that DNA methylation has a widespread role in regulating CTCF occupancy at many genes, including CDKN2A (which encodes INK4A and ARF)18, B-cell CLL/lymphoma 6 (BCL6)19 and brain-derived neurotrophic factor (BDNF)20. One study mapped the occupancy of CTCF in 19 human cell types; by comparing this information with DNA methylation data from parallel reduced representation bisulphite sequencing, it found that 41% of cell-type-specific CTCF-binding sites are linked to differential DNA methylation21 (Fig. 2). Conversely, at 67% of sites that showed variability in DNA methylation, the presence of 5mC was associated with a concomitant reduction in cell-type-specific CTCF occupancy. CTCF can also affect the methylation status of DNA by forming a complex with poly(ADP-ribose) polymerase 1 (PARP1) and DNA (cytosine-5)-methyltransferase 1 (DNMT1). CTCF activates PARP1, which can then inactivate DNMT1 by poly(ADP-ribosyl)ation, thereby maintaining methyl-free CpGs in the DNA22,23. An additional level of complexity in the interaction between CTCF and its target sequence can arise from the oxidation of 5mC to 5-hydroxymethylcytosine (5hmC)24,25, 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC)26 by ten-eleven translocation (TET) enzymes. Genome-wide profiling analyses of 5hmC have shown that this modification and, to a lesser extent, 5fC are enriched at genomic locations that contain CTCF-binding sites27,28. Furthermore, a screen for proteins that bind to different oxidized derivatives of 5mC identified CTCF as a 5caC-specific binder29.
These results underscore the complexity and possible importance of the relationship between DNA methylation status and plasticity of CTCF occupancy. However, the presence of cell-type-specific CTCF-binding sites that are not differentially methylated suggests the existence of other mechanisms by which the DNA occupancy of this protein is regulated (Fig. 2).
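The comparison behind the 41% figure can be pictured as a per-site test: is a CTCF site bound in only some cell types, and does its methylation status mirror that pattern? The sketch below is a simplified, hypothetical illustration on invented toy values, not the method of the cited study; the 0.5 methylation cut-off, the strict inverse-pattern rule and all names (`Site`, `fraction_linked` and so on) are assumptions.

```python
from dataclasses import dataclass

METHYLATED_CUTOFF = 0.5  # hypothetical: a fraction >= 0.5 is called "methylated"


@dataclass
class Site:
    name: str
    bound: list[bool]         # CTCF occupancy in each cell type (toy values)
    methylation: list[float]  # CpG methylation fraction in each cell type (toy values)


def is_cell_type_specific(site: Site) -> bool:
    """Bound in some, but not all, cell types."""
    return any(site.bound) and not all(site.bound)


def is_differentially_methylated(site: Site) -> bool:
    calls = [m >= METHYLATED_CUTOFF for m in site.methylation]
    return any(calls) and not all(calls)


def inversely_linked(site: Site) -> bool:
    """Each cell type is either bound and unmethylated, or unbound and methylated."""
    return all(b != (m >= METHYLATED_CUTOFF)
               for b, m in zip(site.bound, site.methylation))


def fraction_linked(sites: list[Site]) -> float:
    """Share of cell-type-specific sites whose occupancy tracks methylation."""
    specific = [s for s in sites if is_cell_type_specific(s)]
    linked = [s for s in specific
              if is_differentially_methylated(s) and inversely_linked(s)]
    return len(linked) / len(specific) if specific else 0.0


# Invented values for three sites across three cell types.
sites = [
    Site("constitutive", [True, True, True], [0.0, 0.1, 0.05]),
    Site("methylation-linked", [True, False, True], [0.1, 0.9, 0.0]),
    Site("other-mechanism", [True, False, False], [0.0, 0.1, 0.0]),
]
print(fraction_linked(sites))  # → 0.5
```

The third toy site is cell-type-specific without any methylation difference, which mirrors the point that other mechanisms must also regulate CTCF occupancy.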

Figure 2: Regulation of CTCF binding to DNA.

Constitutive binding sites of CCCTC-binding factor (CTCF), which are bound by CTCF in cells from different tissues, are present in non-methylated and nucleosome-free regions. Cell-type-specific CTCF binding is partly regulated by differential DNA methylation and nucleosome occupancy across different cell types. This suggests that cells can use ATP-dependent chromatin remodelling complexes to regulate nucleosome occupancy at specific CTCF-binding sites and control the interaction of this protein with DNA. In addition, the methylation status of cell-type-specific CTCF-binding sites may be determined by a combination of activities of de novo methyltransferases and ten-eleven translocation (TET) enzymes that regulate the presence and levels of 5-methylcytosine (5mC) at specific sites. Immortalized cancer cell lines contain high levels of 5mC at CTCF-binding sites, which correlates with the low CTCF occupancy in these cells. Filled red circles represent methylated DNA, and open circles denote unmethylated DNA.


One such mechanism is post-translational covalent modification of CTCF, such as SUMOylation30 and poly(ADP-ribosyl)ation31. In breast cancer cells, defective poly(ADP-ribosyl)ation of CTCF leads to its dissociation from the CDKN2A locus, which results in aberrant silencing of this tumour suppressor gene32. In Drosophila melanogaster, poly(ADP-ribosyl)ation of Centrosomal protein 190 kDa (Cp190) and CTCF facilitates their interaction, their tethering to the nuclear matrix and the formation of intrachromosomal contacts33.

Interactions between CTCF and other proteins may represent an additional strategy by which the function of this protein can be regulated at various genomic locations during cell differentiation. Although different proteins — including transcription factor YY1, transcriptional regulator Kaiso, chromodomain helicase DNA-binding protein 8 (CHD8), PARP1, MYC-associated zinc-finger protein (MAZ), JUND, zinc-finger protein 143 (ZNF143), PR domain zinc-finger protein 5 (PRDM5) and nucleophosmin — have been implicated in CTCF function at specific loci34, only cohesin has been shown to be required to stabilize most CTCF-mediated chromosomal contacts and to be essential for CTCF function at most sites in the genome35,36,37,38,39,40,41. Interaction between CTCF and the cohesin complex takes place through the carboxy-terminal region of CTCF and the SA2 subunit of cohesin35. Similar to CTCF, cohesin is present in intergenic regulatory regions, promoters, introns and 5′ untranslated regions (5′UTRs) of genes during interphase of the cell cycle. Depending on the cell line, 50–80% of CTCF-binding sites in the genome are also occupied by cohesin, and downregulation of cohesin using RNA interference results in disruption of CTCF-mediated intrachromosomal interactions42,43,44. A second protein that may cooperate with CTCF at a subset of sites in the genome is TFIIIC, which is required for the transcription of tRNAs, 5S ribosomal RNA (rRNA), B2 short interspersed nuclear elements (SINEs) and other non-coding RNAs (ncRNAs) by RNA polymerase III (Pol III)45. TFIIIC also binds to many genomic sites devoid of Pol III that are called 'extra-TFIIIC' (ETC) loci (Fig. 1). In yeast, both the tRNA genes and ETC loci have been shown to cluster and tether DNA sequences to the nuclear periphery45,46. 
Furthermore, B2 SINEs and human tRNA genes, both of which contain TFIIIC-binding sites, can act as enhancer-blocking insulators in transgenic assays47,48, and genome-wide analyses revealed that CTCF and its binding partner cohesin are found in the vicinity of many tRNA genes and ETC loci in mouse49 and human cells50,51 (Fig. 1).

In addition, several observations suggest that RNAs can also cooperate with CTCF to stabilize its interactions with other proteins. D. melanogaster architectural proteins, such as Cp190, require the RNA helicase Rm62 for proper function, and interactions between Rm62 and Cp190 depend on the presence of RNA52. Similar observations have been made in mammalian cells, in which CTCF has been shown to interact with the DEAD-box RNA helicase p68 and its associated ncRNA, both of which are required for proper CTCF function53. These observations, together with new findings which indicate that CTCF can itself bind to the Jpx RNA54, support the idea that ncRNAs may have an important role in stabilizing interactions that are mediated by CTCF and its protein partners.

Mechanisms of CTCF function

A large body of evidence strongly supports the idea that the mechanism underlying the diverse functions of CTCF in genome biology is based on its ability to mediate long-range interactions between two or more genomic sequences. This evidence first came from 3C analyses of the mouse H19–insulin-like growth factor 2 (Igf2) and haemoglobin subunit beta (Hbb) loci. At the imprinted maternal H19–Igf2 locus, work using circular chromosome conformation capture (4C) indicated that the H19 imprinting control region (ICR) forms extensive interchromosomal and intrachromosomal interactions across the genome, many of which require the presence of CTCF-binding sites within the ICR55. Binding of CTCF at multiple DNase I hypersensitive sites is also required to maintain a specific chromatin architecture at the murine Hbb locus56. Similarly, the D. melanogaster architectural proteins Suppressor of Hairy wing (Su(Hw)), Boundary element-associated factor of 32 kDa (BEAF-32) and Zeste-white 5 (Zw5; also known as Dwg) were shown to mediate long-range interactions both at the Heat-shock-protein-70 (Hsp70) locus and between copies of the gypsy retrotransposon57,58.

Taken together, these observations suggest that the effect of CTCF and other architectural proteins on gene expression is a result of their ability to bring sequences that are far apart in the linear genome into close proximity. Assuming that this mechanism underlies all functions of CTCF, results of both locus-specific and genome-wide studies can be interpreted to suggest that CTCF-mediated contacts regulate aspects of genome function in a context-dependent manner; that is, the functional outcomes of these interactions depend on the nature of the sequences adjacent to CTCF-binding sites and perhaps on the presence of other specific chromatin proteins. Below, we critically analyse existing evidence for these various roles in an attempt to generate a model that reflects the function of this protein in its normal genomic context.

The classical roles of CTCF

CTCF as a chromatin barrier. The function of CTCF and other architectural proteins was initially analysed using transgenic assays, in which results were often interpreted to suggest that these proteins act as barriers to the processive spread of heterochromatin59. Indeed, sequences with bona fide barrier activity in yeast have been shown to recruit histone acetyltransferases that antagonize the spreading of silencing histone modifications60. However, although this interpretation is often extended to results obtained in higher eukaryotes, recent studies offer a different view of how architectural proteins may regulate gene expression. For example, one study has demonstrated the presence of a CTCF-dependent enhancer-blocking function but a CTCF-independent barrier function in the chicken HS4 insulator. In this case, the barrier function depends on upstream stimulatory factor 1 (USF1), which recruits histone acetyltransferases61.

Results from genome-wide studies of the localization of CTCF in relation to various histone modifications also do not offer strong support for a role of CTCF as a barrier. Human CD4+ T cells and HeLa cells contain ~30,000 domains of histone H3 trimethylated at lysine 27 (H3K27me3, a histone modification characteristic of silenced chromatin), of which ~1,600 and ~800, respectively, contain CTCF at one of the domain borders8. This represents only 2–4% of the domain borders, a relatively low number if CTCF were primarily involved in the establishment or maintenance of these silenced domains. Similarly, it has been suggested that CTCF can contribute to the formation of chromosomal domains that are associated with the nuclear lamina. Human lung fibroblasts contain 2,688 borders that flank lamin-associated domains (LADs). These borders are enriched in oriented promoters, CpG islands and CTCF-binding sites. Approximately 9% of LAD borders (245 of 2,688) contain CTCF within 10 kb of the boundary, and 120 additional borders contain a combination of promoters, CpG islands and/or CTCF62. Although the correlation is tantalizing, it is unlikely that CTCF is the main contributor to the formation of these borders or that this is one of the primary functions of this protein.
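Counting borders that "contain CTCF within 10 kb" amounts to a window search around each border coordinate. The sketch below assumes all coordinates lie on one chromosome and uses invented positions; the function names are hypothetical and this is not the pipeline of the cited study. It also reproduces the 245-of-2,688 ratio quoted above.

```python
from bisect import bisect_left

WINDOW_BP = 10_000  # "within 10 kb of the boundary"


def has_site_nearby(border: int, sorted_sites: list[int], window: int = WINDOW_BP) -> bool:
    """True if any site lies within `window` bp of `border` (same chromosome)."""
    i = bisect_left(sorted_sites, border - window)  # first site >= border - window
    return i < len(sorted_sites) and sorted_sites[i] <= border + window


def fraction_borders_with_site(borders: list[int], sites: list[int]) -> float:
    sorted_sites = sorted(sites)
    hits = sum(has_site_nearby(b, sorted_sites) for b in borders)
    return hits / len(borders)


# Invented coordinates (bp) on a single chromosome.
borders = [50_000, 200_000, 400_000, 900_000]
ctcf_sites = [45_000, 215_000, 405_000]
print(fraction_borders_with_site(borders, ctcf_sites))  # → 0.5

# The same ratio for the LAD borders quoted in the text.
print(f"{245 / 2688:.1%}")  # → 9.1%
```

Sorting the sites once and using binary search keeps each border query logarithmic, which matters when tens of thousands of borders are scanned.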

Despite the lack of strong evidence from genome-wide localization analyses, CTCF has been interpreted as a domain barrier in several studies. Recent mapping of CTCF-mediated intrachromosomal and interchromosomal interactions in mouse embryonic stem cells (ESCs) using chromatin interaction analysis with paired-end tag sequencing (ChIA–PET), which combines chromatin immunoprecipitation (ChIP) with 3C analyses, seems to support the notion that CTCF may be, at least partly, responsible for the establishment of functional expression domains63. A total of 1,480 CTCF-containing cis-interacting loci were identified by this strategy. Cluster analyses of intrachromosomal interactions with seven histone modification signatures and Pol II occupancy profiles uncovered four distinct categories of CTCF-mediated loops. One class (155 of 1,295 loops that are <1 Mb; 12%) contains active H3K4me1, H3K4me2 and H3K36me3 histone modifications inside but repressive H3K9me3, H4K20me3 and H3K27me3 modifications outside the loops. A second class (142 of 1,295; 11%) of loops has the reverse pattern of histone modifications. Although the evidence is only correlative and the number of CTCF-mediated interactions is much smaller than the total number of CTCF-binding sites in the genome, the existence of these two types of loops, in which CTCF flanks histone modifications that have opposite effects on transcription, provides suggestive, but not conclusive, evidence for a role of this protein in separating functional domains of gene expression.
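The two loop classes described above differ only in which group of marks lies inside versus outside the CTCF anchors. The study used cluster analysis; the sketch below swaps in a much simpler, hypothetical rule-based labelling on invented inputs (the function name and the subset rule are assumptions), followed by the percentages reported for loops shorter than 1 Mb.

```python
# Mark groups named in the text (lysine 20 trimethylation is carried by histone H4).
ACTIVE = {"H3K4me1", "H3K4me2", "H3K36me3"}
REPRESSIVE = {"H3K9me3", "H4K20me3", "H3K27me3"}


def classify_loop(inside: set[str], outside: set[str]) -> str:
    """Rule-based stand-in for the cluster analysis: label a loop by which
    group of marks is enriched inside versus outside its CTCF anchors."""
    if inside and outside:
        if inside <= ACTIVE and outside <= REPRESSIVE:
            return "active inside, repressive outside"
        if inside <= REPRESSIVE and outside <= ACTIVE:
            return "repressive inside, active outside"
    return "other"


# Invented mark sets for three loops: (inside, outside).
loops = [
    ({"H3K4me1", "H3K36me3"}, {"H3K27me3"}),
    ({"H3K9me3"}, {"H3K4me2"}),
    ({"H3K4me1"}, {"H3K4me2"}),
]
for inside, outside in loops:
    print(classify_loop(inside, outside))

# Fractions reported for loops shorter than 1 Mb.
print(f"{155 / 1295:.0%}, {142 / 1295:.0%}")  # → 12%, 11%
```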

Some examples of locus-specific analyses also seem to support this view of CTCF-mediated barrier activity. For example, the Wilms tumour 1 homologue (WT1) transcription factor can either activate or repress expression of the mouse wingless-related MMTV integration site 4 (Wnt4) gene in a cell-type-specific manner by controlling the state of chromatin in a domain that has CTCF-defined boundaries. Mutation of CTCF leads to spreading of histone modifications outside the delimited genomic domain, which causes aberrant expression of neighbouring genes and suggests a role for CTCF in the establishment or maintenance of the Wnt4 domain by creating a functional barrier64.

Although the ChIA–PET and Wnt4 results are best explained by assuming that CTCF is capable of barrier activity, a barrier-based explanation is less straightforward for results from studies at other loci. For example, groups of androgen-responsive genes that are demarcated by CTCF-binding sites tend to have similar epigenetic and expression profiles, which suggests that CTCF establishes domains in which these genes are co-regulated65. Downregulation of CTCF results in a decrease in the expression of genes within the domain, whereas genes outside the domain are unaffected; this can also be explained if CTCF targets regulatory sequences to androgen-responsive promoters, so that in its absence transcription of these genes decreases. Similarly, the mouse homeobox A cluster (Hoxa) forms two distinct chromatin loops around CTCF-binding site 5 (CBS5) as ESCs differentiate into neural progenitor cells. The loop that contains the Hoxa1–7 gene cluster upstream of CBS5 is marked by active H3K4me3 modifications, whereas the loop containing the downstream Hoxa9–13 gene cluster is enriched for repressive H3K27me3 marks66. Knockdown of CTCF results in the loss of this three-dimensional conformation and the concomitant spread of H3K27me3 modifications across the locus. These results can be explained on the basis of a barrier function for CTCF, but it is equally possible that CTCF-binding sites in the Hoxa locus participate in bringing together regulatory sequences for gene activation or repression. This alternative is supported by results obtained in D. melanogaster, in which the role of CTCF in the maintenance of H3K27me3-enriched domains that are delimited by CTCF and other architectural proteins was analysed in detail. When CTCF was knocked down, H3K27me3-enriched domains showed a significant reduction in the level of this histone modification within the domain. However, little or no spreading of H3K27me3 was observed outside the demarcated domains67,68.
This suggests that D. melanogaster CTCF helps to maintain the level of silencing within domains rather than to restrict its spreading, presumably by clustering H3K27me3-enriched loci and Polycomb group proteins into Polycomb bodies67,69. On the basis of these results, we suggest that there is little causal evidence to support a generalized functional role for CTCF in separating domains with different epigenetic marks. Instead, mechanistically different processes, such as looping between regulatory sequences, may provide a better explanation for some of the observations that were previously interpreted in this context.

CTCF as an enhancer blocker. Although CTCF has been extensively characterized for its ability to block enhancer activity in transgenic assays, there has been little evidence to support such a role for this protein in its normal genomic context. However, some recent studies suggest that CTCF can indeed act as an enhancer blocker at specific loci. For example, induction of the Ecdysone-induced protein 75B (Eip75B) gene by treatment of D. melanogaster Kc cells with the steroid hormone ecdysone results in the downregulation of one of the Eip75B transcripts that is expressed from an alternative upstream promoter. This is caused by activation of a poised CTCF-binding site by recruitment of Cp190, which increases its interaction with a distant CTCF-binding site and topologically separates the downregulated promoter of the Eip75B gene from its enhancer70. Some genome-wide studies also suggest an enhancer-blocking function for CTCF. A search for conserved regulatory motifs in the human genome identified 15,000 CTCF-binding sites that separate adjacent genes; these gene pairs show markedly reduced correlation in expression compared with similarly arranged pairs that are not separated by CTCF-binding sites71. A similar observation has been made for the D. melanogaster BEAF-32 protein72, which suggests that CTCF and other architectural proteins can allow neighbouring gene pairs to be differentially regulated. However, the classical enhancer-blocking function of CTCF seems to contradict more recent results that support a function for CTCF as a facilitator of enhancer function. Below, we describe in detail some of this new information to underscore the widespread, but not widely acknowledged, role for CTCF as a positive regulator of various transcriptional processes.
Ultimately, models that explain how CTCF controls gene expression need to account for these two apparently contradictory functions — enhancer blocker and enhancer facilitator — of this protein.
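The genome-wide argument above compares expression correlation between adjacent gene pairs that are, or are not, separated by a CTCF-binding site. A minimal sketch of that comparison follows, using Pearson correlation across tissues on invented expression values; the gene names, the set of separated pairs and the function names are all hypothetical, and the cited study's statistics may differ.

```python
from math import sqrt
from statistics import mean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pair_correlation(pairs, expression, separated_by_ctcf):
    """Average correlation of adjacent gene pairs, grouped by whether a
    CTCF site lies between the two genes (True) or not (False)."""
    groups: dict[bool, list[float]] = {True: [], False: []}
    for g1, g2 in pairs:
        r = pearson(expression[g1], expression[g2])
        groups[(g1, g2) in separated_by_ctcf].append(r)
    return {sep: mean(rs) for sep, rs in groups.items() if rs}


# Invented expression values across four tissues.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [5, 4, 2, 1],
}
pairs = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
separated = {("geneB", "geneC")}  # hypothetical CTCF site between B and C

result = mean_pair_correlation(pairs, expression, separated)
print(result)
```

In this toy set the unseparated pairs are strongly co-expressed, whereas the separated pair is anticorrelated, which is the kind of contrast the enhancer-blocking interpretation predicts.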

An updated view of CTCF function

CTCF helps to tether distant enhancers to their promoters. Recent observations seem to contradict the idea of enhancer blocking as a predominant role for CTCF. For example, one study examined interactions between promoters and their regulatory sequences using the chromosome conformation capture carbon copy (5C) technique and found that 79% of long-range interactions between distal elements and promoters are not blocked by the presence of one or more intervening CTCF-bound sites73. Instead, a proportion of these interacting distal elements are significantly enriched for CTCF and/or histone modifications that are characteristic of active enhancers (that is, H3K4me1, H3K4me2 and H3K27 acetylation (H3K27ac)), which strongly supports the concept that one of the main roles of CTCF in genome function may be to facilitate the interaction between regulatory sequences and promoters. Activation of transcription requires the assembly of specific activators, the Mediator complex and the basal transcription machinery in a process that involves long-range chromosomal interactions between distal enhancers and proximal promoter elements. An enrichment of CTCF-binding sites at promoters and intergenic regions has been observed in ChIP followed by sequencing (ChIP–seq) studies, which also suggests that one of the main functions of CTCF is to target regulatory elements to their cognate promoters. This conclusion is supported by the finding of a significant overlap between cell-type-specific CTCF-binding sites and enhancer elements74, as well as by studies at several individual loci. For example, CTCF-mediated topological organization of the major histocompatibility complex class II (MHC-II; also known as HLA-D) locus precedes transcriptional activation75. 
Activation of MHC-II gene expression by interferon-γ (IFNγ) treatment requires the looping of the XL9 enhancer element and its cognate promoters that is mediated by CTCF, MHC class II transactivator (CIITA) and specific transcription factors76.
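The 79% figure rests on a positional question: for each distal element–promoter contact, does at least one CTCF-bound site lie between the two interacting fragments? A minimal sketch follows, assuming a single chromosome, invented coordinates and a "strictly between" rule; the function names are hypothetical and this is not the 5C study's analysis pipeline.

```python
from bisect import bisect_left, bisect_right


def intervening_sites(a: int, b: int, sorted_sites: list[int]) -> int:
    """Number of CTCF sites strictly between two interacting positions."""
    lo, hi = min(a, b), max(a, b)
    return bisect_left(sorted_sites, hi) - bisect_right(sorted_sites, lo)


def fraction_spanning_ctcf(interactions: list[tuple[int, int]], ctcf_sites: list[int]) -> float:
    """Fraction of observed contacts that cross at least one CTCF site."""
    s = sorted(ctcf_sites)
    spanning = [p for p in interactions if intervening_sites(p[0], p[1], s) > 0]
    return len(spanning) / len(interactions)


# Invented (distal element, promoter) contacts and CTCF sites, in bp.
contacts = [(100, 200), (300, 400), (450, 1_000), (600, 700)]
ctcf = [150, 500, 900]
print(fraction_spanning_ctcf(contacts, ctcf))  # → 0.5
```

Every contact counted here was actually observed, so a strict enhancer-blocking model would predict the spanning fraction to be near zero; a high value, as in the 5C data, argues against blocking as the default outcome.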

CTCF has also been shown to be important in regulating the expression of complex gene clusters in which regulatory sequences are far from some of their target genes. For example, in human islets, CTCF maintains long-range interactions between the insulin (INS) and synaptotagmin 8 (SYT8) genes that are necessary for SYT8 transcription77. In the mammalian brain, neuronal diversity is attained through a combination of stochastic promoter choice and alternative pre-mRNA processing of the protocadherin (PCDH) genes. Each PCDH mRNA contains a variable 5′ exon followed by a common region. The PCDH gene cluster comprises more than 50 different 5′ exons, each preceded by its own promoter (Fig. 3). CTCF and cohesin bind to most of these promoters78 and the distant enhancer element HS5-1 (Ref. 79). Alternative isoform expression requires CTCF-mediated DNA looping between the HS5-1 enhancer and active PCDHA promoters80,81 (Fig. 3). Conditional knockout of CTCF in mouse postmitotic projection neurons leads to reduced expression of PCDH genes, neuronal defects and abnormal behaviour, which suggests that CTCF is required to tether the HS5-1 enhancer to the various promoters82.

Figure 3: CTCF regulates enhancer–promoter interactions in a multigene cluster.

a | The human protocadherin A (PCDHA) gene cluster contains 13 similar, tandem, variable first exons (1–13; transcribed ones are shown in blue and untranscribed ones in white) and two related c-type ubiquitous first exons (c1 and c2; shown in yellow). Each of these 15 variable first exons is adjacent to its own promoter and is spliced to three downstream constant exons (1–3; shown in black). Alternative isoforms of PCDHA are expressed stochastically, whereas the c-type isoforms are expressed in all cells. The SK-N-SH cells depicted here express isoforms 4, 8 and 12. b | Promoter choice and the formation of an active chromatin hub are mediated by CCCTC-binding factor (CTCF)–cohesin DNA looping between the distal HS5-1 enhancer and distinct promoters at the PCDHA gene cluster. Individual variable exons or ubiquitous exons may be expressed and joined to the three exons from the constant region by mRNA splicing. Binding of CTCF to the promoter that precedes individual exons is correlated with the level of gene activity. The active promoters are distinguished from the inactive ones by an enrichment for histone H3 trimethylated at lysine 4 (H3K4me3) and a depletion of DNA methylation, which leads to expression of the downstream genes.


A third recent example that underscores the role of CTCF in promoting enhancer–promoter interactions comes from studies in mouse ESCs, in which the TATA-binding protein-associated factor 3 (TAF3) — a component of the core promoter-recognition complex TFIID — is required for endodermal differentiation. In addition to promoters, TAF3 localizes to distal sites that contain CTCF and cohesin, and the two sequences form a loop in a TAF3-dependent manner83 (Fig. 4). Given the role of TAF3 in regulating lineage commitment in ESCs, the distal elements that contain binding sites for both CTCF and TAF3 might have acquired H3K4me1 and H3K4me2 pre-patterning in ESCs to become endodermal enhancers, thus supporting the idea that CTCF can tether distal regulatory sequences to their target promoters.

Figure 4: CTCF facilitates endodermal enhancer–promoter interactions in ESCs.

Recruitment of TATA-binding protein-associated factor 3 (TAF3) at endodermal enhancers by CCCTC-binding factor (CTCF) and chromatin looping activates the mitogen-activated protein kinase 3 (Mapk3) gene in mouse embryonic stem cells (ESCs). Apart from being a component of TFIID at core promoters, TAF3 may also associate with other transcription factors across the genome in ESCs. For example, TAF3 represses the activity of pluripotency-associated transcription factors (octamer-binding protein 4 (OCT4; also known as POU5F1), SOX2 and homeobox protein NANOG). H3K4me3, histone H3 trimethylated at lysine 4; MED12, Mediator complex subunit 12; Pol II, RNA polymerase II; TBP, TATA-binding protein.


Observations from genome-wide analyses of intrachromosomal interactions also support a role of CTCF in facilitating contacts between transcription regulatory sequences. An analysis of CTCF-mediated interactions using ChIA–PET in mouse ESCs suggests that this protein is involved in clustering promoters of different genes, perhaps to establish 'transcription factories'. Interestingly, 28% of genes with promoters that are brought into close proximity (<10 kb) to p300 sites by CTCF-mediated contacts are upregulated in mouse ESCs, and knockdown of CTCF results in downregulation of some of these genes, which supports the notion that CTCF may be involved in mediating enhancer–promoter interactions during transcription initiation63.

CTCF regulates recombination at the antigen receptor loci. The role of CTCF in mediating enhancer–promoter communication may also contribute to the regulation of other nuclear processes such as V(D)J recombination. The B cell immunoglobulin (Ig) and T cell receptor (Tcr) loci comprise multiple copies of variable (V), diversity (D), joining (J) and constant (C) gene segments that span large genomic regions (Fig. 5). During lymphocyte development, unique epigenetic features and three-dimensional chromatin architecture at these loci provide the framework for recombination-activating gene (RAG)-mediated DNA recombination of the gene segments to generate antigen receptor diversity84. Although CTCF-mediated long-range chromatin interactions are not essential for the progression of V(D)J recombination, they may influence lineage- and/or developmental stage-specific segment choice during recombination.

Figure 5: CTCF regulates V(D)J recombination.

V(D)J recombination at antigen receptor loci is regulated by chromatin accessibility, which correlates with active histone modifications and transcription. CCCTC-binding factor (CTCF) may influence the outcome of V(D)J recombination by regulating enhancer–promoter interactions and locus compaction. a | Organization of the mouse immunoglobulin heavy chain complex (Igh) locus is shown. b | CTCF-mediated looping of diversity (DH)–joining (JH)–constant (CH) segments imposes ordered (that is, DH–JH) recombination by controlling the communication of enhancers (Eμ and 3′ regulatory region (3′RR)) with distinct gene segments. Binding of CTCF at intergenic control region 1 (IGCR1) blocks the influence of the Eμ enhancer on proximal variable (VH) segments and prevents the spread of active histone modifications from DH into the proximal VH region. In addition, it reduces antisense transcription within the DH region and modulates locus compaction in collaboration with other factors (for example, transcription factor YY1, DNA-binding protein Ikaros, paired box protein PAX-5 and transcription factor 3 (TCF3; also known as E2A)). As a consequence, CTCF within IGCR1 may bias the rearrangement of distal over proximal VH segments with DJH joins. RAG, recombination-activating gene.


Looping between CTCF-binding sites may bring distant gene segments together. In mouse pro-B cells, chromatin looping of CTCF-binding sites at the immunoglobulin heavy chain complex (Igh) locus occurs independently of the Eμ enhancer and contributes to the compaction of the locus85,86 (Fig. 5). In double-positive thymocytes, CTCF-mediated looping between the Eα enhancer and specific promoters within the Tcra–Tcrd locus facilitates Vα–Jα over Vδ–Dδ–Jδ rearrangement87. By establishing interactions between specific sequences, CTCF may also prevent other sequences from contacting each other. In fact, this may be the basis for the enhancer-blocking function of CTCF. In the Igh locus, two CTCF-binding sites within intergenic control region 1 (IGCR1) mediate ordered and lineage-specific VH–DJH recombination and bias distal over proximal VH rearrangements88. Positioned between the VH and DH clusters, IGCR1 suppresses the transcriptional activity and the rearrangement of proximal VH segments by forming a CTCF-mediated loop that presumably isolates the proximal VH promoter from the influence of the downstream Eμ enhancer (Fig. 5). Similarly, in pre-pro-B cells, CTCF promotes distal over proximal Vκ rearrangement by blocking the communication between specific enhancer and promoter elements in the Igk locus89.

CTCF regulates transcriptional pausing and alternative mRNA splicing. The presence of a subset of CTCF-binding sites within 5′UTRs and introns suggests a role for CTCF in regulating transcriptional events downstream of initiation. Indeed, recent studies indicate that CTCF can control both Pol II pausing and alternative mRNA splicing. For example, CTCF binds to both the first intron and upstream regulatory elements of the mouse myeloblastosis oncogene (Myb) locus. In immature erythroid cells, looping between the first intron, the promoter and upstream enhancer elements, mediated by CTCF together with key erythroid transcription and elongation factors, is required for Pol II-mediated transcriptional elongation and high Myb expression90. This three-dimensional architecture is lost upon terminal differentiation, when CTCF impedes Pol II elongation at the first intron, leading to low Myb expression. In this case, the dual functions of CTCF in transcription initiation and pausing seem to rely on its ability both to stabilize long-range interactions with regulatory sequences and to impede Pol II elongation. This effect of CTCF on Pol II elongation may be widespread, given that genome-wide CTCF occupancy at promoter-proximal regions in 5′UTRs strongly correlates with high pausing indexes91.
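
The pausing index mentioned above is commonly computed as the ratio of Pol II signal density in a promoter-proximal window to its density across the gene body. The Python sketch below illustrates that ratio under stated assumptions: a per-base coverage list for a single plus-strand gene, 0-based coordinates, and window sizes (−50 to +300 bp around the TSS, gene body starting 300 bp downstream) chosen only for illustration rather than taken from ref. 91. The function name `pausing_index` is hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def pausing_index(coverage: Sequence[float], tss: int, tes: int,
                  proximal: tuple[int, int] = (-50, 300),
                  body_offset: int = 300) -> float:
    """Promoter-proximal Pol II density divided by gene-body density.

    Assumes a plus-strand gene with 0-based, end-exclusive coordinates;
    the window sizes are illustrative defaults, not a published standard.
    """
    prox_start, prox_end = tss + proximal[0], tss + proximal[1]
    body_start = tss + body_offset
    if prox_start < 0 or body_start >= tes or tes > len(coverage):
        raise ValueError("gene too short or coordinates out of range")
    # Mean signal per base in the promoter-proximal window.
    prox_density = sum(coverage[prox_start:prox_end]) / (prox_end - prox_start)
    # Mean signal per base across the gene body.
    body_density = sum(coverage[body_start:tes]) / (tes - body_start)
    if body_density == 0:
        return float("inf")  # no gene-body signal: ratio is unbounded
    return prox_density / body_density


# Toy profile: 10 reads/base near the TSS, 1 read/base in the gene body.
toy = [0.0] * 50 + [10.0] * 350 + [1.0] * 1600
print(pausing_index(toy, tss=100, tes=2000))  # 10.0
```

Published analyses differ in window definitions, strand handling and normalization, so pausing indexes are comparable only within one consistent pipeline.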

In other cases, CTCF-mediated obstruction of Pol II elongation may lead to inclusion or exclusion of specific exons in the mature mRNA. One example is the human CD45 gene, which produces alternatively spliced transcripts during lymphocyte differentiation. Binding of CTCF to exon 5 promotes its inclusion in CD45 mRNA, whereas disruption of CTCF binding results in exclusion of this exon. DNA methylation of the CTCF recognition sequences in exon 5 seems to determine whether CTCF binds there, as knockdown of DNMT1 during late stages of lymphocyte differentiation leads to CTCF binding and inclusion of exon 5 in CD45 transcripts92 (Fig. 6).

Figure 6: CTCF promotes alternative mRNA splicing.

Mutually exclusive DNA methylation and CCCTC-binding factor (CTCF) binding may regulate alternative splicing. At the CD45 locus, DNA methylation (represented by red circles) at exon 5 inhibits CTCF binding, which leads to fairly unimpeded transcriptional elongation by RNA polymerase II (Pol II) and subsequent exclusion of exon 5 during splicing of the resultant mRNA (upper panel). By contrast, hypomethylation of exon 5 leads to CTCF binding and Pol II stalling, which promotes the inclusion of exon 5 (lower panel).

Genome topology may rationalize CTCF roles

Hi-C experiments, which aim to map all interactions in the genome, suggest that the genomes of higher eukaryotes are organized into topologically associating domains (TADs), defined by a high frequency of interactions within domains and a low frequency of interactions between adjacent domains (Fig. 7). In D. melanogaster, TAD boundaries are gene-dense regions enriched for highly transcribed genes and for clusters of architectural protein-binding sites, including those of CTCF, BEAF-32, Su(Hw), Modifier of mdg4 (Mod(mdg4)), Chromator (Chro) and Cp190 (Refs 93,94). Similarly, TAD borders in mammals are enriched for binding sites of CTCF and double-strand break repair protein rad21 homologue (RAD21), for housekeeping and tRNA genes, and for SINEs95 (Fig. 7). The enrichment of CTCF and RAD21 at TAD borders suggests that these proteins may have a causal role in establishing them. This possibility is supported by experiments in which a 58-kb region at the border between the Tsix (X (inactive)-specific transcript, opposite strand) and Xist (X-inactive specific transcript) TADs on the mouse X chromosome was deleted. Elimination of these sequences, which include a CTCF-binding site and the Xist, Tsix and regulatory region 18 (Rr18; also known as Xite) genes, increases interactions across the former border region and leads to the formation of a new TAD border at an adjacent location96.
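
The definition of a TAD given above, with frequent contacts within domains and few across their borders, can be made concrete with an insulation-style score, one of several published ways to call boundaries from a Hi-C matrix (not necessarily the method used in refs 93–95). In this Python sketch, which assumes a small, already-normalized contact matrix given as nested lists, the gap between bins i and i + 1 is scored by the mean contact frequency between the w bins upstream and the w bins downstream, and boundaries are called at strict local minima. The window size, the local-minimum rule and all function names are illustrative assumptions.

```python
from __future__ import annotations


def block_matrix(sizes: list[int], intra: float = 10.0,
                 inter: float = 1.0) -> list[list[float]]:
    """Synthetic contact matrix: high contact frequency within each toy TAD."""
    labels = [dom for dom, size in enumerate(sizes) for _ in range(size)]
    return [[intra if a == b else inter for b in labels] for a in labels]


def insulation_scores(m: list[list[float]], w: int) -> dict[int, float]:
    """Score the gap between bin i and bin i + 1 by cross-gap contacts.

    Averages contacts between bins i-w+1..i and bins i+1..i+w; a low score
    means few interactions cross that position, as expected at a TAD border.
    """
    n = len(m)
    scores = {}
    for i in range(w - 1, n - w):
        total = sum(m[a][b]
                    for a in range(i - w + 1, i + 1)
                    for b in range(i + 1, i + w + 1))
        scores[i] = total / (w * w)
    return scores


def call_boundaries(scores: dict[int, float]) -> list[int]:
    """Return gap positions whose score is a strict local minimum."""
    keys = sorted(scores)
    return [k for prev, k, nxt in zip(keys, keys[1:], keys[2:])
            if scores[k] < scores[prev] and scores[k] < scores[nxt]]


# Two toy TADs of 4 bins each: the border lies between bins 3 and 4.
m = block_matrix([4, 4])
print(call_boundaries(insulation_scores(m, w=2)))  # [3]
```

Real pipelines typically use larger windows, log-transformed scores and boundary-strength thresholds; this toy version only shows the logic of "few contacts across, many within".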

Figure 7: CTCF regulates three-dimensional genome architecture.

a | A schematic Hi-C interaction heat map of a ~2.5-Mb chromosome segment in mammalian cells is shown, with the topologically associating domains (TADs) and their borders indicated. b | The presence of multiple binding sites for CCCTC-binding factor (CTCF) and TFIIIC at TAD borders may contribute to the establishment of the border. This arrangement may explain the observed function of CTCF as an enhancer blocker. Conversely, CTCF-binding sites within TADs may facilitate enhancer–promoter looping through the recruitment of cohesin. The blue box denotes the promoter of the gene. c | Chromatin features of TAD borders in mammals and Drosophila melanogaster are shown. The TAD borders in mammals are enriched for housekeeping and tRNA genes, short interspersed nuclear elements (SINEs) and CTCF-binding sites. In D. melanogaster, they are enriched for highly transcribed genes and clusters of binding sites for various architectural proteins, such as Suppressor of Hairy wing (Su(Hw)), Modifier of mdg4 (Mod(mdg4)) and Boundary element-associated factor of 32 kDa (BEAF-32). The roles of TFIIIC, cohesin and condensin proteins in mediating TAD border formation remain to be determined. Cp190, Centrosomal protein 190 kDa.

Recent studies using Hi-C to investigate the roles of CTCF and cohesin in the three-dimensional organization of the mammalian genome support a similar role for CTCF as a boundary protein between TADs, although the details vary between studies97,98,99. Depletion of cohesin in HEK293 human embryonic kidney cells results in a general loss of intrachromosomal interactions without affecting TAD organization, whereas depletion of CTCF causes a similar decrease in the frequency of intradomain interactions together with an increase in the frequency of interactions between adjacent TADs97. Cohesin-deficient postmitotic mouse astrocytes also show fewer long-range interactions mediated by CTCF and cohesin, but they additionally display a relaxation of TAD organization98. This relaxation could reflect either weaker TAD borders owing to the lack of cohesin binding or an increase in the frequency of inter-TAD interactions, as observed in CTCF-depleted HEK293 cells. A similar decrease in the frequency of cohesin-mediated interactions was observed in cohesin-depleted developing mouse thymocytes arrested in G1 phase, in which an increase in the frequency of alternative interactions resulted in changes in gene expression99.
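
The depletion comparisons described above rest on two summary quantities: contact frequency between bins in the same TAD and contact frequency between bins in neighbouring TADs. A minimal Python sketch is shown below, assuming a normalized contact matrix and a precomputed domain label for each bin. The helper names are hypothetical, and the contact values are synthetic, chosen only to mimic the qualitative pattern reported for CTCF depletion (weaker intra-TAD and stronger inter-TAD contacts), not the published measurements.

```python
from __future__ import annotations

from statistics import mean


def toy_matrix(labels: list[int], intra: float, inter: float) -> list[list[float]]:
    """Synthetic contact matrix built from per-bin domain labels."""
    return [[intra if la == lb else inter for lb in labels] for la in labels]


def intra_vs_adjacent(m: list[list[float]], labels: list[int]) -> tuple[float, float]:
    """Mean contact frequency within TADs and between adjacent TADs.

    Uses the upper triangle only and skips the diagonal (self-contacts);
    pairs of bins in non-adjacent TADs are ignored.
    """
    intra, adjacent = [], []
    n = len(m)
    for a in range(n):
        for b in range(a + 1, n):
            gap = abs(labels[a] - labels[b])
            if gap == 0:
                intra.append(m[a][b])
            elif gap == 1:
                adjacent.append(m[a][b])
    if not intra or not adjacent:
        raise ValueError("need adjacent TADs and at least one TAD with two bins")
    return mean(intra), mean(adjacent)


labels = [0, 0, 0, 1, 1, 1]
control = toy_matrix(labels, intra=10.0, inter=1.0)
depleted = toy_matrix(labels, intra=6.0, inter=3.0)  # synthetic values
for name, mat in (("control", control), ("depleted", depleted)):
    within, between = intra_vs_adjacent(mat, labels)
    print(name, within / between)
```

A lower within/between ratio, as in the synthetic "depleted" matrix, is the kind of signature that would correspond to weakened insulation between neighbouring TADs.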

In mammals, only 15% of genomic CTCF-binding sites are located at TAD borders, whereas the other 85% lie inside TADs95. This suggests that CTCF and cohesin alone are insufficient to separate different TADs, a conclusion supported by the fairly mild effects on TAD organization in cells depleted of CTCF or cohesin. In D. melanogaster, CTCF forms clusters with other architectural proteins at TAD borders, and vertebrate CTCF might adopt a similar strategy. Several lines of evidence point to TFIIIC as a candidate architectural protein that cooperates with CTCF at TAD borders in vertebrates. As discussed above, TFIIIC colocalizes with CTCF near many tRNA genes and ETC loci in mammalian cells49,50,51, and it also binds to SINEs and tRNA genes, both of which behave functionally as enhancer-blocking insulators in humans47,48 and are enriched at TAD borders95. As CTCF has been shown to recruit cohesin in mammalian cells, and TFIIIC interacts with both cohesin and condensin in yeast, CTCF and TFIIIC might act as docking sites for these proteins to stabilize the interactions required for TAD border formation. Such borders would prevent interactions between sequences in the two adjacent TADs. Thus, sequences at these borders may correspond to the enhancer-blocking insulators previously characterized in transgenic assays (Fig. 7). Additional studies will be needed to determine whether CTCF, TFIIIC, cohesin and condensin cluster at TAD boundaries in mammalian cells and whether their presence is required for border formation.
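
Figures such as "15% of CTCF-binding sites lie at TAD borders" depend on how "at a border" is defined. A minimal Python sketch of that calculation follows, assuming single-chromosome coordinates and a hypothetical tolerance window. The positions are invented toy values and the function name is hypothetical; ref. 95 used its own boundary definitions, so this illustrates the kind of calculation involved rather than reproducing it.

```python
from __future__ import annotations

from bisect import bisect_left


def fraction_near_borders(sites: list[int], borders: list[int], window: int) -> float:
    """Fraction of sites lying within `window` bp of the nearest TAD border."""
    if not sites or not borders:
        raise ValueError("need at least one site and one border")
    ordered = sorted(borders)
    near = 0
    for site in sites:
        i = bisect_left(ordered, site)
        # The nearest border is either just before or just after the site.
        neighbours = ordered[max(i - 1, 0):i + 1]
        if min(abs(site - b) for b in neighbours) <= window:
            near += 1
    return near / len(sites)


borders = [1_000_000, 2_000_000]  # hypothetical border positions (bp)
sites = [995_000, 1_010_000, 1_500_000, 1_700_000, 2_030_000]  # toy CTCF sites
print(fraction_near_borders(sites, borders, window=20_000))  # 0.4
```

In practice the observed fraction is usually compared with the fraction expected for randomly positioned sites, because a wide window can capture many sites by chance.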

The majority of CTCF-binding sites (~85%) are found within TADs and are, by definition, unable to form a border. What is the role of CTCF at these sites? Hi-C studies in pre-pro-B cells suggest that CTCF located within TADs mainly mediates short-range intra-TAD interactions100. As discussed above, these CTCF-mediated interactions may serve to direct enhancers within the TAD to the appropriate gene promoter. In large mammalian genomes, the resolution of Hi-C data is limited by the number of sequencing reads, which restricts the structural information that can be obtained at the sub-TAD level. Applying 5C to large (1–2 Mb) genomic regions has made it possible to map finer topologies at the sub-megabase scale101. These topologies arise from interactions mediated by CTCF, cohesin and Mediator, either alone or in various combinations. Many of these interactions change during cell differentiation and occur between genomic regions carrying epigenetic signatures characteristic of enhancers and promoters. Furthermore, different combinations of these three architectural proteins seem to mediate interactions at different length scales, and CTCF in combination with cohesin is enriched at constitutive interactions in mouse ESCs, which do not change when these cells differentiate into neural progenitor cells. These results suggest a functional specialization of CTCF-mediated contacts that arises from interactions between this protein and its various partners. The formation of different complexes with other proteins inside TADs and at TAD borders may underlie the different functions of CTCF in genome organization and explain its apparently contradictory properties as both an enhancer facilitator and an enhancer blocker (Fig. 7).
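
The resolution limit described above follows from simple counting: the number of bin pairs grows with the square of the number of bins, so halving the bin size roughly quadruples the number of pairs that must share the same sequencing reads, whereas restricting analysis to a 1–2 Mb region shrinks the number of pairs enormously. The worked Python sketch below assumes, as a deliberate simplification, that reads are spread evenly over all unordered bin pairs (real contact frequencies fall steeply with genomic distance); the genome size, bin size and read counts are illustrative only, and the function name is hypothetical.

```python
def reads_per_bin_pair(region_bp: int, bin_bp: int, reads: int) -> float:
    """Average reads per unordered bin pair, assuming an even spread of reads."""
    n_bins = region_bp // bin_bp
    n_pairs = n_bins * (n_bins + 1) // 2  # includes each bin paired with itself
    return reads / n_pairs


# Hypothetical read depths: a ~3-Gb genome versus a 2-Mb targeted region.
for label, size, reads in (("genome", 3_000_000_000, 500_000_000),
                           ("2-Mb region", 2_000_000, 10_000_000)):
    print(label, reads_per_bin_pair(size, 10_000, reads))
```

Under this idealized model, 10-kb bins across a ~3-Gb genome give about 4.5 × 10^10 bin pairs, whereas a 2-Mb region has only 20,100, which is why concentrating sequencing on a defined region can resolve sub-TAD structure.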

Conclusions and perspectives

The emerging theme from recent studies is that CTCF functions as an architectural protein that contributes to the establishment of genome topology. It does so at two levels that are probably interrelated and that together account for most previous observations. At a global level, interactions mediated by CTCF and other architectural proteins result in the formation of TADs. At a more local, sub-megabase scale, CTCF may be involved in 'fine-tuning' intrachromosomal interactions within TADs to regulate various aspects of gene expression.

Cooperation of CTCF with other protein partners, which may be regulated by covalent modifications, may determine its functional specificity. First, association with other proteins, such as TFIIIC, cohesins and condensins, at specific genomic locations may result in the formation of TAD borders by precluding interactions across these sites, thus rationalizing the observed enhancer-blocking properties of this protein. Second, association of CTCF with other proteins, such as cohesin and Mediator, may define the range and stability of chromosomal interactions within TADs, which would explain its other roles in transcription. The ability of CTCF to bind RNA raises the possibility that ncRNAs help to stabilize these contacts and perhaps regulate their function. Finally, covalent modifications of CTCF and its partners are also likely to influence their regulatory potential. The principal outcome of CTCF-mediated contacts is the regulation of transcription at various levels, including initiation, promoter selection, promoter-proximal pausing and splicing. The role of CTCF in determining three-dimensional genome organization has so far been considered mostly in the context of its effect on transcription during G1 phase, but both the protein and the genome architecture it controls are also likely to be important at other stages of the cell cycle, including DNA replication during S phase and chromosome condensation during mitosis102. In particular, important questions for future studies include how TAD organization relates to the structure of metaphase chromosomes and how this affects gene expression at the beginning of G1 phase.