Abstract
Dynamic changes in the three-dimensional (3D) organization of chromatin are associated with central biological processes, such as transcription, replication and development. Therefore, the comprehensive identification and quantification of these changes is fundamental to understanding evolutionary and regulatory mechanisms. Here, we present Comparison of Hi-C Experiments using Structural Similarity (CHESS), an algorithm for the comparison of chromatin contact maps and automatic differential feature extraction. We demonstrate the robustness of CHESS to experimental variability and showcase its biological applications on (1) interspecies comparisons of syntenic regions in human and mouse models; (2) intraspecies identification of conformational changes in Zelda-depleted Drosophila embryos; (3) patient-specific aberrant chromatin conformation in a diffuse large B-cell lymphoma sample; and (4) the systematic identification of chromatin contact differences in high-resolution Capture-C data. In summary, CHESS is a computationally efficient method for the comparison and classification of changes in chromatin contact data.
Data availability
The datasets analyzed in this study have been obtained from the Gene Expression Omnibus (Rao et al.10, GSE63525; Bonev et al.12, GSE96107; Despang et al.50, GSE125294) and ArrayExpress (Hug et al.9, E-MTAB-4918; Díaz et al.48, E-MTAB-5875).
Code availability
The CHESS source code and the code for generating the synthetic Hi-C matrices and running tests on them are available on GitHub (https://github.com/vaquerizaslab/CHESS). The intervaltree and tqdm packages used internally in CHESS can be found at https://github.com/chaimleib/intervaltree and https://github.com/tqdm/tqdm, respectively. In addition, CHESS internally uses the following published packages: FAN-C71 (https://github.com/vaquerizaslab/fanc); Cython72; SciPy69; scikit-image59; NumPy73,74; Pandas75; Pathos76; Pybedtools77; Kneed78.
References
Bonev, B. & Cavalli, G. Organization and function of the 3D genome. Nat. Rev. Genet. 17, 661–678 (2016).
Vietri Rudan, M. et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309 (2015).
Acemel, R. D., Maeso, I. & Gómez‐Skarmeta, J. L. Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. Wiley Interdiscip. Rev. Dev. Biol. 6, e265 (2017).
Lazar, N. H. et al. Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res. 28, 983–997 (2018).
Eres, I. E., Luo, K., Hsiao, C. J., Blake, L. E. & Gilad, Y. Reorganization of 3D genome structure may contribute to gene regulatory evolution in primates. PLoS Genet. 15, e1008278 (2019).
Yang, Y., Zhang, Y., Ren, B., Dixon, J. R. & Ma, J. Comparing 3D genome organization in multiple species using phylo-HMRF. Cell Syst. 8, 494–505.e14 (2019).
Ke, Y. et al. 3D chromatin structures of mature gametes and structural reprogramming during mammalian embryogenesis. Cell 170, 367–381.e20 (2017).
Du, Z. et al. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature 547, 232–235 (2017).
Hug, C. B., Grimaldi, A. G., Kruse, K. & Vaquerizas, J. M. Chromatin architecture emerges during zygotic genome activation independent of transcription. Cell 169, 216–228.e19 (2017).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Dixon, J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015).
Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572.e24 (2017).
Nagano, T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61–67 (2017).
Gibcus, J. H. et al. A pathway for mitotic chromosome formation. Science 359, eaao6135 (2018).
Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. Nat. Rev. Genet. 19, 453–467 (2018).
Krijger, P. H. L. & de Laat, W. Regulation of disease-associated gene expression in the 3D genome. Nat. Rev. Mol. Cell Biol. 17, 771–782 (2016).
Darrow, E. M. et al. Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture. Proc. Natl Acad. Sci. USA 113, E4504–E4512 (2016).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Sauria, M. E. G. & Taylor, J. QuASAR: quality assessment of spatial arrangement reproducibility in Hi-C data. Preprint at bioRxiv https://doi.org/10.1101/204438 (2017).
Ursu, O. et al. GenomeDISCO: a concordance score for chromosome conformation capture experiments using random walks on contact map graphs. Bioinformatics 34, 2701–2707 (2018).
Yan, K.-K., Yardımcı, G. G., Yan, C., Noble, W. S. & Gerstein, M. HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps. Bioinformatics 33, 2199–2201 (2017).
Shavit, Y. & Lio’, P. Combining a wavelet change point and the Bayes factor for analysing chromosomal interaction data. Mol. Biosyst. 10, 1576–1585 (2014).
Huynh, L. & Hormozdiari, F. TAD fusion score: discovery and ranking the contribution of deletions to genome structure. Genome Biol. 20, 60 (2019).
Paulsen, J. et al. HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization. Bioinformatics 30, 1620–1622 (2014).
Lareau, C. A. & Aryee, M. J. diffloop: a computational framework for identifying and analyzing differential DNA loops from sequencing data. Bioinformatics 34, 672–674 (2018).
Djekidel, M. N., Chen, Y. & Zhang, M. Q. FIND: difFerential chromatin INteractions Detection using a spatial Poisson process. Genome Res. 28, 412–422 (2018).
Stansfield, J. C., Cresswell, K. G., Vladimirov, V. I. & Dozmorov, M. G. HiCcompare: an R-package for joint normalization and comparison of HI-C datasets. BMC Bioinformatics 19, 279 (2018).
Lun, A. T. L. & Smyth, G. K. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics 16, 258 (2015).
Cook, K. B., Hristov, B. H., Le Roch, K. G., Vert, J. P. & Noble, W. S. Measuring significant changes in chromatin conformation with ACCOST. Nucleic Acids Res. 48, 2303–2311 (2020).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Wang, Z. & Bovik, A. C. A universal image quality index. IEEE Signal Process. Lett. 9, 81–84 (2002).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Harmston, N. et al. Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nat. Commun. 8, 441 (2017).
Lee, J. et al. Synteny Portal: a web-based application portal for synteny block analysis. Nucleic Acids Res. 44, W35–W40 (2016).
Schwarzer, W. et al. Two independent modes of chromatin organization revealed by cohesin removal. Nature 551, 51–56 (2017).
Nora, E. P. et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell 169, 930–944.e22 (2017).
Haarhuis, J. H. I. et al. The cohesin release factor WAPL restricts chromatin loop extension. Cell 169, 693–707.e14 (2017).
Rao, S. S. P. et al. Cohesin loss eliminates all loop domains. Cell 171, 305–320.e24 (2017).
Wutz, G. et al. Topologically associating domains and chromatin loops depend on cohesin and are regulated by CTCF, WAPL, and PDS5 proteins. EMBO J. 36, 3573–3599 (2017).
Gassler, J. et al. A mechanism of cohesin-dependent loop extrusion organizes zygotic genome architecture. EMBO J. 36, 3600–3618 (2017).
Lupiáñez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025 (2015).
Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016).
Flavahan, W. A. et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114 (2016).
Hnisz, D. et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458 (2016).
Díaz, N. et al. Chromatin conformation analysis of primary patient tissue using a low input Hi-C method. Nat. Commun. 9, 4938 (2018).
Hughes, J. R. et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet. 46, 205–212 (2014).
Despang, A. et al. Functional dissection of the Sox9–Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 51, 1263–1271 (2019).
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
Lin, D. et al. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture. Nat. Genet. 50, 754–763 (2018).
Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017).
Cardozo Gizzi, A. M. et al. Microscopy-based chromosome conformation capture enables simultaneous visualization of genome organization and transcription in intact organisms. Mol. Cell 74, 212–222.e5 (2019).
Sampat, M. P., Wang, Z., Gupta, S., Bovik, A. C. & Markey, M. K. Complex wavelet structural similarity: a new image similarity index. IEEE Trans. Image Process. 18, 2385–2401 (2009).
Homola, T., Dohnal, V. & Zezula, P. Searching for sub-images using sequence alignment. In Proc. 2011 IEEE International Symposium on Multimedia 61–68 (IEEE, 2011).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Knight, P. A. & Ruiz, D. A fast algorithm for matrix balancing. IMA J. Numer. Anal. 33, 1029–1047 (2013).
van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).
Behara, K. N. S., Bhaskar, A. & Chung, E. Geographical window based structural similarity index for OD matrices comparison. J. Intell. Transp. Syst., https://doi.org/10.1080/15472450.2020.1795651 (2020).
Djukic, T., Hoogendoorn, S. & Van Lint, H. Reliability assessment of dynamic OD estimation methods based on structural similarity index. In Proc. Transportation Research Board 92nd Annual Meeting (Transportation Research Board, 2013).
Breakey, D. & Meskell, C. Comparison of metrics for the evaluation of similarity in acoustic pressure signals. J. Sound Vib. 332, 3605–3609 (2013).
Hines, A. & Harte, N. Speech intelligibility prediction using a Neurogram Similarity Index Measure. Speech Commun. 54, 306–320 (2012).
Tomasi, C. & Manduchi, R. Bilateral filtering for gray and color images. In Proc. Sixth International Conference on Computer Vision 839–846 (IEEE, 1998).
Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. A Syst. Hum. 9, 62–66 (1979).
Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R. & Mozziconacci, J. Normalization of a chromosomal contact map. BMC Genomics 13, 436 (2012).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Blythe, S. A. & Wieschaus, E. F. Zygotic genome activation triggers the DNA replication checkpoint at the midblastula transition. Cell 160, 1169–1181 (2015).
Kruse, K., Hug, C. B. & Vaquerizas, J. M. FAN-C: a feature-rich framework for the analysis and visualisation of C data. Preprint at bioRxiv https://doi.org/10.1101/2020.02.03.932517 (2020).
Behnel, S. et al. Cython: the best of both worlds. Comput. Sci. Eng. 13, 31–39 (2011).
Oliphant, T. E. A Guide to NumPy (Trelgol Publishing, 2006).
van der Walt, S., Colbert, S. C. & Varoquaux, G. The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).
McKinney, W. Data structures for statistical computing in Python. In Proc. Python in Science Conference 56–61 (SciPy.org, 2010).
McKerns, M. M., Strand, L., Sullivan, T., Fang, A. & Aivazis, M. A. G. Building a framework for predictive science. Preprint at https://arxiv.org/abs/1202.1056 (2012).
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a ‘Kneedle’ in a haystack: detecting knee points in system behavior. In Proc. 31st International Conference on Distributed Computing Systems Workshops 166–171 (IEEE Computer Society, 2011).
Acknowledgements
Work in the Vaquerizas laboratory is funded by the Max Planck Society, the Deutsche Forschungsgemeinschaft (DFG) Priority Programme SPP 2202 ‘Spatial Genome Architecture in Development and Disease’ (project no. 422857230 to J.M.V.), the DFG Clinical Research Unit CRU326 ‘Male Germ Cells: from Genes to Function’ (project no. 329621271 to J.M.V.), the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 643062 (ZENCODE-ITN, to J.M.V.) and the Medical Research Council in the UK. This research was partially funded by the European Union’s H2020 Framework Programme through the European Research Council (grant no. 609989 to M.A.M.-R.). We acknowledge the support of the Spanish Ministerio de Ciencia, Innovación y Universidades through grant no. BFU2017-85926-P to M.A.M.-R. The Centre for Genomic Regulation acknowledges the support of the Ministerio de Ciencia, Innovación y Universidades to the European Molecular Biology Laboratory partnership, the ‘Centro de Excelencia Severo Ochoa 2013–2017’, agreement no. SEV-2012-0208, the CERCA Programme/Generalitat de Catalunya, the Spanish Ministerio de Ciencia, Innovación y Universidades through the Instituto de Salud Carlos III, the Generalitat de Catalunya through the Departament de Salut and Departament d’Empresa i Coneixement, and cofinancing by the Spanish Ministerio de Ciencia, Innovación y Universidades with funds from the European Regional Development Fund corresponding to the 2014–2020 Smart Growth Operating Program. S.G. acknowledges support from the Company of Biologists (grant no. JCSTF181158) and the European Molecular Biology Organization Short-Term Fellowship programme.
Author information
Contributions
N.M. and J.M.V. conceptualized the study. S.G., N.M. and K.K. devised the methodology. N.M. and J.M.V. carried out the investigation. S.G., K.K. and N.D. obtained the resources. S.G., N.M., K.K., M.A.M.-R. and J.M.V. prepared and wrote the original draft of the manuscript. S.G., N.M., K.K., N.D., M.A.M.-R. and J.M.V. wrote, reviewed and edited the draft. J.M.V. supervised the study. M.A.M.-R. and J.M.V. acquired the funding.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance analysis of the CHESS algorithm.
a, CHESS P values as a function of the relative noise level in synthetic matrices. Shown are the cases of equal amounts of noise in reference R and query Q (top) and different amounts of noise (bottom, noise only added to Q). Each case is examined for normalised and observed/expected (obs/exp) matrices, and for different window sizes in the SSIM algorithm. b, Empirically determined CHESS P values as a function of the size factor between R and Q for normalised (left) and obs/exp (right) matrices (details in Methods). a,b, Solid lines indicate the mean and shaded areas the standard deviation over 100 simulations per parameter combination.
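The caption distinguishes normalised from observed/expected (obs/exp) matrices but leaves the transformation itself to Methods. As a rough illustration only, the sketch below uses one common convention: each entry is divided by the mean of all entries at the same distance from the diagonal. The function name `observed_over_expected` and the toy matrix are hypothetical, and the procedure used by the authors may differ.

```python
def observed_over_expected(matrix):
    """Divide each entry by the mean of its diagonal (same bin distance).

    Assumes a square, symmetric contact matrix given as a list of lists.
    This is a generic convention, not necessarily the one used in CHESS.
    """
    n = len(matrix)
    # Expected value at distance d = mean of the d-th upper diagonal.
    expected = []
    for d in range(n):
        values = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(values) / len(values))
    # Entries on an all-zero diagonal are set to 0 to avoid division by zero.
    return [
        [matrix[i][j] / expected[abs(i - j)] if expected[abs(i - j)] else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Toy 3 x 3 contact matrix (hypothetical values).
toy = [
    [8.0, 4.0, 2.0],
    [4.0, 8.0, 6.0],
    [2.0, 6.0, 8.0],
]
oe = observed_over_expected(toy)
```

After this transformation, values above 1 mark contacts that are enriched relative to the average at that distance, which removes the dominant distance-decay signal before matrices are compared.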
Extended Data Fig. 2 Technical details of the SSIM algorithm applied to Hi-C matrices.
a, Schematic overview of the structural similarity (SSIM) algorithm. SSIM scores are calculated on all submatrices of R and Q at a given window size (WS). The final SSIM score is the mean of all SSIM submatrix scores. b, SSIM submatrix formula. The components are coloured: luminance (green) and structure × contrast (red). x and y refer to submatrices (at the same positions) of the two full matrices for which the SSIM average is computed (see panel a). μ indicates the mean and σ the standard deviation; c1 and c2 are small constants introduced only for numerical stability. c,d, SSIM comparisons of a matrix to itself (red dots) and to 1,000 random matrices of the same size (blue dots). c, SSIM component values as a function of the SSIM score for different SSIM window sizes. d, Scatterplots of ranked SSIM scores at window size 100 versus ranked scores at smaller window sizes.
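The windowed calculation described in panels a and b can be sketched in plain Python as follows. This is a minimal illustration, assuming the standard SSIM formula of Wang et al. with a mean-based term and a combined structure × contrast term, unweighted square windows, and population (1/N) variances. The function names are hypothetical, the default constants follow the usual (0.01 L)² and (0.03 L)² choice with L = 1, and the scikit-image implementation listed under Code availability differs in details such as window weighting.

```python
from statistics import fmean


def ssim_window(x, y, c1=1e-4, c2=9e-4):
    """SSIM of two equally sized windows, each given as a flat list of values."""
    n = len(x)
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Mean-based term (green in panel b).
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    # Combined structure x contrast term (red in panel b).
    structure_contrast = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * structure_contrast


def mean_ssim(r, q, ws):
    """Mean SSIM over all ws x ws submatrices at matching positions of r and q.

    Assumes r and q are square lists of lists with identical dimensions.
    """
    n = len(r)
    scores = []
    for i in range(n - ws + 1):
        for j in range(n - ws + 1):
            x = [r[a][b] for a in range(i, i + ws) for b in range(j, j + ws)]
            y = [q[a][b] for a in range(i, i + ws) for b in range(j, j + ws)]
            scores.append(ssim_window(x, y))
    return fmean(scores)


# Toy matrices (hypothetical values).
reference = [[1.0, 2.0, 3.0], [2.0, 5.0, 6.0], [3.0, 6.0, 9.0]]
query = [[9.0, 1.0, 4.0], [1.0, 2.0, 7.0], [4.0, 7.0, 1.0]]
self_score = mean_ssim(reference, reference, ws=2)
other_score = mean_ssim(reference, query, ws=2)
```

Comparing a matrix with itself yields a score of 1 because both terms reduce to identical numerators and denominators, which is the behaviour shown by the red dots in panels c and d.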
Extended Data Fig. 3 Additional analysis of the CHESS algorithm.
a, Uniform distribution of empirically determined CHESS P values for comparisons of matrices with 100% noise added. b, Distribution of structural similarity (SSIM) scores for background and truth comparisons at 25 k/Mb and 1.5 M/Mb simulated sequencing depth. Above each distribution: fractional change (value at x% noise / value at 0% noise) of the standard deviation (std) of background scores and of the mean of truth scores over 100 simulations per parameter combination.
Extended Data Fig. 4 CHESS is robust to changes in noise due to random ligations and sequencing depth in real Hi-C data.
We tested to what extent CHESS is able to identify two matrices as being identical after noise and sequencing depth were adjusted independently in them. Matrices are based on chromosome 19 data from Bonev et al.12. a, Examples of 5 Mb matrices used in this analysis with 5, 80 and 95% added noise (random ligations between pairs of loci). b, Empirically determined P values and z-scores of CHESS runs with different window sizes, noise levels and simulated sequencing depths (details in Methods). Step size and matrix resolution were both 25 kb. Lines for 2 × 10^5 and 1 × 10^6 overlap for runs with window sizes > 1 Mb. c, As in panel b, but comparing CHESS runs with 2.5 Mb window size on matrices binned at 25 kb and 10 kb. b,c, Solid lines indicate the mean and shaded areas the standard deviation over 1,976, 2,066, 2,156, 2,246 and 2,300 matrix pairs for window sizes of 10 Mb, 7.5 Mb, 5 Mb, 2.5 Mb and 1 Mb, respectively.
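The captions refer to empirically determined P values and z-scores but defer their definition to Methods. The sketch below shows one generic way such quantities are often derived from a similarity score and a set of background scores. It is an assumption made for illustration, not the authors' stated procedure, and the function name `empirical_p_and_z` and the example scores are hypothetical.

```python
from statistics import fmean, pstdev


def empirical_p_and_z(observed, background):
    """Compare one similarity score against a list of background scores.

    Generic sketch: the P value is the fraction of background scores that
    are at least as high as the observed score, and the z-score is the
    observed score standardised against the background distribution.
    """
    at_least = sum(1 for score in background if score >= observed)
    # The +1 correction keeps the estimate from ever being exactly zero.
    p_value = (at_least + 1) / (len(background) + 1)
    z_score = (observed - fmean(background)) / pstdev(background)
    return p_value, z_score


# Hypothetical similarity scores for unrelated (background) comparisons.
background = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.10, 0.12]
p, z = empirical_p_and_z(0.60, background)
```

With only nine background scores, the smallest attainable P value under this convention is 1/10, which illustrates why the resolution of an empirical P value depends on the number of background comparisons.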
Extended Data Fig. 5 Reproducibility of CHESS using different window (WS) and step sizes (SS), sequencing depths and resolutions.
In this analysis, we tested different WS (250 kb–3 Mb), SS (25 kb–1 Mb), sequencing depths (between 20 and 80% of reads) and resolutions (10 kb and 25 kb) (details in Methods). X-axis labels: varied parameters in parentheses, fixed parameters before. The first two boxplots with red dots represent the Jaccard indices (JI) between CHESS results in Bonev et al.12 using different WS, SS and sequencing depths. The boxplots with blue dots correspond to the Díaz et al.48 dataset; in this case using different WS and SS, and then different WS, SS and resolutions. mESC, mouse embryonic stem cells; NPC, neural progenitor cells. Boxplot elements: centre line, median; whiskers, 1.5× interquartile range; box limits, upper and lower quartiles.
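The Jaccard index used here to compare result sets is the size of the intersection divided by the size of the union. A minimal sketch follows, assuming each CHESS run is summarised as a set of (chromosome, start, end) regions that are matched by exact equality. How regions from runs with different window or step sizes are actually matched is described in Methods and is not reproduced here; the region tuples are hypothetical.

```python
def jaccard_index(a, b):
    """Jaccard index of two collections: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets are identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical regions flagged as changed in two runs.
run_1 = {("chr1", 0, 1_000_000), ("chr1", 500_000, 1_500_000), ("chr2", 0, 1_000_000)}
run_2 = {("chr1", 0, 1_000_000), ("chr2", 0, 1_000_000), ("chr3", 0, 1_000_000)}
ji = jaccard_index(run_1, run_2)
```

Here two regions are shared and four distinct regions appear overall, so the index is 0.5; a value of 1 would indicate that both runs report exactly the same regions.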
Extended Data Fig. 6 CHESS benchmark against HOMER, diffHic and ACCOST.
a, Upset plot representing the intersection size between differential interactions of CHESS, HOMER, diffHic and ACCOST. Below, an example is shown for each intersected group. b, Computational requirements of CHESS, HOMER, diffHic and ACCOST. The first line plot shows the CPU usage, the second the memory consumption. The vertical dashed line represents the end of the run.
Extended Data Fig. 7 CHESS performance on differently sized simulated matrices with realistic noise and sequencing depth.
Shown are empirically determined CHESS P values and z-scores (details in Methods) for comparisons of R with a read depth of 100 read pairs / 100 bins and a resized copy Q. The scaling factor is indicated on the x-axis. A noise level of 25% was added to both matrices independently. Sequencing depth was adjusted to 100 k/Mb. Solid lines indicate the mean and shaded areas the standard deviation over 100 simulations per parameter combination. Colours correspond to the different sizes of R.
Extended Data Fig. 8 Feature extraction from Capture-C data.
Examples of differential feature extraction with CHESS between the wt (top contact map) and different mutants (middle contact map) in the Despang et al.50 dataset. Lost and gained structures in the mutants are highlighted in blue and red squares, respectively. Log2 fold-change maps are depicted below (bottom contact map) with identified features coloured according to the directionality of the change. Below each comparison, the genomic annotation is represented, highlighting the modification of each mutant. The vertical lines define the CTCF binding motifs, dashed when deleted. Red hexagons demarcate TAD boundaries. Feature extraction between wt and a, ∆Bor, in which the border was deleted. b, ∆BorC1, in which the border and the first CTCF binding motif were deleted. c, ∆BorC1-2, in which the border and the two first CTCF binding motifs were deleted. d, ∆BorC1-4, in which the border and four CTCF binding motifs were deleted. e, ∆CTCF, in which the border and all the CTCF binding motifs were removed. f, Bor-KnockIn, in which the border was moved to a new location within the Sox9 locus. g, InvC∆Bor, in which the Sox9 sequence was inverted and the border was removed.
Supplementary information
Supplementary Information
Supplementary Figs. 1–5
Supplementary Table
Supplementary Table 1
Cite this article
Galan, S., Machnik, N., Kruse, K. et al. CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction. Nat Genet 52, 1247–1255 (2020). https://doi.org/10.1038/s41588-020-00712-y
DOI: https://doi.org/10.1038/s41588-020-00712-y