Abstract
We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient BRCA1 or BRCA2 function, and a third of unknown cause. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer.
Accession codes
Data deposits
Raw data have been submitted to the European Genome-phenome Archive under the overarching accession number EGAS00001001178 (please see Supplementary Notes for breakdown by data type). Somatic variants have been deposited at the International Cancer Genome Consortium Data Portal (https://dcc.icgc.org/).
Change history
18 January 2019
In the Methods section of this Article, 'greater than' should have been 'less than' in the sentence 'Putative regions of clustered rearrangements were identified as having an average inter-rearrangement distance that was at least 10 times greater than the whole-genome average for the individual sample.'. The Article has not been corrected.
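The corrected criterion (an average inter-rearrangement distance at least 10 times less than the sample's whole-genome average) can be illustrated with a minimal sketch. The function names, the fixed window of five consecutive breakpoints and the input layout are assumptions made for illustration only; the Article's Methods are not reproduced here and may segment the genome differently.

```python
from typing import Dict, List, Tuple


def genome_average_gap(breakpoints: Dict[str, List[int]]) -> float:
    """Whole-genome average distance between adjacent breakpoints in one sample."""
    gaps: List[int] = []
    for positions in breakpoints.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least two breakpoints on one chromosome")
    return sum(gaps) / len(gaps)


def clustered_windows(
    breakpoints: Dict[str, List[int]],
    window: int = 5,      # hypothetical window size, not taken from the Article
    fold: float = 10.0,   # "at least 10 times less than" the genome average
) -> List[Tuple[str, int, int]]:
    """Return (chromosome, start, end) for windows of consecutive breakpoints
    whose mean gap is at least `fold` times smaller than the genome average."""
    average = genome_average_gap(breakpoints)
    hits: List[Tuple[str, int, int]] = []
    for chrom, positions in breakpoints.items():
        ordered = sorted(positions)
        for i in range(len(ordered) - window + 1):
            chunk = ordered[i:i + window]
            mean_gap = (chunk[-1] - chunk[0]) / (window - 1)
            if mean_gap * fold <= average:
                hits.append((chrom, chunk[0], chunk[-1]))
    return hits


# Toy sample: five tightly spaced breakpoints on chr1, the rest widely spaced.
sample = {
    "chr1": [1000, 2000, 3000, 4000, 5000, 10_000_000, 20_000_000, 30_000_000],
    "chr2": [5_000_000, 15_000_000, 25_000_000],
}
print(clustered_windows(sample))
```

Comparing against the per-sample average, rather than a fixed distance, keeps the threshold meaningful for both heavily and lightly rearranged genomes.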
Acknowledgements
This work has been funded through the ICGC Breast Cancer Working group by the Breast Cancer Somatic Genetics Study (BASIS), a European research project funded by the European Community’s Seventh Framework Programme (FP7/2010-2014) under the grant agreement number 242006; the Triple Negative project funded by the Wellcome Trust (grant reference 077012/Z/05/Z) and the HER2+ project funded by Institut National du Cancer (INCa) in France (grant numbers 226-2009, 02-2011, 41-2012, 144-2008, 06-2012). The ICGC Asian Breast Cancer Project was funded through a grant of the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (A111218-SC01). Personally funded by grants above: F.G.R.-G., S.M., K.R., S.M. were funded by BASIS. Recruitment was performed under the auspices of the ICGC breast cancer projects run by the UK, France and Korea.
For contributions towards instruments, specimens and collections: Tayside Tissue Bank (funded by CRUK, University of Dundee, Chief Scientist Office & Breast Cancer Campaign), Asan Bio-Resource Center of the Korea Biobank Network, Seoul, South Korea, OSBREAC consortium, The Icelandic Centre for Research (RANNIS), The Swedish Cancer Society and the Swedish Research Council, and Fondation Jean Dausset-Centre d’Etudes du polymorphisme humain. Icelandic Cancer Registry, The Brisbane Breast Bank (The University of Queensland, The Royal Brisbane and Women’s Hospital and QIMR Berghofer), Breast Cancer Tissue and Data Bank at KCL and NIHR Biomedical Research Centre at Guy’s and St Thomas’s Hospitals. Breakthrough Breast Cancer and Cancer Research UK Experimental Cancer Medicine Centre at KCL.
For pathology review: The Mouse Genome Project and Department of Pathology, Cambridge University Hospitals NHS Foundation Trust for microscopes. A. Richardson, A. Ehinger, A. Vincent-Salomon, C. Van Deurzen, C. Purdie, D. Larsimont, D. Giri, D. Grabau, E. Provenzano, G. MacGrogan, G. Van den Eynden, I. Treilleux, J. E. Brock, J. Jacquemier, J. Reis-Filho, L. Arnould, L. Jones, M. van de Vijver, Ø. Garred, R. Salgado, S. Pinder, S. R. Lakhani, T. Sauer, V. Barbashina. Illumina UK Ltd for input on optimization of sequencing throughout this project. Wellcome Trust Sanger Institute Sequencing Core Facility, Core IT Facility and Cancer Genome Project Core IT team and Cancer Genome Project Core Laboratory team for general support.
Personal funding: S.N.-Z. is a Wellcome Beit Fellow and personally funded by a Wellcome Trust Intermediate Fellowship (WT100183MA). L.B.A. is supported through a J. Robert Oppenheimer Fellowship at Los Alamos National Laboratory. A.L.R. is partially supported by the Dana-Farber/Harvard Cancer Center SPORE in Breast Cancer (NIH/NCI 5 P50 CA168504-02). D.G. was supported by the EU-FP7-SUPPRESSTEM project. A.S. was supported by Cancer Genomics Netherlands through a grant from the Netherlands Organisation of Scientific research (NWO). M.S. was supported by the EU-FP7-DDR response project. C.S. and C.D. are supported by a grant from the Breast Cancer Research Foundation. E.B. was funded by EMBL. C.S. is funded by FNRS (Fonds National de la Recherche Scientifique). S.J.J. is supported by Leading Foreign Research Institute Recruitment Program through the National Research Foundation of Republic Korea (NRF 2011-0030105). G.K. is supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (NRF 2015R1A2A1A10052578). J.F. received funding from an ERC Advanced grant (no. 322737).
For general contribution and administrative support: Fondation Synergie Lyon Cancer in France. J. G. Jonasson, Department of Pathology, University Hospital & Faculty of Medicine, University of Iceland. K. Ferguson, Tissue Bank Manager, Brisbane Breast Bank and The Breast Unit, The Royal Brisbane and Women’s Hospital, Brisbane, Australia. The Oslo Breast Cancer Consortium of Norway (OSBREAC). Angelo Paradiso, IRCCS Istituto Tumori “Giovanni Paolo II”, Bari, Italy. A. Vines for administrative support in identifying the samples, organizing the bank, and sending out the samples. M. Schlooz-Vries, J. Tol, H. van Laarhoven, F. Sweep, P. Bult for contributions in Nijmegen.
This research used resources provided by the Los Alamos National Laboratory Institutional Computing Program, which is supported by the US Department of Energy National Nuclear Security Administration under contract no. DE-AC52-06NA25396. Research performed at Los Alamos National Laboratory was carried out under the auspices of the National Nuclear Security Administration of the United States Department of Energy. N. Miller (in memoriam) for her contribution in setting up the clinical database. Finally, we would like to acknowledge all members of the ICGC Breast Cancer Working Group and ICGC Asian Breast Cancer Project.
Author information
Contributions
S.N.-Z. and M.R.S. designed the study, analysed data and wrote the manuscript. H.D., J.S., M. Ramakrishna, D.G. and X.Z. performed curation of data and contributed towards genomic and copy number analyses. M.S., A.B.B., M.R.A., O.C.L., A.L. and M. Ringner contributed towards curation and analysis of non-genomic data (transcriptomic, miRNA, methylation). I.M., L.B.A., D.C.W., P.V.L., S. Morganella and Y.S.J. contributed towards specialist analyses. G.T., G.K., A.L.R., A-L.B.-D., J.W.M.M., M.J.v.d.V., H.G.S., E.B., A. Borg, A.V., P.A.F. and P.J.C. designed the study, drove the consortium and provided samples. S. Martin was the project coordinator. S.McL., S.O.M. and K.R. contributed operationally. S.-M.A., S.B., J.E.B., A. Brooks, C.D., L.D., A.F., J.A.F., G.K.J.H., S.J.J., H.-Y.K., T.A.K., S.K., H.J.L., J.-Y.L., I.P., X.P., C.A.P., F.G.R.-G., G.R., A.M.S., P.T.S., O.A.S., S.T., I.T., G.G.V.d.E., P.V., A.V.-S., L.Y., C.C., L.v.V., A.T., S.K., B.K.T.T., J.J., N.t.U., C.S., P.N.S., S.V.L., S.R.L., J.E.E. and A.M.T. contributed pathology assessment and/or samples. A. Butler, S.D., M.G., D.R.J., Y.L., A.M., V.M., K.R., R.S., L.S. and J.T. contributed IT processing and management expertise. All authors discussed the results and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Extended data figures and tables
Extended Data Figure 1 Landscape of driver mutations.
a, Summary of subtypes of cohort of 560 breast cancers. b, Driver mutations by mutation type. c, Distribution of rearrangements throughout the genome. Black line represents background rearrangement density (calculation based on rearrangement breakpoints in intergenic regions only). Red lines represent frequency of rearrangement within breast cancer genes.
Extended Data Figure 2 Rearrangements in oncogenes.
a, Variation in rearrangement and copy number events affecting ESR1. Clear amplification in top panel, transection of ESR1 in middle panel and focused tandem duplication events in bottom panel. b, Predicted outcomes of some rearrangements affecting ETV6. Red crosses indicate exons deleted as a result of rearrangements within the ETV6 gene; black dotted lines indicate rearrangement break points resulting in fusions between ETV6 and ERC, WNK1, ATP2B1 or LRP6. ETV6 domains indicated are: N-terminal (NT) pointed domain and E26 transformation-specific DNA binding domain (ETS).
Extended Data Figure 3 Recurrent non-coding events in breast cancers.
a, Manhattan plot demonstrating sites with most significant P values as identified by binning analysis. Purple highlighted sites were also detected by the method seeking recurrence when partitioned by genomic features. b, Locus at chr11 65 Mb, which was identified by independent analyses as being more mutated than expected by chance. Bottom, a rearrangement hotspot analysis identified this region as a tandem duplication hotspot, with nested tandem duplications noted at this site. Partitioning the genome into different regulatory elements, an analysis of substitutions and indels identified lncRNAs MALAT1 and NEAT1 (topmost panels) with significant P values.
Extended Data Figure 4 Copy number analyses.
a, Frequency of copy number aberrations across the cohort. Chromosome position along x axis, frequency of copy number gains (red) and losses (green) on y axis. b, Identification of focal recurrent copy number gains by the GISTIC method (Supplementary Methods). c, Identification of focal recurrent copy number losses by the GISTIC method. d, Heatmap of GISTIC regions following unsupervised hierarchical clustering. Five cluster groups are noted and relationships with expression subtype (basal, red; luminal B, light blue; luminal A, dark blue), immunohistopathology status (ER, PR, HER2 status; black, positive), abrogation of BRCA1 (red) and BRCA2 (blue) (whether germline, somatic or through promoter hypermethylation), driver mutations (black, positive), HRD index (top 25% or lowest 25%; black, positive).
Extended Data Figure 5 miRNA analyses.
Hierarchical clustering of the most variant miRNAs using complete linkage and Euclidean distance. miRNA clusters were assigned using the partitioning algorithm using recursive thresholding (PART) method. Five main patient clusters were revealed. The horizontal annotation bars show (from top to bottom): PART cluster group, PAM50 mRNA expression subtype, GISTIC cluster, rearrangement cluster, lymphocyte infiltration score and histological grade. The heatmap shows clustered and centred miRNA expression data (log2 transformed). Details on colour coding of the annotation bars are presented below the heatmap.
Extended Data Figure 6 Rearrangement cluster groups and associated features.
a, Overall survival (OS) by rearrangement cluster group. b, Age of diagnosis. c, Tumour grade. d, Menopausal status. e, ER status. f, Immune response metagene panel. g, Lymphocytic infiltration score.
Extended Data Figure 7 Contrasting tandem duplication phenotypes.
Contrasting tandem duplication phenotypes of two breast cancers using chromosome X. Copy number (y axis) depicted as black dots. Lines represent rearrangement breakpoints (green, tandem duplications; pink, deletions; blue, inversions; black, translocations with partner breakpoint provided). Top, PD4841a has numerous large tandem duplications (>100 kb, rearrangement signature 1), whereas PD4833a has many short tandem duplications (<10 kb, rearrangement signature 3) appearing as ‘single’ lines in its plot.
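The two size ranges named in this caption can be expressed as a simple binning rule. This is only a hedged sketch: the function name and the "intermediate" label are hypothetical, and the length of a single event does not by itself assign it to a rearrangement signature, which in the Article is extracted across the whole cohort.

```python
def tandem_duplication_size_class(start: int, end: int) -> str:
    """Bin one tandem duplication by length, using the caption's thresholds."""
    length = end - start
    if length > 100_000:   # >100 kb: range the caption links to rearrangement signature 1
        return "large"
    if length < 10_000:    # <10 kb: range the caption links to rearrangement signature 3
        return "short"
    return "intermediate"  # hypothetical label for lengths between the two ranges


print(tandem_duplication_size_class(1_000_000, 1_250_000))
print(tandem_duplication_size_class(500, 5_500))
```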
Extended Data Figure 8 Hotspots of tandem duplications.
A tandem duplication hotspot occurring in six different patients.
Extended Data Figure 9 Rearrangement breakpoint junctions.
a, Breakpoint features of rearrangements in 560 breast cancers by rearrangement signature. b, Breakpoint features in BRCA and non-BRCA cancers.
Extended Data Figure 10 Signatures of focal hypermutation.
a, Kataegis and alternative kataegis occurring at the same locus (ERBB2 amplicon in PD13164a). Copy number (y axis) depicted as black dots. Lines represent rearrangement breakpoints (green, tandem duplications; pink, deletions; blue, inversions). Top, an ~10 Mb region including the ERBB2 locus. Middle, zoomed-in tenfold to an ~1 Mb window highlighting co-occurrence of rearrangement breakpoints, with copy number changes and three different kataegis loci. Bottom, demonstrates kataegis loci in more detail. log10 intermutation distance on y axis. Black arrow, kataegis; blue arrows, alternative kataegis. b, Sequence context of kataegis and alternative kataegis identified in this data set.
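The y axis of the bottom panel, log10 intermutation distance, can be computed as in the following minimal sketch. The function name and the input (a list of substitution coordinates on one chromosome) are assumptions for illustration; the sketch only derives the plotted quantity and does not apply any kataegis-calling criteria, which are not stated in this caption.

```python
import math
from typing import List


def log10_intermutation_distances(positions: List[int]) -> List[float]:
    """log10 of the distance from each mutation to the next one on the same chromosome.

    Duplicate coordinates are collapsed first so that no distance is zero.
    """
    ordered = sorted(set(positions))
    return [math.log10(b - a) for a, b in zip(ordered, ordered[1:])]


# Toy coordinates: gaps of 10 bp, 1,000 bp and 100,000 bp.
print(log10_intermutation_distances([100, 110, 1_110, 101_110]))
```

Runs of consecutive low values on this scale correspond to the localized clusters of mutations that the panel highlights.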
Supplementary information
Supplementary Information
This file contains Supplementary Methods and Data and additional references. (PDF 2344 kb)
Supplementary Information
This file contains some acknowledgements and the EGA accession numbers. (PDF 181 kb)
Supplementary Tables
This zipped file contains Supplementary Tables 1-21. (ZIP 42202 kb)
About this article
Cite this article
Nik-Zainal, S., Davies, H., Staaf, J. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016). https://doi.org/10.1038/nature17676
DOI: https://doi.org/10.1038/nature17676