Abstract
We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.
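To make the allelic-imbalance correction concrete, the sketch below shows how a mutation's expected variant allele fraction can depend on its cellular prevalence, the tumour content of the sample and the local copy number. It is a deliberately simplified point calculation, not PyClone's Bayesian model. The function names, the parameter names and the assumption that all tumour cells share one total copy number at the locus are illustrative choices and are not taken from the paper.

```python
def expected_vaf(prevalence, tumour_content, tumour_cn, mutant_copies, normal_cn=2):
    """Expected variant allele fraction under a simplified mixture (assumed model).

    prevalence     -- fraction of tumour cells carrying the mutation (0..1)
    tumour_content -- fraction of sampled cells that are tumour cells (0..1)
    tumour_cn      -- total copy number at the locus in tumour cells (assumed uniform)
    mutant_copies  -- number of mutated copies in each cell that carries the mutation
    normal_cn      -- total copy number at the locus in normal cells
    """
    # Variant alleles come only from tumour cells that carry the mutation.
    variant_alleles = tumour_content * prevalence * mutant_copies
    # All alleles at the locus: normal-cell contribution plus tumour-cell contribution.
    total_alleles = (1.0 - tumour_content) * normal_cn + tumour_content * tumour_cn
    return variant_alleles / total_alleles


def naive_prevalence(vaf, tumour_content, tumour_cn, mutant_copies, normal_cn=2):
    """Invert expected_vaf to get a point estimate of prevalence, capped at 1."""
    total_alleles = (1.0 - tumour_content) * normal_cn + tumour_content * tumour_cn
    return min(1.0, vaf * total_alleles / (tumour_content * mutant_copies))


if __name__ == "__main__":
    # A heterozygous mutation in half the tumour cells, 80% tumour content, diploid locus.
    vaf = expected_vaf(prevalence=0.5, tumour_content=0.8, tumour_cn=2, mutant_copies=1)
    print(round(vaf, 3))
    print(round(naive_prevalence(vaf, tumour_content=0.8, tumour_cn=2, mutant_copies=1), 3))
```

A point inversion like this ignores sampling noise and uncertainty about the mutated genotype, which is the kind of uncertainty that a Bayesian clustering approach over deeply sequenced mutations is meant to handle.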
Acknowledgements
This work is funded by Canadian Institutes for Health Research (CIHR), Genome Canada, Genome British Columbia, Canadian Cancer Society Research Institute and Canadian Breast Cancer Foundation grants to S.P.S. and S.A. S.P.S. is supported by the Michael Smith Foundation for Health Research and is the Canada Research Chair (CRC) for Computational Cancer Genomics. S.A. is the CRC for Molecular Oncology. A.R. is supported by a CIHR Banting scholarship.
Author information
Contributions
Project conception and oversight: S.P.S., S.A., A.R.; method development: A.R., A.B.-C., S.P.S.; implementation and benchmarking: A.R.; manuscript writing and editing, study design and execution: A.R., A.B.-C., S.P.S., S.A.; single-cell sequencing: J.K., D.Y., A.W., E.L., J.B.; data analysis and interpretation: G.H.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–14, Supplementary Results, Supplementary Discussion and Supplementary Note (PDF 5370 kb)
Supplementary Table 1
Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 2. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed by taking the mean of the post-burn-in trace of the cellular prevalence parameter for the respective methods. The standard deviation of the cellular prevalence parameter estimated from the post-burn-in trace is also included. Cluster IDs (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation IDs list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLS 50 kb)
Supplementary Table 2
Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 1. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed by taking the mean of the post-burn-in trace of the cellular prevalence parameter for the respective methods. The standard deviation of the cellular prevalence parameter estimated from the post-burn-in trace is also included. Cluster IDs (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation IDs list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLSX 40 kb)
About this article
Cite this article
Roth, A., Khattra, J., Yap, D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396–398 (2014). https://doi.org/10.1038/nmeth.2883
DOI: https://doi.org/10.1038/nmeth.2883
This article is cited by
- CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction. Nature Protocols (2024)
- Multiregion sampling of de novo metastatic prostate cancer reveals complex polyclonality and augments clinical genotyping. Nature Cancer (2024)
- Adoptive neoantigen-reactive T cell therapy: improvement strategies and current clinical researches. Biomarker Research (2023)
- ACT-Discover: identifying karyotype heterogeneity in pancreatic cancer evolution using ctDNA. Genome Medicine (2023)
- The heterogeneity and clonal evolution analysis of the advanced prostate cancer with castration resistance. Journal of Translational Medicine (2023)