Abstract
Next-generation sequencing (NGS) is routinely applied in life sciences and clinical practice, but interpretation of the massive quantities of genomic data produced has become a critical challenge. The genome-wide mutation analyses enabled by NGS have had a revolutionary impact in revealing the predisposing and driving DNA alterations behind a multitude of disorders. The workflow to identify causative mutations from NGS data, for example in cancer and rare diseases, commonly involves phases such as quality filtering, case–control comparison, genome annotation, and visual validation, which require multiple processing steps and usage of various tools and scripts. To this end, we have introduced an interactive and user-friendly multi-platform-compatible software, BasePlayer, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings. A genome-wide scan of regulatory regions for mutation clusters can be carried out with a desktop computer in ~10 min with a dataset of 3 million somatic variants in 200 whole-genome-sequenced (WGS) cancers.
Data availability
No previously unpublished data sets were generated or analyzed during the current study.
Acknowledgements
We thank T. Kivioja for his guidance in regard to the SELEX data and A. Ollikainen for the voice-over in the demonstration videos. We thank B. Pradhan and L. Kauppi for sharing their unpublished Nanopore data. We also thank M. Aavikko, L. van den Berg, D. Berta, O. Kilpivaara, J. Kondelin, H. Kuisma, Y. Li, M. Mehine, H. Metsola, J. Ravantti, L. Sipilä, T. Tanskanen, P. Vahteristo and N. Välimäki for testing BasePlayer and giving suggestions and additional support. We acknowledge ZeroTurnaround for creating the JRebel plugin for Eclipse (IDE). This work was supported by grants from the Biomedicum Helsinki Foundation; the Cancer Society of Finland; the Emil Aaltonen Foundation; the Juhani Aho Foundation for Medical Research; the Sigrid Juselius Foundation; the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345); the European Research Council (ERC, 268648); a European Union Framework Programme 7 Collaborative Project (SYSCOL, 258236); the Nordic Information for Action eScience Center (NIASC); and a Nordic Center of Excellence grant financed by NordForsk (62721 to K.P.).
Author information
Contributions
R.K. designed and developed the protocol. R.K. and E.P. wrote the protocol. I.D. contributed to writing the protocol. I.D., T.C., E.K. and K.P. assisted in developing and testing the software. E.P., V.M. and L.A.A. supervised the research.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
1. Katainen, R. et al. Nat. Genet. 47, 818–821 (2015): https://doi.org/10.1038/ng.3335
2. Pradhan, B. et al. Sci. Rep. 7, 14521 (2017): https://doi.org/10.1038/s41598-017-15076-3
3. Donner, I. et al. Genes Chromosomes Cancer 56, 453–459 (2017): https://doi.org/10.1002/gcc.22448
Integrated supplementary information
Supplementary Figure 1 BasePlayer settings for variant analysis in recessive case.
A family trio and gnomAD exome control files are opened. The son (uppermost sample track) is set as an affected male. The parents are selected accordingly from the dropdown menus. The “Recessive” checkbox is selected in the “Inheritance” tab of the “Variant Manager”.
Supplementary Figure 2 Visualization of long-read sequencing data in BasePlayer.
Three split views are shown, tracking the split mappings for a single long read. An inset info panel shows information on the selected read and a schematic illustration of split read orientations (bottom of the info panel).
Supplementary Figure 3 TF binding affinity change prediction settings in BasePlayer.
(a) Affinity change annotation settings without variant filtering. (b) Affinity change annotation settings with variant filtering. Value limit is set to “1”. (c) Annotation results. The affinity change for each overlapping TF is shown in the variant row of the result table (red circle). In the circled case, the variant occurs at the HOXD12 binding site, which has an affinity score of 6.57 at that locus, and the variant decreases the binding affinity by 1.52. (d) TF motif and variant visualization at sequence-level zoom. The affinity change for each overlapping TF is reported in the “VCF info” dialog (bottom right) if the track is applied and “report affinity change” is selected in the track settings.
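To illustrate what an affinity change of this kind measures, the following is a minimal sketch of a generic log-odds position weight matrix (PWM) comparison between the reference and variant alleles. It is not BasePlayer's implementation, whose scoring details are not given here. The count matrix, background frequency, pseudocount and function names are all hypothetical, and the example does not reproduce the HOXD12 values quoted in the caption.

```python
import math

# Hypothetical 4-position count matrix; NOT a real HOXD12 motif.
COUNTS = {
    "A": [8, 1, 1, 7],
    "C": [1, 1, 7, 1],
    "G": [1, 7, 1, 1],
    "T": [0, 1, 1, 1],
}


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert base counts into per-position log2 odds against a uniform background."""
    length = len(counts["A"])
    matrix = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        }
        matrix.append(column)
    return matrix


def score(seq, matrix):
    """Sum the log-odds of each base of seq at its motif position."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must equal motif length")
    return sum(column[base] for column, base in zip(matrix, seq))


def affinity_change(ref_seq, offset, alt_base, matrix):
    """Score difference (variant minus reference) for a substitution at offset."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return score(alt_seq, matrix) - score(ref_seq, matrix)


matrix = log_odds_matrix(COUNTS)
delta = affinity_change("AGCA", 0, "T", matrix)
print(f"reference score: {score('AGCA', matrix):.2f}, change: {delta:.2f}")
```

A negative change means the variant allele matches the motif less well than the reference allele. In a scheme like this, a value limit such as the one in panel b would correspond to keeping only variants whose absolute change exceeds the chosen threshold.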
Supplementary Figure 4 CADD prediction settings in BasePlayer.
(a) The column selector for TSV files. (b) Selected column headers for the CADD TSV file. (c) Track settings for the CADD annotation. (d) Annotation results. CADD annotation is shown in the variant row of the result table (red circle).
Supplementary Figure 5 Variant analysis steps in Procedure Case 1.
The effects of filtering, comparison and annotation on variant counts (top right corner of the Variant Manager) in chromosome 10. (a) Initial setup with no filters applied. (b) Quality and coverage filtering thresholds are set. (c) Only coding variants shared by all samples are visible. (d) Linkage-compatible regions are applied; variants outside these regions are excluded. (e) A control file (gnomAD exomes) is applied, resulting in one shared variant.
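The sequence of steps in panels b to e can be sketched as a chain of set operations. This is only an illustration of the filtering logic, not BasePlayer code. The `Call` record, the threshold defaults and the function name are hypothetical. Control filtering is simplified here to excluding any variant present in the control set, whereas a real analysis would more likely apply an allele-frequency cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One variant call in one sample (hypothetical record layout)."""
    sample: str
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float
    coverage: int
    coding: bool

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def shared_candidates(calls, samples, regions, control_keys,
                      min_quality=20.0, min_coverage=10):
    """Return variant keys that survive steps (b) to (e) of the caption."""
    # (b) Quality and coverage thresholds (values are assumptions).
    passed = [c for c in calls
              if c.quality >= min_quality and c.coverage >= min_coverage]

    # (c) Keep coding variants carried by every sample.
    carriers = {}
    for c in passed:
        if c.coding:
            carriers.setdefault(c.key, set()).add(c.sample)
    required = set(samples)
    shared = {k for k, found in carriers.items() if found >= required}

    # (d) Keep variants inside the given (chrom, start, end) regions.
    in_region = {
        k for k in shared
        if any(chrom == k[0] and start <= k[1] <= end
               for chrom, start, end in regions)
    }

    # (e) Drop variants that are present in the control set.
    return sorted(in_region - set(control_keys))
```

Each step narrows the previous result, which is why the variant count shown in the Variant Manager decreases from panel to panel.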
Supplementary Figure 6 M-CAP annotation settings in BasePlayer.
(a) Column selector for M-CAP file. Fourth column is set as “Base”. (b) Track settings for M-CAP track. Value limit is set to 0.025 and “Intersect” is unselected. “File format” button opens the “Column selector”.
Supplementary Figure 7 ClinVar annotation settings in BasePlayer.
“Annotation” checkbox is selected for the VCF track.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Tutorials 1–5
About this article
Cite this article
Katainen, R., Donner, I., Cajuso, T. et al. Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer. Nat Protoc 13, 2580–2600 (2018). https://doi.org/10.1038/s41596-018-0052-3
DOI: https://doi.org/10.1038/s41596-018-0052-3
This article is cited by
- Whole-exome sequencing reveals candidate high-risk susceptibility genes for endometriosis. Human Genomics (2023)
- Clinically relevant germline variants in allogeneic hematopoietic stem cell transplant recipients. Bone Marrow Transplantation (2023)
- Enrichment of cancer-predisposing germline variants in adult and pediatric patients with acute lymphoblastic leukemia. Scientific Reports (2022)
- A novel uterine leiomyoma subtype exhibits NRF2 activation and mutations in genes associated with neddylation of the Cullin 3-RING E3 ligase. Oncogenesis (2022)
- Parity associates with chromosomal damage in uterine leiomyomas. Nature Communications (2021)