Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer

Katainen, Riku; Donner, Iikki; Cajuso, Tatiana; Kaasinen, Eevi; Palin, Kimmo; Mäkinen, Veli; Aaltonen, Lauri A.; Pitkänen, Esa

doi:10.1038/s41596-018-0052-3

Protocol
Published: 15 October 2018

Riku Katainen^1,2,
Iikki Donner^1,2,
Tatiana Cajuso^1,2,
Eevi Kaasinen^1,2,
Kimmo Palin^1,2,
Veli Mäkinen^3,
Lauri A. Aaltonen^1,2 &
Esa Pitkänen^1,2,4

Nature Protocols volume 13, pages 2580–2600 (2018)

Abstract

Next-generation sequencing (NGS) is routinely applied in life sciences and clinical practice, but interpretation of the massive quantities of genomic data produced has become a critical challenge. The genome-wide mutation analyses enabled by NGS have had a revolutionary impact in revealing the predisposing and driving DNA alterations behind a multitude of disorders. The workflow to identify causative mutations from NGS data, for example in cancer and rare diseases, commonly involves phases such as quality filtering, case–control comparison, genome annotation, and visual validation, which require multiple processing steps and usage of various tools and scripts. To this end, we have introduced an interactive and user-friendly multi-platform-compatible software, BasePlayer, which allows scientists, regardless of bioinformatics training, to carry out variant analysis in disease genetics settings. A genome-wide scan of regulatory regions for mutation clusters can be carried out with a desktop computer in ~10 min with a dataset of 3 million somatic variants in 200 whole-genome-sequenced (WGS) cancers.
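As a rough illustration of what a scan for mutation clusters involves, the sketch below groups variant positions on one chromosome by proximity. It is only a minimal example under stated assumptions: the function name `find_clusters`, the `max_gap` and `min_variants` parameters, and the gap-based grouping rule are hypothetical and are not taken from BasePlayer, whose clustering method is not described in this abstract.

```python
def find_clusters(positions, max_gap=100, min_variants=3):
    """Group variant positions into clusters on a single chromosome.

    Assumption: consecutive variants at most `max_gap` bp apart belong to
    the same cluster, and a cluster is reported only if it contains at
    least `min_variants` variants. Both thresholds are illustrative.
    Returns a list of (start, end, variant_count) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the cluster being built.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_variants:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the last cluster.
    if len(current) >= min_variants:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Hypothetical somatic variant positions pooled across samples.
positions = [1000, 1040, 1090, 5000, 9000, 9050, 9075, 9120]
clusters = find_clusters(positions)
print(clusters)
```

Sorting dominates the cost, so a scan of this kind is roughly O(n log n) in the number of variants per chromosome.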

**Fig. 1: Overview of the NGS data analysis capabilities and features of BasePlayer.**

**Fig. 2: The main window of BasePlayer, displaying three samples, a genomic region track and a population control data track.**

**Fig. 3: Variant Manager user interface and functions.**

**Fig. 4: Candidate genes in the result table and variant visualization.**

**Fig. 5: Somatic variants in the regulatory genome.**

**Fig. 6: Variant Manager settings in somatic cluster analysis.**

**Fig. 8: Variant quality validation by read-level inspection.**

Data availability

No previously unpublished data sets were generated or analyzed during the current study.

Acknowledgements

We thank T. Kivioja for his guidance in regard to the SELEX data and A. Ollikainen for the voice-over in the demonstration videos. We thank B. Pradhan and L. Kauppi for sharing their unpublished Nanopore data. We also thank M. Aavikko, L. van den Berg, D. Berta, O. Kilpivaara, J. Kondelin, H. Kuisma, Y. Li, M. Mehine, H. Metsola, J. Ravantti, L. Sipilä, T. Tanskanen, P. Vahteristo and N. Välimäki for testing BasePlayer and giving suggestions and additional support. We acknowledge ZeroTurnaround for creating the JRebel plugin for Eclipse (IDE). This work was supported by grants from the Biomedicum Helsinki Foundation; the Cancer Society of Finland; the Emil Aaltonen Foundation; the Juhani Aho Foundation for Medical Research; the Sigrid Juselius Foundation; the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345); the European Research Council (ERC, 268648); a European Union Framework Programme 7 Collaborative Project (SYSCOL, 258236); the Nordic Information for Action eScience Center (NIASC); and a Nordic Center of Excellence grant financed by NordForsk (62721 to K.P.).

Author information

Authors and Affiliations

Genome-Scale Biology Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland
Riku Katainen, Iikki Donner, Tatiana Cajuso, Eevi Kaasinen, Kimmo Palin, Lauri A. Aaltonen & Esa Pitkänen
Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Helsinki, Finland
Riku Katainen, Iikki Donner, Tatiana Cajuso, Eevi Kaasinen, Kimmo Palin, Lauri A. Aaltonen & Esa Pitkänen
Department of Computer Science and Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
Veli Mäkinen
Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
Esa Pitkänen

Contributions

R.K. designed and developed the protocol. R.K. and E.P. wrote the protocol. I.D. contributed to writing the protocol. I.D., T.C., E.K. and K.P. assisted in developing and testing the software. E.P., V.M. and L.A.A. supervised the research.

Corresponding authors

Correspondence to Riku Katainen or Esa Pitkänen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 BasePlayer settings for variant analysis in recessive case.

A family trio and gnomAD exome control files are opened. The son (uppermost sample track) is set as an affected male, and the parents are selected accordingly from the drop-down menus. The “Recessive” checkbox is selected in the “Inheritance” tab of the “Variant Manager”.
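The logic behind a recessive setting of this kind can be sketched in a few lines. The example assumes autosomal sites with genotypes encoded as alternate-allele counts (0, 1 or 2); the function name and the encoding are hypothetical and do not reflect BasePlayer's internals. X-linked and compound-heterozygous models are deliberately left out.

```python
def is_recessive_candidate(child_gt: int, mother_gt: int, father_gt: int) -> bool:
    """Return True if an autosomal variant fits a simple recessive model.

    Genotypes are alternate-allele counts: 0 = hom ref, 1 = het, 2 = hom alt.
    Assumption: the affected child is homozygous for the alternate allele
    and both unaffected parents are heterozygous carriers.
    """
    return child_gt == 2 and mother_gt == 1 and father_gt == 1


# Hypothetical trio genotypes (child, mother, father) keyed by variant ID.
trio = {
    "var_a": (2, 1, 1),  # fits the recessive model
    "var_b": (1, 1, 0),  # child is heterozygous: excluded
    "var_c": (2, 2, 1),  # unaffected mother is homozygous: excluded
}
candidates = [var for var, gts in trio.items() if is_recessive_candidate(*gts)]
print(candidates)
```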

Supplementary Figure 2 Visualization of long-read sequencing data in BasePlayer.

Three split views are shown, tracking the split mappings for a single long read. An inset info panel shows information on the selected read and a schematic illustration of split read orientations (bottom of the info panel).

Supplementary Figure 3 TF binding affinity change prediction settings in BasePlayer.

(a) Affinity change annotation settings without variant filtering. (b) Affinity change annotation settings with variant filtering. The value limit is set to “1”. (c) Annotation results. Affinity changes for each overlapping TF are shown in the variant row of the result table (red circle). In the circled case, the variant occurs at the HOXD12 binding site, which has an affinity score of 6.57 at that locus, and the variant decreases the binding affinity by 1.52. (d) TF motif and variant visualization at sequence-level zoom. Affinity changes for each overlapping TF are reported in the “VCF info” dialog (bottom right) if the track is applied and “report affinity change” is selected in the track settings.
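One common way to express such an affinity change is as the difference between the log-odds scores of the alternate and reference sequences under a position weight matrix. The sketch below follows that convention as an assumption; the scoring scheme BasePlayer actually uses is not specified in this caption, and the matrix, background frequency and function names are hypothetical.

```python
import math

# Hypothetical position probability matrix for a 4-bp motif (one dict per position).
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def motif_score(seq: str) -> float:
    """Sum of per-position log2 odds of the motif versus the background."""
    return sum(math.log2(PPM[i][base] / BACKGROUND) for i, base in enumerate(seq))


def affinity_change(ref_seq: str, offset: int, alt_base: str) -> float:
    """Score difference (alt minus ref) for a substitution at `offset`."""
    alt_seq = ref_seq[:offset] + alt_base + ref_seq[offset + 1:]
    return motif_score(alt_seq) - motif_score(ref_seq)


# A>C substitution at the first motif position of a hypothetical site.
change = affinity_change("ATGA", 0, "C")
print(f"reference score: {motif_score('ATGA'):.2f}, change: {change:.2f}")
```

A negative value means the variant weakens the predicted binding, which is the direction described for the circled variant.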

Supplementary Figure 4 CADD prediction settings in BasePlayer.

(a) The column selector for TSV files. (b) Selected column headers for the CADD TSV file. (c) Track settings for the CADD annotation. (d) Annotation results. CADD annotation is shown in the variant row of the result table (red circle).

Supplementary Figure 5 Variant analysis steps in Procedure Case 1.

The effects of filtering, comparison and annotation on variant counts (top right corner of the Variant Manager) in chromosome 10. (a) Initial setup with no filters applied. (b) Quality and coverage filtering thresholds are set. (c) Only coding variants shared by all samples are visible. (d) Linkage-compatible regions are applied; variants outside these regions are excluded. (e) A control file (gnomAD exomes) is applied, resulting in one shared variant.
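The stepwise narrowing shown in panels (a) to (e) can be mimicked with a chain of filters that records the variant count after each step. This is a minimal sketch with made-up data: the `Variant` fields, the thresholds (`min_quality`, `min_coverage`) and the region coordinates are all hypothetical and are not BasePlayer settings.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int
    quality: float
    coverage: int
    coding: bool
    carriers: int      # number of case samples carrying the variant
    in_controls: bool  # present in the control file


def run_filters(variants, n_samples, regions, min_quality=20.0, min_coverage=10):
    """Apply the filters in order; return survivors and per-step counts."""
    steps = [
        ("quality and coverage",
         lambda v: v.quality >= min_quality and v.coverage >= min_coverage),
        ("coding, shared by all samples",
         lambda v: v.coding and v.carriers == n_samples),
        ("inside linkage-compatible regions",
         lambda v: any(start <= v.pos <= end for start, end in regions)),
        ("absent from controls",
         lambda v: not v.in_controls),
    ]
    counts = [("no filters", len(variants))]
    for name, keep in steps:
        variants = [v for v in variants if keep(v)]
        counts.append((name, len(variants)))
    return variants, counts


# Hypothetical variants from three case samples.
variants = [
    Variant(100, 50.0, 30, True, 3, False),   # passes every step
    Variant(200, 10.0, 30, True, 3, False),   # fails the quality threshold
    Variant(300, 50.0, 30, False, 3, False),  # noncoding
    Variant(5000, 50.0, 30, True, 3, False),  # outside the linkage region
    Variant(150, 50.0, 30, True, 3, True),    # present in controls
]
remaining, counts = run_filters(variants, n_samples=3, regions=[(1, 1000)])
for name, count in counts:
    print(f"{name}: {count}")
```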

Supplementary Figure 6 M-CAP annotation settings in BasePlayer.

(a) Column selector for the M-CAP file. The fourth column is set as “Base”. (b) Track settings for the M-CAP track. The value limit is set to 0.025 and “Intersect” is unselected. The “File format” button opens the “Column selector”.

Supplementary Figure 7 ClinVar annotation settings in BasePlayer.

The “Annotation” checkbox is selected for the VCF track.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Tutorials 1–5

Reporting Summary

About this article

Cite this article

Katainen, R., Donner, I., Cajuso, T. et al. Discovery of potential causative mutations in human coding and noncoding genome with the interactive software BasePlayer. Nat Protoc 13, 2580–2600 (2018). https://doi.org/10.1038/s41596-018-0052-3

Published: 15 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/s41596-018-0052-3

This article is cited by

Whole-exome sequencing reveals candidate high-risk susceptibility genes for endometriosis
- Susanna Nousiainen
- Outi Kuismin
- Pia Vahteristo
Human Genomics (2023)
Clinically relevant germline variants in allogeneic hematopoietic stem cell transplant recipients
- Atte K. Lahtinen
- Jessica Koski
- Ulla Wartiovaara-Kautto
Bone Marrow Transplantation (2023)
Enrichment of cancer-predisposing germline variants in adult and pediatric patients with acute lymphoblastic leukemia
- Suvi P. M. Douglas
- Atte K. Lahtinen
- Outi Kilpivaara
Scientific Reports (2022)
A novel uterine leiomyoma subtype exhibits NRF2 activation and mutations in genes associated with neddylation of the Cullin 3-RING E3 ligase
- Miika Mehine
- Terhi Ahvenainen
- Pia Vahteristo
Oncogenesis (2022)
Parity associates with chromosomal damage in uterine leiomyomas
- Heli Kuisma
- Simona Bramante
- Lauri A. Aaltonen
Nature Communications (2021)
