To the Editor:
Identification of pathogenic DNA sequence alterations in patients with inherited diseases is one of the main tasks of human genetics. Next-generation sequencing (NGS) techniques enable sequencing of hundreds of candidate genes, whole linkage intervals or the entire exome. This inevitably leads to the detection of vast numbers of alterations, all of which have to be tested for their disease-causing potential. A recent study revealed more than 3.5 million alterations in the whole genome of a single individual, roughly corresponding to 1,000 alterations per megabase pair (ref. 1).
Automated pre-evaluation of sequence variations can help to direct the subsequent in-depth analysis to the most promising candidates, hence saving time and resources. However, the currently available evaluation tools predict only the outcome of amino-acid exchanges and cannot process thousands of queries in a reasonable time.
To meet the challenges of handling high-throughput sequencing data, we developed MutationTaster, a free, web-based application for rapid evaluation of the disease-causing potential of DNA sequence alterations. MutationTaster integrates information from different biomedical databases and uses established analysis tools (Supplementary Methods). Analyses comprise evolutionary conservation, splice-site changes, loss of protein features and changes that might affect the amount of mRNA. Test results are then evaluated by a naive Bayes classifier (ref. 2), which predicts the disease potential. A typical query is completed in less than 0.3 seconds.
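As a rough illustration of how a naive Bayes classifier turns several independent test results into a single prediction, the following minimal Python sketch may help. It is not MutationTaster's implementation: the feature names (conserved, splice_site_changed), the toy training data and the use of Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # value_counts[label][feature][value] -> number of training samples
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for sample, label in zip(samples, labels):
        for feature, value in sample.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict_proba(model, sample):
    """Return the posterior probability of each class for one sample."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n_label in class_counts.items():
        log_p = math.log(n_label / total)  # class prior
        for feature, value in sample.items():
            count = value_counts[label][feature][value]
            n_values = len(feature_values[feature])
            # Smoothed conditional probability P(value | label)
            log_p += math.log((count + alpha) / (n_label + alpha * n_values))
        log_scores[label] = log_p
    # Normalise in log space for numerical stability
    max_log = max(log_scores.values())
    exp_scores = {k: math.exp(v - max_log) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Hypothetical toy data: two binary test results per alteration.
training_samples = [
    {"conserved": True, "splice_site_changed": True},
    {"conserved": True, "splice_site_changed": False},
    {"conserved": False, "splice_site_changed": False},
    {"conserved": False, "splice_site_changed": False},
]
training_labels = [
    "disease_causing", "disease_causing", "polymorphism", "polymorphism",
]

model = train_naive_bayes(training_samples, training_labels)
posterior = predict_proba(
    model, {"conserved": True, "splice_site_changed": True}
)
print(posterior)
```

The classifier multiplies the class prior by one conditional probability per test result, which is what makes it "naive": each test is treated as independent of the others given the class.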
Depending on the nature of the alteration, MutationTaster chooses among three prediction models: one for 'silent' synonymous or intronic alterations (without_aae), one for alterations affecting a single amino acid (simple_aae) and one for alterations causing complex changes in the amino acid sequence (complex_aae).
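The three-way split can be pictured with a short Python sketch. The function name choose_model and the decision criteria (comparing a reference and an altered protein sequence) are assumptions made for illustration; the actual tool starts from the DNA alteration, and its selection rules are described in the Supplementary Methods, not here.

```python
def choose_model(reference_protein: str, altered_protein: str) -> str:
    """Pick a prediction model name from the effect on the protein sequence.

    Hypothetical rule set, for illustration only:
    - identical sequences            -> without_aae
    - exactly one residue exchanged  -> simple_aae
    - anything else                  -> complex_aae
    """
    if reference_protein == altered_protein:
        return "without_aae"
    if len(reference_protein) == len(altered_protein):
        mismatches = sum(
            ref != alt for ref, alt in zip(reference_protein, altered_protein)
        )
        if mismatches == 1:
            return "simple_aae"
    return "complex_aae"


print(choose_model("MKTAY", "MKTAY"))  # synonymous or intronic change
print(choose_model("MKTAY", "MRTAY"))  # single amino acid exchange
print(choose_model("MKTAY", "MKT"))    # truncation, a complex change
```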
To train the classifier, we generated a dataset with all available and suitable common polymorphisms and known disease-causing mutations extracted from common databases and the literature. We cross-validated the classifier five times including all three prediction models and obtained an overall accuracy of 91.1 ± 0.1%. We also compared MutationTaster with similar applications: Panther (ref. 3), Pmut (ref. 4), PolyPhen and PolyPhen-2 (ref. 5), and 'screening for non-acceptable polymorphisms' (SNAP; ref. 6). All programs analyzed the same set of 1,000 disease-linked mutations and 1,000 polymorphisms. For this comparison, we used only alterations causing single amino acid exchanges. MutationTaster performed best in terms of accuracy and speed (Table 1). A description of all training and validation procedures and detailed statistics are available in Supplementary Methods.
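An accuracy reported as mean ± deviation can be obtained from cross-validation along the lines of the sketch below. It assumes one reading of the text, namely five-fold cross-validation, and uses a deliberately trivial threshold classifier on made-up conservation scores; the helper names and the data are hypothetical, and the real procedure is the one given in the Supplementary Methods.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train, predict, k=5, seed=0):
    """Return mean and standard deviation of accuracy over k folds."""
    folds = k_fold_indices(len(samples), k, seed)
    accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held_out_set]
        model = train(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def train_threshold(scores, labels):
    """Toy model: the midpoint between the two class means."""
    disease = [s for s, l in zip(scores, labels) if l == "disease_causing"]
    benign = [s for s, l in zip(scores, labels) if l == "polymorphism"]
    return (statistics.mean(disease) + statistics.mean(benign)) / 2


def predict_threshold(threshold, score):
    """Call an alteration disease-causing if its score exceeds the threshold."""
    return "disease_causing" if score > threshold else "polymorphism"


# Made-up, perfectly separable conservation scores for 20 alterations.
scores = [0.70 + 0.03 * i for i in range(10)] + [0.03 * i for i in range(10)]
labels = ["disease_causing"] * 10 + ["polymorphism"] * 10

mean_acc, sd_acc = cross_validate(scores, labels, train_threshold, predict_threshold)
print(f"accuracy: {mean_acc:.3f} +/- {sd_acc:.3f}")
```

Each alteration is held out exactly once, so the reported accuracy always refers to data the model did not see during training.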
MutationTaster can be used through an intuitive web interface, either to analyze single mutations or in batch mode. To streamline and standardize the analysis of NGS data, we provide Perl scripts that can process data from all major platforms (Roche 454, Illumina Genome Analyzer and ABI SOLiD). MutationTaster hence allows the efficient filtering of NGS data for alterations with high disease-causing potential (see Supplementary Methods for an example).
Present limitations of the software are its inability to analyze insertion-deletions longer than 12 base pairs and alterations spanning an intron-exon border. Also, analysis of non-exonic alterations is restricted to the Kozak consensus sequence, splice sites and the poly(A) signal. We will add tests for other sequence motifs in the near future. MutationTaster is available at http://www.mutationtaster.org/.
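When preparing a batch of alterations, the two size and position limits stated above could be checked in advance with something like the following Python sketch. The function is_analyzable, the 1-based inclusive coordinates and the way indel size is measured (difference in allele lengths) are assumptions for illustration and are not part of MutationTaster.

```python
MAX_INDEL_LENGTH = 12  # base pairs, taken from the stated limitation


def indel_length(ref_allele: str, alt_allele: str) -> int:
    """Size of an insertion or deletion as the difference in allele lengths."""
    return abs(len(ref_allele) - len(alt_allele))


def spans_exon_border(start: int, ref_allele: str,
                      exon_start: int, exon_end: int) -> bool:
    """True if the altered bases lie partly inside and partly outside the exon.

    Coordinates are assumed to be 1-based and inclusive.
    """
    end = start + max(len(ref_allele), 1) - 1
    covers_inside = start <= exon_end and end >= exon_start
    covers_outside = start < exon_start or end > exon_end
    return covers_inside and covers_outside


def is_analyzable(ref_allele: str, alt_allele: str, start: int,
                  exon_start: int, exon_end: int) -> bool:
    """Apply both hypothetical pre-checks to one alteration."""
    if indel_length(ref_allele, alt_allele) > MAX_INDEL_LENGTH:
        return False
    return not spans_exon_border(start, ref_allele, exon_start, exon_end)


# A substitution well inside an exon that covers positions 90 to 110.
print(is_analyzable("A", "G", 100, 90, 110))
# A 13-bp insertion, one base over the limit.
print(is_analyzable("A", "A" + "C" * 13, 100, 90, 110))
```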
Note: Supplementary information is available on the Nature Methods website.
References
1. Wheeler, D.A. et al. Nature 452, 872–876 (2008).
2. Hand, D.J. & Yu, K.M. Int. Stat. Rev. 69, 385–398 (2001).
3. Mi, H. et al. Nucleic Acids Res. 33, D284–D288 (2005).
4. Ferrer-Costa, C. et al. Bioinformatics 21, 3176–3178 (2005).
5. Adzhubei, I.A. et al. Nat. Methods 7, 248–249 (2010).
6. Bromberg, Y. & Rost, B. Nucleic Acids Res. 35, 3823–3835 (2007).
Acknowledgements
This project was supported by the Deutsche Forschungsgemeinschaft via the NeuroCure Cluster of Excellence, Exc 257, and the Collaborative Research Center 665 TP C4. M.S. and D.S. are members of the German network for mitochondrial disorders (mitoNET, 01GM0862), funded by the German ministry of education and research (BMBF). We thank H. Peters for providing mutation data, M. Zhang and M. Reese for allowing us to integrate polyadq and NNSplice, E. Lüdeking for proofreading the manuscript, and all beta users whose valuable recommendations guided the development of MutationTaster.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Methods (PDF 399 kb)
Cite this article
Schwarz, J., Rödelsperger, C., Schuelke, M. et al. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7, 575–576 (2010). https://doi.org/10.1038/nmeth0810-575