To the Editor:

Identification of pathogenic DNA sequence alterations in patients with inherited diseases is one of the main tasks of human genetics. Next-generation sequencing (NGS) techniques enable sequencing of hundreds of candidate genes, whole linkage intervals or the entire exome. This inevitably leads to the detection of vast numbers of alterations, all of which have to be tested for their disease-causing potential. A recent study revealed more than 3.5 million alterations in the whole genome of a single individual, roughly corresponding to 1,000 alterations per megabase pair (ref. 1).

Automated pre-evaluation of sequence variations can help to direct the subsequent in-depth analysis to the most promising candidates, thereby saving time and resources. However, the currently available evaluation tools predict only the outcome of amino acid exchanges and cannot process thousands of queries in a reasonable time.

To meet the challenges of handling high-throughput sequencing data, we developed MutationTaster, a free, web-based application for rapid evaluation of the disease-causing potential of DNA sequence alterations. MutationTaster integrates information from different biomedical databases and uses established analysis tools (Supplementary Methods). Analyses comprise evolutionary conservation, splice-site changes, loss of protein features and changes that might affect the amount of mRNA. Test results are then evaluated by a naive Bayes classifier (ref. 2), which predicts the disease potential. A typical query is completed in less than 0.3 seconds.
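To illustrate how a naive Bayes classifier can combine several test results into one prediction, the following is a minimal sketch in Python. It is not MutationTaster's implementation: the feature names (`conserved`, `splice_site_changed`, `protein_feature_lost`), the class labels, the Laplace smoothing and the six training examples are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # value_counts[label][feature][value] -> number of training samples
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for sample, label in zip(samples, labels):
        for feature, value in sample.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "feature_values": feature_values, "alpha": alpha}


def predict_proba(model, sample):
    """Return P(label | sample), treating the features as independent."""
    total = sum(model["class_counts"].values())
    alpha = model["alpha"]
    log_scores = {}
    for label, n_label in model["class_counts"].items():
        log_p = math.log(n_label / total)  # class prior
        for feature, value in sample.items():
            n_values = len(model["feature_values"][feature])
            count = model["value_counts"][label][feature][value]
            log_p += math.log((count + alpha) / (n_label + alpha * n_values))
        log_scores[label] = log_p
    # Normalize in log space to avoid underflow with many features.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Hypothetical toy training set: three test outcomes per alteration.
def outcome(conserved, splice, lost):
    return {"conserved": conserved, "splice_site_changed": splice,
            "protein_feature_lost": lost}

toy_samples = [outcome(True, True, True), outcome(True, False, True),
               outcome(True, True, False), outcome(False, False, False),
               outcome(False, False, False), outcome(True, False, False)]
toy_labels = ["disease_causing"] * 3 + ["polymorphism"] * 3

model = train_naive_bayes(toy_samples, toy_labels)
print(predict_proba(model, outcome(True, True, True)))
```

Because each feature contributes one multiplicative term, a prediction costs only a handful of table lookups, which is consistent with the short query times reported above.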

Depending on the nature of the alteration, MutationTaster chooses among three prediction models: one for 'silent' synonymous or intronic alterations (without_aae), one for alterations affecting a single amino acid (simple_aae) and one for alterations causing complex changes in the amino acid sequence (complex_aae).
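The choice among the three models can be sketched as a comparison of the reference and altered protein sequences. The function `choose_model` and its decision rules are hypothetical and serve only to make the three categories concrete; the rules actually used by MutationTaster are not given in this letter and may differ.

```python
def choose_model(ref_protein: str, alt_protein: str) -> str:
    """Pick a prediction model name from the protein-level consequence.

    Assumption: both arguments are complete translations of the reference
    and the altered transcript, written in one-letter amino acid code.
    """
    if ref_protein == alt_protein:
        # Synonymous or intronic alteration: the protein is unchanged.
        return "without_aae"
    if len(ref_protein) == len(alt_protein):
        mismatches = sum(a != b for a, b in zip(ref_protein, alt_protein))
        if mismatches == 1:
            # Exactly one amino acid is exchanged.
            return "simple_aae"
    # Everything else: frameshifts, indels, truncations, multiple exchanges.
    return "complex_aae"


print(choose_model("MKTAY", "MKTAY"))
print(choose_model("MKTAY", "MKSAY"))
print(choose_model("MKTAY", "MKT"))
```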

To train the classifier, we generated a dataset with all available and suitable common polymorphisms and known disease-causing mutations extracted from common databases and the literature. We cross-validated the classifier five times including all three prediction models and obtained an overall accuracy of 91.1 ± 0.1%. We also compared MutationTaster with similar applications: Panther (ref. 3), Pmut (ref. 4), PolyPhen and PolyPhen-2 (ref. 5), and 'screening for non-acceptable polymorphisms' (SNAP; ref. 6). All programs analyzed the same set of 1,000 disease-linked mutations and 1,000 polymorphisms. For this comparison, we used only alterations causing single amino acid exchanges. MutationTaster performed best in terms of accuracy and speed (Table 1). A description of all training and validation procedures and detailed statistics are available in Supplementary Methods.
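An accuracy reported as mean ± deviation over five validation rounds can be computed along the following lines. This is a sketch under stated assumptions: the letter does not say how the data were split, so the repeated random 80/20 split, the threshold classifier and the toy scores are all hypothetical stand-ins, and the resulting numbers carry no biological meaning.

```python
import random
import statistics


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def repeated_validation(data, train_fn, predict_fn,
                        repeats=5, train_fraction=0.8, seed=0):
    """Train on a random subset and test on the remainder, `repeats` times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train, test = shuffled[:cut], shuffled[cut:]
        model = train_fn(train)
        predicted = [predict_fn(model, x) for x, _ in test]
        scores.append(accuracy(predicted, [y for _, y in test]))
    return scores


# Hypothetical stand-in classifier: a threshold halfway between class means.
def train_threshold(train):
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy data: (score, label) pairs, 1 = disease mutation, 0 = polymorphism.
# The two classes are cleanly separable on purpose.
toy_data = ([(0.80 + 0.005 * i, 1) for i in range(20)]
            + [(0.10 + 0.005 * i, 0) for i in range(20)])

scores = repeated_validation(toy_data, train_threshold, predict_threshold)
print(f"{statistics.mean(scores):.1%} ± {statistics.stdev(scores):.1%}")
```

Running every tool on one shared test set, as in the comparison above, keeps such accuracy figures directly comparable between programs.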

Table 1 Comparison of MutationTaster with other prediction tools

MutationTaster can be used via an intuitive web interface, either to analyze single mutations or in batch mode. To streamline and standardize the analysis of NGS data, we provide Perl scripts that can process data from all major platforms (Roche 454, Illumina Genome Analyzer and ABI SOLiD). MutationTaster hence allows the efficient filtering of NGS data for alterations with high disease-causing potential (see Supplementary Methods for an example).
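Filtering batch results for alterations with high disease-causing potential might look like the sketch below. The tab-separated layout, the column names (`variant`, `model`, `prediction`, `probability`), the label `disease_causing` and the 0.95 cut-off are assumptions for illustration; they do not describe the actual output of MutationTaster or of the Perl scripts mentioned above.

```python
import csv
import io


def filter_candidates(rows, min_probability=0.95):
    """Keep predicted disease-causing alterations, most confident first."""
    keep = [r for r in rows
            if r["prediction"] == "disease_causing"
            and float(r["probability"]) >= min_probability]
    return sorted(keep, key=lambda r: float(r["probability"]), reverse=True)


# Hypothetical batch output; a real run would read from a file instead.
toy_output = io.StringIO(
    "variant\tmodel\tprediction\tprobability\n"
    "chr1:12345A>G\tsimple_aae\tdisease_causing\t0.999\n"
    "chr2:67890C>T\twithout_aae\tpolymorphism\t0.980\n"
    "chr7:11111G>A\tsimple_aae\tdisease_causing\t0.610\n"
    "chrX:22222delAT\tcomplex_aae\tdisease_causing\t0.970\n"
)

rows = list(csv.DictReader(toy_output, delimiter="\t"))
candidates = filter_candidates(rows)
for row in candidates:
    print(row["variant"], row["model"], row["probability"])
```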

Current limitations of the software are its inability to analyze insertion-deletions longer than 12 base pairs and alterations spanning an intron-exon border. Also, analysis of non-exonic alterations is restricted to the Kozak consensus sequence, splice sites and the poly(A) signal. We will add tests for other sequence motifs in the near future. MutationTaster is available at http://www.mutationtaster.org/.
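Users who prepare batch queries may wish to screen out alterations that fall under the two stated limitations before submission. The following sketch shows one way to do so. The function `is_analyzable`, the 0-based half-open coordinates, the allele representation without an anchor base and the definition of indel length as the longer of the two alleles are all assumptions, not part of MutationTaster.

```python
MAX_INDEL_BP = 12  # size limit stated in the text


def is_analyzable(start, ref, alt, exons):
    """Check an alteration against the two stated limitations.

    start: 0-based position of the first reference base affected
    ref, alt: allele strings without an anchor base ("" for pure indels)
    exons: list of (exon_start, exon_end) pairs, 0-based and half-open
    Returns a (bool, reason) pair.
    """
    is_indel = len(ref) != len(alt)
    if is_indel and max(len(ref), len(alt)) > MAX_INDEL_BP:
        return False, "indel longer than 12 bp"
    # Reference interval touched by the alteration; an insertion with an
    # empty reference allele is treated as a single position.
    end = start + max(len(ref), 1)
    for exon_start, exon_end in exons:
        for border in (exon_start, exon_end):
            # A border lies strictly inside the interval when the
            # alteration has bases on both sides of it.
            if start < border < end:
                return False, "spans an intron-exon border"
    return True, "ok"


toy_exons = [(100, 200)]
print(is_analyzable(150, "A", "G", toy_exons))
print(is_analyzable(150, "A" * 13, "", toy_exons))
print(is_analyzable(198, "ACGT", "", toy_exons))
```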

Note: Supplementary information is available on the Nature Methods website.