Abstract
This work addresses whether the new CASAVA1.8, which introduces improvements such as local realignment of reads, is on par with the well-accepted academic pipeline of BWA mapping, duplicate removal, local realignment, recalibration, and variant calling using GATK. We therefore run both methods on chromosome 21 of a Yoruba trio and compare the results to the genotypes identified by the 1000 Genomes Project.
We find that the mapping performance is the same for CASAVA1.8 and the academic pipeline, resulting in a mean coverage of about 22. CASAVA1.8 and GATK both call about 70,000 SNPs per individual, of which 80% overlap between CASAVA1.8, GATK, and the 1000 Genomes Project. This stands in contrast to the indel calling performance, where CASAVA1.8 calls about 12,000 indels while GATK calls 16,000. Furthermore, CASAVA1.8 has a higher Mendelian error rate and frequently reports more than one alternative allele per locus, indicating a non-optimal alignment.
We conclude that CASAVA1.8 has come a long way and can be considered a mature SNP calling approach. However, its indel call set does not match the quality delivered by the newly incorporated Dindel algorithm of GATK. It hence remains best practice to use CASAVA1.8 for producing fastq files and to switch at this stage to the academic tools for mapping, alignment improvement, and variant calling.
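The two comparison metrics named in the abstract, call-set overlap and Mendelian error rate in a trio, can be illustrated with a minimal Python sketch. This is not the author's code: the function names, the `(chrom, pos, alt)` site keys, and the toy genotypes are all hypothetical. The abstract does not state the denominator behind the 80% overlap figure, so using the union of the three call sets is an assumption made here.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's unordered genotype can be formed by
    taking one allele from the mother and one from the father."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def mendelian_error_rate(trio_calls):
    """trio_calls maps a site key to (child, mother, father) genotypes,
    each genotype being a pair of alleles. Returns the fraction of
    sites that violate Mendelian inheritance."""
    if not trio_calls:
        return 0.0
    errors = sum(
        1
        for child, mother, father in trio_calls.values()
        if not is_mendelian_consistent(child, mother, father)
    )
    return errors / len(trio_calls)


def three_way_overlap_fraction(a, b, c):
    """Fraction of the union of three call sets present in all three.
    ASSUMPTION: the union is used as the denominator."""
    union = a | b | c
    return len(a & b & c) / len(union) if union else 0.0


# Toy data (hypothetical sites and genotypes, for illustration only).
toy_trio = {
    ("chr21", 100, "G"): (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    ("chr21", 200, "G"): (("G", "G"), ("A", "A"), ("A", "G")),  # violation
}
toy_error_rate = mendelian_error_rate(toy_trio)

casava_calls = {1, 2, 3}
gatk_calls = {2, 3, 4}
kg_calls = {2, 3, 5}
toy_overlap = three_way_overlap_fraction(casava_calls, gatk_calls, kg_calls)
```

In practice the call sets would be parsed from the VCF output of each pipeline, and sites with missing genotypes would need to be excluded before counting Mendelian errors.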
Cite this article
Bauer, D. Variant calling comparison CASAVA1.8 and GATK. Nat Prec (2011). https://doi.org/10.1038/npre.2011.6107.1
DOI: https://doi.org/10.1038/npre.2011.6107.1
This article is cited by
- QTL mapping and identification of genes associated with the resistance to Acanthoscelides obtectus in cultivated common bean using a high-density genetic linkage map. BMC Plant Biology (2022)
- Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers. BMC Bioinformatics (2017)