The latest sequencers take a week or two to generate about a billion short reads, stretches of about 50–400 bp of DNA sequenced from a longer molecule. Researchers face challenges on two levels when turning massive collections of reads into biologically meaningful information. The first set of challenges lies in processing the reads themselves: mapping them to their genomic locations, and then assembling them into longer contiguous stretches of DNA. The second set of challenges lies in interpreting large collections of reads, which may be assembled into whole genomes, to understand the functional effects of genetic variation. Thousands of genomes from humans, plants, animals and disease tissues have already been sequenced, and all are in need of better interpretation. Although established algorithms such as BLAST for searching and CLUSTALW for aligning continue to be the workhorses of sequence analysis, several next-generation computational methods have emerged to cope with the DNA sequences captured in billions of short reads and thousands of genomes.
The advance. Two methods for de novo transcriptome assembly of short reads were published this year from Lior Pachter and colleagues [1] and from Aviv Regev and colleagues [2]. The transcriptome can be analyzed by sequencing cDNA reverse transcribed from RNA (RNA-Seq), but mapping and assembling the resulting reads are challenging owing to the complexities introduced by RNA splicing. The two methods are the first that robustly assemble full-length transcripts, including alternative splicing isoforms. In contrast to previous approaches, these two methods first map reads to the genome using software that takes possible splice junctions into account, thereby making assembly more manageable. Then, they apply graph-based algorithms to determine [1,2] and quantify [1] the most likely splice isoforms. The algorithms were applied to mammalian transcriptomes to follow global patterns of splicing during a developmental time course [1] and to identify novel, spliced, long, noncoding RNAs that had not been annotated by existing methods [2].
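To make the idea of a graph-based isoform search concrete, here is a toy sketch in Python. It is not the algorithm of either published method: it assumes that splice-aware mapping has already been done and that each read has been reduced to the exons it spans, given as hypothetical (start, end) coordinates. Exons become nodes, observed splice junctions become edges, and every path from a first exon to a last exon is reported as a candidate isoform. The function names `build_splice_graph` and `enumerate_isoforms` are invented for this illustration, and the sketch does no quantification or filtering of unlikely paths.

```python
from collections import defaultdict


def build_splice_graph(spliced_reads):
    """Build a directed graph from spliced read alignments.

    Each read is a list of (start, end) exon blocks in genomic order
    (an assumed, simplified input). Consecutive blocks in one read
    imply a splice junction, stored as an edge.
    """
    nodes = set()
    edges = defaultdict(set)
    for blocks in spliced_reads:
        nodes.update(blocks)
        for left, right in zip(blocks, blocks[1:]):
            edges[left].add(right)
    return nodes, edges


def enumerate_isoforms(nodes, edges):
    """List every path from an exon with no incoming junction
    to an exon with no outgoing junction."""
    targets = {t for outs in edges.values() for t in outs}
    sources = sorted(n for n in nodes if n not in targets)
    paths = []

    def walk(node, path):
        nexts = sorted(edges.get(node, ()))
        if not nexts:  # terminal exon: the path is a complete candidate
            paths.append(path)
            return
        for nxt in nexts:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


# Hypothetical exons and reads: one read skips the middle exon.
exon_a, exon_b, exon_c = (100, 200), (300, 400), (500, 600)
reads = [[exon_a, exon_b], [exon_b, exon_c], [exon_a, exon_c]]

nodes, edges = build_splice_graph(reads)
for isoform in enumerate_isoforms(nodes, edges):
    print(isoform)
```

With these three reads the graph contains two routes from the first exon to the last, one including the middle exon and one skipping it. Real assemblers must additionally choose among such paths, since exhaustive enumeration grows quickly with the number of alternative junctions.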