Fig. 3 | Nature Communications

Fig. 3

From: Direct RNA sequencing on nanopore arrays redefines the transcriptional complexity of a viral pathogen

Error correction and generation of pseudotranscripts to overcome sequencing errors inherent to the nanopore method. a Raw nanopore reads include numerous indel and substitution errors that hinder the identification of encoded ORFs and thereby impede annotation of the transcriptome. Illumina datasets generated from the same material allowed error correction using proovread (see also Figure S2). Subsequently, the transcript start/stop positions and internal splice positions were used to generate pseudotranscripts free of indel and substitution errors that permit unambiguous ORF prediction. Example changes in CIGAR string lengths for a given read are shown for each step of correction. b To optimize proovread error correction, we tested a range of subsampled Illumina datasets and evaluated corrected reads by the length of the CIGAR string (see Methods). Because the optimal Illumina subsampling varies between reads, we subsequently applied a decision matrix that selects the best-corrected version of a given read (filled boxes), scored as the version with the shortest CIGAR string. Where multiple subsampling sets produce identical shortest CIGAR string lengths (shaded boxes), no difference was observed between the resulting sequences. The bold red line indicates the path chosen (i.e., the error-corrected dataset from which a given read was drawn), while the dotted lines indicate alternative paths that produce exactly the same result because their CIGAR string lengths are identical. c Schematic representation of the effect of error correction. The overall length of error-corrected nanopore reads is marginally shorter than that of raw sequence reads, but the aligned portion of error-corrected reads is longer. d For each sequence read, the longest encoded ORF (>90 nt) was identified. Error correction notably increases the proportion of sequence reads containing translatable ORFs. In other words, the removal of indel errors improves our ability to identify novel and known ORFs.
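The selection rule in panel b and the ORF search in panel d can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the subsample labels, and the example CIGAR strings are hypothetical. Two assumptions are made loudly here: "CIGAR string length" is taken to mean the character count of the CIGAR string (the Methods define the exact measure), and an ORF is taken to run from the first in-frame ATG to the next in-frame stop codon, stop included, in the three forward frames only.

```python
from typing import Dict, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def pick_best_corrected(cigars_by_subsample: Dict[str, str]) -> Tuple[str, str]:
    """Return (subsample label, CIGAR) for the shortest CIGAR string.

    Assumption: "length" is the character count of the CIGAR string.
    Ties go to the first entry in insertion order, which mirrors the
    caption's note that tied versions yield identical sequences.
    """
    if not cigars_by_subsample:
        raise ValueError("no corrected versions supplied for this read")
    best = min(cigars_by_subsample, key=lambda k: len(cigars_by_subsample[k]))
    return best, cigars_by_subsample[best]


def longest_orf(seq: str, min_len: int = 90) -> str:
    """Return the longest forward-frame ORF longer than min_len nt, else "".

    Assumption: an ORF starts at ATG and ends at the next in-frame stop
    codon (stop included in the reported length).
    """
    seq = seq.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # close the ORF and look for the next ATG
    return best if len(best) > min_len else ""


# Hypothetical CIGAR strings for one read across subsampled datasets.
versions = {
    "raw": "5M1I3M2D10M1X4M",
    "10x": "12M1D13M",
    "50x": "25M",
    "100x": "25M",
}
label, cigar = pick_best_corrected(versions)
print(label, cigar)

# A single inserted base shifts the frame and removes the ORF.
intact = "ATG" + "GCT" * 30 + "TAA"
with_indel = intact[:33] + "A" + intact[33:]
print(len(longest_orf(intact)), len(longest_orf(with_indel)))
```

In this toy input the "50x" and "100x" versions tie, and the first one listed is returned. The second example shows why indel removal matters for ORF detection: one inserted base is enough to lose the only ORF in the read.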
