Abstract
Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.
Acknowledgements
The authors thank M. J. Chaisson and D. Porubsky for assistance with the figures, K. Munson for technical assistance and commentarial insight and T. Brown for assistance in editing the manuscript. This work was supported, in part, by grants from the US National Institutes of Health (HG010169 to E.E.E.) and the US National Institute of General Medical Sciences (1F32GM134558-01 to G.A.L.). M.R.V. was supported by a US National Library of Medicine Big Data Training Grant for Genomics and Neuroscience (5T32LM012419-04). E.E.E. is an investigator of the Howard Hughes Medical Institute.
Author information
Contributions
The authors contributed equally to all aspects of the article.
Ethics declarations
Competing interests
E.E.E. is on the scientific advisory board of DNAnexus Inc.
Additional information
Peer review information
Nature Reviews Genetics thanks M. Schatz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
All of Us: https://allofus.nih.gov/
Arrow: https://github.com/PacificBiosciences/GenomicConsensus
Loman Labs: https://lab.loman.net/2017/03/09/ultrareads-for-nanopore/
Medaka: https://github.com/nanoporetech/medaka
Nanopolish: https://github.com/jts/nanopolish
Pacific Biosciences: does speed impact quality and yield?: https://github.com/PacificBiosciences/ccs#does-speed-impact-quality-and-yield
Glossary
- Next-generation sequencing
-
A sequencing method in which an entire genome is sequenced from fragmented DNA, producing short (less than 300 bp) sequencing reads at high speed and low cost.
- Sequence-by-synthesis
-
A sequencing technology used primarily by Illumina, in which a DNA polymerase synthesizes a strand of DNA complementary to a template by incorporating a fluorescently labelled deoxynucleoside triphosphate that is imaged to identify the base and then cleaved before the process is repeated to determine the order and identity of each base in the DNA strand.
- Single-nucleotide variants
-
Instances in which a single base within a read or genome differs from the base found at the same position in other individuals or populations.
- Copy number variants
-
Instances in which a sequence of bases within a genome differs in the number of copies among individuals or populations.
- Indels
-
Insertions or deletions of bases in the genome of an organism.
- Structural variant
-
A genetic variant greater than 50 bp in length that includes insertions, deletions, inversions or translocations of DNA segments, and copy number differences.
- Segmental duplications
-
Blocks of DNA that are greater than 1 kb in length, occur at more than one site within a genome and share greater than 90% sequence identity.
- Linked-read sequencing
-
A synthetic long-read DNA sequencing method wherein short-read sequencing is applied to long DNA molecules to ‘link’ reads together from the same original long molecule.
- Long-read sequencing
-
A sequencing method used by Pacific Biosciences and Oxford Nanopore Technologies, wherein native DNA or RNA molecules are sequenced in real time, often without the need for amplification, producing reads more than 10 kb in length.
- Contigs
-
Continuous (or ‘contiguous’) sequences of DNA generated by assembling overlapping sequencing reads.
- Single-molecule, real-time (SMRT) sequencing
-
A DNA sequencing method used by Pacific Biosciences wherein the sequence of a single DNA molecule is derived in real time, with no pause after the detection of the bases.
- SMRTbell
-
A double-stranded DNA template used in Pacific Biosciences SMRT sequencing wherein both DNA ends are capped with hairpin adapters. A SMRTbell template is topologically circular and structurally linear.
- SMRT Cell
-
A flow cell comprising arrays of zero-mode waveguide nanostructures used during Pacific Biosciences SMRT sequencing.
- Zero-mode waveguides
-
Nanophotonic devices that confine light to a small observation volume and are part of the SMRT Cell used during Pacific Biosciences SMRT sequencing.
- Flow cell
-
A disposable component of short-read and long-read sequencing platforms that houses the chemistry to sequence DNA and/or RNA molecules.
- Subreads
-
The sequence derived from a single pass of the DNA polymerase, which traverses the SMRTbell template multiple times during Pacific Biosciences SMRT sequencing. Subreads do not contain any adapter sequences.
- Homopolymers
-
Sequences of consecutive identical bases.
- Single-pass
-
The traversal of a single strand within a SMRTbell template by a DNA polymerase during Pacific Biosciences SMRT sequencing.
- Polishing tools
-
Computational tools that increase genome assembly quality and accuracy. These tools typically compare reads to an assembly to derive a more accurate consensus sequence.
- Squiggle
-
A series of voltage shifts that represent overlapping k-mers from a DNA molecule as it translocates through a nanopore during Oxford Nanopore Technologies sequencing.
- Sequencing coverage
-
The average number of unique reads that align to, or ‘cover’, a sequence or genome.
- Circular consensus sequencing (CCS)
-
A sequencing mode used by Pacific Biosciences in which a DNA polymerase makes multiple passes around the SMRTbell template, generating noisy subreads that are computationally combined to generate a highly accurate high-fidelity consensus read.
- Polymerase reads
-
The sequence derived from one or more passes of the DNA polymerase around a SMRTbell template, including both adapters and inserts. Polymerase reads are trimmed to exclude any low-quality regions and are generated by Pacific Biosciences SMRT sequencing.
- Read N50
-
The sequence length of the shortest read at 50% of the total sequencing dataset sorted by read length. In other words, half of the sequencing dataset is in reads larger than or equal to the read N50 size.
- ONT long read
-
A read that is 10–100 kb in length and generated by Oxford Nanopore Technologies (ONT) sequencing.
- ONT ultra-long read
-
A read that is greater than 100 kb in length and generated by Oxford Nanopore Technologies (ONT) sequencing.
- Contig N50
-
The sequence length of the shortest contig at 50% of the total genome length sorted by contig length. In other words, half of the genome sequence is contained in contigs larger than or equal to the contig N50 size.
- Optical mapping
-
A technique commonly used to scaffold sequence contigs that involves constructing ordered genomic maps from single molecules of DNA with a fluorescent readout.
- Electronic mapping
-
A technique commonly used to scaffold sequence contigs that involves constructing ordered genomic maps from single molecules of DNA with an electronic readout.
- Phased de novo genome assembly
-
A genome assembly in which the maternal and paternal haplotypes are resolved.
- Trio binning
-
A method in which short reads from two parental genomes are used to partition long reads from their offspring into haplotype-specific sets before the assembly of each haplotype.
- Paralogous sequence variants
-
Single nucleotide differences between duplicated loci in the genome that are invariant in a population.
- CHM13 human genome
-
A complete hydatidiform mole (CHM) genome that has lost the maternal genome and duplicated the paternal genome. This genome is currently the focus of the Telomere-to-Telomere (T2T) consortium's genome assembly efforts due to its essentially haploid nature and stable karyotype.
- Whole-genome sequencing
-
Sequencing of the entire genome without using methods for sequence selection.
- SVA
-
A type of retrotransposon insertion composed of a (CCCTCT)n hexamer simple repeat region at the 5′ end, an Alu-like region, a variable number of tandem repeat (VNTR) region, a short interspersed element of retroviral origin (SINE-R) region, and a poly(A) tail after the putative polyadenylation signal.
- Uniparental disomy
-
Inheritance of two copies of a chromosome or segments of a chromosome from one parent, instead of one copy from each parent.
- Expression quantitative trait loci
-
Loci that explain a fraction of the genetic variance of a gene expression phenotype.
- Genome-wide association studies
-
An approach used in genetics research to associate specific genetic variations with particular traits.
- Introgression
-
The transfer of genetic information from one species to another as a result of hybridization between them and repeated backcrossing.
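The Read N50 and Contig N50 entries in the glossary describe the same calculation applied to different inputs. The following is a minimal Python sketch of that calculation, not code from the article. The function name `n50`, its optional `total` argument and the toy lengths are assumptions made for illustration. By default the 50% threshold is taken from the sum of the supplied lengths; passing `total` lets a genome length be used instead, as in the Contig N50 definition.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], total: Optional[int] = None) -> int:
    """Return the N50 of a collection of read or contig lengths.

    Lengths are sorted from longest to shortest and accumulated. The N50 is
    the length at which the running sum first reaches half of `total`.
    `total` is a hypothetical parameter: it defaults to the sum of all
    lengths, but a genome length can be supplied instead.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    if total is None:
        total = sum(ordered)

    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise ValueError("lengths sum to less than half of total")


# Toy read lengths in kb (illustrative values, not data from the article).
toy_lengths = [2, 3, 4, 5, 6]
print(n50(toy_lengths))
```

With these toy values the lengths sum to 20 and the two longest reads (6 and 5) together reach the halfway point of 10, so the function returns 5.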
About this article
Cite this article
Logsdon, G.A., Vollger, M.R. & Eichler, E.E. Long-read human genome sequencing and its applications. Nat Rev Genet 21, 597–614 (2020). https://doi.org/10.1038/s41576-020-0236-x
DOI: https://doi.org/10.1038/s41576-020-0236-x