Abstract
UV radiation may lead to melanoma and nonmelanoma skin cancers by causing helix-distorting DNA damage such as cyclobutane pyrimidine dimers (CPDs). These DNA lesions, if located in important genes and not repaired promptly, are mutagenic and may eventually result in carcinogenesis. Examining CPD formation and repair across the genome can shed light on the mutagenesis mechanisms associated with UV damage in relevant cancers. We recently developed CPD-Seq, a high-throughput, single-nucleotide-resolution sequencing technique that specifically captures UV-induced CPD lesions across the genome. This technique has been increasingly used in studies of UV damage and can be adapted to sequence other clinically relevant DNA lesions. Although the library preparation protocol has been established, no systematic protocol for analyzing CPD-Seq data has yet been described. To streamline the general and specific analysis steps, we developed a protocol named CPDSeqer to assist researchers with CPD-Seq data processing. CPDSeqer accommodates both single- and multiple-sample experimental designs, and it supports both genome-wide analyses and regional scrutiny (such as of suspected UV damage hotspots). The runtime of CPDSeqer scales with raw data size, at roughly 4 h per sample, and can be shortened by parallel computing. Diagnostic graphics are generated to help assess the performance of the experiment and reveal regional enrichment of CPD formation. UV damage comparison analyses are organized into three scenarios, and the resulting HTML pages report directional trends in damage and their statistical significance. CPDSeqer can be accessed at https://github.com/shengqh/cpdseqer.
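Because CPDs form between two adjacent pyrimidines, one common sanity check on single-nucleotide-resolution lesion calls is to tally the dinucleotide found at each called position. The Python sketch below illustrates that idea only; it is not CPDSeqer's implementation. The function names, the 0-based coordinate convention and the toy sequence and positions are all assumptions made for this example.

```python
from collections import Counter

# Dinucleotides made of two adjacent pyrimidines (C or T), where a CPD can form.
DIPYRIMIDINES = {"TT", "TC", "CT", "CC"}


def tally_dinucleotides(sequence, lesion_starts):
    """Count the dinucleotide at each lesion start.

    Assumption for this sketch: each position is the 0-based index of the
    5' base of the dinucleotide. Positions that would run past the end of
    the sequence are skipped.
    """
    seq = sequence.upper()
    counts = Counter()
    for pos in lesion_starts:
        if 0 <= pos < len(seq) - 1:
            counts[seq[pos:pos + 2]] += 1
    return counts


def dipyrimidine_fraction(counts):
    """Fraction of tallied lesions on a dipyrimidine (0.0 if none tallied)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    on_target = sum(n for dinuc, n in counts.items() if dinuc in DIPYRIMIDINES)
    return on_target / total


# Toy input: a made-up sequence and made-up lesion positions, for illustration only.
toy_sequence = "ATTCGCCTAG"
toy_lesions = [1, 2, 5, 6, 8]
toy_counts = tally_dinucleotides(toy_sequence, toy_lesions)
toy_fraction = dipyrimidine_fraction(toy_counts)
```

In this toy input, four of the five positions fall on a dipyrimidine (TT, TC, CC and CT) and one falls on AG, so `toy_fraction` is 0.8. On real data, a low dipyrimidine fraction would suggest that the reads do not mainly reflect CPD lesions.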
Data availability
All datasets (GSE1034875, GSE799773 and GSE1192496) used in the demonstration for this protocol are available through the NCBI Short Read Archive (https://www.ncbi.nlm.nih.gov/sra). All figures used in this article are original. All preprocessed resource files listed in Supplementary Data 1 are available at https://cqsweb.app.vumc.org/Data/cpdseqer/.
Code availability
This protocol, including all scripts (Shell, Python and R), is hosted at https://github.com/shengqh/cpdseqer. A comprehensive test case covering all 17 steps, comprising an empirical CPD-Seq dataset and the corresponding stepwise test scripts, is available at https://cqsweb.app.vumc.org/Data/cpdseqer/.
Acknowledgements
This study was supported by a Cancer Center Support Grant (P30CA118100) and R01ES030993-01A1 from the National Cancer Institute, funding from the National Institutes of Health (R21ES029302), a pilot grant from the UNM Center for Metals in Biology and Medicine (P20GM130422), the Bioinformatics Shared Resources and the Biostatistics Shared Resources at The Comprehensive Cancer Center. None of the funding bodies were involved in the study design; data collection, analysis or interpretation; or writing of the manuscript.
Author information
Contributions
Q.S., H.Y. and L.J. developed code for the protocol. M.D. and J.H. performed protocol testing. H.K. provided statistical support. J.J.W. and P.M. provided knowledge support for CPD-Seq. Y.G., P.M., H.Y. and S.N. wrote the manuscript. Y.G. and P.M. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Protocols thanks Ashby J. Morrison, Anna R. Poetsch and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Related links
Key references using this protocol
Mao, P. et al. Nat. Commun. 9, 2626 (2018): https://doi.org/10.1038/s41467-018-05064-0
Mao, P. et al. Genome Res. 30, 12–21 (2020): https://doi.org/10.1101/gr.253146.119
Duan, M. et al. Proc. Natl Acad. Sci. USA 117, 18608–18616 (2020): https://doi.org/10.1073/pnas.2003868117
Supplementary information
Supplementary Data 1
Inventory of precompiled resource files for CPDSeqer. For each BED file and naked DNA normalization file, it lists the file name, download link, a brief description and the steps to which the file applies.
About this article
Cite this article
Sheng, Q., Yu, H., Duan, M. et al. A streamlined solution for processing, elucidating and quality control of cyclobutane pyrimidine dimer sequencing data. Nat Protoc 16, 2190–2212 (2021). https://doi.org/10.1038/s41596-021-00496-3
DOI: https://doi.org/10.1038/s41596-021-00496-3