Abstract
Single-cell RNA sequencing (scRNA-seq) is one of the most efficient technologies for human tumor research. However, data analysis still faces technical challenges, particularly the difficulty of efficiently and accurately discriminating cancer cells from normal cells in the scRNA-seq expression matrix. Addressing these challenges would deepen our understanding of intratumoral and intertumoral heterogeneity. In this study, we developed pan-Cancer Seeker (CaSee), a cancer/normal cell discrimination pipeline for scRNA-seq expression matrices that uses transfer learning from traditional high-quality pan-cancer bulk sequencing data. CaSee is the first tool to discriminate cancer/normal cells directly from the scRNA-seq expression matrix, and it has a much wider range of applications and higher efficiency than copy number variation (CNV) methods, which require corresponding reference cells. CaSee is user-friendly and adapts to a variety of data sources, including but not limited to scRNA-seq data from tissues, cell lines, xenograft cells, and circulating tumor cells. It is compatible with mainstream sequencing platforms: 10× Genomics Chromium, Smart-seq2, and Microwell-seq. In a multicenter evaluation on 11 retrospective cohorts and one independent dataset, the CaSee pipeline achieved an average discrimination accuracy of 96.69%. We expect CaSee, a deep-learning-based pan-cancer model for distinguishing cancer cells from normal cells, to be useful to researchers working in the genomics, cancer, and single-cell fields.
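To make the reported metric concrete, the following is a minimal sketch of how an average discrimination accuracy over several cohorts could be computed. It assumes an unweighted mean of per-cohort accuracies; the abstract does not state whether cohorts were weighted by cell count. The cohort names, labels, and function names are hypothetical toy values for illustration and are not data or code from the study.

```python
from statistics import mean


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def average_accuracy(cohorts):
    """Return per-cohort accuracies and their unweighted mean.

    `cohorts` maps a cohort name to a (true_labels, predicted_labels) pair.
    """
    per_cohort = {name: accuracy(t, p) for name, (t, p) in cohorts.items()}
    return per_cohort, mean(per_cohort.values())


# Hypothetical toy cohorts (not data from the study).
toy_cohorts = {
    "cohort_A": (
        ["cancer", "cancer", "normal", "normal"],
        ["cancer", "cancer", "normal", "cancer"],
    ),
    "cohort_B": (
        ["cancer", "normal"],
        ["cancer", "normal"],
    ),
}

per_cohort, overall = average_accuracy(toy_cohorts)
print(per_cohort)
print(overall)
```

With an unweighted mean, a small cohort counts as much as a large one; pooling all cells before computing accuracy would instead weight each cohort by its size.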
Data availability
Download links for all data used in this study are listed in Supplementary Table 2. Raw CTC single-cell sequencing data are available from the China National Center for Bioinformation (BioProject ID PRJCA007531, accession number HRA002759).
Code availability
All data-processing code and configuration files are available on GitHub (https://github.com/yuansh3354/CaSee).
Acknowledgements
This work was supported by the China Postdoctoral Science Foundation (2021M690806), the Beijing Natural Science Foundation Haidian original innovation joint fund (L202023), the National Natural Science Foundation of China (32027801, 31870992, 21775031), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant Nos. XDB36000000, XDB38010400), CAS-JSPS (Grant No. GJHZ2094), and the Research Foundation for Advanced Talents of Fujian Medical University (XRCZX2017020, XRCZX2019005). The funding bodies had no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript. We thank Dr. Jianming Zeng (University of Macau) and all the members of his bioinformatics team, biotrainee, for generously sharing their experience and code. We also thank Prof. Xiaopei Shen (Fujian Medical University) and his teammates, Hao Fu, Haibo Zhu, Guanghao Liu, Mengyao Wang, et al., for their generous help, discussion, and advice on the CaSee pipeline.
Author information
Contributions
YSH and XLZ were responsible for study design, data analysis, interpretation of results, and writing the paper. ZMY was responsible for analysing data. JYD, YZW, YZ, and XJL were responsible for CTC isolation and scRNA-seq. ZYH and CXG contributed to interpreting results. XLZ and ZYH conducted the analyses.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sh, Y., Zhang, X., Yang, Z. et al. CaSee: A lightning transfer-learning model directly used to discriminate cancer/normal cells from scRNA-seq. Oncogene 41, 4866–4876 (2022). https://doi.org/10.1038/s41388-022-02478-5
DOI: https://doi.org/10.1038/s41388-022-02478-5
This article is cited by
- Domain generalization enables general cancer cell annotation in single-cell and spatial transcriptomics. Nature Communications (2024)