Abstract
High-throughput screening (HTS) is an integral part of early drug discovery. Herein, we focused on those small molecules in a screening collection that have never shown biological activity despite having been exhaustively tested in HTS assays. These compounds are referred to as 'dark chemical matter' (DCM). We quantified DCM, validated it in quality control experiments, described its physicochemical properties and mapped it into chemical space. Through analysis of prospective reporter-gene assay, gene expression and yeast chemogenomics experiments, we evaluated the potential of DCM to show biological activity in future screens. We demonstrated that, despite the apparent lack of activity, occasionally these compounds can result in potent hits with unique activity and clean safety profiles, which makes them valuable starting points for lead optimization efforts. Among the identified DCM hits was a new antifungal chemotype with strong activity against the pathogen Cryptococcus neoformans but little activity at targets relevant to human safety.
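The definition of dark chemical matter given above (exhaustively tested, never active) can be illustrated with a minimal sketch. The record layout, the function name `find_dark_compounds` and the `min_assays` cutoff are assumptions made for illustration; the abstract does not state the threshold or the data format used in the study.

```python
from collections import defaultdict


def find_dark_compounds(records, min_assays):
    """Return compounds tested in at least `min_assays` assays and never active.

    records: iterable of (compound_id, assay_id, is_active) tuples
             (hypothetical layout, not the authors' data model).
    min_assays: placeholder cutoff for "exhaustively tested" (assumption).
    """
    tested = defaultdict(set)   # compound -> set of assays it was tested in
    ever_active = set()         # compounds with at least one active readout
    for compound_id, assay_id, is_active in records:
        tested[compound_id].add(assay_id)
        if is_active:
            ever_active.add(compound_id)
    return sorted(
        compound
        for compound, assays in tested.items()
        if len(assays) >= min_assays and compound not in ever_active
    )


if __name__ == "__main__":
    demo = [
        ("A", 1, False), ("A", 2, False), ("A", 3, False),
        ("B", 1, False), ("B", 2, True), ("B", 3, False),
        ("C", 1, False),
    ]
    print(find_dark_compounds(demo, min_assays=3))
```

In the toy data, compound B is excluded because it was active once, and compound C because it was not tested often enough to count as exhaustively screened.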
Acknowledgements
A.M.W. and G.L.G. were presidential postdoctoral fellows supported by the Education Office of the Novartis Institutes for BioMedical Research. The authors thank M. Schirle, R. Nutiu, S. Reiling and E. Gregori-Puigjané for valuable discussions; T. Aust, O. Galuba and R. Riedl for support with the HIP and follow-up experiments; M. Popov and F. Nigsch for help with data mining; P. Selzer for the cell permeability model; G. Wendel, B. Burakowska and L. Koppes for help with compound management; and R. Guha, J. Bittker and J. Braisted for help with BARD.
Author information
Contributions
A.M.W., E.L., J.W.D. and M.G. conceived the study with contributions from A.S., I.M.W. and C.N.P. A.M.W. carried out the large-scale computational analyses of the Novartis and PubChem HTS assay results. G.L.G. performed the gene expression experiments. F.J.K. directed and analyzed the reporter-gene assay experiments. D.H. directed and analyzed the S. cerevisiae growth inhibition and chemogenomics experiments. C.S. performed S. cerevisiae experiments. J.M.P. and M.L.G. conducted the quality control experiments. J.T. and V.P. designed and performed the antifungal panel experiments. S.C. performed the safety profiling experiments. P.K. and A.C.-C. supervised the profiling of natural products against the cancer cell line panel. A.M.W., E.L., D.H., J.W.D. and M.G. wrote the manuscript with contributions from all authors, who read and discussed the manuscript.
Ethics declarations
Competing interests
As employees of Novartis, the authors have a perceived financial conflict of interest.
Supplementary information
Supplementary Text and Figures
Supplementary Results, Supplementary Tables 1–12, Supplementary Note 1 and Supplementary Figures 1–14. (PDF 1946 kb)
Supplementary Data Set 1
PubChem assay identifiers. All PubChem bioassays used in the analysis are reported. If two assay identifiers are listed in the same row, the corresponding PubChem bioassays have been combined because they reported different readouts from the same experiment. (XLS 101 kb)
Supplementary Data Set 2
Compound structures. The file reports InChI keys and SMILES strings for all dark compounds identified in the PubChem data set and a subset (10,355 structures) of the dark compounds in the Novartis data set (for intellectual property reasons, not all structures can be made available). For each compound, the field “set” reports whether the compound was identified as dark chemical matter for the PubChem data set, the Novartis data set or both. (XLSX 7000 kb)
Supplementary Data Set 3
Quality control results. For 623 compound structures identified as dark chemical matter in the Novartis data set, results from our quality control experiments are reported. Purity, identity, concentration, and comments about how to interpret the observed data for special cases (e.g. highly fluorinated compounds) are given. Compounds are represented by InChI keys and SMILES strings. (XLSX 54 kb)
Supplementary Data Set 4
DCM scaffolds. The data set lists 95 scaffolds that were significantly enriched in the PubChem DCM set. Scaffolds are reported as SMILES strings. For each scaffold, numbers of PubChem DCM and ACT compounds that it represents are reported. (XLSX 12 kb)
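As a rough illustration of how per-scaffold DCM and ACT counts can be turned into an enrichment statistic, the sketch below computes a one-sided hypergeometric (Fisher-type) p-value. The choice of test, the function name `enrichment_p_value` and the example counts are assumptions; this description does not say which statistical test or significance threshold was applied.

```python
from math import comb


def enrichment_p_value(dcm_with, act_with, dcm_total, act_total):
    """One-sided hypergeometric p-value for scaffold enrichment in DCM.

    dcm_with / act_with: DCM and ACT compounds carrying the scaffold.
    dcm_total / act_total: sizes of the full DCM and ACT sets.
    Returns P(X >= dcm_with) when the scaffold's compounds are drawn
    at random from the pooled DCM + ACT collection (assumed null model).
    """
    n_scaffold = dcm_with + act_with
    n_total = dcm_total + act_total
    denominator = comb(n_total, n_scaffold)
    upper = min(n_scaffold, dcm_total)
    numerator = sum(
        comb(dcm_total, k) * comb(act_total, n_scaffold - k)
        for k in range(dcm_with, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts: a scaffold seen in 3 DCM and 0 ACT compounds,
    # with 5 compounds in each set overall.
    print(enrichment_p_value(3, 0, 5, 5))
```

With many scaffolds tested at once, a multiple-testing correction would normally be applied to the resulting p-values; that step is omitted here.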
Supplementary Data Set 5
Dark chemical matter Bayes classifier. We attach the naive Bayes model trained on the PubChem data set as Pipeline Pilot component (xml file). This component returns a dark matter score for each molecular data record sent to it. (XML 2227 kb)
Supplementary Data Set 6
Reporter gene assay results. For 322 active (“ACT”) and 337 dark (“DCM”) compounds, we make activity readouts from the reporter gene assay panel available. Each row in the data table reports normalized activities for one compound across the 41 RGAs given in Supplementary Table 10. Activities were obtained 24 hours after compound treatment. If a compound has been tested in replicates, the reported activity value is the average of the normalized activities obtained for the different replicates. For details on compound activity normalization see the main text and references provided therein. (XLSX 274 kb)
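The replicate handling described for this data set (one averaged, normalized activity per compound and assay) can be sketched as follows. The tuple layout and the name `average_replicates` are hypothetical, and the normalization itself is assumed to have been done upstream as described in the main text.

```python
from collections import defaultdict
from statistics import mean


def average_replicates(measurements):
    """Average normalized activities over replicates.

    measurements: iterable of (compound_id, assay_id, normalized_activity)
                  tuples (hypothetical layout).
    Returns a dict mapping (compound_id, assay_id) to the mean activity;
    a compound tested once in an assay keeps its single value.
    """
    grouped = defaultdict(list)
    for compound_id, assay_id, value in measurements:
        grouped[(compound_id, assay_id)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    demo = [
        ("cmpd1", "RGA_01", 1.0),
        ("cmpd1", "RGA_01", 3.0),   # replicate of the same compound and assay
        ("cmpd1", "RGA_02", 0.5),
    ]
    for key, value in average_replicates(demo).items():
        print(key, value)
```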
Supplementary Data Set 7
Gene expression profiles. For 89 active (“ACT”) and 111 dark (“DCM”) compounds, we report measured fold changes and calculated R-scores for the 61 genes in our transcriptional profiling panel. Supplementary Data Set 7 reports gene expression changes after compound treatment with a final compound concentration of 1 μM. Genes are represented by EntrezGene identifiers, as listed in Supplementary Table 11. (XLSX 516 kb)
Supplementary Data Set 8
Gene expression profiles. For 89 active (“ACT”) and 111 dark (“DCM”) compounds, we report measured fold changes and calculated R-scores for the 61 genes in our transcriptional profiling panel. Supplementary Data Set 8 reports gene expression changes after compound treatment with a final compound concentration of 10 μM. Genes are represented by EntrezGene identifiers, as listed in Supplementary Table 11. (XLSX 518 kb)
Supplementary Data Set 9
Yeast growth inhibition compound list. The data set lists 178 dark compounds that were tested in yeast growth inhibition experiments. Only compound 1 reported in the manuscript showed activity in confirmation experiments; all other compounds are considered inactive. Compounds are reported as InChI keys and SMILES strings. (XLSX 18 kb)
About this article
Cite this article
Wassermann, A., Lounkine, E., Hoepfner, D. et al. Dark chemical matter as a promising starting point for drug lead discovery. Nat Chem Biol 11, 958–966 (2015). https://doi.org/10.1038/nchembio.1936
DOI: https://doi.org/10.1038/nchembio.1936