Key Points
-
High-throughput screening (HTS) and virtual screening (VS) have developed largely independently over the years. However, these disciplines have similar goals and are highly complementary. There are good indications that drug discovery research will increasingly benefit from an integrated approach to screening.
-
A diverse array of VS methods has been developed, including structural queries, pharmacophores, molecular fingerprints, QSAR models, diverse cluster analysis tools, statistical techniques and docking calculations. In addition, VS techniques have been implemented to filter large databases for compounds with desired or undesired chemical groups, drug-like character, preferred solubility and absorption characteristics, or oral bioavailability.
-
Both small-molecule- and structure-based VS have recently produced several success stories in the search for novel inhibitors or antagonists of diverse biological targets.
-
Some VS methods have been introduced or adapted for the analysis of HTS data, taking into account that such data sets are usually noisy and error prone. Prominent among these methods are different partitioning and clustering algorithms that can derive predictive models of biological activity from screening data.
-
Similar approaches are used to interface HTS and VS directly. At present, this is best accomplished by the application of iterative screening strategies, such as focused or sequential screening. Although the details of such strategies can differ considerably, they have in common that small subsets of compounds are computationally selected from large databases and assayed. On the basis of the obtained results, the search for biologically active molecules is further refined in subsequent iterations.
-
In several case studies, sequential screening has yielded significant improvements in hit rates over random screening. It is not uncommon for iterative screening to achieve hit rates between 10% and 40%, largely because far fewer, computationally preselected compounds need to be tested.
-
As the size of compound databases and the number of available screening targets rapidly increase, it is conceivable that combined computational and biological screening might soon become a focal point of pharmaceutical research, despite the advances that are being made in the HTS arena towards even higher throughput.
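The iterative screening strategy described in the key points above can be sketched in a few lines of Python. This is only a toy illustration under loudly stated assumptions: compounds are integer indices that stand in for a one-dimensional descriptor, `assay` is a hypothetical stand-in for the biological test, and "similarity" is simply the distance between indices. None of these names or choices come from the article.

```python
from typing import Callable, List, Set


def sequential_screen(
    library: List[int],
    assay: Callable[[int], bool],
    batch_size: int,
    n_rounds: int,
) -> List[float]:
    """Return the hit rate observed in each screening round."""
    tested: Set[int] = set()
    hits: List[int] = []
    hit_rates: List[float] = []
    for _ in range(n_rounds):
        untested = [c for c in library if c not in tested]
        if not hits:
            # No activity information yet: take an evenly spaced subset.
            step = max(1, len(untested) // batch_size)
            batch = untested[::step][:batch_size]
        else:
            # Focus on untested compounds closest to any known hit
            # (ties are broken by compound index).
            batch = sorted(
                untested,
                key=lambda c: (min(abs(c - h) for h in hits), c),
            )[:batch_size]
        if not batch:
            break  # library exhausted
        new_hits = [c for c in batch if assay(c)]
        tested.update(batch)
        hits.extend(new_hits)
        hit_rates.append(len(new_hits) / len(batch))
    return hit_rates
```

With a hypothetical library of 1,000 compounds in which indices 400 to 449 are active, `sequential_screen(list(range(1000)), lambda c: 400 <= c < 450, batch_size=50, n_rounds=2)` assays only 100 compounds, and the focused second round gives a higher hit rate than the evenly spaced first round. The size of the gain is an artefact of the toy activity model and says nothing about real assays.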
Abstract
High-throughput and virtual screening are important components of modern drug discovery research. Typically, these screening technologies are considered distinct approaches, as one is experimental and the other is theoretical in nature. However, given their similar tasks and goals, these approaches are much more complementary to each other than often thought. Various statistical, informatics and filtering methods have recently been introduced to foster the integration of experimental and in silico screening and maximize their output in drug discovery. Although many of these ideas and efforts have not yet proceeded much beyond the conceptual level, there are several success stories and good indications that early-stage drug discovery will benefit greatly from a more unified and knowledge-based approach to biological screening, despite the many technical advances towards even higher throughput that are made in the screening arena.
Acknowledgements
The author is grateful to F. Stahura for critical review of the manuscript and help with illustrations.
Glossary
- SUBSTRUCTURE
-
A defined structural fragment of a molecule.
- PHARMACOPHORE
-
The spatial arrangement of chemical groups or features in a molecule that are known or thought to determine its activity. The most popular pharmacophore models consist of three or four points separated by defined distance ranges. In most cases, pharmacophore geometry is not known from experiment, but is predicted.
- MOLECULAR GRAPH
-
A two-dimensional representation of the connectivity pattern in a molecule, with atoms shown as vertices and bonds as edges.
- QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIP (QSAR)
-
QSAR analysis refers to methods that relate structural features of molecules to biological activity in quantitative terms. In most cases, QSAR analysis attempts to establish linear relationships between selected structural features in a series of related molecules and their known level of activity. If successful, models derived from training sets can be applied to predict molecules with higher potency.
- BINARY BIT STRING
-
A series of 1 or 0 characters. Each bit position is either set 'on' (that is, set to 1) or 'off' (0), and can account for the presence or absence of a specific feature.
- TANIMOTO COEFFICIENT
-
The most popular metric for the quantitative comparison of binary molecular fingerprints. This coefficient is defined as Tc = bc/(b1 + b2 − bc). In this formulation, b1 represents the number of bits that are set on in the first fingerprint, b2 is the number of bits that are set on in the second fingerprint, and bc is the number of bits common to both fingerprints. If the Tc value is 1, then the compared fingerprints are identical.
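As a minimal sketch of the Tanimoto definition above, the following Python function compares two fingerprints given as strings of '0' and '1' characters. The string representation and the function name are assumptions made for illustration, and returning 1.0 for two fingerprints with no bits set is a convention chosen here, not something stated in the definition.

```python
def tanimoto(fp1: str, fp2: str) -> float:
    """Tc = bc / (b1 + b2 - bc) for two equal-length bit strings."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    b1 = fp1.count("1")  # bits set on in the first fingerprint
    b2 = fp2.count("1")  # bits set on in the second fingerprint
    # bits set on in both fingerprints
    bc = sum(1 for x, y in zip(fp1, fp2) if x == "1" and y == "1")
    denominator = b1 + b2 - bc
    if denominator == 0:
        # Assumption: two fingerprints with no bits set count as identical.
        return 1.0
    return bc / denominator
```

For example, `tanimoto("110100", "100110")` has b1 = 3, b2 = 3 and bc = 2, giving Tc = 2/4 = 0.5.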
- COMBINATORIAL PROBLEM
-
As used here, the term describes the situation that the number of possible pairwise comparisons c grows with the number of objects n according to the formula c = n(n − 1)/2. So, if n becomes increasingly large, methods that rely on pairwise comparisons of, for example, database molecules become computationally infeasible.
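The growth described by c = n(n − 1)/2 is easy to underestimate, so a short calculation may help. The function name below is hypothetical.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n objects: c = n(n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2
```

A set of 1,000 molecules requires 499,500 comparisons, whereas a database of 1,000,000 molecules requires 499,999,500,000: a thousandfold increase in n produces roughly a millionfold increase in c.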
- BINNING
-
This process divides coordinate axes into intervals (typically of equal size). If binning is applied to the axes of 2D and 3D coordinate systems, grids and cells are obtained, respectively.
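A minimal sketch of equal-width binning, assuming every axis shares the same value range and the same number of bins (a simplification made here for brevity). The function names are hypothetical; each point is assigned to the cell identified by its per-axis bin indices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def bin_index(value: float, low: float, high: float, n_bins: int) -> int:
    """Index of the equal-width interval on [low, high] that contains value."""
    if not low <= value <= high:
        raise ValueError("value lies outside the axis range")
    width = (high - low) / n_bins
    # The upper boundary is assigned to the last bin.
    return min(int((value - low) / width), n_bins - 1)


def assign_cells(
    points: Sequence[Sequence[float]], low: float, high: float, n_bins: int
) -> Dict[Tuple[int, ...], List[int]]:
    """Map each occupied cell to the indices of the points it contains."""
    cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, point in enumerate(points):
        cell = tuple(bin_index(v, low, high, n_bins) for v in point)
        cells[cell].append(i)
    return dict(cells)
```

Points that fall into the same cell are treated as similar in the chosen descriptor space, which avoids explicit pairwise comparisons between molecules.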
- NEURAL NETWORK
-
Artificial neural networks are collections of mathematical models that are interconnected and organized in different layers. Given this architecture, the models correspond to neurons and the connections to synapses of the nervous system. Neural network simulations are analogous to an adaptive learning process. So, neural nets are typically trained to distinguish between different objects and their properties in learning sets, and the resulting models are then applied to make predictions on test sets.
- QUANTITATIVE STRUCTURE–PROPERTY RELATIONSHIP (QSPR)
-
A variation of the QSAR approach, in which structural features of molecules are not quantitatively related to biological activity, but instead to physical properties, such as aqueous solubility or passive absorption.
- DRUG-LIKE
-
The concept of 'drug-likeness' is based on the premise that drugs share specific molecular characteristics that systematically distinguish them from other synthetic or natural compounds.
- LOGP(O/W)
-
The logarithm of the octanol/water partition coefficient (often abbreviated logP) describes the solubility of a compound in octanol (hydrophobic solvent) relative to its solubility in water (polar solvent).
- RULE-OF-FIVE
-
On the basis of statistical analysis of known drugs, candidate compounds are likely to have unfavourable absorption, permeation and bioavailability characteristics if they contain more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a logP greater than 5 and/or a molecular mass of more than 500 Da.
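The four thresholds in this definition translate directly into a filter. The sketch below assumes that the properties have already been calculated by some other tool; the `Compound` class and its field names are hypothetical. Because the definition says "and/or", the function only reports which criteria are violated and leaves the decision about how many violations to tolerate to the caller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    h_bond_donors: int
    h_bond_acceptors: int
    logp: float
    mol_mass_da: float


def rule_of_five_violations(c: Compound) -> List[str]:
    """List the rule-of-five criteria that a compound violates."""
    checks = [
        ("more than 5 hydrogen-bond donors", c.h_bond_donors > 5),
        ("more than 10 hydrogen-bond acceptors", c.h_bond_acceptors > 10),
        ("logP greater than 5", c.logp > 5),
        ("molecular mass above 500 Da", c.mol_mass_da > 500),
    ]
    return [label for label, violated in checks if violated]
```

A compound with 2 donors, 5 acceptors, a logP of 3.1 and a mass of 350 Da returns an empty list, whereas one with 6 donors, 12 acceptors, a logP of 5.5 and a mass of 600 Da violates all four criteria.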
- PRINCIPAL COMPONENT ANALYSIS (PCA)
-
A mathematical method that captures the variance in a data set with respect to chosen variables, and transforms correlated variables into a smaller number of uncorrelated ones for data presentation.
- GENETIC ALGORITHM
-
Computational implementation of a problem-solving approach that uses principles of biological competition and population dynamics. Model parameters are encoded in a 'chromosome', and are varied. Chromosomes yield possible solutions to a given problem by means of a fitness function. Chromosomes that correspond to the best intermediate solutions are subjected to operations that are analogous to gene recombination and mutation to produce the next generation. This process continues until solutions reach a predefined convergence criterion.
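The cycle described above (encode, score, recombine, mutate, repeat) can be sketched as follows. Several simplifications are assumptions of this sketch rather than part of the definition: chromosomes are plain bit lists, the best chromosome is copied unchanged into the next generation, and a fixed number of generations replaces a convergence criterion. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def evolve(
    fitness: Callable[[Chromosome], float],
    n_genes: int,
    pop_size: int = 20,
    n_generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Chromosome, List[float]]:
    """Return the best chromosome and the best fitness per generation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)
    ]
    best_history: List[float] = []
    for _ in range(n_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # best intermediate solutions
        next_generation = [ranked[0][:]]   # keep the current best unchanged
        while len(next_generation) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)     # single-point recombination
            child = mother[:cut] + father[cut:]
            child = [
                1 - gene if rng.random() < mutation_rate else gene
                for gene in child
            ]                                     # point mutations
            next_generation.append(child)
        population = next_generation
    best = max(population, key=fitness)
    best_history.append(fitness(best))
    return best, best_history
```

Calling `evolve(sum, n_genes=16)` uses the number of bits set to 1 as a stand-in fitness function. In a descriptor-selection setting, each bit could instead mark whether a descriptor is included in a model, with the fitness function scoring that model.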
- ADME
-
Absorption, distribution, metabolism and excretion are the processes that together determine the in vivo characteristics of drug (candidate) molecules.
- DECISION TREE
-
A data set is successively divided at decision points. At each point, a 'yes' or 'no' decision is made for each object, dividing the data into smaller and smaller subsets along the tree. All objects in a given subset share the same signature of 'yes' or 'no' decisions.
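The idea that all objects in a subset share the same signature of decisions can be illustrated with a deliberately simplified sketch. It assumes that the same question is asked at a given depth in every branch and that each question tests for the presence of a named structural feature; data-driven methods choose the questions from the data instead. The function and feature names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def partition_by_decisions(
    molecules: Dict[str, Set[str]], features: Sequence[str]
) -> Dict[Tuple[bool, ...], List[str]]:
    """Group molecules by their signature of yes/no feature decisions."""
    subsets: Dict[Tuple[bool, ...], List[str]] = {}
    for name, present in molecules.items():
        # One yes/no decision per feature, in the order given.
        signature = tuple(feature in present for feature in features)
        subsets.setdefault(signature, []).append(name)
    return subsets
```

Each key of the returned dictionary corresponds to one path through the tree, and its value lists the molecules that end up in that subset.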
- BINARY DESCRIPTORS
-
These types of descriptor capture two defined states (and not continuous value ranges). Typical examples include a specific substructure or bond pattern. The feature detected by a binary descriptor is either 'present' (state 1) or 'absent' (state 2). Application of binary descriptors allows the classification of molecular data sets by means of decision trees.
- SCAFFOLD
-
Often defined as the core structure of a small molecule, the scaffold is typically a ring system that has diverse chemical groups attached. Accordingly, it is obtained by removal of these attached groups.
- CHEMOTYPE
-
A family of molecules that has a unique core structure or scaffold.
- PHYLOGENETIC TREE
-
This classification structure has its origin in biology to describe evolutionary relationships. It classifies a family of objects into 'most-similar' sets by subdividing them at branch points into successively smaller subsets with increasing object similarity. The final subsets represent unique leaves of the tree. Different from a simple decision tree, a phylogenetic tree structure can create multiple branches at each point.
- BAYES' THEOREM
-
A mathematical formulation that determines the probability that a specific result was due to a particular cause, if multiple possible causes exist. For example, a molecular database consists of 50% synthetic reagents, 30% drug-like molecules and 20% natural products. If the activity rates of synthetic compounds, drug-like molecules and natural products are 1%, 50% and 15%, respectively, what is the probability that a given biological activity in this database is represented by a natural product?
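The question posed in this example can be answered with a short calculation. The sketch below reads the database fractions as prior probabilities and the activity rates as the probability of activity within each class; the function name and dictionary keys are hypothetical.

```python
from typing import Dict


def posterior(
    priors: Dict[str, float], activity_rates: Dict[str, float], cause: str
) -> float:
    """P(cause | active) = P(active | cause) P(cause) / P(active)."""
    # Total probability that a randomly chosen compound is active.
    p_active = sum(priors[k] * activity_rates[k] for k in priors)
    return priors[cause] * activity_rates[cause] / p_active


priors = {"synthetic": 0.50, "drug_like": 0.30, "natural_product": 0.20}
activity_rates = {"synthetic": 0.01, "drug_like": 0.50, "natural_product": 0.15}
p_natural = posterior(priors, activity_rates, "natural_product")
```

Here P(active) = 0.50 × 0.01 + 0.30 × 0.50 + 0.20 × 0.15 = 0.185, so the probability that an observed activity belongs to a natural product is 0.03/0.185, or about 16%.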
- SIMILARITY PARADOX
-
In the context of virtual screening (VS), minor chemical modifications of otherwise similar molecules can render them either active or inactive. VS calculations are expected to identify series of molecules that share the same scaffold. However, if only inactive compounds were selected for testing, VS analysis would have 'failed', although a relevant chemotype was identified. This highlights potential problems associated with the selection of only one or a few representative molecules from a series of similar ones.
- ANALOGUE
-
A member of a series of closely related molecules that is distinguished from the others in its chemotype by only minor chemical modifications. Analogues of active molecules are often generated to improve potency and/or other compound characteristics, such as solubility or oral bioavailability.
Cite this article
Bajorath, J. Integration of virtual and high-throughput screening. Nat Rev Drug Discov 1, 882–894 (2002). https://doi.org/10.1038/nrd941