The global attention on repurposing existing treatments for COVID-19 has highlighted the potential for drug repurposing.

Thousands of existing drugs have already been studied in in vitro and in vivo models of COVID-19, more than 400 drugs have been administered to COVID-19 patients, and dozens of drugs have undergone randomized controlled trials1,2. Several existing treatments, including dexamethasone, tocilizumab and heparin, are likely to have contributed to major reductions in mortality among critically ill patients with COVID-19 by limiting immune hyper-activation and mitigating clotting-related sequelae of the virus3,4,5. Less success has been found with antivirals that could be effective earlier in the disease course at preventing COVID-19 morbidity and mortality. To efficiently identify treatments among all of the parallel efforts and corresponding datasets, it is critical to pool data and apply cutting-edge machine learning (ML) approaches to predict promising treatment approaches for further development. Writing in Nature Machine Intelligence, Tudor I. Oprea and colleagues6 present an open-source, open-access ML suite called REDIAL-2020 for estimating anti-SARS-CoV-2 activities from molecular structure.

By training models on publicly available high-throughput drug screen (HTDS) data from eleven assays spanning viral entry, viral replication, live virus infectivity, in vitro infectivity and human cell toxicity, this tool provides a useful method for screening novel compounds for anti-SARS-CoV-2 activities and prioritizing promising compounds for further development. The sheer volume of HTDS data for SARS-CoV-2, the variety of assays used to assess anti-SARS-CoV-2 activities and cytotoxicity, and the heterogeneity between cell lines7 used for these screens make computational tools like REDIAL-2020 critically important for identifying classifiers that can predict antiviral activity. A limitation acknowledged by the authors is the substantial intra- and inter-experimental variability. Given that drugs should be active in some assays (AlphaLISA, 3CL, CPE) but inactive in others (TruHit, ACE2, Cytotox), each prediction comes with brief text to help the end user decide if the estimate is consistent with a desirable outcome. One limitation of this approach, and of HTDS generally, is that it identifies drugs with potential antiviral effects but does not assess immunomodulatory activity, which has proven to be critical in COVID-19. Finally, this tool is only as good as the assays underlying it. Despite millions of dollars of investment into HTDS over the past two decades, relatively few repurposed drugs have been identified as effective treatments in patients. That said, the rapidly advancing basic understanding of SARS-CoV-2 biology, as well as the large number of efforts, likely increases the potential for success in COVID-19.
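The point that a prediction of "active" is good news in some assays and bad news in others can be made concrete with a small sketch. The assay groupings below are the ones named in the text; the function names, the boolean encoding of predictions and the example profile are hypothetical illustrations, not REDIAL-2020's actual interface or output.

```python
from typing import Dict

# Assay groupings as described in the text. Everything else in this
# sketch (names, data layout, example values) is an assumption.
DESIRED_ACTIVE = {"AlphaLISA", "3CL", "CPE"}
DESIRED_INACTIVE = {"TruHit", "ACE2", "Cytotox"}


def classify_prediction(assay: str, predicted_active: bool) -> str:
    """Say whether a predicted label is desirable for the given assay."""
    if assay in DESIRED_ACTIVE:
        return "desirable" if predicted_active else "undesirable"
    if assay in DESIRED_INACTIVE:
        return "undesirable" if predicted_active else "desirable"
    return "unknown assay"


def summarize(predictions: Dict[str, bool]) -> Dict[str, str]:
    """Annotate every assay prediction for one compound."""
    return {
        assay: classify_prediction(assay, active)
        for assay, active in predictions.items()
    }


# Hypothetical profile for one compound (True means predicted active).
example = {"3CL": True, "CPE": True, "Cytotox": False, "TruHit": True}
summary = summarize(example)
```

In this made-up profile the compound looks favourable on 3CL, CPE and Cytotox, but its predicted TruHit activity is flagged as undesirable, which is the kind of cue the brief accompanying text is meant to give the end user.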

Thus, a key unanswered question is: does the tool perform well at predicting effectiveness in humans? It would have been useful to highlight the predicted activities of drugs that have been found to be effective in randomized controlled trials (for example, remdesivir) and drugs that have not (for example, hydroxychloroquine). However, this reflects an intrinsic limitation of basic, preclinical research (that is, HTDS): its feedback loop is rarely supplemented with clinical insight. It would have also been helpful to see a list of the antivirals that appeared most promising. Fortunately, the authors provide the data in an easy-to-use format that can be accessed on the open-source and open-access DrugCentral web portal. While Oprea and colleagues have led the way with sharing open-source data for over a decade8,9,10, the global shift towards more data sharing and access during the COVID-19 pandemic is likely to speed up progress for COVID-19 therapeutics and ML advances beyond the pandemic.

As drugs work by modulating therapeutic targets, rather than by directly modulating a disease, HTDS provides a potentially helpful step to quickly screen large numbers of drugs and potential drug targets that may be involved in a disease like COVID-19. Though we already know certain drug–drug target–disease links, drugs may have multiple drug targets, some of which may be as yet unknown. Additionally, drug targets may be involved in multiple diseases, some of which may also be unknown (Fig. 1). ML allows us to gain insights from HTDS data and to identify both drug targets and disease applications that are not yet known.

Fig. 1: Fundamental basis for drug repurposing.

A given drug (drug A) can be helpful for multiple diseases because multiple diseases may share the same therapeutic target or because that drug may hit multiple targets each involved in multiple diseases. Some targets of therapeutics are well established while others are not, and some therapeutic targets for diseases are well established while others are not. The unknown number of targets for a given drug and the unknown number of therapeutic targets for a given disease make drug repurposing challenging and exciting. For instance, drug A can work in many ways, but only some are known. (1) It is known that drug A can inhibit target A to stop disease 1. (2) Drug A inhibiting target A is also able to stop disease 2 and others. But no one knows that target A is also important for disease 2 or other diseases. (3) Drug A can also inhibit target B, which can stop diseases 3, 4 and so on. But no one knows that drug A can inhibit target B and/or that target B is important for diseases 3, 4 and so on. (4) Diseases also could have multiple targets, for which there might already be drugs available (drug B for target C that stops disease M), but the link between target C and disease M is unknown.
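The drug, target and disease relationships described for Fig. 1 can be read as a small graph in which repurposing candidates are the diseases reachable from a drug through any of its targets. The following is a minimal sketch of that idea using toy data; the lettered target labels, the dictionaries and the function name are hypothetical, and in practice many of these links are precisely the ones that are unknown and must be inferred.

```python
from collections import defaultdict

# Toy links loosely mirroring Fig. 1. All entries are illustrative
# assumptions, not curated drug-target-disease data.
drug_targets = {
    "drug A": {"target A", "target B"},
    "drug B": {"target C"},
}
target_diseases = {
    "target A": {"disease 1", "disease 2"},
    "target B": {"disease 3", "disease 4"},
    "target C": {"disease M"},
}
# Indications that are already established for each drug.
known_indications = {"drug A": {"disease 1"}, "drug B": set()}


def repurposing_candidates(drug):
    """Map each not-yet-established disease to the targets that link it to the drug."""
    reachable = defaultdict(set)
    for target in drug_targets.get(drug, ()):
        for disease in target_diseases.get(target, ()):
            reachable[disease].add(target)
    known = known_indications.get(drug, set())
    return {
        disease: sorted(targets)
        for disease, targets in reachable.items()
        if disease not in known
    }


candidates_a = repurposing_candidates("drug A")
candidates_b = repurposing_candidates("drug B")
```

With this toy data, drug A is linked to diseases 2, 3 and 4 beyond its established use in disease 1, and drug B is linked to disease M through target C. The sketch only enumerates paths in a graph that is assumed to be complete; it does not model how such links would be discovered.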

It is important to note that there are multiple potential starting points for drug repurposing: besides a ‘drug-first’ approach to identify promising drug candidates represented by HTDS, studies can also be ‘target-first’ by identifying therapeutic targets and then matching existing treatments or ‘data-first’ by applying ML approaches to the published literature to identify therapeutic targets and treatments. The Center for Cytokine Storm Treatment & Laboratory (CSTL) at the University of Pennsylvania, where we are based, has led a target-first approach for Castleman disease, and groups such as HealX and the Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham have led a data-first approach for other rare diseases11. Regardless of how a treatment is identified, in vitro and in vivo studies are often needed to validate preclinical findings before the drug is prescribed off-label (that is, for a specific disease that it is not approved to treat) for the disease of interest; occasionally, candidate drugs (for example, statins) are prescribed for another purpose (for example, elevated cholesterol) in a disease population of interest (for example, COVID-19)2. This real-world evidence is helpful to identify drugs that should proceed to pilot open-label studies (that is, research studies in which both the researcher and the participant know the treatment the participant is receiving) and ultimately randomized controlled trials to investigate efficacy (and safety). Next, systematic efforts should be made to determine if the drug should be adopted in clinical practice and/or given regulatory approval for the new disease indication. In parallel, it is important to track all drugs being used off-label and experimentally to identify promising candidates for further investigation. In COVID-19, several resources exist to harness and leverage these datasets. REDIAL-2020 is an important resource for helping to identify promising approaches from a drug-first perspective. CURE ID, CORONA Project1, N3C and COVID-19 Research Database centralize data on physician-reported treatment use, published uses of treatments, electronic medical record reports, and insurance claims data reports, respectively. These tools and resources focused on drug repurposing for COVID-19 have uncovered the potential and the pitfalls for drug repurposing more broadly.

In addition to tools that centrally track data on the various steps in the drug repurposing process, we need solutions to track data across the various steps for a given disease (and ultimately for all diseases). To that end, we launched the COVID-19 Registry of Off-Label and New Agents (CORONA) Project1 to track drug repurposing efforts for COVID-19. It is expanding to integrate preclinical data with data on all treatments reported to be given to patients with COVID-19. We hope that scientific advances like REDIAL-2020 and CORONA will catalyse repurposing efforts and progress beyond COVID-19, unlocking new treatments for the approximately 75% of the 10,000 human diseases that do not have any.