Artificial intelligence, machine learning, and drug repurposing in cancer

ABSTRACT Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.


Introduction
Drug repurposing (also called drug repositioning, reprofiling, redirecting, and drug rediscovery [1]) is a strategy for identifying new therapeutic purposes for approved drugs in medical indications beyond the scope of their original therapeutic use [2]. Drug repurposing offers various advantages over the de novo development of entirely new drugs, including the possibility to speed up the discovery process and to reduce failure rates in the clinical development and testing phases [3]. In particular, drug repurposing makes it possible to avoid safety evaluation in preclinical models and humans, hence leading to potentially lower overall development costs, if the safety testing has been completed for the original indication and the drug displays dose-compatibility with the new indication. Traditionally, drug repurposing success stories have mainly resulted from largely opportunistic and serendipitous findings [4]; for example, sildenafil citrate was originally developed as an antihypertensive drug, but later repurposed by Pfizer and marketed as Viagra for the treatment of erectile dysfunction based on retrospective clinical experience, leading to massive worldwide sales.
Over recent years, a number of computational approaches have been developed for a more systematic drug repurposing process. Popular information sources for in-silico drug repurposing include, for instance, electronic health records, genome-wide association analyses or gene expression response profiles, pathway mappings, compound structures, target-binding assays, and other phenotypic profiling data [4]. Several systematic review articles on the use of computational approaches are available [4], which also cover machine learning (ML) and artificial intelligence (AI) algorithms, such as those based on network propagation, matrix factorization and completion, as well as recently developed deep learning models [5][6][7][8]. Databases and other resources supporting in-silico drug repurposing, such as Drug Repurposing Hub [9] and RepurposeDB [10], have also been recently surveyed [11]. There are also excellent reviews and perspectives on the use of ML and AI approaches in the overall drug discovery and development process [12,13], as well as in the lead optimization or designing of completely new molecules [14].
Our focus here is on supervised ML and AI methods that make use of publicly available databases and information sources. A particular emphasis is placed on the use of comprehensive target activity profiles of drugs as a resource for a systematic repurposing process, in which an existing drug is found to have an off-target effect or a newly recognized on-target effect for a new indication, hence providing sufficient evidence to take it forward for further development and commercial exploitation. Such target-based drug repurposing makes use of the fact that most drugs are not specific for any single target, but rather display a wide spectrum of target activity. In cancer applications, some of the unintended off-targets correspond to known anticancer targets, while others may reveal new cancer vulnerabilities [15]. However, we note that drug repurposing is not by any means limited to anticancer applications alone, but covers various medical indications [16]. For instance, a recent review surveyed how existing drugs may have activity against SARS-CoV-2 to be readily applied to treat COVID-19 patients [17,18]. Similarly, target repositioning [19] can be used in the field of infectious diseases, where a drug is used to inhibit the ortholog target proteins in other species [20,21].
The repurposing process is often initiated after phenotypic observations of adventitious polypharmacological drug activities. For instance, we observed a surprising activity for axitinib, a vascular endothelial growth factor receptor (VEGFR) inhibitor approved for advanced renal cell carcinoma, in primary chronic myeloid leukemia (CML) and acute lymphoblastic leukemia (ALL) cells [22]. Since these cancers are driven by the oncogenic BCR-ABL1 fusion protein, we hypothesized that axitinib might bind to BCR-ABL1. This was confirmed by structural and functional analysis, and interestingly, axitinib bound to T315I-mutated BCR-ABL1 with roughly 40 times higher affinity than to the wild-type BCR-ABL1. Currently, axitinib is being investigated in an alternating regimen with bosutinib for CML patients (NCT02782403). Subsequent reports, however, have indicated that axitinib may lose potency when additional compound mutations emerge in BCR-ABL1 [23], and the drug does not seem to be effective against ponatinib-resistant T315I-mutated cells [24]. These observations raise the question of whether one could use AI algorithms to predict at least some of the potential drawbacks before the repurposing process enters the clinical stage.

Data resources for in-silico drug repurposing
We start by going through selected data and information resources that we find useful for in-silico drug repurposing. Rather than providing a systematic review of all developed resources, we mainly focus on information sources motivated by the axitinib repurposing study from the previous section, including resources for drug-target activity data, cell-based pharmacogenomic data, and chemical structure information. For more comprehensive surveys of various data resources, the reader is referred to recent reviews [4][5][6][11]. We will discuss the use of these resources in Section 3.

Drug-target interaction resources
Comprehensive knowledge about the intended on-targets and non-intended or so-called off-targets of a drug is important for understanding its underlying mechanism of action (MoA), and for modeling its efficacy or toxicity in various tissue and cancer types. As shown in the motivating example of the axitinib study, drug-target activity profiles are highly valuable in drug repurposing [22]. In contrast to proprietary resources, which were used e.g. in Drug Repurposing Hub, we promote here the use of publicly available drug-target activity resources and show how these can be useful in training supervised ML models for in-silico off-target predictions and drug repurposing. Table 1 highlights 18 selected compound/target databases, along with various features such as the number of compounds, targets and interactions covered, as well as whether an API is provided for programmatic data access for AI-based explorations. For simplicity, we have divided the compound-target activity data types into three categories according to the type of activity data they contain: quantitative bioactivity data (e.g. from multi-dose Kd, Ki, or IC50 assays), binary interactions (both active and inactive drug-target pairs), and unary interactions (only active drug-target pairs). These categories determine whether regression or classification algorithms are applicable for the target activity predictions, and whether one has true positive as well as true negative examples for training of the supervised prediction models.
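The three data categories map onto different learning setups. The following is a minimal sketch, assuming dissociation constants reported in nM and a hypothetical activity cutoff of pKd >= 7 (i.e. Kd <= 100 nM); the cutoff, the function names, and the toy measurements are illustrative assumptions, not values prescribed by any of the databases.

```python
import math

# Hypothetical activity cutoff (assumption for illustration): pKd >= 7, i.e. Kd <= 100 nM.
ACTIVE_PKD_CUTOFF = 7.0


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert a dissociation constant in nM to pKd = -log10(Kd in mol/L)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 9.0 - math.log10(kd_nm)


def to_binary_label(kd_nm: float, cutoff: float = ACTIVE_PKD_CUTOFF) -> int:
    """Binarize a quantitative measurement: 1 = active, 0 = inactive."""
    return int(kd_nm_to_pkd(kd_nm) >= cutoff)


# Toy measurements in nM (made-up values, not taken from any database).
measurements = {("drugA", "kinase1"): 12.0, ("drugA", "kinase2"): 4500.0}

# Quantitative data: labels for regression.
regression_labels = {pair: kd_nm_to_pkd(kd) for pair, kd in measurements.items()}

# Binary data: both active and inactive pairs, labels for classification.
binary_labels = {pair: to_binary_label(kd) for pair, kd in measurements.items()}

# Unary data: only the active pairs are listed; all other pairs are unknown,
# not confirmed inactive, so true negatives are missing.
unary_pairs = [pair for pair, label in binary_labels.items() if label == 1]
```

With unary data, a classifier has no confirmed negatives to learn from, which is why the data category matters when choosing the training and evaluation setup.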
Most of the in-silico drug-target interaction (DTI) prediction studies are based on one of the resources listed in Table 1 [42]. So far, ChEMBL is the most popular target activity resource for regression modeling (i.e. prediction of quantitative drug-target binding affinities). Classification algorithms try to predict whether a drug has sufficient potency against the given target. In addition to the problem formulation (regression vs. classification), we have argued that at least the following factors should be taken into consideration in in-silico target prediction studies to avoid reporting overoptimistic drug-target activity prediction results: (i) multiple evaluation datasets specific to particular drug and target families to evaluate the application domain of the prediction model, (ii) evaluation procedure, where nested cross-validation is preferred over the standard cross-validation, and (iii) prediction problem setting (i.e. whether the training and test sets of compound-target pairs share common drugs and targets, only drugs or targets, or neither, where the last is often the most challenging case) [43]. Obviously, the more comprehensive the information present in the databases, e.g. in terms of drug classes and target families, the better coverage the prediction algorithm will have. The predicted target activities should also be experimentally validated before the drugs are suggested for repurposing [44]. Accordingly, we recently organized an IDG-DREAM Challenge, where the teams used bioactivity data from ChEMBL, DTC, and BindingDB to make quantitative target activity predictions, which were later validated using subsequent experimental assays [42].
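The prediction problem settings in point (iii) can be sketched as a partition of drug-target pairs. This is an illustrative sketch only: the function name, the dictionary keys, and the toy identifiers are our assumptions. Pairs involving a held-out drug, a held-out target, or both are placed into separate test sets, so that performance can be reported per setting; the easiest setting (test pairs whose drug and target both occur in training) would instead be obtained by randomly holding out pairs from the training portion.

```python
def split_by_setting(pairs, test_drugs, test_targets):
    """Assign drug-target pairs to a training set and three kinds of test sets.

    pairs: iterable of (drug, target) tuples.
    test_drugs, test_targets: sets of held-out drugs and held-out targets.
    """
    split = {"train": [], "new_drug": [], "new_target": [], "both_new": []}
    for drug, target in pairs:
        drug_held_out = drug in test_drugs
        target_held_out = target in test_targets
        if drug_held_out and target_held_out:
            # Neither the drug nor the target is seen in training (hardest case).
            split["both_new"].append((drug, target))
        elif drug_held_out:
            # The target is seen in training, the drug is not.
            split["new_drug"].append((drug, target))
        elif target_held_out:
            # The drug is seen in training, the target is not.
            split["new_target"].append((drug, target))
        else:
            split["train"].append((drug, target))
    return split


# Toy example with hypothetical identifiers: a full 3 x 3 grid of pairs.
pairs = [(d, t) for d in ("d1", "d2", "d3") for t in ("t1", "t2", "t3")]
split = split_by_setting(pairs, test_drugs={"d3"}, test_targets={"t3"})
```

Reporting accuracy separately for each of these test sets makes it visible when a model only performs well on drugs and targets it has already seen.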

Article highlights
• AI-guided drug repurposing benefits from large drug-target binding affinity resources for compound off-target activity predictions
• Repurposing leads need to be further explored in cell-based pharmacogenomic resources for drug efficacy and toxicity predictions
• A wide variety of supervised machine learning algorithms have been developed for drug-target activity and drug response predictions
• There is a critical need for context-specific modeling of tissue-specific drug mode of action for more actionable drug repurposing applications
• Scattered location of heterogeneous preclinical pharmacogenomic data limits our ability to use these data in AI-based drug repurposing

Table 1. Drug/target data resources listing the number of compounds, number of targets, and the number of drug-target interactions. Mut, the database contains drug activities also for mutant proteins; Vis, implements network visualizations for drug-target interaction networks. *Data type A, contains only active drugs for the targets; B, contains quantitative bioactivity data for drug-target binding affinity; C, contains both active and inactive drug-target pairs. Table contents adapted with permission from Oxford University Press from review paper [11].

Cell line and patient-derived omics resources
Drug-target bioactivity information offers possibilities to make informed predictions of whether the explored compounds can modulate a given target, and to what extent, but this information is typically cell context independent. However, since the drug MoA is often highly cell context-specific, it is important to actually measure (or predict) the activity of the compound against the cell model or target using cell-based assays. Cell line omics resources contain drug response data along with multi-omics profiles for established cancer cell lines (in vitro models), whereas patient-derived resources include pharmacogenomic information on the patient primary cells tested against various drugs (ex-vivo models). Table 2 lists a selected set of drug response and omics resources, along with additional features, such as number of drugs, cell lines, patient samples, and whether the resource contains an API or drug response visualizations, useful for drug repurposing AI-applications.
The drawback of most of these resources is that they do not provide a programmatic API (except for GDSCtools), and that pharmacogenomic data typically come solely from one lab or study (except for CellMinerCDB and PharmacoDB that integrate data from multiple studies). However, these data are freely available either through a GUI (downloadable in many cases) or using batch queries. The patient-derived primary cell data are still limited in these resources, but at least PharmacoDB is currently extending to ex-vivo data as well. We do not consider here more complex preclinical models, such as patient-derived xenografts (PDXs) or other animal models, as the pharmacogenomic data from these models are still rather scarce for AI developments. However, the cell-based omics resources also enable one to predict patient responses to drug treatments, such as those available in The Cancer Genome Atlas (TCGA) resource; see Section 3.3.

Biological pathway information resources
Biological pathways facilitate the understanding of the inner workings of the cells and the cellular responses to drugs, and can therefore aid the drug repurposing efforts. For instance, mapping of the protein targets of drugs either to the same or orthogonal pathways may help to reveal the MoA of both multi-targeted monotherapies and combination therapies. However, various databases may contain different representations of the same biological pathways, which leads to variable results of statistical target pathway enrichment analysis and predictive models in the context of precision medicine [54]. In this section, we highlight six pathway databases that contain information on compound target pathways, along with their characteristics in terms of the number of proteins, compounds, pathways and interactions (Table 3). PathwayCommons [55] and KEGG Pathways [56] are currently the two most comprehensive databases in terms of the number of reactions or interactions. Four out of six pathway databases also provide programmatic access to the data using APIs, making them easy to use for systematic AI model development.
Table 2. Cell-based drug response and omics resources listing the number of compounds, number of established cell lines, and number of patient-derived primary cell samples. Vis, resource implements visualizations for compounds. Table contents adapted with permission from Oxford University Press from review paper [11].
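To illustrate why differing pathway representations can change the outcome of target pathway enrichment analysis, the sketch below computes a one-sided hypergeometric enrichment p-value, which is one common choice for such analyses (the text does not specify which test the cited studies used). All counts are made-up assumptions, and the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_targets, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of proteins in the background set.
    n_pathway:  number of background proteins annotated to the pathway.
    n_targets:  number of drug targets drawn from the background.
    n_overlap:  number of drug targets that fall in the pathway.
    """
    upper = min(n_pathway, n_targets)
    total = comb(n_universe, n_targets)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_targets - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# The same drug (10 targets, 3 of them in the pathway) scored against two
# hypothetical databases that define the "same" pathway with 50 vs. 120 proteins.
p_small_definition = enrichment_p_value(20000, 50, 10, 3)
p_large_definition = enrichment_p_value(20000, 120, 10, 3)
```

With the overlap held fixed, the broader pathway definition yields a larger (less significant) p-value, so the choice of pathway database can change which pathways are reported as enriched.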

Chemical structure and protein property data resources
The chemical structural descriptors and target protein properties provide important information for AI and ML models for drug repurposing. There are various online web-servers and toolkits to calculate chemical descriptors for drugs and target properties of proteins. For instance, ChemCPP calculates kernel functions between the compounds [61]. EDragon software computes more than 1600 topological and geometrical descriptors for the chemicals [62]. The Open Babel toolkit provides several useful features including substructure search and calculation of fingerprints of the chemicals [63]. RDKit provides features including 2D depiction, molecular serialization, fingerprint generation, and similarity analysis for the compounds [64]. Finally, PyDPI is a Python package that computes molecular descriptors for drugs and structural and physicochemical properties for proteins [65].
There are also web-tools that help to draw chemical structures and to compute physicochemical properties and chemical fingerprints. These tools have opened up various applications for in-silico drug-drug interaction prediction [66] and for drug toxicity prediction [67]. ChemSketch is a package to draw chemical structures including organics, organometallics, polymers, and Markush structures [68]. KNIME comprises features for molecule conversion into various formats, and for generation of signatures, fingerprints, and molecular properties [69]. PaDEL-Descriptor is software for calculating molecular descriptors and 10 different types of fingerprints [70]. BlueDesc is a free tool, which computes 36 different types of fingerprints [71]. However, most of the fingerprint calculation methods are derived from the following five fingerprints: MACCS, PubChem, FP2-based, Atom Pair, and ECFP4. Table 4 lists selected open-access databases that contain chemical structural information, such as InChIKeys and SMILES, and that implement options for structure or sub-structural searches either through a GUI or an API, which we find useful for in-silico drug repurposing.
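Once fingerprints are available, structural similarity between compounds is commonly scored with the Tanimoto coefficient. The sketch below is a toy illustration that represents each fingerprint as a set of on-bit indices; in practice the fingerprints would come from one of the toolkits listed above. The bit indices, compound names, and the convention of returning 0.0 for two empty fingerprints are our assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen here (assumption): two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / len(union)


def most_similar(query_fp, library):
    """Return (name, similarity) of the library compound closest to the query."""
    return max(
        ((name, tanimoto(query_fp, fp)) for name, fp in library.items()),
        key=lambda item: item[1],
    )


# Made-up on-bit indices for three hypothetical approved drugs.
library = {
    "drugA": {1, 4, 9, 15},
    "drugB": {2, 4, 9, 15, 21},
    "drugC": {3, 7, 30},
}
query = {2, 4, 9, 15}
best_name, best_score = most_similar(query, library)
```

Similarity rankings of this kind underlie the structure-based repurposing idea that compounds resembling a known active may share its targets.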

Algorithms for drug-target interaction predictions
To accelerate the costly and time-consuming experimental mapping approach to identify DTIs by means of biochemical experiments, various computational approaches have been developed over the past decade, providing a systematic means for prediction of potential DTIs [77][78][79]. Concomitant with the experimental drug-target discovery efforts that provide either quantitative or qualitative compound-target interaction data (see Table 1), computational tools are being built to predict activities against new molecular targets for drug repurposing. For instance, ML models are using orthogonal drug-target space deconvolution, where the molecular structures of both the drugs and targets help to guide the in-silico predictions [80,81]. Another research line has utilized crowdsourcing-based AI and ML methods to effectively predict target activities for kinase inhibitors [42]. Similarly, Cichonska et al. adopted pairwise multi-kernel learning to predict the compound-kinase target-binding affinities [82]. Extending to other target families, Li et al. predicted compound activity classes for enzymes, ion channels, G protein-coupled receptors (GPCRs), and nuclear receptors using substructure chemical fingerprints and a rotation forest classifier [83].
There are excellent review articles that provide a comprehensive overview of AI- and ML-based methods for  [88]. DGraphDTA utilizes graph neural networks to obtain deeper representations for drug-target activity prediction, based on structural information of both molecules and proteins, where two separate network graphs are built for the drug molecules and the proteins [89].
Recently, several deep learning methods have been developed for predicting DTIs, including a convolutional network model that first uses a graph convolutional network to learn the features for each drug-protein pair, and then, using these feature representations as inputs, utilizes a deep neural network to classify between positive and negative DTI classes [90]. These in-silico methods provide a deeper understanding of the factors affecting DTI prediction, and have opened novel strategies for computational drug repurposing.
Accurate DTI prediction has the potential to not only complement the experimentally mapped DTI networks but also to provide novel drug repurposing leads by extending the target space of already approved drugs [91]. There are also in-silico methods that make use of DTI mappings or predictions directly in the drug repurposing process. For instance, Mei et al. have proposed a multi-label learning framework to find new uses for approved drugs, and conversely to discover new drugs for known target proteins [92]. In their framework, each drug is treated as a class label and its target proteins as class-specific training data to train an L2-regularized logistic regression model. Stratified multi-label cross-validation showed that 84.9% of the known target proteins were correctly predicted at least for one drug, and the proposed framework correctly recognized 86.73% of the independent test DTIs from DrugBank. These results show that the proposed framework could generalize well in the large drug space without requiring the information of drug chemical structures and target protein structures. The recently introduced iDrug method integrates drug repositioning and DTI prediction into one coherent model via cross-network embedding [93]. The embedding approach provides a principled way to transfer knowledge across the drug-target-disease relationships, and in doing so, it enhances the prediction accuracy for both of the prediction tasks (i.e. DTI and drug-disease relationships). The performance of the iDrug method was tested on various real-world datasets, covering multiple disease types, hence making it widely applicable to repurpose drugs for several indications. For a more targeted application, Molecule Transformer-Drug Target Interaction (MT-DTI) is a pre-trained deep learning-based drug-target model to identify commercially available drugs that could act on viral proteins for the inhibition of SARS-CoV-2 [94]. Through a detailed analysis, the authors showed that atazanavir, an antiretroviral medication for treatment of HIV, proved to be the most potent drug with an inhibitory potency of Kd = 94.94 nM against SARS-CoV-2, followed by remdesivir (Kd = 113.13 nM), efavirenz (Kd = 199.17 nM), ritonavir (Kd = 204.05 nM), and dolutegravir (Kd = 336.91 nM).

Algorithms for molecular docking and molecular dynamic simulations
Molecular docking is a widely used in-silico method in structure-based drug design, due to its ability to predict the binding conformation of small molecule ligands to the appropriate target-binding site [95][96][97]. The drawback of molecular docking is that the 3D structures of many target proteins have not yet been resolved, and these are required for running the docking simulations. Furthermore, the accuracy of docking-based methods decreases in cases where the number of known ligands for a protein is not sufficient [98]. Regardless of these limitations, there are several examples of successful docking-based drug off-target activity predictions [99]. For instance, the antipsychotic agent thioridazine was found among 1500 FDA-approved compounds to possess anti-inflammatory activity by binding and inhibiting IκB kinase, which is critical for the NF-κB pathway [100]. Similarly, virtual docking accurately predicted inhibitory activity of five compounds from a collection of more than 1400 FDA-approved drugs against Pseudomonas aeruginosa quorum-sensing (population-wide virulence) mechanisms, with the antipsychotic agent pimozide displaying potent in vitro activity in inhibiting bacterial virulence gene expression [101]. Moreover, AI is also emerging as an increasingly accurate approach for predicting the 3D structures of proteins from their amino-acid sequences [102,103].
We recently implemented VirtualKinomeProfiler, an efficient computational platform that captures distinct representations of the chemical similarity space of the druggable kinome for speeding up the drug discovery and repurposing process for highly promiscuous kinase inhibitors [104]. An ensemble support vector machine (eSVM) algorithm enabled activity classification for >30 M compound-kinase pairs, using which we carried out in-silico activity predictions for >151 K compounds in terms of their drug repositioning and lead molecule potential. Experimental testing with biochemical assays validated 19 of the 51 predicted interactions, leading to a 1.5-fold increase in precision and 2.8-fold decrease in false-discovery rate, which demonstrated its potential to expedite the kinome-specific drug discovery process. There are also several other case studies, where structural information of chemicals has been directly utilized for drug repurposing applications in various target classes. For instance, CATNIP is an ML model for drug repurposing that requires only similarity information of the molecules based on their structural, target, or pathway information [105]. Another model utilized chemical fingerprint information to predict that 22 FDA-approved drugs have potential activities on heart failure, and experimentally confirmed 8 of the 22 cardioprotective activities in vitro [106].

Algorithms for cell and tissue-based drug response predictions
Once the target activity potential of a drug has been predicted or established, either by using DTI prediction algorithms or molecular docking methods, the next important prediction task involves the investigation of whether the drug has efficacy in a relevant cell context. This is critical because biochemical compound affinity and structure-based modeling provide only hypotheses of compound activity against a particular disease target, and these predictions need to be further investigated using a relevant disease model. In anticancer applications, cancer cell line models and patient-derived primary cells are widely used for such predictive purposes (see Table 2).
As an early community effort and an example for other in-silico precision oncology studies, the NCI-DREAM Drug Sensitivity Prediction Challenge benchmarked in 2013 a number of supervised ML algorithms based on genome-wide omics and drug response profiles of 53 human breast cancer cell lines [107]. Notably, the predictive models that made use of multiple omics profiles of the cancer cell lines had the best performance, suggesting that the genomic, transcriptomic, epigenomic, and proteomic profiles each provide complementary predictive signal for the cell-based drug response modeling. The best-performing approach was based on the Bayesian efficient multiple kernel learning (BEMKL) model [108], a kernelized regression model that makes use of multi-task and multi-omics learning, where the pairwise similarities of cell lines in terms of the multiple omics profiles are first represented as separate profile kernels, and a multiple kernel learning algorithm then calculates a combined kernel as the weighted sum of all profile-specific kernels. Finally, multi-task learning allows one to estimate the BEMKL model simultaneously for all the drugs as related prediction tasks.
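The kernel combination step can be sketched as follows. This is not the BEMKL implementation: BEMKL infers the kernel weights within a Bayesian model, whereas here the weights are fixed by hand, and the linear kernel, function names, and toy profiles are illustrative assumptions. The sketch only shows how profile-specific kernels over the same cell lines are merged into one combined kernel.

```python
def linear_kernel(profiles):
    """Pairwise dot-product similarities between cell lines (rows of `profiles`)."""
    return [
        [sum(x * y for x, y in zip(u, v)) for v in profiles]
        for u in profiles
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices: K = sum_m w_m * K_m."""
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical omics views of the same two cell lines (made-up values).
expression = [[1.0, 0.0], [0.0, 2.0]]
methylation = [[1.0, 1.0], [0.0, 1.0]]

k_expression = linear_kernel(expression)    # [[1.0, 0.0], [0.0, 4.0]]
k_methylation = linear_kernel(methylation)  # [[2.0, 1.0], [1.0, 1.0]]

# Hand-picked weights (assumption); a multiple kernel learning method would learn them.
combined = combine_kernels([k_expression, k_methylation], weights=[0.5, 0.5])
```

The combined kernel can then be passed to any kernel-based regression method, and the learned weights indicate how much each omics profile contributes to the prediction.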
After the DREAM Drug Sensitivity Prediction Challenge, hundreds of prediction algorithms have been developed for matching cancer cell omics features to the cell-based drug efficacies. Some common features of the best-performing methods can be inferred from two recent systematic analyses in cancer cell line datasets [109,110]. Both of these comparative analyses focused on multi-omics and multi-target learning approaches, and concluded that matrix-factorization and kernel-based methods performed best in drug response prediction across various cancer cell lines. More specifically, similarity-regularized matrix factorization (SRMF) approximates the drug response matrix by the product of two low-rank similarity matrices; one that uses the cell line omics profiles, and the other that is based on drug structural similarities [111]. Similarly, the pairwise multi-kernel learning (pairwiseMKL) method integrates heterogeneous cell line and chemical structure information into a single model, enabling the joint analysis of the kernel mixture weights for the different information sources [82]. Importantly, SRMF and pairwiseMKL methods showed robust and improved performance in various cell line datasets and in terms of different evaluation metrics [109,110].
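The low-rank idea behind matrix factorization methods can be sketched with a plain L2-regularized factorization fitted by stochastic gradient descent. This is a simplification, not SRMF itself: the similarity-based regularization terms that tie the factors to cell line and drug similarities are omitted, and the function names, hyperparameters, and toy response matrix are our assumptions.

```python
import random


def factorize(response, rank=1, epochs=1000, lr=0.02, reg=0.01, seed=0):
    """Fit response[i][j] ~ dot(U[i], V[j]) on observed entries; None marks missing."""
    rng = random.Random(seed)
    n_drugs, n_cells = len(response), len(response[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cells)]
    observed = [
        (i, j)
        for i in range(n_drugs)
        for j in range(n_cells)
        if response[i][j] is not None
    ]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = response[i][j] - sum(u * v for u, v in zip(U[i], V[j]))
            for f in range(rank):
                u_if, v_jf = U[i][f], V[j][f]
                # Gradient step on the squared error with an L2 penalty.
                U[i][f] += lr * (err * v_jf - reg * u_if)
                V[j][f] += lr * (err * u_if - reg * v_jf)
    return U, V


def predict(U, V, i, j):
    """Predicted response of drug i in cell line j."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Toy drug-by-cell-line response matrix (made-up, exactly rank one);
# the response of drug 1 in cell line 2 is unmeasured.
response = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, None],
    [3.0, 6.0, 9.0],
]
U, V = factorize(response)
predicted_missing = predict(U, V, 1, 2)
```

In this toy matrix the unmeasured entry is implied by the rank-one structure of the observed entries, which is the property that matrix factorization exploits to impute untested drug-cell line combinations.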
However, there are still some critical missing pieces that need to be addressed in these drug efficacy prediction methods when used for drug repurposing. The first challenge is how to identify panels of multi-omics features that are predictive of the drug efficacy in the target cancer type. While matrix factorization and kernel-based methods often provide high predictive accuracy, they cannot directly identify clinically actionable biomarkers among the genome-wide omics profiles [112]. Toward feature selection, the use of drug-target activity information has been shown to improve the predictive performance and interpretability of drug efficacies [113]. A recent systematic analysis demonstrated how rather simple feature selection methods enabled identifying relatively small feature panels using prior information on targets and pathways of molecularly targeted drugs, whereas wider feature sets were required for drugs affecting general cellular mechanisms (i.e. standard chemotherapies) [114]. These results indicate that there are both target-based and non-target-based features that can be predictive of specific drug efficacies in various cancer types (see Figure 1).
The next challenge is how to best predict treatment outcomes in cancer patients (e.g. clinical in vivo responses to treatments), rather than merely drug efficacies in established cell lines (in vitro responses), as the former enables straightforward translational precision oncology applications and drug repurposing opportunities. A recent systematic analysis investigated the importance of a number of modeling components for the clinical treatment response prediction of cancer patients [115]. As expected, the sample size of the patient response data was found to be an important determinant for the predictive modeling, along with experimental noise within the data that can easily deteriorate the models' robustness. Rather surprisingly, the in vitro drug treatment profile was not among the most predictive features when predicting the clinical response of the same drug in actual cancer patients. These results indicate that even cell line models of high accuracy do not necessarily translate to accurate predictions of drug response processes in cancer patients in vivo [115].
For drug repurposing, there is an added need for accurate tissue-specific drug efficacy predictions to study the efficacy of a drug in a relevant tissue-of-origin. Recent models, such as tissue-guided LASSO, make use of information on samples' tissue-of-origin to improve in vivo prediction performance [116]. It was shown that tissue-guided LASSO improves the clinical predictions and was able to distinguish resistant and sensitive patients for selected drugs. Furthermore, the method identified genes associated with the drug response, including known targets and pathways involved in the drugs' MoA. Surprisingly, the use of information on the tissue-of-origin did not improve the prediction results, suggesting that there is still room for improvement for tissue-aware drug efficacy predictions. We further argue that one needs to consider several drug response-informative gene sets when predicting the potential efficacy and toxicity of specific drugs, some of which are illustrated in Figure 1. Finally, one needs to avoid inhibiting so-called anti-targets, i.e. proteins that are involved in normal cellular processes, which may lead to severe toxic side effects if modulated.

Conclusion
This review described the use of supervised ML and AI models, with accompanying data resources, for three levels of prediction tasks related to the drug repurposing process: first, biochemical bioactivity predictions for new DTIs; second, cell-based compound response predictions for drug-cell line/patient interactions; and third, drug repurposing predictions by means of novel drug-disease relationships. Each of these levels is important for understanding the MoA of the repurposed drugs in terms of their on/off target potencies and tissue-based response profiles. In addition to the identified protein targets, repurposed drugs may reveal additional molecular targets and pathways that can be further exploited therapeutically using other drugs or their combinations. Polypharmacological effects originating either from combination therapies or multi-targeted drugs are important for treating complex diseases, including many cancers and viral infections, but the potential toxicity of polytherapies needs to be carefully predicted using computational and experimental models. We also note that the entire field of drug repurposing is at risk of publication bias in the sense that much of the content of the various data and information sources is derived from published research; this introduces biases, e.g. well-known drugs tend to have more publications, and the evidence for them is therefore weighted more heavily than that for lesser-studied drugs.
Figure 1. Schematic illustration of overlaps between cancer-related gene sets. There are both target-based and non-target-based features that can be predictive of specific drug efficacies in various cancer types. The cancer genes and protein targets should be studied separately for each tissue type (e.g. breast cancer) and inhibitor class (e.g. HER2 inhibitors). Selective efficacies are preferred in the repurposing predictions, as tissue-of-origin-independent targets may lead to toxic side effects.

Expert opinion
In this section, we highlight our opinion on drug repurposing specifically in cancer research, where large-scale cancer sequencing efforts are being carried out to identify genomic aberrations specific to each tumor type. These genomic data are invaluable for matching patients with drug therapies that target specific aberrations, either using drugs within their intended medical indications or repurposed drugs. However, even though the extent of genomic testing and the diversity of our pharmacological portfolio are constantly increasing, we argue that genomics alone is currently insufficient to identify therapeutic options for the majority of patients, especially for those with advanced disease, cases without known cancer drivers, and rare cancer types. The scarcity of clinical patient data and the focus on genetic aberrations as the primary drug targets may further limit the accuracy of those drug repurposing approaches that rely solely on genomics-based information. We and others believe that this limitation can be partly addressed by functional testing of cancer patient cells exposed to a large number of both targeted and conventional therapies, using drug testing assays in patient-derived cell models ex vivo, with later verification in patient-derived organoids (PDO) or xenograft (PDX) models in vivo [117][118][119]. Cell-based drug testing enables identification of patient-selective target activities, rather than broadly toxic effects that often lead to severe side effects. Compared to the genomics-only approach, predictions from drug testing are often pharmaceutically actionable. We nevertheless believe that integrating mutation profiling with drug sensitivity testing leads to improved, and sometimes unexpected, drug repurposing options (e.g. axitinib for CML and ALL [17]).
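The distinction between patient-selective and broadly toxic responses can be illustrated with a simple selectivity score. The sketch below assumes a drug sensitivity score in which higher values mean stronger response, and it defines selectivity as the patient score minus the median score across reference samples (for example, healthy controls). This definition, the function names, and the placeholder drug labels are assumptions made for illustration, not a scoring scheme taken from the cited studies.

```python
from statistics import median


def selective_scores(patient, controls):
    """Return {drug: patient score - median control score}.

    `patient` maps drug -> score; `controls` is a list of such mappings.
    Drugs without any control measurement are skipped, since their
    selectivity cannot be judged.
    """
    result = {}
    for drug, score in patient.items():
        reference = [c[drug] for c in controls if drug in c]
        if reference:
            result[drug] = score - median(reference)
    return result


def rank_selective(patient, controls, min_selectivity=0.0):
    """List drugs whose selectivity exceeds the cutoff, most selective first."""
    scores = selective_scores(patient, controls)
    hits = [(drug, s) for drug, s in scores.items() if s > min_selectivity]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy values only: drug_B is active in the patient but equally active in controls,
# so it is treated as broadly effective rather than patient-selective.
patient_sample = {"drug_A": 30.0, "drug_B": 25.0, "drug_C": 5.0}
control_samples = [
    {"drug_A": 5.0, "drug_B": 24.0},
    {"drug_A": 7.0, "drug_B": 26.0},
]
selective_hits = rank_selective(patient_sample, control_samples)
```

A ranking of this kind only flags candidates. Whether a selective response translates into a repurposing opportunity still depends on the target and genomic evidence discussed above.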
In addition to the data from in vitro or ex vivo model systems (Table 2), there is also a need for flexible computational models that can speed up the early investigation of both the therapeutic and toxic effects of small molecules before entering into lengthy and costly animal or clinical studies. Rather than using single outcomes to rank the in silico predictions, we argue that it is important to carefully dissect various readouts, such as those quantifying efficacy, toxicity, or synergy of multi-targeting mono- and combinatorial therapies in the preclinical model systems, when developing safe and effective therapeutic regimens for cancers and other diseases [120]. The use of both in silico and preclinical pharmacogenomic predictions can greatly reduce the extensive cost, time and risks associated with the drug discovery process before entering clinical trials. While a large number of in silico drug repurposing approaches have been developed, including AI and ML models, it remains unclear how useful these methods are in producing clinically efficacious repositioning hypotheses. Most computational studies perform analytic validation, where the prediction results are compared to existing biomedical knowledge. The repositioning literature, however, shows no consistent practices for validating the methods [121]. To address this unmet need, Brown and Patel reviewed the computational repositioning literature, focusing on the studies in which authors claimed to have validated their work. Their analysis revealed widespread variation in the types of strategies, the predictions made, and the databases used as 'gold standards' [121]. This suggests that further developments are needed to make in silico drug repurposing predictions more actionable.
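As a concrete example of analytic validation, predicted drug-indication pairs can be compared against a set of known pairs using ranking metrics. The sketch below is one possible choice among many and is not a practice prescribed by [121]. It assumes that predictions are given as scored (drug, disease) pairs and that the 'gold standard' is a set of known pairs. The function names and toy identifiers are hypothetical.

```python
def precision_at_k(scored_pairs, gold_standard, k):
    """Fraction of the k top-scored pairs that appear in the gold standard."""
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in gold_standard)
    return hits / len(ranked)


def auroc(scored_pairs, gold_standard):
    """Probability that a known pair outscores an unknown pair (ties count 0.5)."""
    positives = [s for pair, s in scored_pairs if pair in gold_standard]
    negatives = [s for pair, s in scored_pairs if pair not in gold_standard]
    if not positives or not negatives:
        raise ValueError("need at least one known and one unknown pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predictions: ((drug, disease), score). Identifiers are placeholders.
predictions = [
    (("drug_1", "disease_X"), 0.9),
    (("drug_2", "disease_X"), 0.8),
    (("drug_3", "disease_X"), 0.4),
    (("drug_4", "disease_X"), 0.1),
]
known_pairs = {("drug_1", "disease_X"), ("drug_3", "disease_X")}
```

Note that pairs absent from the gold standard are treated here as negatives, although they may simply be untested. This is one reason why such metrics depend strongly on the chosen reference database.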
However, the heterogeneous preclinical data are currently housed in various locations. Drug-target bioactivity profiles are being collected in drug/target databases (Table 1), which provide insights into the potential use of small-molecule compounds to modulate various on- and off-targets, including mutant targets and wild-type proteins. Cell-based drug response phenotypic data (Table 2) provide further evidence that the compound is actually effective in a given cell context or patient-derived sample (and not broadly effective in many cell types, which may be a sign of toxic effects). Finally, drug-target potencies and gene-drug associations can be linked to tumor genomic profiles and associated lifestyle and clinical data to make informed decisions about therapeutic efficacies, hence leading to translationally actionable drug repurposing opportunities. Because the preclinical pharmacogenomic data are scattered, these information sources are currently available in formats that are not interoperable with each other, which greatly limits our ability to use these data in a systematic manner in AI-based predictive models. In the past, the lack of common standards for cancer models and chemical compounds, as well as of meta-data for quantitative drug response profiles, further prevented the wider translational re-use of such data. Recent data harmonization efforts, such as DrugTargetCommons [122] for compound-target activities, PharmacoDB [123] for cell-based drug response profiles, as well as Cell Model Passports [124] and Xeva [125] for in vitro, ex vivo and in vivo models, are likely to make their integrated use in AI models more straightforward.
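The practical benefit of harmonized identifiers can be shown with a small join between compound-target activities and cell-based responses. The sketch below assumes that both sources key their records by a standard compound identifier (an InChIKey is used here), and that potency is reported on a scale where higher values mean more potent. The record fields, function names, and placeholder keys are hypothetical and do not reflect the schema of any of the resources named above.

```python
from collections import defaultdict


def normalize_key(inchikey):
    """Canonicalize a compound identifier so records from different sources match."""
    return inchikey.strip().upper()


def integrate(bioactivities, responses):
    """Join target activities and cell-based responses on the compound key.

    Only compounds present in both sources are returned.
    """
    targets = defaultdict(dict)
    for rec in bioactivities:
        key = normalize_key(rec["inchikey"])
        # Keep the most potent (highest) value if a target is reported twice.
        previous = targets[key].get(rec["target"])
        if previous is None or rec["potency"] > previous:
            targets[key][rec["target"]] = rec["potency"]

    profiles = defaultdict(list)
    for rec in responses:
        key = normalize_key(rec["inchikey"])
        profiles[key].append((rec["cell_model"], rec["tissue"], rec["score"]))

    shared = sorted(set(targets) & set(profiles))
    return {k: {"targets": targets[k], "responses": profiles[k]} for k in shared}


# Placeholder records; "KEY-ONE" stands in for a real InChIKey.
bioactivity_records = [
    {"inchikey": " key-one ", "target": "T1", "potency": 7.5},
    {"inchikey": "KEY-ONE", "target": "T1", "potency": 8.0},
    {"inchikey": "KEY-TWO", "target": "T2", "potency": 6.0},
]
response_records = [
    {"inchikey": "key-one", "cell_model": "model_1", "tissue": "breast", "score": 21.0},
]
merged = integrate(bioactivity_records, response_records)
```

Without a shared identifier and agreed units, even this simple join would require manual curation of compound names, which is the interoperability problem described above.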
Although genomic sequencing and cell-based drug testing technologies continue to improve, wider adoption of genomics-based precision oncology and functional drug repurposing in the clinic has been held back by several logistic, regulatory and financial issues. For instance, even though the off-target potencies of approved drugs should lead to rather straightforward drug repurposing opportunities, it is often unclear to the academic researcher how to deal with approvals of off-label use of drugs or investigational molecules that show potency in patient-derived samples ex vivo, perhaps in combination with agents from other pharma companies. At the regulatory level, new types of clinical trials may be needed to get molecules approved, sometimes for very narrow and specific indications, e.g. basket trials for molecularly targeted patient subgroups, or umbrella trials for rare cancer types. Furthermore, sharing and re-use of the pharmacogenomic data for new research or translational purposes is often complicated by uncertainties at the legal or ethical level, as legislation diverges between countries. For translational applications, working with early-phase diagnostic patients, rather than with late-stage relapsed cases, should lead to improved and sometimes also more durable outcomes. For routine cancer diagnosis and prognosis, cell-based drug sensitivity testing ex vivo cannot be implemented for each cancer patient, which calls for accurate response-predictive biomarkers inferred, for instance, by computational AI models. Solving these and other future challenges requires a collaborative and multidisciplinary effort between experimental scientists, computational biologists and clinicians or translational researchers.
A recent comprehensive review of the time and cost expenditures of drug repurposing clinical trials in acute myeloid leukemia (AML) debunked the common dogmas associated with drug repurposing, namely that (1) drug repurposing saves time, (2) phase I clinical trials can be skipped, and (3) repurposed drugs are safe as their toxicity profile is known [126]. The realities are much more complex; in particular, the toxicities of drug combinations can be unexpected and should not be underestimated. For example, combining the cholesterol medication pravastatin with idarubicin and cytarabine resulted in multi-organ failure in AML patients [126]. Thus, it remains vital to develop better AI and ML models to predict combinatorial toxicities. Furthermore, there is a need to improve our capacity to understand the effects of tumor subclonality and adaptive responses to drug treatments, repurposed or otherwise. Notably, a recent report featuring single-cell DNA sequencing of 123 primary AML samples revealed simultaneous co-evolution of several independent but leukemogenic tumor subclones in each patient sample [127], implying a requirement for multi-targeting treatments for lasting tumor control using either drug combinations or promiscuous drugs [128]. Fortunately, computational tools are being developed to help us decipher the multiple cellular drug targets and their associated pathways, with the aim of better predicting toxicities and targeting multiple subdiseases in the patient. Open-access, crowdsourced web-based resources that complement missing drug activity annotations [122], combined with AI-based predictive models and analytic visualizations, should support manual efforts with automated data mining approaches and lead toward more systematic and accurate drug repurposing leads.
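The reasoning that several co-existing subclones call for multi-targeting treatments can be phrased as a coverage problem. The sketch below is a deliberately simplified illustration under strong assumptions: each subclone is taken to be controlled if at least one selected drug hits one of its assumed vulnerabilities, and a greedy heuristic picks the drug that covers the most remaining subclones. It ignores dosing and, importantly, the combinatorial toxicities stressed above. All names and toy profiles are hypothetical.

```python
def greedy_combination(drug_targets, subclone_targets, max_drugs=3):
    """Greedily select drugs until every subclone is covered or the budget is spent.

    drug_targets: {drug: set of targets the drug inhibits}
    subclone_targets: {subclone: set of targets assumed to control that subclone}
    Returns (chosen drugs in selection order, set of subclones left uncovered).
    """
    uncovered = set(subclone_targets)
    chosen = []
    while uncovered and len(chosen) < max_drugs:
        best_drug, best_covered = None, set()
        for drug in sorted(drug_targets):  # sorted for deterministic tie-breaking
            if drug in chosen:
                continue
            covered = {
                clone for clone in uncovered
                if drug_targets[drug] & subclone_targets[clone]
            }
            if len(covered) > len(best_covered):
                best_drug, best_covered = drug, covered
        if best_drug is None:
            break  # no remaining drug hits any uncovered subclone
        chosen.append(best_drug)
        uncovered -= best_covered
    return chosen, uncovered


# Toy profiles: drug_A is a multi-target ("promiscuous") compound.
toy_drug_targets = {
    "drug_A": {"T1", "T2"},
    "drug_B": {"T3"},
    "drug_C": {"T2"},
}
toy_subclones = {
    "clone_1": {"T1"},
    "clone_2": {"T2"},
    "clone_3": {"T3"},
}
combination, remaining = greedy_combination(toy_drug_targets, toy_subclones)
```

In this toy case, a promiscuous drug covers two subclones and a second drug covers the third. A realistic model would additionally penalize combinations with predicted overlapping toxicities.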
We also note that many computational repurposing predictions are mechanistic or statistical only, and will require separate evaluation for specific medical indications and patient populations. It is well known, for instance, that drug metabolism and pharmacodynamics are influenced by gender, age, concomitant medications and food intake, as well as underlying physiological states, and thus drug repurposing from one indication to another still necessitates a thorough understanding of the individual and disease-specific clinical safety parameters [129,130]. The FDA maintains both the 'passive' postmarketing pharmacovigilance database FAERS (FDA Adverse Event Reporting System) and the 'active' Sentinel System, which collect information on adverse events that may occur in patients outside clinical trials in the long term. In cancer treatment, for instance, genetic alterations that may negatively or positively influence drug efficacy in the malignant tissue are being collected in databases such as OncoPDSS [131], but the germline changes, and the epigenetic and non-genetic physiological states that impact efficacy and safety outside the tumor context, have not been similarly annotated. Yet, single nucleotide variation and other genetic alterations can combine with physiological states to alter drug responses. For instance, single nucleotide variants in drug-metabolizing enzymes of the CYP450 cytochrome family, such as CYP2C19, drastically influence both the efficacy and safety of several drugs, such as the antiplatelet agent clopidogrel. Taken together, while the process of drug repurposing can be initiated through drug-target or pathway interactions, the actual clinical translation will depend on several additional biological and physiological checkpoints.

Table 1. Drug-target interaction resources for target activity predictions.

Table 2. Cell-based pharmacogenomic resources for drug efficacy predictions.
A genome-scale library that catalogs transcriptional responses to chemical and genetic perturbation. CMAP contains 1 M response profiles resulting from perturbations of multiple cell types.
A web-application assembling the largest in vitro drug screens in a single database, allowing users to easily query the harmonized data from multiple studies released to date.

Table 3. Pathway resources for understanding compounds' mode of action. Table contents adapted with permission from Oxford University Press from review paper [11].
EXPERT OPINION ON DRUG DISCOVERY
DTI prediction. For instance, Chen et al. [84] categorized the learning methods into nearest neighbor methods, bipartite local models, matrix factorization methods, and semi-supervised methods, and discussed the pros and cons of the method classes. There are also reviews on various classes of DTI prediction methods; for instance, Sachdev et al. provided a review on feature-based chemogenomic methods for DTI prediction [85], and Wu et al. discussed the pros and cons of network-based methods for predicting DTIs [86]. They further sub-divided network-based methods into categories such as network-based inference (NBI) methods, similarity inference methods, random walk-based methods, and other network-based methods. As specific examples of network methods, DTiGEMS+ is a computational approach to predict DTIs using graph mining and similarity-based techniques [87]. Mongia et al. proposed a method that is based on multi-graph regularized nuclear norm minimization to identify interactions between drugs and target proteins from three inputs: known DTI network, similarities over drugs, and those over targets

Table 4. Chemical structure databases using InChIKey searches or structure drawings.