Drug repurposing for coronavirus (COVID-19): in silico screening of known drugs against coronavirus 3CL hydrolase and protease enzymes

Abstract In December 2019, COVID-19 epidemic was described in Wuhan, China, and the infection has spread widely affecting hundreds of thousands. Herein, an effort was made to identify commercially available drugs in order to repurpose them against coronavirus by the means of structure-based virtual screening. In addition, ZINC15 library was used to identify novel leads against main proteases. Human TMPRSS2 3D structure was first generated using homology modeling approach. Our molecular docking study showed four potential inhibitors against Mpro enzyme, two available drugs (Talampicillin and Lurasidone) and two novel drug-like compounds (ZINC000000702323 and ZINC000012481889). Moreover, four promising inhibitors were identified against TMPRSS2; Rubitecan and Loprazolam drugs, and compounds ZINC000015988935 and ZINC000103558522. ADMET profile showed that the hits from our study are safe and drug-like compounds. Furthermore, molecular dynamic (MD) simulation and binding free energy calculation using the MM-PBSA method was performed to calculate the interaction energy of the top-ranked drugs. Communicated by Ramaswamy H. Sarma


Introduction
Coronavirus disease  has become an important public issue across the globe since December 2019. As of 12th of April 2020, more than 1.79 million cases have been reported in 210 countries and territories (Worldometer, 2020). It affects people worldwide and there is no vaccine yet for this virus. Certain types of pneumonia are implicated to the new coronavirus, which is considered a big threat to global public health. There is an urgent need to develop potent anti-COVID-19 agents for the prevention of the outbreak and stop viral infections.
Repurposing of known small molecules seems to be very efficient way in order to develop potent drugs to combat coronavirus in this short time period. Recently, a number of efforts were made to design novel inhibitors or employ drug repurposing approach to identify anti-COVID-19 drugs, which can act as promising inhibitors against coronavirus protease Sarma et al., 2020). The known coronavirus 3-chymotrypsin-like protease (3CLpro), also known as Mpro, is the main protease, which is required for proteolytic maturation of the corona virus. This Mpro have an essential role in the immune regulation and cleaving the polyproteins pp1a and pp1ab, which making them attractive and important targets for anti-COVID-19 drugs (Zhou et al., 2019). Functional proteins such as RNA polymerase, endoribonuclease and exoribonuclease are generated by cleavage of pp1a and pp1ab polyproteins by Mpro . Therefore, targeting Mpro enzyme will inhibit the viral maturation and enhance the host innate immune response against COVID-19. 3D crystal structure of 3CL hydrolase from specific coronavirus (PDB ID: 6LU7) is recently reported in the public domain. This could be an excellent target to be used to screen small molecule libraries to inhibit the cleavage of the viral polyprotein and stop the spread of infection. In the Protein Data Bank (PDB) archive, there are also some proteins whose sequence identity at least 90% similar to the COVID-19 coronavirus 3CL hydrolase (Mpro). These proteins structures include bound inhibitors, which could lead to repurpose novel drug against coronavirus 3CL hydrolase. Recent study has shown that the cellular transmembrane protease serine 2 (TMPRSS2) is used by COVID-19 for the purpose of entering the host cells, and blocking TMPRSS2 might offer a promising treatment option and prevent the virus from entering the host cells (Hoffmann et al., 2020). Transmembrane serine proteases are connected to some viral respiratory infections, where they facilitate the entry of the virus into the lungs (Shulla et al., 2011). TMPRSS2 belongs to the serine protease transmembrane family type II, and it was recognized by its involvement in the cleavage of the influenza virus hemagglutinin protein in epithelial cells (B€ ottcher et al., 2006). In addition, studies have shown that TMPRSS2 can drive the cleavage spike protein, which is a coronavirus fusion glycoprotein. Spike proteins on the host cell surface are activated by TMPRSS2 to facilitate the virus cell-membrane fusion and the entry of the virus (Gierer et al., 2013;Matsuyama et al., 2010). Targeting TMPRSS2 in some animal studies decreased the pathological severity of influenza virus infection (Iwata-Yoshikawa et al., 2019). Hence, TMPRSS2 is an attractive target for designing and developing antiviral drugs (Laporte & Naesens, 2017). Herein, our study aims to help in coronavirus inhibition in one of two ways, preventing the virus from entering the host cells, and blocking the virus maturation inside the host cells. Computer-aided drug design (CADD) has been used for the identification of potent inhibitors against coronavirus (Berry et al., 2015;Oany et al., 2014). In this study, we have applied virtual screening approach, homology modeling of human TMPRSS2 enzyme, molecular docking study and ADMET profile analysis in order to identify novel and potential inhibitors against COVID-19 ( Figure 1).

Protein and ligand setup
The recent resolved crystal structure of 3CL hydrolase enzyme was retrieved from the Protein Data Bank (PDB ID: 6LU7) with a resolution of 2.16 Å. The enzyme has one chain with a total of 306 amino acids. The enzyme was prepared at physiological pH (7.4) using "Prepare Protein" protocol in BIOVIA Discovery Studio 4.5 (DS 4.5) and missing residues insertion was allowed if found. Water molecules and the inhibitor were removed from the structure. For the purpose of repurposing of known inhibitors, a total of 4,500 approved drugs were downloaded in SDF file format from ChEMBL (https://www.ebi.ac.uk/chembl/), DrugBank (https://www.drugbank.ca/) and Selleckchem (https:// www.selleckchem.com/) databases. In addition, a total of 30,000 drug-like molecules were downloaded from ZINC15 database (https://zinc15.docking.org/). All small molecules were prepared and protonated at physiological pH (7.4) and their geometry was minimized using DS 4.5.

Homology modeling
Lack of information of crystal structures of particular proteins has delayed and obstructed the drug industry. Homology  modeling is widely applied to create a reliable structure of a protein using its own amino acid sequence (Vyas et al., 2012). To date, TMPRSS2 enzyme structure has not been resolved yet. Therefore, Basic Local Alignment Search Tool (BLAST) (https://blast.ncbi.nlm.nih.gov/Blast.cgi) search was carried out to identify the most suitable template for our target (TMPRSS2). According to the results, the crystal structure of human transmembrane protease serine (DESC1) showed high identity of 41%, 187 Bit-Score and 2.55333e-55 E-value with the query target. Consequently, the 3D structure of DESC1 was retrieved from PDB website (PDB ID: 2OQ5, 1.61 Å) (Kyrieleis et al., 2007). TMPRSS2 sequence in FASTA file format (UniProt ID: O15393) was downloaded from UniProt (http://www.uniprot.org/). The template and the target sequences were aligned using "Align Sequences" in DS 4.5. Twenty models were created using "Homology Modeling" protocol in DS 4.5 using MODELLER plug-in (Webb & Sali, 2016). MODELLER was then used to verify the 20 built models. The best model was found to be M0001 with a Normalized DOPE score of À1.106201. The created homology modeling was then minimized using the "Clean Geometry" tool provided by DS 4.5, which minimizes the energy under CHARMM forcefield. Additionally, the protein was prepared using "Prepare Protein" protocol and protonated at pH, 7.4. M0001 model was then refined using 3Drefine web server (http://sysbio.rnet.missouri.edu/3Drefine/) (Bhattacharya et al., 2016).

Structure-based virtual screening
Structure-based virtual screening refers to in silico identification of potential chemical molecules out of large number of compound libraries, which have high affinity to proteins of known structure, based on the binding of the small molecule with the protein binding pocket (Sliwoski et al., 2014). To perform the virtual screening approach in our study, AutoDock Vina was used. AutoDock Vina offers an accurate and high performance docking tool, which relies on both empirical and knowledge-based scoring functions (Trott & Olson, 2009). The docking parameters for both Mpro and TMPRSS2 are given in Table 1. AutoDock Vina was used to assess the binding affinity of the native inhibitor (N3) obtained from the crystal structure of Mpro, as well as the binding affinity of selected known inhibitors of TMPRSS2 retrieved from the literature (Sielaff et al., 2011). The binding energy threshold was based on the average binding energy scores of docked known inhibitors against Mpro and TMPRSS2, which found to be À6.2 kcal/mol and À7.2 kcal/ mol, respectively. In the present study, the threshold was set to À8 kcal/mol due to the fact that hit compounds were expected to show lower binding energy scores, and to narrow down the search space. Selection criterion was set to ensure that the virtual screening will end up only with those molecules with the highest binding affinities to their respective targets. A total of 3451 molecules satisfied our selection criterion (DG À8 kcal/mol) and showed the highest binding affinity against Mpro and TMPRSS2 enzymes, respectively.

Molecular docking study
The hits compound from the virtual screening study were further docked into the studied targets using AutoDock 4.2 (Morris et al., 2009). Grid box mapping parameters are shown in Table 2. Grid box parameters were set to cover the whole binding pocket and its adjacent residues. Twenty million energy evaluations were performed for each compound with  a total of twenty runs using AutoDock 4.2. Lamarckian Genetic Algorithm was used and Gasteiger partial charges were added onto all studied proteins.

Electrostatic potential calculation
Electrostatic potential was predicted using DelPhi program included in DS 4.5, which employs the Poisson-Boltzmann formula on a cubic lattice using the finite-difference technique to calculate the electrostatic charge distribution of TMPRSS2 and Mpro enzymes (Nicholls & Honig, 1991).

ADMET profile and drug-like prediction
Absorption, distribution, metabolism, excretion and toxicity (ADMET) profile has a great importance in drug industry and is widely used in CADD to reduce undesired effects. In our study, web-based tools admetSAR (Yang et al., 2019) and  SwissADME (Daina et al., 2017) were used to predict the drug-likeness and ADMET profile.

Molecular dynamic (MD) simulation
Molecular dynamic approach is widely used to assess atoms behavior, structural stability and to study the conformational changes on atomic level. Herein, top-ranked compounds, homology model of TMPSS2 as well as the crystal structure of Mpro enzyme were subjected to MD simulation by nanoscale molecular dynamics (NAMD) software (Phillips et al., 2005), and the temperature was set to 303 K. Configuration files for MD simulations were generated by CHARMM-GUI website (http://www.charmm-gui.org/) (Jo et al., 2008). Ligand parameterization was performed using CHARMM General Force Field (CGenFF) web-based tool (https://cgenff. umaryland.edu/). All systems were solvated using transferable intermolecular potential water molecules (TIP3P) model (Boonstra et al., 2016). MD simulations were performed in two steps; equilibration under NVT ensemble for 5 ns timescale, where the systems' energy was minimized for 10,000 steps, and production for 50 ns timescale under the NPT ensemble.

Binding free energy calculation
The binding free energy (DG bin ) between protein and ligand is controlled by three factors; the gas-phase free energy (DG MM ), the solvation free energy (DG sol ) and the change in the system entropy (ÀTDS), which can be calculated by molecular mechanics Poisson-Boltzmann Surface Area (MM-PBSA) module, as expressed in Equation (1) (Kollman et al., 2000): In the present study, MM-PBSA calculations were performed using Calculation of Free Energy (CaFE) tool to calculate the interaction energy between the ligand and the residues of the protein active site (Liu & Hou, 2016). In the MM-PBSA calculations, the external dielectric constant was set to 80.0, the internal dielectric constant was 1.0 and the reciprocal of grid spacing of 1.0 Å was employed.

Homology modeling
Out of the 20 models generated by MODELLER, we selected model M0001, which had the lowest Normalized DOPE score (À1.106201). Sequence alignment for M0001 and the human DESC1were performed unsing DS 4.5, and the sequence identity was 39.4%, while sequence similarity was 61% (Figure 2). In fact, if the target and template sequences share 30% and more of amino acid identity, the homology modeling is considered to be reliable and successful (Xiang, 2006). Structural alignment of our homology modeling structure (target) with the 3 D structure of DESC1 (template) was performed using DS 4.5, and the calculated RMSD was 0.40 Å, which suggests the high similarity between the target and the template (Figure 3). Our model (M0001) was verified using the Web-based version of Protein Structure Analysis tool, ProSA. ProSA is used to calculate the z-score of a specific model and correlate this score to those calculated scores from all publicly available structures on PDB website. Accordingly, the zscore of our model M0001 was -6.69, which is in the range of all PDB conformations (Figure 4a). Moreover, the energy plot describes amino acid position depending on its local energy, where negative values relate to the accuracy of the structure (Figure 4b). In addition, PROCHECK web-based tool was used to generate Ramachandran plot to assess the energetically allowed regions of our model (Figure 4c). 83.8% of the residues are in the most favored regions, 15.2% are in additional allowed regions and 1.0% of the residue are in disallowed regions. ERRAT (https:// servicesn.mbi.ucla.edu/ERRAT/) was used to calculate the overall quality factor (OQF) for non-bonded atomic interactions, and generally any structure with OQF of 50% and more is considered as a high quality model. For our homology modeling, the overall quality factor calculated by the ERRAT server was 62% ( Figure 5).

Structure-based virtual screening and molecular docking results
Calculated binding energies and estimated inhibition constants of the top-ranked ligands retrieved in our study are given in Table 3.

Mpro potential inhibitors
Our molecular docking study revealed two of the approved drugs Talampicillin and Lurasidone with the highest binding affinities to Mpro enzyme. Talampicillin was found to have a binding energy score of À11.17 kcal/mol and an inhibition constant of 6.49 nM. Talampicillin is a prodrug of ampicillin, an antibacterial drug with a high potency against gram-negative bacteria. Talampicillin formed conventional and carbon hydrogen bonds with residues Gly143, Glu166, Gln189 and Gln192, p-r bond with His41, two alkyl interactions with the residues Met165 and Met49, two p-alkyl bonds with the residues Cys145 and Pro168, a p-stacked interaction with Leu167 and many van der Waals interactions were found as well (Figures 6a, 7a). Lurasidone was the secondbest candidate inhibitor against Mpro that showed and estimated inhibition constant value of 6.52 nM with a predicted binding energy score of À11.17 kcal/mol. Lurasidone was approved by the FDA in 2010 as an antipsychotic drug for treating schizophrenia patients (Corponi et al., 2019). Different ligand-protein interactions formed after the molecular docking study with Lurasidone, which includes hydrogen bonds with the residues His41 and Glu166, alkyl interactions with Met165 and Met49, p-alkyl interaction with Pro168, Met165 and His41 and multiple van der Waals interactions with the receptor (Figures 6b and 7b). Talampicillin and Lurasidone showed a reliable binding mode in Mpro and blocking the active site of the enzyme. The two topranked compounds from the ZINC15 library were ZINC000015988935 and ZINC000103558522. ZINC000015988935 was found to have a binding energy of À12.39 kcal/mol and an inhibition constant of 826 pM, while ZINC000103558522 had a binding energy of -12.36 kcal/mol and an inhibition constant of 866 pM. ZINC000015988935 showed a p-sulfur interaction with Met49, seven H-bonds with the residues Arg188, Asp187, Gln189, Ser144, Cys145 and Glu166, alkyl interaction with Met165 and a p-alkyl interaction with Cys145 (Figures 6c and 7c). ZINC000103558522 shared two p-sulfur interactions with residues Met49 and Cys145, hydrogen bonds with Glu166, Gln189, Cys145, Ser144 and Gly143, alkyl interactions with Met165 and Met49 and lastly p-alkyl interactions with Cys145 and Met49 and various van der Waals interactions with the enzyme (Figures 6d  and 7d). ZINC000015988935 and ZINC000103558522 depicted more hydrogen bond numbers shared with the enzyme in comparison to Talampicillin and Lurasidone, and this explains the higher binding affinity estimated for the Mpro enzyme (Figures 6  and 7).

TMPRSS2 potential inhibitors
Rubitecan drug bound to our model with a binding energy value of À9.47 kcal/mol and an inhibition constant of 115.06 nM. Rubitecan is a potent antitumor drug and has a high potency against solid tumors and pancreatic cancer (Clark, 2006). The nitroso group (NO-) of Rubitecan formed hydrogen bonds with His296 and Lys300 (Figure 6e, 8a). Rubitecan also formed a p-cation bond with residue Lys342, an alkyl interaction with Lys340 and a p-alkyl interaction with Lys300. According to our molecular docking study, Loprazolam was found to have a binding energy score of À9.31 kcal/mol and an inhibition constant of 149 nM against our homology modeling of TMPRSS2. Loprazolam is described for treating insomnia, hypnotic and muscle spasm (Clark et al., 1986). Loprazolam showed several interactions including p-cation with Lys300, H-bonds with Lys390, His296 and Tyr337, p-alkyl interaction with His296 and an alkyl interaction with residue Lys300 (Figure 6f, 8b). Our molecular docking study showed compound ZINC000000702323 form ZINC15 with the highest binding affinity against TMPRSS2. ZINC000000702323 was found to have a binding energy of À12.18 kcal/mol and a Ki value of 1.18 nM. ZINC000000702323 was found to have hydrogen bonds with the residues Gly462, Cys437, Gln438 and Gly439, p-lone pair interaction with Trp461, p-p interactions with residues His296 and Trp461, amide-p interaction with Trp461 and p-alkyl interactions with His296, Lys342 and Tyr474 and an alkyl interaction with the Lys342 (Figure 6g and 8c).
ZINC000012481889 compound was the second hit in our study with a binding energy of À12.14 kcal/mol and a Ki of 1.26 nM. ZINC000012481889 formed p-r interactions with Lys300 and His296, a salt bridge with Asp345, a p-cation with Lys300, halogen bonds with Gly464 and Ser436, multiple H-bonds with the residues Lys342, Ser460, Ser441, Cys437, Gln438, Ser339, Lys340 and Tyr337, a p-p T-shaped bond with His296, a p-alkyl interaction with Lys300 as well as many van der Waals interactions (Figures 6h and 8d).

Electrostatic potential distribution
Electrostatic potential helps understanding the structural aspects of the protein and the ligands and on those interactions between them. Electrostatic charges were mapped on our homology modeling, TMPRSS2, and the coronavirus protease Mpro for better understanding the surface nature of both enzymes. Intensity of positive potential was found to cover the binding pocket of the protease Mpro (Figure 9a, b). The inner cavity of the binding pocket of the TMPRSS2 model was found to have a negative potential and the rest of the active site is covered by positive charge (Figure 9c, d).

ADMET profile and drug-likeness
Lipinski's rule of 5 was used to predict the drug-likeness of our hits from the ZINC15 library. Lipinski's rule of 5 states that for    any ligand to be considered as a drug-like, a molecule should obey these criteria: molecular weight <500 Dalton, number of H-bond donors <5, number of H-bond acceptors <10 and LogP <5 (Lipinski, 2004). Accordingly, all of our hits obeyed this rule and thus considered as drug-like compounds. Furthermore, our compounds were found to be within the reference range considering their water solubility (LogS), Caco-2 permeability and topological polar surface area (TPSA) ( Table 4).

Molecular dynamic simulation analysis
In present study, to evaluate the stability of the homology model and the drug candidates, these structures were subjected to MD simulation. Root mean square deviation (RMSD) was calculated considering the proteins backbone with respect to the initial conformations. As depicted in Figure 10a, RMSD of free Mpro enzyme remained stable between 5 ns and 34 ns timescale at 2.4 Å, then slightly increased and persisted at 3.6 Å from 35 ns till the end of the run. The RMSD of Mpro_ZINC000015988935 reached its steady state at 3.1 Å at 10 ns. The RMSD of Mpro_Talampicillin showed that the system was balanced after 5 ns and slightly fluctuated between 2.3 and 3 Å and displayed the greatest effect on Mpro's stability. As illustrated in Figure 10b, The RMSD curves of the free TMPRSS2 had reached the equilibrium state after 3 ns. The RMSD of compounds TMPRSS2_Rubitecan and TMPRSS2_ZINC000000702323 revealed that the two complexes had shown almost the same behavior during the MD simulation and achieved structural stability the last 15 ns. A thorough study of the root mean square fluctuation (RMSF) curves of the free Mpro and its complexes showed all amino acids located in the active site of the enzyme had RMSF fluctuations between 0.8 Å and 1.7 Å, which indicate that the studied compounds kept close contact with their binding pockets during the MD simulations. However, Mpro_ZINC000015988935 showed higher fluctuations for residues 43-45 and 187-190 and this observation is consistent with the RMSD profiles of Mpro_ZINC000015988935 (Figure 11a). The higher fluctuations depicted in the RMSF profiles of TMPRSS2 and its complexes are located within the loop regions of the protein, and this is expected considering the high flexibility nature of loops (Figure 11b).

Binding free energy calculation
The binding free energy (MM-PBSA) was computed after the MD simulation the last 10 ns for all the complexes and the results are given in Table 5. Compound ZINC000015988935 depicted the lowest binding free energy with Mpro enzyme, while Talampicillin exhibited relatively higher binding energy.
On the other hand, Rubitecan showed the lowest binding free energy with the homology modeling TMPRSS2 enzyme in comparison to compound ZINC000000702323.

Conclusion
Mpro and TMPRSS2 are shown to be important and highly potent targets for the inhibition of COVID-19 infection. In this study, the structure-based virtual screening identified four commercially available drugs (Talampicillin, Lurasidone, Rubitecan and Loprazolam) with a potential inhibition of Mpro and TMPRSS2 enzymes. These drugs might be repurposed against COVID-19. In addition, four novel compounds were identified from ZINC15 library showed promising high affinities against Mpro and TMPRSS2 enzymes. These hits were described as drug-like compounds and showed harmless ADMET properties and may aid in developing and optimizing more efficient and potent COVID-19 inhibitors. Trajectories analysis showed that the studied complexes have displayed structural stability during the MD runs.