Targeting SARS-CoV-2 main protease: structure based virtual screening, in silico ADMET studies and molecular dynamics simulation for identification of potential inhibitors

Abstract COVID-19 pandemic has created a healthcare crisis across the world and has put human life under life-threatening circumstances. The recent discovery of the crystallized structure of the main protease (Mpro) from SARS-CoV-2 has provided an opportunity for utilizing computational tools as an effective method for drug discovery. Targeting viral replication has remained an effective strategy for drug development. Mpro of SARS-COV-2 is the key protein in viral replication as it is involved in the processing of polyproteins to various structural and nonstructural proteins. Thus, Mpro represents a key target for the inhibition of viral replication specifically for SARS-CoV-2. We have used a virtual screening strategy by targeting Mpro against a library of commercially available compounds to identify potential inhibitors. After initial identification of hits by molecular docking-based virtual screening further MM/GBSA, predictive ADME analysis, and molecular dynamics simulation were performed. The virtual screening resulted in the identification of twenty-five top scoring structurally diverse hits that have free energy of binding (ΔG) values in the range of −26-06 (for compound AO-854/10413043) to −59.81 Kcal/mol (for compound 329/06315047). Moreover, the top-scoring hits have favorable AMDE properties as calculated using in silico algorithms. Additionally, the molecular dynamics simulation revealed the stable nature of protein-ligand interaction and provided information about the amino acid residues involved in binding. Overall, this study led to the identification of potential SARS-CoV-2 Mpro hit compounds with favorable pharmacokinetic properties. We believe that the outcome of this study can help to develop novel Mpro inhibitors to tackle this pandemic. Communicated by Ramaswamy H. Sarma


Introduction
SARS-CoV-2 is a novel coronavirus belonging to the coronaviradae family and is responsible for acute respiratory illness . World Health Organization has declared it as a global pandemic and categorized it into the emergency diseases. More than 32.7 million confirmed cases along with 991000 fatalities have been reported as on 28th September 2020 (Weekly Epidemiological Update, 2020). COVID-19 has created a substantial threat to human lives by putting the healthcare system in unprecedented situations. Scientists and researchers across the globe are trying to identify new therapies as well as repurposing the clinically established therapies for the management of COVID-19 Liu et al., 2020). No specific vaccine and the antiviral agent has translated to the clinics so far for the effective control of COVID-19. The available drug regimens have limited efficacy thus imposing the need for comprehensive strategies and optimized therapeutics development (Sanders et al., 2020). At this unprecedented time to reduce the false-positive errors and to expedite the drug discovery process, artificial intelligence could provide a real-time solution. Using the structure-based virtual and high throughput screening followed by pharmacological investigation may be an effective strategy for the development of therapeutics against SARS-CoV-2 (Adeshina et al., 2020).
The genome of SARS-CoV-2 is single-stranded positivesense RNA having a length of $30,000 nucleotides . This RNA is a 5 0 -cap structure with 3 0 -poly (A) tail. There are six open reading frames (ORFs) observed in the SARS-CoV-2 genome. The ORF1a and ORF1ab encode for polyproteins (pp1a and pp1ab) that undergoes the process of proteolysis and get processed into nonstructural proteins (NSP1-NSP16) (Gordon et al., 2020). Enzymes responsible for the proteolysis are the main protease (M pro ), also known as 3-chymotrypsin like cysteine protease (3CL pro ), and papainlike proteases. These NSPs form the replicase-transcriptase complex which assembles the subgenomic RNA which further promotes the synthesis of structural proteins namely envelope, spike, membrane and nucleocapsid Gordon et al., 2020). Thus, M pro is responsible for charging the polyproteins with their respective functional entities and is crucial for maintaining the life cycle of COVID-19. The M pro (NSP5) interacts with the epigenetic regulators i.e. histone deacetylase 2 (HDAC2) and tRNA methyltransferase (Gordon et al., 2020). Also, the trafficking across the mitochondria and endoplasmic reticulum are reported to be regulated by the NSP5 (Gordon et al., 2020).
Superimposed X-ray crystal structures of the main proteases of SARS-CoV-2, SARS-CoV and MERS-CoV display a high degree of sequence identity and conservation of active site residues. SARS-CoV-2 main protease has about 96% sequence identity with that of SARS-CoV main protease, while the rest 4% variable residues are not primarily involved in substrate binding . M pro is one of the nonstructural proteins which plays a pivotal role in the life cycle of the virus by processing various polyproteins that are translated from viral RNA. This fact along with the highly conserved nature of the main protease makes it a promising target for designing antiviral agents. The main protease is a cysteine protease that is biologically active in its dimeric form only. Electron density mapping of the individual chain confirmed the presence of 306 residues in each of the monomer, which in turn comprised of three domains and a catalytic dyad (His_41 and Cys_145). Residues from 8-101, 102-184, 201-303 form the domain-I, II and III respectively (Goyal & Goyal, 2020). Domain-III contains 5 antiparallel a helices which are arranged into a globular cluster and is attached to the domain-II with the help of a long loop of residues 185-200 . The catalytic dyad (His_41 and Cys_145) as well as the substrate-binding site, is situated inside the cleft present in between domain-I and II. In 6LU7 main protease, the co-crystallized ligand has shown to have anti-parallel alignment with the residues 164-168 on one side and with residues 189-191 on the other side. Several studies have reported the important residues like His_41, Met_49, Gly_143, Ser_144, His_163, His_164, Met_165, Glu_166, Leu_167, Asp_187, Arg_188, Gln_189, Thr_190, Ala_191, Gln_192 are responsible for substrate binding activity, whereas residues Arg_4, Ser_10, Gly_11, Glu_14, Asn_28, Ser_139, Phe_140, Ser_147, Glu_290, Arg_298 are responsible for dimerization of the main protease. Jin et al. (2020) have reported that important residues like His_41, Phe_140, Cys_145, His_163, Met_165, Glu_166, Leu_167, His_172, Asp_187 forming the catalytic binding pocket are highly conserved. The surface loop and the domain-III of M pro contains the most variable regions, whereas the substrate-binding site is highly conserved among the main proteases found in all coronaviruses. Inhibition of the main protease leads to blocking of replication of the coronavirus. As the M pro of SARS-CoV-2 is not homologous with any human proteases known so far, its inhibitors would be less likely to be toxic (Kim et al., 2016). Thus, targeting the M pro enzyme of SARS-COV-2 could provide a strategic approach to curb the viral replication and ultimately effective management of the COVID-19 crisis (Ullrich & Nitsche, 2020).
Determination of the crystal structure of M pro at high resolution (PDB ID:6YB7) provides a great opportunity for the use of structure based drug design tools to screen large libraries of commercially available compounds to identify hit molecules which would be cost-effective in the development of new antiviral agents Ren et al., 2013). A global program COVID MoonShot has been launched to crowdsource drug-design by medicinal chemists. The MoonShot database suggests that the crystallography studies have confirmed that M pro has a solvent accessible and empty active binding site. At this collaborative organ, more than 1500 crystal experiments have been done yet and the data is made available in the public domain. Whereas, more than 500 compounds have been tested which demonstrates the drug-design-based effective screening of hit molecules. Multiple fragment libraries e.g.; DSI-poised library, York3D, and MiniFrags FragLites & Peplites, etc. have been created and screened against M pro (Main protease structure and XChem fragment screen, 2020). Crystal soaking and co-crystallization are being used for the identification of potential covalent, non-covalent, and dimer targeting hit molecules.
In this study, we have performed a structure based virtual screening of a commercially available library of compounds against the active site binding pocket of SARS-CoV-2 M pro . The top-ranking compounds were further subjected to MM/ GBSA binding free energy calculation and in silico absorption, distribution, metabolism, excretion, and toxicity (ADMET) predictions for pharmacokinetic profiling of identified potential M pro inhibitors. Finally, a short molecular dynamics simulation is performed in order to evaluate the stability of the enzyme-inhibitor complex.

Material and methods
The computational studies including molecular dockingbased virtual screening, MM/GBSA, and ADMET calculations were performed using Schrӧdinger molecular modeling suite. Glide module implemented in Schrӧdinger molecular modeling suite was used for docking which approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand into the binding pocket (Friesner et al., 2004). MM/GBSA method was used to calculate binding free energy using the Prime module (Jacobson et al., 2002; Schr€ odinger Release 2020-3: Prime, Schr€ odinger, LLC, New York, NY, 2020). QikProp (Schr€ odinger Release 2020-3: QikProp, Schr€ odinger, LLC, New York, NY, 2020) was used for the prediction of ADME properties of the identified hit compounds. Molecular dynamics study was performed using Desmond (Bowers, 2006)

Protein preparation and docking grid generation
Beginning from the outbreak of this pandemic, several attempts have been made on the front of structural biology to solve the three-dimentional structure of different drug targets to aid in the drug discovery process. The first threedimentional crystal structure of M pro was published in early February 2020 (PDB ID: 6LU7) . For the current study, we have used the crystal structure of COVID-19 main protease in complex with a co-crystallized fragment 2cyclohexyl-N-(3-pyridyl)acetamide (Z31792168) (PDB ID: 5R84) (https://www.rcsb.org/structure/5R84). As a standard practice, the protein structure was prepared using the protein preparation wizard of the Schr€ odinger suite before further use which includes the assignment of bond orders were, the addition of missing hydrogens, and removal of crystallization water molecules. Further, protonation state and tautomer assignment were carried out and the protein structure was submitted to a restrained molecular mechanics refinement using OPLS_2005 force field (Jorgensen & Tirado-Rives, 1988) until the heavy atom root square deviation value converged to 0.3 Å. To define the binding site for docking of database compounds, a docking grid was generated using centeroid of the co-crystallized ligand Z31792168 with a cutoff of docking ligands with a maximum length of 20 Å. The prepared grid was further used for structure-based virtual screening.

Database preparation
A large number of commercially available chemical libraries are present from where one can cherry pick the compounds for in vitro/in vivo testing based on the initial computational screening (Lyu et al., 2019). One such library is Asinex Gold & Platinum collection which contains 29174 compounds and provides diverse and cost-effective coverage of drug-like chemical space with most of them obeying Lipinski's Rule of 5 (Asinex, 2020). As a first step, the library was filtered on the basis of optimum physicochemical properties using a modified Lipinski filter (Lipinski et al., 1997), and PAINS substructure (Baell & Holloway, 2010) filter for removal of panassay interference compounds to increase the chance to obtain pharmacokinetically optimum hit compounds. LigPrep (Schr€ odinger Release 2020-3: LigPrep, Schr€ odinger, LLC, New York, NY, 2020) module was used for library preparation which performed several tasks such as generated stereoisomers, neutralized charged structures, and determined the most probable ionization state at pH 7.0 ± 2.0, and generated tautomers. Stereoisomers were generated where the compounds were having chiral centers and a single lowest energy conformation obtained by ConfGen were retained which were further minimized using the Impref module by applying OPLS_2005 force field method. The prepared library containing 132727 compounds was used for docking-based virtual screening in the next step. The co-crystallized ligand Z31792168 was prepared in a similar manner.

Structure-based virtual screening
The structure-based virtual screening was performed with a slight deviation from the earlier reported procedure (Yadav et al., 2016). Virtual screening workflow implemented in Maestro (Schr€ odinger Release 2020-3: Maestro, Schr€ odinger, LLC, New York, NY, 2020) interface of Schrӧdinger Suite was used and the overall workflow of virtual screening followed by other in silico studies has been depicted in Figure 1. The dataset prepared in the previous step was used in high throughput virtual screening (HTVS) and the output compounds from HTVS were ranked on the basis of Glide Score and 25% of top-scoring compounds further parsed to standard precision (SP) mode of docking. Further, the topranked 25% compounds were parsed to extra precision (XP) (Friesner et al., 2006) mode of docking which in the end provided 250 compounds. For the purpose of comparision, the co-crystallized ligand, Z31792168 was docked in a similar manner and the docking score and binding pose is compared with the top scoring hits. After manual inspection of the protein-ligand interaction of the top-scoring compounds, the top 25 compounds were chosen for further analysis. The output from the virtual screening utility containing all the thermodynamic information in the form of Glide Score was analyzed using Glide XP visualizer, which enables visualization of ligand-receptor interactions in an interactive manner.

Binding free energy calculations
The virtual screening workflow provided us with the topscoring hit compounds but the XP docking is not able to estimate the binding free energy of the ligand-receptor complex. Therefore, we used Prime MM/GBSA approach to predict the binding free energy of the receptor-ligand complex (Lyne et al., 2006). The pose viewer file from the XP docking comprised of the receptor-ligand complex was used for calculation. Principally, the MM/GBSA approach employs molecular mechanics, the generalized Born model, and solvent accessibility method to predict the free energies from structural information circumventing the computational complexity of free energy simulations (Dill, 1997). The binding free energy of each ligand was calculated using the following equation: Where DE mm is the difference in the minimized energies between the receptor-ligand complex and the sum of energies of the unliganded receptor and ligands. DG sol is the difference in the GBSA solvation energy of the receptor-ligand complex and the sum of energies of the unliganded receptor and ligands. DG SA is the difference in surface area energies for the receptor-ligand complex and the sum of energies of the unliganded receptor and ligands.

In silico ADMET prediction
Since non-optimal ADMET properties are a major cause of attrition in drug development, optimization of these properties at the early stage of drug discovery is highly encouraged (Alavijeh & Palmer, 2004). Therefore, the top twenty-five hits were subjected to the calculation of various physicochemically significant and pharmacokinetically relevant ADMET properties using the QikProp module implemented in Schrodinger molecular modeling suite. (Schr€ odinger Release 2020-3: QikProp, Schr€ odinger, LLC, New York, NY, 2020). Similarily, the ADMET calculations were performed for the co-crytallised ligand, Z31792168 and were compared with those of top scoring hits.

Molecular dynamics simulation
The general limitation of any grid-based docking algorithm is that it treats the receptor as a rigid entity and therefore it provides a still picture of the protein-ligand interaction. However, in the physiological system, this interaction is dynamic in nature. Therefore, to better understand the interaction between the protein-ligand complex of identified hits, we performed a molecular dynamics simulation of the top five identified hits using Desmond molecular dynamics program (Schr€ odinger Release 2020-3: Desmond Molecular Dynamics System, D. E. Shaw Research, New York, NY, 2020. Maestro-Desmond Interoperability Tools, Schr€ odinger, New York, NY, 2020). In general, the protein-ligand complex for each hit was placed into an orthorhombic box of size 10 Â 10 Â 10 Å and solvated explicitly with TIP3P. OPLS3e was used as a force field for the system. The system was neutralized by Na þ and Clions at a final concentration of 0.15 M. The system was minimized and pre-equilibrated using the standard equilibration protocol implemented in Desmond which contains several steps before the final production run. Beginning with a Brownian Dynamics NVT simulation for 100 ps at 10 K temperature, with small timesteps and restraints on solute heavy atoms followed by NVT simulation for 12ps at 10 K temperature with small timesteps and restraints on solute heavy atoms. Then a 12 ps NPT simulation at 10 K temperature with restraints on solute heavy atoms and a 12 ps NPT simulation with restraints on solute heavy atoms, and finally a 24 ps NPT simulation without any restraints. After all these equilibration steps, a final production simulation was performed for 50 ns using NPT (normal Pressure and Temperature) ensemble at 300 K and 1.013 bars with the default setting of relaxation before simulation (Bowers, 2006). The Nose-Hoover Chain thermostat and Martyna-Tobias-Klein barostat were used to maintain the temperature and pressure, respectively (Martyna et al., 1994). The RESPA integrator was used with a time step of 2 fs during the simulation with a smooth PME method for the calculation of long-range electrostatic interaction. The energy and coordinates were saved in the trajectory at every 10 ps. The final simulation trajectory was analyzed using the simulation interaction diagram available in Maestro.

Structure-based virtual screening
To date, several crystal structures of SRS-CoV-2 Mpro have been solved and deposited in protein data bank using xray crystallography and cryo-EM technologies. In this study, we have utilized viral M pro structure in complex with a cocrystallized fragment 2-cyclohexyl-N-(3-pyridyl)acetamide (Z31792168) (PDB ID: 5R84). One of the important steps is proper curation and preparation of ligand libraries with diverse chemical scaffolds but with drug-like properties. The screening of dataset compounds was curated from Asinex website and the library was properly prepared properly as described in the methods section. The prepared library was then subjected to structure-based virtual screening workflow as implemented in the Schrodinger molecular modeling suite which finally led to the identification of top-scoring compounds hits. The docking score of the twenty-five top scoring hits compounds have been provided in Table 1 and it can be observed that the co-crytsllised ligand Z31792168 has a Glide gscre of À4.81 as compared to the top twenty five hits which has a Glide gscore in the range of À8.09 to À6.37. It indicates that the top scoring hits have better binding efficicney as compared to the co-crytsllised ligand Z31792168. The 2 D structures of the top twenty-five hits are shown in Figure 2 and the 2-D ligand interaction diagram and 3 D pose of the ligand in the binding pocket of the enzyme obtained after final docking step for the cocrystallized ligand and the top five hits is shown in Figures 3-5. It can be observed from the structures that the top-scoring hits represent diverse chemical scaffolds like substituted xanthine, dibenzylamine, naphthyl sulfonamide, N,N-disubstituted pyrazine, indole, and benzamide, etc.
Docking studies revealed that all the top 5 hits shared the same binding cavity and have shown interactions with various catalytic site residues like His_41, Asn_142, Gly_143, Ser_144, Cys_145, Met_165, Glu_166 of the main protease. The amide nitrogen of the co-crystallized native ligand was found to interact with M pro mainly through strong hydrogen bonding with Hie_164 amino acid residue and the pyridnyl nitrogen formed a hydrogen bond with Hie_163 residue (Figure 3(A)). Additionally, the pyridinyl ring is surrounded by Cys_145, Ser_144, Asn_142, Leu_141, and Phe_140. On the other hand, the cyclohexyl group is surrounded by Gln_189, Arg_188, Asp_187, and Hie_41. Similar kinds of interactions were observed in the case of the compound AG-690/ 11060013 where the cyclic keto groups of the core nucleus formed hydrogen bonds with Gly_143 and Thr_26 ( Figure  3(B)). Compound AG-690/11203374_1 and AG-690/ 11203374_2 interacted in a similar fashion as shown in Figure 4(A,B), respectively. with the active site of M pro in a similar fashion. The core nucleus formed hydrogen bonds with the Thr_26 and Gly_143 while the p-chloro phenoxy side chain is surrounded by Ser_46, Cys_44 and Hie_41 amino acid residues. Figure 5(A,B) shows docked pose and ligand interaction diagram of hits AG-690/11203374_3 and AH-034/04857012, respectively. Compound AG-690/ 11203374_3 interacted well with the active site of M pro through hydrogen bonds with Gly_143 and Thr26. Additional interaction of the chloro group on phenyl sidechain was observed with Cys44 in the form of a halogen bond. While the compound AH-034/04857012 formed a hydrogen bond with Glu166. Overall, in contrast to the co-crystallized ligand, the hit molecules have shown additional favorable interactions with other important residues like His_41, Asn_142, Gly_143, Ser_144, Cys_145. which makes them potential ligand for this enzyme.

Binding free energy calculations
Further, in order to calculate the free energy of binding, the MMGBSA approach was used. The binding free energy scores of top hits is given in

In silico ADMET prediction
Incorporation of ADMET predictions during pre-clinical drug discovery can help to reduce the risk of failure of drug candidates in clinical trials and this motivated development of several tools for ADMET prediction. In this study, we performed in silico ADMET calculations of top twenty-five hits along with the co-crystallised ligand, Z31792168 using QikProp module and the predicted values of several important parameters along with their acceptable range are given in Table 2 , QPlogPo/w (Predicted octanol/water partition coefficient), QPlogS (Predicted aqueous solubility), QPlogHERG (Predicted IC50 for the blockage of HERG K þ channels), QPPCaco (Predicted apparent Caco-2 cell permeability for gut-blood barrier), QPlogBB (Predicted brain/blood partition coefficient) etc. were selected for this study. We found a slight deviation of PISA parameter for compound AH-034/ 04857012 and AH-034/11365679, whereas a little deviation of QPlogS has been found for compound AH-034/11365679. Apart from these two compounds, all other top twenty-five hits and the co-crystallised ligand, Z31792168 were having favorable pharmacokinetic parameters, which signifies their druggable potential for human use.

Molecular dynamics simulation
Various factors like glide score, MMGBSA energy and ADMET properties were considered for selecting the top 5 hits (AG-690/11060013, AG-690/11203374_1, AG-690/11203374_2, AG-690/11203374_3, AH-034/04857012), which were further utilized for simulation studies. As an additional measure, we performed 50 ns long molecular dynamics simulation of the ligand-protein complex for the top five hits obtained from the virtual screening experiment. The rationale for this was to understand the dynamics of ligand binding to the active site of the enzyme. A molecular dynamics simulation study was necessary to examine the stability and dynamic fluctuations in the ligand-protein complex under a simulated biological environment. Figure 6 depicts various MD trajectory data analysis for ligand AG-690/11060013. The RMSD plot (Figure 6(A)) indicated a stable ligand-protein complex throughout the entire simulation period which was determined from the RMSD values ranging from 0.5 to 2.5 Å for both the protein and ligand system. Additionally, the flexibility of the protein system was assessed during the entire simulation by calculating the RMSF of individual amino acid residues of the protein ( Figure  6(B)) which ranged from 0.4 to 2.0 Å indicating a stable protein-ligand complex. The binding interactions between ligand and active site amino acid residues inside the binding pocket of M pro were computed and represented in Figure 6(C). It was observed that the ligand interacted with the M pro active site with THR_26, HIS_41, GLY_143 and GLU_166 mainly through hydrogen bonding. Hydrophobic interactions with HIS_41, MET_49 and MET_165 and various water bridges with residues SER_144 and CYS_145 also played a supportive role in binding the ligand with M pro active site. RMSF plot of ligand (Figure 6(D)) suggested that the ligand is located in the binding pocket and bound to the amino acid residues at the active site. Figure 6(E) depicts specific interactions between the ligand and active site amino acid residues of M pro . The xanthine core along with the side chain substitutions engaged the active site amino acid residues through various H-bonds, hydrophobic interactions, and water bridges. Finally, a timeline of the protein-ligand interactions was plotted (Figure 6(F)) in the form of hydrogen bonds, hydrophobic interaction, ionic interaction, and water bridges throughout the 50 ns simulation period. The top panel indicates the total number of specific protein-ligand contacts and the bottom panel indicates residue level interaction of the ligand. Overall, the ligand interacted well with the M pro binding pocket and a minimum of 4 contacts were present throughout the simulation period. Figure 7 depicts all the analysis data of MD trajectory for ligand AG-690/11203374_1. It can be observed from the root mean square deviation (RMSD) plot (Figure 7(A)) that the protein as well ligand remain stable throughout the simulation period and the RMSD value was between 1.1 to 2.7 Å for both protein and ligand, indicating a stable complex between them. Additionally, in order to evaluate the flexibility of the protein structure throughout the simulation, we calculated root mean square fluctuation (RMSF) of individual amino acid residues of the protein (Figure 7(B)). The whole protein had an RMSF value between 0.4 to 2.0 Å except the terminal amino acids that showed moderate movement throughout the 50 ns long simulation. The interaction of the ligand with the individual amino acid residues in the binding pocket of M pro was quantified and represented in Figure  7(C). It is clear from the plot that the lignad interacted with the M pro active site with THR_26 mainly through hydrogen bonding, with HIS_41 by hydrophobic interaction and with GLU_166 via both hydrogen bonding and water bridges. Hydrophobic interactions played a major role in the interaction of this ligand with M pro active site. Hydrogen bonding is one of the major forces involved in protein-ligand interaction and it has an indispensable role in stabilizing the protein-ligand complexes (Du et al., 2016). RMSF plot of ligand (Figure 7(D)) indicated that the ligand largely stayed bound to the active site amino acids. Figure 7(E) shows specific interactions of the ligand with active site amino acid residues of M pro . It is observed that both xanthine core, as well as side chain substitution, engaged the active site amino acid residues. Finally, a timeline of protein-ligand interactions in the form of hydrogen bonds, hydrophobic interaction, ionic interaction, and water bridges is plotted as shown in Figure  7(F) throughout the 50 ns simulation time. The top panel indicates the total number of specific protein-ligand contacts and the bottom panel indicates residue level interactions of the ligand. Overall, the ligand interacted well with the Mpro binding pocket and a minimum of 4 contacts were present throughout the simulation period. Figure 8 depicts the MD trajectory data for ligand AG-690/11203374_2. The RMSD plot (Figure 8(A)) indicated a stable ligand-protein complex throughout the entire simulation period which was ascertained from RMSD values ranging from 1 to 2.1 Å for protein and 0.37 to 2.1 Å for the ligand. The flexibility of the protein structure was evaluated throughout the simulation by calculating the RMSF of individual amino acid residues of the protein (Figure 8(B)). The entire protein system had an RMSF value between 0.3 to 1.7 Å except for the terminal amino acid residues which showed moderate movement throughout the 50 ns simulation. The interaction between ligand and individual amino acid residues inside the binding pocket of M pro is quantified and represented in Figure 8(C). It can be seen from the plot that the ligand interacted with the M pro active site with THR_26, HIS_41, ASN_142, GLY_143, and GLU_166 mainly through hydrogen bonding. Hydrophobic interactions with residues HIS_41 and MET_165 and water bridges with residues SER_144 and CYS_145 also played a supportive role in binding the ligand with M pro active site. RMSF plot of ligand ( Figure 8(D)) suggested that the ligand mostly stayed inside the binding pocket bound to the active site amino acid residues. Figure 8(E) depicts specific interactions between the ligand and active site amino acid residues of M pro . The xanthine core along with the side chain substitutions engaged the active site amino acid residues through various H-bonds and hydrophobic interactions. Finally, a timeline of proteinligand interactions in the form of hydrogen bonds, hydrophobic interaction, ionic interaction, and water bridges is shown in Figure 8(F) throughout the 50 ns simulation time. The top panel indicates the total number of specific proteinligand contacts and the bottom panel indicates residue level interactions of the ligand. Overall, the ligand interacted well with the M pro binding pocket and a minimum of 4 contacts were present throughout the simulation period. Figure 9 depicts the MD trajectory data for ligand AG-690/11203374_3. The RMSD plot (Figure 9(A)) indicated a stable ligand-protein complex throughout the entire simulation period which was ascertained from the RMSD values varying from 0.5 to 2.4 Å for both the protein and ligand. The flexibility of the protein structure was evaluated throughout the simulation by calculating the RMSF of individual amino acid residues of the protein (Figure 9(B)). The entire protein system had an RMSF value ranged from 0.4 to 2.0 Å except for the terminal amino acid residues which showed moderate movement throughout the 50 ns simulation duration. The interaction between ligand and individual amino acid residues inside the binding pocket of M pro is quantified and represented in Figure 9(C). It was observed that the ligand interacted with the M pro active site with amino acid residues THR_26, ASN_142, GLU_166, ASP_187, GLN_189, THR_190, and GLN_192 mostly through hydrogen bonding, which is one of the major forces involved in binding interactions as well as in stabilizing the protein-ligand complex. Hydrophobic interactions with HIS_41, MET_49 and MET_165 residues and water bridge with GLU_166 residue also played a supportive role in binding the ligand with M pro active site. RMSF plot of ligand (Figure 9(D)) suggested that the ligand mostly stayed inside the binding pocket bound to the active site amino acid residues. Figure 9(E) depicted specific   -2 to 6.5); QPlogS (Predicted aqueous solubility: -6.5 to 0.5); QPlogHERG (Predicted IC 50 for blockage of HERG K þ channels: concern <-5); QPPCaco (Predicted apparent Caco-2 cell, model for gut-blood barrier, permeability in nm/sec: poor if < 25 and great if > 500); QPlogBB (Predicted brain/blood partition coefficient: -3 to 1.2); QPPMDCK (Predicted apparent MDCK cell permeability in nm/sec predicting non-active transport across blood brain barrier; < 25 poor and > 500 great); QPlogKp (Predicted skin permeability; interactions between the ligand and active site amino acid residues of M pro . The xanthine core along with the side chain substitutions engaged the active site amino acid residues through various H-bonds and hydrophobic interactions. Finally, a timeline of protein-ligand interactions in the form of hydrogen bonds, hydrophobic interaction, ionic interaction and water bridges is plotted as shown in Figure 9(F) throughout the 50 ns simulation time. The top panel indicates the total number of specific protein-ligand contacts and the bottom panel indicates residue level interaction of the ligand. Overall, the ligand interacted well with the M pro binding pocket and a minimum of 4 contacts were present throughout the simulation period. Figure 10 shows all the MD trajectory analysis data for the ligand AH-034/04857012. It was observed from the RMSD plot ( Figure 10(A)) that the protein as well ligand remains stable throughout the simulation period and the RMSD value ranged from 0.87 to 2.12 Å for both the protein and ligand indicating a stable complex between them. Furthermore, the flexibility of the protein structure was evaluated throughout the simulation by calculating the RMSF of individual amino acid residues of the protein (Figure 10(B)). The entire protein had an RMSF value ranging from 0.4 to 2.0 Å except for the terminal amino acids, which showed moderate movement throughout the 50 ns long simulation. The binding interactions of the ligand with active site amino acid residues inside the binding pocket of M pro were quantified and represented in Figure 10(C). It was observed that the ligand interacted with the M pro active site with GLU_166 and GLN_189 through hydrogen bonding and water bridge. Hydrophobic interactions of the ligand with amino acid residues LEU_27, HIS_41, MET_49, MET_165, LEU_167, and PRO_168 played a major role in binding interactions with M pro active site. RMSF plot of ligand (Figure 10(D)) indicated that the ligand largely stayed bound to the active site amino acids. Figure 10(E) shows specific interactions of the ligand with active site amino acid residues of M pro . It can be observed that N-benzyl moiety and the carbonyl moiety present on phenoxyethyl side chain was engaged with the active site amino acid residues. Finally, a timeline of the protein-ligand interactions is plotted (Figure 10(F)) in the form of hydrogen-bonds, hydrophobic interaction, ionic interaction and water bridges throughout the 50 ns simulation duration. The top panel indicates total number of specific protein-ligand contacts and the bottom panel indicates residue level interaction of the ligand. Overall, the ligand interacted well with the M pro binding pocket and a minimum of 2 contacts were present throughout the simulation period.
The molecular dynamics result verified the nature of ligand interaction with active site residues as predicted by docking studies. It indicates highly stable binding poses of our top 5 hit molecules. Based on the overall findings, it appears that compound AG-690/11203374_1 and AG-690/ 11203374_2 have evolved to be the most successful hit molecules having a maximum number of interactions and good binding energy. Both of the compounds are substituted xanthines having an only structural difference in the side chain attached to the C-2 of imidazole moiety. The oxo functionality of the purine nucleus, hydroxyl group attached to the side chain on imidazole N 1 , and phenoxy side chain are highly important structural scaffolds responsible for producing favorable interactions with various catalytic site residues. In compound AG-690/11203374_2, a butyl-thio side chain attached to C-2 of the imidazole nucleus did not yield any interactions. However, the presence of polar ester functionality in the same chain helped in formation of additional hydrogen bonding with Asn_142 residue, as observed in compound AG-690/11203374_1.

Conclusions
Structure-based virtual screening based on molecular docking technique was utilized to identify twenty-five potential hit compounds against SARS-CoV-2 M pro from commercially available small molecule dataset. Extra precision molecular docking analysis revealed that the top hits compounds were involved in moderate to strong protein-ligand interaction with the binding site residues of M pro indicated from their good XP-score. The dominant attractive forces for ligand binding inside the active site cavity were hydrogen bonding and hydrophobic interactions. In silico ADME studies revealed that the hit compounds were having optimal pharmacokinetic properties. Additionally, top-five hit compounds were subjected to molecular dynamics simulation studies to examine the stability and integrity of their complex with the enzyme, M pro . The results from MD simulation have shown good stability and less conformational change of the receptor-ligand complex during a 50 ns trajectory.
Overall results show that compound AG-690/11203374_1 and AG-690/11203374_2 fits snugly inside the binding cavity and have shown strong interactions with highly conserved residues like His_41, Cys_145, Glu_166. Among the topranked hits, compound AG-690/11203374_1 and AG-690/ 11203374_2 emerged as the best in silico hits based on the docking, MMGBSA, dynamics and ADMET studies. Although the biological efficacy has yet to be determined for the two in silico hits, but it is anticipated that they could serve as suitable lead molecules for the development of new antiviral agents against this pandemic.

Funding
Indian Institute of Technology (BHU)10.13039/501100002742The work is supported by seed grant from Indian Institute of Technology (BHU) awarded to RK.