2D-QSAR, 3D-QSAR, molecular docking and ADMET prediction studies of some novel 2-((1H-indol-3-yl)thio)-N-phenyl-acetamide derivatives as anti-influenza A virus

ABSTRACT Due to the emergence of drug-resistant strains of influenza A virus (IAV) in recent times, the need to search and discover more potent anti-IAV inhibitors is of great interest, especially with the devastating COVID-19 pandemic. The present research applied 2D-QSAR, 3D-QSAR, molecular docking, and ADMET predictions on some novel analogs of 2-((1 H-indol-3-yl)thio)-N-phenyl-acetamide as IAV inhibitors. The 2D-QSAR modeling results revealed GFA-MLR ( =0.8861, q2 = 0.7864) and GFA-ANN ( =0.8980, q2 = 0.8884) models with the most relevant descriptors for predicting the anti-IAV responses of the compounds, which have passed the benchmarks for accepting QSAR models. The 3D-QSAR modeling results suggested CoMFA_SE ( =0.925, q2 = 0.59) and CoMSIA_EAD ( =0.929, q2 = 0.767) models for good and reliable activity predictions. The molecular docking of the compounds with the active site of neuraminidase (NA) receptor theoretically confirms their resilient potency. The compounds mostly formed H-bond and hydrophobic interactions with key residues, such as ARG118, ASP151, GLU119, TRP179, ARG293 and PRO431 that triggered the catalytic reaction for the NA inhibition. However, compounds 16 and 21 were identified as lead compounds in the in-silico search for more potent candidates. The outcome of this study set a course for the in-silico design and search of potential candidates for influenza therapy.


Introduction
Influenza (A) virus (IAV) infection remains one of the major causes of mortality and morbidity due to respiratory diseases in recent times even with the devastating Covid-19 pandemic [1]. The World Health Organization (WHO) reported about 2-5 million cases of severe illness caused by the ravaging seasonal influenza virus epidemic which resulted in over 500,000 deaths globally [2]. These flu epidemics cause severe respiratory infections in children, adults, the elderly, and individuals with underlying health conditions [3] [4]. Influenza virus neuraminidase (NA) is an enzyme that catalyzes the obliteration of terminal sialic acid residues (sialidase) which aids in liberating new virions formed from the infected cells and circulating to infect the neighboring cells [5]. As such, the NA inhibition can defend the host from being infected and prevent its proliferation [1]. Due to the highly preserved active site structure of neuraminidase [6], it has become an attractive molecular target for the exploration and development of novel anti-influenza inhibitors. Presently, Zanamivir (Relenza™), oseltamivir (Tamiflu™), laninamivir octanoate (Inavir™), and peramivir (Rapivab™) are the four (4) approved neuraminidase inhibitors for influenza treatment [7]. The IAV disease is usually linked to severe symptoms because of the intense genetic diversity characterized by chromosomal mutation between avian and human viruses. Presently, the only two major classes of antiviral medicines against the influenza A virus are inhibitors of M2-ion channel (rimantadine and amantadine) and neuraminidase (zanamivir and oseltamivir) targets that fight against its spreading around the globe. However, most influenza A virus strains have become resistant to these drugs in recent times. There is a lot of concern for the advent of drug resistance effects resulting from the high variability of the influenza virus or respiratory syncytial virus (RSV) [5]. This is because a patient infected with either virus can manifest similar symptoms at an early stage. The discovery of some novel compounds of 2-((1 H-indol-3-yl)thio) acetamide as dual inhibitors against IAV and RSV is a huge milestone for the rapid therapy of these respiratory coinfections. Moreover, the trial and error approach applied in the development of new drugs has been seen to be very tedious, costly, and timeconsuming [8].
The application of some computational modeling concepts such as two-dimensional quantitative structure-activity relationship (2D-QSAR), three-dimensional quantitative structure-activity relationship (3D-QSAR), molecular docking, and ADMET (absorption, distribution, metabolism, excretion, and toxicity) predictions can save time and reduce the cost of synthesizing the new potent drugs [9]. The QSAR analysis is usually employed to make activity predictions of the compounds using some numerical information derived from their molecular structures (descriptors). The molecular docking simulation is also an important molecular modeling strategy that provides information about the residual interaction types between the target ligands and the active site of a protein as a receptor [10]. In this study, 2D-QSAR, 3D-QSAR, molecular docking, and ADMET predictions were applied to some novel compounds of 2-((1 H-indol-3-yl)thio)acetamide to discover probable lead compounds for influenza disease therapy. The 2D-QSAR models were initially constructed as a multilinear regression model (MLR) based on genetic function approximation (GFA) for feature selection of the descriptors whose validation statistics were verified using artificial neural network (ANN) regression calculations. The 3D-QSAR models were constructed based on the comparative molecular field analysis (CoMFA) and comparative similarity indices analysis (CoMSIA) for the activity prediction of the compounds. The compounds were also virtually screened via molecular docking with the NA receptor to have more insight into the residual interactions of the most stable ligandprotein complexes formed. Finally, the druglikeness and pharmacokinetic prediction of the best lead compounds were identified and evaluated in the study.

Data set collection and biological activities (pEC 50 )
The data set comprising 32 compounds from a series of 2-((1 H-indol-3-yl)thio)-N-phenylacetamide derivatives as novel influenza A virus (IAV) inhibitors were selected from the earlier published literature [3]. The anti-IAV responses of the compounds were evaluated as half-maximal effective concentrations (EC 50 ) in a micromolar (μM) unit. The concentrations were further transformed to a logarithmic scale (pEC 50 = -log EC 50 × 10 −6 ) to reduce data skewness for the QSAR studies [11]. Table 1 shows the list of substitution patterns in the various structures of the compounds in the data set along with their respective EC 50 and pEC 50 values.

2D-QSAR study Molecular descriptor computations. The 2D
structures of the 32 chemical compounds in the dataset were carefully drawn using ChemDraw software. The structures were converted to 3D with the successive preliminary energy minimization at the molecular mechanics force field (MMFF) level using Spartan 14 software [10]. The minimized compound structures were completely optimized at the density functional theory (DFT) level with B3LYP/631 G** basis set in a vacuum to have a more realistic structural conformation when their respective equilibrium geometries were attained [12]. The pharmaceutical data exploration laboratory software (PaDEL Descriptor) was used to compute over 2000 molecular descriptors for all the optimized structures [13]. These molecular descriptors are computed based on the steric potentials, electronic, potential hydrogen bonds of path length, relative ionization, and hydrophobicity properties of structures [14]. As such, 1D, 2D, and some 3D java class descriptors were computed by retaining the 3D coordinates of the optimized structures.
Data pretreatment and division. The computed descriptors were pretreated by removing non-informative descriptors such as constant and highly inter-correlated descriptors [15]. The constant descriptors with a default variance cutoff of 0.001 and inter-correlated descriptors with a coefficient cutoff of 0.8 were applied to remove the non-informative descriptors [16,17]. Using the Kennard-Stone division algorithm, the pretreated descriptors were divided into a model development set (23 compounds as the training set) and 9 compounds as the test set for the evaluation of the prediction performance of the model.

2D-QSAR model building and statistical validation.
The 2D-QSAR model was initially built using the Material Studio software based on the genetic function approximation (GFA) algorithm for feature selection of the best subset descriptors in the training set [12]. The Friedman Lack-of-Fit (LOF) as the fitness function of the GFA model during the evolution process was measured, while the scaled LOF smoothness parameter was set as the default of 0.5 [18]. However, the LOF measured with material studio slightly differs from the original Friedman formula as shown in equation 1.
where C is the number of the model terms other than the constants, d is the scaled smoothing parameter, p is the total number of descriptors as model terms excluding constants, N is the training set compounds, γ is the safety factor with a score of 0.99 which ensures that the denominator must be equal to zero. The scaled smoothing parameter is related to the scaled LOF smoothness factor (γ which was set at default 0.5 for a well-defined LOF measure as shown in equation 2. In addition, the population sample was set to 10,000, the maximum generation was set to 1000, and the number of top equations was set to 1 for an effective model convergence [19]. The descriptor matrix of the built model was initially subjected to the Y-Randomization test as a measure to attest to the quality of the model before being exported to Molegro Data Modeler (MDM) for the development of the multi-linear regression (MLR) and the nonlinear regression model version based on artificial neural network (ANN) analysis [18,20]. The predictive ability of the GFA-MLR and GFA-ANN models generated was examined using the following internal validation parameters as follows: • The Pearson correlation coefficient (r): is a measure of the correlation of two variables x and y. It is mathematically defined as: where σ x and σ y are the standard deviations for the variable x + and y. However, Pearson correlation coefficient squared (r 2 ) is often used to describe relationships between two variables whose range of values is between 0 and 1.
• Adjusted r 2 : is a modification of the Pearson correlation coefficient that finetunes the number of descriptors used in the multi-linear regression model, which will always be less than or equal to the Pearson correlation coefficient as defined below: Where N corresponds to the number of compounds in the training set as data points and p is the number of descriptors in the built model.
• Spearman's rank correlation coefficient (ρ): is a well-ordered correlation coefficient that utilizes the data points hierarchy as a substitute for the raw data points, and it is defined as: where the raw data points are changed to ranks. d i is the difference between the ranks of corresponding values of x and y and N is the number of data points.
• Cross validated correlation coefficient often represented as (q 2 ): is a measure of the predictive power of the regression model, and it is defined as: where the predictive sum of squares (PRESS) is defined as: where x obs;i , and x pred;i refer to the experimental and predicted activities. The closer the score of q 2 is to 1.0, the better the model's predictive power.
• Root mean square error (RMSE): is a good measure for evaluating the prediction performance of the model generated which is proportional to the observed mean score as defined below.
RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi The reliability and the predictive performance of the models were also assessed with relevant external validation metrics as proposed by some prominent QSAR scientists, such as Alexendra Tropsha, Paola Gramatica, and Kunal Roy [21]. Some of the external validation metrics include the predicted coefficient of determination for the test set R 2 Pred À � , regression coefficients for the test set (R 2 test Þ, delta modified square of correlation coefficient among others [22].

Model Applicability Domain (AD). The model
applicability domain is the theoretical chemical space of the compounds defined by the descriptors and the modeled activity in which the acceptable QSAR model can make reliable predictions [11]. Thus, the technique helps in detecting the structural and response outliers in the training and test set, respectively. Furthermore, the leverage approach was utilized to assess the chemical space of a QSAR model, and the plot of standardized residuals against leverage values (h), also known as the Williams plot, was used to virtually screen the compounds [10]. As such, compounds with leverage scores less than the threshold (h < h*) and standardized residual scores within ±3.0σ (standard deviation unit) are set to be within the model's applicability domain. The warning leverage (h*) is calculated using: where D is the number of descriptors in the model and N is the number of compounds used as the training set. The molecular alignment of a database is one of the most crucial steps for building a reliable and predictive 3D-QSAR model. Hence, the distill rigid alignment was used to align the compounds of the studied dataset to the most potent compound (compound no. 21) as the template (Figure 1(a-b)).

Development of 3D QSAR models. The 3D-
QSAR modeling studies have been recognized as an important concept in designing more potent inhibitors, which relates their 3D structural attributes with bioactivity [23]. The 3D-QSAR models were developed using CoMFA and CoMSIA methods. For the CoMFA method, steric (S) and electrostatic (E) fields around the compounds are correlated with bioactivity, while in the case of the CoMSIA method, hydrophobic (H) and hydrogen bond acceptor/donor fields are correlated with bioactivity [24]. In the CoMFA model development, the aligned compounds were imported into a 3D cubic lattice with a grid spacing of 2 Å. The Tripos force field utilized a carbon sp 3 probe atom with a radius of 1.52 Å and +1 positive charge to compute steric field energies using Lennard Jones 6-12 potential and electrostatic field energies from Coulombic potential with distance as a function of the dielectric at each lattice point. The column filtering was set at a default parameter of 2.0 Kcal/mol to reduce noise and fast potential calculation. In CoMSIA model development, steric (S), electrostatic (E), hydrophobic (H), hydrogen bond acceptor (A), and donor (D) field descriptors are computed using the carbon sp3 atom with a positive charge (+1) and radius of 1 Å at default attenuation factor of 0.3 to build different CoMSIA models.

Statistical validation of the 3D-QSAR models.
The 3D-QSAR models were built by correlating the latent components from the set of available CoMFA and CoMSIA descriptors as the independent variables with the anti-IAV response effects of the compounds in the model building set (training set) through partial least squares (PLS) regression analysis [25]. The competency of the 3D-QSAR models built was analyzed based on the prominent statistical validation parameters for an acceptable QSAR model.

Molecular docking studies
A molecular docking investigation was carried out on the studied dataset using the molecular operating environment (MOE) V2015. 10 software. The H1N1 2009 pandemic influenza neuraminidase complexed with zanamivir (PDB: 4B7Q) was used as the protein receptor for the study, and the co-crystalized ligand (zanamivir) in the receptor was used as the reference drug [26].

Energy minimization and the docking protocol
The downloaded protein was imported into the MOE workspace, then further subjected to energy minimization by fixing all hydrogen atoms, lone pairs, partial charges, and conformational searching until an RMS distance of 0.1 Å and RMSD gradient of 0.01 kcal/mol were attained using Amber12: ET force field. The database of the 32 optimized compounds in the (.mdb) file was imported for the docking protocol [27]. To increase the docking accuracy, the co-crystalized ligand was initially re-docked with its protein to analyze the ligand-pocket interactions or active site, and the RMSD scores were calculated between the docked poses and the co-crystalized ligand. Then, a docking simulation of all ligands was carried out within the active sites of the receptor. The MOE program was adjusted to the triangle matcher method based on the London dG scoring function for placement selection and induced fit based on GBVI/WSA dG scoring function for refinement [28]. The best poses obtained were studied with the further visualization of the most stable complexes formed using Discovery studio.

Drug likeness evaluation and ADMET predictions
The evaluation of drug-likeness and pharmacokinetic parameters of potential drug candidates is crucial at the preliminary stage of the drug discovery which assists in rolling out any unfavorable effects of the candidates [29]. The pharmacokinetic parameters are built on desirable adsorption, distribution, metabolism, excretion, and toxicity (ADMET) of the query drug when administered [10]. An accurate and resourceful ADMETlab 2.0 web server (https://admetmesh. scbdd.com/) was exploited to predict several physicochemical and toxicity parameters in the study [30,31]. In addition, the druglikeness of the compounds was appraised in terms of Lipinski, Ghose, Veber, Egan, and Muegge rules using the SwissADME online web server at http://www.swissadme.ch/ index.php.

2D-QSAR studies
The 2D-QSAR modeling studies were carried out successfully on 2-((1 H-indol-3-yl)thio)-N-phenyl-acetamide derivatives as novel influenza A virus (IAV) inhibitors. The built 2D-QSAR models in this study were used to make predictions of the anti-IAV responses (pEC 50 ) for the compounds with the influence of some robust statistically significant molecular descriptors [10]. As mentioned earlier, the GFA model building protocol of Material Studio was used for the feature selection of the best descriptors matrix. Subsequently, 2D-QSAR models were successfully built based on the MLR and ANN modeling techniques with the five (5) best descriptors whose model internal and external validation metrics were shown in Tables 2  and 3.
The model descriptors are coded as ATSC7i, SpMin1_Bhm, AVP-0, L2m, and E1e, which portray some essential topological features of the compounds in numerical values as computed in Table S1. The proposed GFA-MLR model equation was given below: Furthermore, the reliability of the GFA-MLR model was evaluated with the validation metrics such as LOF value of 0.1235, r 2 (training set) of 0.8861, adjusted r 2 of 0.8525, crossvalidation (q 2 ) of 0.7864, RMSE of 0.1350, r 2 test of 0.7230, and r 2 pred of 0.6872 which have all passed the model criteria of accepting QSAR model. The Y-Randomization test was ascertained via random scrambling of the response activity while keeping the model descriptors constant to create random models [17]. The 50 random models were generated with low R 2 and Q 2 scores which attested that the original model is robust and not constructed by chance [15]. The coefficient of determination for the Y-randomization test (CR 2 p Þ was computed as 0.7818 (≥0.5) which confirmed the reliability of the model generated as shown in Table S2. Hence, the validation criteria of the model fully agreed with the acceptable threshold parameters [32]. The description name, variance inflation factor (VIF), and mean effect of the descriptors were presented in Table S3. The Pearson correlation of less than 0.5 signifies that there is no inter-correlation among each descriptor pair, while the VIF scores were calculated using Eq. 10, where R 2 is the Pearson correlation coefficient for each descriptor pair.
The VIF scores of the five (5) subset descriptors fall within the threshold limit (VIF< 10) suggesting void multicollinearity, which implies that the descriptors are orthogonal to one another [33]. Similarly, the same five (5) subset descriptors selected from the GFA-MLR techniquewere utilized as inputs for the ANN model with a single hidden layer. The predicted anti-IAV activity (pEC 50 ) of the compounds constitutes a single output layer, which is computed with the aid of a sigmoid transfer function. The model was built at a parameter set up (max epochs = 1000, momentum = 0.2, learning rate = 0.5, output layer learning rate = 0.44) using back-propagation algorithm [34]. After numerous trials, a hidden layer with four neurons was selected due to its lowest RMSE value as shown in Figure 2. Thus, a GFA-ANN with 5-4-1 neural architecture (Figure 3) was obtained with improved statistical validation parameters, such as r 2 (training set) of 0.9163, cross-validation (q 2 ) of 0.8739, RMSE of 0.1420, r 2 test of 0.7368, and r 2 pred of 0.6257 as shown in Tables 2 and 3 accordingly.
The relative contribution of each descriptor toward an increase or decrease in the anti-IAV responses is measured based on their mean effect scores (ME) defined as: where βi represents the coefficient of the descriptor i and Di represent each descriptor score for a molecule and n represents the number of training set molecules [35]. It was observed that SpMin1_Bhm and AVP-0 descriptors are the major contributors to the increase in the anti-IAV activity with positive mean effect scores of +0.7301 and +0.2361, respectively. In addition, the coefficient relevance and relevance scores were further computed to evaluate the influence of each descriptor used in building both GFA-MLR and GFA-ANN models, respectively, as shown in the relevance plot ( Figure 4). The coefficient relevance scores were computed from the GFA-MLR model using the available five (5) descriptors. Each coefficient relevance of a descriptor was computed through the multiplication of its model coefficient value and the standard deviation divided by the standard deviation of the target variable. Similarly, the relevance scores were computed from the GFA-ANN model, and each relevance score is computed by adapting all paths from the input layer neurons through the hidden neurons layers to the output neurons [20]. The product of all absolute values of the connection weights is added to the relevance score, which is normalized within the range of 0 and 100. Hence, the degree of contribution of the descriptors based on the coefficient relevance and relevance scores follow the same trend; AVP-0 > SpMin1_Bhm > E1e > L2m > ATSC7i. This confirms the greatest effect of AVP-0 and SpMin1_Bhm on the prediction of the anti-IAV activity. AVP-0 is a topological descriptor that gives information on the molecular connectivity indices computed using the Chi operator [36]. It is an average valence path, a zero-order descriptor that usually depends on the number  of the lone pair, π electrons, presence of heteroatoms, degree of flexibility, and branching in the compounds [37]. SpMin1_Bhm represents an Eigen value-based descriptor known as the smallest absolute eigenvalue of Burden modified matrix with lag 1 weighted by the relative atomic mass, while L2m is the 2nd component size directional Weighted Holistic Invariant Molecular (WHIM) descriptor also weighted by the relative atomic mass [37]. These two descriptors mainly describe the molecular structure in terms of size, shape, and symmetry as functions of some invariant reference frame. E1e descriptor is the first component accessible directional WHIM index weighted by the relative Sanderson electronegativity computed from the principal component of the molecular coordinate of the structure [37]. Hence, this proposed that the electronic distribution of atoms in the molecule had a significant effect on the anti-influenza activity of the compounds. ATSC7i is the centered Broto-Moreau autocorrelation -lag 7 weighted by first ionization potential which accounts for the electronic effects and degree of unsaturation in the molecular system [36,37].
Model error predictions (residuals) consist of three (3) important components such as random (variance), systematic (bias), and measurement (noise) error, but models are more affected by systematic errors [38]. Therefore, a model with high systematic error should be reconstructed again to reduce the high level of bias. This is because bias redirects the data into an artificial course that could lead to the wrong interpretation [19]. The ability of the GFA-MLR and GFA-ANN models in predicting the experimental activity of the compounds without any computational errors was assessed using the standardized residual versus anti-IAV response activity (pEC 50 ) plots as shown in Figures. S1 and S2 where all standardized residual values fall within the definite threshold of ±2.0. Hence, this implies that the model is free of systematic error and can give a good prediction.

Model applicability domain of the 2D-QSAR model
The applicability domain of the 2D-QSAR model in chemical space where the model can make a reliable prediction based on the five (5) defined model descriptors stated earlier. In  this study, the leverage approach was applied to examine the chemical space of the GFA-ANN model. The standardized residuals computed were plotted against the leverage values for all molecules (Williams plot) to identify the response and structural outliers as presented in Figure 5. Interestingly, all molecules in the dataset were confined within the standardized residual threshold limit of ±2.5 and leverage (h*) of 0.782, respectively. Therefore, the model predicted no response or a structural outlier.

Statistical validation results for the 3D-QSAR models
Before building the 3D-QSAR models, the optimized structures were automatically placed into a training set (24 compounds) and test set (8 compounds) based on the structural diversity method using Sybyl-X 2.1.1 program. Subsequently, the CoMFA model was built with both electrostatic and steric field contributions as the independent variables of the training set and was further exposed to cross-validation of PLS regression analysis. The statistical validation results of the CoMFA_ES model are r 2 (0.59 with 4 latent components), non-cross validated r 2 value (0.925), and SEE (0.1538). However, the statistical validation results from the various probable CoMFA models are summarized in Table 4.
For the CoMSIA modeling study, the statistical validation results of all 30 probable models with different field combinations based on the five (5) different field descriptors were shown in Table S4. Furthermore, the CoMSIA-EAD model computed a higher q 2 score of 0.767 with 3 components, a higher r 2 value of 0.929, and a relatively low SEE of 0.1526. However, the best models among the possible CoMSIA models with their robust statistical validation and contribution fractions are summarized in Table 5. In addition, the validation metrics of all possible CoMFA and CoMSIA models were found within the benchmark scores for an acceptable QSAR model that was proposed by Alexander Golbraikh and Alexander Tropsha (q 2 > 0.5 and r 2 > 0.6). This implies that the validation metrics of the models generated are statistically reliable, which indicates their predictive potential and robustness [38]. The graphs of predicted against experimental anti-IAV response effects (pEC 50 ) for the training and test set compounds of the models revealed a satisfactory linear correlation, as presented in Figures. S3 and S4, respectively.

Contour map analysis of the CoMFA and CoMSIA models
The utmost advantage of applying CoMFA and CoMSIA approaches is to be able to visualize the fields of the compounds influencing the variation of specific target properties in terms of contour maps. Compound 21 with the highest pEC 50 was selected as a template to examine the most prominent field contributions for the studied dataset. The steric and electrostatic contour maps of the CoMFA_ES model for compound 21 were shown in Figures 6(a-b). The CoMFA contour maps provide valuable information on the regions around the compounds that can decrease or increase bioactivities. For steric contour maps, the green contour depicts that the desirable addition of bulky groups in the regions would increase the activity, while the yellow contours portrayed that the steric or bulky groups are undesirable in the region for increasing activity [39]. In the electrostatic field contour maps, the red regions depict regions where electron-withdrawing groups enhance the activity, while the blue regions depict regions where electron-donating groups increase the activity. The green contour was predominantly distributed around the 2-EtO substituent on the benzene of compound 21, proposing the further introduction of bulky groups in the region would enhance the activity. Meanwhile, the yellow contours around the indole moiety of the same compound discourage further attachment of bulky fragments in the region, which would decrease the activity of the compound. The red contour near the hydrogen of the 2-EtO group of the compound suggests that attaching electronegative groups at the position may increase the anti-IAV activity of the compound.
The 3D contour maps for the best CoMSIA_EAD model were shown in Figures 7  (a-c), where the electrostatic contour map in the model is more or less similar to that of the CoMFA model. As such, the discussion will make more emphasis on the hydrogen-bond fields. For the hydrogen bond acceptor contour map (Figure 7(b)), the magenta contours reveal the favorable acceptor regions and the red contours indicate unfavorable regions. Hence, the magenta contour was observed near position 6    of the benzene ring and sulfur atom at position 2 in the indole moiety of the same compound. The HBD contour map is presented in Figure 7 (c), where the cyan contour depicts the HBD favorable region and the purple contour reveals unfavorable HBD regions for the HBD contour map. The purple contour was observed near the 2-EtO and -HN moiety of the same compound. Based on the CoMFA and CoMSIA contour maps, the general strategy for improving the anti-IAV activity of the compound is summarized in Figure 8. The outcome of this analysis theoretically reaffirmed the importance of the 2-ethoxyl substitution on aniline and halo substituent on the indole ring of the template compound for a higher IAV activity as similarly reported by Zhang et al., (2019) [3]. Hence, the addition of more electronegative and steric groups in the highlighted regions is crucial in designing more hypothetical compounds with improved anti-IAV inhibitory effects.

Molecular docking results
The 2-((1 H-indol-3-yl)thio) acetamide derivatives (32 compounds) of the dataset were docked with the H1N1 neuraminidase receptor (PDB: 4B7Q) using MOE software as depicted earlier. Before the docking started, the co-crystalized zanamivir was extracted from the receptor, and then docked into the binding pocket to confirm the reliability of the docking algorithm used by the MOE program as well as to note the amino acid residues surrounding the ligand. The results revealed the docking scores of the best poses of the compounds ranging from −7.1303 to −5.8682 kcal/mol which confirmed their good potency when compared with the ribavirin (compound 32) as shown in Table 6. This observation is in agreement with the experimental findings of Zhang et al., (2019) on the potent inhibitory effect of the compounds.
The compounds 16 and 21 were found to have higher binding scores of −7.1303 and −7.0431 kcal/mol, respectively, in comparison with others in the dataset. As such, these selected compounds are the most preferable candidates to be considered as lead molecules due to their higher anti-IAV (pEC 50 ) and binding scores.  The residual interactions of compound 16 revealed three (3) conventional H-bonds, four (4) carbon-H bonds, eight (8) electrostatic, and three (3) hydrophobic interaction types with different amino acid residues in the active site of the targeted receptor as summarized in Table 7. For the conventional H-bond interaction, the residues of ARG118, ARG293, and ARG368 behave as H-bond donors to the carbonyl oxygen (C = O) of the compound 16 as H-bond acceptors to form the conventional hydrogen bond interaction at different bond distances of 2.8159, 2.1645, and 2.1796 Å, respectively. Also, the hydrogen atoms of the methoxy (6-OCH 3 ) group of the same compound behave as H-bond donors to the oxygen atoms of the ASP151, TRP179, GLU119, and ASP151 as the acceptors to form the carbon-H bond interactions at different bond distances. The 16complex formed three different π-cation electrostatic interaction types with ARG152, ARG152, and ARG368 at interaction distances of 3.9068, 4.7185, and 3.0610 Å, respectively, while five different π-anion electrostatic interactions were conversely formed with ASP151 (4.1748 Å), ASP151 (4.3502 Å), GLU228 (4.6985 Å), GLU278 (3.4884 Å), and GLU278 (3.0477 Å) at different interaction distances accordingly. For the hydrophobic interaction, the 2-EtO group of the compound interacts with the alkyl group of ILE149 to form one (1) alkylalkyl hydrophobic interaction type at a distance of 4.3738 Å, while the remaining 2-hydrophobic interactions formed were as a result of the π-alkyl interaction type with ARG225 and PRO431 residues at 5.4696 Å and 4.4297 Å interaction distances, respectively.
The residual interactions of compound 21 revealed three (3) conventional H-bonds, five (5) electrostatic, and six (6) hydrophobic interaction types with different amino acid residues at the active site of the targeted receptor as summarized in Table 8. For the conventional H-bond interaction, the residues of ARG118 and ARG293 behave as H-bond donors to the oxygen of carbonyl (C = O) and sulfur atom of compound 21 as H-bond acceptors to form the conventional hydrogen bond interactions at different bond distances of 3.0445 and 2.4928 Å, respectively, while the -NH group of the acetamide moiety in the compound behaves as the H-bond donor to the oxygen atom of ASP151 amino acid residue. The 21-complex formed three different π-cation electrostatic interaction types with ARG152 (4.5505 Å) and ARG368 (3.4231 Å & 3.4482 Å), while two different π-anion electrostatic interactions were formed with ASP151 (4.1664 Å) and GLU278 (3.4884 Å) at different interaction distances accordingly. For the hydrophobic Score: the final docking score, rmsd_refine: the root mean square deviation between the pose before and after refinement, E_conf: the energy of the conformer. E_refine: core from the refinement stage, calculated to be the sum of the van der Waals electrostatics and solvation energies, under the Generalized Born solvation model (GB/VI), E_score1: Score from rescoring stages 1, E_place: Score from the placement stage, E_score2: Score from rescoring stages 2

Evaluation of drug-likeness and ADMET parameters
The physicochemical parameters of the compounds are usually evaluated and linked to some filter variants for drug-likeness appraisals based on certain rules such as Lipinski's rule, Pfizer's rule, GSK's rule, and Golden triangle rule. Hence, physicochemical radar chart of the lead compounds 16 and 21 was generated from the ADMETlab 2.0 web server as shown in Figure 11.
Most of the physicochemical properties for the selected compounds are within the upper limit (brown) except for Log P and Log D whose values are slightly above the optimal limit as presented in the radar charts. The selected lead compounds have also passed the Lipinski rule of five, as shown in Table 9. The Lipinski's criteria for drug-likeness include molecular weight (MW ≤ 500 g/mol), n-octanol/water distribution coefficient (Log P ≤ 5), number of hydrogen bond acceptors (nHA ≤ 10), and number of hydrogen bond donors (nHD ≤ 5) [40,41]. It could be observed that the lead compounds have slightly higher Log P scores in comparison with the optimal limit (0 < Log P < 3). This suggests that the compounds have low aqueous solubility and good oral bioavailability [42,43]. The Log P also gives information on the cellular membrane permeability and hydrophobic binding to macromolecules, such as the target receptors, plasma proteins, metabolizing enzymes, or transporters [44]. However, zanamivir and oseltamivir have lower Log P scores of 1.013 and −1.317 which tend to experience difficulty in penetrating the lipid bilayer of the cell membrane.
The quantitative estimate of drug-likeness (QED), synthetic accessibility (SA) scores, and other drug-likeness rules of the selected compounds are also evaluated in Table 10). The QED scores of the compounds are less than 0.67 signifying unattractive measures as drug-like compounds, but they are predicted to be easily synthesized due to their low SA scores of less than 6. The lead compounds were observed to satisfy other druglikeness rules as shown in Table 10.
Some of the relevant computed ADMET parameters include human intestinal absorption (HIA), human colon adenocarcinoma cell lines (Caco-2) permeability, Madin −Darby Canine Kidney cells (MDCK) permeability, plasma glycoprotein (Pgp) inhibitor, plasma glycoprotein (Pgp) substrate, plasma protein binding (PPB), volume distribution (VD), blood-brain barrier (BBB) penetration, human cytochromes (CYP), clearance (CL), half-life (T 1/2 ), AMES toxicity, carcinogenicity (Carc), eye irritation (EI), respiratory toxicity (RT) as shown in Table 11. The Caco-2 cell permeability has been an imperative index for an eligible drug candidate, which is linked with human intestinal absorption [41]. The lead compounds were considered to have proper Caco-2 cell permeability because their values are higher than the optimal score of −5.15 log cm/s. The MDCK permeability is employed as an in-vitro model for permeability screening, and its  apparent coefficient is used to evaluate the efficiency of chemicals in the body, and also to estimate the effect of the blood-brain barrier. The lead compounds were predicted to have low MDCK permeability due to lower computed coefficients when compared with the optimal score of 2.0 × 10 −5 cm/s. The output results of the lead molecules revealed an excellent probability of being Pgp non-inhibitor and Pgp nonsubstrate whose range of values is within 0 and 1. PPB is one of the most important mechanisms of drug uptake and distribution resulting from the drug-protein bindings in the plasma, which strongly affects the pharmacodynamic behavior of the drug [45]. The lead compounds were also predicted to have a high value of PPB (>90%) depicting a broad therapeutic index. The volume distribution (VD) is used to correlate the administered drug dose with the actual concentration in the system, which describes the in-vivo distribution [45]. As such, the lead compounds are predicted to have proper VD values, which are within the range of 0.04-20 L/kg. The BBB permeated output of the lead molecules was classified as BBB+ category whose predicted values are > −1. For the metabolism, the predicted outputs revealed the probabilities of being either lead inhibitors or substrates of CYPs (1A2, 3A4, 2C9, 2C19, and 2D6) isoenzymes ranging from 0 to 1. The clearance of a drug (CL) is also an important pharmacokinetic measure that describes how the drug is excreted from the body. The clearance penetration results of the lead compounds 16 and 21 with probability values of 8.746 and 7.328 mg/min/kg, respectively, are predicted to have low clearance levels. Hence, none of the compounds showed any serious toxicity profile threats with respect to AMES mutagenicity, eye irritation and respiratory toxicity. Key: The prediction probability values are classified into six symbols: 0-0.1(-), 0.1-0.3(-), 0.3-0.5(-), 0.5-0.7(+), 0.7-0.9(++), and 0.9-1.0 (+++). Generally, '+++' or '++' represents the molecule is more likely to be toxic or defective, while '−−'or'−' represents nontoxic or appropriate.

Conclusion
In conclusion, the study utilized some molecular modeling concepts, such as 2D-QSAR, 3D-QSAR, molecular docking, and ADMET predictions on 32 analogs of 2-((1 H-indol-3-yl)thio) acetamide as IAV inhibitors to theoretically visualize the various lead compounds. The 2D-QSAR (GFA-MLR and GFA-ANN) models with the feature selected descriptors such as AVP-0, SpMin1_Bhm, E1e, L2m, and ATSC7i were found to have reliable anti-IAV activity predictions. The 3D-QSAR models further revealed the prominence of steric, electrostatics, H-bond acceptors and donor regions for improving anti-IAV activity of the compounds based on CoMFA and CoMSIA methods. The statistical validation of both 2D-QSAR and 3D-QSAR were all within the global benchmarks for accepting QSAR models, which supports the predictive competence of the models. The molecular docking results showed a good consistency with our 2D-QSAR and 3D-QSAR studies, and two lead compounds (16 and 21) were identified as the possible candidates for future in-silico design and search of more potent anti-IAV compounds with improved activities. Moreover, the druglikeness and ADMET predictions of the lead compounds revealed non-violation of Lipinski's rule, good pharmacokinetic and toxicity profiles. Thus, the outcome of this theoretical study identified the possible lead compounds for future in-silico exploration of improved anti-influenza agents.