ANN and RSM based modelling for optimization of cell dry mass of Bacillus sp. strain B67 and its antifungal activity against Botrytis cinerea

ABSTRACT The present study compared the modelling, predictive and generalization abilities of response surface methodology (RSM) and artificial neural network (ANN) for optimization of the fermenting medium. Cell dry mass and inhibition zone of strain B67 against Botrytis cinerea were used as response variables. The response variables were optimized and modelled as a function of five independent variables (pH, gelatine percentage, incubation period, agitation speed, and temperature) using RSM and ANN. The two approaches were compared for their modelling abilities in terms of root-mean-squared error (RMSE), mean absolute error (MAE), chi-square, and correlation coefficient, computed from experimental and predicted data. The ANN models proved superior to RSM, with lower RMSE, MAE, and chi-square and higher values for the correlation coefficient, coefficient of determination, and predictive coefficient of determination. The optimum fermenting conditions predicted were pH 6.65, gelatine 3.30%, incubation period 35 h, agitation speed 163 rpm, and incubation temperature 33.64 °C, with 15.00 g/L and 31.64 mm cell dry mass and inhibition zone, respectively. The predictive models were validated experimentally and were found to be in agreement with the experimentally obtained values.


Introduction
The excessive use of synthetic pesticides for the management of plant pathogens has given rise to problems due to the development of resistance in pathogens against traditional fungicides [1]. Therefore, the biological management of pathogens has gained interest as an alternative approach that is sustainable and safe both economically and environmentally [2]. Biological control agents have an important position in the integrated management of plant diseases, and bacteria of the genus Bacillus in particular have the potential to serve as biological control agents [3][4][5][6].
Although a large number of biological control agents have been discovered, the number of commercially available ones is very small [7]. The contribution of biological control agents is less than 5% of the total use of pesticides in agriculture [3], which might be due to complications in the formulation and fermentation processes of these biological control agents [7]. Low yield and high production cost are major limiting factors for the development, commercialization, and implementation of biological control agents [8]. Thus, the optimization of fermenting medium is crucial to overcoming the above limitations [7]. Efficient fermentation is an important step for industrial use of bacteria as biological control agents for enhancing their cell density and yield of products of interest.
Single-factor optimization is time consuming and laborious. Furthermore, it rarely guarantees the determination of optimal conditions for effective and efficient production of the target product [9]. These limitations of the one-factor-at-a-time approach can be overcome by using alternative approaches like response surface methodology (RSM) and artificial neural network (ANN). RSM has been widely used in the optimization of fermentation media [10][11][12][13][14][15]. RSM encompasses a group of statistics-based approaches for building models, designing experiments, assessing the effects of factors and searching for optimal conditions [16]. It is a designed experimental procedure which allows the simultaneous optimization of several factors [10]. The experimental responses in RSM are fitted to a quadratic function, and the successful application of RSM in a number of studies suggests that it is the best choice for many fermentation systems [17].
In recent years, ANN has arisen as an efficient and attractive approach for nonlinear multifactor modelling due to its generic structure and ability to learn from historical data [18,19]. ANN is more efficient than RSM because it does not require prior specification of a suitable fitting function and it has the ability of universal approximation, i.e. approximation of almost all kinds of nonlinear functions, while RSM is only useful for quadratic approximation [17]. Moreover, ANN is structured in nature and useful for gaining more insight, i.e. it has the ability to provide sensitivity analysis and to reveal the interactive effect of two factors on the system [20][21][22][23].
The present study had two objectives: (i) to maximize the cell dry mass of Bacillus sp. strain B67 and its antifungal potential against Botrytis cinerea and (ii) to compare the modelling ability of RSM and ANN for optimization of the fermenting medium components and conditions, namely pH, gelatine percentage, incubation period, agitation speed, and incubation temperature. The optimized conditions predicted by both RSM and ANN were compared and experimentally verified.

Biocontrol agent and fungal pathogen
The Bacillus sp. strain B67 was obtained from the Department of Pesticides Science, Plant Protection College, Shenyang Agricultural University, Liaoning, China. It was isolated from the rhizosphere of a tomato plant and, after initial testing of its antifungal activity against B. cinerea [24], was maintained in Tryptic Soy Broth (TSB) with 20% glycerol at −80 °C. It was activated in TSB for 48 h before use [2]. The fungal pathogen (B. cinerea) was also obtained from the Department of Pesticides Science, Plant Protection College, Shenyang Agricultural University, Liaoning, China, where it was isolated from tomato plant leaves. The biological control agent (Bacillus sp. strain B67) and the pathogen (B. cinerea) were deposited in China General Microbial Cultural Collection Centre under accession numbers CGMCC 1.15933 and CGMCC 3.18224, respectively.

Preliminary antifungal activity of biocontrol agent
Pure cultures of the fungus were initially grown in Petri dishes containing standard potato dextrose agar (PDA) medium (20% potato extract, 2% dextrose, and 1.5% agar) and incubated at 28 °C for 5 days. After this period, 8 mm disks were cut from the edge of actively growing colonies of the fungus with the aid of a cork borer. Two plugs were placed at opposite edges of each dish. The strain B67 was streaked on the centre of the PDA plate at the time the fungal plugs were placed. After incubation for 5 days at room temperature, the radii of the zones of inhibition of the fungus were measured in two perpendicular directions.

Growth medium and culture conditions
The inoculum of strain B67 was prepared by transferring a single colony to a 250 mL conical flask containing 100 mL of basal medium (standard Luria-Bertani medium) composed of tryptone 10 g/L, yeast extract 5 g/L, and NaCl 10 g/L and grown for 12 h at 28 °C. Then, 10% (v/v) of the inoculum culture was added to 250 mL conical flasks containing 100 mL of the experimental medium, consisting of the basal medium adjusted according to the experimental design (Table 1). The cultures were incubated at the temperatures, agitation speeds, and incubation periods specified by the experimental design (Table 1).

Estimation of cell dry mass and antifungal activity
Cells were harvested by centrifugation at 10,000 × g for 20 min. The cell pellet was washed with phosphate buffer and recentrifuged. After that, the cell pellet was dried to constant mass in an oven at 80 °C for 24 h.
The culture supernatants were used to evaluate the antifungal activity, which was determined by the Oxford cup plate assay. PDA medium (70 mL) was heated until completely melted, slowly cooled to 50 °C, and rapidly mixed with 8 mL of a spore suspension of B. cinerea (10^7 spores/mL) before being poured onto 180 mm plates and allowed to solidify at room temperature. The supernatant (150 µL) was added to each Oxford cup on the plate, and the plates were incubated at 28 °C for 5 days in the dark. The antifungal activity was determined by measuring the diameter of the inhibition zones in two perpendicular directions.

RSM based modelling
In the present study, RSM was employed to determine the optimum medium components for cell dry mass and antifungal activity against B. cinerea. A central composite design (CCD) with five factors at five levels each was used to design the experiment (Table 1). The medium components selected for optimization were pH (X1), gelatine percentage (X2), incubation period (X3), agitation speed (X4) and temperature (X5). The CCD consisted of 16 cube, 10 axial, and 6 centre points, giving 32 experiments, which were performed in triplicate in a randomized order. The central values selected for the experimental design were: medium pH 6.5, gelatine 3%, incubation period 35 h, agitation speed 150 rpm, and incubation temperature 34 °C. Each variable was coded at five levels between −2 and +2, and the coding of the variables was done using Equation (1).
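Judging from the symbol definitions that follow, Equation (1) presumably has the standard coded-variable form. The expression below is a reconstruction under that assumption, with \Delta X_i denoting the step change in the real value of variable i:

```latex
x_i = \frac{X_i - X_{cp}}{\Delta X_i} \tag{1}
```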
where xi is the dimensionless (coded) value of the independent variable, Xi is the real value of the independent variable, Xcp is the real value of the independent variable at the central point and ΔXi is the step change in the real value of variable i corresponding to a unit change in the dimensionless value of variable i. The process performance was determined by analysing the response (Y), and the relationship between the response and the input parameters was described by Equation (2).
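In its usual general form, such a response relationship can be written as follows. This is a reconstruction, and f (the unknown response function) is a symbol assumed here:

```latex
Y = f(x_1, x_2, \ldots, x_k) + e \tag{2}
```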
where x1, x2, …, xk are the input factors and e is the error term. A second-order polynomial regression equation was used to fit the experimental data and to describe the relevant model terms using MINITAB 17 software. A quadratic model, which also includes the linear model, is described by Equation (3).
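A full second-order polynomial consistent with the coefficient names defined below is commonly written as follows. This is a reconstruction, and the index ranges are assumed from the standard form:

```latex
Y = b_0 + \sum_{j=1}^{k} b_j x_j + \sum_{j=1}^{k} b_{jj} x_j^2
    + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + e_i \tag{3}
```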
where Y is the predicted response; b0 is the intercept constant; bj, bjj and bij are the coefficients of the linear, quadratic and interaction terms, respectively; k is the number of factors; xi and xj are the coded variables (i and j range from 1 to k); ei is the error [25][26].
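To make the structure of the quadratic model concrete, the following minimal Python sketch builds the regressor terms for one coded run and evaluates a prediction. With k = 5 factors, a full second-order model has 1 + 5 + 5 + 10 = 21 coefficients. The function names and the coefficient values are hypothetical and serve only as an illustration; they are not the coefficients fitted in this study.

```python
from itertools import combinations

FACTORS = ["pH", "gelatine", "incubation_period", "agitation_speed", "temperature"]


def quadratic_terms(x):
    """Return [1, linear terms, squared terms, pairwise interactions] for one coded run."""
    linear = list(x)
    squared = [value * value for value in x]
    interactions = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1.0] + linear + squared + interactions


def predict(coefficients, x):
    """Evaluate the second-order polynomial for coded factor values x."""
    terms = quadratic_terms(x)
    if len(coefficients) != len(terms):
        raise ValueError("expected %d coefficients" % len(terms))
    return sum(b * t for b, t in zip(coefficients, terms))


k = len(FACTORS)
n_terms = len(quadratic_terms([0.0] * k))  # intercept + 5 linear + 5 squared + 10 interactions

# HYPOTHETICAL coefficients, for illustration only (not taken from the paper).
coefficients = [10.0] + [0.5] * k + [-1.0] * k + [0.1] * 10

# At the centre point every coded value is 0, so the prediction equals the intercept.
centre_prediction = predict(coefficients, [0.0] * k)
```

Negative squared-term coefficients, as in this made-up example, give a response surface with an interior maximum, which is the situation the contour plots of a fitted model are used to locate.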
Multiple regression was employed to analyse the experimental data, and the significance of the regression coefficients was evaluated by F-test. Modelling was started with a quadratic model including linear, squared, and interaction terms, and the adequacy of the models was tested in terms of R², adjusted R², and predicted R² values. The significant terms in the model were determined by analysis of variance (ANOVA) for each response, and ANOVA tables were constructed. The regression coefficients were used to generate contour plots from the regression models.
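The design described in this section (16 cube, 10 axial and 6 centre points in coded units between −2 and +2) can be sketched in Python as follows. Sixteen cube points for five factors correspond to a half-fraction factorial. The generator x5 = x1·x2·x3·x4 and the step sizes in STEP are assumptions made for illustration, since neither is stated in the text; the centre values are those given above.

```python
import random
from itertools import product

FACTORS = ("pH", "gelatine_pct", "incubation_h", "agitation_rpm", "temperature_c")
ALPHA = 2.0  # axial distance; the text codes each factor between -2 and +2


def cube_points():
    """16 points of a 2^(5-1) half-fraction; the generator x5 = x1*x2*x3*x4 is ASSUMED."""
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1.0, 1.0), repeat=4)]


def axial_points():
    """Two axial points per factor at -ALPHA and +ALPHA, all other factors at 0."""
    points = []
    for j in range(len(FACTORS)):
        for level in (-ALPHA, ALPHA):
            point = [0.0] * len(FACTORS)
            point[j] = level
            points.append(tuple(point))
    return points


def centre_points(count=6):
    """Replicated centre runs (all coded values equal to 0)."""
    return [(0.0,) * len(FACTORS)] * count


design = cube_points() + axial_points() + centre_points()

# Centre values are from the text; step sizes are HYPOTHETICAL placeholders.
CENTRE = dict(zip(FACTORS, (6.5, 3.0, 35.0, 150.0, 34.0)))
STEP = dict(zip(FACTORS, (0.5, 0.5, 5.0, 25.0, 2.0)))


def decode(coded_run):
    """Invert the coding: real value = centre value + coded value * step."""
    return {name: CENTRE[name] + x * STEP[name] for name, x in zip(FACTORS, coded_run)}


# Randomized run order (fixed seed only so the sketch is reproducible).
run_order = random.Random(0).sample(design, len(design))
```

Calling decode on a row of design returns the real factor settings for that run under the assumed step sizes.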

ANN based modelling
The feed-forward architecture of ANN, also known as the multilayer perceptron (MLP), was further used to provide nonlinear mapping between the input and the output variables. ANN was employed to construct the predictive model with five input variables (pH, gelatine percentage, incubation period, agitation speed, and temperature) and two output variables (cell dry mass and inhibition zone) for the same set of experimental data used for RSM.
The input layer comprised five neurons, i.e. pH, gelatine percentage, incubation period, agitation speed, and temperature, while the output layer comprised two neurons, i.e. cell dry mass and inhibition zone. The data flow in a forward direction, i.e. from the input layer to the output layer. Real-valued quantities, known as weights, are associated with the connections between neurons and are the adjustable parameters of the network. The neurons in the input layer simply pass the scaled input data to the hidden layer via the weights. The neurons in the hidden layer perform two tasks. First, they sum up the weighted inputs to the neuron, including a bias, as shown by Equation (4).
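The weighted sum for hidden neuron j is usually written as below. This is a reconstruction; the bias symbol \theta_j and the upper limit n (the number of inputs, five here) are assumptions:

```latex
\mathrm{sum}_j = \sum_{i=1}^{n} W_{ij} X_i + \theta_j \tag{4}
```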
where Wij are the connection weights; Xi is the input variable; θj is the bias. Second, the weighted sum is passed through an activation function f(sum), which introduces non-linearity into the mapping of the input data. The sigmoid transfer function was used in this study, as shown in Equation (5).
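The logistic sigmoid transfer function has the standard form (given here as a reconstruction):

```latex
f(\mathrm{sum}) = \frac{1}{1 + e^{-\mathrm{sum}}} \tag{5}
```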
The input for the output layer comes from the output produced by the hidden layer. The neurons in the output layer use the same function and produce their output using the same technique as the neurons in the hidden layer. ANN training is an iterative process in which a pre-specified error function is minimized by adjusting the weights appropriately. An error function, the root-mean-squared error (RMSE), based upon the calculated output and the actual experimental output, was formulated using Equation (6).
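A common form of this error function, consistent with the symbols defined below, is the following. It is a reconstruction, and the normalization by N·M is an assumption:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N M} \sum_{i=1}^{N} \sum_{n=1}^{M}
    \left( y_n^i - \hat{y}_n^i \right)^2} \tag{6}
```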
where i is the index of the pattern; N is the number of patterns used in training; M is the number of output nodes; y_n^i and ŷ_n^i are the target and predicted outputs of the nth node for the ith pattern, respectively.
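The forward pass described above (weighted sum plus bias, sigmoid activation, same operation in the output layer) and the RMSE error function can be illustrated with a short, dependency-free Python sketch. The layer sizes follow the text (five inputs, two outputs) with ten hidden neurons; the random weights and the input vector are hypothetical, and no training is performed here.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 5, 10, 2


def sigmoid(s):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-s))


def layer(inputs, weights, biases):
    """weights[i][j] connects input i to neuron j; returns the activated outputs."""
    return [
        sigmoid(sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j])
        for j in range(len(biases))
    ]


def forward(x, params):
    """Propagate scaled inputs through the hidden layer and then the output layer."""
    hidden = layer(x, params["w_hidden"], params["b_hidden"])
    return layer(hidden, params["w_output"], params["b_output"])


def rmse(targets, predictions):
    """Root-mean-squared error over all patterns and output nodes (1/(N*M) normalization)."""
    n_patterns, n_outputs = len(targets), len(targets[0])
    total = sum(
        (t - p) ** 2
        for t_row, p_row in zip(targets, predictions)
        for t, p in zip(t_row, p_row)
    )
    return math.sqrt(total / (n_patterns * n_outputs))


rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# HYPOTHETICAL untrained parameters, for illustration only.
params = {
    "w_hidden": random_matrix(N_INPUT, N_HIDDEN),
    "b_hidden": [rng.uniform(-1.0, 1.0) for _ in range(N_HIDDEN)],
    "w_output": random_matrix(N_HIDDEN, N_OUTPUT),
    "b_output": [rng.uniform(-1.0, 1.0) for _ in range(N_OUTPUT)],
}

scaled_input = [0.5] * N_INPUT  # inputs are assumed to be scaled to [0, 1]
output = forward(scaled_input, params)  # two values in (0, 1): scaled responses
```

Because the output neurons also use the sigmoid, the network outputs lie in (0, 1), so the responses would need to be scaled to that range for training and rescaled afterwards.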

Comparison of RSM and ANN
The goodness of fit and prediction abilities of the computed models were evaluated by calculating RMSE [27], mean absolute error (MAE) [28], chi-square (χ²) [29] and correlation coefficient (R²) [30]. The formulas used for these analyses are given in Equations (6)-(9), respectively. Furthermore, the values predicted by RSM and ANN were plotted against the corresponding experimental values.
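Commonly used definitions of these four statistics, written with the symbols defined below, are the following. They are given as a reconstruction; the exact forms used in the cited sources may differ, for example in the denominator of the chi-square term:

```latex
\begin{align*}
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_{i,p} - Y_{i,e} \right)^2} \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \left| Y_{i,e} - Y_{i,p} \right| \\
\chi^2 &= \sum_{i=1}^{n} \frac{\left( Y_{i,p} - Y_{i,e} \right)^2}{Y_{i,p}} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} \left( Y_{i,p} - Y_{i,e} \right)^2}
               {\sum_{i=1}^{n} \left( Y_{i,e} - \bar{Y}_e \right)^2}
\end{align*}
```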
where n is the number of experiments; Yi,e is the experimental value for the ith experiment; Yi,p is the predicted value for the ith experiment; Ȳe is the average of the experimental values.
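A minimal Python sketch of how such comparison statistics can be computed from paired experimental and predicted values is given below. The formulas are the commonly used definitions (an assumption, since the exact forms are not spelled out here), and the three data points are made up purely to exercise the functions.

```python
import math


def rmse(experimental, predicted):
    """Root-mean-squared error between experimental and predicted values."""
    n = len(experimental)
    return math.sqrt(sum((p - e) ** 2 for e, p in zip(experimental, predicted)) / n)


def mae(experimental, predicted):
    """Mean absolute error."""
    n = len(experimental)
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / n


def chi_square(experimental, predicted):
    """Chi-square with the predicted value in the denominator (assumed convention)."""
    return sum((p - e) ** 2 / p for e, p in zip(experimental, predicted))


def r_squared(experimental, predicted):
    """1 - SSE/SST, with SST taken about the mean of the experimental values."""
    mean_e = sum(experimental) / len(experimental)
    sse = sum((p - e) ** 2 for e, p in zip(experimental, predicted))
    sst = sum((e - mean_e) ** 2 for e in experimental)
    return 1.0 - sse / sst


# HYPOTHETICAL values, not data from this study.
y_exp = [10.0, 12.0, 14.0]
y_pred = [11.0, 12.0, 13.0]

summary = {
    "RMSE": rmse(y_exp, y_pred),
    "MAE": mae(y_exp, y_pred),
    "chi2": chi_square(y_exp, y_pred),
    "R2": r_squared(y_exp, y_pred),
}
```

Applying the same four functions to the RSM and ANN predictions for each response gives directly comparable figures: lower RMSE, MAE and chi-square and higher R² indicate the better model.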
The fitness and adequacy of the models were analysed by ANOVA. The F values (81.82 and 131.08) indicate that both models have a satisfactory goodness of fit (Table 2). The p values of the models were very low (reported as 0.000), which also confirms the significance of the models (Table 3). P values were used to evaluate the significance of the individual coefficients, which provides the information required to understand the interaction pattern among the experimental variables. A smaller P value indicates a higher significance of the respective coefficient. The coefficient estimates and their respective P values showed that all tested variables, X1 (pH), X2 (gelatine), X3 (incubation period), X4 (agitation speed), and X5 (temperature), had a significant effect in both models, i.e. on cell dry mass and inhibition zone (Table 3).
The fitness of the models was expressed by the coefficient of determination (R²) and the adjusted coefficient of determination (adjusted R²). The values of R² were 0.9834 and 0.9881 for cell dry mass and inhibition zone, respectively, which showed that both models explained more than 98% of the variability in the data (Table 2). The adjusted coefficient of determination corrects the R² value for the sample size and the number of terms in the model, and its values (0.9713 for cell dry mass and 0.9796 for inhibition zone) were also high, supporting the significance of the models (Table 2).
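The correction applied by the adjusted coefficient of determination can be illustrated with the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of runs and p the number of model terms excluding the intercept. In the sketch below, n = 32 is taken from the design, whereas p = 13 is a hypothetical term count chosen for illustration; the number of terms retained in the final models is not stated in the text.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R^2 for a model with n_terms regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


N_RUNS = 32   # number of design runs, from the text
N_TERMS = 13  # HYPOTHETICAL number of model terms, not stated in the source

adj_cell_dry_mass = adjusted_r2(0.9834, N_RUNS, N_TERMS)
adj_inhibition_zone = adjusted_r2(0.9881, N_RUNS, N_TERMS)
```

Under these assumptions both results fall within rounding distance of the reported adjusted values, but this agreement does not establish the actual number of terms in the models.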
The effects of the independent variables and the interactive effects of each independent variable on the response variables were illustrated by contour plots. Note: X1, pH; X2, gelatine (%); X3, incubation period (h); X4, agitation speed (rpm); X5, temperature (°C).
The oval shape of the contour plots indicates a significant interaction between the independent variables. The smallest ellipses in the contour plots represent the maximum predicted values. The cell dry mass and inhibition zone for each pair of independent variables are presented in Figures 2(A-J) and 3(A-J). It is apparent that the cell dry mass and inhibition zone increase with the increase in pH from 5.5 to 6.65, but a further increase in pH caused a reduction in both cell dry mass and inhibition zone. A similar relation was observed for gelatine, incubation period, agitation speed, and incubation temperature. An initial increase in these variables resulted in an increase in the cell dry mass and inhibition zone but, after a certain level, a further increase resulted in reduced cell dry mass and inhibition zone (Figures 2 and 3). Therefore, the maximum cell dry mass and inhibition zone were predicted at pH 6.5, gelatine concentration of 3.30%, incubation period of 35 h, agitation speed of 163 rpm and incubation temperature of 33.63 °C (Figures 2 and 3).
The cell dry mass (14.80 g/L) and antifungal activity (32.25 mm inhibition zone) obtained in the present work were much higher than the values reported previously [2,31,32]. The growth of Bacillus species is generally considered to be an aerobic process, so the amount of dissolved oxygen is an important parameter in the fermentation process [33]. The oxygen supply is directly related to the agitation speed, and therefore, the agitation speed was found to be an important factor affecting the cell dry mass and antifungal activity. A lower or higher pH of the fermenting medium affects the production of secondary metabolites [34,35]. Similarly, lower or higher temperatures affect the enzymatic activity of organisms [36]. The present findings are in agreement with these observations, as both factors, i.e. pH and temperature, were found to significantly affect the cell dry mass and the inhibition zone.
The results revealed that lower or higher levels of pH, gelatine, incubation period, agitation speed, and incubation temperature were not favourable for cell dry mass and antifungal activity [37].

ANN modelling
A multilayer perceptron (MLP) with the logistic sigmoid function was used to develop the ANN-based process models. An MLP with feed-forward architecture comprising five input nodes (pH, gelatine, incubation period, agitation speed, and temperature) and two output nodes (cell dry mass and inhibition zone) was used (Figure 4). First, the ANN model was optimized to obtain the minimum dimension and error during the training and testing of the data. The design of the experiment and the respective experimental yields were used for training the network. The data were divided into training, validation and test sets to avoid over-training and over-parameterization. Training was performed for different numbers of neurons in the hidden layer and for different combinations of ANN-specific parameters, i.e. initialization and learning rate. The generalization capacity of the model was ensured by selecting the weights that resulted in the lowest test-set RMSE. The MLP with 10 nodes in the hidden layer resulted in the lowest value of the test-set RMSE (Figure 4). The correlation values between the predicted and the experimental sets were 0.999 for cell dry mass and 0.997 for inhibition zone (Table 2).
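The data partitioning and the selection of the hidden-layer size by the lowest test-set RMSE can be sketched as follows. The split sizes (22/5/5 of the 32 runs), the seed and the RMSE values in placeholder_rmse are hypothetical; they only illustrate the procedure and are not results from this study.

```python
import random


def split_runs(run_ids, n_validation, n_test, seed=0):
    """Shuffle the run identifiers and partition them into three disjoint sets."""
    shuffled = list(run_ids)
    random.Random(seed).shuffle(shuffled)
    test_set = shuffled[:n_test]
    validation_set = shuffled[n_test:n_test + n_validation]
    training_set = shuffled[n_test + n_validation:]
    return training_set, validation_set, test_set


def best_hidden_size(test_rmse_by_size):
    """Return the hidden-layer size with the lowest test-set RMSE."""
    return min(test_rmse_by_size, key=test_rmse_by_size.get)


# 32 design runs; the 22/5/5 split is an ASSUMED proportion.
training_set, validation_set, test_set = split_runs(range(1, 33), n_validation=5, n_test=5)

# PLACEHOLDER numbers standing in for the test RMSE obtained after training
# networks with different hidden-layer sizes (not values from the paper).
placeholder_rmse = {5: 0.9, 10: 0.4, 15: 0.6}
chosen_size = best_hidden_size(placeholder_rmse)
```

Keeping the test set out of training and weight selection is what allows its RMSE to serve as an estimate of generalization rather than of fit.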

Validation of model predictions
The predictive models were validated by conducting experiments in triplicate, using the optimized conditions. The maximum predicted values for cell dry mass and inhibition zone were 15.00 g/L and 31.12 mm,

Comparison of RSM and ANN models
Nutritional constituents and culture conditions play a crucial role in the development of an effective fermentation process [38][39][40]. The statistical techniques allowed determination of the best fermentation conditions that give the maximum cell dry mass and antifungal activity for Bacillus sp. strain B67. The RSM and ANN models were compared for their predictive capabilities. The values predicted by RSM and ANN are presented in Table 1.
Descriptive statistics in terms of RMSE, MAE, chi-square (χ²) and correlation coefficient (R²) between experimental and predicted values were calculated to compare the results (Table 4). These results showed that the ANN models have a higher modelling capability than the RSM models for both cell dry mass and inhibition zone. Linear regression of the values predicted by RSM and ANN against the experimental values showed that the values predicted by ANN are much closer to the line of perfect prediction than those predicted by RSM (Figure 5). The better modelling ability of ANN can be attributed to its universal approximation ability for nonlinearity, whereas RSM is limited to a second-order polynomial regression [26,41]. Modelling using RSM is easier than using ANN, as ANN needs a higher number of inputs than RSM for better predictions. ANN has excellent prediction and optimization abilities, while sensitivity analysis is more precise in RSM. RSM is recommended for modelling of a new process, while ANN is best suited for nonlinear systems that include interactions higher than quadratic. Moreover, ANN does not require any prior specification of a suitable fitting function [42].

Conclusions
In this work, fermentation conditions for strain B67 were optimized for maximizing the cell dry mass and antifungal activity by using RSM and ANN approaches. Furthermore, the predictive and generalization abilities of both techniques were compared. ANN gave more precise optimum conditions for maximizing the cell dry mass and inhibition zone than RSM. The predictive ability of ANN proved to be better than that of RSM, and it can be concluded that ANN is a more accurate alternative to RSM.