Comparative analyses on medium optimization using one-factor-at-a-time, response surface methodology, and artificial neural network for lysine–methionine biosynthesis by Pediococcus pentosaceus RF-1

ABSTRACT An optimization strategy encompassing one-factor-at-a-time (OFAT), response surface methodology (RSM), and artificial neural network (ANN) methods was implemented during medium formulation, with the specific aim of lysine-methionine biosynthesis by a newly isolated strain of Pediococcus pentosaceus RF-1. The OFAT technique was used for preliminary screening of factors (molasses, nitrogen sources, fish meal, glutamic acid, and initial medium pH) before proceeding to the optimization study. A central composite design subsequently generated 30 experimental runs based on four factors (molasses, fish meal, glutamic acid, and initial medium pH). From the RSM analysis, a quadratic polynomial model could describe the relationship between the medium components and the responses. It also suggested that using molasses (9.86 g/L), fish meal (10.06 g/L), glutamic acid (0.91 g/L), and initial medium pH (5.30) would enhance the biosynthesis of lysine (15.77 g/L) and methionine (4.21 g/L). Alternatively, a three-layer neural network with a 4-5-2 topology predicted a further improvement in the biosynthesis of lysine (16.52 g/L) and methionine (4.53 g/L) by using a formulation composed of molasses (10.02 g/L), fish meal (18.00 g/L), and glutamic acid (1.17 g/L) with an initial medium pH of 4.26.


Introduction
Amino acids are widely applied in the food, pharmaceutical, medical, and chemical industries [1]. They are most commonly utilized as nutritional supplements or as additives in animal feeds. Amino acids are very important elements in many metabolic activities. As such, their bioprocessing plays a major role in improving the efficacy of animal protein production, contributing to the increase in protein supply [2][3][4]. Dietary provision of protein inevitably relates to the requirement for amino acids, since they are the building blocks of protein, while some are also the products of protein hydrolysis [5][6][7].
For industrial microbial cultivation process, medium composition plays a critical role due to its major influences on the formation, concentration, and yield of a particular culture's end product [8,9]. As such, optimization of medium is usually a major concern when one is tasked with maximizing the profit [10,11]. For each bioproduct, the process facility and suitable strategies have to be elaborated by a comprehensive and detailed process characterization, which in turn will determine the most relevant process parameters influencing the final yield or productivity affecting the overall process economics [11].
The traditional one-factor-at-a-time (OFAT) approach to optimization can be time-consuming. Nonetheless, it can serve the purpose of coarse estimation of the optimum levels [12][13][14]. On the other hand, statistical methods such as response surface methodology (RSM) enable researchers to design the experiments and evaluate the interactions among factors and responses throughout the study. More studies in recent times have used this approach, which combines experimental design, regression modelling techniques, and optimization tools to predict the maximum yield of bioproducts of interest [15][16][17]. Artificial neural network (ANN) is another means of analysing experimental outputs that is fundamentally different from RSM. The use of ANN in the field of predictive microbiology has inspired several studies [8,18,19]. The attractiveness of ANNs as empirical modelling schemes lies in their ability to extract, with high accuracy and irrespective of the degree of nonlinearity existing between system variables, the intrinsic relationships between independent and dependent variables through training of the network on a set of examples representing the phenomenon to be modelled [20]. In other words, an ANN is a highly simplified model mimicking the structure of a biological neural network: a set of neurons receives inputs, combines them, applies a nonlinear operation to the result, and then outputs the final result [21,22].
Numerous studies have been reported in the literature in which models were derived from RSM and ANN analyses of a data-set gained from the same experimental design, and the ANN and RSM models were then compared for their predictive capacity. Several researchers have reported combined ANN and RSM model development in various bioprocessing optimization studies [15,16,23,24,25]. While the use of ANN or RSM optimization methods for lactic acid bacteria (LAB) cultivation and protein biosynthesis has been reported [19,25,26], this is the first such report on lysine-methionine biosynthesis by Pediococcus pentosaceus using these two approaches. The main objective of this study is to optimize the medium formulation and initial medium pH for lysine-methionine biosynthesis by P. pentosaceus RF-1 using the traditional OFAT approach and the statistical approaches of RSM and ANN.

Materials and methods
Bacterial strain and maintenance
P. pentosaceus RF-1, a facultative anaerobe, was used and maintained in 5% (v/v) glycerol at −80 °C. The strain was routinely grown in de Man Rogosa Sharpe (MRS) medium. P. pentosaceus RF-1 was locally isolated from fermented milk and has been characterized by full-length 16S rRNA gene sequencing. Phylogenetic analysis placed the strain close to P. pentosaceus ATCC 25745, with 99% similarity. The strain was deposited in the Microbial Culture Collection Unit (UNICC), Institute of Bioscience, Universiti Putra Malaysia, under the accession number UPMC1087 [27].
Molasses was pre-treated by dilution with distilled water containing 2% (v/v) sodium dihydrogen phosphate at a ratio of 1:1 and then autoclaved [28]. The cultivation medium was adjusted to pH 7 with 1 M NaOH or 1 M HCl before sterilization at 121 °C for 20 min. After sterilization, the medium was left to cool to room temperature, and CaCO3, molasses, and 10% (v/v) inoculum were then added. The inocula were prepared by inoculating a colony of the strain grown on an MRS agar plate into 5 mL of MRS broth in a 100-mL test tube with continuous shaking (100 rpm) on a rotary incubator shaker at 37 °C for 12 h. The submerged batch cultivations of P. pentosaceus RF-1 were carried out in 250-mL shake flasks filled with 150 mL of medium for 18 h. Samples of about 3 mL were withdrawn at time intervals during the cultivation for the analyses of cell concentration, glucose consumption, and lysine-methionine concentration.

Design of experiment (DOE)
One-factor-at-a-time
The experimental factors and their associated levels for medium formulation and pH adjustment via OFAT experiments are shown in Table 1. All experiments were performed in triplicate, and the results were reported as the mean of the replicates.

Central composite design (CCD)
Four variables and five levels were used in this study. The four variables were molasses, fish meal, glutamic acid, and initial medium pH (Table 2).

Response surface methodology (RSM) modelling
Table 1. Parameters and variables used in OFAT approach.

Parameter         Variables and concentrations
Molasses          1, 3, 5, 10, and 12 g/L
Nitrogen source   Yeast extract, peptone, palm kernel cake, and fish meal
Fish meal         1, 3, 5, 10, 15, and 20 g/L
Glutamic acid     0.1, 0.3, 0.5, 1.0, and 5.0 g/L
Initial pH        pH 5, 6, 7, and 8

The results from CCD were then statistically evaluated using Design Expert 6.0.6 software (Stat-Ease Inc.). Independent variables were allocated a high level (+1) and a low level (−1). An axial distance (±α) of 1.6 was chosen to make the design rotatable. The centre point, denoted (0), was set at the mid-point value of each factor, and six centre-point runs were included because they provide an unbiased estimate of the process error variance. The best-suited regression model for the two responses of interest, i.e. lysine and methionine concentration, was found to be the quadratic (second-order polynomial) model:

\[ Y = b_0 + \sum_{j} b_j x_j + \sum_{j} b_{jj} x_j^2 + \sum_{j<k} b_{jk} x_j x_k \qquad (1) \]

where Y is the lysine or methionine concentration, j and k are the factor indices, x_j are the coded variables, b_0 is the intercept, and b_j, b_jj, and b_jk are the linear, quadratic, and interaction coefficients, respectively. The F-value was considered to be significant. The lack of fit (LOF) was registered as nonsignificant, and the regression produced a good multiple correlation coefficient (R²).
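The run count of the design described above can be checked with a short sketch. It assumes a full 2^4 factorial core, which is not stated explicitly in the text but is consistent with 30 runs for four factors and six centre points. The function name `central_composite_design` is illustrative and is not part of Design Expert.

```python
from itertools import product


def central_composite_design(n_factors=4, alpha=1.6, n_center=6):
    """Return coded CCD runs: factorial corners, axial points, centre points."""
    # 2^k factorial corners at coded levels -1 and +1 (assumed full factorial)
    corners = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    # 2k axial (star) points: +/- alpha on one axis, 0 on all the others
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(point)
    # Replicated centre points, used to estimate pure error
    center = [[0.0] * n_factors for _ in range(n_center)]
    return corners + axial + center


design = central_composite_design()
print(len(design))  # 16 corners + 8 axial + 6 centre = 30 runs
```

Each row holds the coded levels of molasses, fish meal, glutamic acid, and initial pH, in that order. Converting them to actual concentrations requires the level definitions in Table 2.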

Artificial neural network (ANN) modelling
The same set of CCD experimental data that had been used for the RSM design was also employed in developing the ANN. Versatile ANN software (Neural Power, Ver. 2.5, CPC-X Software, USA) was chosen to recognize the relevant pattern or regression from the tabulated data. The data were further divided into two sets for training (30 data) and testing (4 data) purposes. In Table 3, the bold numbers represent the randomly selected testing set. Every network developed had four input variables and two output responses, and each underwent training for computation of the network parameters. The performance of the network on the testing set was monitored concurrently during training to avoid overtraining [16,26]. Training a neural network model means selecting, from the set of allowed models, the one that minimizes the cost criterion. To supervise the training, the designed networks were trained to the point of exhibiting a root-mean-square error (RMSE) that theoretically should be closest to 0.01 (Equation (2)), with the network correlation coefficient (R) (Equation (3)) and determination coefficient (DC) (Equation (4)) closest or equal to 1:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_{obs,i} - x_{p,i} \right)^2} \qquad (2) \]

\[ R = \frac{\sum_{i=1}^{N} \left( x_{obs,i} - x_m \right) \left( x_{p,i} - x_{pm} \right)}{\sqrt{\sum_{i=1}^{N} \left( x_{obs,i} - x_m \right)^2 \sum_{i=1}^{N} \left( x_{p,i} - x_{pm} \right)^2}} \qquad (3) \]

\[ \mathrm{DC} = 1 - \frac{\sum_{i=1}^{N} \left( x_{obs,i} - x_{p,i} \right)^2}{\sum_{i=1}^{N} \left( x_{obs,i} - x_m \right)^2} \qquad (4) \]

where N is the number of experiments, x_obs is the observed value, x_p is the predicted value obtained from the ANN, x_m is the average of the actual values, and x_pm is the average of the predicted values. A multilayer full feed-forward neural network was used to model lysine-methionine biosynthesis. The software allows 5-30 neurons per layer to be selected when designing a network, in increments of one neuron at a time. In this experiment, the search for the best topology was restricted to networks containing a single hidden layer.
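The three fit statistics named above (RMSE, R, and DC) can be computed as in the following sketch. It assumes the conventional definitions of these quantities; the exact forms implemented in Neural Power are not confirmed here, and the sample data are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)


def correlation(obs, pred):
    """Pearson correlation coefficient R between observed and predicted."""
    x_m = sum(obs) / len(obs)      # average of actual values
    x_pm = sum(pred) / len(pred)   # average of predicted values
    num = sum((o - x_m) * (p - x_pm) for o, p in zip(obs, pred))
    den = math.sqrt(sum((o - x_m) ** 2 for o in obs)
                    * sum((p - x_pm) ** 2 for p in pred))
    return num / den


def determination(obs, pred):
    """Determination coefficient: 1 - residual SS / total SS (assumed form)."""
    x_m = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - x_m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed/predicted values, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, predicted),
      correlation(observed, predicted),
      determination(observed, predicted))
```

Under these definitions DC is not simply R squared: a network can correlate well with the observations (high R) while still carrying a systematic offset that lowers DC.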
The optimal number of neurons in the hidden layer, as well as the suitable transfer functions for the hidden and output layers (sigmoid, hyperbolic tangent, Gaussian, linear, threshold linear, and bipolar linear), were determined manually and iteratively, based on the ability of the network to provide the most accurate prediction of the testing set and the lowest value of the cost function. Different learning algorithms were used to train the networks. However, a common default selection is the back-propagation learning algorithm. During training, a set of inputs is presented to a network of randomly pre-assigned weights. Each neuron in the hidden and output layers first calculates the weighted sum of its inputs and passes the result through a transfer function to produce an estimate of the output that corresponds to the input data-set. The result is then compared to the corresponding desired values, and the error is back-propagated through the network to adjust the connection weights according to the learning rule. This procedure is repeated until the predetermined target RMSE is reached [8].
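The weighted-sum and transfer-function step described above can be illustrated with a minimal forward pass. This is a sketch only. It assumes a sigmoid hidden layer and a linear output layer with four inputs, five hidden neurons, and two outputs. The weights are hypothetical placeholders, not values from the trained network, and no back-propagation update is shown.

```python
import math


def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: sigmoid hidden layer, then linear output layer."""
    # Each hidden neuron: weighted sum of the inputs plus bias, then sigmoid
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Each output neuron: weighted sum of hidden activations plus bias (linear)
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


# Hypothetical weights; zero hidden weights make the result easy to check by
# hand, because every hidden activation then equals sigmoid(0) = 0.5.
w_hidden = [[0.0] * 4 for _ in range(5)]   # 5 hidden neurons x 4 inputs
b_hidden = [0.0] * 5
w_out = [[1.0] * 5, [2.0] * 5]             # 2 outputs x 5 hidden neurons
b_out = [0.5, -0.5]

outputs = forward([1.0, 0.0, -1.0, 0.0], w_hidden, b_hidden, w_out, b_out)
print(outputs)  # [3.0, 4.5]
```

During training, the difference between such outputs and the measured lysine and methionine values would be propagated backwards to update `w_hidden` and `w_out`.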

Analytical methods
Cell and glucose concentrations
A 1-mL sample was centrifuged (10,000 × g, 10 min, 4 °C) to separate the cell pellet from the supernatant. The supernatants were collected for glucose determination [29], with absorbance measured at 540 nm, while the cell pellets were used for cell concentration determination [12].

Amino acid concentration
Amino acids were first determined quantitatively by high-performance liquid chromatography (HPLC) to confirm their presence in the medium. For routine detection, quantitative analysis by the acid ninhydrin method [1,30] was used.

Results and discussion
One-factor-at-a-time (OFAT)
Preliminary optimization of the medium formulation (molasses, nitrogen sources, and glutamic acid) and initial medium pH was conducted in shake-flask experiments (Figure 1), whereby lysine and methionine biosynthesis by P. pentosaceus RF-1 in the culture medium was shown to be strain-dependent.
In order to achieve efficient and optimal production, various studies on medium composition have been dedicated to the selection of carbon and nitrogen sources. Molasses, often regarded as a waste from sugar factories, was chosen as the prospective main carbon substrate in the fermentation medium. The effect of molasses concentration was studied over the range of 1 to 12 g/L (Figure 1(A)). The best concentration was 5 g/L, at which P. pentosaceus RF-1 produced high lysine and methionine at 6.68 and 3.29 g/L, respectively. Inhibition of cell growth and amino acid biosynthesis was observed when more than 10 g/L molasses was used in the formulation.
Nitrogen is used for both functional and structural purposes by different microorganisms. The form of nitrogen has a profound effect on microbial metabolism [12]. Figure 1(B) compares yeast extract, peptone, palm kernel cake (PKC), and fish meal as candidate nitrogen sources. Maximum cell concentration of P. pentosaceus RF-1 was observed in the medium containing PKC, owing to its high content of nitrogenous compounds and protein. Nonetheless, fish meal gave the highest biosynthesis of lysine (6.67 g/L) and methionine (3.13 g/L). The concentration of fish meal was then varied from 1 to 20 g/L in the following experiments (Figure 1(C)). The highest biosynthesis of lysine (6.84 g/L) and methionine (3.01 g/L) was detected when using fish meal at 5 g/L. It is worth noting that molasses coupled with fish meal in the cultivation medium is particularly effective in supporting strong microbial growth for further exploitation in producing various metabolites, biopolymers, and enzymes [31,32].
The value of protein-rich fish meal as the sole source of nitrogen depends upon the available level of the limiting amino acid. In this study, glutamic acid was added at 0.1 to 5 g/L. Figure 1(D) shows that the best concentration of glutamic acid for lysine biosynthesis (6.86 g/L) was 0.3 g/L, whereas it was slightly higher, at 0.5 g/L, for methionine biosynthesis (2.38 g/L). The addition of glutamic acid is essential because this substrate sits between energy and protein metabolism and is crucial to amino acid metabolism [33,34].
The optimal pH range for the growth of members of the genus Pediococcus is between 6.0 and 6.5 [35,36]. The effect of initial medium pH was studied over the range of pH 4 to pH 8. The final result obtained is in agreement with others [36], whereby pH 7 was found to be most suitable for amino acid secretion by the microorganism (Figure 1(E)).

RSM modelling
Findings from the preliminary screening in the OFAT experiments were then applied to RSM modelling. The four factors to be optimized were molasses (A), fish meal (B), glutamic acid (C), and initial pH (D), which were assigned to a number of runs as determined through CCD. A total of 30 experiments (Table 3) were conducted to evaluate their effect on two responses (lysine and methionine synthesis). The CCD results indicate that both responses were highest in experiment no. 16. The corresponding levels for factors A, B, C, and D are 10 g/L, 15 g/L, 1 g/L, and pH 8, leading to the highest lysine and methionine concentrations of 14.13 and 4.96 g/L, respectively.
The RSM analysis indicated that a quadratic model was best suited to describe the relationship between factors and responses. Regression was performed to fit the response function to the experimental data, resulting in two full models, Equations (5) and (6), in which lysine and methionine represent the predicted responses and A, B, C, and D are the coded values of molasses, fish meal, glutamic acid, and initial medium pH, respectively. The statistical significance of all factors was assessed by analysis of variance (ANOVA) in Table 4. The coefficient of determination (R²), the correlation, and the model significance (F-value) were used to analyse the adequacy of the model. The quality of fit of the equation was expressed by the DC, R². A good R² should be 80% and above [12]. Based on the results, the R² values obtained for lysine and methionine are 0.9016 and 0.9039, respectively. These indicate that the models could explain about 90% of the variability in the responses, which was attributed to the independent variables. Goodness of fit was also judged by the adjusted R² (R²Adj). The adjusted R² corrects R² for the number of terms in the model, because adding extraneous terms to a derived model equation always reduces the error sum of squares [16]. In this study, R²Adj stands at 0.8097 (lysine) and 0.8143 (methionine), indicating good agreement between the observed and predicted values of the output responses.
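As a consistency check on the values above, the adjusted R² can be recomputed from R² with the standard correction for the number of model terms. The sketch assumes 30 runs and a full quadratic model in four factors with 14 terms besides the intercept (4 linear, 4 quadratic, 6 two-factor interactions). The term count is an assumption, since the full models are not listed here.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R^2 for a model with n_terms regressors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


N_RUNS = 30    # CCD runs reported in the text
N_TERMS = 14   # assumed: 4 linear + 4 quadratic + 6 interaction terms

lys_adj = adjusted_r2(0.9016, N_RUNS, N_TERMS)
met_adj = adjusted_r2(0.9039, N_RUNS, N_TERMS)
print(round(lys_adj, 3), round(met_adj, 3))
```

Under these assumptions, both results fall within rounding distance of the reported 0.8097 and 0.8143, which suggests the reported statistics are consistent with a full quadratic fit.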
Model significance (F-value) is a measure of the variation of the data around the mean. A probability value (Prob > F) of less than 0.05 implies that each of these models is significant and can serve as a good predictor of the experimental results. The coefficients of variation (CV), 14.69% for lysine and 11.67% for methionine, further indicated that the experimental results were in acceptable agreement. As shown in Table 2, centre points with a coded value of (0) were repeated six times to estimate the pure error for the LOF test. Models with a significant LOF term were not used for predictions, whereas an insignificant LOF is most desirable (p > 0.1). Both models produced LOF values that are deemed not significant (Prob > F) at 0.1369 and 0.4927, respectively.
From ANOVA analysis (Table 3), four independent variables were found to have a significant effect on lysine and methionine production by P. pentosaceus RF-1. The p-value is used as a tool to determine the significance of each coefficient. Every parameter was estimated, and the corresponding p-values for lysine and methionine are shown in Table 4. The positive coefficients for A, B, and C indicate a positive linear effect on the responses. Table 4 shows the model terms for lysine (A, B, A², and AC) and methionine (A, B, and A²) that had a p-value <0.05. Therefore, the simplified quadratic model equations (Equations (7) and (8)) appropriate for describing lysine and methionine biosynthesis are as follows:

\[ \mathrm{Lys} = +9.74 + 2.41A + 0.57B - 1.60A^2 + 0.95AC \qquad (7) \]

\[ \mathrm{Met} = +3.17 + 0.77A + 0.21B - 0.31A^2 \qquad (8) \]
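Equations (7) and (8) can be evaluated directly at any coded factor levels. The sketch below transcribes the two simplified models. The function names are illustrative, and the inputs are coded values (for example −1, 0, +1), not concentrations in g/L.

```python
def lysine(a, b, c):
    """Simplified lysine model, Equation (7); a, b, c are coded levels."""
    return 9.74 + 2.41 * a + 0.57 * b - 1.60 * a ** 2 + 0.95 * a * c


def methionine(a, b):
    """Simplified methionine model, Equation (8); a, b are coded levels."""
    return 3.17 + 0.77 * a + 0.21 * b - 0.31 * a ** 2


# At the centre point (all coded levels 0) each model returns its intercept
print(lysine(0, 0, 0), methionine(0, 0))

# At the high level (+1) of molasses, fish meal, and glutamic acid
print(round(lysine(1, 1, 1), 2), round(methionine(1, 1), 2))
```

The negative A² terms give both responses a maximum along the molasses axis. With C = 0, setting the derivative of Equation (7) with respect to A to zero gives A = 2.41 / 3.20, or about 0.75 in coded units.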

ANN modelling
Thirty experimental runs derived from CCD were also analysed through ANN, whereby the outcomes in terms of the observed, predicted, and absolute deviation of amino acid biosynthesis prediction made by the best neural network constructed are shown in Table 5. About 120 network architectures had been developed and tested for the prediction of lysine-methionine biosynthesis by P. pentosaceus RF-1. Following training and testing procedures, Table 6 describes the effect of different normal feed-forward network architectures on the model residual error, showing three examples of top network architectures that yielded quite a high accuracy compared to the rest. Nonetheless, only one network was selected as the best predictor based on the error reduction criterion.
Training a neural network entails selecting a learning algorithm that can minimize the error or cost criterion. Table 6 implies that ANN modelling of experimental data-set leading to the least residual error calculated was either trained using batch back propagation (BBP) or incremental back propagation (IBP). IBP is most adequate for the purpose of network training when both the training and, more importantly, the testing sets are able to return prediction values exhibiting the lowest RMSE and networks correlation coefficient (R) and DC closest to 1.0. The details of the learning algorithm have been reported elsewhere [15,18,37].
The best network design in terms of accuracy, assigned as set no. 1, has three layers with a network topology of 4-5-2 (Figure 2). With a sigmoidal function for the hidden layer and a linear function for the output layer, the lysine output produced RMSE, R, and DC values of 1.83, 0.98, and 0.85, respectively. The methionine output predicted by the same network topology registered RMSE, R, and DC values of 0.61, 0.99, and 0.85. This network consequently offers a good compromise between bias and variance, and the model should generalize well. It was reported in the literature that one hidden layer is typically adequate to provide an accurate prediction and could be the first choice for any practical feed-forward network design [22]. Hence, a single hidden layer network was used in this study.

Figure 3 illustrates the level of importance (in percentages), or effectiveness, of each factor (medium constituent) when analysed with Neural Power. The highest level of importance was attributed to fish meal (37.79%), followed by molasses (34.16%), glutamic acid (15%), and initial medium pH (13.05%). Numerous investigators have looked for ways of producing microbial amino acids using inexpensive media. In this study, P. pentosaceus RF-1 is shown to tolerate fish meal in the formulated medium, as it is required for growth and amino acid biosynthesis. Fish meal provides not only a relatively larger proportion of proteins and nucleic acids but also more growth factors compared with the other nitrogen sources used [2,32,38].
Molasses, ranked second, contains a high concentration of C6 sugars to support the fermentation process, and it is also an enriched source of B vitamins [2,28,31]. However, this study revealed that the utilization of molasses is limited to not more than 12 g/L in the medium because of its inhibitory effect on the growth of P. pentosaceus RF-1. In fact, a very high concentration of molasses would darken the medium, or it could undergo a complex Maillard reaction with fish meal, which would hinder the effective growth of P. pentosaceus RF-1.
Glutamic acid functions not simply as an energy source but also as a precursor for nucleotides, guanosine triphosphate, purine, and pyrimidine, hence providing an essential component for cell replication [33]. Most LAB depend on the addition of glutamic acid to the medium [34]. It has also been shown to promote the growth of LAB during amino acid biosynthesis [4,6].

Figure 4 shows the three-dimensional plots describing the interaction of the four factors on lysine biosynthesis as predicted by the best ANN network, which are quite similar to the representation by RSM (data not shown). From Figure 4(A), too high an increase in molasses concentration eventually causes a reduction in lysine, whilst biosynthesis peaked in the mid-range of fish meal. On the other hand, Figure 4(B) shows that too low a pH would inhibit product secretion, but an increase in pH causes little improvement in lysine biosynthesis compared with molasses. Figure 4(C,D) interestingly shows that in some areas of the response surface, glutamic acid has an inverse relation with molasses and fish meal: a lower concentration of glutamic acid is preferred when the concentration of molasses or fish meal is increased to a very high proportion in the medium formulation.

Figure 5 depicts surface plots for methionine biosynthesis with the interaction between molasses and fish meal (Figure 5(A)), initial pH and molasses (Figure 5(B)), glutamic acid and molasses (Figure 5(C)), and glutamic acid and fish meal (Figure 5(D)) as modelled by the neural network. With molasses concentration fixed at 10 g/L, methionine biosynthesis increased as glutamic acid and fish meal concentrations were raised up to a certain level, beyond which methionine production started to decline. The interactions between initial medium pH and molasses (Figure 5(B)) and between glutamic acid and molasses (Figure 5(C)) show a similar trend.
Comparison between OFAT, RSM, and ANN
Table 7 shows the follow-up experimental results that validate the new medium formulations suggested by the ANN and RSM simulations, compared with the optimal points from OFAT, which are as follows: molasses (5 g/L), fish meal (5 g/L), glutamic acid (0.3 g/L), and initial medium pH 7. Validation experiments were required to verify the suggested maximum achievable concentrations of lysine and methionine from the fermentation of P. pentosaceus RF-1. Based on the best point optimization function in Design Expert, which possesses a desirability factor closest to 1.0 (the most desirable condition), the RSM simulation suggested that molasses be set at 9.86 g/L, fish meal at 10.06 g/L, glutamic acid at 0.91 g/L, and initial medium pH at 5.3. As for ANN, the best possible medium formulation was forecast by solving the previous best network model through the Rotation Inherit Optimization solver module of NeuralPower, which in turn yielded a different recipe: molasses (10.02 g/L), fish meal (18 g/L), glutamic acid (1.17 g/L), and initial medium pH (4.26). All three conditions were tested on P. pentosaceus RF-1 in 18-h batch cultivations.

Figure 6 depicts the comparison between the OFAT, RSM, and ANN recipes in terms of P. pentosaceus RF-1 cell concentration, glucose consumption, and lysine-methionine biosynthesis. In all three media, P. pentosaceus RF-1 initially experienced a lag phase in the first 2 h of cultivation. The exponential growth phase ensued until 12 h of cultivation, when the culture reached its maximum cell concentration (X max) before entering the stationary phase. For the media using the ANN and RSM formulae, the measured maximum cell concentrations at 12 h of cultivation were approximately 1.24 ± 0.046 and 1.154 ± 0.019 g/L, respectively. Cell density was slightly higher in the OFAT-based formulation, with X max achieved at 1.31 ± 0.01 g/L.
During exponential phase of cell growth, the maximum amount of lysine-methionine production was also observed, implying that the biosynthesis of lysine-methionine is a growth-associated process.
Generally, the statistical approaches of RSM and ANN are sequential strategies that enable us to design and analyse experiments, find the optimum levels, and assess the interaction effects of factors leading to higher growth of P. pentosaceus RF-1. Both approaches had a similarly pronounced effect on lysine-methionine biosynthesis by P. pentosaceus RF-1. The actual biosynthesis of lysine and methionine using the ANN-suggested recipe reached 16.52 ± 0.18 and 4.53 ± 0.03 g/L, against the predicted values of 14.45 and 4.34 g/L, respectively. The medium formulated through ANN thus slightly improved the biosynthesis of lysine-methionine by P. pentosaceus RF-1 compared with the medium proposed by RSM (an increase of 4.8% for lysine and 7.6% for methionine).
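The percentage gains quoted above follow from the measured concentrations reported for the RSM and ANN media. A short sketch of the arithmetic, with an illustrative helper name:

```python
def percent_increase(baseline, improved):
    """Relative increase of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Measured concentrations (g/L) reported for the RSM and ANN formulations
lys_gain = percent_increase(15.77, 16.52)
met_gain = percent_increase(4.21, 4.53)
print(round(lys_gain, 1), round(met_gain, 1))  # 4.8 7.6
```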

Conclusion
The statistically based models provided good predictions for the independent variables regarding lysine-methionine production, with the ANN model giving the more precise predictions. The RSM-derived formulation demonstrated maximum lysine and methionine biosynthesis at 15.77 ± 0.10 and 4.21 ± 0.08 g/L, respectively. The predicted values from the RSM model were somewhat comparable with the experimental values obtained. Finally, as a measure of comparison, it is apparent from the data obtained that improving the medium constituents through a more systematic statistical approach has sound merit, given that the ANN- and RSM-based formulations increased the biosynthesis of lysine-methionine by as much as 100% compared with the OFAT method.