On the use of prediction error variance in determining optimal designs for two-variable quadratic logistic models

Abstract. Optimal design of experiments for logistic models has been studied and applied in a wide range of settings. The optimality of a design is usually established with the general equivalence theorem, with little attention paid to how useful the design is for prediction. This paper addresses the predictive capability of optimal designs for the two-variable quadratic logistic regression model through the prediction error variance (PEV), a measure of the predictive capability of a model under a given design. The study used a set of initial parameter guesses to represent different positions of the parameters in the design space, in a simulation study of 10,000 experimental runs. The design was optimal, with a PEV value below one, at nine equally weighted support points. The analysis identified, among all the designs obtained, the design that is good for prediction, and the paper concludes that prediction error variance should be used to test the stability of optimal designs for two-variable quadratic logistic models.


PUBLIC INTEREST STATEMENT
An experiment is a controlled study in which observations are made and the data collected form the basis for analysis and subsequent conclusions. The quality of the analysis depends directly on the experimental design. Research on optimal design of experiments focused mainly on linear models in the past but has since been extended to non-linear models, including Generalized Linear Models (GLM). One major problem in finding optimal designs for GLMs is their dependence on unknown model parameters. Prediction Error Variance (PEV) is a useful way to investigate the predictive capability of a model in optimal design. It gives a measure of the precision of a model's predictions and can be used in any area of research to assess the quality of a design. A low PEV (close to zero) means that good predictions are obtained at that point. If the design PEV < 1, the errors are reduced by the model fitting process and a good prediction is obtained at that point.

Introduction
Experimenters carry out experiments in almost all fields of research and investigation in order to discover something about a particular process or system. Each experimental run is a test. An experiment can therefore be defined as a test or series of runs in which deliberate adjustments are made to the input factors of a process or system in order to identify the reasons for the changes observed in the output response (Douglas, 2001). The nature of the experiment determines the analysis that is carried out and the conclusions that are drawn. Previous work on design of experiments concentrated mainly on continuous responses for linear models. The ordinary least squares assumption for the classical model, that the error term is normally distributed with constant variance over the design space, is widely used in this setting (Jie et al., 2016). This assumption is not satisfied, even asymptotically, by logistic models for binary response data, where the response $Y$ takes one of two possible outcomes, denoted for convenience 0 and 1. Responses of this kind arise in medical trials where, at the end of the trial period, the patient either has recovered ($Y = 1$) or has not recovered ($Y = 0$) from a particular illness (Yang et al., 2015). Other examples include a student passing or failing an examination, and a lecturer being or not being an indigene of a state. Response variables of this form follow a binomial distribution, which belongs to the family of generalized linear models characterized by three components:
(i) The distribution of each of the independent response variables $Y_1, \ldots, Y_N$ belongs to the exponential family.
(ii) The linear predictor $\eta = X^{T}\beta$ is a linear (in $\beta$) combination of $k$ control variables $x_1, x_2, \ldots, x_k$ and $p$ parameters.
(iii) The link function $g(\mu)$ specifies the relationship between the expected value of the response variable, $E(Y) = \mu$, and the linear predictor. It is a monotonic and differentiable function.
Designing an optimal experiment for this kind of response model is not easy because the design depends on the initial parameter guess. The optimality of the design is determined by using the general equivalence theorem. Such a design may not actually be optimal for prediction once its predictive capability is investigated further. This paper therefore uses prediction error variance (PEV) as a means of confirming the optimality of the design.

Aim and objectives
This study aims to design experiments that are optimal for two-variable quadratic logistic regression models, with the purpose of obtaining optimal designs for estimation of the associated quadratic response curve. The objectives of the study are to:
(1) assess the performance of optimal as well as some non-optimal designs in small samples under two-factor quadratic logistic models;
(2) determine the predictive capability of the two-variable quadratic logistic model by using prediction error variance.

Motivation
Most of the research in the literature on optimal design of experiments for logistic models deals with one-variable linear models, one-variable quadratic models and multiple linear models. This study is therefore motivated to extend the work of Fornius (2008) on the one-variable quadratic logistic model to the two-variable quadratic logistic model, in order to determine a design that is useful for prediction.

Review of related work
Non-linear models are important because of their many areas of application in different fields of study. Jafari et al. (2014) discussed the D-optimality criterion for a logistic regression model with three covariates, introducing appropriate models in which the information matrix depends on fewer active parameters; this resulted in locally optimal designs for various specific cases. Notably, designs with different numbers of points were presented and their optimality was established over the parameter space. Silvey (1980) discussed the support points of D-optimal designs in general, together with Carathéodory's theorem. That work also explained that in most linear problems there is an upper bound on the number of support points of an optimal design. This was later taken up by Chaloner & Larntz (1989), who gave examples of D-optimal designs for one-variable logistic regression models that are based on a finite number of support points, and more specifically on at most $p(p + 1)/2$ points, where $p$ is the number of model parameters. Also, the D-optimal design for linear models typically has a number of support points equal to the number of parameters $p$, and the weights associated with the support points are equal, each being $1/p$ (Gwenda, 2010). Li and Deng (2018) proposed a prediction-oriented design criterion, I-optimality, and developed an efficient sequential algorithm for constructing I-optimal designs for generalized linear models. The General Equivalence Theorem was established for I-optimality, and the proposed algorithm was used to choose the support points sequentially and to update the weights of the support points of the design. The algorithm is computationally efficient with a guaranteed convergence property. Illustrative numerical examples were used to evaluate its feasibility and computational efficiency.

Research and methods
Optimal designs for two-variable quadratic logistic models are described in this section, together with the overall predictive power of the model. The method for obtaining optimal designs, which is used in the simulation in line with the objectives of this study, is also presented.

General equivalence theorem
The General Equivalence Theorem is fundamental to the theory of optimal design. There are many methods in practice for determining an optimal design. These include analytical, numerical, algorithmic and graphical methods, used separately or in combination. No method is preferred in general; the choice depends mainly on the problem under consideration. The method selected to derive D-optimal designs is described below. According to Li & Deng (2018), D-optimality is achieved by minimizing $|M^{-1}(\xi; \alpha)|$, or equivalently maximizing $|M(\xi; \alpha)|$. To start with, the number of design points $k$ is not known, although it is known that there exists a D-optimal design with $p \le k \le p(p+1)/2$ design points. A plot of the standardized predictor variance, $d(x, \xi; \alpha)$, is a useful tool because it helps to show whether a suggested design is optimal or not. If the design is optimal, the maximum of $d(x, \xi; \alpha)$ is equal to the number of parameters in the model, and the maxima occur at the design points. In the non-optimal case, the number of peaks of $d(x, \xi; \alpha)$ in the plot gives a hint of the optimal number of design points. The D-optimal design is essentially obtained according to the following steps.
(i) Begin with a $k$-point design.
(ii) Minimize $|M^{-1}(\xi; \alpha)|$, yielding the best possible $k$-point design.
(iii) Plot the standardized predictor variance, $d(x, \xi; \alpha)$. If visual inspection indicates that the design is clearly non-optimal, return to step (ii) and try a design with $k + 1$ points.
(iv) Verify the optimality of the suggested design using the General Equivalence Theorem by ensuring (either analytically or numerically) that the maxima of $d(x, \xi; \alpha)$ are attained at the candidate design points. If the design cannot be verified to be optimal, go back to step (ii).
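The check in steps (iii) and (iv) can be illustrated numerically. The sketch below is not the procedure used in this study: it swaps in the standard multiplicative weight-update algorithm on a fixed candidate grid, and it assumes the two-variable quadratic logistic model with regressor vector $(1, x_1, x_2, x_1^2, x_2^2)$, the high-wide parameter guess $\alpha = (2, 0, 0, -0.1, -0.1)$, and a hypothetical design region $[-6, 6]^2$. All function names are illustrative.

```python
import numpy as np

def features(x1, x2):
    """Regressor vector f(x) = (1, x1, x2, x1^2, x2^2), one row per point."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2])

def logistic_variance(F, alpha):
    """Bernoulli variance omega * (1 - omega) under the logit link."""
    omega = 1.0 / (1.0 + np.exp(-(F @ alpha)))
    return omega * (1.0 - omega)

def info_matrix(F, v, w):
    """M(xi; alpha) = sum_i w_i v_i f_i f_i^T."""
    return F.T @ (F * (w * v)[:, None])

def std_variance(F, v, M):
    """d(x, xi; alpha) = v(x) f(x)^T M^{-1} f(x) at every candidate point."""
    return v * np.einsum("ij,ij->i", F @ np.linalg.inv(M), F)

# Assumed setup (illustration only): parameter guess and candidate grid.
alpha = np.array([2.0, 0.0, 0.0, -0.1, -0.1])
g = np.linspace(-6.0, 6.0, 13)
x1, x2 = (a.ravel() for a in np.meshgrid(g, g))
F = features(x1, x2)
v = logistic_variance(F, alpha)
p = F.shape[1]                            # number of parameters (5)

w = np.full(len(x1), 1.0 / len(x1))       # start from uniform weights
logdet_start = np.linalg.slogdet(info_matrix(F, v, w))[1]
for _ in range(500):
    d = std_variance(F, v, info_matrix(F, v, w))
    w = w * d / p                         # multiplicative update, keeps sum(w) = 1

M = info_matrix(F, v, w)
d = std_variance(F, v, M)
logdet_end = np.linalg.slogdet(M)[1]

print("max of d over the grid:", d.max(), "(equals p only at an optimal design)")
print("points carrying weight > 0.001:")
print(np.column_stack([x1, x2, w])[w > 1e-3])
```

The closer the printed maximum of $d$ is to $p = 5$, the closer the weighted grid design is to D-optimal on the candidate set; the points that retain weight play the role of the candidate support points inspected in step (iii).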
The general equivalence theorem is used to construct and check designs, and it applies to a wide variety of the alphabetic optimality criteria, many of which are based on the information matrix $M(\xi)$, a function of the design $\xi$. Atkinson and Donev (2007) state that "The General Equivalence Theorem can be viewed as a consequence of the result that the derivatives are zero at a minimum of a smooth function over an unconstrained region." Let $\Psi\{M(\xi)\}$ represent a general measure of imprecision for the design $\xi$. In the context of D-optimal design, for example, this measure is the log-determinant of the inverse of the information matrix, $\Psi\{M(\xi)\} = \ln|M^{-1}(\xi)| = -\ln|M(\xi)|$, which is minimized. Atkinson and Donev (2007) and Fisher (1989) showed that the function being minimized, $\Psi$, depends on $\xi$ through the information matrix $M(\xi)$. Using the notation adopted by the authors, let $\bar{\xi}$ be a one-point design that puts unit mass at the point $x$, and let the design $\xi'$ be given by
$$\xi' = (1 - \alpha)\xi + \alpha\bar{\xi},$$
where the scalar $\alpha \in [0, 1]$ is a mixing weight (not the parameter vector of the logistic model). Then the information matrix of $\xi'$ can be written as
$$M(\xi') = (1 - \alpha)M(\xi) + \alpha M(\bar{\xi}) \qquad (1)$$
and the derivative of $\Psi$ in the direction of $\bar{\xi}$ is
$$\phi(x, \xi) = \lim_{\alpha \to 0^{+}} \frac{1}{\alpha}\left[\Psi\{(1 - \alpha)M(\xi) + \alpha M(\bar{\xi})\} - \Psi\{M(\xi)\}\right] \qquad (2)$$
For D-optimality, the General Equivalence Theorem states the equivalence of the following three conditions on $\xi^{*}$ (Ford et al., 1992; Li et al., 2018):
1. The design $\xi^{*}$ minimizes $\Psi\{M(\xi)\}$.
2. The design $\xi^{*}$ minimizes the maximum over $\mathcal{X}$ of the standardized predictor variance $d(x, \xi)$.
3. The maximum over $\mathcal{X}$ of $d(x, \xi^{*})$ is equal to $p$, the number of parameters in the model, and the maximum occurs at the points of support of the design.
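The link between the directional derivative and the bound $p$ on the standardized variance can be made explicit for the D-criterion. The short derivation below is a sketch under stated assumptions: $\Psi\{M(\xi)\} = -\ln|M(\xi)|$, $\alpha$ denotes the scalar mixing weight of the directional derivative (not the model parameters), $\bar{\xi}$ is the one-point design at $x$, and $d(x, \xi) = \operatorname{tr}\{M^{-1}(\xi) M(\bar{\xi})\}$ is the standardized predictor variance.

```latex
\begin{aligned}
\ln\bigl|(1-\alpha)M(\xi) + \alpha M(\bar{\xi})\bigr|
  &= \ln|M(\xi)| + \ln\bigl|I + \alpha\{M^{-1}(\xi)M(\bar{\xi}) - I\}\bigr| \\
  &= \ln|M(\xi)| + \alpha\bigl[\operatorname{tr}\{M^{-1}(\xi)M(\bar{\xi})\} - p\bigr] + O(\alpha^{2}), \\[4pt]
\phi(x,\xi)
  &= \lim_{\alpha \to 0^{+}} \frac{1}{\alpha}
     \Bigl[-\ln\bigl|(1-\alpha)M(\xi) + \alpha M(\bar{\xi})\bigr| + \ln|M(\xi)|\Bigr]
   = p - d(x,\xi), \\[4pt]
\sum_{i} w_i\, d(x_i,\xi)
  &= \operatorname{tr}\Bigl\{M^{-1}(\xi) \sum_{i} w_i M(\bar{\xi}_i)\Bigr\}
   = \operatorname{tr}(I_p) = p .
\end{aligned}
```

The last line shows that the weighted average of $d$ over the support points of any design with weights $w_i$ is $p$, so the maximum of $d$ is never below $p$. The requirement $\phi(x, \xi^{*}) \ge 0$ for all $x$ is therefore the same as the maximum of $d(x, \xi^{*})$ being exactly $p$ and being attained at the support points.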

Standardized predictor variance or prediction error variance (PEV)
Prediction Error Variance (PEV) is a useful way to investigate the predictive capability of a model. It gives a measure of the precision of the model's predictions. PEV can be examined for designs and for models, and the two are related by PEV(model) = PEV(design) × MSE. The accuracy of the model's predictions therefore depends on the design PEV and on the mean squared error (MSE) in the data. The design PEV should be made as low as possible, because it is multiplied by the error of the model to give the overall PEV of the model. A low PEV (close to zero) means that good predictions are obtained at that point. If the design PEV < 1, the errors are reduced by the model fitting process and a good prediction is obtained at that point. If the design PEV > 1, any errors in the data measurements are magnified. The closer the PEV is to zero, the more accurate the overall predictive power of the model.
This study considered the following design matrix for the quadratic logistic model,
$$X = \begin{bmatrix} 1 & x_{11} & x_{21} & x_{11}^{2} & x_{21}^{2} \\ 1 & x_{12} & x_{22} & x_{12}^{2} & x_{22}^{2} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} & x_{1n}^{2} & x_{2n}^{2} \end{bmatrix} \qquad (3)$$
Note that the only dependency of the design on observed values is through the variance of the measurement error. The PEV of a design can therefore be examined without the MSE, before any observations are available, to see what effect the design will have on the measurement error: a value greater than 1 magnifies the error, and the closer the value is to 0 the more the error is reduced. PEV can be examined for designs or for global models, and it can also be used to see how well the underlying model predicts over the design region.
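As a concrete illustration of a design PEV, the sketch below evaluates the quantity $x^{T}(X^{T}X)^{-1}x$ at the points of a nine-point design. Two assumptions are made that are not taken from this study: the design PEV is computed in its usual unweighted linear-model form, and the nine support points are a hypothetical $3 \times 3$ grid on coded levels $-1, 0, 1$. The MSE value is likewise made up for the example.

```python
import numpy as np

def model_matrix(x1, x2):
    """Rows (1, x1, x2, x1^2, x2^2), matching the design matrix in Equation (3)."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2])

def design_pev(X, X_new):
    """Design PEV x^T (X^T X)^{-1} x for each row x of X_new (assumed form)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.einsum("ij,jk,ik->i", X_new, XtX_inv, X_new)

# Hypothetical nine-point design: 3 x 3 grid on coded levels -1, 0, 1.
levels = np.array([-1.0, 0.0, 1.0])
x1, x2 = (a.ravel() for a in np.meshgrid(levels, levels))
X = model_matrix(x1, x2)

pev = design_pev(X, X)        # design PEV at the nine design points
mse = 0.5                     # hypothetical mean squared error
model_pev = pev * mse         # PEV(model) = PEV(design) * MSE

print("design PEV at the design points:", np.round(pev, 4))
print("sum of design PEV values:", pev.sum())   # equals the number of parameters
```

For this balanced grid every design PEV equals $5/9 \approx 0.556$, which is below one, and the values sum to the number of parameters, 5. Evaluating `design_pev` on a finer grid of prediction points gives the PEV surface over the whole region rather than only at the design points.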

Two variables quadratic logistic model for generalized linear model
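This study formulates a quadratic logistic model with two control variables, which extends the logistic regression model with one control variable examined by Fornius (2008). The responses are assumed to be independent Bernoulli variables, $Y_i \sim \mathrm{Bern}(\omega_i) = \mathrm{Bin}(1, \omega_i)$, with the logit link function (Russell, 2019; McCullagh & Nelder, 1989) and response probability $E(Y) = P(Y = 1) = \omega(x)$ given by
$$\omega(x_i) = \frac{\exp\{\lambda(x_i)\}}{1 + \exp\{\lambda(x_i)\}} \qquad (4)$$
where the linear predictor for the logistic model with two control variables and quadratic terms is
$$\lambda(x_i) = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \alpha_3 x_{1i}^{2} + \alpha_4 x_{2i}^{2} \qquad (5)$$
The logit link specifies the connection between the probability of a positive response $\omega(x)$ and the linear predictor $\lambda(x_i)$:
$$\ln\left\{\frac{\omega(x_i)}{1 - \omega(x_i)}\right\} = \lambda(x_i) \qquad (6)$$
From Equation (5), the variance of the response in the generalized linear model setting is given as
$$\operatorname{var}(x_i) = \omega(x_i)\{1 - \omega(x_i)\} \qquad (7)$$
Substituting the response probability from Equation (4), the variance becomes
$$\operatorname{var}(x_i) = \frac{\exp\{\lambda(x_i)\}}{\left[1 + \exp\{\lambda(x_i)\}\right]^{2}} \qquad (8)$$
where $\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4$ represent the model parameters. Because the link function is monotonic, optimizing $\omega$ is equivalent to optimizing the linear predictor $\lambda$. The optimum points are therefore obtained by setting the gradient vector $\nabla\lambda_i$ equal to zero (Nelder & Wedderburn, 1972). That is,
$$\nabla\lambda_i = \left(\frac{\partial\lambda}{\partial x_1}, \frac{\partial\lambda}{\partial x_2}\right)^{T} \qquad (9)$$
$$\nabla\lambda_i = \left(\alpha_1 + 2\alpha_3 x_1,\; \alpha_2 + 2\alpha_4 x_2\right)^{T} \qquad (10)$$
Equating (10) to zero and solving for $x_1$ and $x_2$ gives
$$\alpha_1 + 2\alpha_3 x_1 = 0, \qquad \alpha_2 + 2\alpha_4 x_2 = 0.$$
Thus, the solutions for $x_1$ and $x_2$ at the optimum are, respectively,
$$x_{1m} = -\frac{\alpha_1}{2\alpha_3} \quad \text{and} \quad x_{2m} = -\frac{\alpha_2}{2\alpha_4} \qquad (11)$$
The response curve that describes $\omega_i$ as a function of $x$ is symmetric around these points.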
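Computing the Hessian matrix of the linear predictor $\lambda_i$,
$$H = \begin{bmatrix} \dfrac{\partial^{2}\lambda}{\partial x_1^{2}} & \dfrac{\partial^{2}\lambda}{\partial x_1 \partial x_2} \\ \dfrac{\partial^{2}\lambda}{\partial x_2 \partial x_1} & \dfrac{\partial^{2}\lambda}{\partial x_2^{2}} \end{bmatrix} = \begin{bmatrix} 2\alpha_3 & 0 \\ 0 & 2\alpha_4 \end{bmatrix} \qquad (12)$$
Equation (12) shows that whether the probability of a positive response $\omega(x)$ has a maximum or a minimum is determined by the signs of the parameters $\alpha_3$ and $\alpha_4$. The parameter $\alpha_0$ determines the height of the curve at the optimum point. The linear predictor at the optimum point is obtained as
$$\lambda_m = \alpha_0 - \frac{\alpha_1^{2}}{4\alpha_3} - \frac{\alpha_2^{2}}{4\alpha_4} \qquad (13)$$
If the response curve has a maximum ($\alpha_3 < 0$, $\alpha_4 < 0$), a larger value of $\alpha_0$ means that the maximum of $\omega$ is closer to 1, and if it has a minimum ($\alpha_3 > 0$, $\alpha_4 > 0$), a larger value of $\alpha_0$ means that the minimum of $\omega$ is closer to 1. The sizes of $\alpha_3$ and $\alpha_4$ determine the relative width of the response curve for a given height: a larger absolute value of $\alpha_3$ and $\alpha_4$ means a narrower curve. Thus, the parameters determine the shape of the function $\omega(x)$.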
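The stationary point, its type and the height of the response curve follow directly from the parameters, which the short sketch below works through for the high-wide parameter guess $\alpha = (2, 0, 0, -0.1, -0.1)$ used later in the simulation study. The function name and return layout are illustrative, and the sketch assumes $\alpha_3 \neq 0$ and $\alpha_4 \neq 0$.

```python
import math

def stationary_point(alpha):
    """Optimum of lambda(x) = a0 + a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2.

    Returns (x1m, x2m, lambda_m, omega_m, kind). Requires a3 != 0 and a4 != 0.
    """
    a0, a1, a2, a3, a4 = alpha
    x1m = -a1 / (2.0 * a3)                    # from a1 + 2*a3*x1 = 0
    x2m = -a2 / (2.0 * a4)                    # from a2 + 2*a4*x2 = 0
    lam_m = a0 - a1**2 / (4.0 * a3) - a2**2 / (4.0 * a4)
    omega_m = 1.0 / (1.0 + math.exp(-lam_m))  # response probability at the optimum
    # The Hessian is diag(2*a3, 2*a4), so the signs of a3 and a4 decide the type.
    if a3 < 0 and a4 < 0:
        kind = "maximum"
    elif a3 > 0 and a4 > 0:
        kind = "minimum"
    else:
        kind = "saddle"
    return x1m, x2m, lam_m, omega_m, kind

result = stationary_point((2.0, 0.0, 0.0, -0.1, -0.1))
print(result)
```

For this parameter guess the optimum is at the origin, the curve has a maximum, and the response probability there is about 0.88, which is consistent with the "high" label for this response curve.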
The four sets of parameter guesses considered for the simulation study are presented in Table 1. The parameter sets are chosen to represent different shapes of the response curve: the curve can be "high" or "low" and, given the scale on the x-axis, "wide" or "narrow".
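For the logistic regression with the linear predictor in Equation (5), the standardized information matrix for the design is obtained as
$$M(\xi; \alpha) = \sum_{i=1}^{k} p_i \operatorname{var}(x_i)\, f(x_i) f(x_i)^{T} \qquad (14)$$
where $\operatorname{var}(x_i) = \omega(x_i)\{1 - \omega(x_i)\}$ is the variance of the response at the design point $x_i$, $f(x_i) = (1, x_{1i}, x_{2i}, x_{1i}^{2}, x_{2i}^{2})^{T}$ is the corresponding row of the design matrix, $M(\xi; \alpha)$ is the information matrix for the design and $p_i$ is the design weight.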

Results and discussion
The initial designs for the four sets of true parameters are presented in Table 2. These were used to obtain the values of the covariates $X_1$ and $X_2$ that maximize $|M(\xi; \alpha)|$. The results of the investigation of optimal designs for small and large samples in the simulation study are presented and discussed alongside the PEV values.

Simulation study
Optimal designs for generalized linear models are obtained in many situations by using a best guess of the parameters, which comes either from a previously conducted experiment or from a pilot survey. In this study, optimal designs for the proposed two-variable quadratic logistic model were obtained through a simulation study of 10,000 experimental runs, using parameter guesses for the initial design. The purpose of the simulation study is to compare optimal designs with some non-optimal designs and to determine whether optimal designs exist in small samples. A grid search method was used in the simulation, and the results for the four types of response curve are presented in Tables 3-10. The expected optimum value and the prediction error variances were obtained for both small and large samples. Table 3 presents the predictive power of the model with initial parameter $\alpha = (2, 0, 0, -0.1, -0.1)$ for the high-wide type of response curve when the sample sizes are small. Here, the 5-point, 6-point, 7-point and 8-point designs cannot be relied on for prediction, since their PEV values are greater than one (the higher the PEV, the lower the predictive capability of the model). The PEV values for these four non-optimal designs are, respectively, 2.012, 1.454, 1.219 and 1.161 when the sample size is 10, for a very high response on the vertical axis spread widely over the design region of the two $X$ coordinates. These values do not change appreciably for any of the non-optimal designs when the sample size is increased to 25. Only the D-optimal design can be used for model prediction, since its PEV value is approximately 0.6 for both small sample sizes.

Discussion
Tables 3-10 show the predictive power of the model considered in this study. The results show that model performance in the optimal design setting depends on the initial best-guess parameters but not on the sample size. For each type of response curve, the two small sample sizes, n = 10 and n = 25, and the two large sample sizes, n = 50 and n = 100, did not produce any appreciable variation in the PEV values obtained from the simulation. This confirms the existing argument in the literature that the optimal design does not depend on the sample size. The D-optimal design shows a noticeable predictive capability, since its prediction error variance (PEV) is less than one. A design with a PEV greater than one may be locally optimal but is not good for prediction.

Conclusion and recommendation
The study has presented optimal experimental designs for the two-variable quadratic logistic regression model. The optimality of the design was confirmed through the prediction error variance, which is less than one. Four types of response curve were examined, with designs of 5 to 9 support points considered for each sample size. For the quadratic logistic model with $p = 5$ parameters, Carathéodory's theorem gives the bounds on the number of support points $k$ as $5 \le k \le 15$. The designs satisfy this bound, since the optimal design for the model considered was obtained at 9 support points. Locally optimal designs for the model under consideration were derived for four different parameter sets. Visual inspection of the surface and contour plots indicates that an optimal design exists. It is therefore recommended that the determination of optimal designs for quadratic logistic models be verified through prediction error variance, to ascertain the stability of the model used for the design.