Minimax Efficient Random Experimental Design Strategies With Application to Model-Robust Design for Prediction

Abstract In game theory and statistical decision theory, a random (i.e., mixed) decision strategy often outperforms a deterministic strategy in minimax expected loss. As experimental design can be viewed as a game pitting the Statistician against Nature, the use of a random strategy to choose a design will often be beneficial. However, the topic of minimax-efficient random strategies for design selection is mostly unexplored, with consideration limited to Fisherian randomization of the allocation of a predetermined set of treatments to experimental units. Here, for the first time, novel and more flexible random design strategies are shown to have better properties than their deterministic counterparts in linear model estimation and prediction, including stronger bounds on both the expectation and survivor function of the loss distribution. Design strategies are considered for three important statistical problems: (i) parameter estimation in linear potential outcomes models, (ii) point prediction from a correct linear model, and (iii) global prediction from a linear model taking into account an L2-class of possible model discrepancy functions. The new random design strategies proposed for (iii) give a finite bound on the expected loss, a dramatic improvement compared to existing deterministic exact designs for which the expected loss is unbounded. Supplementary materials for this article are available online.


Decision-Theoretic Design and Random Decisions
In frequentist decision-theoretic experimental design, the success of the experiment in relation to its objective is quantified by the value of a loss function, ℓ(θ, α̂). Here, the vector θ contains the true parameter values and is chosen by Nature, while the vector α̂ = h(ξ, y) contains estimates of the interest parameters, α = a(θ). Additionally, ξ = (x1, . . . , xn) denotes the design, to be chosen by the Statistician, and y = (y1, . . . , yn)^T ∈ R^n is the vector of responses, yet to be observed.
Prior to the experiment the loss is a random variable and so cannot simply be minimized. Thus, the design is instead chosen to give a favorable pre-experimental distribution of possible losses. Usually in the optimal design literature only deterministic choices of ξ are considered. However, we will argue that if Nature is a passive participant rather than a reactive and antagonistic opponent, then one can often obtain a loss distribution with better properties by instead using a suitable random strategy to select ξ .
In the conventional approach, favorability of the loss distribution associated with design ξ is measured by considering the expected loss, R(θ, ξ) = E[ℓ(θ, α̂)], also known as the risk. If the risk is independent of θ, then ξ is simply chosen to minimize R(θ, ξ). Otherwise a minimax design is typically used instead, that is, a ξ_mM that minimizes Φ(ξ) = max_{θ∈Θ} R(θ, ξ) with respect to ξ, where Θ is the set of possible values for θ. Most standard "alphabetic" design optimality criteria for linear models, such as A-, L-, and V-optimality, can be derived from this framework via an appropriate choice of loss function. Note that Φ(ξ) is a tight upper bound for the (unknown) expected loss, R(θ, ξ), attained at the true parameter values. Hence, among all deterministic designs, ξ_mM gives the strongest possible bound on the attained expected loss.
Though the above property appears to give a strong argument in favor of the use of ξ_mM, in fact in both game theory and statistical decision theory it is widely recognized that a minimax deterministic decision is often outperformed by a randomized decision strategy (e.g., Blackwell and Girshick 1979; Berger 1985, chap. 5; Thie and Keough 2011). Since design selection can be viewed as a decision problem, or alternatively a game pitting the Statistician against Nature, it stands to reason that random decision strategies should also be beneficial for experimental design. Nonetheless, aside from a few minimax analyses of Fisherian randomization (Wu 1981; Li 1983; Hooper 1989; Bhaumik and Mathew 1995), the topic of minimax random strategies for design selection appears almost totally unexplored in the literature.
In this article, we address this deficiency by introducing the notion of a general random design strategy (RDS) and investigating the performance of a minimax RDS in a number of important design problems. We find that a minimax RDS typically gives a substantially reduced upper bound on the attained expected loss, and a substantially reduced upper bound on the survivor function, that is, the probability that the loss exceeds a given threshold. It is able to do so by exploiting our key assumption that Nature is passive, and does not change θ in response to our choice for ξ .
The generality of our decision-theoretic formulation enables a wide variety of design problems to be addressed in a unified framework. Specifically, in this article we consider: (i) design for parameter estimation in a linear potential outcomes model with fixed (i.e., nonrandom) unknown unit effects (cf. Bailey 1981; Wu 1981; Dasgupta, Pillai, and Rubin 2015; Ding 2017); (ii) design for prediction at an unknown point in a correctly specified, normal-response linear model; and, of particular importance, the problem that motivated this work: (iii) model-robust design for global prediction in a normal-response linear model contaminated by an unknown model discrepancy function belonging to an L2-class (cf. Wiens 2015).

Organization of the Article
In Section 2, the definition of a minimax RDS is presented, and general bounds are introduced for the survivor function of the loss distribution. The connections between the proposed framework and other related notions in the literature are discussed.
In Section 3, random design strategies are explored for a linear potential outcomes model in which unit-treatment additivity may not hold. We show for the first time that a well-known design and analysis strategy from the additive case remains minimax under these weaker assumptions. In particular, we show that under appropriate conditions the minimax combination of RDS and estimator is to use complete randomization of a classical optimal design and ordinary least squares (cf. Li 1983).
Section 4 illustrates for the first time how random design strategies improve minimax efficiency under a loss function extending G-optimality, under the assumptions of mean-zero normal unit effects and finite design space. Examples are presented in which a minimax RDS gives a reduced bound on the expected loss and the survivor function of the loss distribution compared to both a deterministic G-optimal exact design (cf. St John and Draper 1975) and a G-optimal approximate design.
The most interesting results are given in Section 5, where random translation design strategies are proposed for model-robust prediction, in the presence of model discrepancy from an L2-class. Our new strategies give designs with finitely many runs and bounded expected loss. This represents a major improvement on existing model-robust design theory, in which it is only possible to achieve bounded expected loss by using a design with infinitely many runs, a practical impossibility.
Section 6 contains further discussion of the context of our results. Proofs are deferred to the supplementary materials.

Some Terminology and Assumptions
The vector x_i ∈ X, i = 1, . . . , n, denotes the treatment applied to the ith experimental unit. The set, X, of possible treatments is assumed to be a compact subset of R^q for some q ∈ N. Let Ξ denote the set of competing designs, so that ξ = (x1, . . . , xn) ∈ Ξ. Throughout this article, it is assumed that Ξ = X^n; that is, any run order of any choice of n treatments from X, not necessarily distinct, is permitted. Clearly this would not be the case if, for example, a multi-stratum design structure were required (Bingham 2015). Given ξ and θ, the response vector y is assumed to be a random draw from the probability measure P(· | ξ, θ) on R^n.

Random Design Strategies
Definition 2.1. An RDS π is a probability measure on Ξ such that the n-point design ξ to be run in the experiment is chosen at random by sampling from π.
A design ξ chosen via application of a random design strategy is called a random design. Note that a deterministic design is a special case of a random design; it corresponds to a strategy which assigns probability 1 to a particular ξ ∈ Ξ, that is, a strategy π with singleton support supp(π) = {ξ}. Hence, there is no disadvantage to considering random designs; if a random design gives no improvement over a deterministic design, then the optimal π will be a point-mass distribution, δ_ξ. Where emphasis is required, we refer to an RDS with more than one support design as nondeterministic. For mathematical precision, we suppose that π(A) is defined for any Borel-measurable subset A ⊆ Ξ.
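Operationally, an RDS with finite support is just a probability table over candidate exact designs. A minimal sketch (the designs and probabilities below are illustrative placeholders, not designs from this article):

```python
import random

# A finite-support RDS: candidate n-point designs, each a tuple of
# treatment vectors, paired with a selection probability.
rds = [
    (((-1,), (-1,), (1,)), 0.5),  # design xi_1, selected with prob 0.5
    (((-1,), (1,), (1,)), 0.5),   # design xi_2, selected with prob 0.5
]

def sample_design(rds, rng=None):
    """Draw one exact design at random according to the RDS probabilities."""
    rng = rng or random.Random()
    designs, probs = zip(*rds)
    return rng.choices(designs, weights=probs, k=1)[0]

xi = sample_design(rds)  # the design actually run in the experiment
```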
The following assumption that Nature is passive and not antagonistic seems plausible in most situations, and is key to our analysis. For a detailed discussion of why this assumption is needed, see Section 6.
Assumption 2.1. θ is fixed, that is, it is chosen independently of ξ .
If Assumption 2.1 holds then, prior to the sampling of a specific design realization, the pre-experimental loss distribution is that induced on ℓ(θ, α̂) by the joint distribution of ξ and y. The attained pre-experimental expected loss is thus R(θ, π) = E[ℓ(θ, α̂)]. This can be given a tight upper bound via R(θ, π) ≤ Φ(π) = max_{θ∈Θ} R(θ, π).
Note that writing R(θ, π) and Φ(π) for the risk and risk bound of an RDS π is a slight abuse of notation, as we have already defined R(θ, ξ) and Φ(ξ) as the risk and risk bound of a deterministic design ξ. However, the concepts are analogous, and it should be clear from the context which is intended.

[Figure 1. Bounds on the survivor function of the loss distribution, S(θ, π, u), for two design strategies in the example of Section 3.2.1, in the case Θ = Θ2. Gray line: unrandomized L-optimal design, ξ_mM (tight bound). Black line: randomized L-optimal design, π_mM (Markov bound).]

Survivor Function of the Loss Distribution
So far we have only considered the expected loss. To analyze the loss distribution in more detail, one can consider its survivor function, Pr[ℓ(θ, α̂) > u] (for a related idea see the quantile criterion of Kapelner et al. 2020). For an RDS π, this is S(θ, π, u) = Pr[ℓ(θ, α̂) > u]. The attained survivor function S(θ, π, u) is unknown due to its dependence on θ. However, it can be bounded tightly in a similar way to the attained expected loss, namely S(θ, π, u) ≤ max_{θ∈Θ} S(θ, π, u). A weaker bound can be obtained using Markov's inequality, giving S(θ, π, u) ≤ R(θ, π)/u ≤ Φ(π)/u. This latter bound is not tight, but it can be useful for comparison of a minimax RDS with a minimax deterministic design in cases where max_{θ∈Θ} S(θ, π_mM, u) is difficult to compute. Compared to a minimax deterministic design, a minimax RDS typically gives a reduced upper bound on the survivor function. Figures 1, 2, and 5 illustrate this phenomenon.
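The slack in the Markov bound can be illustrated numerically. A sketch, assuming the loss is a scaled chi-squared variable, as for squared-error loss of an unbiased normal predictor (the risk value and threshold below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# For an unbiased normal predictor under squared-error loss, the loss is
# (prediction variance) * chi-squared_1, so E[loss] = prediction variance.
risk = 1.0                                         # assumed expected loss
losses = risk * rng.chisquare(df=1, size=100_000)  # E[chi2_1] = 1

u = 2.0
empirical = (losses > u).mean()     # simulated survivor function at u
markov = min(1.0, risk / u)         # Markov: Pr[loss > u] <= E[loss]/u

# The exact survivor value is about 0.16, well below the Markov bound 0.5,
# showing how conservative the Markov bound can be.
assert empirical <= markov
```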

Fisherian Randomization
In practice, an element of randomness in design selection is already commonly recommended by most statisticians. Specifically, it is almost unanimously accepted as beneficial to perform Fisherian randomization of the allocation of treatments to experimental units. This has many advantages, but we focus on two.
First, randomization can be used as a basis for statistical inference without strong modeling assumptions, for example, using Neymanian randomization-based estimation or Fisher's exact test. These techniques can only be applied if an appropriate randomization has been used, or alternatively a valid rerandomization procedure (e.g., Morgan and Rubin 2012). They are not applicable with a fully deterministic optimal design (Li, Ding, and Rubin 2018).
Second, Fisherian randomization improves experiment robustness, although this benefit is not captured by design performance metrics such as D-efficiency. In our view, this inability to explain the advantage of randomization is a regrettable weakness of standard optimal design theory. In contrast, with minimax theory randomization arises as a necessary consequence of optimality under appropriate conditions. For example, in a small series of pioneering papers it was shown that, under uncertainty about the mean of the random errors in a linear model, Fisherian randomization of a standard optimal design is minimax (see Wu 1981; Li 1983; Hooper 1989; Bhaumik and Mathew 1995), deepening the mathematical foundation of longstanding statistical practice.
Despite its many advantages, Fisherian randomization is quite restrictive when viewed within the wider space of general random strategies. As a consequence it will not be optimal in all situations. From the existing literature it is currently unclear what is a minimax strategy when there is uncertainty about aspects of the problem other than the unit effects, for example, the functional form of the regression model, or the choice of a location for prediction. Our more flexible approach enables good statistical properties to be obtained in a wider range of problems.

Approximate Designs
In addition to the exact designs discussed in Section 1.1, another traditional optimal design approach is to work with an approximate design η = {x̌1, . . . , x̌K; w1, . . . , wK}, with support points x̌1, . . . , x̌K ∈ X and weights w1, . . . , wK > 0 (Σ_{k=1}^K w_k = 1) (e.g., Kiefer and Wolfowitz 1959). Many numerical methods exist for constructing an approximate design that is optimal with respect to some criterion such as G- or D-optimality (e.g., Yang, Biedermann, and Tang 2013; Harman, Filová, and Richtárik 2020).
Practical implementation of an approximate design depends on the use of a rounding method to determine an integer number, n_k ≈ nw_k, of runs to be allocated to treatment x̌_k, subject to Σ_{k=1}^K n_k = n. Most commonly the rounding is determined via an optimization procedure. For example, with Kiefer rounding the n_k are selected to minimize max_k |n_k/n − w_k|, and with Adams rounding the n_k are selected to maximize min_k n_k/(nw_k) (Pukelsheim 2006, chap. 12). Adams rounding gives an optimal efficiency bound. Another common approach is Fedorov's method (Pronzato and Pázman 2013, pp. 296-297). We refer to a deterministic (exact) design obtained by one of these procedures as a deterministic rounded optimal approximate design (ROAD). The choice could also be randomized, by selecting one member uniformly at random from the set of discretizations satisfying the Kiefer or Adams criteria (see Section 4.1). Sometimes this randomized rounding procedure gives a minimax RDS (e.g., Sections 4.1.2 and 4.1.3), other times it does not (e.g., Section 4.1.4).
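The two rounding criteria above can be sketched by brute-force search for small K and n (a sketch only; efficient apportionment algorithms are given by Pukelsheim 2006, chap. 12):

```python
from itertools import product
import math

def kiefer_rounding(w, n):
    """Kiefer rounding: integer allocations n_k summing to n that
    minimize max_k |n_k/n - w_k|.  Brute force, small K and n only."""
    best, best_val = None, math.inf
    for alloc in product(range(n + 1), repeat=len(w)):
        if sum(alloc) != n:
            continue
        val = max(abs(nk / n - wk) for nk, wk in zip(alloc, w))
        if val < best_val - 1e-12:
            best, best_val = alloc, val
    return best

def adams_rounding(w, n):
    """Adams rounding: maximize min_k n_k/(n*w_k).  Brute force."""
    best, best_val = None, -math.inf
    for alloc in product(range(n + 1), repeat=len(w)):
        if sum(alloc) != n:
            continue
        val = min(nk / (n * wk) for nk, wk in zip(alloc, w))
        if val > best_val + 1e-12:
            best, best_val = alloc, val
    return best
```

For example, for an equal-weight two-point design with n = 3, both criteria are optimized by the allocations (1, 2) and (2, 1); the randomized rounding of Section 4.1 would choose one of these ties uniformly at random.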
Note that independent random sampling of points from the approximate design measure exhibits poor properties and is never recommended. For small n it gives a nonnegligible probability of obtaining a singular exact design. Its asymptotic performance is also inferior compared to other rounding methods. Specifically, under independent sampling the difference between the proportion, n_k/n, of runs allocated to x̌_k and the optimal proportion, w_k, is of order O_p(n^{-1/2}) as n → ∞. In contrast, with the Adams method the difference is of the smaller order O(1/n).
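The contrast between the two rates can be checked by a small simulation; a sketch with illustrative equal weights:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.5, 0.5])   # illustrative approximate-design weights

def mean_max_deviation(n, reps=2000):
    """Average of max_k |n_k/n - w_k| under independent sampling of runs."""
    counts = rng.multinomial(n, w, size=reps)
    return np.abs(counts / n - w).max(axis=1).mean()

dev_small, dev_large = mean_max_deviation(100), mean_max_deviation(10_000)
# Increasing n by a factor of 100 shrinks the deviation by only ~10x,
# consistent with the n**-0.5 rate; a Kiefer or Adams rounding of these
# weights instead achieves deviation at most 1/(2n), the O(1/n) rate.
```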
An RDS π is a measure on exact designs, unlike η, which is a measure on treatments. Consequently, π contains much more detailed information about the experimental procedure than η. For example, unlike η, the RDS π implicitly specifies the probability distribution of the unit-treatment allocation, and any correlations between the replication numbers of two treatments over different realizations of the design. In Section 4, we show that this additional detail can enable superior statistical performance, especially in multifactor experiments with small sample size.

Random Balance Designs
Despite the superficially similar nomenclature, the approach presented in this article has little in common with the much-criticized "random balance" designs (Satterthwaite 1959). Our perspective is that it is not the randomness of such designs that is an issue per se, but the poor structure of that randomness, as those strategies are chosen without any decision-theoretic justification. For polynomial response surface models, random balance designs can often lead to problems such as highly correlated parameter estimators or even nonestimability due to partial or total confounding of some factor effects. In contrast, we show that the random strategies proposed in Sections 4 and 5 for the estimation of polynomial response surface models give demonstrably better efficiency than deterministic designs.

Potential Outcomes Models
The potential outcomes framework was first introduced by Neyman for designed experiments (Splawa-Neyman, Dabrowska, and Speed 1990), and has been extended to observational studies (e.g., Rubin 2005) and factorial experiments (Dasgupta, Pillai, and Rubin 2015). We denote by Y i (x) (i = 1, . . . , n) the potential outcome for the response that would occur if the ith experimental unit were to receive treatment x ∈ X . The totality of these counterfactual potential outcomes is referred to as the science. However, the "fundamental problem of causal inference" is that only one treatment can be applied per unit and hence the science can only ever be partially observed.
Our assumptions can be more clearly stated after rewriting the potential outcomes as Y_i(x) = μ(x) + ε_i(x), where μ(x) denotes the mean response under conditions x, and ε_i(x) denotes the unit effect of the ith unit under treatment x. By construction, the unit effects satisfy (1/n) Σ_{i=1}^n ε_i(x) = 0 for all x ∈ X. Unlike conventional statistical modeling, in the Neymanian approach the unit effects are treated as fixed unknowns instead of random variables. The only manner in which randomness arises in the observed responses is therefore from the random assignment of treatments to experimental units. There is no need to assume normality or independence of the unit effects. It is even possible to relax the assumption of unit-treatment additivity, equivalent to the condition that ε_i(x) ≡ e_i for all x ∈ X, which is commonly made in experimental design (e.g., Kempthorne 1955; Bailey 1981, 2017).
In the remainder of this section we adopt a linear potential outcomes model, in which μ(x) = f^T(x)β, where f(x) = (f_0(x), . . . , f_p(x))^T is a known vector of regression basis functions, continuous with respect to the topology on X, and β = (β_0, . . . , β_p)^T ∈ R^{p+1} is a vector of unknown parameters. We typically assume that f_0(x) = 1, that is, the model has an intercept. The response actually observed for the ith unit, which receives treatment x_i, is y_i = Y_i(x_i). In the absence of blocks, the traditional approach to designing an experiment is to use a completely randomized design (CRD). This consists of selecting a deterministic n-tuple of treatments, ξ̌ = (x̌_1, . . . , x̌_n), and allocating these to experimental units according to a random permutation ρ ∼ Uniform(S_n), giving x_i = x̌_{ρ^{-1}(i)}. Here, S_n denotes the symmetric group on n symbols.
The probability measure corresponding to the CRD strategy can be denoted concisely using the concept of a pushforward measure. Specifically, the permutation ρ ∈ S_n acts as a bijection on Ξ via ρ(ξ) = (x_{ρ^{-1}(1)}, . . . , x_{ρ^{-1}(n)}) for ξ = (x1, . . . , xn) ∈ Ξ, and on random design strategies via the pushforward operation ρ: π → ρ_*π, with ρ_*π(A) = π(ρ^{-1}(A)), with A a Borel-measurable subset of Ξ. The probability measure corresponding to the CRD defined above can then be written as π_{CRD,ξ̌} = (1/n!) Σ_{ρ∈S_n} ρ_*δ_ξ̌, where δ_ξ̌ denotes the point-mass probability distribution with support {ξ̌}.
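A CRD realization can be sketched directly from this definition (the treatment list below is illustrative):

```python
import random

def completely_randomized_design(treatments, rng=None):
    """Complete randomization: draw rho uniformly from S_n and give unit i
    the treatment indexed by rho^{-1}(i)."""
    rng = rng or random.Random()
    n = len(treatments)
    rho = list(range(n))
    rng.shuffle(rho)           # rho: a uniform draw from S_n
    inv = [0] * n
    for j, r in enumerate(rho):
        inv[r] = j             # inv = rho^{-1}
    return tuple(treatments[inv[i]] for i in range(n))
```

Every realization is a reordering of the same fixed treatment set, which is exactly why the CRD is a very restricted special case of a general RDS.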
Under a CRD, a reordered version of the response vector can be shown to follow a correlated heteroscedastic nonnormal linear model, as we see below. Let r_i = Y_{ρ(i)}(x̌_i) denote the response from the unit which is allocated to treatment x̌_i under the CRD. Further let ε(x) = (ε_1(x), . . . , ε_n(x))^T denote the vector of unit effects functions and E(ξ) denote the n × n matrix with (i, j)th

Minimax Estimator and Design Strategy
Before identifying the minimax estimator and design strategy, we must first specify the loss function and its corresponding risk, together with our assumptions about the set of possible unit effects and the estimator used. It is supposed that the experimental goal is estimation of a transformation of the parameters, α = Lβ, where L is an r × (p + 1) matrix and β_0 is not of interest, so that the first column of L consists of zeroes. A corresponding loss function is ℓ(θ, α̂) = (α̂ − α)^T(α̂ − α), where α̂ = h(ξ, y) is the estimator, not necessarily ordinary least squares, and θ = (ε, β) ∈ Θ = E × R^{p+1}, where E denotes the set containing all unit effects function vectors considered possible prior to the experiment. When computing the risk no integral is needed with respect to y, since y is uniquely determined given ξ and θ under model (3). Hence, the risk involves an expectation over the random design only. Two possibilities are considered for the set of possible unit effects, denoted E_1 and E_2, giving rise to two possibilities for Θ, namely Θ_1 = E_1 × R^{p+1} and Θ_2 = E_2 × R^{p+1}. The first set, E_1, consists of all unit effects function vectors such that S^2(x) ≤ σ^2, where S^2(x) denotes the variance of the unit effects under treatment x. This corresponds to the situation where the variances of the potential outcomes are bounded and allowed to differ among treatments, but there is no prior knowledge that a particular treatment has a smaller variance. The second possibility, E_2, corresponds to the case of unit-treatment additivity with S^2(x) ≡ s^2 ≤ σ^2. The sets E_1 and E_2 are both invariant to permutations of the unit labels, corresponding to an assumption that prior knowledge about the units is homogeneous. If it is instead believed that there is some structure to the experimental units, such as blocking, then a more heterogeneous set of possible unit effects function vectors should be considered.
Several conditions are imposed on the estimator α̂. First, we suppose it is linear, that is, α̂ = h(ξ, y) = A_ξ y, as is conventional for Neymanian point estimation (see Dasgupta, Pillai, and Rubin 2015; Zhao et al. 2018). In addition, we will suppose that α̂ is invariant, that is, h(ρ(ξ), ρ(y)) = h(ξ, y) for all ρ ∈ S_n. In other words, permutation of the order in which the data are written yields identical estimates. Further, we suppose that α̂ is continuous in the sense that the map ξ → A_ξ is continuous on the support, supp(π), of the RDS. These properties are all satisfied by the ordinary least squares estimator, α̂_OLS. Under more restrictive assumptions about the unit effects than adopted here, the optimal combination of estimator and design is well-known. In particular, with normal random errors ε_i ∼ N(0, σ^2) it is known by the Gauss-Markov theorem that α̂_OLS is the best linear unbiased estimator. Moreover, in this case the design strategy minimizing the expectation of the loss (4) would be an L-optimal deterministic design, that is, a ξ*_L ∈ Ξ that minimizes tr[L M_ξ^{-1} L^T] with respect to ξ ∈ Ξ. The result below identifies the minimax combination of RDS and estimator for the potential outcomes model (3).
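The invariance property of OLS can be checked numerically; a minimal sketch with an illustrative straight-line basis and arbitrary data:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.array([1.0, x])   # illustrative basis, not from the article

xi = np.array([-1.0, 0.0, 0.5, 1.0])
F = np.vstack([f(x) for x in xi])  # model matrix with rows f(x_i)^T
y = rng.normal(size=4)

beta = np.linalg.lstsq(F, y, rcond=None)[0]

# Permute the order in which (treatment, response) pairs are written:
# OLS returns identical estimates, i.e. h(rho(xi), rho(y)) = h(xi, y).
perm = np.array([2, 0, 3, 1])
beta_p = np.linalg.lstsq(F[perm], y[perm], rcond=None)[0]
assert np.allclose(beta, beta_p)
```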
Theorem 3.1. For model (3) and loss function (4) with Θ = Θ_1 or Θ_2: (i) if α̂ is invariant, given any nonsingular π the strategy π̄ = (1/n!) Σ_{ρ∈S_n} ρ_*π obtained by uniform random permutation of the treatments sampled from π satisfies Φ(π̄) ≤ Φ(π); (ii) for the strategy π̄, the ordinary least squares estimator α̂_OLS is unbiased and minimax among all continuous linear invariant estimators. With this design strategy and estimator, we have Φ(π̄) = (nσ^2/(n−1)) ∫ tr[L M_ξ^{-1} L^T] dπ(ξ); (iii) subject to the constraint that α̂ is continuous, linear and invariant, a minimax combination of RDS and estimator is complete randomization of an L-optimal deterministic design together with α̂ = α̂_OLS.
For a deterministic design, the minimax estimator and maximum risk under the potential outcomes model (3) are given by the following result.
Proposition 3.2. For model (3), loss function (4), and a deterministic design, ξ, with M_ξ invertible: (i) α̂_OLS is minimax among all estimators of α for Θ = Θ_1 or Θ_2; (ii) the maximum risk Φ(ξ) has a closed form involving λ_max[·], the maximal eigenvalue of a matrix; and (iii) the survivor function of the loss distribution can be bounded explicitly. The max-risk efficiency of RDS π′ relative to π is defined as eff(π′; π) = Φ(π)/Φ(π′). From Proposition 3.2(ii) we see that, relative to the completely randomized version of ξ, the max-risk efficiency of the unrandomized version of ξ satisfies the upper bound given in (5), for Θ = Θ_1 or Θ_2, with equality when Θ = Θ_2.

Example: Full Quadratic Model, Three Factors
Here we consider a full quadratic model in x = (x1, x2, x3)^T ∈ X, and we set L = [0_{p×1} | I_p]. By Theorem 3.1, the minimax RDS, π_mM, is to apply Fisherian randomization of the run order to an L-optimal design, ξ*_L. Such a design has been computed using coordinate exchange and is given in the supplementary materials (Table 2). The risk bound for this strategy is Φ(π_mM) = 1.34σ^2. Randomization provides a substantial efficiency gain: by (5) the max-risk efficiency of the unrandomized version of ξ*_L is at most 24.9% relative to π_mM. Figure 1 shows, for each of ξ*_L and π_mM, an upper bound on the attained survivor function of the loss distribution in the case Θ = Θ_2, using Proposition 3.2(iii) and the Markov bound from Section 2.3. Here we have exploited the fact that the bounds depend only on the ratio u/σ^2 to produce a plot without assuming a specific value of σ^2. We see that, compared to the unrandomized L-optimal design, the minimax RDS reduces the worst-case probability of a loss exceeding u for all values of u shown. Note that by "reduces" we mean "gives a value which is less than or equal to the original value." From the above, it is clear that randomization provides a substantial benefit. However, it is noteworthy that it is nonetheless inadequate to randomize the run order of a poor set of treatments. For example, the CRD based on ξ_bad in Table 2 has a max-risk efficiency of at most 0.9% relative to π_mM, far lower than the unrandomized L-optimal design.
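A coordinate-exchange search of the kind used to construct ξ*_L can be sketched as follows. For brevity this uses a full quadratic model in two factors rather than three, with n = 6 runs on {−1, 0, 1}^2 and an arbitrary nonsingular starting design, none of which are the published settings:

```python
import numpy as np

LEVELS = (-1.0, 0.0, 1.0)

def f(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

def l_criterion(design):
    """L-criterion tr[L M^{-1} L^T] with L = [0 | I_p]: the sum of the
    non-intercept diagonal entries of M^{-1}; +inf if M is singular."""
    M = sum(np.outer(f(x), f(x)) for x in design)
    if abs(np.linalg.det(M)) < 1e-9:
        return np.inf
    Minv = np.linalg.inv(M)
    return float(np.trace(Minv) - Minv[0, 0])

def coordinate_exchange(design, passes=20):
    """Cycle over runs and coordinates, greedily swapping in the best level."""
    design, crit = list(design), l_criterion(design)
    for _ in range(passes):
        improved = False
        for i in range(len(design)):
            for j in range(2):
                for lev in LEVELS:
                    trial = list(design[i])
                    trial[j] = lev
                    cand = design[:i] + [tuple(trial)] + design[i + 1:]
                    c = l_criterion(cand)
                    if c < crit - 1e-10:
                        design, crit, improved = cand, c, True
        if not improved:
            break
    return design, crit

start = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0), (1, 0)]
design, crit = coordinate_exchange(start)
```

In practice the search is repeated from many random starts and the best local optimum kept.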

G-Optimal Random Design Strategies
In this section, it is assumed that X is finite, and that under treatment x ∈ X the response is distributed as N[μ(x), σ^2] with σ^2 > 0 unknown. In addition the model is assumed to be linear, that is, the mean response function satisfies (2). A classic deterministic optimal design for prediction for this model is the G-optimal exact design, ξ*_G ∈ Ξ = X^n, which minimizes max_{x∈X} f^T(x) M_ξ^{-1} f(x), where μ̂(x) = f^T(x)β̂ denotes the ordinary least squares prediction of the mean response.
The G-optimal exact design above may be derived as a minimax deterministic decision-theoretic design via the choice of an appropriate loss function. To do so, we must assume that the goal of the experiment is to predict α = μ(x) = f^T(x)β at a point x ∈ X, which (a) is not known by the Statistician at the time of planning the experiment, (b) does not change as a result of the choice of ξ, and (c) is known during analysis. For example, this may be the case if the prediction is made by someone else. Then an appropriate loss function is given by the predictive squared error, ℓ(θ, α̂) = [α̂ − μ(x)]^2, depending on the unknown θ = (x, σ^2, β) ∈ Θ = X × [σ̲^2, σ̄^2] × R^{p+1}, with σ̲^2 and σ̄^2 defining lower and upper bounds on σ^2, respectively. Under an alternative experimental goal, a different loss may be more suitable. For example, if the aim is instead to globally estimate the whole function μ, then integrated squared prediction error may be appropriate. In Section 5, the latter loss function is applied for such global prediction problems in the context of approximate linear models.
To verify that the minimax deterministic design under loss (6) coincides with a G-optimal exact design as claimed, first note that it is minimax to set α̂ = μ̂(x) (see Proposition B.5 in the supplementary materials). In this case, the loss simplifies as ℓ(θ, α̂) = [μ̂(x) − μ(x)]^2, and the risk of a deterministic design ξ ∈ Ξ becomes R(θ, ξ) = σ^2 f^T(x) M_ξ^{-1} f(x). It is clear from this that the G-optimality criterion is equivalent to minimization of max_{θ∈Θ} R(θ, ξ) with respect to ξ.
The assumption that x does not change in response to the choice of ξ, that is, assumption (b) above, is analogous to Assumption 2.1. Therefore, decision-theoretic arguments indicate that a minimax deterministic design may be outperformed by an RDS π. The risk of π is R(θ, π) = E_{ξ,y|β,σ^2}[ℓ(θ, α̂)] = σ^2 R̃(x, π), with R̃(x, π) = f^T(x) E{M_ξ^{-1}} f(x), and a minimax RDS minimizes Φ(π) = σ̄^2 max_{x∈X} R̃(x, π) with respect to π. We also refer to a minimax RDS as a G-optimal RDS. If the numbers of elements of X and Ξ are sufficiently small, then it is possible to obtain a minimax RDS numerically by solving an appropriate linear programming problem, using a standard method from game theory (see Section A in the supplementary materials). However, this is not suitable as a general-purpose approach if n is large or if X has a large number of points.
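For a small problem the game LP can be written down directly: minimize t over (π, t) subject to Σ_ξ π(ξ) R̃(x, ξ) ≤ t for each x ∈ X, Σ_ξ π(ξ) = 1, π ≥ 0. The following sketch solves it with scipy for an illustrative straight-line model with n = 3 runs on X = {−1, 0, 1} (not one of the article's reported examples):

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.optimize import linprog

X = [-1.0, 0.0, 1.0]
f = lambda x: np.array([1.0, x])

# Enumerate nonsingular candidate designs (run order is irrelevant here).
designs, payoff_rows = [], []
for xi in combinations_with_replacement(X, 3):
    M = sum(np.outer(f(x), f(x)) for x in xi)
    if abs(np.linalg.det(M)) < 1e-9:
        continue
    Minv = np.linalg.inv(M)
    designs.append(xi)
    payoff_rows.append([f(x) @ Minv @ f(x) for x in X])

A = np.array(payoff_rows).T          # A[x, xi] = scaled risk R~(x, xi)
m = len(designs)

# Variables (pi_1..pi_m, t): minimize t s.t. A pi - t <= 0, sum(pi) = 1.
c = np.zeros(m + 1); c[-1] = 1.0
A_ub = np.hstack([A, -np.ones((A.shape[0], 1))])
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(A.shape[0]),
              A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])
print(round(res.x[-1], 4))   # minimax value of the game; here 0.75
```

The optimal strategy here mixes the designs (−1, −1, 1) and (−1, 1, 1) with probability 1/2 each.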
Before proceeding we note that for any nonsingular RDS the worst-case survivor function of the loss distribution can be computed using the following result.
Proposition 4.1. For a normal-response linear model, loss function (6), and a nonsingular RDS π, the maximum survivor function of the loss distribution, max_{θ∈Θ} S(θ, π, u), has a closed-form expression.

Approximate Designs
A G-optimal approximate design η* for this problem minimizes max_{x∈X} f^T(x) M_η^{-1} f(x), where M_η = Σ_{k=1}^K w_k f(x̌_k) f^T(x̌_k). From the general equivalence theorem, this is equivalent to a D-optimal approximate design. The following results show that for certain sample sizes a deterministic rounding of η* can be minimax, or highly minimax efficient, within the set of RDS, in which case it is not necessary to use a nondeterministic strategy.
Proposition 4.2. Suppose that η * is minimally supported and n is divisible by p + 1. Then (i) the support points of η * are equally weighted; (ii) the ROAD has n/(p+1) replicates of each support point of η * ; and (iii) the deterministic ROAD is minimax within the set of RDS.
Proposition 4.3. The max-risk efficiency of a deterministic ROAD, ξ_A, obtained via Adams apportionment of the G-optimal approximate design η* is at least 1 − K/n, where K denotes the number of support points of η*.
As an example, consider the implications of these results for a one-factor polynomial regression of degree d. It is well-known that the G- (equivalently D-) optimal approximate design is minimally supported. Hence, the deterministic ROAD ξ_A is minimax if n is divisible by the number of parameters, or highly efficient if n is large. However, for small sample sizes or multi-factor problems, a rounding of the G-optimal approximate design may be inefficient compared to a minimax RDS. Inefficiency can arise whether the rounding is chosen deterministically (e.g., Sections 4.1.2-4.1.4) or at random from the set of optimal discretizations (e.g., Section 4.1.4).
In contrast, the deterministic designs obtained using standard methods are suboptimal. For example, the G-optimal deterministic exact design, ξ_mM = (−1, 0, 1), has a max-risk efficiency of 100 × Φ(π_mM)/Φ(ξ_mM)% = 90%. The G-optimal approximate design η* has two support points, x = ±1, each with weight 1/2. Rounding η* gives either ξ_1 or ξ_2; each of these roundings satisfies both the Kiefer and Adams rounding criteria. Note therefore that in this case the minimax RDS can be viewed as a random choice of one of the two possible Adams roundings of the G-optimal approximate design. The randomization is essential: viewed as deterministic exact designs, both ξ_1 and ξ_2 are suboptimal, each with a max-risk efficiency of 75%.
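The quoted efficiencies can be verified numerically; a minimal sketch, assuming the straight-line model f(x) = (1, x)^T with n = 3 runs on X = {−1, 0, 1}, the setting consistent with the designs quoted above:

```python
import numpy as np

X = [-1.0, 0.0, 1.0]
f = lambda x: np.array([1.0, x])

Minv = lambda xi: np.linalg.inv(sum(np.outer(f(x), f(x)) for x in xi))

def max_pred_var(A):
    """Worst-case scaled prediction variance max_x f(x)^T A f(x)."""
    return max(f(x) @ A @ f(x) for x in X)

xi_mM = (-1.0, 0.0, 1.0)                            # G-optimal exact design
xi_1, xi_2 = (-1.0, -1.0, 1.0), (-1.0, 1.0, 1.0)    # the two roundings

# The minimax RDS mixes xi_1 and xi_2 with probability 1/2 each; its risk
# bound involves E[M^{-1}], not the inverse of an averaged matrix.
EMinv = 0.5 * Minv(xi_1) + 0.5 * Minv(xi_2)
value = max_pred_var(EMinv)                         # risk bound 0.75

print(round(value / max_pred_var(Minv(xi_mM)), 2))  # 0.9  -> 90% efficiency
print(round(value / max_pred_var(Minv(xi_1)), 2))   # 0.75 -> 75% efficiency
```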

Example: Full Quadratic Model, 2 Factors, n = 6 Runs, X = {−1, 0, 1}^2
Here, f(x) = (1, x1, x2, x1x2, x1^2, x2^2)^T for x = (x1, x2)^T ∈ X. There are 76 possible nonsingular deterministic designs modulo permutations of the run order, which under the model assumptions in Section 4 do not affect the risk function and so are irrelevant. (If weaker assumptions were used, such as in Section 3, then the run order would also need to be considered.) The minimax deterministic design has Φ(ξ_mM) = 2.75σ^2.
A minimax RDS, π_mM, was obtained using linear programming (see Figure 2, left panel). This has 8 support designs, ξ_1, . . . , ξ_8, with varying probabilities, and Φ(π_mM) = 1.55σ^2; in fact, R̃(x, π_mM) = 1.55 for all x ∈ X. Support designs ξ_1, ξ_5, ξ_6, and ξ_7 are minimax deterministic designs; the remaining four are not, but they are helpful in reducing the risk bound for the random design strategy.
The minimax RDS again outperforms traditional deterministic designs. The deterministic G-optimal exact design has a max-risk efficiency of 1.55/2.75 × 100% = 56% relative to the minimax RDS. The G-optimal approximate design for this problem is given in Table 1. The 6-run Kiefer roundings of this approximate design contain all 4 corner points and the center point, plus one edge midpoint. The max-risk efficiency of a fixed (i.e., nonrandomized) Kiefer rounding is just 56.4% relative to the minimax RDS. Adams rounding is not recommended here: due to the small sample size, it may give a singular design. Randomizing the choice of Kiefer rounding leads to a strategy with a max-risk efficiency of 73%. Hence, randomized rounding of a G-optimal approximate design is inadequate in this example, and the more flexible general RDS is necessary. Figure 2 (right panel) shows that, compared to several other strategies, the minimax RDS gives a reduced upper bound on the attained survivor function of the loss distribution, S(θ, π, u). In particular, for a wide range of values of u it outperforms the minimax deterministic design and both fixed and randomized Kiefer roundings of the G-optimal approximate design.

Model-Robust Design and Approximate Linear Models
A long-standing problem with many traditional "alphabetic" design optimality criteria is their reliance on an assumed model, which must be specified prior to the experiment by the Statistician. The resulting designs are often inefficient if the true data-generating model differs from the one used to compute the optimal design (Box and Draper 1959). This is in part a consequence of the fact that most design optimality criteria are variance-based; more robust designs may be obtained by accounting for the bias that is introduced if the model is incorrect. For example, suppose that the true data-generating model is y ∼ N[μ(x), σ²], with μ not necessarily linear, and the Statistician's a priori assumed model for design purposes is the linear model y ∼ N[f^T(x)β, σ²]. When the experimental goal is global prediction, a common choice for the design is the V-optimal design, ξ*_V, which minimizes tr(A M_ξ^{-1}), where A = ∫_X f(x) f^T(x) dλ(x) and λ is Lebesgue measure. Equivalently, the V-optimal design minimizes the integrated variance of predictions from the assumed linear model. Variance-based criteria such as V-optimality are reasonable if the assumed model is correct, that is, if μ(x) = f^T(x)β, because in this case the predictions are unbiased. However, when the assumed model is incorrect, the predictions are biased, and this should be accounted for in the design; for example, we might evaluate design performance using the integrated mean squared error of predictions from the linear model. On this basis, Box and Draper (1959) found that a variance-minimizing design is often outperformed by a purely bias-minimizing design. However, their conclusions were limited to the somewhat unrealistic case where the true model μ is a polynomial and the assumed linear model is a polynomial of lower degree.
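For concreteness, the V-criterion tr(A M_ξ^{-1}) can be evaluated as in the following sketch, for a one-factor quadratic model on X = [−1, 1]; the per-observation scaling of M_ξ is an assumed convention here (scalings vary across the literature).

```python
import numpy as np

# Sketch: evaluate the V-criterion tr(A M_xi^{-1}) for a one-factor
# quadratic model f(x) = (1, x, x^2)^T on X = [-1, 1].
def f(x):
    return np.array([1.0, x, x**2])

# A[i, j] = int_{-1}^{1} x^{i+j} dx = 2/(i+j+1) for i+j even, 0 otherwise.
A = np.array([[2.0 / (i + j + 1) if (i + j) % 2 == 0 else 0.0
               for j in range(3)] for i in range(3)])

def v_criterion(design):
    F = np.array([f(x) for x in design])   # n x 3 model matrix
    M = F.T @ F / len(design)              # per-observation information matrix
    return np.trace(A @ np.linalg.inv(M))

print(v_criterion([-1.0, 0.0, 1.0]))       # approx 4.8 for this design
```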
A more flexible approach to model-robust design can be achieved by allowing the true and assumed model means to differ by an essentially arbitrary function. More precisely, authors such as Wiens (2015) suppose that

y ∼ N[μ(x), σ²], with μ(x) = f^T(x) β_ba + ψ(x), ψ ∈ H, (9)

where ψ is a discrepancy function that represents the error that results from approximating μ with a linear model. The class H is chosen to include all discrepancy functions considered possible a priori by the Statistician; most commonly it is defined as

H = { ψ ∈ L_2(X; λ) : ∫_X f(x) ψ(x) dλ(x) = 0, ∫_X ψ²(x) dλ(x) ≤ τ² }, (10)

where L_2(X; λ) denotes the set of real-valued functions on X that are square-integrable with respect to λ (e.g., Wiens 1992; Heo, Schmuland, and Wiens 2001; Dette and Wiens 2009). Other choices are possible, for example based on a uniform bound or a smoothness class for ψ (Li and Notz 1982; Yue and Hickernell 1999). Note that the orthogonality condition in (10) is perfectly natural: it corresponds to the assumption that the parameter values β_ba give the best linear model approximation to the true model, as measured by the L_2 distance. To see this, note that the L_2-best approximating parameter values, β_ba = arg min_β ∫_X {μ(x) − f^T(x)β}² dλ(x), are exactly those for which the residual ψ = μ − f^T β_ba is orthogonal in L_2(X; λ) to each component of f.

An appropriate decision-theoretic formulation can be developed by considering the experimental goal of global prediction. In this case, the interest "parameter" is α = μ, which can be estimated via μ̂(x) = f^T(x)β̂. A suitable loss function is then the integrated squared prediction error of predictions from the assumed linear model, that is, ∫_X {μ̂(x) − μ(x)}² dλ(x), where θ = (ψ, β_ba, σ²) ∈ Θ = H × R^{p+1} × [σ²_min, σ²_max]. The corresponding risk, R(θ, ξ), is the integrated mean squared prediction error. It can be shown that the risk is independent of β_ba, and so the maximum risk satisfies sup_{θ∈Θ} R(θ, ξ) = sup_{(ψ,σ²)∈H×[σ²_min,σ²_max]} R(ψ, σ², ξ).
Although the discrepancy class (10) has the advantages of being flexible and well studied, to date it has been troublesome to use when the treatment space is uncountably infinite, for example, X = [−1, 1]^q. In this case, deterministic designs with finitely many runs have woeful performance: it can be shown that any such ξ has unbounded maximum expected loss, that is, Λ(ξ) = sup_{θ∈Θ} R(θ, ξ) = ∞, even if n is large (Wiens 1992). Even worse, the survivor function has the undesirable property given in Proposition 5.1: roughly speaking, in the worst case the loss is almost sure to exceed any finite bound, due to the possibility of arbitrarily unfavorable states of Nature.
To address the poor performance of finite deterministic designs, the existing literature proposes the use of an optimal design with infinitely many support points, defined through a probability density function f on X (e.g., Wiens 2015). However, such an f is not practically useful. To obtain a feasible experiment, f must be approximated by a design ξ with finitely many points, yet if chosen deterministically any such approximation will suffer from the same problems outlined above. Hence, nothing is gained by constructing an optimal f. Our novel solution to this paradox is instead to use a random translation design strategy (see Section 5.2). As we show, such a strategy leads to an experiment with bounded risk and improved bounds on the survivor function of the loss distribution.
The risk for an arbitrary nonsingular RDS, π, can be written as a bias-variance decomposition,

R(ψ, σ², π) = MIV(σ², π) + MISB(ψ, π), (12)

where the mean integrated variance (MIV) and mean integrated squared bias (MISB) are given by MIV(σ², π) = σ² E_ξ tr(A M_ξ^{-1}) and MISB(ψ, π) = b(ψ, π) + ∫_X ψ²(x) dλ(x), in which the term b(ψ, π) quantifies the effect of the discrepancy function on the bias of the predictions.
In the case of no discrepancy, that is, ψ(x) = 0 for all x, the minimax RDS reduces to a point-mass measure on the traditional deterministic V-optimal design, ξ*_V. To see this, note that (12) becomes R(0, σ², π) = σ² E_ξ tr(A M_ξ^{-1}) ≥ σ² tr(A M_{ξ*_V}^{-1}), with equality when π places all its mass on ξ*_V. To find minimax efficient random strategies, we need to be able to compute the tight risk bound,

Λ(π) = sup_{(ψ,σ²)∈H×[σ²_min,σ²_max]} R(ψ, σ², π) = σ²_max E_ξ tr(A M_ξ^{-1}) + sup_{ψ∈H} b(ψ, π) + τ².

One potential approach would be to devise an algorithm to numerically maximize b(ψ, π) with respect to the function ψ. However, such an algorithm would likely be extremely computationally intensive. Thus, it is desirable to obtain analytical formulas for Λ(π). For general design strategies, this remains an open problem for future research. However, we have successfully identified a flexible class of random design strategies for which Λ(π) is analytically tractable.
To interpret this definition, note that if t ∼ Uniform(T) then E(t) = λ(T)^{-1} ∫_T t dλ(t) = 0_q, and so c_i describes the mean location of the ith design point over all the different potential realizations of the random design. The size of the set T of possible translations determines the degree of randomness; when T is small the design is close to deterministic. The regularity condition (ii) in Definition 5.1 states that the support sets for the different design points must be nonoverlapping (apart from a set of measure zero) and is needed to prove Theorem 5.1. The realized design ξ = (x_1, . . . , x_n) is a translation of the mean design ξ̄ = (c_1, . . . , c_n) by a common vector t that is sampled according to a uniform distribution on T. Thus, ξ retains the same geometric shape as ξ̄ (see Figure 3).
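A realization from a hypercuboidal random translation strategy can be sampled as in the following sketch; the mean design and the value of δ are illustrative choices, not taken from the paper's examples.

```python
import numpy as np

# Sketch: sample one realization from a hypercuboidal random translation
# design strategy pi_H(mean_design, delta).  The whole mean design is
# shifted by a single t ~ Uniform([-delta/2, delta/2]^q), so the realized
# design keeps the geometric shape of the mean design.
rng = np.random.default_rng(0)

def sample_translation_design(mean_design, delta):
    n, q = mean_design.shape
    t = rng.uniform(-delta / 2.0, delta / 2.0, size=q)  # one common shift
    return mean_design + t        # the same t is applied to every point

# Illustrative mean design for q = 2, with points kept away from the
# boundary of [-1, 1]^2 so that all realizations stay in the design region.
mean_design = np.array([[-0.8, -0.8], [-0.8, 0.8], [0.8, -0.8], [0.8, 0.8]])
xi = sample_translation_design(mean_design, delta=0.4)
# Every realization is a rigid shift: pairwise differences are unchanged.
```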
For random translation designs, the risk bound is given by the following theorem.
Theorem 5.1. For a nonsingular random translation design strategy π = π_RT(ξ̄, T) with λ(T) > 0, the tight risk bound Λ(π) is finite; an explicit expression is given in Equation (13).

An obvious choice is to set T = [−δ/2, δ/2]^q above, with δ ≥ 0. Lemma 5.1 gives necessary and sufficient conditions for such a choice to satisfy condition (ii) of Definition 5.1 and hence give a valid random translation design strategy. Nonsingularity of the strategy must be checked separately. We refer to such a design strategy as a hypercuboidal random translation design strategy, denoted π_H(ξ̄, δ). Note that λ(T) = δ^q, and so the conditions of Theorem 5.1 hold only for δ > 0. Nonetheless, Equation (13) remains valid for δ = 0 provided τ² > 0: for δ = 0 the design strategy is deterministic, and it is known in this case that Λ(π) = ∞ (Wiens 1992), in agreement with the limit of (13).

Simple Illustrative Examples
For illustration we present approximately minimax hypercuboidal random translation design strategies for the following problems: (i) n = 3 runs, q = 1 factor, and an approximate quadratic model, that is, f(x) = (1, x, x²)^T; and (ii) n = 4 runs, q = 2 factors, and an approximate first-order model, that is, f(x) = (1, x_1, x_2)^T, x = (x_1, x_2)^T. Minimax strategies were identified using the approach described in Section 5.3.1 and are plotted in Figure 4 for a range of values of τ²/σ². For both problems, the minimax choice for ξ̄ is similar to the V-optimal deterministic design (for q = 1, ξ*_V = (−1, 0, 1); for q = 2, ξ*_V is the 2² factorial), modified to account for the constraints of Lemma 5.1. The minimax choice of δ increases as τ² increases; that is, if protection is sought against a discrepancy function with larger L_2 norm, then an RDS with greater variance must be used.

Heuristics for Larger Examples
In our experience, for problems of larger dimensionality it is computationally expensive to identify a global optimum of (ξ̄, δ) using the brute-force optimization approach described in Section 5.3.1. However, the results for the simple illustrative examples suggest that Heuristic 5.1 may be adequate to identify a combination of ξ̄ and δ with high minimax efficiency. The heuristic essentially performs a one-dimensional optimization to robustify a V-optimal deterministic design. The associated computational cost is minimal, even in problems with a large number of factors and runs. Nonetheless, the gain in robustness compared to the V-optimal design is dramatic, as the resulting RDS has a bounded expected loss.
Heuristic 5.1. To construct an efficient strategy:
1. Calculate a V-optimal or highly V-efficient exact design, ξ*_V, for example by using coordinate descent to minimize tr(A M_ξ^{-1}) with respect to ξ;
2. Form a parameterized mean design, ξ̄_δ, that approximates ξ*_V and satisfies the constraints of Lemma 5.1. To do this, move the points of ξ*_V away from the boundary of [−1, 1]^q if necessary, and split any replicates;
3. Choose δ to minimize the approximate risk bound Λ̂(ξ̄_δ, δ).
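Step 1 of the heuristic can be sketched as follows, on a toy problem rather than the paper's n = 12, q = 3 example; the per-observation scaling of M_ξ and the candidate grid of levels are assumptions of this sketch.

```python
import numpy as np

# Sketch of step 1 of Heuristic 5.1 on a toy problem: coordinate descent to
# minimize the V-criterion tr(A M_xi^{-1}) for the first-order model
# f(x) = (1, x1, x2)^T on X = [-1, 1]^2 with n = 4 runs.
def f(x):
    return np.array([1.0, x[0], x[1]])

# A = int_X f f^T dlambda on [-1, 1]^2; cross moments vanish by symmetry.
A = np.diag([4.0, 4.0 / 3.0, 4.0 / 3.0])

def v_crit(design):
    F = np.array([f(x) for x in design])
    M = F.T @ F / len(design)            # per-observation information matrix
    if np.linalg.matrix_rank(M) < 3:
        return np.inf                    # singular designs are inadmissible
    return np.trace(A @ np.linalg.inv(M))

grid = np.linspace(-1.0, 1.0, 21)        # candidate levels per coordinate
design = np.array([[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.4]])

improved = True
while improved:          # sweep until no single-coordinate move helps
    improved = False
    for i in range(len(design)):
        for j in range(2):
            best_g, best_v = design[i, j], v_crit(design)
            for g in grid:
                design[i, j] = g
                v = v_crit(design)
                if v < best_v - 1e-12:
                    best_g, best_v, improved = g, v, True
            design[i, j] = best_g
# For this model the criterion is bounded below by 20/3, attained by the
# 2^2 factorial; the descent typically pushes the points to the corners.
```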
With no discrepancy (i.e., if ψ(x) ≡ 0), the risk bound from the V-optimal deterministic design would be 3.918σ², compared with 7.03σ² in the presence of discrepancy if the heuristic random strategy is used. Thus, provided one uses an efficient random strategy, the presence of discrepancy leads to only a 34% increase in the bound on the root mean integrated squared prediction error. In contrast, if a deterministic V-optimal design is used, the presence of discrepancy leads to an unbounded risk. Figure 5 also shows upper bounds on the survivor function of the loss distribution for both the optimal heuristic strategy π_2 and a deterministic strategy (all deterministic strategies have the same tight bound). For most values of u, the random strategy provides a substantially reduced bound on the probability of the loss exceeding u.
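The 34% figure can be verified directly from the two quoted bounds, since the root bound scales as the square root of the risk bound:

```python
import math

# Ratio of root risk bounds: sqrt(7.03 sigma^2 / 3.918 sigma^2) - 1 gives
# the relative increase in the bound on the root mean integrated squared
# prediction error when discrepancy is allowed for.
increase = math.sqrt(7.03 / 3.918) - 1.0
print(f"{100 * increase:.0f}%")  # → 34%
```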
Of course, use of a random translation design strategy results in reduced variance-efficiency if the model is in fact correct. We quantify this loss using the V-efficiency, eff_V(ξ) = tr(A M_{ξ*_V}^{-1}) / tr(A M_ξ^{-1}), defined assuming that ψ(x) ≡ 0. Note that eff_V(ξ) is a random variable due to its dependence on ξ. Figure 5 shows that for π_2 the realized design typically has a V-efficiency of around 70%. This seems more than adequate given that the random strategy provides such dramatic improvements in robustness.

Discussion
We believe that the results in this article highlight untapped potential for novel random design strategies to lead to substantial improvement in the properties of the loss distribution for a variety of experimental design problems. We anticipate that future research will realize these benefits in diverse areas where there is a priori uncertainty, including design for nonlinear models and screening experiments.
The discussion below focuses on two main themes. First, we clarify assumptions and potential misconceptions, for example, the importance of Assumption 2.1 and ideas about optimality over repeated samplings and conditional risk. Second, connections with other areas are explored, including the Bayesian interpretation of randomization and links with the computer model calibration literature.

Importance of Nature's Passivity
We now illustrate the importance of Assumption 2.1 in obtaining improved bounds on the expected loss and the survivor function of the loss distribution. Without it, a minimax RDS would give no advantage over a minimax deterministic design. To see this, suppose that, instead of θ and ξ being independent, it were possible for θ to be a function of ξ. In this case, the attained pre-experimental expected loss of π would not necessarily be given by R(θ, π) = ∫ R(θ, ξ) dπ(ξ). For example, suppose that after observing our choice of ξ, but before generating y from P(·|ξ, θ), Nature chooses a θ ∈ arg max_{θ∈Θ} R(θ, ξ). In this case, the pre-experimental expected loss under π would be ∫ max_{θ∈Θ} R(θ, ξ) dπ(ξ) ≥ max_{θ∈Θ} R(θ, ξ_mM). Hence, it would be impossible to improve upon the bound on the expected loss given by the minimax deterministic design. However, it seems implausible and unduly pessimistic to suppose that Nature behaves in such a reactive, intelligent, and antagonistic manner. In the more realistic case that Assumption 2.1 holds, a minimax deterministic design will be inefficient compared to a minimax RDS, due to its focus on guarding against this extreme pathological behavior. In contrast, a minimax RDS is able to reduce the probability of large losses by exploiting the fact that Nature cannot change θ in response to the realized design.
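The role of Assumption 2.1 can be illustrated numerically with a toy 2 × 2 game (an illustration, not taken from the article):

```python
import numpy as np

# Toy game: rows of R index Nature's states theta, columns index designs xi.
R = np.array([[4.0, 1.0],
              [1.0, 4.0]])
p = np.array([0.5, 0.5])             # RDS: choose each design w.p. 1/2

det_minimax = R.max(axis=0).min()    # best deterministic design: 4.0
passive = (R @ p).max()              # Nature fixes theta before xi: 2.5
reactive = (R.max(axis=0) * p).sum() # Nature reacts to realized xi: 4.0
# Randomization helps only against a passive Nature (2.5 < 4.0); a reactive
# Nature restores the deterministic minimax value of 4.0.
```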

Optimality Over Repeated Samplings
In common with all other optimal frequentist procedures, the minimax RDS is derived using an expectation over hypothetical realizations of the same experiment. This may cause some to be concerned that use of a design sampled from an RDS is optimal only if the same experiment is repeated over and over, when in fact it is only conducted once. However, this concern is unwarranted. Neyman's original justification for frequentist procedures that minimize expected risk is that if they are applied consistently in many different experiments then the total achieved loss across all experiments will be reduced (Berger 1984). Note that this point is not unique to our proposed method. Similar repeated sampling properties are also used when deriving traditional deterministic optimal designs, which typically minimize the variance of a point estimator. This variance is also computed by taking an expectation over hypothetical realizations of the same experiment.

Misconceptions About Conditional Risk
A related concern is that, if one considers only the conditional risk, R(θ, ξ), then at first sight there may appear to be drawbacks to the use of a ξ sampled from a minimax RDS rather than a minimax deterministic design. However, these apparent drawbacks are based on flawed reasoning, and so they should not discourage the use of a minimax RDS.
The first apparent drawback is as follows: once one has chosen ξ, the attained risk could be as large as max_{θ∈Θ} R(θ, ξ). The latter is usually larger than the maximum conditional risk, max_{θ∈Θ} R(θ, ξ_mM), that applies if a minimax deterministic design is used. This raises the question: has use of a minimax RDS really reduced the risk? More careful consideration shows that it is indeed very likely to have done so, because with high probability the random sampling procedure will have generated a ξ with R(θ, ξ) < max_{θ∈Θ} R(θ, ξ_mM). For example, with the minimax random strategy for linear model prediction in Section 4.1.4, we have that (i) a tight lower bound on the probability is Pr[R(θ, ξ) < max_{θ∈Θ} R(θ, ξ_mM)] ≥ 0.684; and (ii) the probability that R(θ, ξ) > max_{θ∈Θ} R(θ, ξ_mM) is at most 0.118. (For details of these calculations see the supplementary materials.)

A second apparent drawback is that, if one considers only the conditional risk, it may seem that use of a minimax RDS has the disadvantage of replacing a certain experimental outcome with an uncertain one. However, this is simply not the case: the realized loss is uncertain regardless of whether one uses a deterministically or randomly selected design. As shown earlier (e.g., in Figures 1, 2, and 5), a minimax RDS typically gives stronger bounds on the properties of the distribution of possible losses.

Links With Bayesian Approach
Here we have focused on the minimax decision-theoretic framework. From a Bayesian perspective, randomized decision-making is often regarded as unnecessary (e.g., Lindley 1982). However, even in this context randomization has several advantages. First, it simplifies Bayesian causal inference (Rubin 1978). Second, randomization has been shown to be a Bayesian decision-theoretically optimal design strategy in situations where several parties have differing prior information, or when the analyst or final decision-maker is a different person from the one designing the study (Berry and Kadane 1997; Bonassi, Nishimura, and Stern 2009). It would be interesting to investigate optimal Bayesian random design strategies in experiments with multiple stakeholders that are more complex than the simple settings described in the existing literature.

Computer Model Calibration
We briefly note some similarities and differences between the formulation in Section 5 and calibration of a computer simulator of a physical process (e.g., Kennedy and O'Hagan 2001). In the calibration literature, the basic idea is to approximate the expected response of the physical process under conditions x using the simulator output, η(x, θ). However, before predictions can be made, physically realistic values of the parameters θ must be determined. This can be done by combining data from physical experiments on the real process with data from a computer experiment on the simulator. A major challenge is that, due to the high computational expense of simulator runs, the value of η(x, θ) can only be computed for a few combinations of the inputs x and θ, necessitating the construction of a computationally cheaper approximation of η, known as an emulator.

Figure 5. Example: full quadratic model, n = 12, q = 3. Left: approximate risk bound Λ̂(ξ̄_δ, δ) for the heuristic strategy with different values of δ (vertical line: optimal value, δ* = 0.268). Center: bounds on the survivor function of the loss distribution, S(θ, π, u) (black line: random strategy π = π_H(ξ̄_δ*, δ*), Markov bound; gray line: any deterministic design, tight bound from Proposition 5.1). Right: V-efficiency distribution of π_H(ξ̄_δ*, δ*).
Similar to our approach in Section 5, in calibration it is assumed that the true mean of the physical process differs from the simulator output by an explicit model discrepancy function, namely E(y) = μ(x) = η(x, θ_ba) + ψ(x) (cf. (9)). Here θ_ba is a vector of parameter values giving a best approximation to the physical process. Recently developed L_2-calibration approaches impose an orthogonality condition similar to the one in (10) (Tuo and Wu 2015; Plumlee 2017). An important difference is that in calibration the discrepancy function ψ is explicitly estimated, using Gaussian process techniques, whereas in Section 5 predictions are made without estimating ψ. It would be interesting to explore further connections between model-robust design and calibration in future research.

Supplementary Materials
The online supplementary materials contain proofs of the theoretical results and some additional supporting numerical results. R code for the examples is available from the journal website and via the first author's personal website, at https://github.com/timwaite/random-designs.