Tuning Bayesian optimization for materials synthesis: simulating two- and three-dimensional cases

ABSTRACT Compared to the optimization of a 1D synthesis parameter in materials synthesis, the optimization of multi-dimensional synthesis parameters is challenging for researchers. Bayesian optimization (BO) has shown high performance in optimizing high-dimensional synthesis parameters when appropriate hyperparameters are adopted. However, hyperparameter tuning for the kernel and acquisition functions used in BO is yet to be fully discussed by material researchers. In this study, we simulated materials synthesis under 2D and 3D synthesis conditions using artificial model functions with different process windows to investigate the effects of hyperparameters. The assumed parameters were temperature, oxygen partial pressure, and the sputtering power for thin-film deposition. Our findings indicate that estimating the process window and the range of physical property change based on the experience and knowledge of the materials researcher is crucial for tuning the hyperparameters of the kernel function. The simulations for high-dimensional search spaces case also indicate that the number of trials for optimization of synthesis conditions might reach several hundred or more. Therefore, the dimensionality and range of the search space must be limited based on the number of practical experiments, which is crucial for applying Bayesian optimization to materials synthesis. Our results facilitate fully automated and autonomous materials synthesis using BO and robotics for materials exploration in a high-dimensional search space. GraphicABSTRACT


Introduction
High-throughput prediction and synthesis of novel materials are vital for achieving a sustainable society [1][2][3][4][5].Given the possible combinations of elements, there is an almost infinite number of new materials [6,7], and in recent years, materials composed of many elements have been synthesized.In addition, the number of controllable synthesis parameters has increased owing to more complex experimental procedures and more sophisticated equipment.Thus, optimizing high-dimensional synthesis parameters in a vast search space is necessary for materials synthesis.Conventionally, materials researchers have optimized experimental parameters based on their cumulative knowledge and experience in related fields, known as the trial-and-error process [7,8].As conventional methods cannot search the entire highdimensional search space owing to high experimental costs, the optimization of high-dimensional synthesis parameters remains challenging.
Compared with conventional trial-and-error materials exploration, Bayesian optimization (BO) has shown promising results in the prediction and optimization of experimental parameters and has been successfully applied in materials synthesis [9][10][11][12][13][14][15][16].The coupling of BO and robotics enables autonomous materials synthesis, enhancing the efficiency of materials exploration in high-dimensional search spaces [17].Proper tuning of the hyperparameters of the acquisition and kernel functions can optimize the synthesis conditions with a minimum number of trials.To date, the hyperparameters of the kernel function used in Gaussian process regression have been optimized by maximizing the marginal likelihood [18][19][20][21][22].The best hyperparameters of the kernel function can be automatically selected from the candidates by selecting the value that maximizes the marginal likelihood from the initial values of multiple hyperparameters.Even if hyperparameters are selected from multiple candidates using the marginal likelihoods, the appropriate hyperparameter must be included in the candidates.Therefore, it is important to establish a guideline for selecting hyperparameters of the acquisition and kernel functions for materials researchers.We previously discussed the optimization of one-dimensional (1D) experimental parameter case [23]; however, recent materials synthesis often demands the optimization of synthesis parameters in 2D or higher search spaces.
In this study, we explored and optimized the appropriate hyperparameters for BO under 2D and 3D synthesis conditions to minimize the number of trials required to determine the optimum synthesis parameters.We first built model functions with three different process windows to mimic actual 2D and 3D thin-film synthesis.We then compared the number of trials required to determine the optimum conditions with different combinations of hyperparameters.Our findings indicate that estimating the process window based on the experience and knowledge of the materials researcher is imperative in tuning hyperparameters.Increasing the dimensionality of the search space led to an exponential increase in the number of trials required for optimization, even when using the optimum hyperparameters.Thus, materials researchers need to limit the dimensionality and range of the search space based on the number of practical experiments.Finally, we investigated the efficiency of tuned BO and random search.Our results facilitate materials research using BO in higher-dimensional cases.

Generation of model functions
In materials synthesis, the targets are wide-ranging, including inorganic and organic compounds and polymers.Additionally, materials vary in form, from powder (bulk) to nanoparticles, thin films, and composites of different materials.Although there are various types of materials, their properties can be controlled by their synthesis parameters, such as the synthesis temperature, pressure, and composition.In some cases, the material properties may change discontinuously with changes in the synthesis parameters.For example, a change in the synthesis parameters may cause a phase transition, resulting in a sudden change in the physical properties of the material.
By contrast, material properties can continuously change with changing synthesis parameters, and if the range of the synthetic parameters is wide enough, material properties obtain the optimum values for specific values of synthesis parameters such as composition, gas pressure, temperature, and reaction time [17,[24][25][26][27][28][29].For example, although moderately high temperatures can improve the crystallinity of crystalline materials, extreme temperatures cause crystallinity degradation due to thermal decomposition [25].Additionally, in the thin-film synthesis of a transparent conductor, indium -tin-oxide, increasing the oxygen partial pressure increases the carrier mobility owing to improved crystallinity; however, it decreases the carrier density owing to decreased oxygen vacancy donors [26].In other words, resistivity reaches a minimum at a specific oxygen partial pressure because of a balance between carrier mobility and density.In the synthesis of organic hole transport materials for perovskite solar cells, the hole mobility varies with annealing time and the amount of dopant, for which the global maximum peak and a local maximum have been found in the twodimensional searching space [17].In this study, as the simplest case, we assumed that the shape of the global and local optimum peaks in materials synthesis is isotropic [17,27].The true function was approximated by model functions combining isotropic Gaussian functions and Gaussian noise.We note that, in actual materials synthesis, the shape of the global and local optimum peaks could be anisotropic.Future investigations into these cases will be reported in due course.
To simulate materials synthesis, we generated artificial model functions based on the following two key points [23].First, we consider the process window (P w ) in materials synthesis.P w is the range of synthesis conditions in which the desired physical property values can be obtained.When P w is large, it is easy to find the optimal synthesis conditions; however, this is difficult when P w is small.Considering the thin-film synthesis of materials such as LiCoO 2 [25], assuming a P w of approximately 100 °C is reasonable.The other key point is the introduction of local optima in the model functions.When optimizing physical properties such as electrical conductivity, their global maxima and local optima often appear [10,27].Therefore, it is a natural extension to introduce more local optima when generating higher-dimensional model functions.
In this study, we generated artificial 2D and 3D model functions to mimic thin-film deposition by sputtering with local optima and different P w values (Figure 1).The three synthetic parameters to be simulated were temperature, oxygen partial pressure, and sputtering power.Typical values of P w for temperature, oxygen partial pressure, and sputtering power are 100 °C [25], 1.0 × 10 −4 Pa [10], and 10 W [29], respectively.Based on these P w values, we generated model functions with three different P w values.A model function was designed by combining three or four isotropic Gaussian functions and Gaussian noise.The isotropic Gaussian functions correspond to a global maximum, local maximum, and background peaks in material synthesis.As the physical property values obtained from real experiments contain experimental noise, Gaussian noise � ~ N(0, 1) was added to the model functions.The model functions in the 2D and 3D cases are and where x 1 ; x 2 , and x 3 ð Þ represent 2D (3D) synthesis parameters; H is the peak height; μ 1 , μ 2 , (μ 3 ) are the peak positions in the 2D (3D) case; σ Gauss corresponds to the standard deviation of the Gaussian peak; H 0 is a coefficient that controls the magnitude of the noise ranging from 0.1% to 5% of the maximum peak height; and � is the Gaussian noise.f x ð Þ 2D and f x ð Þ 3D are the values of the physical properties.
Figure 1(a) shows a 2D model function generated by the addition of three Gaussian functions with peak heights of 1.2, 0.7, and 0.3, corresponding to a global maximum peak, local peak, and background peak, respectively.The 2D search space comprised 51 grids in each dimension (total grids: 51 2 ), and the temperature and pressure were considered as 2D synthesis parameters.We assumed a temperature range of 200-1200 °C and a pressure range of 1.0-11 × 10 −4 Pa.Here, one grid represents 20 °C or 0.2 × 10 −4 Pa.We assumed the standard deviation (σ Gauss ) of the global maximum peak to be P w , corresponding to a range beyond 90% of the maximum value.Three types of 2D model functions with different P w values were generated: P w = (P w for temperature, P w for pressure) = ( 60

Bayesian optimization
Bayesian optimization predicts the relationships between explanatory variables (synthetic conditions) and objective variables (physical properties) from acquired data, and sequentially searches for the global optimum based on the prediction curve and confidence interval.In this study, a prediction model for a property value was developed using Gaussian process regression [30].
For the Gaussian process regression, we used the radial-basis function (RBF) kernel, which is a typical kernel [31].The RBF kernel is given by where the variance, v, and lengthscale, l, are the hyperparameters to be optimized.Small and large values of v lead to smaller and larger scale changes, respectively, in the objective variables in the predicted curve.By contrast, small and large values of l lead to larger and smaller variations in objective variables, respectively, for a given interval of an explanatory variable.v and l were initially set to 0.1 − 10 and 0.5-5 grids, respectively, and were optimized using the truncated Newton method, which is a gradient method for maximizing the marginal likelihood in each iteration.As the hyperparameters obtained by the gradient method are generally local solutions, appropriate initial l and v values are important for obtaining a suitable prediction model, especially for applying BO for autonomous materials synthesis.We applied two types of acquisition functions: upper confidence bound (UCB) and expected improvement (EI) [32][33][34].The UCB is defined as where E is the predicted mean, σ is the standard deviation, and κ is the hyperparameter of the UCB.By contrast, EI is defined as where f max x ð Þ is the largest target value that has been explored, ξ is a hyperparameter of EI, is the cumulative distribution function of a normal distribution, and ϕ Z ð Þ is the probability density function of a normal distribution.The acquisition function determines the next searching point in BO, which depends on the type of acquisition function and its hyperparameters.Adjusting κ and ξ changes the strategy of finding the global maximum; the balance between 'exploitation' (searching for areas with high expected values) and 'exploration' (searching for areas with large standard deviations of the probability distribution and few data points) can be controlled through κ and ξ.The effects of κ and ξ were also investigated.
The number of trials (corresponding to the number of syntheses in actual experiments) for the simulation of each model function was 100 and 300-600 for the 2D and 3D cases, respectively, which included five randomly searched initial points.When 90% or more (approximate region of P w ) of the global maximum was found, we perceived that the maximum value search was successful.The N 90% index was introduced to compare the number of trials required for optimization quantitatively.This index is the number of trials required to determine the maximum value in 90 out of 100 model functions.In this study, for each P w , 500 model functions with different peak positions were generated for both the 2D and 3D cases to calculate the mean and standard deviation of N 90% .The peaks were separated by at least twice the P w value of the local maximum peak to prevent the overlapping of the global and local maximum peaks.Python and the GPy [35] package were used for the simulations.To accelerate the simulations by performing parallel calculations, the supercomputer TSUBAME3.0[CPU: Intel Xeon E5-2680 V4 Processor (Broadwell-EP, 7 cores, 2.4 GHz), RAM: 60 GB] at the Tokyo Institute of Technology and the Fujitsu PRIMERGY C×400M1/ CX2550M5 (Oakbridge-CX) [CPU: Intel Xeon Platinum 8280, RAM: 192 GB] at the Information Technology Center at the University of Tokyo were used.We note that the computational cost of the individual iterations of BO is low even when using an inexpensive laptop computer, and a supercomputer is not necessary for the application of BO to real materials synthesis.The global maximum was successfully observed after 30 trials (Figure 2(i)).The predicted curve is similar to the model function (Figure 2(j)), and more areas around the global and local maxima are exploited (Figure 2(k)).After 30 trials, the acquisition function has a lower value than those after 10 and 20 trials; therefore, the possibility of updating the maximum value is low (Figure 2(l)).

Two-dimensional case
We compared the number of trials required to find the global maximum with different P w values (Figure 3).When P w is small, the median of the distribution shifts to the right, indicating that BO requires more trials.The P w dependence of N 90% (ξ = 0.01) is described in Table 1.We compared the N 90% values using UCB and EI with different hyperparameters.Figures S3 and S4 show the search process and histogram obtained using UCB, respectively.Table 1 summarizes the N 90% values for the 2D model functions at varied κ (5-20) and ξ (0-0.01).Regarding the κ dependence of N 90% , at P w = (100 °C, 1 × 10 −4 Pa) and (140 °C, 1.4 × 10 −4 Pa), N 90% increases with increasing κ, and κ = 5 has the lowest N 90% .However, at P w = (60 °C, 0.6 × 10 −4 Pa), κ = 5 exhibits the worst performance (N 90% = 101), indicating that 'exploration' and 'exploitation' are weak and strong, respectively.Thus, when κ = 10, the optimization performance is stable for all three P w values.As in the case of UCB, the balance between exploration and exploitation is important in EI.Increasing ξ = 0 to ξ = 0.01 and 0.1 decreases N 90% , especially at P w = (60 °C, 0.6 × 10 −4 Pa).This result indicates that a moderately explorative acquisition function is advantageous when P w is small.Increasing ξ = 0.1 to ξ = 0.2 and 0.4 results in a stepwise increase in N 90% , for all three P w values.Interestingly, EI (ξ = 0.01 and 0.1) has a lower N 90% value than UCB (κ = 10) for all three P w values; therefore, EI can optimize 2D synthesis parameters in fewer trials compared with UCB.
The effects of l and v on N 90% were also investigated (Table 2).The acquisition function was EI (ξ = 0.01).For all three cases of P w , N 90% increased with increasing v, especially when v = 10.When v = 0.1 and 1, l had a smaller effect on N 90% .However, the effect was strong when v = 10, which suggests that the gradient method optimization did not work well because the initial values of both l and v deviated significantly from the optimal values.For the worst hyperparameters (l = 0.5, v = 10), the global maximum was not found in 100 trials, which indicates the importance of hyperparameter tuning.The best hyperparameter combinations were l = 1 and v = 0.1 and 1, resulting in the smallest N 90% at each P w .These results provide a guideline for selecting the initial hyperparameter values for Gaussian process regression in the field of material research, as discussed later.Additionally, we investigated N 90% when using multiple initial values.Gaussian process regression was performed on the 12 combinations of v and l used in Table 2, and the combination with the maximum marginal likelihood was selected as the hyperparameter used in each iteration.The obtained N 90% was almost comparable to N 90% using the best hyperparameter (l = 1 and v = 0.1-1, Table S1).Furthermore, the effects of the noise magnitude (H 0 ) were investigated using 2D model functions with P w = (100 °C, 1.0 × 10 −4 Pa) (Table 3).Varying the value of H 0 allows for systematic control of the amount of noise in the model function (Figure S5).When the noise magnitude is small (H 0 is set to 0.1 or 1% of the maximum peak height), κ = 5 and ξ = 0.01, 0.1 provides low N 90% .However, when the noise magnitude is large (H 0 is set to 5% of the maximum peak height), BO using UCB (κ = 5) and EI (ξ = 0.01, 0.1) failed the optimization within 100 trials, and the lowest N 90% is obtained when κ = 10 and ξ = 0.8.These results indicate that more explorative acquisition functions are advantageous for large noise cases.This is probably because BO easily falls into local solutions when the noise magnitude is large.In addition, compared to EI, UCB has a smaller effect of noise magnitude on N 90% , indicating that optimization is more robust against noise.
By contrast, the optimization process using EI detects one local maximum at approximately 400 °C, 10 × 10 −4 Pa, 180 W after the 20th trial (Figure 4(d)).In the 50th trial, the global maximum is successfully obtained (Figure 4(e)).A trace plot of search points for the model function is shown in Figure 4(g).EI takes only 29 trials to find the global maximum, whereas UCB requires 72 trials.These results indicate that EI provides a well-balanced search between exploitation and exploration.Qualitatively, EI can optimize 3D synthesis parameters in fewer trials compared with UCB for all three P w values (Figure S8).
Next, we optimized the hyperparameters of the acquisition functions in the 3D case.Table 4 compares the N 90% values between UCB and EI for different κ and ξ values.Similar to the 2D case, when using UCB, κ = 10 exhibits stable optimization performance for all three P w values.In contrast to the 2D case, when using EI, ξ strongly affects N 90% in the 3D case.At ξ = 0, the N 90% value of BO for all three P w values is beyond the total number of trials (300 and 600) in the simulation.Remarkably, by increasing ξ from 0 to 0.01, Table 2. Dependence of N 90% on the initial values of the lengthscale and variance for 2D model functions with process window P w = (60 °C, 0.6 × 10 −4 Pa), (100 °C, 1 × 10 −4 Pa), and (140 °C, 1.4 × 10 −4 Pa).The magnitude of noise (H 0 ) was set to 0.1% of the maximum peak height.Expected improvement (ξ = 0.01) was chosen as the acquisition function.The bold and underlined values denote the N 90% values at the optimum hyperparameters.

Discussion
The best hyperparameters of the RBF kernel function in the 2D and 3D cases were l = 1 and v = 0.1-1 (Tables 2 and S2).These results indicate the guideline for material researchers to tune the hyperparameters.Concerning lengthscale, l = 1 grid corresponds to a value smaller than P w for the model function.It is paramount for the materials researchers to estimate the P w from knowledge and experience and to set the value of l based on the estimated P w .When setting l, the materials researchers need to pay attention to the grid size.This is because the physical or chemical scale corresponding to l = 1 changes by the setting of the grid size.Concerning variance, the value of N 90% is much larger for v = 10 than for v = 0.1-1, indicating that v should not be too larger than the range of change in physical properties (objective variable).Therefore, materials researchers need to estimate the range of physical property change and set the value of v based on the estimated range.Additionally, preprocessing the measurement data, such as normalization and logarithmic transformation of the physical property values, is imperative when the physical property is expected to change drastically.Our guideline helps narrow down the hyperparameter values to several candidates, from which the best hyperparameter value will be automatically selected using the marginal likelihood, as shown in Table S1.Next, we compared materials explorations using BO and a random search.Here, optimizing the 3D synthesis parameters for model functions with P w = (60 °C, 0.6 × 10 −4 Pa, 6 W) is discussed.BO under the best condition (ξ = 0.01, l = 1, and v = 1) resulted in N 90% = 361.8± 30.5.
By contrast, N 90% in the random search case was 22,876 ± 3,466.Thus, hyperparameter-tuned BO can reduce the number of trials by two orders of magnitude compared with the random search.Additionally, a grid search that exhaustively searches for all synthesis conditions is generally not applicable to materials exploration in high-dimensional search spaces because of high experimental costs.Even if the number of synthesis conditions to search is greatly reduced by increasing the grid spacing, as in the case of the Design of Experiments method, finding the global optimum for narrow P w cases becomes more difficult, especially when P w is narrower than the grid spacing.Thus, BO provides a powerful method for exploring materials in high-dimensional search spaces.
Finally, we discuss the expected experimental time of the autonomous materials synthesis with 3D synthesis parameters.Here, we assume a thin-film synthesis with P w = (60 °C, 0.6 × 10 −4 Pa, 6 W), which results in N 90% = 361.8± 30.5.For example, considering an autonomous experimental system that can fabricate one thin-film sample every 2 h [10], we need approximately one month to automatically optimize the synthesis conditions of new materials.For easier cases, i.e.P w = (140 °C, 1.4 × 10 −4 Pa, 14 W), optimization requires approximately four days (N 90% = 49.8 ± 2.0).For the 2D case, the approximate time for optimizations varies with P w from two to five days.This discussion provides a guideline to choose the number of synthesis parameters in BO for materials synthesis.According to our 1D, 2D, and 3D simulation results [23], N 90% exponentially increases with increasing dimensions; This indicates that the number of experiments required to optimize the synthesis conditions exponentially increases with the increase in the number of dimensions.In real experiments, if the number of synthesis parameters is increased without careful consideration, the synthesis conditions will not be optimized in a feasible number of trials owing to the large time and financial costs.In such cases, the synthesis parameters should be reduced from four or more dimensions to two or three to find the best synthesis conditions.Thus, it is still important for materials researchers to select promising synthesis parameters carefully based on their knowledge and experience.However, an autonomous experiment that combines BO and robotics is essential for discovering novel materials in high-dimensional search spaces by efficiently conducting hundreds or more experiments.

Conclusion
We optimized the hyperparameters of the kernel and acquisition functions in BO for thin-film synthesis under 2D and 3D synthesis conditions.The values of P w are critical in materials synthesis, and thus, the effects of P w were considered.For both the 2D and 3D cases, EI performed better than UCB for three P w settings; the number of trials required to find the global maximum using EI was smaller than that for UCB.The best combination of hyperparameters in the 2D case was EI (ξ = 0.01), l = 1, and v = 1, which yielded N 90% = 21.8 ± 0.8 at P w = (100 °C, 1.0 × 10 −4 Pa).This result indicates that we have 90% confidence in obtaining the material with the best properties after 30 synthesis trials in the 2D case.For the 3D case, the optimum combination of hyperparameters was again ξ = 0.01, l = 1, and v = 1, and N 90% = 74.8± 2.5 with P w = (100 °C, 1.0 × 10 −4 Pa, 10 W).When selecting initial values for l and v, it is important to estimate the process window and the range of change in physical properties based on the experience and knowledge of materials researchers.Furthermore, the number of trials required for optimization in high-dimensional search spaces can reach more than several hundred.Therefore, the dimensionality of the search space and the range of synthetic parameters need to be appropriately limited by material researchers.These results contribute to the application of BO for materials exploration with multi-dimensional synthesis conditions and pave the way for fully automated and autonomous materials synthesis using a combination of BO and robots.

Figure 1 .
Figure 1(a) shows a 2D model function generated by the addition of three Gaussian functions with peak heights of 1.2, 0.7, and 0.3, corresponding to a global maximum peak, local peak, and background peak, respectively.The 2D search space comprised 51 grids in each dimension (total grids: 51 2 ), and the temperature and pressure were considered as 2D synthesis parameters.We assumed a temperature range of 200-1200 °C and a pressure range of 1.0-11 × 10 −4 Pa.Here, one grid represents 20 °C or 0.2 × 10 −4 Pa.We assumed the standard deviation (σ Gauss ) of the global maximum peak to be P w , corresponding to a range beyond 90% of the maximum value.Three types of 2D model functions with different P w values were generated: P w = (P w for temperature, P w for pressure) = (60 °C, 0.6 × 10 −4 Pa), (100 °C, 1.0 × 10 −4 Pa), and (140 °C, 1.4 × 10 −4 Pa) corresponding to σ Gauss = 3, 5, and 7 grids, respectively.We compared the effect of P w on the number of trials in our algorithm necessary to determine the global maximum.The standard deviations of the local maximum peak (σ Gauss = 6) and background peak (σ Gauss = 20) were set as constant values of (120 °C, 1.2 × 10 −4 Pa) and (400 °C, 4.0 × 10 −4 Pa), respectively.By contrast, Figure1(b) shows a 3D model function, which is the addition of four isotropic Gaussian functions with peak heights of 1.2, 0.7, 0.6, and 0.3.We note that the 2D model functions have one local solution, whereas the 3D model functions have two local solutions.This increase in the number of local solutions is a natural extension of the model function based on dimensional expansion.The 3D search space comprised 51 grids in each dimension (total grids: 51 3 ), and temperature, pressure, and sputtering power were considered 3D synthesis parameters.As an extension to the 2D case, we assumed that thin-film synthesis occurred within a temperature range of 200-1200 °C, pressure range of 1.0-11 × 10 −4 Pa, and power range of 100-200 W. Thus, one grid in the 3D

Figure 2
Figure 2  shows an example of the synthesis optimization process of the 2D model function with P w = (60 °C, 0.6 × 10 −4 Pa).The magnitude of noise (H 0 ) is set to 0.1% of the maximum peak height, which corresponds to experiments with very low noise.The acquisition function is EI with ξ = 0.01.We explain the Figure 2(b,c) show the predicted curve and confidence interval obtained after 10 trials, respectively.The confidence interval around the explored points is smaller than that in the unexplored area (Figure 2(c)).The largest acquisition value at approximately 300 °C and 8 × 10 −4 Pa is taken as the next search point (Figure 2(d)).After 20 trials, BO not only searched near the local maximum but also explored broader unexplored areas, indicating the successful tuning of the acquisition function (Figure 2(e)).The predicted curve approached the true model function (Figure 2(f)), and the region with large confidence intervals decreased (Figure 2(g)).

Figure 2 .
Figure 2. Example of the maximum value search process for a 2D model function with process window P w = (60 °C, 0.6 × 10 −4 Pa) when expected improvement (ξ = 0.01) was used as the acquisition function.The magnitude of noise (H 0 ) was set to 0.1% of the maximum peak height.Lengthscale = 1, variance = 1, and five initial points were randomly selected.(a, e, i) Model function and samples, (b, f, j) predicted curves, (c, g, k) confidence intervals, and (d, h, l) acquisition functions using (a-d) 10, (e-h) 20, and (i)-(l) 30 trials.Red and white circles correspond to the latest and earlier samples, respectively.Green points represent the next search points.The color maps of the model functions represent physical property values.The color maps of the predicted curve, confidence interval, and acquisition functions indicate the values of the predicted mean, standard deviation, and acquisition functions in Equation (5), respectively.

Figure 3 .
Figure 3. Numbers of trials required to find the global maximum of the model function with P w of (a) (140 °C, 1.4 × 10 −4 Pa), (b) (100 °C, 1 × 10 −4 Pa), and (c) (60 °C, 0.6 × 10 −4 Pa) when expected improvement (ξ = 0.01) was used as the acquisition function.Lengthscale = 1, and variance = 1.Each histogram was obtained from 500 model functions with the same value of P w but different peak positions.The magnitude of noise (H 0 ) was set to 0.1% of the maximum peak height.

26 N
90% drastically decreases.ξ = 0.01 results in the lowest N 90% for all three P w values, which is smaller than the best case in UCB (κ = 10).As in the 2D case, l and v are optimized for EI with ξ = 0.01 (TableS2), and the best hyperparameter combination is l = 1 and v = 0.1-1.

Figure 4 .
Figure 4. Examples of the maximum value search process for a 3D model function with process window P w = (100 °C, 1 × 10 −4 Pa, 10 W) after (a, d) 20, (b, e) 50, and (c, f) 100 trials, where the (a)-(c) upper confidence bound (UCB, κ = 10) and (d)-(f) expected improvement (ξ = 0.01) were used as the acquisition functions.(g) Trace plot of the maximum physical property explored by using expected improvement (EI) and UCB.The magnitude of noise (H 0 ) was set to 0.1% of the maximum peak height.Lengthscale = 1, variance = 1.The sphere size and color represent the physical property value, and the red points represent the values above 90% of the global maximum.The model function is the same as that in Figure 1(b).

Table 3 .
Magnitude of noise (H 0 ) dependence of N