A novel soot concentration field estimator applied to sooting ethylene/air laminar flames

Soot formation modeling, when incorporated into computational fluid dynamics simulations of industrial devices, can be numerically prohibitive. Nonetheless, there remains a significant push to predict soot formation, so as to aid in environmentally sustainable design. The present work features a redesign of an inexpensive soot estimator that was previously developed and applied to laminar flames with great success. The redesigned estimator is much more accurate and efficient than previous versions. The soot estimator consists of a library generated from validated sooting flame models, in which the Lagrangian histories of soot-containing fluid parcels are stored. The library is used in post-processing to estimate soot concentrations. For the first time, the estimator framework is used to predict the entire soot field. Also, important parameters of the estimator technique are analyzed. This work is conducted for nine different sooting ethylene/air coflow diffusion flames. The framework successfully predicts the entire soot field. When the data from many flames were combined into one library based on mixture fraction, temperature, and histories, it could predict all flames with high accuracy. Finally, two scenarios were considered to assess the framework with an independent set of data, and the predictor presented very good accuracy in capturing soot formation.


Introduction
Knowledge of soot formation and oxidation has advanced over the past decades, and so has the complexity of leading-edge models. In addition, new emission regulations have been pushing industry to account for particulate matter (soot) reduction in their applications (International Civil Aviation Organization, 2006; International Energy Agency, 2017; Williams & Minjares, 2016). Therefore, searching for and developing techniques to reduce soot formation and emissions has become an important concern for researchers and industry. However, coupling detailed soot modeling with the already complex fluid dynamic and combustion phenomena that industrial applications require can become numerically prohibitive or unattractive.
In order to reduce the computational burden of ascertaining combustion and soot characteristics, the complexity of the chemistry and therefore the time involved to predict the state of the system can be reduced. Such reduction may be obtained by employing steady state and partial equilibrium assumptions. Also, automatic reduction techniques are in common use, such as the Intrinsic Low Dimensional Manifold (ILDM) (Maas & Pope, 1992), the Steady Laminar Flamelet Model (SLFM) (Peters, 1984) and the Flamelet Generated Manifold (FGM) (van Oijen & de Goey, 2000). Even so, using these reduced chemistry techniques does not guarantee the exact prediction of Polycyclic Aromatic Hydrocarbons (PAHs), which are gas species that are essential to modeling nucleation and condensation processes of soot particles, making the utilization of expensive detailed soot models impractical. Another approach is to reduce the complexity of the soot model, using semi-empirical models (Fairweather, Jones, & Lindstedt, 1992; Leung, Lindstedt, & Jones, 1991), but then the predictive capability of the model for a range of conditions as well as the accuracy of soot predictions may be affected.
CONTACT S.B. Dworkin seth.dworkin@ryerson.ca
To address this problem, a novel solution has recently been proposed by Alexander, Bozorgzadeh, Khosousi, and Dworkin (2018), which leverages the knowledge from accurate soot modeling with detailed chemistry to estimate soot concentrations. This post-processing tool uses a library, built from Lagrangian histories of key parameters that correlate with soot formation from inception up to oxidation, to predict soot concentrations based on existing numerical Computational Fluid Dynamics (CFD) data. A series of numerical solutions of well validated sooting laminar diffusion flames were used as a training and validation dataset. This post-processing tool was demonstrated to be computationally inexpensive and was applied to laminar flames with some success.
It is important to state that this technique is not a model for soot formation and oxidation, but rather a framework which uses a library to estimate soot properties using correlations and interpolation (Alexander et al., 2018).
In Alexander et al. (2018), the soot estimator was assessed by analyzing the soot volume fraction f v along two representative pathlines in coflow laminar diffusion flames, one at the centerline and the other passing through the point of maximum soot volume fraction. This left open the question of whether the technique has broader applicability, and in particular whether it can be applied to estimate the entire field of f v. Therefore, in the present study the estimator framework is extended to predict the entire field of f v in nine atmospheric laminar diffusion flames, and the accuracy of the prediction is analyzed and assessed. A new procedure is added to reconstruct the soot formation field in Eulerian coordinates from the pathlines, which are in Lagrangian coordinates. Also, important parameters of the estimator technique are analyzed, such as the number of pathlines from each validated flame that is needed to build the library, the resolution of the library, and the choice of variable histories to which f v should be correlated in order to achieve a reasonable prediction for the f v field. A rigorous validation is done at the end, in which the estimator technique is tested against data that did not go into the library.

Numerical methods
In this section the soot estimator is explained briefly, followed by the test cases considered in this work. Then the method of post-processing the history of fluid parcels from a flame is described. An explanation follows on how this information is used to estimate soot volume fraction. Lastly, the numerical procedures are detailed.

Background:
The soot estimator is developed to aid combustion device designers in reducing soot emissions (Alexander et al., 2018). From experimental and validated numerical data available in the literature, a library is created to be used as a post-processing tool to predict soot concentrations based on existing numerical Computational Fluid Dynamics (CFD) data. This library is built from Lagrangian histories of key parameters that correlate with soot formation from inception up to oxidation.
Theoretically, the soot formation and destruction rates can be written as a function of local characteristics at any given time t as
$$\frac{\mathrm{d} f_v}{\mathrm{d} t} = f\left(T, Y_i, P, f_v, A_s\right), \tag{1}$$
where T is the temperature, Y i is the mass fraction of any given species i, P is the pressure, f v is the soot volume fraction, and A s is the average soot surface area. Nevertheless, due to the intrinsic non-linearity of the problem and the fact that the soot time scale is longer than the flow and gas-phase chemistry scales, the evolution of soot cannot be tabulated into a library as a function of just local field characteristics. However, if the evolution is seen from a history point of view, theoretically, soot concentration can be determined entirely by the history of the fluid parcel in which it is contained, from before inception to oxidation or smoke emission. This idea is supported by evidence shown in Veshkini, Dworkin, and Thomson (2014) and Kholghy, Veshkini, and Thomson (2016), in which it has been shown that soot evolution has a stronger dependency on the temperature history of soot particles than on just particle size and local flame temperature. In Veshkini et al. (2014) the concept of temperature aging was used to develop a new rate for soot growth processes; and in Kholghy et al. (2016) important processes for soot maturity, such as particle dehydrogenation, PAH molecular weight growth and shell formation, were regulated by the temperature-time history of particles. So, within the Soot Estimator technique, it is assumed that the local instantaneous soot volume fraction f v can be correlated to the history of thermo-chemical properties of a combustion system, or key parameters, to which a fluid parcel has been exposed as
$$f_v = f\left(T_h, Y_{i,h}, P_h, f_{v,h}, A_{s,h}\right), \tag{2}$$
where T h is the temperature history, Y i,h is the history of species i, P h is the time-integrated history of pressure, f v,h is the time-integrated history of soot volume fraction, and A s,h is the average soot surface area history.
Therefore, if one tracks the history of all key parameters it would be possible to estimate the local f v without solving soot model equations. Equation (2) presents the fundamental idea from which the Soot Estimator technique is created.

Overview:
The soot estimator consists of a library built from a series of numerical solutions of well validated sooting flames. Detailed sectional soot modeling is employed, as well as a detailed kinetic mechanism to predict PAHs, which are the gas species responsible for the nucleation of the soot particles (Eaves et al., 2012, 2016; Saffaripour, Kholghy, Dworkin, & Thomson, 2013; Veshkini et al., 2014; Zhang, Thomson, Guo, Liu, & Smallwood, 2009). From these numerical solutions the Lagrangian histories of soot-containing fluid parcels are algorithmically gathered and stored in a library. For populating the library, different validated numerical sooting flames are used. The library is built upon the integration in time of variables that are important to soot formation, such as temperature, mixture fraction MF, C 2 H 2 , C 6 H 6 , OH and/or O 2 . Once the library is created it can be used as a post-processor in conjunction with existing numerical data, so no further modeling is required if only resulting soot quantities are desired. This technique can not only be applied to estimate f v as Equation (2) suggests, but it can also be applied to correlate other soot characteristic parameters, such as particle number density, primary particle diameter, etc. For the purpose of the current study only f v predictions are analyzed. More information on the soot estimator can be found in Alexander et al. (2018).

Creation of library and usage:
The creation and usage of the Soot Estimator technique can be broken down into various steps, and for completeness the main steps are presented here in a summarized manner.
For creating the library:
(1) Gathering the flames (see Section 2.2 for more details);
(2) Extracting pathlines (see Section 2.3 for more details);
(3) Computing histories from pathlines (see Section 2.4 for more details);
(4) Averaging and storing f v associated with histories into a database (see Sections 2.4 and 2.7 for more details).
For using the library, similar steps to library creation are used:
(1) Gathering the target flame(s);
(2) Extracting pathlines;
(3) Computing histories from pathlines;
(4) Interpolating f v values from histories using the library database (see Section 2.4 for more details);
(5) Using the f v field estimation framework to compute the f v field (see Section 2.5 for more details).
Most of these steps are described in Alexander et al. (2018). The most important steps for the present work are repeated and expanded here.
Advantages, disadvantages and expected errors of the technique:
The biggest advantage of this framework is that the library can be expanded: as more flames are simulated and validated, more data can be added.
A drawback is that the accuracy of the predictions is related to how close the target flame characteristics are to the flames stored in the library.
Species conservation is explicitly accounted for in the simulations that generate the flame CFD data, which are used to generate the library. Therefore, the library contains entries for soot concentration that are borne out of data from simulations that included conservation. When the library is then used to estimate soot concentrations in a given system, the quality and accuracy of the estimation will reflect the quality of the data used to estimate soot for the system in question. For example, if experimental data are used, then conservation will implicitly be accounted for. If numerical data are used and soot concentrations are estimated to be low (∼O(1 ppm) or lower), then having neglected conservation will have minimal impact (Zimmer, Pereira, van Oijen, & de Goey, 2017). If the soot concentrations are estimated to be high, then the error associated with neglecting conservation will be greater. If the numerical results are tuned to experimental temperatures, which is common practice in the gas turbine industry, the error associated with neglecting conservation will be minimal (Zimmer et al., 2017).

Data source: flames
The numerical solutions for nine atmospheric pressure laminar diffusion flames simulated in Veshkini et al. (2014) were used in the present work to generate the libraries and as the target flames. In Veshkini et al. (2014) detailed models were used. For the gas-phase kinetic modeling, the fully coupled elliptical conservation equations for mass, momentum, energy, and species mass fraction were solved. The equations were solved in the two-dimensional (z and r) cylindrical coordinate system. The DLR detailed kinetic mechanism (Dworkin, Zhang, Thomson, Slavinskaya, & Riedel, 2011; Slavinskaya & Frank, 2009; Slavinskaya, Riedel, Dworkin, & Thomson, 2012), consisting of 93 species and 719 reactions, was used to describe the oxidation of the fuel and the formation of PAHs. Soot was modeled using a detailed fully coupled sectional aerosol dynamics model (Park & Rogak, 2004; Park, Rogak, Bushe, Wen, & Thomson, 2005), and the soot particle mass range was divided into 35 sections. For each soot section two equations were solved, one for soot aggregate number densities, and another for primary particle number densities. The nucleation process is based on the collision of two pyrene molecules (C 16 H 10 ) (Appel, Bockhorn, & Frenklach, 2000; Frenklach & Wang, 1994). Also, the processes of surface growth, based on the HACA mechanism (Appel et al., 2000), PAH condensation, coagulation, soot particle coalescence, and oxidation were taken into consideration. All the numerical simulations were validated against experimental data in Veshkini et al. (2014) to varying degrees. It is important to note that the flame data used to populate the library need not be frozen. As flame simulation techniques continue to advance, newly computed data can be used to either regenerate or expand the library.
These nine flames represent the classical works from Santoro, Semerjian, and Dobbins (1983), Smooke, Long, Connelly, Colket, and Hall (2005), and Shaddix and Smyth (1996). They range from pure to diluted ethylene-air laminar coflow flames and from low soot concentration (f v,max = 0.1 ppm) to moderate soot concentration (f v,max = 16.3 ppm). The computed values for peak soot volume fraction are within the experimental range. The main characteristics of the set of numerical simulations are shown in Table 1.
Table 1 notes: experimental values from (a) Santoro et al. (1983), (b) Smooke et al. (2005), and (c) Shaddix and Smyth (1996); computed values from (d) Veshkini et al. (2014).
These computed flames were used to compute the pathlines and their histories, which then were used to build the libraries and estimate the local f v using the post-processor.

Pathlines calculation
The history of each variable can be expressed by the integral of each local variable with respect to time along a pathline traversed by a fluid parcel that may contain soot. Each pathline p is calculated by following the position X p of a soot-containing fluid parcel, which moves with the fluid velocity u with respect to time t, by solving the ordinary differential equation
$$\frac{\mathrm{d}\mathbf{X}_p}{\mathrm{d}t} = \mathbf{u}\left(\mathbf{X}_p, t\right), \qquad \mathbf{X}_p(0) = \mathbf{X}_{p,0}, \qquad p = 1, \ldots, N_p, \tag{3}$$
where each pathline p has an initial position X p,0 , and Np is the total number of pathlines.

Histories calculation, library generation and library usage
In theory, the number of key parameters used in Equation (2) could be as high as the total number of variables traceable in the detailed solution. Nevertheless, storing such an amount of information would be infeasible. Currently, the choice of the key parameters is based on the background knowledge of soot formation, which means that histories that can track conditions for soot nucleation, growth and oxidation are preferred, and it is also partially based on a trial-and-error process (Alexander et al., 2018). From the previous work (Alexander et al., 2018), at least three key parameters are necessary to estimate f v , and the following are the starting point of the current study: the histories of temperature, mixture fraction MF (calculated with Bilger's expression (Bilger, 1989)) and O 2 molar fraction X O 2 . They are calculated for each pathline as
$$T_h = \int_0^t T \,\mathrm{d}t', \qquad MF_h = \int_0^t MF \,\mathrm{d}t', \qquad X_{O_2,h} = \int_0^t X_{O_2} \,\mathrm{d}t',$$
where T h is the temperature history, MF h is the mixture fraction history, and X O 2 ,h is the O 2 molar fraction history. Once the histories are calculated for all Np pathlines, the histories and the associated f v values are stored in the soot estimator library, so that
$$f_v = f\left(T_h, MF_h, X_{O_2,h}\right).$$
The range of each history is divided into a specific number of bins, and the final f v value associated with each bin is the average over the entry points that fall in that bin. In the last step of the soot estimator, the dependent variables are post-processed from an existing numerical solution, using the same steps described above, and then the local f v is estimated for each position X p , for each pathline, using the library and an interpolation technique.
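The history integration and the bin-averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the trapezoidal rule, the uniform bin ranges and the plain bin lookup (used here in place of the interpolation mentioned in the text) are all assumptions made for the example.

```python
from collections import defaultdict


def cumulative_history(times, values):
    """Running time integral of `values` along one pathline (trapezoidal rule, assumed)."""
    history = [0.0]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        history.append(history[-1] + 0.5 * (values[k] + values[k - 1]) * dt)
    return history


def bin_index(value, lo, hi, n_bins):
    """Index of the uniform bin on [lo, hi] that contains `value`, clamped to the range."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def bin_key(hist, ranges, n_bins):
    """Tuple of bin indices, one per history dimension."""
    return tuple(bin_index(h, lo, hi, n_bins) for h, (lo, hi) in zip(hist, ranges))


def build_library(entries, ranges, n_bins):
    """Average f_v over all entries that fall into the same bin.

    entries: iterable of ((h1, h2, h3), f_v); ranges: one (lo, hi) pair per history.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hist, fv in entries:
        key = bin_key(hist, ranges, n_bins)
        sums[key] += fv
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def lookup(library, hist, ranges, n_bins, default=0.0):
    """Estimated f_v for a history triple; a plain bin lookup stands in for interpolation."""
    return library.get(bin_key(hist, ranges, n_bins), default)
```

In this sketch a target pathline would first be converted to history triples with `cumulative_history`, and each triple would then be passed to `lookup`. Bins that received no training entries return the `default` value, which is one possible convention among several.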

f v field estimation framework
The governing equations for the simulations done in Veshkini et al. (2014) were solved in Eulerian coordinates; therefore, the field of any variable φ(r, z) is presented in r and z. From this field the pathlines are calculated through Equation (3), so that each pathline is in Lagrangian coordinates and presented as X p (t). In this calculation process each pathline p, at each time t, is associated with a specific location in r and z as X p (t) = f (r, z). These pathline points are scattered in the r and z coordinates, which means that they do not follow the uniform grid. Scattered-data interpolation is then used to map these values back onto the original uniform grid (Eulerian coordinates). In the present study the nearest neighbor interpolation method with Delaunay triangulation of the data is used to retrieve the estimated f v field after the estimator library has been used for every pathline. This procedure is schematically shown in Figure 1. In Step 1, the pathlines are calculated from the Eulerian coordinates. Three pathlines are shown in color and each time step for each pathline (X p (t) = f (r, z)) is represented as a black dot. In Step 2, the soot estimator is applied to all points of these pathlines to get the estimated f v (X p (t)), represented as red dots. In Step 3, a scattered data interpolation technique is used to get the estimated f v field in r and z, represented as a red uniform grid.
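Step 3 of this procedure, mapping scattered pathline values back onto the grid, can be illustrated with a brute-force nearest-neighbor search. This is only a sketch under stated assumptions: the function name is hypothetical, and the exhaustive search replaces the Delaunay-triangulation-based lookup used in the study, which gives the same nearest-point result more efficiently on large data sets.

```python
def nearest_neighbor_grid(points, values, r_grid, z_grid):
    """Assign to every grid node the value of its closest scattered point.

    points: list of (r, z) pathline positions; values: estimated f_v at those positions.
    Returns field[j][i] for z_grid[j], r_grid[i].
    """
    field = []
    for z in z_grid:
        row = []
        for r in r_grid:
            # Squared Euclidean distance is enough to rank the candidates.
            best = min(
                range(len(points)),
                key=lambda k: (points[k][0] - r) ** 2 + (points[k][1] - z) ** 2,
            )
            row.append(values[best])
        field.append(row)
    return field
```

The cost of this version grows with the product of the number of grid nodes and the number of pathline points, so it is suitable only for small illustrations.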

Important parameters
In this subsection important parameters to the estimator technique are discussed and analyzed, such as the number of pathlines from each flame to build the library, the variable histories to which f v should correlate in order to achieve a reasonable f v solution, and the resolution of the library.

Pathlines: quantity and location
The number of calculated pathlines Np is determined by the level of detail required to capture the desired field (Lebedeva & Osiptsov, 2016) (the f v field in the present case). To this end, every pathline starts at the inlet boundary and ends at the outlet boundary. At the inlet, a pathline is started from every computational node, and extra pathlines are added to cover the sooting area. The sooting area is defined as the region from the flame centerline to the extent of the wings, along the radial coordinate and the pathlines. In order to ensure that the peak f v is mapped out, an additional pathline is added that goes through the point of maximum f v . To study the effect of the number of pathlines needed to create the library, three different cases are tested. These cases leverage the already refined mesh of the computed domain to cover the sooting area. In the first case the radial distance between pathline starting points is kept at the node spacing dr; in the second case it is halved to dr/2; and in the third case it is reduced to dr/3. These cases are presented in Table 2.
For each flame the Np is different, since each flame requires a unique grid to capture the soot field. The number of pathlines for the Smooke flames is generally half that of the other flames, as those flames are smaller and thinner than the others, and therefore need fewer grid points to be solved. Another point to note is that the extent of the wing is recalculated for every case, so the number of pathlines follows the spacing ratio between cases only approximately. A parametric study of the effect of the number of pathlines is presented in Section 3.2.

Library resolution
The number of bins associated with each history, or in other words the resolution of each history, is another important parameter. The higher the resolution of the library, the fewer entry points overlap in the same bin and the more consistent the averaging process is when the library is being created. In this study, three library resolutions of 100, 200 and 300 bins (uniformly distributed) for each independent variable were tested, and the impact on the estimated f v is shown in Section 3.2.

Variable histories
In addition, the choice of the variables which are integrated is also an important factor in determining the effectiveness of the technique. In previous studies only a small set of key variables was tested, but here this set is expanded to 10 variables. The set consists of the histories of: temperature T, mixture fraction MF, and the gas species O 2 , CO, CO 2 , H 2 , H 2 O, OH, C 2 H 2 and C 6 H 6 . The first eight variables are the major variables found in any combustion system. C 2 H 2 and C 6 H 6 are added to the set, since they are the major species which lead to PAH formation and thus affect soot surface reactions. Each variable can contribute directly or indirectly to the reactions of soot formation and/or oxidation. For example, the temperature history can be a strong indicator for relative heat transfer into the particles, and the H 2 history can correlate with the surface growth of soot particles since it plays an important role in the HACA mechanism (Appel et al., 2000). Additionally, the selection of variable histories is determined by the reduction of dimensions of the intrinsic multidimensional problem (Equation (2)), and the ability of those selected variables to track the soot formation and destruction of a range of flames. The effect of the dimensionality (number of variable histories) of the database was analyzed in the previous work of Alexander et al. (2018). The authors found that the more dimensions were used, the better the soot predictions were, but at a higher computational cost. In the current study, only three-dimensional libraries are used. The effect of the choice of these three histories is presented in Section 3.3.

Numerical method
The pathlines, Equation (3), are computed using a second-order time integration scheme, with adaptive time-stepping based on the Courant number (CN = 0.5). This method ensured similar accuracy to constant time-stepping with Δt = 10^-6, but with fewer iterations.
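A pathline integrator of this kind can be sketched as below. The text does not state which second-order scheme was used, so the explicit midpoint rule is an assumption, as are the function name, the steady velocity callback `velocity(r, z)` and the single reference cell size `dx` used in the Courant-number time step.

```python
import math


def trace_pathline(velocity, x0, dx, courant=0.5, t_end=1.0, max_steps=100000):
    """Integrate dX/dt = u(X) from x0 with the explicit midpoint (second-order) rule.

    velocity: callable (r, z) -> (u_r, u_z), assumed steady for this sketch.
    dx: reference cell size; the time step is dt = courant * dx / |u|.
    Returns a list of (t, r, z) samples along the pathline.
    """
    r, z = x0
    t = 0.0
    path = [(t, r, z)]
    for _ in range(max_steps):
        if t >= t_end:
            break
        ur, uz = velocity(r, z)
        speed = math.hypot(ur, uz)
        if speed == 0.0:
            break  # stagnation point: the parcel no longer moves
        dt = min(courant * dx / speed, t_end - t)
        # Half step to the midpoint, then a full step with the midpoint velocity.
        ur_mid, uz_mid = velocity(r + 0.5 * dt * ur, z + 0.5 * dt * uz)
        r, z = r + dt * ur_mid, z + dt * uz_mid
        t += dt
        path.append((t, r, z))
    return path
```

With this time-step rule a parcel advances at most `courant * dx` per step, so slow regions of the flow are covered with large time steps and fast regions with small ones, which is the reason an adaptive step needs fewer iterations than a small constant step.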
A new algorithm for library creation was developed in the present work in order to achieve higher computational speed. The library creation process is composed mainly of two parts: first, sorting all entry points of the data between the bin boundaries; and second, averaging the f v values in those bins which received multiple entries. The improvement was made to the routine that sorts and populates the database, since this step is the most computationally demanding. Every pathline evolves differently in the database space and only a small portion of the bins are actually populated. Because each pathline evolves in a highly non-linear way, functions could not be used to classify the entries into the bins. This information was taken into account to improve the searching and populating steps of the procedure. In the sorting process every entry is sorted one at a time, following the list of entries (ordered as the pathline evolves) only once. Also, the bin location associated with the previous entry is used as the initial searching point for the next entry. In this way, the sorting process looks only at the bins which are close to that entry and not through all bins of all dimensions as in Alexander et al. (2018). With this new algorithm higher resolution is possible at lower computational cost. For example, consider the computational time for a 3D library with 100 bins for each independent variable for the SA flame, with Np = 134, which contains 133,866 entry points. The old algorithm (Alexander et al., 2018) took 26 minutes on a standard desktop PC and the new one took only 0.32 seconds, a speed-up factor of about 4,875. The library is built from the histories of pathlines of the sooting area only, so that it only contains relevant information. All numerical implementation in this work was done in MATLAB R2018a.
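The idea of starting each bin search from the bin of the previous entry can be sketched as follows. This is a hypothetical illustration of that idea rather than the MATLAB routine used in the study; the function names and the representation of each dimension by a sorted list of bin edges are assumptions.

```python
def locate_from(edges, value, start):
    """Walk from bin `start` to the bin i with edges[i] <= value < edges[i + 1].

    Values outside the covered range are clamped to the first or last bin.
    """
    last = len(edges) - 2
    i = min(max(start, 0), last)
    while i > 0 and value < edges[i]:
        i -= 1
    while i < last and value >= edges[i + 1]:
        i += 1
    return i


def sort_and_average(entries, edges_per_dim):
    """Single pass over entries ordered along the pathlines.

    entries: iterable of (history_tuple, f_v); edges_per_dim: one sorted edge list per dimension.
    Returns {bin_index_tuple: mean f_v}; only populated bins are stored.
    """
    sums, counts = {}, {}
    previous = [0] * len(edges_per_dim)
    for hist, fv in entries:
        # Consecutive entries on a pathline are close together, so each walk is short.
        previous = [
            locate_from(edges, h, p)
            for edges, h, p in zip(edges_per_dim, hist, previous)
        ]
        key = tuple(previous)
        sums[key] = sums.get(key, 0.0) + fv
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```

Storing only populated bins in a dictionary reflects the observation in the text that a small portion of the bins actually receive entries.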

Results and discussion
In this section the results are presented and a qualitative analysis is made. The solution of the detailed soot equations reported in Veshkini et al. (2014) will be referred to as computed and the solution provided by the soot estimator will be referred to as estimated or predicted, interchangeably. For quantitative comparison, the relative errors (error = abs((φ computed − φ estimated)/φ computed)) of two important variables, 1) peak f v and 2) integrated f v , are used to assess the difference between the computed and estimated f v fields. The integrated f v represents the volumetric integration of f v over the entire domain and, in the axisymmetric (r, z) coordinates used here, is calculated as
$$f_{v,\mathrm{int}} = \int_V f_v \,\mathrm{d}V = 2\pi \int_z \int_r f_v \, r \,\mathrm{d}r \,\mathrm{d}z.$$
This variable is chosen since it quantifies the overall soot formation in the entire domain. The development of the soot predictor is composed of various steps, from testing a library generated from a single flame on that same flame, up to predicting a range of flames with a single library created from those same flames. All the steps of this development are addressed in this section, as well as the important parameters for building a framework which predicts the entire f v field with quantitative accuracy. As the framework is designed for predictive work, a final step in this development is required: a rigorous test. This test is used to check its ability to predict soot formation in a system which itself did not contribute to the generation of the library.
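The two error measures can be evaluated with a few lines of code. The sketch below assumes that f v is stored on a structured (r, z) grid as `fv[j][i]` and uses the trapezoidal rule for the axisymmetric volume integral; the function names and the quadrature rule are choices made for this illustration and are not taken from the original work.

```python
import math


def relative_error(computed, estimated):
    """Relative error abs((computed - estimated) / computed)."""
    return abs((computed - estimated) / computed)


def trapezoid(y, x):
    """Trapezoidal rule for samples y at nodes x."""
    return sum(0.5 * (y[k] + y[k - 1]) * (x[k] - x[k - 1]) for k in range(1, len(x)))


def integrated_fv(fv, r_nodes, z_nodes):
    """Axisymmetric volume integral 2*pi * integral of fv * r dr dz.

    fv[j][i] is the soot volume fraction at (r_nodes[i], z_nodes[j]).
    """
    radial = [
        trapezoid([row[i] * r_nodes[i] for i in range(len(r_nodes))], r_nodes)
        for row in fv
    ]
    return 2.0 * math.pi * trapezoid(radial, z_nodes)


def peak_fv(fv):
    """Maximum soot volume fraction in the field."""
    return max(max(row) for row in fv)
```

Applying `relative_error` to `peak_fv` and to `integrated_fv` of the computed and estimated fields gives the two quantities compared in this section.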
So, in the first part of this section the framework is assessed for individual flames in order to test the overall concept of the soot prediction in the entire domain. Following that, a parametric study is conducted to see the influence of the number of pathlines and the number of bins on the soot prediction. Next, the chosen settings are used to assess the f v field framework with a single combined library and a new study is done to find the best combination of variable histories. In the last part, a rigorous validation test, with independent data, is presented.

Assessing the f v field framework: individual flame validation
Here the overall framework concept is tested for individual flames, in which the library is built from each specific flame's histories and then used to predict soot concentration within the same flame. The libraries are based on T h , MF h , X O 2 ,h , with 100 bins, and Np = dr.
The first flame assessed is the SA flame, and its f v field can be seen in Figure 2. The scales are different so that graph a) can emphasize how the soot volume fraction field is located in a very narrow region, while graphs b) and c) show that the estimated field (graph c) does not reproduce the region close to the centerline (r = 0 cm) as it appears in the computed f v field (graph b). All three graphs share the same peak f v , and therefore the same f v contour levels, presented in the right part of graph c) from 0 up to 10.7 ppm. Two important aspects of the prediction can be seen: first, the framework predicts both the magnitude and overall distribution of the f v field in such a small area, and second, the soot values close to the centerline are not well captured. Similar results were found for the other flames, but are not shown here. The quantitative assessment of the results, for all nine flames, is shown in Table 3.
The table shows that the peak f v is accurately predicted for all flames and that the f v field, assessed by a volumetric integration, is captured within the same order of magnitude as the detailed solution. All flames presented integrated f v errors around 10% except for the SM32 and SM40 flames, for which the errors were higher, even though the errors on the peak f v for those flames were low. The errors arise because any particular bin may contain training data from more than one pathline, which are averaged. Those bins are then used for predictions. In the combined limit of infinitely refined bins with perfectly accurate determination of dimensions and correlations, no errors should occur. However, practical constraints lead to the errors seen. Two conclusions can be drawn here. The first is that, for predicting the entire domain, the integrated f v is a better indicator than the peak f v ; therefore it will be used hereafter as the main guide in developing this work. The second is that, for capturing the entire soot field of the SM32 and SM40 flames properly, more investigation into the practical constraints of the technique must be completed.

Parametric study: effect of the number of pathlines Np and the library resolution on the f v prediction
In this section the effect of the number of pathlines Np and the library resolution on predicting the f v field is presented. The starting point is the case presented in the previous section, where the libraries are based on T h , MF h , and X O 2 ,h , with 100 bins, and Np = dr. Three cases of library resolution of 100, 200, and 300 bins for each variable history, and three cases of varying numbers of pathlines, presented in Table 2, are tested. The libraries are built from each specific flame's histories and then used to predict the same flame. Table 4 presents the comparison of the relative error of f v,int for each flame and case. From the table it is possible to see that: First, looking at the average values, the number of bins has a larger effect on lowering the errors than the number of pathlines added to the library. Second, moving from the dr case to the dr/2 case reduces the errors more (higher sensitivity) than moving from dr/2 to dr/3. Third, as seen previously, the SM flames have a higher error than the other flames for 100 bins, and similar results are found for 200 bins, but these errors vanish for the higher resolution libraries. This means that for the current library based on T h , MF h , X O 2 ,h , a higher library resolution is needed to capture the entire f v field properly. The setting of 300 bins and dr/2 is chosen for further study, since it gives an average f v,int error below 1%.

Effect of the choice of variable history with the combined libraries
In this section the ability of the soot estimator framework to reproduce all nine flames with a single combined library, built with all pathlines from these flames, is assessed. In addition, the effect of variable history choice on predicting soot concentration for all flames is addressed here. In Section 2.6, 10 key variables were selected, and all 120 possible combinations of three variables at a time were tested here. Table 5 shows a detailed comparison between the baseline library combination of T h , MF h , X O 2 ,h and the one with the lowest relative errors, T h , MF h , X H 2 ,h .
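The count of 120 follows from choosing 3 histories out of 10 without regard to order, C(10, 3) = 10·9·8/(3·2·1) = 120. A short enumeration confirms this; the string labels below are shorthand chosen for this example and stand for the histories of the ten variables listed in Section 2.6.

```python
from itertools import combinations

# Shorthand labels for the ten candidate variable histories (labels are illustrative).
variables = ["T", "MF", "O2", "CO", "CO2", "H2", "H2O", "OH", "C2H2", "C6H6"]

# Every unordered choice of three histories defines one candidate 3D library.
triples = list(combinations(variables, 3))

baseline = ("T", "MF", "O2")  # baseline combination used in the earlier sections
best = ("T", "MF", "H2")      # combination reported with the lowest relative errors

print(len(triples))
```

Each triple would then be used to build one combined library and to evaluate the integrated f v error for all flames, which is how a ranking such as the one summarized in the tables can be obtained.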
Some important aspects can be seen in the table. First, the single library with the baseline combination (MF h , T h and X O 2 ,h ) predicted the entire f v field of some flames better than others. The estimator was able to reproduce the flames with similar fuel composition and burner characteristics, the SA and SY flames, while the SM flames were not accurately predicted. Second, the single library with the best combination (MF h , T h and X H 2 ,h ) predicted all flames with high accuracy. This is the first time that one single library has predicted the integrated f v of such diverse flames this accurately. Lastly, the substitution of X O 2 ,h with X H 2 ,h improved the prediction of all flames, especially the SM flames. To show the improvement, the computed soot field of the SM32 flame is compared with the fields predicted by the two libraries in Figure 3. Note the differing scales in the colourmaps.
The improvement in the predicted f v field from the baseline library combination to the one with H 2 is substantial. With the baseline library, neither the distribution nor the magnitude of the predictions was close to the computed solution, whereas the library based on MF h , T h and X H 2 ,h was able to capture all flame features.
The overall improvement due to the substitution of X O 2 ,h with X H 2 ,h in the library can be explained by the importance of H 2 in the surface growth mechanism (Appel et al., 2000) and by the limitations of O 2 as a library variable. First, the history of H 2 correlates directly with the surface growth of soot particles, since H 2 plays an important role in the soot HACA mechanism. Second, the history of H 2 is also correlated with other species through the chemical mechanism, bringing additional correlation of the gas species to soot formation and oxidation reactions. Third, O 2 cannot provide enough information in areas of the flame where its concentration is low and soot formation is important, such as at the centerline; O 2 is especially important only in the soot oxidation region, after soot has nucleated and grown. Finally, when all three variables MF h , T h and X H 2 ,h were combined, a three-dimensional space highly correlated with soot formation was created, as indicated by Equation (2), independently of the flame characteristics. In order to provide an overall view of the effect of the other species and more options for the choice of variables, the five best, the baseline and the worst combinations are shown in Table 6.
From the table, it is interesting to see that the five best combinations share similar characteristics: their variables are involved in the thermal exchange of energy (T h ), are main combustion products (CO 2 and H 2 O) or an intermediate product (CO), or take part in the HACA mechanism (C 2 H 2 , H 2 and CO). Nevertheless, the most important aspect is that all five incorporate at least one species that participates in the soot growth processes, which is not the case for the baseline (Rank 112) or for the worst combination (Rank 120).
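The count of 120 candidate libraries follows from choosing three of the ten variables without regard to order, C(10, 3) = 120. The sketch below enumerates the candidates and ranks them with a user-supplied error function. The variable names are illustrative only: eight follow quantities named in the text, the last two are placeholders for the remaining variables of Section 2.6, and `rank_libraries` is a hypothetical helper, not the procedure used by the authors.

```python
from itertools import combinations
from math import comb

# Illustrative names (assumption): eight follow quantities mentioned in the
# text; "var_9" and "var_10" stand in for the variables not named here.
VARIABLES = ["T", "MF", "X_O2", "X_H2", "X_CO2", "X_H2O", "X_CO", "X_C2H2",
             "var_9", "var_10"]


def candidate_libraries(variables, size=3):
    """Return every unordered choice of `size` variable histories."""
    return list(combinations(variables, size))


def rank_libraries(candidates, mean_error):
    """Sort candidates by a user-supplied error function, lowest error first."""
    return sorted(candidates, key=mean_error)


triples = candidate_libraries(VARIABLES)
print(len(triples), comb(len(VARIABLES), 3))
```

In practice `mean_error` would build a combined library for each triple and return the average relative error of f v,int over the nine flames, which is the quantity ranked in Table 6.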

Discussion about the Library
For the best library combination, the averaging process on those bins that have multiple entries produced a mean value with a variation of 9.6%, with a confidence of 2 standard deviations. This small variation indicates that most of the averaged points are close to the mean, and that outlier points do not create excessive noise in the final stored value. During library formation, only a small part of the library, 0.67%, is populated, and during prediction an even smaller part is accessed to obtain the soot concentration for any single flame. This is expected, since the boundary limits are broad enough to cover all nine flames, the dimensions are mutually orthogonal, and the bins are uniformly spaced along the domain.
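The averaging and the populated fraction discussed above can be illustrated with a minimal sketch of a uniformly binned, three-variable library. It assumes that each sample is a triple of history values with an associated f v, that bins are uniform between fixed bounds, and that a bin stores the arithmetic mean of its entries. The function names and the sparse dictionary storage are illustrative choices, not the authors' implementation.

```python
from collections import defaultdict


def bin_index(value, lo, hi, n_bins):
    """Uniform bin index in [0, n_bins - 1]; out-of-range values are clamped."""
    frac = (value - lo) / (hi - lo)
    return min(max(int(frac * n_bins), 0), n_bins - 1)


def build_library(samples, bounds, n_bins):
    """Average f_v over all samples that fall into the same 3-D bin.

    samples: iterable of ((h1, h2, h3), fv) pairs
    bounds:  three (lo, hi) pairs, one per history variable
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for histories, fv in samples:
        key = tuple(bin_index(h, lo, hi, n_bins)
                    for h, (lo, hi) in zip(histories, bounds))
        sums[key] += fv
        counts[key] += 1
    # Only populated bins are stored, so the library stays sparse.
    return {key: sums[key] / counts[key] for key in sums}


def fill_fraction(library, n_bins, n_dims=3):
    """Fraction of all bins that hold at least one entry."""
    return len(library) / n_bins ** n_dims


samples = [((0.15, 0.15, 0.15), 1.0),
           ((0.17, 0.16, 0.18), 3.0),
           ((0.95, 0.95, 0.95), 5.0)]
library = build_library(samples, [(0.0, 1.0)] * 3, n_bins=10)
print(library, fill_fraction(library, 10))
```

If the combined library uses 300 bins per variable, as selected in the resolution study, a populated fraction of 0.67% would correspond to roughly 181,000 of the 27 million bins, which is why sparse storage is a natural fit.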

Validating the combined library
In this section the chosen framework configuration is assessed in two scenarios in which the library is validated against an independent set of data. As the framework is designed for predictive work, it is necessary to rigorously test its ability to predict soot formation in a system which itself did not contribute to the generation of the library.

Predicting new flames:
In this scenario, one flame from the dataset is selected as the target flame and the library is built from the remaining flames. In this case, the ability of the framework to predict the f v field on new or untrained data is assessed. This exercise stresses some important aspects of the technique, such as its ability to predict new flames and its dependency on the data that was used to create the library. Table 7 shows the relative error in predicting each of the nine flames with a library built from the remaining eight, both for the integrated f v and, to complement the assessment, for the peak f v . Some important aspects deserve attention. First, the overall soot formation was predicted for all flames; all presented errors are lower than 19%. Second, the predicted peak f v values were all within the same order of magnitude as the computed ones, and for the SA and SY41 flames the relative error was below 5%. Third, some flames shared similar characteristics and were therefore predicted with higher accuracy, for example SA and SY41. For flames whose characteristics and fuel composition were not found in the training set, the predictive capability of the soot estimator was tested more rigorously; for flame SM60, for example, reasonable errors of less than 20% were found for both integrated and peak f v . It is important to state that the ideal case for using the soot estimator technique is to have a greater number of flame datasets in order to extend the predictive capabilities to new flames. Nevertheless, this assessment with only eight training flames was adequate to demonstrate the potential of the technique.
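The leave-one-flame-out procedure described above can be sketched as follows. Only SM32, SM60 and SY41 are flame labels taken from the text; the remaining labels are placeholders, and the helper names are hypothetical.

```python
def leave_one_flame_out(flames):
    """Yield (target, training) pairs; the target is excluded from its training set."""
    for target in flames:
        training = [name for name in flames if name != target]
        yield target, training


def relative_error(computed, predicted):
    """Relative error of a scalar such as integrated or peak f_v."""
    return abs(computed - predicted) / abs(computed)


# SM32, SM60 and SY41 appear in the text; the other labels are placeholders.
FLAMES = ["SM32", "SM60", "SY41", "flame_4", "flame_5",
          "flame_6", "flame_7", "flame_8", "flame_9"]

for target, training in leave_one_flame_out(FLAMES):
    # A library would be built from `training` and used to predict `target`.
    print(target, len(training))
```

Each of the nine iterations builds a library from eight flames, so every entry of Table 7 is a prediction of a flame that did not contribute to its own library.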

Cross validation technique:
In the cross validation technique, the full dataset, with all nine flames, is partitioned randomly into two independent sets, one for building the library (or training) and the other for validating. In this scenario, the predicted points lie within the boundaries of the dataset, making this test a different assessment from the previous one. Here, k-fold cross validation (Geisser, 1975) is applied to assess the accuracy of the framework. In this technique the dataset, after being randomized, is divided into k groups, or folds, of similar size, where (k−1) folds are used for training and the remaining fold is used for evaluation. This process is repeated k times, so that each of the k folds is used once as validation data.
For a less biased prediction (Rodriguez, Perez, & Lozano, 2010), k is set equal to 10 in this study. All flames combined generate a full dataset of 607,604 entries, so for each of the ten iterations the training set is composed of 546,844 entries and the validation set of 60,760 entries.
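The fold sizes quoted above can be reproduced with a short sketch of a shuffled k-fold split. This is a generic illustration using only the Python standard library, not the authors' code; the seed and function names are assumptions. Because 607,604 is not divisible by 10, this sketch assigns one extra entry to the first four folds, so the validation folds hold 60,760 or 60,761 entries.

```python
import random


def fold_sizes(n_entries, k):
    """Sizes of k folds of near-equal size; the first folds absorb the remainder."""
    base, extra = divmod(n_entries, k)
    return [base + 1 if i < extra else base for i in range(k)]


def kfold_splits(n_entries, k, seed=0):
    """Yield (training, validation) index lists for a shuffled k-fold split."""
    indices = list(range(n_entries))
    random.Random(seed).shuffle(indices)  # randomize before partitioning
    start = 0
    for size in fold_sizes(n_entries, k):
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


sizes = fold_sizes(607_604, 10)
print(sizes, [607_604 - s for s in sizes])
```

Every entry is used exactly once for validation and k−1 times for training, which is what makes the averaged error statistics less dependent on any single random partition.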
The accuracy is measured with the relative error of f v (error = abs((f v,computed − f v,estimated ) / f v,computed )) for every entry of the validation set. This is a more rigorous test than the ones of the previous sections, since every single entry is assessed instead of the overall f v field prediction. Nevertheless, this assessment also brings a challenge: for f v values close to numerical zero, the relative error becomes large without bringing any valid information about the usability of the soot estimator. So, a threshold of f v = 0.0001 ppm (or 0.1 ppb) is used to limit this behavior. The accuracy of the technique is shown in Table 8, from which two results can be highlighted. First, the cross validation technique creates unique independent validation data for every iteration, resulting in a less biased evaluation of the soot predictor. For every iteration the statistics of the error are different, especially the standard deviation, which varies widely from 165 to 645. Second, looking at the averaged error values, most of the errors are very small (median of 0.31), but there are some points for which the errors are larger, resulting in an increase of the overall mean to 8.27% and of the standard deviation to 314.98. This effect can be seen in the last three columns, where most of the predicted f v points have low errors.
It is important to remember that measurement errors of about 30-40% on f v are commonly reported in the literature. Taking that into account, the predictor presented very good accuracy.
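The per-entry error metric and the threshold described above can be sketched as follows. The sketch assumes that the 0.0001 ppm threshold is applied to the computed f v (the text does not state which value is thresholded), and the function names are illustrative. It also shows how a few large errors can raise the mean while the median stays low.

```python
from statistics import mean, median, pstdev

FV_THRESHOLD_PPM = 1e-4  # 0.0001 ppm = 0.1 ppb, as quoted in the text


def relative_errors(computed, estimated, threshold=FV_THRESHOLD_PPM):
    """Relative error per entry, skipping entries with near-zero computed f_v.

    Assumption: the threshold is applied to the computed value.
    """
    errors = []
    for fv_c, fv_e in zip(computed, estimated):
        if fv_c < threshold:
            continue  # relative error is not meaningful near numerical zero
        errors.append(abs(fv_c - fv_e) / fv_c)
    return errors


def summarize(errors):
    """Mean, median and (population) standard deviation of the errors."""
    return {"mean": mean(errors), "median": median(errors), "std": pstdev(errors)}


computed = [1.0, 2.0, 1e-5, 4.0]   # f_v in ppm; third entry is below threshold
estimated = [0.5, 2.0, 0.5, 5.0]
print(summarize(relative_errors(computed, estimated)))
```

Reporting the median alongside the mean, as Table 8 does, separates the typical error from the influence of a small number of poorly predicted points.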

Conclusions
In the present study, the estimator framework was extended to predict the entire f v field in nine laminar diffusion flames. Important parameters of the estimator technique were also analyzed, namely the number of pathlines from each flame used to build the library, the library resolution and the choice of variable histories, in order to achieve a reasonable f v field prediction. In addition, a rigorous validation was conducted to assess the technique against independent data. The framework to predict the entire f v field was successfully implemented and showed quantitative accuracy on peak f v , integrated f v , and the entire f v field for individual flames. The integrated f v proved to be a good indicator of how accurate the f v field estimator is. For individual flame solutions, it was shown that a library with 300 bins and Np = dr/2 gave an average f v,int error below 1%.
When the data of all flames was combined into one library, the single library with the baseline combination (MF h , T h and X O 2 ,h ) was able to predict the entire f v field of some flames better than others. The single library, with the combination of MF h , T h and X H 2 ,h , predicted all flames with high accuracy for the first time.
Finally, two validation scenarios were applied to assess the accuracy of the framework in a system which itself did not contribute to the generation of the library. In the first, each flame was predicted from a library that did not include that flame's dataset. In the second, random datasets were assessed with a k-fold cross validation technique. The predictor presented very good quantitative accuracy in capturing soot formation, especially for cases where the flame characteristics were similar to the ones in the training dataset.
The technique has demonstrated its potential to work with different flame characteristics and fuel compositions. For further usage, flame datasets similar to the target problem must be gathered so that the library can be improved.

Notation
The following symbols are used in this paper:

Disclosure statement
No potential conflict of interest was reported by the authors.