Estimating CO2-Brine diffusivity using hybrid models of ANFIS and evolutionary algorithms

One of the important parameters illustrating the mass transfer process is the diffusion coefficient of carbon dioxide which has a great impact on carbon dioxide storage in marine ecosystems, saline aquifers, and depleted reservoirs. Due to the complex interpretation approaches and special laboratory equipment for measurement of carbon dioxide-brine system diffusivity, the computational and mathematical methods are preferred. In this paper, the adaptive neuro-fuzzy inference system (ANFIS) is coupledwith five different evolutionary algorithms for predicting the diffusivity coefficient of carbondioxide. TheR2 values forthe testingphase are 0.9978, 0.9932, 0.9854, 0.9738and0.9514 for ANFIS optimizedbyparticle swarmoptimization (PSO), genetic algorithms (GA), ant colonyoptimization (ACO), backpropagation (BP), and differential evolution (DE), respectively. The hybrid machine learning model of ANFIS-PSO outperforms the other models. ARTICLE HISTORY Received 15 May 2019 Accepted 22 January 2020


Introduction
Recently, the utilization of environmentally friendly and green energy has been accelerated because of the pollution crisis and increasing global energy demand. The carbon dioxide shows the valuable and wide application CONTACT Shahab S. shamshirbandshahaboddin@duytan.edu.vn, amir.mosavi@mailbox.tu-dresden.de This article has been republished with minor changes. These changes do not impact the academic content of the article.
Supplemental data for this article can be accessed here. https://doi.org/10. 1080/19942060.2020.1774422 in this issue (Chang et al., 2012;Hemmati-Sarapardeh et al., 2013;Li & Gu, 2014). One of the common methods for extracting methane from its hydrates without damaging to the marine is replacing carbon dioxide instead of methane hydrate (Bai et al., 2012;Ota et al., 2007). The utilization of carbon dioxide for heat transmission is more applicable for extraction heat from hot fractured rock respect to water Pruess, 2006;Zhang et al., 2016). Also in the other viewpoint the carbon dioxide injection to the geothermal reservoirs and seabed environment has straight effect on reduction of carbon dioxide emission (Agartan et al., 2015;Eccles et al., 2009;Javadpour, 2009;Rau & Caldeira, 1999;Ren et al., 2015;Ren et al., 2014;Trevisan et al., 2014a). When carbon dioxide has contact with water interface it can diffuse through the water so the diffusion coefficient is known as a major parameter which effects fluid diffusivity (Farajzadeh et al., 2009;Mutoru et al., 2011). This factor has a dominant effect on chemical reactions and mass transfer in porous media and solutions(S. P. Cadogan et al., 2014a;Trevisan et al., 2014b).
To overcome these problems some empirical relations have been proposed as reported in Table 1 (S. P. Cadogan et al., 2014b;Lu et al., 2013;Moultos et al., 2016;Othmer & Thakar, 1953;Wilke & Chang, 1955). D CO2 , μ, and V m is the carbon dioxide diffusion coefficient, the viscosity of solvent and molar volume. , M, n SE and denote the association parameter, Molecular weight of solvent, stokes-Einstein number and hydrodynamic radius of solute.
The dominant operational conditions such as temperature and pressure have strong effects on CO 2 diffusion during mass transfer so there are significant differences between diffusivity of carbon dioxide under high pressure and temperature and normal conditions. The salinity of the water is known as another critical parameter that effects CO 2 molecular diffusion (Cadogan, 2015). Wilke et al. suggested a predicting model for the estimation of diffusion coefficients of carbon dioxide in different viscosity and temperature (Wilke & Chang, 1955). Lu and coworkers developed a relation for the diffusion of carbon dioxide into pure water in the range of 268-473 K (Lu et al., 2013) for temperature. Moultos et al. utilized dynamics simulations to modify the Lu's equation and proposed a relation to forecast the diffusion coefficient of CO 2 and water system in the elevated temperatures and pressures (Moultos et al., 2016). In order to accurate determination of carbon dioxide diffusivity in the brine system, the viscosity of saline solution must be considered so Lu and Moultos correlations are not applicable directly for the CO 2 -brine system. Cadogan et al. made a modification on Stokes-Einstein equation to predict diffusivity of CO 2 -brine system but it is not reliable for high temperatures (S. P. Cadogan et al., 2014b).
The previous works have some limitations in the accuracy of the range of their applications and also the experimental works on this issue require a considerable amount of time and cost. Due to these facts, the importance of the development of an accurate and comprehensive method that has minimum cost has been highlighted. In order to accurate estimation of carbon dioxide diffusivity in solution with various salinities, the importance of proposing a general and accurate predictive algorithm becomes highlighted. On the other hand, the wide applications of artificial intelligence approaches in different parts of engineering processes and proposing the solutions for difficult issues (Baghban et al., 2016;Baghban et al., 2017a;Chau, 2017;Chuntian & Chau, 2002;Fotovatikhah et al., 2018;Moazenzadeh et al., 2018;Taherei Ghazvinei et al., 2018;Yaseen et al., 2018), cause artificial intelligence methods can be considered as an approach for estimation of CO 2 diffusivity in different conditions. First, a collection of carbon dioxide diffusivity coefficients was gathered, then, we utilized ANFIS to build a model for the prediction of carbon dioxide diffusivity coefficients in the brine system. The predicting algorithm is optimized by the utilization of five evolutionary algorithms which are DE, ACO, GA, PSO, and BP. The ability of each developed algorithm is assessed by utilization of graphical and statistical analyses and also compared with the available correlations in the literature. In order to clarify the present work, different steps are shown in Figure 1.

ANFIS
Zadeh introduced FL. The main characteristic of FL is known for the advancement of the linguistic parameters into the forms of mathematical equations. The regulations of FL include a number of if-then operators to change the quantitative parameters into qualitative. Due to a lack of accuracy in the modeling process, this approach could not satisfy scholars for prediction so the coupling FL and ANN were suggested as an efficient and accurate algorithm. The proposed tool which has potentials of FL such as MFs and if-then regulations called ANFIS. ANN is implemented to tune membership functions (Shamshirbandet al., 2019;Karkevandi-Talkhooncheh et al., 2017). Takagi-TSK as one of the FIs was applied in this study. TSK used to input and output patterns for if-then rules Nikravesh et al., 2003). Such ANFIS models have been widely used in building advanced prediction mod-els in the relevant applications (Dehghani et al., 2019;Mosavi & Edalatifar, 2018;Rezakazemi et al., 2019). Figure 2 illustrates the structure of ANFIS network which has n input data. X i and Y i represent the input and output of this network. The below rules are used to construct a new model in TSK FIS: Rule one: IF < X 1 , is A 11 , and X 2 , is A 21 , and . . . and X 5 , is A 51 > THEN < y 1 = a 1 X1 + b 1 X 2 + c 1 X 3 + d 1 X 4 + e 1 X 5 + f 1 > Rule two: IF < X 1 , is A 12 , and X 2 , is A 22 , and . . . and X 5 , is A 52 > THEN < y 2 = a 2 X 1 + b 2 X 2 + c 2 X 3 + d 2 X4 + e 2 X 5 + f 2 > Rule three: IF < X 1 , is A 13 , and X 2 , is A 23 , and . . . and X 5 , is A 53 > THEN < y 3 = a 3 X 1 + b 3 X 2 + c 3 X 3 + d 3 X 4 + e 3 X 5 + f 3 > Rule four: IF < X 1 , is A 14 , and X 2 , is A 24 , and . . . and X 5 , is A 54 > THEN < y 4 = a 4 X 1 + b 4 X 2 + c 4 X 3 + d 4 X 4 + e 4 X 5 + f 4 > Rule five: IF < X 1 , is A 15 , and X 2 , is A 25 , and . . . and X 5 , is A 55 > THEN < y 5 = a 5 X 1 + b 5 X 2 + c 5 X 3 + d 5 X 4 + e 5 X 5 + f 5 > Rule fix: IF < X 1 , is A 16 , and X 2 , is A 26 , and . . . and X 5 , is A 56 > THEN < y 6 = a 6 X 1 + b 6 X 2 + c 6 X 3 + d 6 X 4 + e 6 X 5 + f 6 > Rule seven: IF < X 1 , is A 17 , and X 2 , is A 27 , and . . . and X 5 , is A 57 > THEN < y 7 = a 7 X 1 + b 7 X 2 + c 7 X 3 + d 7 X 4 +e 7 X 5 +f 7 > Rule eight: IF < X 1 , is A 18 , and X 2 , is A 28 , and . . . and X 5 , is A 58 > THEN < y 8 = a 8 X 1 + b 8 X 2 + c 8 X 3 + d 8 X 4 + e 8 X 5 + f 8 > The phrases after IF and THEN represent antecedent and consequences respectively. A and B denote the fuzzy model sets of inputs. Figure 2 is carried out to show ANFIS model has five different layers which can be described in details as below (Dadkhah et al., 2017;Safari et al., 2014;Tatar et al., 2016): The changing of input data can be used for the creation of linguistic terms. The n nodes have been created to relate the linguistic attributes of input data. As the Gaussian functions are efficient with higher performance in prediction for algorithm so it is implemented to organize these linguistic attributes. The Gaussian functions are presented as below: (1) Here, σ , Z and O parameters and denote to the variance, center for Gaussian MF, and output, respectively. They can be determined during the model training.
Layer 2: The controlling of the accuracy and performance of qualifications is located in the second layer, the firing strength layer can be mentioned as another name for this layer, the following formulation explains the determination level: Layer 3: The normalization of firing strength which is placed in third layer can be formulated as below: Layer 4: The creation of linguistic terms is conducted within this layer. Furthermore, the effect of rules on outputs is presented as follow: In the above expression, n i , m i, and r i relate to linear parameters. In addition, the model has been developed for the purpose of reduction of the difference between the measured data and the forecasted values to optimize the parameters.
Layer 5: The alteration of rules to qualitative form by weighted average summation takes place in this layer as follows: To clarify the proposing of the algorithm, the schematic diagram of ANFIS coupled with the evolutionary algorithm is shown in Figure 3.

BP
BP minimizes the error by reducing the difference between estimated values and experimental data. to this end, the error is propagated back by optimization of variables in the network and biases and weights are adjusted to minimize the objective function (Afshar et al., 2014).

ACO
Dorigo introduced the ACO algorithm as an approach that is based on a population algorithm (Dorigo et al., 1996). ACO method is developed based on the natural behavior of ants to determine the shortest ways from their storage to food resources (Stützle & Hoos, 2000). Ants release a pheromone in their ways so the population can find probable paths or solutions. This approach can be applied for discrete space, not for the continuous domain. Due to this fact, a Gaussian model was proposed by the construction of a list of probable solutions to generalize this approach in continuous space. Some special solutions exist in solution archives all over time  (Heris & Khaloozadeh, 2014). These solutions can be obtained by optimization algorithm.the minimization of cost function as the first step of the ACO approach starts with finding the vector x ∈ X ⊆ R n x by the below steps (Lozano et al., 2006): 1. Initialization: generation of random N definitions in X and evaluation of cost function. 2. Solution archive: the best of the solution is described by x 1 and the worst of them is represented by x N . 3. Weight definition: the weights for members of solution archive are determined as follows: Based on the below condition: 4. Generating the probabilistic model: the Gaussian probabilistic model is constructed by the below formulations: where j and x[j] denote the decision variable and the jth part in x.The standard deviation and mean of Gaussian mixture are expressed in the following: where ξ > 0 denotes the exploration/exploitation balance.
5. New samples are generated by using the model G = (G 1 ; G 2 ; . . . G n x ) as new members of the solution archive and their objective functions are evaluated in this part. 6. The solution for the optimization problem is chosen by selecting the best solution archive. 7. The Criterions are checked to end the optimization or restart from part 3.

DE
Differential evolution as one of the major metaheuristics methods utilizes fundamental genetic operators to investigate a population which does not require the optimization problem to be differentiable. The differential evolution method is described below (Das et al., 2011): 1. The first iteration starts as mutation factor F and crossover rate CR are selected and fitness of elements F i k = f (X i k ) for each I are determined. 2. The mutation operator is used to generating a trial solution as below: 3. The crossover operator is employed on solutions as below: U i,j = TX ij if r and () ≤ CR or j = r and b X k ij if r and () > CR and j = r and b (13) r and b represents the random index of best element b which is obtained from r and b = r and (D) ∈ {1, 2, . . . , D4 4. Trial fitness TF i = f (u i,1 , u i,2 , . . . , U i,D ) is calculated and the selection operator is applied on trial solution as below:

Fitness estimation
is done and the best index is distinguished. 6. The end criterion is checked and the explained loop restarts from part 3 if it is necessary.

GA
One of the optimization approaches which has great potential in optimizing various objective function is GA. The initial solutions of this approach which are called chromosomes are generated randomly and some actions are done on chromosomes by means of operators such as mutation, crossover, and reproduction. In this algorithm, to determine the probability of offsprings production, the CF and MF can be used (Alam et al., 2015;Bedekar & Bhide, 2011).
(1) An initial solution is created randomly and CF and MF factors are generated. (2) In the second step the Chromosomes fitness, (3) the various GA operators produce the new Chromosomes X k+1 i , ∀i (4) the best one b1 is determined by a fitness assessment In the fifth step, the old chromosome is replaced by the new best one. (6) The algorithm continues the loop to meet the best conditions.

PSO
PSO is considered as a stochastic optimization which mimics the populations in nature such as fish, birds and insects (Kennedy, 2011). In the PSO algorithm, promoting the initial populations is known as an optimization problem and particles represent the solutions (Castillo, 2012). A group of particles denotes swarm so the particle and swarm terms represent individual and population has extensive application in this algorithm and genetic algorithm (Onwunalu & Durlofsky, 2010;Sharma & Onwubolu, 2009). In PSO algorithm, X i (t) and V i (t) denote position vector and velocity vector for particle i. The following equation is used for updating the particle velocity (Chen, 2013;Lin & Hong, 2007): In Equation (16), p best,id and g best,id represent the best position of particle and global position r, c and w are the random number, learning rate and inertia weight. The below expression shows the next particle position (Chiou et al., 2012;Eberhart & Kennedy, 1995;Shi & Eberhart, 1998): 1, 2, . . . , D (16)

Data gathering
Efficiency and accuracy of any estimating algorithm are highly dependent on the accuracy of experimental data used for the development of the predictive model   ( Das et al., 2011;Hajirezaie et al., 2017;Karkevandi-Talkhooncheh et al., 2017). So we gathered 86 actual data for the carbon dioxide diffusion coefficient in the brine system from the reliable papers (S. P. Cadogan et al., 2014b;Choudhari & Doraiswamy, 1972;Maharajh, 1973;Frank et al., 1996;Himmelblau, 1964;Hirai et al., 1997;Lu et al., 2013;D. M. Maharajh & Walkley, 1973;Nijsing et al., 1959;Reddy & doraiswamy, 1967;Tamimi et al., 1994;Tan & Thorpe, 1992;Thomas & Adams, 1965;Versteeg & Van Swaaij, 1988;J. Vivian & Peaceman, 1956). The experimental data consist of diffusion coefficients of carbon dioxide in the brine system as a function of viscosity, pressure, and temperature. The collected experimental data includes the diffusion coefficients for a pressure range of 0.1-49.3 MPa, a temperature range of 273-473.15 K and a viscosity range of 0.1388-1.95 mPa.s. In order to enhance the performance of training, the experimental data are normalized such as below: The degree of membership functions for different inputs of algorithms are shown in Figure 4.

Results and discussion
In this paper, the ANFIS algorithm was combined with GA, ACO, PSO, DE, and BP to estimate the diffusivity coefficient in terms of temperature, pressure, and viscosity. In order to investigate the performance of the developed models, different statistical parameters such as the R 2 , MAREs, RMSEs and STD were calculated and reported in Table 2. The aforementioned indexes were determined such as following formulations:  Table 3. The comparisons of these parameters exhibit that the proposed ANFIS algorithm combined with an appropriate evolutionary algorithm can be better predictive machine respect to the other published correlations.
The experimental and estimated diffusion coefficients are demonstrated simultaneously in Figure 5 for different evolutionary algorithms. This comparison expresses good consistency between actual and predicted diffusivity coefficients for proposed algorithms. In order to illustrate the consistency, the experimental diffusivity is depicted against predicted diffusivity in Figure 6 for ANFIS with different evolutionary algorithms. The resulted points for total data lie near to the bisector of the first quadrant which means the high capacity and great performance of algorithms. Furthermore, the determined fitting lines have formulations near the y = x line. One of the great approaches in the graphical analysis is the depiction of relative deviations for total datasets. So the relative deviations of predicted and actual diffusivity coefficients of different algorithms are shown in Figure 7. This analysis shows the low errors in the prediction of diffusivity for ANFIS combined with different algorithms. The ranges of relative error for different optimization processes are very low and also the PSO algorithm has better performance in comparison with others.   As mentioned before one of the important factors which have great impact on accuracy and performance of the model is accuracy and error of measured data points (Rousseeuw & Leroy, 2005), however the measured data utilized in the present paper were gathered from the reliable literature, due to experimental equipment and conditions they might contain some errors. Thus, here, the Leverage method has been employed to determine the inaccurate measured data. Referring to this computational method, the residuals are calculated and then inputs are utilized to organize Hat matrix by the below expression (Mohammadi et al., 2012b): In the above expression, X is known as the m×n matrix that n and m are the model parameters number and number of samples respectively. In order to the identification of inaccurate data and outliers of the dataset, William's diagram was plotted for outlier realization. As shown in Figure 8, the used data has some inaccurate and suspected data. These data are distinguished through standard residual indexes and limitations of the Leverage which are between −3 and 3. The leverage limit presented by H* can be determined by the below expression (Mohammadi et al., 2012a): The validity and generality of the proposed model for the diffusivity coefficient of carbon dioxide prediction at different conditions are illustrated in this study. The Relevancy factor is utilized to study the impact of input variables on diffusivity. The Relevancy factor can be formulated as following (Bemani et al., 2019;Razavi et al., 2019b;Razavi et al., 2019a): where Y i ,Ȳ,X k,i and X k denote the 'i' th output, output average, kth of input, and average of input. The range of absolute value of r is lower than one and as the absolute value of r is near to one the effectiveness of the input parameter on the diffusion coefficient is higher. Figure  8 shows the Relevancy factor of the diffusivity coefficient in terms of different parameters for ANFIS combined with five different evolutionary algorithms. This analysis shows that the temperature is the most effective input parameter on the diffusivity coefficient of carbon dioxide. Furthermore, temperature and pressure have a straight relationship with the diffusivity coefficient of carbon dioxide and also it can be seen that as viscosity increases, the coefficient decreases. According to the  above discussions, a present study is a helpful tool for engineers and scientists (Figure 9).

Conclusions
We proposed the ANFIS coupled with five different evolutionary algorithms to estimate CO 2 diffusivity in the brine system. In order to train and test these models, a total number of 86 measured data were collected from previous works. The different statistical indicators and graphical analysis demonstrated the high efficiency and accurate ability of proposing models. The determined R 2 values in the testing phase were 0.9978, 0.9932, 0.9854, 0.9738 and 0.9514 for ANFIS optimized by PSO, GA, ACO, BP, and DE respectively. Also for general evaluation of model four empirical correlations from previous papers were utilized and compared with the proposed algorithms. According to the aforementioned results, it can be concluded that proposed algorithms have better performance in the prediction of carbon dioxide diffusion coefficients. Moreover, sensitivity analysis showed that temperature and pressure are the most and least effective parameters on the carbon dioxide diffusion coefficient. As a recommendation for future works, it can be suggested that other machine methods can be used for the prediction of the carbon dioxide diffusivity coefficient. However, it is worthy to mention that these methods have a limitation or drawback in comparison with experimental works which is a dependency of the accuracy of the proposed models on the accuracy and amount of databanks.