Modified aquila optimizer for forecasting oil production

ABSTRACT Oil production estimation plays a critical role in the economic planning of local governments and organizations. Therefore, many studies have applied different Artificial Intelligence (AI) based methods to estimate oil production in different countries. The Adaptive Neuro-Fuzzy Inference System (ANFIS) is a well-known model that has been successfully employed in various applications, including time-series forecasting. However, the ANFIS model faces critical shortcomings in the configuration of its parameters. This paper addresses these drawbacks by optimizing the ANFIS parameters using a modified Aquila Optimizer (AO) with the Opposition-Based Learning (OBL) technique. The main idea of the developed model, AOOBL-ANFIS, is to enhance the search process of the AO and to use the resulting AOOBL to boost the performance of the ANFIS. The proposed model is evaluated on real-world oil production datasets collected from different oilfields using several performance metrics, including Root Mean Square Error (RMSE), Mean Absolute Error (MAE), coefficient of determination (R²), Standard Deviation (Std), and computational time. Moreover, the AOOBL-ANFIS model is compared to several modified ANFIS models, including Particle Swarm Optimization (PSO)-ANFIS, Grey Wolf Optimizer (GWO)-ANFIS, Sine Cosine Algorithm (SCA)-ANFIS, Slime Mold Algorithm (SMA)-ANFIS, and Genetic Algorithm (GA)-ANFIS. Additionally, it is compared to well-known time series forecasting methods, namely, Autoregressive Integrated Moving Average (ARIMA), Long Short-Term Memory (LSTM), Seasonal Autoregressive Integrated Moving Average (SARIMA), and Neural Network (NN). The outcomes verified the high performance of the AOOBL-ANFIS, which outperformed the classic ANFIS model and the compared models.


Introduction
Forecasting oil production is crucial for petroleum engineers to avoid blind expenditure, ensure long-term development, and maintain and monitor the life cycle of the oil reservoir. The reservoir parameters, including permeability, porosity, water saturation, type of crude oil, and reservoir heterogeneity, are the main factors affecting the accuracy of oil production forecasts (Haider 2020). In the oil industry, various traditional approaches are employed to forecast oil production, of which Numerical Reservoir Simulation (NRS) and Decline Curve Analysis (DCA) are the most commonly used (Cumming 2013; Chong et al. 2017; Cancelliere, Verga, and Viberti 2011). However, these traditional approaches have limitations that prevent them from predicting oil production accurately (Nwaobi and Anandarajah 2018). NRS is regarded as a reliable approach compared to the other conventional techniques, but it mainly depends on the accuracy of the static geological model and the quality of history matching in the dynamic model (Hutahaean, Demyanov, and Christie 2016; Al Rassas et al. 2020; Shao, Wu, and Li 2021). Moreover, constructing an accurate 3D geological model is a cumbersome and challenging task (Hutahaean, Demyanov, and Christie 2017; Zhang et al. 2016). The DCA approach (Zhang et al. 2016; Wachtmeister et al. 2017) can predict the hydrocarbon (HC) production rate by assessing long-term HC production data. It employs empirical equations to match the historical production volumes with different models, including exponential, hyperbolic, and harmonic models (Tomomi 2000). The applications of artificial intelligence (AI) in the oil and gas industries have grown dramatically (Alkinani et al. 2019; Dela Torre, Gao, and Macinnis-Ng 2021), specifically in predicting oil production (Ahmadi and Bahadori 2015; Montgomery and O'Sullivan 2017; Liu, Liu, and Gu 2020; Song et al. 2020), predicting petrophysical properties such as porosity and permeability (Erofeev et al. 2019; Ahmadi and Chen 2019), optimizing well placement and oil production (Ahmadi and Bahadori 2015; Nwachukwu et al. 2018), and predicting Pressure-Volume-Temperature (PVT) properties (El-Sebakhy 2009).
Moreover, many studies in the literature have addressed the prediction of oil production. For example, Fan et al. (2021) presented a new model that integrates Autoregressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) to predict oil production. Alalimi et al. (2021) presented an optimized Random Vector Functional Link (RVFL) for time series forecasting for Tahe oilfield, China. Liu, Liu, and Gu (2020) and Song et al. (2020) employed LSTM-based models to predict oil production using historical production datasets. Sagheer and Kotb (2019) introduced an efficient deep learning approach called DLSTM to overcome the limitations of conventional forecasting tools. Zhang and Hu (2021) introduced a new forecasting model using Multivariate Time Series (MTS) and Vector Autoregressive (VAR) to predict the oil production of water flooding reservoirs. Also, Wang, Song, and Li (2018) developed a hybrid forecasting model by employing the Nonlinear Metabolism Gray Model (NMGM) and ARIMA.
The Adaptive Neuro-Fuzzy Inference System (ANFIS) is a well-known technique that has been employed in different applications, including time-series prediction and forecasting applications, such as wind speed prediction (Liu, Tian, and Li 2015), river flow prediction (Belvederesi et al. 2020), air-overpressure prediction (Harandizadeh and Armaghani 2021), and others (Asl, Masomi, and Tajbakhsh 2020; Betiku et al. 2016; Singh et al. 2020). However, the conventional ANFIS model faces some shortcomings in the configuration of its parameters. The configuration process is very important because it has a significant impact on the quality of the solutions as well as on the training process. Thus, applying optimization methods can enhance the configuration process.
In this paper, we propose a modified Aquila Optimizer (AO) (Abualigah et al. 2021) using the Opposition-Based Learning (OBL), called AOOBL, to optimize ANFIS parameters and to boost its forecasting accuracy. In general, the AO algorithm has a high exploitation ability. However, its ability to explore the search space needs more improvements, so we use the OBL. The developed model, AOOBL-ANFIS, is applied to forecast oil production from different oilfields. In the developed AOOBL optimization algorithm, the OBL is employed to enhance the traditional AO algorithm's search process and avoid trapping at local optima. We used real-world oil production datasets from two different countries, China and Yemen, to evaluate the proposed AOOBL-ANFIS model. Additionally, we applied several modified ANFIS models using well-known optimization algorithms to assess the performance of the AOOBL method.
In this study, our main contributions are:
• An efficient time-series forecasting approach, called AOOBL-ANFIS, to forecast oil production based on historical production data.
• A new variant of the AO algorithm based on the OBL technique, which is used to boost the performance of the AO algorithm.
• An enhancement of the traditional ANFIS model through the developed AOOBL algorithm.
The remaining sections of the paper are organized as follows. A number of related works are presented in Section 2. The preliminaries of the ANFIS, AO, and OBL are given in Section 3. The description of the developed AOOBL-ANFIS model is introduced in Section 4. Section 5 presents the evaluation experiments. The conclusion is presented in Section 6.

Different oil production forecasting techniques
In this section, we recap a number of recently proposed methods for forecasting oil production. Singh, Seol, and Myshakin (2021) introduced a new method that could predict gas hydrate saturation ($S_h$) for any well by applying neural networks to well logs such as bulk density, porosity, and compressional wave (P-wave) velocity. The study results revealed that the accuracy of the developed method in predicting $S_h$ was 83%. Al-Shabandar et al. (2021) proposed a new forecasting model to predict oil production using a deep Gated Recurrent Unit (GRU). The employed GRU comprises several hidden layers, each of which has a set of nodes. The proposed model has a simple structure and can handle time series datasets with long intervals. Alalimi et al. (2021) proposed an enhanced version of the Random Vector Functional Link (RVFL) using the Spherical Search Optimizer (SSO) to forecast oil production. This model was evaluated with oil production datasets collected from Tahe oilfield, China. McKenna et al. (2020) studied three different forms of uncertainty, namely facies geometry, reservoir rock heterogeneity, and permeability distribution, to determine their impact on the evaluation and prediction of reservoirs.
Different techniques, including Sequential Gaussian Simulation (SGS), Kriging, and probability-field (p-field), were employed to estimate these uncertainty levels. Liu, Liu, and Gu (2020) introduced a reliable and accurate prediction model of oil production relying on ensemble empirical mode decomposition and LSTM. Negash and Yaw (2020) employed an Artificial Neural Network (ANN) based model to forecast oil production, which involves a physics-based extraction of features for fluid production prediction to enhance the prediction quality. Zanjani, Salam, and Kandara (2020) used three methods to forecast oil production, namely ANN, Linear Regression (LR), and Support Vector Regression (SVR). The results revealed that all three methods achieved acceptable prediction results, with the best results obtained by the ANN method. Abdullayeva and Imamverdiyev (2019) developed an oil forecasting model using a hybrid approach of Convolutional Neural Network (CNN) and LSTM. Fan et al. (2021) developed a hybrid model using ARIMA and LSTM to forecast oil production. Moreover, different methods have been utilized for oil production forecasting, including LSTM (Liu, Liu, and Gu 2019; Sagheer and Kotb 2019), Nonlinear Autoregressive Exogenous Model (NARX) (Heghedus, Shchipanov, and Rong 2019), and Higher-Order Neural Network (HONN) (NC et al. 2013).
However, traditional machine learning and advanced deep learning models generally require large amounts of data for training. In contrast, the ANFIS model can perform well when the size of the data is small.

Applications of ANFIS model in time series forecasting approaches
This section summarizes some of the ANFIS applications in different time series approaches and in the oil industry. Shojaei et al. (2014) applied the ANFIS model to estimate reservoir oil bubble point pressure. They used 750 time-series data points collected from different locations to evaluate two modified versions of the ANFIS. They also compared the ANFIS to several techniques to verify its performance. Yavari et al. (2018) applied the Hareland-Rampersad and the Bourgoyne and Young models with ANFIS for drilling rate prediction. They used datasets from the South Pars gas field, Iran.
Additionally, they compared this approach to several well-known rate of penetration prediction methods and found that the ANFIS-based methods outperformed the others. Kumar (2021) investigated biodiesel production from Karanja oil under different conditions, namely, volume, catalyst, time, oil molar ratio, and microwave power. The ANFIS was applied to the prediction and modeling processes and showed significant performance. Al-Qaness, Abd Elaziz, and Ewees (2018) adopted the Sine Cosine Algorithm (SCA) to optimize the ANFIS parameters for oil consumption forecasting. This model was applied to estimate oil consumption in two countries, the USA and Canada. Abd Elaziz, Ewees, and Alameer (2020) proposed a modified ANFIS method using the genetic algorithm and the salp swarm algorithm, and applied it to predict crude oil prices. There are also different ANFIS applications in the time series prediction field. For example, Pousinho, Mendes, and Catalão (2011) proposed a modified ANFIS model to predict wind speed. They applied the particle swarm optimization algorithm to enhance the ANFIS prediction capability. Mohammadi et al. (2015) used the ANFIS model to estimate daily global radiation.

Preliminaries
The backgrounds of the applied methods, ANFIS model, AO algorithm, and OBL technique are described in detail in this section, as follows.

Adaptive neuro-fuzzy inference system (ANFIS)
Jang (1993) proposed the ANFIS model as a combination of neural networks and fuzzy systems. The fuzzy system is a well-known technique that can be utilized to map prior knowledge into constraint sets. In the ANFIS model, "IF-THEN" rules of the Takagi-Sugeno inference model are used to generate a mapping between the inputs and outputs. Figure 1 shows the basic structure of the ANFIS model. The inputs of Layer 1 are represented by $x$ and $y$, and the output of node $i$ is represented by $O_{1i}$. The ANFIS model can be represented as follows:

$$O_{1i} = \mu_{A_i}(x), \quad i = 1, 2, \qquad O_{1i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \tag{1}$$

$$\mu(x) = \exp\left[-\left(\frac{x - \rho_i}{\alpha_i}\right)^2\right] \tag{2}$$

Here, $\mu$ represents the generalized Gaussian membership function, where $A_i$ and $B_i$ are the membership values of $\mu$. Additionally, the premise parameter set is represented by $\alpha_i$ and $\rho_i$. The output of the second layer can be formulated as:

$$O_{2i} = w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2 \tag{3}$$

The third layer output can be defined as:

$$O_{3i} = \bar{w}_i = \frac{w_i}{\sum_{k=1}^{2} w_k} \tag{4}$$

where $w_i$ is the $i$th output from the second layer. Moreover, the output of the fourth layer is computed as:

$$O_{4i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right) \tag{5}$$

where $f_i$ represents a function that depends on the inputs of the network (i.e., $x$ and $y$) and its parameters; $r_i$, $q_i$, and $p_i$ are the consequent parameters of node $i$. Lastly, the output of the fifth layer is generated using $f_i$ and $\bar{w}_i$ (defined in Equation (4)), as formulated in Equation (6):

$$O_5 = \sum_{i} \bar{w}_i f_i \tag{6}$$
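The five layers described above can be illustrated with a minimal sketch of a forward pass. It assumes a two-input first-order Takagi-Sugeno system with one rule per membership pair and Gaussian membership functions; the function names `gaussian_mf` and `anfis_forward` and the parameter layout are hypothetical and are not taken from the paper.

```python
import math


def gaussian_mf(v, rho, alpha):
    """Gaussian membership degree of v (assumed form: centre rho, width alpha)."""
    return math.exp(-((v - rho) / alpha) ** 2)


def anfis_forward(x, y, premise_a, premise_b, consequents):
    """Forward pass of a small two-input Sugeno-type ANFIS (illustrative only).

    premise_a, premise_b: lists of (rho, alpha) pairs for the sets A_i and B_i.
    consequents: list of (p, q, r) triples, one per rule.
    """
    # Layer 1: membership degrees of each input
    mu_a = [gaussian_mf(x, rho, alpha) for rho, alpha in premise_a]
    mu_b = [gaussian_mf(y, rho, alpha) for rho, alpha in premise_b]
    # Layer 2: firing strength of each rule (product of memberships)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalised firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: each rule's linear consequent, weighted by its normalised strength
    rule_out = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: overall output is the sum of the weighted rule outputs
    return sum(rule_out)
```

In this sketch, the premise pairs and the consequent triples together form the parameter vector that an optimizer would tune.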

Aquila Optimizer (AO)
The basic formulation of the Aquila Optimizer (AO) (Abualigah et al. 2021) is introduced in this section. In general, the AO algorithm simulates the behavior of the Aquila when catching its prey in nature. Similar to other Metaheuristic (MH) techniques, AO is a population-based optimization technique that starts by forming the initial population $X$ of $N$ agents. This process is performed using the following equation:

$$X_{ij} = r_1 \times (UB_j - LB_j) + LB_j, \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, Dim \tag{7}$$

where $UB_j$ and $LB_j$ are the limits of the search domain, $r_1 \in [0, 1]$ is a random value, and $Dim$ is the dimension of the agent.
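As an illustration of the initialization step, the following sketch draws each coordinate of each agent uniformly between its lower and upper limit. The function name `init_population` and the optional `rng` argument are assumptions made for this example.

```python
import random


def init_population(n_agents, lb, ub, rng=None):
    """Create n_agents random agents; lb and ub are per-dimension bounds."""
    rng = rng or random.Random()
    # Each coordinate is r * (UB_j - LB_j) + LB_j with r drawn from [0, 1)
    return [[rng.random() * (u - l) + l for l, u in zip(lb, ub)]
            for _ in range(n_agents)]
```

Passing a seeded `random.Random` instance makes the initial population reproducible across runs.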
The next step of the AO technique is to perform either exploration or exploitation until the optimal solution is found. Following (Abualigah et al. 2021), two strategies are used to conduct each of exploration and exploitation.
The first strategy performs exploration using the best agent $X_b$ and the average of the agents ($X_M$). The mathematical formulation of this strategy is given as:

$$X_i(t+1) = X_b(t) \times \left(1 - \frac{t}{T}\right) + \left(X_M(t) - X_b(t) \times rand\right) \tag{8}$$

$$X_M(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t) \tag{9}$$

In Equation (8), $\left(1 - \frac{t}{T}\right)$ controls the search during the exploration phase. $T$ denotes the maximum number of iterations, and $rand$ refers to a random value between 0 and 1.
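In addition, the second strategy uses the Levy flight distribution ($Levy(D)$) and $X_b$ to update the exploration ability of the solutions, and this strategy is formulated as:

$$X_i(t+1) = X_b(t) \times Levy(D) + X_R(t) + (y - x) \times rand \tag{10}$$

$$Levy(D) = s \times \frac{u \times \sigma}{|\upsilon|^{1/\beta}}, \qquad \sigma = \frac{\Gamma(1 + \beta) \times \sin\left(\frac{\pi \beta}{2}\right)}{\Gamma\left(\frac{1 + \beta}{2}\right) \times \beta \times 2^{\frac{\beta - 1}{2}}} \tag{11}$$

where $s = 0.01$ and $\beta = 1.5$, while $u$ and $\upsilon$ are random values, and $\Gamma$ denotes the gamma function, so that $\sigma$ is a constant for a fixed $\beta$. In Equation (10), $X_R$ is a randomly selected agent. Moreover, $y$ and $x$ are applied to emulate the spiral shape, and they are formulated as:

$$y = r \times \cos(\theta), \qquad x = r \times \sin(\theta) \tag{12}$$

$$r = r_2 + U \times D_1, \qquad \theta = -\omega \times D_1 + \theta_1, \qquad \theta_1 = \frac{3\pi}{2} \tag{13}$$

where $\omega = 0.005$ and $U = 0.00565$. $r_2 \in [0, 20]$ denotes a random value, and $D_1$ refers to the integer numbers from 1 to the length of the search space.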
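The Levy flight term used by the second exploration strategy can be sketched as follows. This is a minimal sketch that assumes the step form used in the original AO description (a scaled ratio of two uniform random values with a gamma-function-based constant); the helper names `levy_sigma` and `levy_step` are hypothetical, and drawing the denominator from (0, 1] to avoid division by zero is a choice made for this example.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Constant sigma of the Levy step for a given beta (assumed AO form)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return num / den


def levy_step(rng, s=0.01, beta=1.5):
    """One Levy-flight step s * u * sigma / |v|^(1/beta)."""
    u = rng.random()          # random value in [0, 1)
    v = 1.0 - rng.random()    # random value in (0, 1], never zero
    return s * u * levy_sigma(beta) / abs(v) ** (1 / beta)
```

With the stated settings, the step is usually small but occasionally large, which is what lets an agent jump away from the current best region.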
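In (Abualigah et al. 2021), the first strategy of the exploitation phase updates the agents depending on $X_b$ and $X_M$, similar to exploration, and it is formulated as:

$$X_i(t+1) = \left(X_b(t) - X_M(t)\right) \times \alpha - rand + \left((UB - LB) \times rand + LB\right) \times \delta \tag{14}$$

where $\alpha$ and $\delta$ represent the exploitation adjustment parameters, and $rand \in [0, 1]$ is a random value. In the second strategy of exploitation, the agent is updated using $X_b$, $Levy$, and the quality function $QF$. The mathematical definition of this strategy is given as:

$$X_i(t+1) = QF \times X_b(t) - \left(G_1 \times X(t) \times rand\right) - G_2 \times Levy(D) + rand \times G_1 \tag{15}$$

$$QF(t) = t^{\frac{2 \times rand() - 1}{(1 - T)^2}} \tag{16}$$

in which $rand()$ refers to a function that generates random values. Additionally, $G_1$ indicates the different motions that are employed for tracking the best individual solution, as in the following equation:

$$G_1 = 2 \times rand() - 1 \tag{17}$$

where $rand()$ indicates a random value. Moreover, $G_2$ indicates values decreasing from 2 to 0, and it is computed as:

$$G_2 = 2 \times \left(1 - \frac{t}{T}\right) \tag{18}$$

Algorithm 1 shows the fundamental steps of the AO algorithm.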
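The control quantities of the second exploitation strategy can be sketched in a few lines. This is a sketch under the assumption that they follow the forms given in the original AO description, with iterations counted from 1 and a maximum iteration count greater than 1; the function names are hypothetical.

```python
import random


def g1(rng):
    """Random motion factor in [-1, 1) used while tracking the best solution."""
    return 2 * rng.random() - 1


def g2(t, max_iter):
    """Factor that decreases linearly from 2 (t = 0) to 0 (t = max_iter)."""
    return 2 * (1 - t / max_iter)


def quality_function(t, max_iter, rng):
    """Quality function QF at iteration t (t >= 1, max_iter > 1 assumed)."""
    exponent = (2 * rng.random() - 1) / (1 - max_iter) ** 2
    return t ** exponent
```

Because the exponent is divided by the squared iteration budget, the quality function stays close to 1 for typical settings.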

Opposition-based learning (OBL)
To formulate the OBL, suppose $X^O$ is the opposite value of a real value $X \in [LB, UB]$. Then, $X^O$ is computed as:

$$X^O = UB + LB - X \tag{19}$$

For a point $X = (X_1, X_2, \ldots, X_D)$ in a $D$-dimensional search space with $X_j \in [LB_j, UB_j]$, $j = 1, 2, \ldots, D$, the opposite point is obtained as in Equation (20):

$$X^O_j = UB_j + LB_j - X_j, \quad j = 1, 2, \ldots, D \tag{20}$$

Moreover, in the optimization process, the solutions $X^O$ and $X$ are both evaluated using the fitness function. Thereafter, the better solution is retained, and the other one is discarded.
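The opposition step and the subsequent selection can be sketched as follows, assuming a minimization problem (a lower fitness is better). The function names `opposite` and `obl_select` are hypothetical.

```python
def opposite(x, lb, ub):
    """Opposite point of x: each coordinate is mirrored within [LB_j, UB_j]."""
    return [u + l - xj for xj, l, u in zip(x, lb, ub)]


def obl_select(x, lb, ub, fitness):
    """Evaluate x and its opposite, and keep whichever has the lower fitness."""
    x_opp = opposite(x, lb, ub)
    return x_opp if fitness(x_opp) < fitness(x) else x
```

For example, with bounds [0, 10] the opposite of 1 is 9, so a solution far from the optimum on one side is paired with a candidate on the other side at the cost of one extra fitness evaluation.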

Proposed AOOBL-ANFIS model
In this section, the developed AOOBL-ANFIS model used to predict oil production is introduced. The parameters of the ANFIS are updated using the modified AOOBL algorithm, which enhances the performance of the traditional AO technique. The first step is to divide the time-series data of oil production into two sets, namely the training and testing sets. The training set contains 70% of all data samples, and the testing set contains the remaining 30%. Moreover, the number of clusters is defined by the Fuzzy C-Means (FCM) to build the ANFIS.
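The data preparation step can be illustrated with a short sketch. The paper states only the 70/30 ratio; turning the series into lagged input windows and splitting chronologically (earliest samples for training) are assumptions made here, and the helper names are hypothetical.

```python
def make_lagged_samples(series, n_lags):
    """Turn a series into (inputs, target) pairs using n_lags previous values."""
    return [(series[i:i + n_lags], series[i + n_lags])
            for i in range(len(series) - n_lags)]


def split_train_test(samples, train_ratio=0.7):
    """Chronological split: the first train_ratio of samples form the training set."""
    cut = round(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]
```

A chronological split keeps the test period strictly after the training period, which avoids leaking future production values into training.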
Then, a set of solutions $X$ is generated, and each of them is used to construct the ANFIS network. The training set is applied to the ANFIS built from $X_i$, and the predicted output ($P$) is computed and evaluated using Equation (21), in which $P$ is the predicted output, $T$ is the real data, $N_a$ is the total number of training samples, and $N_s$ refers to the sample length. Next, the developed AOOBL is employed to update the current population $X$ by using the operators of the AO algorithm and the OBL operator in Equation (19). The OBL technique is only applied in the exploration phase due to its computational cost. After that, the terminal condition is checked, and the updating steps are repeated if the condition is not satisfied. Otherwise, the best configuration $X_b$ is returned. Finally, the testing set is applied to the best configuration $X_b$ by determining the weights between Layers 4 and 5, and the model quality is assessed on the oil production time-series data. The steps of the developed AOOBL-ANFIS are presented in Figure 2.
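The evaluation of a candidate solution can be sketched as a fitness function that builds a model from the candidate and scores it on the training set. Since the exact form of Equation (21) is not reproduced here, this sketch assumes a root mean square error between the real data and the predictions; `build_model` is a hypothetical callback that turns a parameter vector into a predictor.

```python
import math


def rmse_fitness(candidate, build_model, train_inputs, train_targets):
    """Score a candidate parameter vector by its training error (assumed RMSE form)."""
    model = build_model(candidate)  # hypothetical: parameters -> prediction function
    squared = [(t - model(x)) ** 2
               for x, t in zip(train_inputs, train_targets)]
    return math.sqrt(sum(squared) / len(squared))
```

An optimizer then only needs to minimize this value over the candidate vectors, independently of how the predictor is built.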

Sunah oilfield, Yemen
Masila oilfield is situated in the Hadramout region, in the southern part of Yemen, and is considered the most productive onshore oilfield (Lashin, Marta, and Khamis 2016). Block-14 is located in the Masila oilfield with a total area of 1250 km². Block-14 consists of several oilfields, including Tawila, Sunah, N-Sunah, Camaal, N-Camaal, etc. Sunah oilfield is located in the northeast corner of Masila oilfield. Sunah oilfield is the second-largest oilfield in Block-14, and it is subdivided into three reservoirs, namely S1, S2, and S3. Moreover, S1 is a sandstone reservoir, and it consists of three main reservoir units, namely S1A, S1B, and S1C; S1A is the target area of this study (Hakimi et al. 2017; Al-Areeq and Maky 2015).

Performance metrics
Four evaluation metrics are employed in this study, as shown in Table 1, namely, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Standard Deviation (Std), and Coefficient of Determination (R²). In Table 1, $\bar{Y}$ is the mean of $Y$, and $Y$ and $P_y$ are the actual output and its predicted value, respectively.
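As Table 1 is not reproduced here, the following sketch uses the standard textbook definitions of MAE, RMSE, and R², which are assumed to match the paper's; Std is omitted because the quantity over which it is computed is not stated. The function names are hypothetical.

```python
import math


def mae(y, p):
    """Mean absolute error between actual values y and predictions p."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    """Root mean square error between actual values y and predictions p."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def r_squared(y, p):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit to the production data.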

Masila oilfields, Yemen
First, we evaluate the proposed AOOBL-ANFIS using the production datasets of Masila oilfields, Yemen. We compare the proposed AOOBL-ANFIS with several models, including the conventional ANFIS and several modified ANFIS models based on well-known optimization algorithms, namely the traditional AO algorithm, Particle Swarm Optimization (PSO), Sine Cosine Algorithm (SCA), Genetic Algorithm (GA), Grey Wolf Optimizer (GWO), and Slime Mold Algorithm (SMA). Table 2 records the results of all models in terms of RMSE, MAE, Std, R², and the computation time. The AOOBL-ANFIS achieved the best RMSE value, followed by AO-ANFIS, GA-ANFIS, PSO-ANFIS, SMA-ANFIS, GWO-ANFIS, SCA-ANFIS, and the conventional ANFIS model, respectively. The AOOBL-ANFIS also obtained the best MAE value, with AO-ANFIS in the second rank. The other models came in the following order: GA-ANFIS, PSO-ANFIS, SMA-ANFIS, GWO-ANFIS, SCA-ANFIS, and the conventional ANFIS model. Moreover, the AOOBL-ANFIS achieved the best R² value of 0.957, and three models shared the second rank, namely AO-ANFIS, GA-ANFIS, and PSO-ANFIS. The SMA-ANFIS obtained the fourth rank, followed by GWO-ANFIS, SCA-ANFIS, and the conventional ANFIS model. Also, the AOOBL-ANFIS came in the first rank in terms of Std, followed by AO-ANFIS, GA-ANFIS, PSO-ANFIS, the conventional ANFIS model, GWO-ANFIS, SMA-ANFIS, and SCA-ANFIS. Furthermore, the AOOBL-ANFIS outperformed the other models in terms of computation time. Figure 3 shows the oil production prediction results of the AOOBL-ANFIS against the other compared models. It is clear that the AOOBL-ANFIS is better than the other models, with values nearest to the original data.

Results of Tahe oilfield
In this section, we evaluate the AOOBL-ANFIS using oil production data for 10 wells in Tahe oilfield, China. Table 3 tabulates the results of all compared models in terms of RMSE. The proposed AOOBL-ANFIS outperformed the other compared models in terms of RMSE in nine wells, whereas in Well 10 the PSO-ANFIS obtained the first rank.
In terms of MAE, as illustrated in Table 4, the AOOBL-ANFIS also obtained the best (smallest) values in nine of the ten wells. The PSO-ANFIS obtained the best MAE value in the remaining well. The AO-ANFIS came in the second rank in the average MAE values over all wells. Additionally, Table 5 shows the R² values of all compared models in all 10 wells. As shown in this table, the AOOBL-ANFIS obtained the best R² in nine of the ten wells, whereas the PSO-ANFIS obtained the best R² value for one well (Well 10).
Additionally, Figure 4 shows the prediction results of the AOOBL-ANFIS against the compared models. As noticed from this figure, the proposed AOOBL-ANFIS achieved the values nearest to the target data.

Statistical tests
For further assessment of the proposed AOOBL-ANFIS, we performed the Friedman test to highlight the differences between the proposed model and the other compared models. The Friedman test is a non-parametric test that is widely applied to detect differences between methods over multiple test runs. It ranks the methods and provides rank values for them, which can help determine the effectiveness of the proposed method. Table 6 lists the results of the Friedman test for Sunah oilfield data. As shown in the table, the AOOBL-ANFIS obtained the best results. Additionally, Table 7 lists the results of the Friedman test for Tahe oilfield data. Here, the AOOBL-ANFIS recorded the best results in nine wells, whereas the PSO-ANFIS obtained the best results in Well 10.
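The ranking behind the Friedman test can be sketched as follows. This is a minimal sketch that assumes error-type scores (lower is better), averages the ranks of tied values, and uses the basic chi-square form of the statistic without a tie correction; the function names and the `scores[run][method]` layout are hypothetical.

```python
def mean_ranks(scores):
    """Mean rank of each method; scores[run][method], lower score = better rank."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda j: row[j])
        i = 0
        while i < n_methods:
            k = i
            # Extend the group while the next sorted value ties with the current one
            while k + 1 < n_methods and row[order[k + 1]] == row[order[i]]:
                k += 1
            avg_rank = (i + k) / 2 + 1  # average of the 1-based ranks i+1 .. k+1
            for j in order[i:k + 1]:
                totals[j] += avg_rank
            i = k + 1
    return [t / len(scores) for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic computed from the mean ranks (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = mean_ranks(scores)
    return 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in ranks)
```

The method with the smallest mean rank is the one that performed best most consistently across the runs.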

Discussion
In this section, we present further discussion to elaborate on the performance of the developed AOOBL-ANFIS. For example, Figures 5 and 6 show the spot plots of all compared methods for Sunah oilfield, Yemen, and Well 1 of Tahe oilfield, China, respectively. It is clear that the AOOBL performs significantly better than the other optimizers.
Moreover, we compare the developed AOOBL-ANFIS to several well-known methods used for time series forecasting in literature, namely, ARIMA, LSTM, Seasonal Autoregressive Integrated Moving Average (SARIMA), and Neural Network (NN). Table 8 shows the comparison results for all compared models for Sunah oilfield datasets. It is clear that the proposed AOOBL-ANFIS model obtained the best results in terms of RMSE, MAE, and R 2 . The ARIMA model came in the second rank, followed by SARIMA, LSTM, and NN, respectively. Table 9 displays the results of the compared time series methods for Well 1 in Tahe oilfields datasets. Also, the results in the table indicate that the developed AOOBL-ANFIS obtained the best performance in all measures, RMSE, MAE, and R 2 . The NN came in the second rank, followed by SARIMA, LSTM, and ARIMA, respectively.
In summary, according to the evaluation experiments, we conclude that the application of the AOOBL has a significant impact on the performance of the conventional ANFIS model. Moreover, the OBL also enhanced the performance of the conventional AO algorithm. However, the ratio of solutions that will be updated using the OBL is a critical parameter, and it causes an increase in the time complexity of the developed method.

Conclusions
This study proposed a modified ANFIS model as a time-series forecasting approach for oil production. The traditional ANFIS was improved using an enhanced version of the Aquila Optimizer (AO) based on the OBL technique. The developed model, AOOBL-ANFIS, was evaluated with different real-world oil production datasets collected from Masila oilfield (Yemen) and Tahe oilfield (China). It was compared to the conventional ANFIS model and to a modified ANFIS model using the conventional AO algorithm (AO-ANFIS), in addition to several other modified ANFIS models, namely, PSO-ANFIS, SMA-ANFIS, GA-ANFIS, SCA-ANFIS, and GWO-ANFIS. Moreover, it was compared to other well-known models, such as ARIMA, SARIMA, LSTM, and NN. We applied several performance evaluation metrics, including RMSE, MAE, Std, and computational time, to assess the performance of the AOOBL-ANFIS and the compared models. Experimental results have verified the outstanding performance of the developed AOOBL-ANFIS. We concluded that the AOOBL-ANFIS significantly improved on the conventional ANFIS performance. Additionally, we concluded that the OBL has a significant impact on the performance of the AOOBL-ANFIS compared to the conventional AO that was applied to modify the ANFIS model (AO-ANFIS), since the OBL boosted the search process of the conventional AO algorithm. Given the significant performance of the AOOBL-ANFIS, it could be utilized in other time-series forecasting applications. Also, the AOOBL optimization method could be employed in other optimization tasks, such as image processing, cloud and fog computing, and others.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.