Development of modified cooperative particle swarm optimization with inertia weight for feature selection

Abstract The article presents a modified Cooperative Particle Swarm Optimization with Inertia Weight (CPSOIW) algorithm for the Smart-technology of forecasting and control of complex objects. The software "CPSOIW (Cooperative Particle Swarm Optimization with Inertia Weight)", based on the modified CPSOIW algorithm, has been developed in the Python programming language and is used to process multidimensional data and to create an optimal set of descriptors. The proposed algorithm combines the advantages of the inertia weight particle swarm optimization (IWPSO) algorithm and the cooperative particle swarm optimization (CPSO) algorithm. The IWPSO algorithm avoids early convergence and prevents particles from being trapped in local optima by updating the inertia weight at each iteration. The CPSO algorithm explores the search space efficiently and in more detail in real time through parallel computation of subswarms. The modelling results and a comparative analysis of the CPSOIW and IWPSO algorithms have been obtained on benchmark datasets and on real production data from Installation 300 of the Tengizchevroil oil and gas company.



G. Samigulina
ABOUT THE AUTHOR G. Samigulina is a doctor of technical sciences, Academician MAIN, head of the lab. "Intellectual control systems and forecasting" of the Institute of Information and Computational Technologies, Kazakhstan. She developed the immune-network technology for developing intelligent systems of prediction and control of complex objects under conditions of parameter uncertainty. Her research interests are intellectual systems of industrial automation, Smart-distance learning systems for people with visual disabilities, molecular design of drugs with desired properties, forecasting the risks of complex investment projects, etc.
Massimkanova Zhazira is a second-year PhD student in Information Systems at Al-Farabi Kazakh National University and a junior researcher at the Institute of Information and Computational Technologies, Kazakhstan. Her research interests are swarm intelligence algorithms.

PUBLIC INTEREST STATEMENT
Nowadays, leading oil and gas companies successfully use intelligent systems to control oil production processes, optimize production, diagnose the technical state of industrial equipment and raise alerts about emergency situations. One of the main problems of intelligent systems is the processing and analysis of large amounts of data in real time. Swarm intelligence algorithms are currently being actively developed and are able to search for optimal solutions quickly and in detail. The proposed cooperative particle swarm optimization with inertia weight algorithm combines the advantages of two algorithms. The inertia weight particle swarm optimization algorithm avoids early convergence and prevents particles from being trapped in local optima by updating the inertia weight at each iteration. The cooperative particle swarm optimization algorithm explores the search space in real time efficiently and in more detail through parallel computation of subswarms.

Introduction
Recently, the application of innovative Smart technologies for forecasting and control of complex nonlinear dynamic objects with parameter uncertainties, based on intelligent methods, has become a relevant task. These technologies make it possible to detect emergencies of industrial equipment, to forecast and control production processes, and to increase the efficiency and performance of equipment, which helps to reduce production costs (Andreadisa et al., 2014). Honeywell Experion PKS (Process Knowledge System) distributed control systems are widely used in modern industrial automation systems. The application of modern artificial intelligence methods to analyze and forecast the states of complex industrial objects on the basis of real production data is relevant and provides highly efficient control in emergency situations. For example, in the work by (Sayda, 2011), intelligent methods are successfully used to forecast the states of complex industrial objects and to provide timely notification of emergencies. In leading oil companies, these methods are successfully used to control oil production facilities. The research of (2014) considers the issues of ensuring continuous data collection and real-time monitoring of assets and environmental conditions using thousands of sensors in subsurface wells and surface facilities, weather services, and others. Processing and analyzing the obtained data and mapping changes in the reservoir over time make it possible to increase oil and gas production, to optimize expenses, to reduce the impact of environmental risks, and to ensure a safe oil production process.
Many applications demonstrate the prospects of using swarm intelligence algorithms for preliminary data processing and feature selection. The existing swarm intelligence algorithms are not universal. Depending on the type and volume of the data used, the development of efficient modified algorithms that provide parallel processing in real time is a relevant task. For example, the paper by (Shunmugapriya & Kanmani, 2017) proposes a novel hybrid algorithm based on ant colony and bee colony algorithms for feature selection and classification. The evaluation of the proposed algorithm on 13 benchmark datasets shows the efficiency of its application. The article by (Ghaedi et al., 2014) presents an efficient ACO algorithm to optimize gas allocation to a group of wells in a gas lift. In the article by (Abramov, 2016), a bee colony algorithm is proposed for placing objects in oil production. The algorithm describes the decentralized behavior of intelligent agents, which form a self-organizing system. The paper by (Pradana et al., 2014) is devoted to the joint use of the Binary Particle Swarm Optimization algorithm and a decision tree for multidimensional data processing and feature selection. The experimental results on 11 datasets show that the proposed method is more efficient than the naive Bayesian classifier and the support vector machine. In the paper by (Teng et al., 2017), v-shaped binary particle swarm optimization is used for adaptive feature selection. The modelling outputs show that the proposed algorithm optimizes the feature subsets. In the study by (Najafzadeh et al., 2017), the neuro-fuzzy group method of data handling (NF-GMDH) is implemented using PSO, the gravitational search algorithm (GSA) and the genetic algorithm (GA) to evaluate the pier scour depth. The testing results show the efficiency of the NF-GMDH-PSO model in comparison with the NF-GMDH-GA and NF-GMDH-GSA models.
The article by (Gholamian & Meybodi, 2014) describes an enhanced comprehensive learning cooperative particle swarm optimization with fuzzy inertia weight (ECLCFPSO-IW) to solve the problem of early convergence and to avoid local optima. The paper by (Umapathy et al., 2010) presents a PSO algorithm with three different inertia weights for the optimal power flow solution. Algorithms with a constant inertia weight, a time-varying inertia weight and a global-local best inertia weight are considered in order to analyze the impact of the inertia weight on the convergence of the PSO algorithm. The paper by (Saleem et al., 2019) presents the niche based bat algorithm (NBBA) for finding relevant subsets. In this algorithm, the random walks of bats help to avoid local minima and enhance the diversity of solutions in the search space. In the research by (Tatwani & Kumar, 2019), a master-slave parallel genetic algorithm is used for feature selection in high-dimensional datasets. The experiments are performed on three high-dimensional gene expression datasets. The modelling outputs show that the dimensionality and the execution time are significantly reduced.
The article by (Han & Bian, 2018) describes a combination of the PSO algorithm and the support vector machine (SVM) to determine the oil recovery factor in a low-permeability reservoir. The accuracy and efficiency of the proposed model are evaluated using 34 datasets. The modelling outputs show a low average absolute percentage error. The paper by (Li et al., 2016) proposes a hybrid system based on the PSO algorithm and a neural network for power transformer fault diagnosis. The modelling results show a good performance of the proposed algorithm. The paper by (Wu et al., 2015) describes Multi-Agent Particle Swarm Optimization (MAPSO) for power system economic load dispatch. The experimental results demonstrate that the MAPSO algorithm surpasses the Interactive Genetic Algorithm and the PSO algorithm in accuracy and convergence speed regardless of the data size. In the article by (Wang et al., 2012), an integrated model based on the PSO algorithm and a multiagent approach is used for optimal resource allocation. The paper by (Lindegaard Mikkelsen et al., 2013) presents a Multi-layered Multi-Agent System (MMAS) based on the PSO algorithm for the allocation of production resources. The paper by (Shangxiong, 2008) proposes a parallel PSO algorithm based on a multiagent cooperative algorithm. The system structure consists of several compute units, each of which is a standard PSO algorithm running in parallel. At each iteration, the local best values of the subswarms are transferred to the main swarm. The simulation results show that a distributed structure can increase the efficiency of the system and the calculation speed. Thus, the literature review confirms the relevance of using modified swarm intelligence algorithms to solve feature selection problems and to create an optimal dataset. The most important advantage of the PSO algorithm, compared to other swarm intelligence algorithms, is its fast convergence to the global optimum (Amoshahy et al., 2016).
The proposed modified Cooperative Particle Swarm Optimization with Inertia Weight (CPSOIW) combines the advantages of the Inertia Weight Particle Swarm Optimization (IWPSO) algorithm (Maca & Pech, 2015) and the Cooperative Particle Swarm Optimization (CPSO) algorithm (Xia et al., 2018). The standard PSO algorithm uses a constant value of the inertia weight, which leads to premature convergence. In order to preserve the diversity of solutions and to prevent particles from being trapped in local optima, the IWPSO algorithm was proposed in the paper by (Technological regulations, 2016); in this algorithm the inertia weight varies from iteration to iteration. When developing systems for forecasting and control of complex objects, the analysis of large amounts of data in real time and the calculation time of the algorithm are important. Since the CPSO algorithm reduces the calculation time through parallel processing of subswarms, it is advisable to combine the advantages of the two algorithms in the modified CPSOIW algorithm.
The article is structured as follows: Section 2 presents the problem statement of the research. Section 3 is devoted to the description of the solution methods and algorithms. Section 4 presents the modelling results of the developed algorithm and a comparative analysis. The conclusion and references are given at the end of the article.

Problem statement
The problem statement is formulated as follows: it is necessary to develop a modified Cooperative Particle Swarm Optimization with Inertia Weight (CPSOIW) algorithm for feature selection, based on the Smart-technology of forecasting and control of complex objects, using real production data from the Tengizchevroil oil and gas company.
The research has been performed using real production data of the Tengizchevroil oil and gas company, the largest industrial enterprise in Kazakhstan. The enterprise consists of several complex technological lines and installations for various purposes. As a complex object, we consider Installation 300, which is designed for cleaning petroleum gases from acidic components (Technological regulations, 2016).

Solution methods and algorithms
The modified CPSOIW algorithm, based on the Inertia Weight Particle Swarm Optimization (IWPSO) and Cooperative Particle Swarm Optimization (CPSO) algorithms, has been developed. In the CPSO algorithm (Luo & Gao, 2019), the population of agents is divided into one main swarm and several subswarms. The agents of the subswarms focus on an intensive local search and on accelerating the convergence speed. The agents of the main swarm focus on a global search and on maintaining the variety of solutions. Information sharing and rapid adjustment of strategies based on their own experience allow agents to converge quickly to a local solution. In the CPSO algorithm, the inertia weight is constant across iterations, which leads to early convergence. In order to prevent particles from being trapped in local optima and to accelerate the particles' velocity, the inertia weight varies from iteration to iteration in the IWPSO algorithm. Figure 1 shows the agent cooperation, in which each subswarm is an IWPSO algorithm.

Modified CPSOIW algorithm:
(1) Initialization of the parameters $c_1$, $c_2$, $w_{max}$, $w_{min}$, $t_{max}$, where $c_1$ and $c_2$ are the acceleration coefficients, $w_{max}$ is the start value of the inertia weight, $w_{min}$ is the end value of the inertia weight, and $t_{max}$ is the maximum number of iterations.
(2) Initialization of the search space. Generation of the population of agents in D-dimensional space. Random generation of the initial positions ($x_i$) and velocities ($v_i$) of the particles of the main swarm M and two subswarms S, where $i = 1, 2, \ldots, n$.
(3) Evaluation of the fitness function of the S subswarms, $f(x_i^s(t+1))$, and determination of the best values of the fitness function, $f(p_g^s)$. The CFS (correlation-based feature selector) algorithm (Hall, 1999) is used as the fitness function, given by Eq. 1:

$$f_s = \frac{k \, r_{sf}}{\sqrt{k + k(k-1) \, r_{ff}}} \qquad (1)$$

where $s$ is a subset consisting of $k$ descriptors, $r_{sf}$ is the mean "descriptor-class" correlation ($f \in s$), and $r_{ff}$ is the mean "descriptor-descriptor" correlation.
(6) Update of the velocity $v_i^M$ of the main swarm M by Eq. 2, where $t$ is the iteration, $w$ is the inertia weight, $r_1$, $r_2$, $r_3$ are random variables distributed on [0, 1], $p_g^M$ is the best value of the main swarm, $p_g^s$ is the best value of the subswarms, and $x_i^M$ is the position of the main swarm.
(7) Update of the position $x_i^M$ of the particles of the main swarm M by Eq. 3.
(8) Calculation of the inertia weight $w$ for the whole swarm. The inertia weight varies from iteration to iteration and is determined by Eq. 4.
(9) Checking of the stop condition $t < t_{max}$.
(10) Execution of steps 3-9 until the end of iterations.
(11) Creation of an optimal set of descriptors based on the best values of the particles of the main swarm M for further use in forecasting and control of complex objects.
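The CFS fitness function of Eq. 1 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `pearson` and `cfs_merit` are hypothetical, and the use of the absolute Pearson correlation for both the "descriptor-class" and "descriptor-descriptor" correlations is an assumption made here for numeric data.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov / math.sqrt(var_a * var_b))


def cfs_merit(columns, target, subset):
    """Eq. 1: f_s = k * r_sf / sqrt(k + k * (k - 1) * r_ff).

    columns: list of descriptor columns (assumed layout, one list per descriptor)
    target:  numeric class labels, one per instance
    subset:  indices of the k descriptors in the candidate subset s
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # mean "descriptor-class" correlation over the subset
    r_sf = sum(pearson(columns[j], target) for j in subset) / k
    # mean "descriptor-descriptor" correlation over all pairs in the subset
    pairs = list(combinations(subset, 2))
    if pairs:
        r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_sf / math.sqrt(k + k * (k - 1) * r_ff)
```

With this form, a subset scores higher when its descriptors correlate strongly with the class and weakly with each other, which is why adding a redundant or weakly relevant descriptor does not raise the value.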
The flow chart of modified CPSOIW algorithm is presented in Figure 2.
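The cooperative loop can be sketched in Python under assumed textbook forms for Eqs. 2-4. Everything below is an assumption rather than the authors' software: the inertia weight is taken to decrease linearly from $w_{max}$ to $w_{min}$, the main swarm is pulled towards each particle's personal best, the main-swarm best and the best subswarm solution, and the coefficient `c3`, the value 1.0 for all coefficients, the function names and the continuous toy objective are hypothetical. Subswarms are processed sequentially here, not in parallel, and the mapping from positions to descriptor subsets is left out.

```python
import random


def inertia_weight(t, t_max, w_max=0.9, w_min=0.33):
    """Assumed linear schedule: w_max at t = 0, w_min at t = t_max."""
    return w_max - (w_max - w_min) * t / t_max


def make_swarm(n, dim, rng):
    """Random initial positions in [-1, 1]^dim with zero initial velocities."""
    swarm = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        swarm.append({"x": x, "v": [0.0] * dim,
                      "best_x": x[:], "best_f": float("-inf")})
    return swarm


def evaluate(swarm, fitness):
    """Refresh personal bests; return the swarm's best (fitness, position)."""
    for p in swarm:
        f = fitness(p["x"])
        if f > p["best_f"]:
            p["best_f"], p["best_x"] = f, p["x"][:]
    best = max(swarm, key=lambda p: p["best_f"])
    return best["best_f"], best["best_x"][:]


def step(swarm, w, attractors, coeffs, rng):
    """One velocity and position update for every particle of one swarm.

    attractors: best positions shared by the whole swarm
    coeffs: coefficient for the personal best, then one per attractor
    """
    for p in swarm:
        targets = [p["best_x"]] + attractors
        for d in range(len(p["x"])):
            pull = sum(c * rng.random() * (tgt[d] - p["x"][d])
                       for c, tgt in zip(coeffs, targets))
            p["v"][d] = w * p["v"][d] + pull
            p["x"][d] += p["v"][d]


def cpsoiw_sketch(fitness, dim, n=20, n_sub=2, t_max=100,
                  c1=1.0, c2=1.0, c3=1.0, seed=0):
    """Maximize `fitness` with one main swarm and n_sub subswarms."""
    rng = random.Random(seed)
    main = make_swarm(n, dim, rng)
    subs = [make_swarm(n, dim, rng) for _ in range(n_sub)]
    main_best = evaluate(main, fitness)
    sub_bests = [evaluate(s, fitness) for s in subs]
    for t in range(t_max):
        w = inertia_weight(t, t_max)
        # each subswarm behaves as an independent inertia-weight PSO
        for i, s in enumerate(subs):
            step(s, w, [sub_bests[i][1]], [c1, c2], rng)
            sub_bests[i] = evaluate(s, fitness)
        # the main swarm also follows the best solution found by the subswarms
        best_sub = max(sub_bests, key=lambda b: b[0])
        step(main, w, [main_best[1], best_sub[1]], [c1, c2, c3], rng)
        main_best = evaluate(main, fitness)
    return main_best
```

A call such as `cpsoiw_sketch(lambda x: -sum(v * v for v in x), dim=3)` returns the best fitness and position found by the main swarm for a simple toy objective whose maximum is at the origin.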

Modelling results
The software "CPSOIW (Cooperative Particle Swarm Optimization with Inertia Weight)" has been developed in Python programming language for feature selection.
The efficiency of the proposed modified CPSOIW algorithm has been tested using benchmark datasets from UCI (UCI Machine Learning Repository). In the repository, the datasets are stored in .arff format. The datasets contain features and instances. The following datasets are used: 1. The dataset "Sonar" contains 208 instances and 61 descriptors in the range from 0.0 to 1.0. The data is classified into 2 classes, "R" (rock) and "M" (metal cylinder). The proposed CPSOIW algorithm is compared with the IWPSO algorithm. The parameters used in the comparison are shown in Table 1.

The modelling results of the CPSOIW algorithm on the "Sonar" dataset are shown in Figure 3. The software interface consists of a database connection button, an input field for parameters, a panel for displaying data from the database and a panel for outputs.
The selection of informative descriptors based on the IWPSO algorithm is performed using the WEKA software (Samigulina & Massimkanova, 2019), which is widely used for machine learning and data mining. The software is written in the Java programming language and is distributed under the GNU General Public License. The comparison of the modified CPSOIW algorithm and the IWPSO algorithm is shown in Table 2.

Table 1
Parameters/Descriptions                          Values
The number of particles in swarm ($P$)           20
The end of inertia weight ($w_{min}$)            0.33
The start of inertia weight ($w_{max}$)          0.9
Acceleration coefficient ($c_1$)                 0.33
The maximum number of iterations ($t_{max}$)     100

The relevance of the selected descriptors is estimated by the value of the fitness function. A value close to one is considered the best value of the fitness function. In modelling the first dataset with the CPSOIW algorithm, 13 informative descriptors out of 61 have been chosen. The modelling outputs of the IWPSO algorithm show that 17 informative descriptors out of 61 have been selected. The fitness-function value of the CPSOIW algorithm is higher than that of the IWPSO algorithm. The modelling results of the second, third and fourth datasets with the CPSOIW algorithm show that 6, 4, and 7 informative descriptors out of 19, 17, and 19 descriptors, respectively, have been chosen.
The modified CPSOIW algorithm has been tested on real production data from Installation 300 of the Tengizchevroil oil and gas company (Maca & Pech, 2015). Installation 300 (I300) consists of the following main units: high-pressure gas cleaning, medium-pressure gas cleaning, diethanolamine regeneration, amine filtration, chemical supply, etc. The normal functioning of I300 is supported by control and measuring devices such as the LIC31002 buoy level gauge, the FT31005 difference converter, the TT31020 temperature converter, etc. As a database, we use the daily measurements from 19 sensors of Installation 300. Data from the sensors is recorded every 4.5 min during the day. The dimension of the database is 19×799, that is, 15,181 instances (Table 3).
When forecasting the technical state of I300, the parameters of the complex object are classified into normal, boundary and emergency operation modes of I300. Table 4 shows the comparison of the CPSOIW and IWPSO algorithms on real production data from I300.
The modelling outputs of the CPSOIW algorithm show that the best value is achieved in a short time due to parallel computation. Nine informative descriptors out of 19 have been selected. The computation time of the CPSOIW algorithm is shorter than that of the IWPSO algorithm.
The value of fitness-function: 0.397, 0.813, 0.837, 0.533

Conclusion
The developed modified cooperative particle swarm optimization with inertia weight makes it possible to preprocess multidimensional data and to select informative descriptors for the Smart-technology of forecasting and control of complex objects. The developed algorithm combines the advantages of cooperative particle swarm optimization and inertia weight particle swarm optimization. The cooperative algorithm explores a multidimensional space in detail and improves the preprocessing speed due to parallel computing. The inertia weight particle swarm optimization algorithm updates the inertia weight at each iteration and improves the variety of solutions. The Smart-technology of forecasting and control of complex objects makes it possible to detect emergency situations of industrial equipment and to react quickly to changes in the operating mode of installations, which significantly increases the efficiency and performance of equipment.

Table 4. Comparison of the CPSOIW and IWPSO algorithms on real production data from I300
                                         CPSOIW    IWPSO
The best value of fitness-function       0.78      0.61
The number of selected descriptors       9         10
Calculation time                         2 ms      5 ms