Hessian-regularized weighted multi-view canonical correlation analysis for working condition recognition of sucker-rod pumping wells

ABSTRACT Large volumes of data are collected during oil production with sucker-rod pumping wells, and recognizing the working condition of these wells more accurately helps reduce cost and increase profit. In the context of big data and the Internet of Things (IoT) in oil and gas production, this paper proposes a novel working condition recognition method based on Hessian-regularized weighted multi-view canonical correlation analysis. The method addresses limitations of existing working condition recognition research and aims to improve recognition accuracy and practicality with fewer labelled working condition samples by making effective use of measured parameters from multiple information sources. First, features of the measured ground dynamometer cards, electrical power, wellhead temperature and wellhead pressure data are extracted as four different feature views based on prior information, empirical knowledge and mechanism analysis. Then a model based on Hessian-regularized weighted multi-view canonical correlation analysis and the cosine nearest neighbour multi-classification algorithm is established. The proposed method is applied to the recognition of eleven kinds of working conditions from sixty sucker-rod pumping wells in a block of Shengli Oilfield, China. With a small number of labelled training samples and the cosine nearest neighbour classification method, the recognition rates increase by 3.44% and 1.5% compared with traditional recognition methods based on measured ground dynamometer cards and electrical power data, respectively. Compared with methods based on traditional multi-source feature connection, multi-view canonical correlation analysis and unweighted Hessian-regularized multi-view canonical correlation analysis, the recognition rates increase by 4.46%, 2.21% and 1.62%, respectively.


Introduction
In oil exploitation and production, the sucker-rod pumping system has long held a dominant position. However, affected over the years by many factors, such as unpredictable geological structure and the facilities used, the working condition of a sucker-rod pumping system is usually complicated and unstable, which causes frequent oil well faults and large drops in output and profit. Therefore, recognizing the working condition of sucker-rod pumping wells accurately and in a timely manner is of great significance for improving the efficiency and productivity of oil wells.
CONTACT Yanjiang Wang yjwang@upc.edu.cn
At present, working condition recognition methods based on dynamometer cards are relatively popular and widely employed. They combine indirectly measured dynamometer cards (Zhang & Tang, 2008), pump dynamometer cards (Li, Gao, Tian, & Qiu, 2013; Liu, Luo, Lu, Fan, & Liu, 2013) or measured ground dynamometer cards (Wu, Sun, & Wei, 2011; Xu, Xu, & Yin, 2007) with artificial intelligence algorithms. Recognition methods based on electrical parameters mainly use measured electrical parameters or electric power cards (Chen, 2016; Sun, 2011). Methods based on multi-source data are rarely employed and mainly use pump dynamometer cards together with some production information, such as liquid-producing capacity and well condition data (Wang, 2010). However, the above research still shows the following limitations. First, most working condition recognition methods use only a single information source, which can easily trigger false alarms. For example, the working condition characteristics of assist-blowing and rod cutting are similar in dynamometer cards (Hu, Yi, & Tian, 2008), while those of valve leakage and valve failure are similar in electric power cards (Sun, 2011). It is therefore difficult to identify these working conditions accurately through dynamometer cards or electric power cards alone. Second, the existing working condition recognition methods based on multi-source information need further improvement in recognition performance and model robustness. On the one hand, methods based on the traditional multi-feature connection mode, such as the traditional SVM (Support Vector Machine) and neural networks, have limited room for further improvement (Wang, Liu, Xiong, & Tang, 2011).
On the other hand, because the initial data collected from multiple information sources, such as production and statistics data, are unreliable, the existing working condition recognition models are prone to instability. Third, most working condition recognition methods require a large number of labelled training samples. In real projects, labelled working condition samples are difficult and very costly to obtain, while recognition methods based on unlabelled training samples usually fall short in recognition accuracy (Li, Gao, Zhou, & Han, 2015; Liang, 2015). Fourth, affected by the damping coefficient and 'division by zero', working condition recognition methods based on pump dynamometer cards or electric power cards often produce errors when calculating the features (Liang, 2015; Sun, 2011). In addition, some feature extraction methods for identifying the working condition require many complicated calculations (He et al., 2008; Reges, Schnitman, Reis, & Mota, 2015). All of the above problems limit the practical application of the existing working condition recognition methods.
With the development of big data and the IoT in oil and gas production, massive real-time multi-source data, such as measured ground dynamometer cards, electrical power, wellhead temperature and wellhead pressure data, have been collected and stored in the oil recovery and production system of sucker-rod pumping wells. To overcome the above limitations of existing working condition recognition research, it is of important scientific and practical value to explore how to use these multiple measured parameters effectively, how to fuse the features of the multiple information sources sufficiently, and how to further improve recognition accuracy and practicality with the limited and valuable labelled working condition samples available.
Multi-view canonical correlation analysis (MCCA) (Kettenring, 1971; Vía, Santamaría, & Pérez, 2007; Wang, Zhou, Liu, & Zhang, 2017) is one of the most attractive paradigms for multi-view feature extraction and fusion learning. The key idea of MCCA is to find a common subspace in which the correlation between the low-dimensional embeddings of any two views is maximized. The MCCA learning method makes effective use of the features extracted from an object with multiple views. The features of different views are fused by maximizing the correlation between any two views, which improves accuracy over methods based on single-view feature learning and traditional multi-source data processing. Research has shown that learning methods combining multi-view features with manifold information can further improve recognition results (Iosifidis, Tefas, & Pitas, 2013; Liu, Li, Lin, Tao, & Wang, 2014; Liu, Liu, Tao, Wang, & Lu, 2015). In particular, the learning method that combines Hessian regularization, a form of manifold regularization, with multi-view canonical correlation analysis can significantly improve classification results with fewer labelled samples (Liu, Yang, Tao, Cheng, & Tang, 2018). Furthermore, research has shown that recognition learning methods based on multi-view canonical correlation analysis can further improve classification results by weighting the multiple views (Cai, Wang, Peng, & Qiao, 2014; Eleftheriadis, Rudovic, & Pantic, 2015).
Based on the discussion above, this paper proposes a novel working condition recognition method based on Hessian-regularized weighted multi-view canonical correlation analysis. First, the measured ground dynamometer cards, electrical power, wellhead temperature and wellhead pressure signals are chosen as four different feature views, and the features of each view are extracted according to mechanism analysis, prior information and empirical knowledge. Then, a working condition recognition model of sucker-rod pumping wells is established with the Hessian-regularized weighted multi-view canonical correlation analysis algorithm and the cosine nearest neighbour multi-classification algorithm. Finally, the proposed method is applied to recognize 11 kinds of typical working conditions in a block of Shengli Oilfield, China. Experimental results show that the proposed method achieves better recognition accuracy when few labelled training samples are available (e.g. 15% or fewer), and is therefore better suited to wider practical application.

Hessian regularization
Among nonlinear manifold regularizations, Hessian regularization (Tao, Jin, Liu, & Li, 2013) can reflect higher-order information about the manifold distribution of the sample data and exploit the local distribution geometry of the underlying data manifold more accurately. Thus Hessian regularization can better fit the data inside the training examples and predict the data outside the training examples more effectively.
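Suppose M is a v-dimensional data manifold in Euclidean space, and $C^{\infty}(M)$ represents the set of smooth functions on M. The Hessian regularizer is formulated as:
$$S(f) = \int_{M} \left\| \nabla_a \nabla_b f \right\|^{2}_{T_x^{*}M \otimes T_x^{*}M} \, dV(x)$$
where $dV(x)$ represents the natural volume element, $f : M \rightarrow \mathbb{R}$ with $f \in C^{\infty}(M)$, $\nabla_a \nabla_b f$ is the second covariant derivative of f, and $S : C^{\infty}(M) \rightarrow \mathbb{R}$ denotes the regularization function.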
The Hessian regularizer thus acts on the second covariant derivative of f. According to a proven proposition (Eells & Lemaire, 1983), the null space of the Hessian regularizer consists of linear functions, whose variation is constant with respect to the geodesic distance. Owing to this richer null space, the Hessian regularizer can better preserve the local structure of the manifold and demonstrates superior extrapolating performance.

Hessian-regularized weighted multi-view canonical correlation analysis algorithm
Before describing the Hessian-regularized weighted multi-view canonical correlation analysis algorithm, we briefly introduce the calculation of the Hessian matrix and of the covariance matrix of Hessian-regularized canonical correlation analysis.
The calculation of the covariance matrix of Hessian-regularized canonical correlation analysis is described as follows. Suppose H is the whole Hessian that represents the manifold structure of the dataset $\{x_1, x_2, \cdots, x_n\}$, and $H_k$ is the Hessian of the tangent space formed by the p-neighbourhood of the kth example. Then an example $x_p$ within the neighbourhood can be denoted by a linear combination of the p-nearest neighbours, i.e. $x_p = X_{p \sim k} H_{kp}^{T}$, where $p \sim k$ indicates that the example $x_p$ lies in the neighbourhood of $x_k$, $X_{p \sim k}$ is the example data matrix of the neighbourhood of $x_k$ and $H_{kp}^{T}$ is the pth column of $H_k^{T}$. Therefore, the covariance relation between examples in the neighbourhood of the kth example can be written as follows:
Then, the whole covariance matrix $S_H$ between the N examples can take the following expression by accumulating over the neighbourhood of each example:
The Hessian-regularized weighted multi-view canonical correlation analysis algorithm is the extension of the Hessian-regularized canonical correlation analysis algorithm to three or more views, and the calculation of its covariance matrix is similar.
The Hessian matrix calculation is summarized as follows. According to the manifold assumption, points that are close in the intrinsic geometry share a similar conditional distribution, so we construct the Hessian by considering the nearest neighbourhood within the same class.
For the examples of the $\upsilon$th view, we identify the indices of the p-nearest neighbours of each example $x_n^{(\upsilon)}$ within the same class $l_n$, and construct a matrix $D_n^{(\upsilon)}$ to express the neighbourhood of $x_n^{(\upsilon)}$. Applying singular value decomposition to $D_n^{(\upsilon)}$ gives the base vectors of the tangent coordinates $U = [U_1, U_2, \cdots, U_l] \in \mathbb{R}^{p \times l}$ of the neighbourhood of $x_n^{(\upsilon)}$. A Hessian matrix $H_n^{(\upsilon)}$ is then developed by performing the Gram-Schmidt orthonormalization process on the matrix $G = [1, U_1, \cdots, U_l, U_{11}, \cdots, U_{ll}]$ and taking the last $l(l+1)/2$ columns. The last $l(l+1)/2$ columns of G stand for the squares and cross products of the l columns of U, e.g. $U_{11} = U_1 \circ U_1$, where the symbol $\circ$ stands for the element-wise product. Finally, a symmetric Hessian matrix $H^{(\upsilon)}$ within the $\upsilon$th view is obtained by accumulating the $H_n^{(\upsilon)}$.
After the calculation of the above two key matrices, the Hessian-regularized weighted multi-view canonical correlation analysis algorithm can be summarized as follows. Assume $H^{(\upsilon)}$ is the Hessian matrix of the $\upsilon$th view and $l_n \in \{1, 2, \cdots, c\}$ represents the class label of the nth example. The examples are drawn from a probability distribution P that varies smoothly along the geodesics in the intrinsic geometry of a compact manifold M.
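The local Hessian construction described above (tangent coordinates from a singular value decomposition, Gram-Schmidt orthonormalization of G, accumulation over examples) can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function names `local_hessian` and `view_hessian` are hypothetical, neighbours are chosen by Euclidean distance and include the example itself, a QR factorization stands in for the Gram-Schmidt process, and the local operator is assumed to be formed as W Wᵀ from the last l(l+1)/2 orthonormal columns W.

```python
import numpy as np

def local_hessian(neighbourhood, l):
    """Local Hessian operator for one neighbourhood (illustrative sketch).

    neighbourhood: (p, d) array with one neighbour per row.
    l: assumed tangent-space dimension.
    Returns a symmetric (p, p) matrix.
    """
    p, d = neighbourhood.shape
    n_quad = l * (l + 1) // 2
    if l > min(p, d) or p < 1 + l + n_quad:
        raise ValueError("need l <= min(p, d) and p >= 1 + l + l(l+1)/2")
    centred = neighbourhood - neighbourhood.mean(axis=0)
    # Left singular vectors give the tangent coordinates U (p x l).
    u, _, _ = np.linalg.svd(centred, full_matrices=False)
    U = u[:, :l]
    # Squares and cross products of the columns of U (element-wise products).
    quad = [U[:, i] * U[:, j] for i in range(l) for j in range(i, l)]
    G = np.column_stack([np.ones(p), U] + quad)
    # QR orthonormalizes the columns of G in order, like Gram-Schmidt.
    Q, _ = np.linalg.qr(G)
    W = Q[:, -n_quad:]  # last l(l+1)/2 orthonormal columns
    return W @ W.T

def view_hessian(X, labels, p, l):
    """Accumulate local Hessians over all examples of one view.

    X: (n, d) array of examples; labels: length-n class labels.
    Neighbours are searched within the same class only.
    """
    labels = np.asarray(labels)
    n = X.shape[0]
    H = np.zeros((n, n))
    for k in range(n):
        same = np.flatnonzero(labels == labels[k])
        dist = np.linalg.norm(X[same] - X[k], axis=1)
        idx = same[np.argsort(dist)[:p]]  # includes example k itself
        H[np.ix_(idx, idx)] += local_hessian(X[idx], l)
    return H
```

With this construction, constant vectors lie in the null space of the accumulated matrix, which gives a quick sanity check on any implementation.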
The Hessian-regularized weighted multi-view canonical correlation analysis algorithm aims to find a set of linear projections and to discover the nonlinear correlations of multi-view examples by combining Hessian regularization with multi-view weights. It thus preserves the locality of the data manifold better and maximizes the sum of pair-wise correlations between the projected variables $\{y^{(\upsilon)}\}_{\upsilon=1}^{V}$. The proposed Hessian-regularized weighted multi-view canonical correlation analysis algorithm can finally be posed as the following problem (1):
where $\circ H^{(j)}$ is the between-view Hessian matrix and the symbol $\circ$ stands for the element-wise product.
The solution to problem (1) provides the first projection directions of the V views, i.e. $\alpha_1^{T} = (\alpha_{11}^{T}, \alpha_{21}^{T}, \cdots, \alpha_{V1}^{T})$. For the remaining projection directions, we can solve the following problem (2) iteratively:
The optimization of problems (1) and (2) is a multivariate eigenvalue problem (MEP) and has no analytical solution. A number of methods have been developed to solve MEPs; however, there is no rigorous proof of the global convergence of the current solutions, especially for the general multi-view case (Chu & Watterson, 1993). In our work, we realize an approximate optimization by relaxing the constraints and rewriting problem (2) as the following problem (3):
The solution to problem (3) is given by the Lagrange multiplier technique.

Algorithm solution
Taking $\lambda$ as the Lagrange multiplier, the Lagrangian function of problem (3) can be written as:
In problem (4), the partial derivative of $F(\alpha, \lambda)$ with regard to $\alpha_i$ has the following form:
The V equations of (6) can finally be reformulated as:
The solution to problem (7) can be viewed as a generalized eigenvalue decomposition. In this paper, we apply a regularization operation to $S_D^{WH}$, i.e. replace $S_D^{WH}$ by $S_D^{WH} \leftarrow S_D^{WH} + \sigma I$, where $\sigma$ is a small positive scalar and $I \in \mathbb{R}^{d \times d}$ is the identity matrix. Therefore, problem (7) can always be regarded as a standard eigenvalue decomposition of $(S_D^{WH})^{-1} S^{WH}$ and can be implemented with a parallel method (Liu, Zhang, Tao, Wang, & Lu, 2016).
Based on the solution of problem (7), we obtain multiple h-dimensional feature vectors for each example, one per view. We then concatenate these multi-view representations into a single vector for each example to carry out the recognition tasks.
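The regularized eigen-decomposition and the concatenation step described above can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the names `solve_projections` and `project_and_concatenate` are hypothetical, the matrices `S` and `S_D` stand in for $S^{WH}$ and $S_D^{WH}$ and are taken as given inputs (their construction is not reproduced here), and the h leading eigenvectors are assumed to be the ones retained.

```python
import numpy as np

def solve_projections(S, S_D, h, sigma=1e-6):
    """Leading h eigenpairs of inv(S_D + sigma*I) @ S (illustrative sketch)."""
    d = S_D.shape[0]
    S_D_reg = S_D + sigma * np.eye(d)  # regularization S_D <- S_D + sigma*I
    # Standard eigen-decomposition of inv(S_D_reg) @ S via a linear solve.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_D_reg, S))
    order = np.argsort(-evals.real)[:h]
    return evals.real[order], evecs.real[:, order]

def project_and_concatenate(views, A, dims):
    """Project each view with its block of A and stack the results.

    views: list of (d_i, n) arrays, one column per example.
    A: (sum(d_i), h) stacked projection directions.
    dims: list of the view dimensions d_i.
    Returns a (V*h, n) array, one concatenated vector per example.
    """
    blocks = np.split(A, np.cumsum(dims)[:-1], axis=0)
    return np.vstack([a.T @ x for a, x in zip(blocks, views)])
```

A linear solve is used instead of forming the inverse explicitly, which is usually the more stable choice numerically; the small `sigma` keeps the system solvable when `S_D` is singular.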

View selection
Among the large amount of real-time multi-source information measured from the oil recovery and production system of sucker-rod pumping wells, the ground dynamometer cards mainly reflect conditions in the wellbore and stratum (Hu et al., 2008), while the electrical power signal reflects conditions at the ground and in the stratum (Chen, 2016; Sun, 2011). These two types of information reflect the working condition of sucker-rod pumping wells comprehensively and in a timely manner. However, a few working conditions have similar dynamometer card shapes and electrical power card features, for example assist-blowing versus severe bottom leakage in tubing, and gas disturbing versus lack of supply liquid. Such conditions can be recognized accurately with the wellhead temperature or wellhead pressure signal. Therefore, the above four information sources are selected as feature views in the proposed approach.

View feature extraction
To address the feature extraction limitations of existing working condition recognition research and to further improve recognition accuracy, the features of each view are extracted based on mechanism analysis via the theoretical dynamometer card, prior information and expert knowledge.

Theoretical dynamometer card under static load
The illustration of the theoretical dynamometer card under static load is shown in Figure 1.
In Figure 1, the horizontal axis represents the displacement of the polish rod, denoted by S, and the vertical axis represents the load on the polish rod, denoted by P. $S_r$ is the stroke of the polish rod, $S_p$ is the stroke of the piston (i.e. the effective stroke), $S_l$ is the stroke loss of loading, $S_u$ is the stroke loss of unloading, and $P_l$ is the weight of the liquid column on the piston. Point A(E) is the closing point of the travelling valve, i.e. bottom dead centre. Point B is the opening point of the standing valve. Point C(F) is the closing point of the standing valve, i.e. top dead centre. Point D is the opening point of the travelling valve. A(E) → B → C(F) represents the up stroke stage, which is the process of loading and working. C(F) → D → A(E) represents the down stroke stage, which is the process of unloading and working. A(E) → B → C(F) → D → A(E) represents one stroke, i.e. one work cycle of the oil well pump.

Feature extraction
The feature data of measured ground dynamometer cards can be extracted by a work cycle variation of eight key factors, including area of dynamometer card, pump speed, load, weight of liquid column on piston (P l ), stroke (S r ), effective stroke (S p ), stroke loss (S l and S u ) and varying positions of key points during loading and unloading process.
Twelve features are extracted from the measured ground dynamometer cards: stroke, pump speed, the actual area of the dynamometer card, maximal load, minimal load, the ratio of maximal to minimal load, the weight of the liquid column on the piston, the effective stroke, the stroke loss of loading, the stroke loss of unloading, the advanced loading position and the advanced unloading position. Of these 12 feature parameters, stroke, pump speed, maximal load and minimal load can be obtained directly from the measured dynamometer card data. The actual area of the dynamometer card is the area of the closed curve formed by the collection points of the measured ground dynamometer card. The ratio of maximal to minimal load is the maximal load divided by the minimal load. The weight of the liquid column on the piston is the difference between the maximal load and the minimal load. The effective stroke is the displacement difference between the travelling valve opening point and closing point. The stroke loss of loading is the displacement difference between the standing valve opening point and the travelling valve closing point. The stroke loss of unloading is the displacement difference between the standing valve closing point and the travelling valve opening point. The advanced loading position is the displacement of the first point at which the slope changes sign on the way from the travelling valve closing point to its opening point. The advanced unloading position is the displacement of the first point at which the slope changes sign on the way from the standing valve closing point to its opening point.
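Several of the dynamometer card features defined above follow directly from the sampled (displacement, load) points and can be illustrated with a short sketch. The function name `card_features` is hypothetical, the stroke is assumed to be the displacement range of the card, and the closed-curve area is computed with the shoelace formula, which is one reasonable choice rather than the method confirmed by the source. The valve opening and closing points, and the features that depend on them, are not covered.

```python
def card_features(displacement, load):
    """Directly computable dynamometer card features (illustrative sketch).

    displacement, load: equal-length sequences tracing one closed work
    cycle of the measured ground dynamometer card.
    """
    n = len(displacement)
    if n != len(load) or n < 3:
        raise ValueError("need at least three matching (S, P) points")
    # Shoelace formula for the area enclosed by the sampled curve.
    twice_area = sum(
        displacement[i] * load[(i + 1) % n] - displacement[(i + 1) % n] * load[i]
        for i in range(n)
    )
    max_load, min_load = max(load), min(load)
    return {
        "stroke": max(displacement) - min(displacement),  # assumed definition
        "area": abs(twice_area) / 2,
        "max_load": max_load,
        "min_load": min_load,
        "load_ratio": max_load / min_load,  # undefined if min_load is zero
        "liquid_weight": max_load - min_load,
    }
```

For an idealized rectangular card with displacement from 0 to 3 and load from 20 to 50, the sketch returns an area of 90, a load ratio of 2.5 and a liquid column weight of 30.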
The feature data of measured electrical power signal can be extracted by 'power feature' and 'AUC (Area under curve) feature' (Chen, 2016;Sun, 2011). The position of top dead centre and bottom dead centre can be directly obtained from measured dynamometer card data. Top dead centre is the maximum point of the displacement, and bottom dead centre is the minimum point of the displacement, i.e. the initial point of dynamometer card data (except the drifting data).
Seven features are extracted in measured electrical power signal, including uplink power, downward power, period power, uplink area, downward area, period area and the balance rate. Uplink power is equal to the sum of power during the up stroke stage. Downward power is equal to the sum of power during the down stroke stage. Period power is equal to the sum of uplink power and downward power. Uplink area is equal to the area encircled by electric power signal curve during the up stroke stage and horizontal axis based on time series. Downward area is equal to the area encircled by electric power signal curve during the down stroke stage and horizontal axis based on time series. Period area is equal to the sum of uplink area and downward area. The balance rate is equal to the ratio of uplink power to downward power.
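The seven electrical power features listed above reduce to sums and areas over the two halves of a stroke, which a small sketch can make concrete. The name `power_features` and the argument `tdc_index` are hypothetical. The sketch assumes that the samples start at bottom dead centre, that the sample at top dead centre is counted in the up stroke for the power sums, and that the areas are computed with the trapezoidal rule.

```python
def trapezoid_area(t, y):
    """Area between the curve y(t) and the horizontal axis (trapezoidal rule)."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(t) - 1))

def power_features(time, power, tdc_index):
    """Seven electrical power features for one stroke (illustrative sketch).

    time, power: equal-length sequences for one stroke, starting at
    bottom dead centre; tdc_index: index of the top dead centre sample.
    """
    up_power = sum(power[: tdc_index + 1])
    down_power = sum(power[tdc_index + 1 :])
    up_area = trapezoid_area(time[: tdc_index + 1], power[: tdc_index + 1])
    down_area = trapezoid_area(time[tdc_index:], power[tdc_index:])
    return {
        "uplink_power": up_power,
        "downward_power": down_power,
        "period_power": up_power + down_power,
        "uplink_area": up_area,
        "downward_area": down_area,
        "period_area": up_area + down_area,
        "balance_rate": up_power / down_power,
    }
```

The top dead centre index would come from the synchronized dynamometer card data, as the maximum point of the displacement.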
Affected by the accuracy of data acquisition and by environmental factors at the ground and in the stratum, the wellhead temperature data collected from the oil production site cannot strictly follow the characteristics of the corresponding working condition, but they can essentially reflect the heat energy loss of the corresponding working condition in each stroke.
Three features are extracted from the measured wellhead temperature signal, including uplink heat energy (temperature) loss, downward heat energy (temperature) loss and period heat energy (temperature) loss. The number of real-time collection points from measured wellhead temperature is usually less than the number of real-time collection points from measured ground dynamometer cards in a stroke. Through using the interpolation fitting method, the number of real-time collection points from the two above measured parameters can be synchronized. In addition, the data of top dead centre and bottom dead centre can be obtained by the measured ground dynamometer card data. Uplink heat energy (temperature) loss is equal to the sum of heat energy (temperature) loss during the up stroke stage. Downward heat energy (temperature) loss is equal to the sum of heat energy (temperature) loss during the down stroke stage. Period heat energy (temperature) loss is equal to the sum of uplink heat energy (temperature) loss and downward heat energy (temperature) loss.
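The synchronization step mentioned above, in which the sparser wellhead temperature samples are brought onto the collection points of the dynamometer card, can be sketched as follows. The source only names an "interpolation fitting method"; piecewise-linear interpolation is assumed here, and the function name `resample_linear` is hypothetical.

```python
from bisect import bisect_right

def resample_linear(src_t, src_v, dst_t):
    """Linearly interpolate (src_t, src_v) at the times in dst_t.

    src_t must be strictly increasing. Times outside the source range
    are clamped to the first or last source value.
    """
    out = []
    for t in dst_t:
        if t <= src_t[0]:
            out.append(float(src_v[0]))
        elif t >= src_t[-1]:
            out.append(float(src_v[-1]))
        else:
            j = bisect_right(src_t, t)  # src_t[j-1] <= t < src_t[j]
            w = (t - src_t[j - 1]) / (src_t[j] - src_t[j - 1])
            out.append(src_v[j - 1] + w * (src_v[j] - src_v[j - 1]))
    return out
```

For example, three temperature readings at times 0, 2 and 4 can be mapped onto five dynamometer card times 0 to 4, after which the up stroke and down stroke sums can be taken over the same sample indices as for the card. The same helper would apply to the wellhead pressure signal.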
Feature extraction from the measured wellhead pressure signal is in a similar state of research to that from the measured wellhead temperature signal. Affected by stratum environmental factors and data collection accuracy, the wellhead pressure data collected from the oil production site cannot strictly follow the characteristics of the corresponding working condition, but they can essentially reflect the energy loss of the corresponding working condition in each stroke.
Three features are extracted from the measured wellhead pressure signal, including uplink energy (pressure) loss, downward energy (pressure) loss and period energy (pressure) loss. Similarly, the number of real-time collection points from measured wellhead pressure is usually less than the number of real-time collection points from measured ground dynamometer cards in a stroke. So, the number of real-time collection points between the two measured parameters can be synchronized by the interpolation fitting method, and the data of top dead centre and bottom dead centre can be gained by the measured ground dynamometer card data. Uplink energy (pressure) loss is equal to the sum of energy (pressure) loss during the up stroke stage. Downward energy (pressure) loss is equal to the sum of energy (pressure) loss during the down stroke stage. Period energy (pressure) loss is equal to the sum of uplink energy (pressure) loss and downward energy (pressure) loss.

Modeling for the working condition recognition
The modelling process of working condition recognition of sucker-rod pumping wells is shown in Figure 2.
The key to establishing the model for identifying the working condition of sucker-rod pumping wells with Hessian-regularized weighted multi-view canonical correlation analysis and the cosine nearest neighbour algorithm is how to select the neighbour number for the Hessian construction, the dimension of the common subspace, the weights of the multiple views and the number of samples used in classification. Considering the correctness, generalization, time complexity and practicability of the algorithm, these parameters are chosen as follows. The neighbour number of the Hessian is defined as the number of training samples minus 1. The dimension of the common subspace is set to the minimum dimension among the views; because the feature data from the measured wellhead temperature and wellhead pressure signals are merged into one view in this paper, the dimension of the common subspace is set to 6. The multi-view weights are obtained by tuning the multiple applied to each view, where a view with a weaker effect receives a larger multiple than a view with a stronger effect. The recognition results of the compared methods are mostly at or near their peak when the number of training samples is 15% of the number of samples in each category, so the number of training samples is kept far smaller than the number of test samples and is set to 5-15% of the number of samples in each category. In addition, the number of samples from each view is identical.
The concrete recognition process for the working condition is as follows. First, the features of the original working condition data are extracted from each of the above four views, and the features of the measured wellhead temperature and wellhead pressure signals are merged by concatenation. This yields a working condition feature data set containing three feature sets from three views, from which the training data set and the test data set are generated. Second, after the number of training samples and the neighbour number of the Hessian are defined, the Hessian matrix is built from the training data set. Third, a proper common subspace dimension and proper multi-view weights are selected, and the model is trained on the training data to obtain the solution with the maximum sum of correlation coefficients between views. Finally, the dimension of the data is reduced with this solution to obtain the reconstructed training data set and test data set, and the working conditions are classified and recognized by applying the cosine nearest neighbour algorithm to the reconstructed training and test data sets.
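The final classification step of this process can be illustrated with a minimal sketch of a cosine nearest neighbour classifier together with the accuracy measure used in the experiments. It assumes a single nearest neighbour under cosine similarity; the function names `cosine_nn_predict` and `accuracy` are hypothetical and the sketch is not taken from the authors' code.

```python
import numpy as np

def cosine_nn_predict(train_X, train_y, test_X):
    """Label each test row with the class of its most cosine-similar training row.

    train_X: (n_train, d) array; train_y: (n_train,) labels;
    test_X: (n_test, d) array. Rows must be non-zero vectors.
    """
    train_unit = train_X / np.linalg.norm(train_X, axis=1, keepdims=True)
    test_unit = test_X / np.linalg.norm(test_X, axis=1, keepdims=True)
    similarity = test_unit @ train_unit.T  # (n_test, n_train) cosine values
    return np.asarray(train_y)[np.argmax(similarity, axis=1)]

def accuracy(predicted, actual):
    """Ratio of correctly classified samples to total samples."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))
```

In the workflow above, `train_X` and `test_X` would be the reconstructed (projected and concatenated) training and test data sets.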

Experiment results and discussions
To evaluate the validity and practicability of the proposed method and model, we conduct experiments on a working condition data set derived from sixty sucker-rod pumping wells in a typical high-pressure, low-permeability thin oil block in Shengli Oilfield, China. The working condition samples of the sucker-rod pumping wells are selected strictly according to the operation records of the oil wells. Each working condition sample consists of a measured ground dynamometer card, an electrical power signal, a wellhead temperature signal and a wellhead pressure signal, each composed of points collected in real time at the oil well production site at the same acquisition time. The working condition data set is built from the samples accumulated from the sixty sucker-rod pumping wells over three years. It contains 11 categories of typical working condition (normal, lack of supply liquid, rod cutting, assist-blowing, stuck pump, travelling valve failing, wax precipitation, tubing leakage, pump leakage, travelling valve leakage and standing valve leakage), with 150 samples in each category and 1650 samples in total.

Comparison of recognition methods with different views using small numbers of labelled training samples
The working condition data set of 1650 samples is divided into a training set and a test set in four different proportions: the training set is formed by 5%, 7%, 10% and 15% of the samples in each category, and the test set is formed by the remaining samples in each category. The numbers of samples in the training set and the test set for the four groups are thus N = {88, 121, 165, 253} and T = {1562, 1529, 1485, 1397}, respectively. The training set and the test set in each group contain all 11 categories of working condition with the same number of samples per category, and the number of training samples in each category is n = {8, 11, 15, 23} for the four groups. The cosine nearest neighbour (COSNN) classifier is employed and the dimension of the common subspace is set to 6. The recognition accuracy is the ratio of correctly classified samples to total samples in the test set. The proposed method based on Hessian-regularized weighted multi-view canonical correlation analysis and cosine nearest neighbour (i.e. Hes-Wei MCCA-COSNN) is then compared, for each of the four groups of labelled training samples, with single information source recognition methods, namely traditional cosine nearest neighbour (i.e. COSNN) and Hessian-regularized cosine nearest neighbour (i.e. HesCOSNN), with the traditional multi-source feature connection recognition method based on cosine nearest neighbour (i.e. MC-COSNN), and with the Hessian-regularized weighted canonical correlation analysis recognition method based on cosine nearest neighbour (i.e. Hes-Wei CCA-COSNN). The experiment is repeated five times in each group, with a different training set and test set each time. The comparison results, averaged over the five runs, are shown in Table 1.
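The split sizes quoted above can be checked with simple arithmetic. The source does not state how fractional sample counts are rounded; rounding up is an assumption that reproduces the reported values of n, N and T for 150 samples per category and 11 categories. The function name `split_sizes` is hypothetical.

```python
def split_sizes(per_class, n_classes, percent):
    """Training and test set sizes for a given training percentage.

    Returns (training samples per class, training set size, test set size).
    The per-class count is rounded up (assumed rounding rule).
    """
    n_train_per_class = (per_class * percent + 99) // 100  # integer ceiling
    n_train = n_train_per_class * n_classes
    n_test = per_class * n_classes - n_train
    return n_train_per_class, n_train, n_test

for percent in (5, 7, 10, 15):
    print(percent, split_sizes(150, 11, percent))
```

For 5%, for instance, 150 × 0.05 = 7.5 rounds up to 8 training samples per category, giving 8 × 11 = 88 training samples and 1650 − 88 = 1562 test samples.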
Table 1 shows that, in all four groups with small numbers of labelled training samples, Hes-Wei MCCA-COSNN (the proposed method) obtains better recognition results than the other recognition methods with different views. Compared with methods that use only the ground dynamometer card view, Hes-Wei MCCA-COSNN improves the average recognition rate by about 3.44% relative to COSNN and HesCOSNN. Similarly, compared with methods that use only the electric power signal view, it improves the average recognition rate by about 1.5% relative to COSNN and HesCOSNN. Compared with methods that use the two views of ground dynamometer cards and electric power signal, it improves the average recognition rate by about 0.62% and 0.04% relative to MC-COSNN and Hes-Wei CCA-COSNN, respectively. Compared with the method that uses the three views of ground dynamometer cards, electric power and combined wellhead temperature and wellhead pressure, it improves the average recognition rate by about 4.46% relative to MC-COSNN.
With appropriately selected feature data, the average recognition accuracy of Hes-Wei MCCA-COSNN increases further when new feature data are added, whereas that of MC-COSNN drops. In addition, across the view configurations considered in this paper (ground dynamometer cards; electric power; ground dynamometer cards combined with electric power; and ground dynamometer cards, electric power and merged wellhead temperature and wellhead pressure), the working condition recognition methods, except MC-COSNN with three views, obtain successively better recognition results.

Comparison of canonical correlation analysis recognition methods using small numbers of labelled training samples
In this part, the proposed method Hes-Wei MCCA-COSNN is compared, for each of the four groups of labelled training samples, with the traditional canonical correlation analysis recognition method (i.e. CCA-COSNN), the weighted canonical correlation analysis recognition method (i.e. WeiCCA-COSNN), the Hessian-regularized canonical correlation analysis recognition method (i.e. HesCCA-COSNN), the Hessian-regularized weighted canonical correlation analysis recognition method (i.e. Hes-Wei CCA-COSNN), the traditional multi-view canonical correlation analysis recognition method (i.e. MCCA-COSNN), the weighted multi-view canonical correlation analysis recognition method (i.e. WeiMCCA-COSNN) and the Hessian-regularized multi-view canonical correlation analysis recognition method (i.e. HesMCCA-COSNN). The experiment is repeated five times in each group, with a different training set and test set each time. The comparison results, averaged over the five runs, are shown in Table 2. Table 2 shows that, in all four groups with small numbers of labelled training samples, Hes-Wei MCCA-COSNN obtains better recognition results than the other recognition methods based on canonical correlation analysis. Compared with methods that use the two views of ground dynamometer cards and electric power signal, Hes-Wei MCCA-COSNN improves the average recognition rate by about 1.73%, 1.34%, 0.12% and 0.04% relative to CCA-COSNN, WeiCCA-COSNN, HesCCA-COSNN and Hes-Wei CCA-COSNN, respectively. Compared with methods that use the three views of ground dynamometer cards, electric power and combined wellhead temperature and wellhead pressure, Hes-Wei MCCA-COSNN improves the average recognition rate by about 2.21%, 1.04% and 1.62% relative to MCCA-COSNN, WeiMCCA-COSNN and HesMCCA-COSNN, respectively.
With the two views of ground dynamometer cards and electric power, Hessian regularization and the weighting technique improve the recognition effect of the working condition recognition methods based on canonical correlation analysis in this paper. In descending order of average recognition rate, the compared methods rank as Hes-Wei MCCA-COSNN (the proposed method), Hes-Wei CCA-COSNN, HesCCA-COSNN, WeiCCA-COSNN and CCA-COSNN. Similarly, with the three views of ground dynamometer cards, electric power, and the combination of wellhead temperature and wellhead pressure, Hessian regularization and the weighting technique also improve the recognition effect of the methods based on canonical correlation analysis. In descending order of average recognition rate, the compared methods rank as Hes-Wei MCCA-COSNN (the proposed method), WeiMCCA-COSNN, HesMCCA-COSNN and MCCA-COSNN.
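For orientation, the traditional two-view baseline (CCA) that the weighted and Hessian-regularized variants build on can be sketched with whitening followed by a singular value decomposition. This is a minimal sketch of classical CCA only: it does not include the weighting scheme or the Hessian regularizer of the proposed method. The function name `cca`, the small ridge term `reg` (an ordinary diagonal loading added for numerical stability, not a manifold regularizer), the view dimensions and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def cca(X, Y, n_components, reg=1e-6):
    """Classical two-view CCA via whitening and SVD.

    Returns the projection matrices for each view and the canonical
    correlations, sorted in descending order.
    """
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    # Within-view covariances (with a small ridge) and cross-covariance.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]

# Synthetic stand-in for two views sharing a 2-dimensional latent signal (assumption).
rng = np.random.default_rng(1)
n = 200
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n, 6))
Y = latent @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(n, 4))

Wx, Wy, corrs = cca(X, Y, n_components=2)
print("canonical correlations:", np.round(corrs, 3))
```

The projected views `(X - X.mean(axis=0)) @ Wx` and `(Y - Y.mean(axis=0)) @ Wy` could then be fused, for example by concatenation, and passed to a cosine nearest neighbour classifier; how the paper fuses the projections is not stated in this section, so that step is left open here.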

Comparison results by traditional multi-source feature extraction methods based on fewer different marked training samples
In this part, the proposed method (Hes-Wei MCCA-COSNN) is compared with the traditional multi-feature connection recognition methods based on two views and three views (MC-COSNN), the traditional principal component analysis (PCA) recognition methods with contribution rates of 95% and 99% (PCA-COSNN), the traditional kernel principal component analysis (KPCA) recognition methods with an RBF kernel and contribution rates of 95% and 99% (KPCA-COSNN), the traditional canonical correlation analysis recognition method (CCA-COSNN) and the traditional multi-view canonical correlation analysis recognition method (MCCA-COSNN), using four groups with different small numbers of labelled training samples. For each group, the experiment is repeated five times with a different training set and test set each time, and the results reported in Table 3 are the averages over the five runs. As Table 3 shows, in all four groups Hes-Wei MCCA-COSNN (the proposed method) obtains a better recognition effect than the other recognition methods based on traditional multi-source feature extraction. Compared with the methods using the two views of ground dynamometer cards and electric power, Hes-Wei MCCA-COSNN improves the average recognition rate by about 1.73%, 0.62%, 2.23% or 0.68%, and 0.57% or 0.57% relative to CCA-COSNN, MC-COSNN, PCA-COSNN (contribution of 0.95 or 0.99) and KPCA-COSNN (contribution of 0.95 or 0.99), respectively. Compared with the methods using the three views of ground dynamometer cards, electric power, and the combination of wellhead temperature and wellhead pressure, Hes-Wei MCCA-COSNN improves the average recognition rate by about 2.21%, 4.46%, 89.72% or 45.85%, and 3.62% or 3.62% relative to MCCA-COSNN, MC-COSNN, PCA-COSNN (contribution of 0.95 or 0.99) and KPCA-COSNN (contribution of 0.95 or 0.99), respectively.
With the two views of ground dynamometer cards and electric power, both the proposed method and the other working condition recognition methods based on traditional multi-feature connection obtain a better recognition effect in this paper. When ground dynamometer cards and electric power are combined with wellhead temperature and wellhead pressure, the average recognition rate of every method except the proposed one drops. Even so, the average recognition rates of the traditional canonical correlation analysis and kernel principal component analysis recognition methods remain higher than those of the traditional multi-feature connection and principal component analysis recognition methods, and the average recognition rate of the proposed method is the highest.
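Two of the baselines compared in this part, multi-feature connection and PCA with a contribution threshold, can be sketched as follows. This is a minimal sketch under stated assumptions: "connection" is read as column-wise concatenation of the view features, "contribution" is read as the cumulative explained-variance ratio, and the per-feature standardisation before concatenation is an added assumption that the text does not confirm. The function names `concat_views` and `pca_by_contribution`, the view dimensions (12, 8, 4) and the synthetic data are hypothetical.

```python
import numpy as np

def concat_views(views):
    """Multi-feature connection: standardise each view, then join features column-wise."""
    scaled = []
    for V in views:
        std = V.std(axis=0)
        std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
        scaled.append((V - V.mean(axis=0)) / std)
    return np.hstack(scaled)

def pca_by_contribution(X, contribution=0.95):
    """Keep the fewest principal components whose cumulative
    explained-variance ratio reaches `contribution`."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # First index at which the cumulative ratio reaches the threshold, plus one.
    k = min(int(np.searchsorted(ratio, contribution)) + 1, len(s))
    return Xc @ Vt[:k].T, k

# Synthetic stand-in for three feature views driven by a shared latent signal (assumption).
rng = np.random.default_rng(0)
n = 100
latent = rng.normal(size=(n, 3))
views = [latent @ rng.normal(size=(3, d)) + 0.1 * rng.normal(size=(n, d))
         for d in (12, 8, 4)]

X_mc = concat_views(views)                  # input to an MC-style baseline
Z95, k95 = pca_by_contribution(X_mc, 0.95)  # input to a PCA-style baseline at 95%
Z99, k99 = pca_by_contribution(X_mc, 0.99)  # input to a PCA-style baseline at 99%
print(X_mc.shape, k95, k99)
```

Raising the contribution threshold can only keep the same number of components or more, so the 99% setting never retains fewer components than the 95% setting.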

Conclusion
A novel method for working condition recognition of sucker-rod pumping wells based on Hessian-regularized weighted multi-view canonical correlation analysis is proposed in this paper. The proposed method can greatly improve the generalization performance by integrating multi-view Hessian-regularized weighted learning with the multi-view canonical correlation analysis algorithm. Experimental results show that, by effectively utilizing the large amount of real-time multi-source information available in big data and IOT of oil and gas production, the proposed method obtains higher recognition accuracy than traditional methods based on a single information source and on multi-source feature connection, thereby further reducing fault alarms in working condition recognition of sucker-rod pumping wells.
On the basis of previous research, the measured ground dynamometer card, electric power, wellhead temperature and wellhead pressure signals are selected, and their features are extracted using mechanism analysis, prior information and empirical knowledge. The working condition recognition model is then established with Hessian-regularized weighted multi-view canonical correlation analysis and the cosine nearest neighbour algorithm, which further improves the robustness and recognition performance of the model. The established model is applied to recognize eleven kinds of typical working conditions in a certain block in Shengli Oilfield, China. Experimental results show that the proposed method has better recognition performance when only small numbers of labelled training samples are available, and it is therefore more suitable for engineering application.
The proposed method applies manifold regularization learning and the weighting technique to the traditional multi-view canonical correlation analysis method in order to identify the working condition of sucker-rod pumping wells. It may provide a better solution to a common problem in fault diagnosis and recognition, namely that fault samples are difficult to obtain, and it offers a new idea for multi-source information fusion and its practical use, suggesting significant scientific and application value.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work is supported by the National Natural Science Foundation of China (61671480).