A novel general kernel-based non-negative matrix factorisation approach for face recognition

Kernel-based non-negative matrix factorisation (KNMF) is a promising nonlinear approach for representing image data with non-negative features. However, most KNMF algorithms are built on one specific kernel function and thus cannot adopt other kinds of kernels. They also have to perform inaccurate pre-image learning, which may harm the reliability of the method. To address these problems, this paper proposes a novel general kernel-based non-negative matrix factorisation (GKBNNMF) method, which avoids pre-image learning and is suitable for any kernel function. We assume that the mapped basis images fall within the cone spanned by the mapped training data, allowing an arbitrary kernel function to be used in the algorithm. The symmetric NMF strategy is exploited on the kernel matrix to establish our general kernel NMF model. The proposed algorithm is proven to be convergent. Facial image datasets are selected to evaluate the performance of our method. Compared with some state-of-the-art approaches, the experimental results demonstrate that our proposed method is both effective and robust.


Introduction
Machine vision is a multidisciplinary technology that draws on several engineering areas, such as electronic engineering, optical engineering and software engineering. It obtains useful information from images to establish intelligent models of the real world. Within this field, face recognition has become an active research topic. Face recognition is a biometric technology based on the information of the human face and is closely related to computer vision and machine learning. A face recognition system uses cameras to collect facial images and then applies artificial intelligence techniques to detect and recognise faces. Dimensionality reduction and feature extraction are important in face recognition (X. Z. Liu & Ruan, 2019) because of the curse of dimensionality. Based on different criteria, a large number of feature learning schemes (J. H. Deng et al., 2011; Guillamet et al., 2002; Hancer, 2019; C. B. He et al., 2019; Hoyer, 2004; H. Lee et al., 2010; Li et al., 2001; Lin, 2014; W. X. Liu & Zheng, 2004; Nagra et al., 2020; Qian et al., 2020; Shi et al., 2012; Xu et al., 2003; Yi et al., 2017; Zafeiriou et al., 2006; Zhou & Cheung, 2021; Z. He et al., 2011; Turk & Pentland, 1991) have been presented to represent image data in a low-dimensional feature space, among which deep learning (Davari et al., 2020; Too et al., 2020; Zalpour et al., 2020) has also been widely used in face recognition. Traditional dimensionality reduction methods include principal component analysis (PCA) (Turk & Pentland, 1991), linear discriminant analysis (LDA) (Belhumeur et al., 1997; Yu & Yang, 2001), locality preserving projections (LPP) (X. He, 2003), the wavelet transform (Akbarizadeh, 2012; Tirandaz & Akbarizadeh, 2015; Tirandaz et al., 2020) and non-negative matrix factorisation (NMF) (D. D. Lee, 1999; D. D. Lee & Seung, 2001).
PCA seeks a projection matrix such that the projected data preserve the maximum variance and simultaneously have uncorrelated features. The goal of LDA is to exploit data label information so that, in a low-dimensional feature space, the data have the maximum ratio of between-class scatter to within-class scatter. The LPP algorithm preserves the inherent local structure of the data after dimensionality reduction. However, PCA, LDA and LPP usually extract features containing negative elements, which are not interpretable in face recognition because facial images are non-negative. NMF, which does not allow subtraction operations, is able to acquire a parts-based representation under the non-negativity constraint. The NMF algorithm approximately decomposes the non-negative data matrix into the product of two non-negative matrices, so it can extract non-negative sparse features (Eggert & Korner, 2004). Therefore, NMF has good interpretability in face recognition.
However, the above-mentioned methods are linear feature extraction algorithms, and their performance degrades when dealing with nonlinearly distributed data. It is known that the kernel method is effective for tackling nonlinear problems in pattern recognition. Therefore, the linear approaches PCA, LDA, LPP and NMF have been extended to their kernel counterparts KPCA (Schökopf et al., 1998), KLDA (Pekalska & Haasdonk, 2009), KLPP (Cheng et al., 2005) and KNMF (Zafeiriou & Petrou, 2010), respectively. This paper focuses on kernel-based NMF methods. The basic idea of kernel NMF is to map all image data into a kernel space through a nonlinear mapping; the mapped data can then be represented as linear combinations of the mapped basis images with non-negative coefficients. Different kernel functions result in different kernel NMF algorithms, such as polynomial non-negative matrix factorisation (PNMF) (Buciu et al., 2008), quadratic polynomial kernel non-negative matrix factorisation (PKNMF) (Zhu et al., 2014), RBF kernel non-negative matrix factorisation (KNMF-RBF) (W. S. Chen et al., 2017), fractional power inner product kernel non-negative matrix factorisation (FPKNMF) (W. S. Chen et al., 2019), self-constructed cosine kernel non-negative matrix factorisation (CKNMF) (W. S. Chen et al.) and so on. Empirical results on face recognition have shown that these kernel-based NMF algorithms outperform traditional linear NMF methods. Nevertheless, these kernel-based NMF methods have two main drawbacks, namely non-universal use of kernels and inaccurate pre-image learning. Usually, a KNMF algorithm is tied to its own kernel and does not work with other kernels. For pre-image learning, these methods retain only the first three terms of the Taylor expansion and directly discard the remaining terms. Therefore, the pre-image learning step has a large error, which may negatively affect the performance of the algorithm. To solve these problems, Zhang and Liu proposed a general NMF algorithm (GNMF) with a flexible kernel (Zhang & Liu, 2009). But the GNMF algorithm still has some disadvantages. GNMF must utilise the inverse of the square root of the kernel matrix. It is known that the kernel matrix is only positive semi-definite and is not necessarily invertible, so the GNMF algorithm may run into an ill-posed computation. In particular, the GNMF algorithm cannot guarantee the non-negativity of the feature matrix, since the inverse matrix may contain negative elements. Besides, GNMF employs the eigenvalue decomposition of the kernel matrix and then directly sets the negative entries to zero. This procedure gives rise to a large decomposition error and thus affects its accuracy in classification tasks.
To sum up, most KNMF algorithms are developed using a specific kernel function, which rules out the use of other kernels. Moreover, these methods have to perform inaccurate pre-image learning. These limitations may lead to unsatisfactory performance of KNMF algorithms in face recognition. In this paper, we propose a novel general kernel-based non-negative matrix factorisation (GKBNNMF) method. It not only avoids pre-image learning but is also suitable for any kernel function. We assume that the mapped basis images can be expressed as non-negative combinations of the mapped training data, allowing us to use any kernel function in the algorithm. We then establish the GKBNNMF model via the symmetric NMF strategy. Our GKBNNMF model is converted into two convex subproblems. Employing the gradient descent method, we solve the optimisation problems and obtain the update rules of GKBNNMF. To prove the convergence of the GKBNNMF algorithm, we construct a new function from the objective function and theoretically show that it is an auxiliary function. The iterative formulas of GKBNNMF can also be derived by finding the stationary point of the auxiliary function, which implies the convergence of the proposed GKBNNMF algorithm. Our method is applied to face recognition. Three publicly available face databases, namely the ORL, Caltech 101 and JAFFE databases, are chosen for evaluation. We use three kernels in our GKBNNMF algorithms, namely the polynomial kernel, the RBF kernel and the fractional power inner product kernel. Compared with state-of-the-art kernel-based NMF approaches, our methods have competitive performance. Our method can also be applied to other non-negative feature extraction problems in pattern recognition.
For example, sources must be either zero or positive to be physically meaningful in source-separation problems (Martinez & Bray, 2003), the amount of pollutant emitted by a factory is non-negative (Paatero & Tapper, 1994), the probability of a particular topic appearing in a linguistic document is non-negative (Novak & Mammone, 2001) and note volumes in musical audio are non-negative (Plumbley et al., 2002). Therefore, our method is also suitable for these applications. The novelty and contributions of our paper are highlighted as follows.
• Our method can make use of arbitrary kernel functions, while previous kernel-based NMF methods can only use specific kernels, so it has broader applicability. For example, in a scenario where several sources of data are available, we can use multiple kernels in our method to combine the multiple sources.
• Our method avoids inaccurate pre-image learning, which reduces computational errors and therefore improves the reliability of kernel-based NMF methods.
• The empirical results on face recognition indicate that our method is effective with different kernels and is also robust to Gaussian noise and speckle noise.
The remainder of this paper is organised as follows. Section 2 briefly introduces NMF-based methods, including linear and nonlinear ones. In Section 3, we present the GKBNNMF approach in detail. The experimental results are reported in Section 4. Finally, the conclusion is drawn in Section 5.

Related work
This section briefly reviews the related work, namely NMF and its kernel extension PNMF.

NMF
NMF (D. D. Lee & Seung, 2001) is a non-negative feature extraction method for part-based representation. NMF approximately decomposes the non-negative data matrix X ∈ R^{m×n}_+ into two non-negative factors, called the basis image matrix W ∈ R^{m×r}_+ and the coefficient matrix H ∈ R^{r×n}_+. NMF solves the following optimisation problem:

min_{W≥0, H≥0} F(W, H),

where F(W, H) is the objective function defined by F(W, H) = (1/2)‖X − WH‖²_F. We can convert the above problem into two convex subproblems and solve them using the gradient descent method. The update rules are then acquired below:

H ← H ⊗ (WᵀX) ⊘ (WᵀWH),   W ← W ⊗ (XHᵀ) ⊘ (WHHᵀ),

W_.j ← W_.j / Σ_i W_ij.   (1)

The formula (1) is to normalise the columns of the matrix W such that the sum of each column equals 1. A ⊗ B and A ⊘ B stand for the element-wise multiplication and division between A and B, respectively.
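As a concrete illustration, the multiplicative updates above can be sketched in a few lines of NumPy. This is a toy example with random data; the matrix sizes and iteration count are arbitrary choices, not values from the paper. After normalising the columns of W as in formula (1), H is rescaled so that the product WH is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-negative data matrix X (m x n) and rank r; sizes are illustrative.
m, n, r = 20, 30, 5
X = rng.random((m, n))
W = rng.random((m, r))
H = rng.random((r, n))

eps = 1e-12  # small constant guarding against division by zero
errors = []
for _ in range(100):
    # Multiplicative update rules (Lee & Seung, 2001):
    #   H <- H * (W^T X) / (W^T W H),   W <- W * (X H^T) / (W H H^T)
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Normalise each column of W to sum to 1 (formula (1)),
    # rescaling H so that the product W H is unchanged.
    s = W.sum(axis=0)
    W /= s
    H *= s[:, None]
    errors.append(np.linalg.norm(X - W @ H, "fro"))
```

The recorded errors decrease monotonically, matching the non-increasing property of the multiplicative rules.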

PNMF
The NMF algorithm is a linear method and thus cannot handle nonlinear problems. To deal with nonlinear problems, the PNMF algorithm (Buciu et al., 2008) has been proposed. PNMF extends NMF via the kernel method. The idea of the kernel method is to map the samples from the original space into a high-dimensional space by a nonlinear function φ, so that the mapped samples become linearly separable in that feature space. The polynomial kernel function is defined as

k(x, y) = ⟨x, y⟩^d = φᵀ(x)φ(y),

where φ is the induced nonlinear mapping. Then the mapped data in the high-dimensional feature space are decomposed by NMF, i.e. φ(X) ≈ φ(W)H. We define two kernel matrices, K_WX = φᵀ(W)φ(X) and K_WW = φᵀ(W)φ(W). The optimisation problem of PNMF is

min_{W≥0, H≥0} (1/2)‖φ(X) − φ(W)H‖²_F.

The coefficient matrix is updated by H ← H ⊗ K_WX ⊘ (K_WW H), while the update rule of W involves the derivative of the polynomial kernel with respect to the basis vectors, which ties PNMF to this specific kernel.
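The kernel matrices used by PNMF can be sketched as follows. This is an illustrative toy example with random data; the H-update shown follows the standard multiplicative pattern for the kernelised least-squares cost, while the kernel-specific W-update of Buciu et al. is omitted.

```python
import numpy as np

def poly_kernel(X, Y, d=2):
    """Polynomial kernel matrix between the columns of X and Y:
    K[i, j] = <x_i, y_j>^d = phi(x_i)^T phi(y_j)."""
    return (X.T @ Y) ** d

rng = np.random.default_rng(1)
X = rng.random((20, 30))   # data matrix, one non-negative sample per column
W = rng.random((20, 5))    # basis matrix, one basis vector per column

K_WX = poly_kernel(W, X)   # r x n matrix phi(W)^T phi(X)
K_WW = poly_kernel(W, W)   # r x r matrix phi(W)^T phi(W)

# The coefficient update of PNMF then reads H <- H * K_WX / (K_WW H)
H = rng.random((5, 30))
H_new = H * K_WX / (K_WW @ H)
```

Because X and W are non-negative and d is a positive integer, both kernel matrices are non-negative, so the update preserves the non-negativity of H.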

The proposed GKBNNMF approach
To overcome the limitations of existing KNMF methods, we propose a new KNMF algorithm with a general kernel (GKBNNMF). Assuming that the basis vectors are represented as non-negative linear combinations of the mapped training data, we formulate a new optimisation problem for nonlinear NMF. Any type of kernel function can be used in the algorithm, since the kernel function enters only through the corresponding kernel matrix. We then derive the update rules for the optimisation problem and discuss the convergence of GKBNNMF.
Let the non-negative data matrix be X = [x_1, x_2, …, x_n] ∈ R^{m×n}_+, and let the mapped data matrix be φ(X) = [φ(x_1), φ(x_2), …, φ(x_n)] associated with a nonlinear mapping φ. The inner product between x and y in the feature space can be represented by a kernel function k(x, y) = φᵀ(x)φ(y). The dimensionality of φ(X) is often very high or even infinite, so it is infeasible to decompose φ(X) directly. We resolve this problem in the following way. The basis vectors are assumed to fall within the cone spanned by the mapped training data, i.e. φ(W) = φ(X)A with a non-negative combination matrix A ∈ R^{n×r}_+. Then we present the following cost function:

F(A, H) = (1/2)‖φ(X) − φ(X)AH‖²_F = (1/2) tr(K − 2KAH + HᵀAᵀKAH),   (3)

where K = φᵀ(X)φ(X) is the kernel matrix upon X and H ≥ 0. In (3), we convert the variable W to A. By this change of variable, any kernel function can be used while avoiding the problem of the non-negativity of W.
We further simplify the objective function (3). Since K is a symmetric positive semi-definite matrix and K = φᵀ(X)φ(X), we can approximate φ(X) by symmetric non-negative matrix factorisation (SNMF). Applying SNMF to K gives K ≈ SSᵀ with S ∈ R^{n×k} and S ≥ 0. Then (3) can be rewritten as

F(A, H) = (1/2)‖Sᵀ − SᵀAH‖²_F.   (4)

Let the auxiliary matrix B = SᵀA. Note that B plays the role of φ(W). Substituting B for SᵀA in (4) leads to

F(B, H) = (1/2)‖Sᵀ − BH‖²_F.   (5)

Then the optimisation problem of GKBNNMF is

min_{B≥0, H≥0} F(B, H).   (6)

Before we derive the update rules of GKBNNMF, we introduce the definition of an auxiliary function and a lemma.
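The SNMF step K ≈ SSᵀ can be sketched as follows. This is a hypothetical toy example: we build K with an RBF kernel on random data and use the damped multiplicative rule of Ding et al. for symmetric factorisation, which is one common choice and not necessarily the exact scheme used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Build a valid kernel matrix K = phi(X)^T phi(X) from toy data
# (RBF kernel here; any kernel yielding a non-negative K works).
X = rng.random((10, 25))                      # one sample per column
sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
K = np.exp(-sq / 10.0)                        # n x n, symmetric, non-negative

n, k = K.shape[0], 8
S = rng.random((n, k))
eps = 1e-12
for _ in range(500):
    # Damped multiplicative SNMF update (Ding et al. style):
    #   S <- S * (0.5 + 0.5 * (K S) / (S S^T S))
    S *= 0.5 + 0.5 * (K @ S) / (S @ (S.T @ S) + eps)

residual = np.linalg.norm(K - S @ S.T, "fro") / np.linalg.norm(K, "fro")
```

The damping factor 0.5 is what keeps this symmetric update stable; the plain rule S ⊗ (KS) ⊘ (SSᵀS) can oscillate.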

Definition 3.1: G(h, h′) is an auxiliary function for the function F(h) if the conditions

G(h, h′) ≥ F(h),   G(h, h) = F(h)

are satisfied.

Lemma 3.2: If G(h, h^t) is an auxiliary function of F, then F is non-increasing under the update

h^(t+1) = arg min_h G(h, h^t),

where t denotes the tth iteration.

Solution of H for fixed B
When B is fixed, we denote F(H) = F(B, H). This leads to the following optimisation subproblem:

min_{H≥0} F(H) = (1/2)‖Sᵀ − BH‖²_F.   (7)

To solve (7), we use the gradient descent method to obtain the update rule for H:

H^(t+1) = H^t − ρ_t ⊗ (∂F/∂H)|_{H=H^t},   (8)

where the gradient ∂F/∂H is computed as follows:

∂F/∂H = BᵀBH − BᵀSᵀ.   (9)

Substituting (9) into (8), we obtain

H^(t+1) = H^t − ρ_t ⊗ (BᵀBH^t − BᵀSᵀ).   (10)

To keep the non-negativity of H^(t+1), we set

H^t − ρ_t ⊗ (BᵀBH^t) = 0,   (11)

then ρ_t can be derived from Equation (11):

ρ_t = H^t ⊘ (BᵀBH^t).   (12)

We substitute Equation (12) into Equation (10) and obtain the update rule for H as follows:

H^(t+1) = H^t ⊗ (BᵀSᵀ) ⊘ (BᵀBH^t).

The objective function (5) can be decomposed column by column as

F(H) = Σ_i (1/2)‖(Sᵀ)_.i − BH_.i‖².   (14)

From Equation (14), it can be seen that the column vectors of H are independent in the optimisation problem. So the cost function can be simplified as follows:

F(H_.i) = (1/2)‖(Sᵀ)_.i − BH_.i‖²,   (15)

whose gradient with respect to h = H_.i is ∇F(h) = BᵀBh − Bᵀ(Sᵀ)_.i. We construct the auxiliary function of F(H_.i) below. Define the diagonal matrix

D_ab(h^t) = δ_ab (BᵀBh^t)_a / (h^t)_a,   (16)

where δ_ab is the indicator function. Then

G(h, h^t) = F(h^t) + (h − h^t)ᵀ∇F(h^t) + (1/2)(h − h^t)ᵀD(h^t)(h − h^t)   (17)

is an auxiliary function of F(H_.i) in Equation (15). Since F is quadratic, its exact Taylor expansion is

F(h) = F(h^t) + (h − h^t)ᵀ∇F(h^t) + (1/2)(h − h^t)ᵀBᵀB(h − h^t).   (18)

Comparing Equation (18) with Equation (17), we have G(h, h^t) − F(h) = (1/2)(h − h^t)ᵀN(h − h^t) with N = D(h^t) − BᵀB, and it can be shown that N is positive semi-definite, which implies G(h, h^t) ≥ F(h). According to Lemma 3.2, we obtain the update rule of H as follows:

h^(t+1) = arg min_h G(h, h^t).

Setting ∂G(h, h^t)/∂h = 0 gives

h^(t+1) = h^t − D(h^t)⁻¹∇F(h^t).   (21)

From (16) and (21), we obtain the update rule of H_ai as

H_ai^(t+1) = H_ai^t (BᵀSᵀ)_ai / (BᵀBH^t)_ai.   (22)

Solution of B for fixed H
When H is fixed, we denote F(B) = F(B, H). We obtain the following optimisation subproblem:

min_{B≥0} F(B) = (1/2)‖Sᵀ − BH‖²_F.

Similar to H, we use the gradient descent method, with gradient ∂F/∂B = BHHᵀ − SᵀHᵀ and a step size chosen to preserve non-negativity, to find the update rule for B:

B^(t+1) = B^t ⊗ (SᵀHᵀ) ⊘ (B^tHHᵀ).

The objective function (5) with respect to the rows B_i. of B can be rewritten as

F(B) = Σ_i (1/2)‖(Sᵀ)_i. − B_i.H‖².   (25)

From Equation (25), the row vectors of B are independent in the optimisation problem. We can simplify the cost function by writing it in a row-wise form:

F(B_i.) = (1/2)‖(Sᵀ)_i. − B_i.H‖²,   (26)

whose gradient with respect to the row B_i. is ∇F(B_i.) = B_i.HHᵀ − (Sᵀ)_i.Hᵀ. Define the diagonal matrix

D_ab(B_i.^t) = δ_ab (B_i.^tHHᵀ)_b / (B_i.^t)_b,   (27)

where δ_ab is the indicator function. Then

G(B_i., B_i.^t) = F(B_i.^t) + (B_i. − B_i.^t)∇F(B_i.^t)ᵀ + (1/2)(B_i. − B_i.^t)D(B_i.^t)(B_i. − B_i.^t)ᵀ   (28)

is an auxiliary function of F(B_i.) in Equation (26). Since F is quadratic, its exact Taylor expansion is

F(B_i.) = F(B_i.^t) + (B_i. − B_i.^t)∇F(B_i.^t)ᵀ + (1/2)(B_i. − B_i.^t)HHᵀ(B_i. − B_i.^t)ᵀ.   (29)

Comparing Equation (29) with Equation (28), we see that G(B_i., B_i.^t) − F(B_i.) = (1/2)(B_i. − B_i.^t)M(B_i. − B_i.^t)ᵀ with M = D(B_i.^t) − HHᵀ, and it can be shown that M is positive semi-definite.

According to Lemma 3.2, we obtain

B_i.^(t+1) = arg min_{B_i.} G(B_i., B_i.^t).

Setting ∂G(B_i., B_i.^t)/∂B_i. = 0 gives

B_i.^(t+1) = B_i.^t − ∇F(B_i.^t)D(B_i.^t)⁻¹.   (32)

From (27) and (32), we obtain the update rule of B_ib as

B_ib^(t+1) = B_ib^t (SᵀHᵀ)_ib / (B^tHHᵀ)_ib.   (33)
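Putting the two updates together, the alternating minimisation of the cost F(B, H) = (1/2)‖Sᵀ − BH‖²_F can be sketched as follows. This is a toy example on random data; under our reading, each sub-update reduces to a standard multiplicative NMF rule applied to the target matrix Sᵀ.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy SNMF factor S (n x k) standing in for the factor of the kernel matrix.
n, k, r = 25, 8, 5
S = rng.random((n, k))
T = S.T                      # target matrix S^T (k x n), the surrogate for phi(X)

B = rng.random((k, r))       # surrogate for phi(W)
H = rng.random((r, n))       # coefficient matrix

eps = 1e-12
errors = []
for _ in range(200):
    # Update H for fixed B, then B for fixed H (multiplicative rules
    # for the cost F(B, H) = 0.5 * ||S^T - B H||_F^2):
    H *= (B.T @ T) / (B.T @ B @ H + eps)
    B *= (T @ H.T) / (B @ H @ H.T + eps)
    errors.append(np.linalg.norm(T - B @ H, "fro"))
```

Each full sweep is non-increasing in the objective, consistent with the auxiliary-function argument above.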

Determination of A
When B is learned by solving (6), we can find A according to the relation B = SᵀA. Since B and S are known and non-negative, we can solve the following optimisation problem:

min_{A≥0} F(A) = (1/2)‖B − SᵀA‖²_F.

Following the same multiplicative scheme as for H and B, the update rule of A can be obtained as follows:

A^(t+1) = A^t ⊗ (SB) ⊘ (SSᵀA^t).
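The recovery of A from the relation B = SᵀA can be sketched as follows. This is an illustrative toy example in which B is generated from a known non-negative A_true, so the subproblem is exactly solvable; the multiplicative rule used here is the standard one for a non-negative least-squares problem with fixed factor Sᵀ.

```python
import numpy as np

rng = np.random.default_rng(4)

n, k, r = 25, 8, 5
S = rng.random((n, k))
A_true = rng.random((n, r))
B = S.T @ A_true             # B = S^T A, with both B and S known and non-negative

A = rng.random((n, r))
eps = 1e-12
for _ in range(1000):
    # Multiplicative update for min_A ||B - S^T A||_F^2 with A >= 0:
    #   A <- A * (S B) / (S S^T A)
    A *= (S @ B) / (S @ (S.T @ A) + eps)

fit = np.linalg.norm(B - S.T @ A, "fro") / np.linalg.norm(B, "fro")
```

Because the subproblem is convex in A, the iteration drives the relative residual close to zero.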

Convergence analysis
In this section, we prove the convergence of the GKBNNMF algorithm.

Theorem 3.5: The function F(B, H) in Equation (5) is non-increasing under the iterative updating rules (22) and (33).

Proof: In the previous sections, the iterative formulas (22) and (33) were obtained by minimising the auxiliary functions of F(H_.i) and F(B_i.), respectively. By Lemma 3.2, F(B, H) is therefore non-increasing under these updates. Since F(B, H) ≥ 0 is bounded below, the sequence of objective values converges, which implies the convergence of GKBNNMF.

Feature extraction and recognition
Given a testing sample, we present a feature extraction method using the GKBNNMF algorithm.
In the training process, the matrices H^(0) and B^(0) are randomly generated, with elements uniformly distributed in (0, 1). Every column of W is normalised.
In the testing process, assuming that y is a testing sample, we have

φ(y) ≈ φ(W)h_y = φ(X)Ah_y,

where h_y is the extracted feature of φ(y). Multiplying both sides by φᵀ(X), we obtain

φᵀ(X)φ(y) = φᵀ(X)φ(X)Ah_y,   i.e. K_Xy = KAh_y,

where K_Xy = φᵀ(X)φ(y). Then h_y can be obtained as

h_y = (KA)⁺K_Xy,

where (KA)⁺ is the Moore-Penrose pseudo-inverse of KA. Assume that there are c classes and the number of training samples of class j is n_j (j = 1, 2, …, c). The mean feature of class j can be expressed as

m_j = (1/n_j)(KA)⁺K_{XX_j}1_{n_j×1},

where 1_{n_j×1} is an n_j × 1 vector with all ones, K_{XX_j} = φᵀ(X)φ(X_j), and X_j is the data matrix consisting of the vectors from class j. The steps of our algorithm are shown below.

Feature Extraction Stage
Step 1: The training samples are represented as non-negative column vectors. Then all the training samples are combined into a matrix X.
Step 2: Set the initialisation matrices B^(0) and H^(0), the feature numbers r and k, the maximum number of iterations I_max and the error threshold ε.
Step 3: Update the matrices B and H using the update formulas (22) and (33).
Step 4: If the cost function F(B, H) ≤ ε or the number of iterations reaches I_max, stop the iteration and output B and H. Otherwise, go to Step 3.
Step 5: Update the matrix A using the update rule of A derived above.
Step 6: If the cost function F(A) ≤ ε or the number of iterations reaches I_max, end the iteration and output A. Otherwise, go to Step 5.

Recognition Stage
Step 7: For the test sample y, calculate the corresponding kernel vector K_Xy.
Step 8: Compute the feature h_y as h_y = (KA)⁺K_Xy.
Step 9: If p = arg min_j ‖h_y − m_j‖, then the test sample y belongs to class p, where m_j is the mean feature of class j.
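The recognition stage above can be sketched end-to-end as follows. This is a hypothetical toy example: the matrix A is a random stand-in for the factor actually learned by GKBNNMF, the two classes are synthetic and well separated, and `rbf_kernel`, `base0` and similar names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf_kernel(X, Y, t=10.0):
    """RBF kernel matrix between the columns of X and Y (illustrative helper)."""
    sq = (X**2).sum(0)[:, None] + (Y**2).sum(0)[None, :] - 2.0 * (X.T @ Y)
    return np.exp(-sq / t)

# Toy training set: 2 well-separated classes, 5 samples each, one per column.
base0 = np.abs(rng.normal(size=(30, 1)))
base1 = np.abs(rng.normal(size=(30, 1)))
X = np.hstack([base0 + 0.05 * rng.random((30, 5)),
               base1 + 0.05 * rng.random((30, 5))])
labels = np.array([0] * 5 + [1] * 5)

# Random stand-in for the combination matrix A learned by GKBNNMF.
A = rng.random((10, 4))

K = rbf_kernel(X, X)
P = np.linalg.pinv(K @ A)            # (KA)^+, the Moore-Penrose pseudo-inverse

# Features of the training samples and the class means m_j
H_train = P @ K
means = np.stack([H_train[:, labels == j].mean(axis=1) for j in (0, 1)])

# Steps 7-9 for a test sample y drawn near class 0
y = (base0[:, 0] + 0.05 * rng.random(30))[:, None]
h_y = P @ rbf_kernel(X, y)[:, 0]
pred = int(np.argmin(np.linalg.norm(means - h_y, axis=1)))
```

Even with a random A, the nearest-class-mean rule recovers the correct class here because the kernel vector of y closely matches those of its own class.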

Experimental results
This section will empirically discuss the convergence of the proposed algorithm and evaluate its performance on three face databases, namely ORL, Caltech 101 and JAFFE. We use the RBF kernel in the GKBNNMF and GNMF algorithms, yielding GKBNNMF-RBF and GNMF-RBF. The algorithms NMF (D. D. Lee & Seung, 2001), KPCA (Schökopf et al., 1998), PNMF (Buciu et al., 2008) and GNMF-RBF (Zhang & Liu, 2009; W. S. Chen et al., 2017) are chosen for comparison. We will further consider noise experiments on the ORL face database using these algorithms. Finally, two kernel functions, the polynomial kernel and the fractional power inner-product kernel (W. S. Chen et al., 2019), will be employed in the GKBNNMF and GNMF algorithms for comparison. We call these algorithms GKBNNMF-Poly, GKBNNMF-FP, GNMF-Poly and GNMF-FP.

Face database
Here we introduce the three face databases used in the experiments.

The ORL database of faces
This face database involves 40 distinct individuals, each with 10 different images. For some individuals, the images were taken at different times, with varying lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Japanese female facial expression (JAFFE) database
The database contains 213 images of 7 facial expressions (6 basic facial expressions and 1 neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects. Figure 1 presents the example images of the ORL, Caltech 101, and JAFFE databases.

Convergence
In this section, we empirically verify the convergence of our algorithm with different kernels, namely the RBF kernel and the polynomial kernel. The target matrix for decomposition is X ∈ R^{750×80}, generated from the image data of the ORL database. We set the maximum iteration number, feature number, RBF kernel parameter and polynomial kernel parameter to I_max = 500, r = 80, t = 7e2 and d = 2, respectively. The curves of the cost function against the iteration number for the GKBNNMF-RBF and GKBNNMF-Poly algorithms on the ORL database are plotted in Figure 2.
The faster the cost function decreases at each iteration, the faster the convergence. It can be seen that the proposed algorithm converges quickly under both kernel functions.

Performance on face recognition
In our experiments, the resolution of each face image is reduced by a double D4 wavelet transform. Here we use the Gaussian (RBF) kernel in our method (GKBNNMF) and in the GNMF algorithm. Five algorithms, namely KPCA, NMF, PNMF, KNMF-RBF and GNMF-RBF, are selected for comparison. KPCA (Schökopf et al., 1998) uses the RBF kernel function k(x, y) = exp(−‖x − y‖²/t) with t = 1e2. PNMF adopts the polynomial kernel function k(x, y) = ⟨x, y⟩^d with d = 2. For all algorithms, the maximum number of iterations I_max is set to 300. The experiments are repeated ten times, and the mean accuracy and standard deviation (std) are recorded.

Results on Caltech 101 database
For the Caltech 101 face database, we set the feature number r to 440. The parameter t of the RBF kernel function in the KNMF-RBF, GNMF-RBF and GKBNNMF-RBF algorithms is set to 5e4. We randomly selected TN (TN = 4, 5, …, 8) images from each person as training samples, and the remaining (18 − TN) × 19 images as testing samples. The experimental results are tabulated in Table 1 and plotted in Figures 3 and 4. The mean accuracies and standard deviations (std) of the algorithms under different numbers of training samples are recorded in Table 1. We can see that the average recognition accuracy of our GKBNNMF-RBF algorithm increases from 76.43% with TN = 4 to 84.00% with TN = 8. In addition, the mean recognition accuracies of NMF (D. D. Lee & Seung, 2001), PNMF (Buciu et al., 2008), KNMF-RBF (W. S. Chen et al., 2017) and GNMF-RBF (W. S. Chen et al., 2017; Zhang & Liu, 2009) increase from 68.68%, 51.39%, 61.73% and 66.13% with TN = 4 to 80.37%, 60.05%, 64.47% and 78.95% with TN = 8, respectively. The results indicate that the proposed GKBNNMF-RBF algorithm outperforms the other algorithms. To show the performance of each algorithm on this database in detail, we use cumulative match characteristic (CMC) curves with rank-1 to rank-5 accuracies for comparison. The CMC curves are plotted in Figure 4, where the number of training samples (TN) ranges from 4 to 8. It can be seen that the CMC curves of our method are always located in the highest position, while those of KPCA are at the lowest position. This means that our GKBNNMF-RBF achieves the best performance and KPCA the worst among all the compared algorithms. It also reveals that non-negative features are more suitable for non-negative image data in classification tasks.

Results on JAFFE database
For the JAFFE face database, TN (TN ranges from 3 to 7) images are randomly selected from each person as training samples, and the remaining (20 − TN) × 10 images are used as testing samples. We set the feature number r of all algorithms to 450 and the parameter t of the Gaussian kernel k(x, y) = exp(−‖x − y‖²/t) in the GKBNNMF-RBF, KNMF-RBF and GNMF-RBF algorithms to 5e4. The experimental results are tabulated in Table 2 and plotted in Figures 5 and 6.
The average accuracy and standard deviation (std) of each algorithm under different numbers of training samples are listed in Table 2. The rank-1 recognition accuracies on the JAFFE database are plotted in Figure 5. From this figure, we observe that the average recognition rate of our proposed GKBNNMF-RBF algorithm increases from 96.76% with TN = 3 to 98.62% with TN = 7. The results on the JAFFE database once again show that our algorithm performs better than the GNMF-RBF algorithm.
To further compare the performance of each algorithm, cumulative match characteristic (CMC) curves with training samples of 3, 4, . . . , 7 are drawn in Figure 6. It can be seen that when TN = 3, 4, and 7, the CMC curves of our method are at the highest position, while those of GNMF-RBF are always at the lowest position. This means that our GKBNNMF-RBF has the best performance and GNMF-RBF has the worst performance among all the compared algorithms.

Experiments with noise
This subsection will discuss the robustness of the proposed GKBNNMF-RBF algorithm on the ORL database with Gaussian noise and speckle noise. The testing images from the ORL dataset are contaminated by zero-mean Gaussian white noise and by uniformly distributed zero-mean random multiplicative noise, respectively, with the variance σ varying from 0.2 to 0.6. The Gaussian-noised images (σ = 0.2) and speckle-noised images (σ = 0.2) of one individual from the ORL dataset are given in Figure 7. The algorithms KPCA, NMF, PNMF, KNMF-RBF and GNMF-RBF are adopted for comparison. The noise experiments of the compared algorithms are run 10 times under the same conditions.
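The two noise models can be reproduced as follows. This is an illustrative sketch: a random array stands in for an ORL image, and the mapping from the stated variance σ to the range of the uniform multiplicative noise is our assumption about how "zero-mean uniform noise with variance σ" is realised.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2 = 0.2                                   # noise variance, as in the experiments

img = rng.random((200, 200))                   # stand-in for a normalised face image

# Additive zero-mean Gaussian white noise with variance sigma2
gauss = rng.normal(0.0, np.sqrt(sigma2), img.shape)
gauss_noisy = img + gauss

# Speckle noise: uniformly distributed zero-mean multiplicative noise.
# A uniform variable on [-a, a] has variance a^2 / 3, hence a = sqrt(3 * sigma2).
a = np.sqrt(3 * sigma2)
speckle = rng.uniform(-a, a, img.shape)
speckle_noisy = img * (1 + speckle)
```

The empirical mean and variance of both noise fields match the nominal σ values, which is what makes the different noise levels comparable across algorithms.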

Results with Gaussian noise on ORL database
We randomly choose three noise-free images from each person for training and the remaining seven images with Gaussian noise for testing. The feature number r of all the compared algorithms is set to 440. For the RBF kernel-based algorithms KNMF-RBF, GNMF-RBF and GKBNNMF-RBF, the kernel parameters t are set to 5e4, 1e3 and 1e3, respectively. The mean accuracies and standard deviations (std) are recorded in Table 3 and plotted in Figure 8. It can be seen that, as the variance σ of the Gaussian noise increases from 0.2 to 0.6, the accuracies of all algorithms decrease. The performance of our GKBNNMF-RBF method decreases from 85.43% to 61.82%, whereas the performances of KPCA, NMF, PNMF, KNMF-RBF and GNMF-RBF decrease to a larger extent. As can be seen from Figure 8, NMF is the most sensitive to noise and its performance is significantly degraded, which indicates that kernel-based NMF methods are more robust to noise than linear NMF methods. It is interesting that the KNMF methods with general kernels, namely GKBNNMF-RBF and GNMF-RBF, surpass the KNMF methods with a specific kernel. GKBNNMF-RBF and GNMF-RBF have similar results when σ is less than 0.25. When σ is larger than 0.25, the proposed GKBNNMF-RBF approach outperforms all the compared approaches. This means that the proposed algorithm is particularly robust when the data are heavily contaminated.

Results with speckle noise on ORL database
We randomly choose three noise-free images from each person for training and the remaining seven images with speckle noise for testing. We set the feature number r of all algorithms to 400. The parameters t of the RBF kernel in the GKBNNMF-RBF, KNMF-RBF and GNMF-RBF algorithms are set to 1e3, 5e4 and 5e4, respectively. The average accuracies and standard deviations (std) of all the compared methods are tabulated in Table 4 and plotted in Figure 9, which indicates that the performance of every compared algorithm decreases as the intensity of the speckle noise increases. When the variance of the speckle noise rises from 0.2 to 0.6, the accuracy of our GKBNNMF-RBF algorithm decreases only from 91.82% to 89.68%. The results show that the proposed GKBNNMF-RBF approach achieves the best performance under different levels of speckle noise. GNMF-RBF is the second-best algorithm among those compared, while NMF is the most sensitive to speckle noise and its performance is the most severely degraded as the noise level rises. Notably, our GKBNNMF-RBF method shows the smallest change in recognition accuracy across the noise variances, which means that the proposed algorithm has good robustness to speckle noise.

Results with different kernels on JAFFE database
For the JAFFE database, we compare the GKBNNMF algorithms using different kernel functions with the GNMF, PNMF and FPKNMF algorithms. We set the feature number r of each algorithm to 450, and the parameter t of the RBF kernel function in the GKBNNMF-RBF and GNMF-RBF algorithms to 5e4 and 5e5, respectively. The parameter of the polynomial kernel is set to 6, and that of the fractional power inner-product kernel is set to 0.5. Table 5 lists the average accuracy and standard deviation of each algorithm under different numbers of training samples. The rank-1 recognition rates on the JAFFE database are plotted in Figure 10, and the CMC curves with TN = 3, 4, …, 7 are plotted in Figure 11. We observe that GKBNNMF-RBF has the best performance and FPKNMF the second best. The results also show that GKBNNMF outperforms GNMF when they use the same type of kernel.

Conclusion
It is known that the existing KNMF methods mainly encounter problems including inaccurate pre-image learning and non-universal use of kernels. This paper focuses on these problems and proposes a general kernel-based non-negative matrix factorisation (GKBNNMF) approach, which can overcome the drawbacks of existing KNMF methods. We establish the GKBNNMF model using the symmetric NMF technique to ensure the non-negativity of the decomposition. The proposed GKBNNMF algorithm is shown to be convergent by theoretical analysis and empirical validation. The experiments on face recognition have demonstrated the effectiveness of our approach.
There are several challenges to be addressed in future work. First, the performance of the algorithm is sensitive to the setting of the hyperparameters, including the feature number r and the kernel parameters, which we currently set manually. The initialisation matrices are also crucial to the performance. The cross-validation technique and some initialisation methods for NMF (Atif et al., 2019) could be considered to find good values of these hyperparameters.
Second, faster implementations of our method for large-scale data are also important for future research. The low-rank approximation of the kernel matrix (Fine & Scheinberg, 2002) could be considered to speed up the algorithm. Alternatively, we could use the block technique (Pan et al., 2011) to reduce the size of the matrix, which may also reduce the computational complexity of the algorithm.
Finally, our method relies on the non-negativity of the kernel matrix, which is guaranteed for the commonly used kernel functions. However, if the non-negativity of the kernel matrix is not satisfied, another factorisation should be used instead of symmetric NMF.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was partially supported by the Natural Science Foundation of Shenzhen (20200815000520001) and the Interdisciplinary Innovation Team of Shenzhen University.