Robust human activity recognition using single accelerometer via wavelet energy spectrum features and ensemble feature selection

Wearable sensor-based human activity recognition has been widely used in many fields. Considering that a multi-sensor based recognition system is not suitable for practical applications and long-term activity monitoring, this paper proposes a single wearable accelerometer-based human activity recognition approach. In order to improve the reliability of the recognition system and remove redundant features that have no effect on recognition accuracy, wavelet energy spectrum features and a novel feature selection method are introduced. For each activity sample, wavelet energy spectrum features of the acceleration signal are extracted and the activity is represented by a feature set including wavelet energy spectrum features and features of other attributes. Then, considering the limitation of a single filter feature selection method, this paper proposes an ensemble-based filter feature selection (EFFS) approach to optimize the feature set. Features that are robust to sensor placement and highly distinguishable for different activities are selected. In the experiment, acceleration data around the waist is collected and two classifiers, k-nearest neighbour (KNN) and support vector machine (SVM), are utilized to verify the effectiveness of the proposed features and the EFFS method. Experiment results show that the wavelet energy spectrum features can increase the discrimination between different activities and significantly improve the activity recognition accuracy. Compared with four other popular feature selection methods, the proposed EFFS approach provides higher accuracy with fewer features.


Introduction
In recent years, human activity recognition (HAR) has become a popular research area in pattern recognition and machine learning. This technology benefits numerous applications, such as health-care monitoring, fitness, smart home, etc. For example, it can be used in monitoring daily activities of elderly people living alone (Chernbumroong, Shuang, & Yu, 2015) and providing technical support to improve the speed of swimmers (Mooney, Corley, Godfrey, Quinlan, & Ó Laighin, 2016). By controlling the indoor temperature according to the amount of human activity, effective energy control can be achieved in smart homes (Javed, Larijani, Ahmadinia, & Gibson, 2017). Human activity recognition also plays an important role in developing somatosensory animations and games (Tachibana, Noah, Bronner, Ono, & Onozuka, 2011).
In activity recognition, video-based approaches have experienced great breakthroughs. Distributed cameras were utilized for tracking and activity recognition (Song et al., 2010). Huynh-The et al. (Huynh-The et al., 2018) used a Microsoft Kinect camera to develop an effective indoor activity analysis approach, whose two major contributions are skeleton-based feature extraction and topic model-based learning. Ma et al. (Ma et al., 2014) also used a Kinect camera and built a dataset that consists of six types of activities. Curvature scale space (CSS) features were extracted from each frame and each activity was represented by a feature bag. Finally, a parameter-optimized extreme learning machine was utilized to distinguish falls from other activities. Kwolek et al. (Kwolek & Kepski, 2015) proposed a fall detection method that combined a Kinect and an acceleration sensor. The acceleration data was used to monitor the trend of falling and the depth image was used to finalize and validate the fall warnings raised from the acceleration data. However, this method fails when the subject leaves the Kinect's detection range. Video-based activity recognition methods have some disadvantages. First, their performance is greatly affected by external conditions, especially when there is high illumination in the external environment. Moreover, cameras are not suitable for installation in some places, such as bathrooms and bedrooms, where they would invade personal privacy. Last but not least, video-based systems are complicated and high-performance cameras are expensive.
CONTACT Jie Wang wangjie@hebut.edu.cn
Sensor-based activity recognition, which is free from the influence of light and shade disturbances, has attracted more attention from researchers (Cornacchia, Ozcan, Zheng, & Velipasalar, 2017). Previous works in this area can be roughly divided into two categories according to the composition of the sensor hardware system. In the first category, the general idea is to utilize a networked sensor layout to enhance the system's recognition capability (Lai et al., 2013). The final decision can be obtained by using fusion theory, such as data-level fusion (Gravina, Alinia, Ghasemzadeh, & Fortino, 2017), feature-level fusion (Liu, Gao, John, Staudenmayer, & Freedson, 2012), and decision-level fusion (Mo, Liu, Gao, & Freedson, 2013). For example, Zhu et al. (Zhu & Sheng, 2009) proposed a data-level fusion approach for activity recognition that utilized the acceleration data from the feet and waist. The experiments showed that this method achieved an accuracy of 88.1% when using SVM, which was 12.3% higher than that obtained using only the acceleration data from the hip. A gait analysis system based on a body sensor network (BSN) is proposed in (Wang, Qiu, Cao, & Jiang, 2013) to measure quantitative gait parameters. The proposed BSN contains two wireless inertial sensors fixed to the left and right ankles, and acceleration and angular velocity are obtained from both sides. Lai et al. (Lai, Chang, Chao, & Huang, 2011) designed a multi-accelerometer based fall detection system. Six accelerometers were fixed on different parts of the body, and when the acceleration value detected by an accelerometer was significantly higher than the normal range, the system issued a fall warning. However, such threshold-based fall detection methods are susceptible to noise, wearing location and individual differences. The application of BSN also covers sports training. For example, a badminton training system which consists of a left wrist sensor, a right wrist sensor, a waist sensor and a right ankle sensor is designed in (Wang, Guo, & Zhao, 2016). The proposed system fused the data of the four sensors at the feature level and utilized a double-layer hidden Markov model (HMM) classification algorithm to recognize 14 types of badminton batting activities. Although multi-sensor systems have made outstanding achievements, excessive sensors on the body cause inconvenience to the person's daily life, especially in long-term monitoring. Moreover, using multiple sensors significantly increases the cost of the equipment.
Comparatively, the second category which utilizes a single sensor for activity recognition has attracted more attention because of its convenience and practicality (Shoaib, Bosch, Incel, Scholten, & Havinga, 2015).
In single sensor-based recognition systems, the feature set is usually optimized so that the classifier can achieve good recognition performance and generalization. Khan et al. (Khan, Lee, Lee, & Kim, 2010) proposed a feature extraction method for real-time activity recognition on mobile phones. However, the system performs poorly when the phone is in a trouser pocket. A cascade classifier was proposed by Cheng (Cheng & Jhan, 2013) for fall detection using a single triaxial accelerometer, and different sensor positions were tested. Experiments showed that the best result, an accuracy of 98.48%, was obtained when the single accelerometer was fixed to the chest. Wang et al. (Wang, Chen, Yang, Zhao, & Chang, 2016) explored the power of the triaxial accelerometer built into a smartphone for recognizing human physical activities. Their experiments show that the accelerometer is superior to the gyroscope in distinguishing some daily activities, such as lying from sitting and standing. Acceleration data from the lumbar position was used to recognize four activities in (Li & Yao, 2016). However, this method is only effective when the upper body changes from a vertical to a horizontal position. Margarito et al. (Margarito, Helaoui, Bianchi, Sartor, & Bonomi, 2016) studied the performance of a template matching method for activity recognition using a single accelerometer placed on the wrist. Its performance was compared experimentally with that of a statistical learning classifier, and it was found to be suitable as a supplement to statistical learning recognition methods. In (Gibson, Amira, Ramzan, Casaseca-De-La-Higuera, & Pervez, 2016), wavelet features of the accelerometer data from the subject's chest were extracted and a multi-classifier framework, which fuses the decisions of the individual classifiers by voting, was utilized for fall detection. However, this method requires more complex equipment, which hinders its adoption. In addition, a multi-classifier system may reduce the real-time performance of detection. Rodríguez-Martín et al. (Rodríguez-Martín et al., 2015) proposed a hierarchical classifier to recognize the posture transitions of patients with Parkinson's disease using a single accelerometer fixed to the waist. The first-level classifier identifies whether a person has executed a posture transition and the second-level classifier recognizes which type of posture transition has occurred.
Although most of the aforementioned studies have attempted to improve the recognition performance and reliability of activity recognition systems, most of them ignored the fact that the accelerometer is worn on the human body, is usually removed when not in use and is put on again before use. Wearing it again does not guarantee exact alignment with the previous position, which usually introduces a deviation from the standard position. Some works (Wang, Wu, Chen, Ghoneim, & Hossain, 2016), (Chen, Zhu, Chai, & Zhang, 2017) have begun to study the robustness of recognition performance when positional changes occur. This problem is easy to overlook and is likely to cause a decline in recognition performance. Therefore, this paper focuses on the reliability of single-accelerometer based human activity recognition and on selecting an optimum number of features that are robust to the sensor position around the waist.
At present, wavelet energy spectrum features based on wavelet analysis have been widely applied to fault diagnosis (Yan, Gao, & Chen, 2014) and biological signal-based disease diagnosis (Shriram, Sundhararajan, Shete, & Daimiwal, 2017), (Patidar, Pachori, & Acharya, 2015). As a kind of time-frequency feature, the wavelet energy spectrum deals better with the mode mixing problem than some traditional time and frequency features. However, to the best of our knowledge, the wavelet energy spectrum is rarely used in activity recognition problems. Therefore, in this paper, the wavelet energy spectrum features of the acceleration signal are used to construct the feature set, enhancing the discrimination between different kinds of activities.
On the other hand, high-quality features are critical to improving recognition accuracy in pattern recognition. A feature set may contain irrelevant or redundant features that do not contribute to the recognition accuracy. Furthermore, these redundant features lead to higher computational cost. Therefore, it is necessary to select an optimal feature subset to achieve good recognition performance and reduce computational cost. In addition, a good feature subset also needs to be robust to offsets in the sensor position. Compared with wrapper methods, filter feature selection methods can be easily applied to high-dimensional datasets and have high computational efficiency (Vergara & Estévez, 2014). However, most of these methods evaluate features by a single evaluation criterion and cannot assess them from the perspective of the whole feature set. Recently, some studies (Chandrashekar & Sahin, 2014), (Apolloni, Leguizamón, & Alba, 2016), (Abeel, Helleputte, Van de Peer, Dupont, & Saeys, 2010) have found that ensemble feature selection can effectively identify features with high discrimination in feature subsets and determine features with high correlation with the output. Thus, an ensemble-based filter feature selection (EFFS) method is proposed in this paper to select an optimum number of features that are relevant and robust to the deviation position.
The structure of the paper is as follows: after the introduction, Section 2 introduces the wavelet energy spectrum features and the feature set used in this paper. Section 3 presents the proposed ensemble-based filter feature selection (EFFS) method. Section 4 describes the experimental setup and data sets. The proposed features and the EFFS method are experimentally evaluated and analyzed in Section 5. Some concluding remarks are given in Section 6.

Wavelet packet decomposition
Based on the wavelet transform, wavelet packet decomposition (Rioul & Vetterli, 1991) can decompose the acceleration signal into low-frequency (H) and high-frequency (G) components. The low-frequency approximation signal and high-frequency detail signal can be obtained by using wavelet packet decomposition. The acceleration signals of different activities have different frequency distributions (Bouten, Koekkoek, Verduin, Kodde, & Janssen, 1997). For example, the frequency of activity 'walk' is mostly in the range of 1-6 Hz, and the frequency of activity 'jogging' is mostly in the range of 1.7-16 Hz. The wavelet packet decomposition can better reflect the different characteristics of the high-frequency and the low-frequency parts of the acceleration signal under different activities. The three-layer wavelet packet decomposition process is shown in Figure 1.
The specific process of wavelet packet decomposition is as follows. Given the orthogonal scaling function φ(t) and the wavelet function ϕ(t), the two-scale relationship is:

$$\varphi(t) = \sqrt{2}\sum_{k} h_{0k}\,\varphi(2t-k), \qquad \phi(t) = \sqrt{2}\sum_{k} h_{1k}\,\varphi(2t-k)$$

where $h_{0k}$ and $h_{1k}$ are filter coefficients in multiresolution analysis. To further generalize the two-scale equation, define the following recursive relations:

$$w_{2n}(t) = \sqrt{2}\sum_{k} h_{0k}\,w_{n}(2t-k), \qquad w_{2n+1}(t) = \sqrt{2}\sum_{k} h_{1k}\,w_{n}(2t-k)$$

When n = 0, $w_0(t) = \varphi(t)$ and $w_1(t) = \phi(t)$. The set of functions $\{w_n(t)\}_{n \in Z}$ defined above is the wavelet packet determined by $w_0(t) = \varphi(t)$; it includes the scaling function $w_0(t)$ and the wavelet function $w_1(t)$. Therefore, the wavelet packet coefficient recursion formula is:

$$d_k^{j+1,2n} = \sum_{l} h_{0(l-2k)}\,d_l^{j,n}, \qquad d_k^{j+1,2n+1} = \sum_{l} h_{1(l-2k)}\,d_l^{j,n}$$

The reconstruction formula of the wavelet packet is:

$$d_l^{j,n} = \sum_{k}\left[g_{0(l-2k)}\,d_k^{j+1,2n} + g_{1(l-2k)}\,d_k^{j+1,2n+1}\right]$$

where $d_k^{j+1,2n}$ and $d_k^{j+1,2n+1}$ are the kth decomposition sequences obtained by decomposing the original signal through the j-layer wavelet packet; $h_0$, $g_0$ and $h_1$, $g_1$ are the low-pass and high-pass filter coefficients, respectively.
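The decomposition and reconstruction recursions can be illustrated with a minimal sketch. It assumes the two-tap Haar filters purely for brevity (the paper itself uses the dmey wavelet), and the function names `split`, `merge` and `wavelet_packet` are hypothetical helpers, not part of the original work.

```python
import math

# Haar analysis filters: an assumption made for brevity, not the dmey
# wavelet used in the paper.
H0 = (1 / math.sqrt(2), 1 / math.sqrt(2))    # low-pass
H1 = (1 / math.sqrt(2), -1 / math.sqrt(2))   # high-pass


def split(node):
    """One decomposition step: filter with H0 and H1, keep every second sample."""
    starts = range(0, len(node) - 1, 2)
    low = [H0[0] * node[i] + H0[1] * node[i + 1] for i in starts]
    high = [H1[0] * node[i] + H1[1] * node[i + 1] for i in starts]
    return low, high


def merge(low, high):
    """Inverse of split: rebuild the parent node from its two children."""
    parent = []
    for a, d in zip(low, high):
        parent.append((a + d) / math.sqrt(2))
        parent.append((a - d) / math.sqrt(2))
    return parent


def wavelet_packet(signal, levels):
    """Return the 2**levels terminal nodes in natural (filter-bank) order.

    The signal length is assumed to be divisible by 2**levels.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [child for node in nodes for child in split(node)]
    return nodes


x = [float(v) for v in range(16)]
low, high = split(x)
restored = merge(low, high)            # equals x up to rounding error
nodes = wavelet_packet(x, levels=3)    # 8 terminal nodes, 2 coefficients each
```

Unlike the ordinary wavelet transform, both the low-frequency and the high-frequency child are split again at every layer, which is why three layers yield eight terminal nodes.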

Wavelet energy spectrum features
The wavelet energy spectrum describes the result of wavelet packet decomposition from the energy point of view. For different types of activities, the energy distribution of the acceleration signals over the frequency bands differs greatly. Besides, the acceleration signal of human activity is non-linear. The wavelet energy spectrum can therefore be used as feature information and is suitable for distinguishing different activities. The energy of the wavelet packet transform and the energy of the original signal are related as follows: the energy of the original signal equals the sum of the energies of the wavelet packet coefficients over all frequency bands. Therefore, the sum of squared signals in each frequency band after wavelet packet decomposition can be selected as the wavelet packet energy spectrum. The wavelet decomposition result is represented by $d_{i,j}(k)$, and the energy of the signal in each frequency band is:

$$E_{i,j} = \sum_{k=1}^{N} \left| d_{i,j}(k) \right|^2$$

where N represents the length of the original signal. All $E_{i,j}$ together constitute the wavelet packet energy spectrum, and after normalization they form the feature vector.

The number of layers of the wavelet packet decomposition is directly related to the number of sub-bands obtained and the dimension of the feature vector. Through a large number of experimental tests, we found that when a three-layer wavelet packet decomposition is applied to the acceleration signal, the energy distribution of the sub-bands can effectively distinguish the activities. So in this paper, the dmey wavelet function is used to perform three-layer wavelet packet decomposition on the x, y, z triaxial acceleration signals. Since the sampling frequency of our input signal is 150 Hz, according to the sampling theorem, the frequency at the root node of the wavelet packet decomposition is set to 0-64 Hz. Accordingly, the first band of the third layer covers 0-8 Hz, and each subsequent band is shifted upwards by 8 Hz. A total of 24 energy spectrum features (8 sub-bands for each of the three axes) are obtained.
The distribution of the wavelet packet energy spectrum under six randomly selected activity samples is shown in Figure 2.
It can be seen from Figure 2 that the feature vectors formed by the wavelet energy spectrum can clearly distinguish the six activities. Among them, the energy distribution of activity 'walking' is more gradual than that of the other activities, and activity 'ascending stairs' shows the most obvious fluctuation of the energy across the 24 frequency bands. Activity 'jumping' has lower energy values in the first 8 bands than the other activities, and activity 'descending stairs' is better distinguished from the other activities in the last 6 bands. The overall trend of the energy distribution of the two running activities is similar. However, there is a significant difference in the energy amplitude in some frequency bands.
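As a rough illustration of how the 24-dimensional energy spectrum feature vector could be assembled, the sketch below computes band energies for the three axes. It rests on several assumptions that are not taken from the paper: Haar filters stand in for the dmey wavelet, each axis is normalised by its total energy, and the helper names are hypothetical.

```python
import math


def haar_split(node):
    """Split a node into low- and high-frequency children (Haar filters, an assumption)."""
    s = math.sqrt(2)
    pairs = list(zip(node[0::2], node[1::2]))
    return [(a + b) / s for a, b in pairs], [(a - b) / s for a, b in pairs]


def band_energies(signal, levels=3):
    """Energy (sum of squared coefficients) of each terminal wavelet packet node."""
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [child for node in nodes for child in haar_split(node)]
    return [sum(c * c for c in node) for node in nodes]


def energy_spectrum_features(ax, ay, az, levels=3):
    """Concatenate the normalised band energies of the three axes (3 * 2**levels values)."""
    features = []
    for axis in (ax, ay, az):
        energies = band_energies(axis, levels)
        total = sum(energies) or 1.0   # guard against an all-zero window
        # Assumed normalisation: each band's share of the axis energy.
        features.extend(e / total for e in energies)
    return features


fs = 150                                   # sampling frequency in Hz
t = [n / fs for n in range(256)]           # 256 samples keep every split even
ax = [math.sin(2 * math.pi * 2 * s) for s in t]
ay = [math.sin(2 * math.pi * 10 * s) for s in t]
az = [1.0] * 256                           # constant signal: all energy in the lowest band
features = energy_spectrum_features(ax, ay, az)   # 24 values
```

The nodes come out in natural filter-bank order, which does not increase monotonically with frequency, so they would need to be reordered before being read as consecutive 8 Hz bands.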

The proposed ensemble-based filter feature selection (EFFS) method
Ensemble feature selection is a new technique proposed to optimize the feature set (Solorio-Fernández, Carrasco-Ochoa, & Martínez-Trinidad, 2016). Its superiority has been demonstrated in many research fields, such as molecular signatures (Haury, Gestraud, & Vert, 2011), cancer diagnosis (Abeel et al., 2010) and load forecasting (Hua, Bao, Xiong, & Chiong, 2015). To the best of our knowledge, no study has yet been reported that applies ensemble feature selection to human activity recognition. Therefore, in this paper, the EFFS method is proposed and utilized for feature selection in activity recognition. This section first introduces the four filter feature selection methods used in the EFFS method.

Information gain
Information gain (Agarwal & Mittal, 2013), a classical filter feature selection method, can determine the relevance of each feature in a feature set. This method ranks and selects the features based on information theory to reduce the feature dimensions of the machine learning algorithm. By measuring the entropy of the distribution, the uncertainty of the features is taken into consideration. Then, it ranks the features based on the relationship between feature and class. The inadequacy of the method is that it ignores the uniformity of the distribution of feature items between classes. The entropy of variable X can be defined as:

$$H(X) = -\sum_{i} P(x_i)\log_2 P(x_i)$$

where $P(x_i)$ denotes the prior probability of X. After obtaining the value of another variable Y, the entropy of X is defined as:

$$H(X \mid Y) = -\sum_{j} P(y_j)\sum_{i} P(x_i \mid y_j)\log_2 P(x_i \mid y_j)$$

In the above equation, $P(x_i \mid y_j)$ is the posterior probability of X when Y is known. The information gain is the reduction of the entropy of X due to the additional information that Y brings to X, which is defined as:

$$IG(X \mid Y) = H(X) - H(X \mid Y) \tag{11}$$

Based on the above theoretical analysis, Equation (11) can be used to calculate the feature relevance ranking, which will be utilized to select the robust features with better distinguishing performance.
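A minimal sketch of the entropy-based calculation is given below. It assumes that both variables are discrete, so continuous acceleration features would first have to be binned; the binning step is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X|Y): entropy of x inside each group of equal y, weighted by P(y)."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(x, y):
    """IG(X|Y) = H(X) - H(X|Y)."""
    return entropy(x) - conditional_entropy(x, y)


labels = ["walk", "walk", "run", "run"]
informative = ["low", "low", "high", "high"]   # binned feature that separates the classes
uninformative = ["a", "b", "a", "b"]           # binned feature unrelated to the classes

gain_informative = information_gain(labels, informative)
gain_uninformative = information_gain(labels, uninformative)
```

In this toy case the informative feature removes all uncertainty about the label, while the uninformative one removes none, so ranking by information gain places the former first.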

Gain ratio
Gain ratio (Ibrahim, Badr, & Shaheen, 2012) is an improvement of the information gain method. It compensates for the bias of information gain towards features with a large diversity of values. This kind of bias may reduce the generalization ability of the learning algorithm. The gain ratio improves on information gain by taking the number and size of branches into account and by using the intrinsic information obtained from the entropy of the feature distribution. For a given feature x and y, the gain ratio is the information gain divided by this intrinsic information.
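The effect of the normalisation can be seen in a small sketch. It assumes discrete (binned) feature values and takes the intrinsic information to be the entropy of the feature itself, which is the usual convention but is stated here as an assumption; `gain_ratio` is a hypothetical helper name.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    n = len(labels)
    groups = {}
    for f, c in zip(feature, labels):
        groups.setdefault(f, []).append(c)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    intrinsic = entropy(feature)   # intrinsic information of the feature distribution
    return gain / intrinsic if intrinsic > 0 else 0.0


labels = ["walk", "walk", "run", "run"]
two_valued = ["low", "low", "high", "high"]   # separates the classes with two values
many_valued = ["a", "b", "c", "d"]            # a different value for every sample

ratio_two = gain_ratio(two_valued, labels)
ratio_many = gain_ratio(many_valued, labels)
```

Both toy features have the same information gain, but the many-valued one is penalised by its larger intrinsic information and therefore receives the lower gain ratio.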

Chi-squared statistic
The Chi-squared ($\chi^2$) statistic (Nissim, Moskovitch, Rokach, & Elovici, 2012) measures the independence of two variables by calculating the Chi-squared score. The independence of a feature from a given activity class can be judged by this score. A low value of $\chi^2$ indicates that the feature and the class are independent, whereas a large value indicates a high degree of dependency between the feature and the class. The Chi-squared statistic has the advantage of small computational complexity. However, the score is also affected by negative correlation between the frequency of occurrence of a feature and the category, which may affect the feature selection. The Chi-squared statistic can be defined as:

$$\chi^2(r, c_i) = \frac{N\left[P(r, c_i)\,P(\bar{r}, \bar{c}_i) - P(r, \bar{c}_i)\,P(\bar{r}, c_i)\right]^2}{P(r)\,P(\bar{r})\,P(c_i)\,P(\bar{c}_i)}$$

where N represents the size of the entire dataset, r represents the presence of the feature and $c_i$ represents the activity category. $P(r, c_i)$ indicates the probability that r occurs in $c_i$ and $P(\bar{r}, c_i)$ indicates the probability that r does not appear in $c_i$. Also, $P(r, \bar{c}_i)$ and $P(\bar{r}, \bar{c}_i)$ respectively indicate the probability that a feature occurs and does not occur in an activity category that is not marked as $c_i$. $P(r)$ represents the probability of a feature appearing in the dataset and $P(\bar{r})$ is the probability that the feature does not appear in the dataset. $P(c_i)$ is the probability that a sample in the dataset is marked as $c_i$ and $P(\bar{c}_i)$ is the probability that it is not.
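The Chi-squared score can be sketched for a single binary feature and a single class as follows. The sketch assumes that a continuous feature has already been thresholded into "present" or "absent", a step that is not described in the text; `chi_squared` is a hypothetical helper name.

```python
def chi_squared(present, in_class):
    """Chi-squared score of a binary feature against one class (one versus the rest).

    present[k]  -- True if the feature is present in sample k
    in_class[k] -- True if sample k belongs to the class of interest
    """
    n = len(present)
    pairs = [(bool(r), bool(c)) for r, c in zip(present, in_class)]
    p_r_c = sum(r and c for r, c in pairs) / n               # P(r, c)
    p_r_nc = sum(r and not c for r, c in pairs) / n          # P(r, not c)
    p_nr_c = sum((not r) and c for r, c in pairs) / n        # P(not r, c)
    p_nr_nc = sum((not r) and (not c) for r, c in pairs) / n # P(not r, not c)
    p_r = p_r_c + p_r_nc
    p_c = p_r_c + p_nr_c
    denominator = p_r * (1 - p_r) * p_c * (1 - p_c)
    if denominator == 0:
        return 0.0   # the feature or the class is constant, so no dependence can be measured
    return n * (p_r_c * p_nr_nc - p_r_nc * p_nr_c) ** 2 / denominator


in_class = [True, True, False, False]
dependent = chi_squared([True, True, False, False], in_class)     # feature mirrors the class
independent = chi_squared([True, False, True, False], in_class)   # feature unrelated to the class
```

For a multi-class problem the score would be computed once per class and then combined, for example by taking the maximum or a weighted average; which combination the paper uses is not stated.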

ReliefF
ReliefF (Kononenko, 1994) is a multi-category feature selection method proposed by Kononenko in 1994 based on Kira's work. ReliefF is mainly used to solve problems such as multi-classification, data loss and data noise (Reyes, Morell, & Ventura, 2015). ReliefF can handle incomplete data and reduce the effects of noise in the data. However, its random sample selection strategy affects features that have a significant effect on the separation of small samples. The main idea of ReliefF is to evaluate features based on their ability to distinguish between samples that are close to each other.
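High-quality features should bring samples of the same class close to each other and keep samples of different classes far apart. ReliefF updates the weight of each feature using Equation (14):

$$W[i] = W[i] - \sum_{j=1}^{k}\frac{\mathrm{diff}(i, R_s, H_j)}{m k} + \sum_{c \neq \mathrm{class}(R_s)}\left[\frac{P(c)}{1 - P(\mathrm{class}(R_s))}\sum_{j=1}^{k}\frac{\mathrm{diff}(i, R_s, M_j(c))}{m k}\right] \tag{14}$$

where W[i] is the weight of feature i, $R_s$ is a sample randomly selected from the training set, $H_j$ is the jth of the k nearest neighbour samples of the same class as $R_s$, and $M_j$ is the jth of the k nearest neighbour samples of a class different from that of $R_s$. P(c) is the probability of occurrence of a class c sample. Here diff(i, ·, ·) is the difference between two samples on feature i and m is the number of randomly selected samples.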
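The weight update can be sketched in a few lines, following the standard ReliefF formulation. The sketch assumes numeric features already scaled to [0, 1], so that the per-feature difference is an absolute difference, and uses a Manhattan distance for the neighbour search; `relieff` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def relieff(X, y, k=3, m=None, seed=0):
    """Return one weight per feature.

    X is a list of rows with features scaled to [0, 1], y the class labels,
    k the number of neighbours and m the number of sampled instances (m <= len(X)).
    """
    n, d = len(X), len(X[0])
    m = m or n
    rng = random.Random(seed)
    prior = {c: count / n for c, count in Counter(y).items()}
    weights = [0.0] * d

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for s in rng.sample(range(n), m):
        for c in prior:
            # k nearest neighbours of sample s inside class c, excluding s itself
            candidates = [i for i in range(n) if y[i] == c and i != s]
            nearest = sorted(candidates, key=lambda i: distance(X[s], X[i]))[:k]
            if not nearest:
                continue
            if c == y[s]:
                scale = -1.0                              # nearest hits lower the weight
            else:
                scale = prior[c] / (1.0 - prior[y[s]])    # nearest misses raise it
            for i in nearest:
                for f in range(d):
                    weights[f] += scale * abs(X[s][f] - X[i][f]) / (m * len(nearest))
    return weights


X = [[0.0, 0.5], [0.1, 0.4], [0.9, 0.5], [1.0, 0.4]]
y = [0, 0, 1, 1]
w = relieff(X, y, k=1)   # feature 0 separates the classes, feature 1 does not
```

A feature whose values differ little between a sample and its same-class neighbours but differ strongly from its other-class neighbours ends up with a large weight, which matches the notion of a high-quality feature given above.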

The framework of proposed EFFS method
The EFFS method is a data preprocessing stage applied before the machine learning algorithm. It is based on the feature rankings obtained by the above-mentioned filter methods. First, each of the four feature selection methods is applied individually to the original feature space to rank the features according to their importance. The ranking result of each filter method can be expressed as:

$$A_{IG} = [X_I(1), X_I(2), \cdots, X_I(k), \cdots, X_I(m)]$$
$$A_{GR} = [X_G(1), X_G(2), \cdots, X_G(k), \cdots, X_G(m)]$$
$$A_{CS} = [X_C(1), X_C(2), \cdots, X_C(k), \cdots, X_C(m)]$$
$$A_{ReliefF} = [X_R(1), X_R(2), \cdots, X_R(k), \cdots, X_R(m)]$$

where $X_I(k)$, $X_G(k)$, $X_C(k)$ and $X_R(k)$ represent the ranking number of the kth feature in the four methods, respectively, and $A_{IG}$, $A_{GR}$, $A_{CS}$ and $A_{ReliefF}$ represent the ordering of features produced by the four feature selection methods, respectively. Based on these four rankings, the EFFS method utilizes the 'linear summation' rule to combine the outputs of the filter methods. The ranks of each feature in the four filter feature selection methods are added, and EFFS uses the resulting sum as an index to measure the importance of the feature. The features are sorted according to this index from the smallest to the largest to obtain the ensemble ranking of features under the four methods, where the index is:

$$X(k) = \alpha X_I(k) + \beta X_G(k) + \gamma X_C(k) + \varepsilon X_R(k)$$

In this formula, α, β, γ, ε are the weights corresponding to the four methods. These weights are obtained from multiple experiments to find a better weight combination of the four feature selection methods. Different weight combinations including (2, 1, 2, 1), (5, 3, 2, 3) and (2, 5, 4, 3) were tested by experiments. The experimental results of these three weight combinations are detailed in Section 5.3. X(k) is the index used to sort features by the EFFS method, and the features are ranked according to the value of X(k) from small to large. Figure 3 shows the specific flow of the proposed EFFS method.
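The linear summation rule can be sketched as a weighted sum of rank positions. The rank values below are made up for illustration, the function name `effs_ranking` is hypothetical, and breaking ties by feature ID is an assumption, since the text does not say how ties are handled.

```python
def effs_ranking(ranks_ig, ranks_gr, ranks_cs, ranks_relieff, weights=(2, 1, 2, 1)):
    """Combine four rank lists (rank 1 = most important) by a weighted sum.

    Entry k of each list is the rank that the method assigns to feature ID k + 1.
    Returns the feature IDs ordered from the smallest to the largest combined
    index, together with the combined index of every feature.
    """
    alpha, beta, gamma, epsilon = weights
    combined = [alpha * i + beta * g + gamma * c + epsilon * r
                for i, g, c, r in zip(ranks_ig, ranks_gr, ranks_cs, ranks_relieff)]
    # sorted() is stable, so features with equal index keep their ID order
    order = sorted(range(len(combined)), key=lambda k: combined[k])
    return [k + 1 for k in order], combined


# Illustrative ranks for four features (not taken from the paper)
ranks_ig = [1, 2, 3, 4]
ranks_gr = [2, 1, 4, 3]
ranks_cs = [1, 3, 2, 4]
ranks_relieff = [3, 1, 2, 4]

feature_order, index = effs_ranking(ranks_ig, ranks_gr, ranks_cs, ranks_relieff)
top_two = feature_order[:2]   # the two features EFFS would keep first
```

Because a small rank number means high importance, a small combined index marks a feature that the four methods agree on, and selecting the top x features amounts to taking the first x entries of `feature_order`.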

Experimental equipment and data
The experimental dataset was acquired using the TRIGNO EMG-acceleration signal sensing device with an acceleration range of ±6 g and a sampling frequency of 150 Hz. The data acquisition platform of the device is shown in Figure 4(a). Since the data collection node, shown in Figure 4(b), has a wireless transmission function, the signal can be transmitted to the data acquisition platform in real time. The whole acquisition process is performed under wireless conditions, as shown in Figure 4(c). Five healthy students took part in the experiment and were asked to perform the activities shown in the second column of Table 2 in sequence. Each volunteer was asked to tie the data collection node to the specified standard position (shown at the top of Figure 5) and to the simulated deviation position (shown at the bottom of Figure 5). The samples acquired at the standard position are used for training and the samples acquired at the simulated deviation position are used for testing the robustness of the proposed method to sensor position deviation. In this way, the reliability of the recognition system with respect to sensor deviation can be improved from the feature selection aspect. The right side of Figure 5 shows the acceleration signals of a 'walking' activity collected by the sensor in both positions. After pre-processing the collected data, we used the sliding window technique to split the activity signal for feature extraction. A sliding window with 50% overlap and a length of 300 samples was chosen. The features listed in Table 1 are extracted from each window and then normalized. Finally, the number of feature samples for each activity under the standard position and the deviation position is shown in the third column of Table 2.
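The windowing step can be sketched as follows, using the window length of 300 samples and the 50% overlap stated above. The helper name `sliding_windows` is hypothetical, and discarding an incomplete final window is an assumption.

```python
def sliding_windows(signal, length=300, overlap=0.5):
    """Split a signal into fixed-length windows with the given fractional overlap.

    A trailing part shorter than `length` is discarded (an assumption).
    """
    step = max(1, int(length * (1 - overlap)))
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, step)]


fs = 150                          # sampling frequency in Hz
recording = list(range(10 * fs))  # stand-in for 10 s of one acceleration axis
windows = sliding_windows(recording)
window_seconds = 300 / fs         # each window spans 2 s of signal
```

At 150 Hz a window of 300 samples covers 2 s, and the 50% overlap means that a new window starts every second.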

Experimental results and analysis
Since data was acquired from five subjects, the leave one out (LOO) strategy was utilized to train and recognize activities. In each round, the feature samples collected under the standard position from four volunteers were used as training data and the feature samples collected under the deviation position from the remaining volunteer were used as test data. The experiment was repeated five times until the deviation data of all five volunteers had been tested, and the final results were the average of the five experimental values.
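The evaluation protocol can be sketched as a loop over held-out subjects. The data layout (`data[subject][position]` as a list of `(features, label)` pairs), the position keys and the function names are assumptions made for illustration, and the evaluator passed in is a placeholder rather than the KNN or SVM classifier used in the paper.

```python
def leave_one_subject_out(subjects):
    """Yield (held_out_subject, training_subjects) pairs, one per subject."""
    for held_out in subjects:
        yield held_out, [s for s in subjects if s != held_out]


def build_fold(data, held_out, train_subjects):
    """Train on standard-position samples, test on the held-out subject's deviation samples."""
    train = [sample for s in train_subjects for sample in data[s]["standard"]]
    test = list(data[held_out]["deviation"])
    return train, test


def mean_accuracy(data, evaluate):
    """Average the accuracy returned by evaluate(train, test) over all folds."""
    scores = []
    for held_out, train_subjects in leave_one_subject_out(sorted(data)):
        train, test = build_fold(data, held_out, train_subjects)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)


# Toy data: one sample per subject and position (values are made up)
data = {
    f"s{i}": {
        "standard": [([0.1 * i], "walk")],
        "deviation": [([0.1 * i + 0.01], "walk")],
    }
    for i in range(1, 6)
}


def always_walk(train, test):
    """Placeholder evaluator: predicts 'walk' for every test sample."""
    return sum(label == "walk" for _, label in test) / len(test)


folds = list(leave_one_subject_out(sorted(data)))
score = mean_accuracy(data, always_walk)
```

Keeping the held-out subject's samples out of the training set means that each fold measures robustness to both a new wearer and a shifted sensor position.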

The influence of wavelet energy spectrum features on recognition
Feature information with higher discrimination greatly benefits the performance of the classifier. Figure 6 shows the effect of the wavelet energy spectrum features on the overall recognition accuracy of the two classifiers. Tables 3 and 4 show the effects of the wavelet energy spectrum features on the accuracy of each activity using SVM and KNN classifiers, respectively.
Figure 6. The effect of the proposed features (wavelet energy spectrum) on the overall recognition accuracy.
It can be observed from Figure 6 that the feature set with wavelet energy spectrum features significantly increases the overall accuracy of both classifiers. This demonstrates that a feature set including wavelet energy spectrum features can improve the discrimination between the activities. Because the two classifiers rely on different classification principles, the overall improvement brought by the proposed features is not the same for both. Among them, the improvement for SVM is more obvious.
It can be observed from Tables 3 and 4 that the improvement brought by the wavelet energy spectrum features differs between activities and between the two classifiers. The accuracy of the KNN classifier for the activities 'Ascending stairs', 'Descending stairs', 'Running (7km/h)' and 'Running (10km/h)' is significantly increased. The accuracy of the SVM classifier for the activities 'Ascending stairs', 'Descending stairs', 'Jumping', 'Sitting', 'Running (7km/h)' and 'Running (10km/h)' has increased by more than 2%. Additionally, in order to gain a better insight into the effect of the wavelet energy spectrum on distinguishing different activities, Tables 5-8 show the confusion matrices of activity recognition using the KNN and SVM classifiers with and without the wavelet energy spectrum features, which demonstrate the effectiveness of wavelet energy spectrum features in activity recognition.
It can be seen from Tables 5-6 that the KNN classifier often confuses similar activities when the wavelet energy spectrum features are not considered. For example, activity 'Running (7km/h)' is misrecognized as 'Running (10km/h)', and activity 'Jumping' is misrecognized as 'Ascending stairs' and 'Running (10km/h)'. After applying the wavelet energy spectrum features, the discrimination between activities is increased, for example between 'Jumping' and 'Ascending stairs' and between 'Running (10km/h)' and 'Running (7km/h)'. It can also be seen from Tables 7-8 that, after application of the wavelet energy spectrum features, it is easier to distinguish 'Jumping' from 'Running (7km/h)' and 'Ascending stairs' from 'Descending stairs'.
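Readings of this kind can be checked directly from a confusion matrix. The sketch below uses the counts of Table 5 (KNN without wavelet energy spectrum features) and assumes that rows are the true activities and that the columns follow the same order as the rows, since the column header is not legible in this copy; the helper names are hypothetical.

```python
LABELS = ["ST", "JP", "W", "SD", "LY", "R7", "R10", "AS", "DS"]

# Counts of Table 5; row = true activity, column = recognised activity (assumed)
TABLE_5 = [
    [273, 0, 2, 1, 0, 0, 0, 0, 0],
    [0, 228, 0, 0, 0, 4, 11, 31, 2],
    [1, 0, 262, 5, 0, 6, 2, 1, 0],
    [3, 0, 10, 369, 1, 9, 2, 2, 1],
    [0, 0, 0, 2, 362, 3, 22, 0, 12],
    [0, 3, 3, 12, 4, 302, 118, 2, 1],
    [0, 7, 0, 0, 18, 121, 314, 0, 2],
    [0, 21, 0, 0, 0, 1, 0, 261, 35],
    [0, 2, 0, 2, 15, 0, 0, 35, 253],
]


def overall_accuracy(matrix):
    """Share of samples on the diagonal (independent of the row/column convention)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def per_class_recall(matrix):
    """Diagonal count divided by the row total, valid if rows are the true classes."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def largest_confusion(matrix, labels):
    """Return (row label, column label, count) of the largest off-diagonal entry."""
    size = len(matrix)
    count, row, col = max((matrix[i][j], labels[i], labels[j])
                          for i in range(size) for j in range(size) if i != j)
    return row, col, count


accuracy = overall_accuracy(TABLE_5)
recall = dict(zip(LABELS, per_class_recall(TABLE_5)))
worst_pair = largest_confusion(TABLE_5, LABELS)
```

The largest off-diagonal entries lie between the two running activities, which is consistent with the confusion between 'Running (7km/h)' and 'Running (10km/h)' described above.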

Comparison of feature selection methods
In the previous section, we analyzed the effect of wavelet energy spectrum features on activity recognition performance. Based on this, we further optimized the feature set for robustness to the deviation position by means of feature selection, in order to improve recognition performance. First, all the features in Table 1 are extracted as data samples for the data set under the deviation position. Because of the randomness of the samples, the feature ranking result of a single run of a feature selection method may not be representative. Therefore, in this paper, each feature selection method is run five times and the feature ranking result is the average of the five experiments. For convenience of expression, the ranking results of the four feature selection methods are arranged according to the feature IDs in Table 1, as shown in Figure 7.
As can be seen from Figure 7, some features are ranked higher in the selected four feature selection methods. For example, feature ID7 ranks high in the gain ratio, chisquare statistics and ReliefF method and feature ID56 ranks high among the four feature selection methods. This indicates that these features have a higher degree of activity discrimination ability. Similarly, there are some features that are ranked behind by some feature selection methods. For example, feature ID27 ranks lower in the information gain, chi-square statistic, and ReliefF methods, feature ID14 ranks lower in the information gain, gain ratio, and ReliefF methods. This indicates that these features are considered to be less distinguishing or redundant features by some feature selection methods. According to the results of the first three feature selection methods, some of the proposed 24 new features (ID 36-59) are ranked after 20. Compared with the first three feature selection methods, less proposed features Table 5. Confusion matrix for activity recognition using KNN and feature set without wavelet energy spectrum.   S T  2 7 3  0  2  1  0  0  0  0  0  JP  0  228  0  0  0  4  11  31  2  W  1  0  2 6 2  5  0  6  2  1  0  SD  3  0  10  369  1  9  2  2  1  LY  0  0  0  2  362  3  22  0  12  R7  0  3  3  12  4  302  118  2  1  R10  0  7  0  0  18  121  314  0  2  AS  0  21  0  0  0  1  0  261  35  DS  0  2  0  2  15  0  0  35  253   Table 7. Confusion matrix for activity recognition using SVM and feature set without wavelet energy spectrum.   are ranked after 20 by ReilefF. This indicates that different evaluation criteria have different rankings for the proposed features. Some of the proposed features show poor ability to distinguish in the first three methods. However, more proposed features show better ability to distinguish in ReliefF. Therefore, the feature rankings obtained by the four feature selection methods are used as a reference for EFFS method. 
This enables us to eliminate the features that are irrelevant or redundant. To verify the EFFS method experimentally, several comparative experiments are performed in which EFFS is compared with four traditional filter feature selection methods. In addition to comparing feature selection approaches, we test the EFFS method with classifiers based on different classification principles: the SVM and KNN classifiers are used to measure the effectiveness of the proposed feature selection method. Figures 8 and 9 show the recognition accuracy of the proposed EFFS and the four comparison feature selection methods using the KNN and SVM classifiers, respectively, with 1 to 50 selected features. In Figures 8 and 9, the number x on the X-axis refers to the top x features selected by each method, in descending order of importance. The EFFS parameters α, β, γ, and ε are set to 2, 1, 2, and 1, respectively. This combination of parameters was chosen randomly in the experiment.
The purpose of feature selection is to discard redundant features and retain features that are as distinguishing as possible. A feature set that is too large may contain redundant features, which contribute little to the accuracy of the classification algorithm and may increase its computational complexity. Therefore, selecting an appropriate number of features can greatly improve the computational efficiency of the classification algorithm. However, too few features tend to reduce the recognition accuracy considerably. It can be seen from Figures 8-9 that a small number of features cannot guarantee high recognition accuracy. For example, whichever method is adopted, if the feature set contains fewer than 8 features, both the SVM and KNN classifiers achieve less than 65% accuracy. Up to a point, recognition performance gradually improves as the number of selected features increases. With EFFS, the accuracy reaches 80.4% with 24 features using KNN and 81.7% with 20 features using SVM. It can also be observed from Figures 8-9 that the proposed EFFS outperforms the other four filter-based feature selection methods: it obtains higher accuracy at the same feature dimension, which demonstrates its advantage.
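The evaluation protocol behind Figures 8-9, in which accuracy is measured for the top x ranked features, can be sketched as below. This is a minimal illustration rather than the experimental code: the tiny data set, the feature ranking, the function names, and the plain Euclidean k-nearest-neighbour classifier with k = 1 are all assumptions made for the example.

```python
import math

def knn_predict(train_X, train_y, x, k=1):
    """Predict the label of x by majority vote among its k nearest training samples."""
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in neighbours[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def accuracy_for_top_x(ranked_features, train_X, train_y, test_X, test_y, k=1):
    """Return {x: accuracy} when only the top-x ranked feature columns are kept."""
    results = {}
    for x in range(1, len(ranked_features) + 1):
        cols = ranked_features[:x]
        tr = [[row[c] for c in cols] for row in train_X]
        te = [[row[c] for c in cols] for row in test_X]
        correct = sum(
            knn_predict(tr, train_y, sample, k) == label
            for sample, label in zip(te, test_y)
        )
        results[x] = correct / len(test_y)
    return results

# Hypothetical toy data: three features per sample, two activities.
train_X = [[0.9, 0.0, 0.1], [0.1, 0.1, 0.0], [0.8, 1.0, 1.0], [0.2, 0.9, 0.9]]
train_y = ["walk", "walk", "run", "run"]
test_X = [[0.5, 0.2, 0.3], [0.5, 0.8, 0.7]]
test_y = ["walk", "run"]

# Hypothetical ranking: column indices in descending order of importance.
ranked_features = [1, 2, 0]
results = accuracy_for_top_x(ranked_features, train_X, train_y, test_X, test_y)
for x, acc in results.items():
    print(f"top {x} features: accuracy {acc:.2f}")
```

Plotting `results` for each feature selection method on the same axes would give curves of the kind compared in Figures 8-9.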

The performance of EFFS under different parameters
In the proposed method, the weights α, β, γ, and ε coordinate the relationship between the four independent feature selection methods in EFFS. Since each feature selection method uses a different ranking criterion, different weight configurations can reflect the diversity of these criteria in the EFFS. To better illustrate the EFFS method, we used SVM and KNN to evaluate its recognition accuracy under different weight configurations. Three sets of parameter combinations were randomly selected for the experiment. Figures 10 and 11 show the recognition accuracy of SVM and KNN, respectively, under three weight configurations of α, β, γ, and ε using the proposed EFFS. ReliefF is chosen as the benchmark method for comparison.
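To make the role of the weights concrete, the following sketch fuses four rankings by a weighted sum of rank positions. It is an assumption-laden illustration, not the EFFS scoring rule itself, which is defined earlier in the paper and may differ: the weighted-sum form, the assignment of α, β, γ, and ε to information gain, gain ratio, chi-square statistic, and ReliefF in that order, the function name, and the example rankings are all hypothetical.

```python
def weighted_rank_fusion(rankings, weights):
    """Fuse several rankings into one by a weighted sum of rank positions.

    rankings: dict, method name -> list of feature IDs, best first.
    weights:  dict, method name -> non-negative weight.
    Returns feature IDs sorted by fused score (lower is better), ties by ID.
    """
    scores = {}
    for method, ordered in rankings.items():
        w = weights[method]
        for position, fid in enumerate(ordered, start=1):
            scores[fid] = scores.get(fid, 0.0) + w * position
    return sorted(scores, key=lambda fid: (scores[fid], fid))

# Hypothetical rankings of four features (illustrative, not from Figure 7).
rankings = {
    "information_gain": [56, 7, 14, 27],
    "gain_ratio": [7, 56, 27, 14],
    "chi_square": [56, 7, 27, 14],
    "relieff": [7, 56, 27, 14],
}

# Assumed mapping of (alpha, beta, gamma, epsilon) = (2, 1, 2, 1) to the methods.
weights = {"information_gain": 2, "gain_ratio": 1, "chi_square": 2, "relieff": 1}
fused = weighted_rank_fusion(rankings, weights)
print(fused)
```

In this toy example, equal weights leave features 7 and 56 tied, while the (2, 1, 2, 1) configuration favours the methods that rank feature 56 first. This is one way in which a weight configuration can change the fused ranking.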
It can be seen from Figures 10-11 that when the weights of the proposed method are (2, 1, 2, 1), the classifier accuracy achieves its highest value. This may be because this set of weights balances the four filter-based feature selection methods and yields the most distinguishing feature set. It can also be seen from Figures 10-11 that the classifier accuracies differ across weight configurations, but their overall average level is higher than that of the baseline method ReliefF, which again supports the validity of the proposed EFFS method.

Conclusions
To reduce the influence on HAR of minor deviations of the accelerometer from its standard position, wavelet energy spectrum features and the EFFS method have been proposed in this paper. We used two classifiers, KNN and SVM, to test the feature set with and without the proposed features. The experiments show that, compared with the feature set without the proposed features, the feature set with wavelet energy spectrum features improves the discrimination of activities, which helps to improve classifier accuracy. Several comparison experiments with four filter-based feature selection methods were carried out to verify the effectiveness of the proposed EFFS method in human activity recognition. The results show that the proposed method is more robust and achieves better recognition performance than the other approaches with the same number of features. In future work, we will conduct experiments using the proposed EFFS method to analyze the performance of wavelet energy spectrum features in HAR. In addition, we will use data sets from other body positions, such as the chest or ankle, and more classifiers, such as extreme learning machines or deep learning methods, to verify the significance of wavelet energy spectrum features and the proposed feature selection method in HAR.

Disclosure statement
No potential conflict of interest was reported by the author(s).