Device-free human micro-activity recognition method using WiFi signals

ABSTRACT Human activity tracking plays a vital role in human–computer interaction. Traditional human activity recognition (HAR) methods adopt special devices, such as cameras and sensors, to track both macro- and micro-activities. Recently, wireless signals have been exploited to track human motion and activities in indoor environments without additional equipment. This study proposes a device-free WiFi-based micro-activity recognition method that leverages the channel state information (CSI) of wireless signals. Unlike existing CSI-based micro-activity recognition methods, the proposed method extracts both amplitude and phase information from CSI, thereby providing more information and increasing detection accuracy. The proposed method harnesses an effective signal processing technique to reveal the unique patterns of each activity. A machine learning algorithm is then applied to recognize the micro-activities. The proposed method has been evaluated in both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios, and the empirical results demonstrate the effectiveness of the proposed method with several users.


Introduction
Human motion and activity analysis has received increasing attention in recent decades because of advances in computing and sensing technologies as well as interest in action and gesture recognition applications such as security and surveillance, human-computer interaction, and gaming (Campbell et al. 2008). Information collected from target objects and target environments can be exploited to identify an appropriate action (Campbell et al. 2008). Therefore, human motion and activity analysis is realized by combining sensing and reasoning to deliver context-aware data that can be employed to provide personalized support in many applications (Chen et al. 2012). Traditional human activity recognition (HAR) research has produced various methods that are applied in different sensing areas, such as security, entertainment, and healthcare (Lara and Labrador 2013; Poppe 2010). Vision-based (or camera-based) techniques generally require the installation of cameras in the monitored environment to track human motion; such systems also require adequate lighting and cannot see through physical barriers such as walls. The primary drawbacks of sensor-based mechanisms are the burden of installing sensors in the test area or on the human body and their inconvenient usage, particularly for patients and the elderly. The shadowing effects produced by a moving object in the line of sight (LOS) between a wireless transmitter and a receiver in a home wireless network can be used to track object motion in indoor environments (Woyach, Puccinelli, and Haenggi 2006; Youssef and Mah 2007). The received signal strength indicator (RSSI) of a wireless signal fluctuates because of object movements in indoor environments. Therefore, such fluctuations can be used to track object motion without requiring burdensome equipment. This observation has opened a new avenue for sensing technologies that depend only on wireless signals.
Many studies have proposed human localization, human motion detection, macro HAR, and micro-activity recognition methods that use RSSI and channel state information (CSI).
Device-free CSI-based methods outperform RSSI-based methods in localization accuracy: CSI-based methods achieve errors of <1 m, whereas RSSI-based methods yield errors of >1 m. CSI-based methods can detect anomalies caused by environmental changes and, owing to the frequency diversity of CSI, can reflect the varying multipath reflections caused by an intruder's presence (Bhartia et al. 2011). RSSI and CSI are coarse- and fine-grained channel information, respectively. The main differences are that (1) RSSI is the average value of the received signals, whereas CSI contains more information about the fading channel through amplitude and phase, and (2) RSSI is severely affected by multipath effects.
Herein, the CSI of the wireless physical layer is leveraged to track human micro-activities, such as dodge, push, strike, circle, punch, bowl, drag, pull, and kick, in indoor environments. An efficient method is proposed to extract both amplitude and phase information from CSI. Each motion is observed to have a different impact on wireless CSI; therefore, each action has a unique pattern. However, exposing the unique pattern of each micro-activity is challenging. This study proposes an effective method to extract patterns from amplitude and phase information. Then, a machine learning algorithm is employed to classify nine micro-activities.
The main contributions of this paper are summarized as follows.
• A device-free micro-activity recognition method is proposed that uses ubiquitous wireless devices, thereby eliminating the need for additional hardware installation.
• Differing from previous methods, the proposed method exploits both phase and amplitude information in CSI.
• An efficient pattern recognition method is proposed to expose the start and end points of a given activity.
• The efficiency of the proposed method is verified via empirical results.

RSSI
Over the last decade, RSSI has been used in human localization, human motion detection, and human activity recognition research. For human motion detection (Kosba, Saeed, and Youssef 2012; Moore et al. 2010; Yang et al. 2010), RSSI has been leveraged to capture environmental changes that become anomalous when an intruder enters an observed environment. Moore et al. (2010) presented a system for positioning human motion that considers changes in the standard deviation of the received signal strength between stationary wireless transmitters and receivers at fixed locations. Kosba, Saeed, and Youssef (2012) utilized RSSI to track the environmental changes that occur when a moving object (i.e. a human) enters an area of interest. Yang et al. (2010) proposed an RSSI-based joint intrusion learning method that simultaneously classifies several human intrusion patterns. Booranawong, Jindapetch, and Saito (2018) presented a human motion detection and tracking method that uses RSSI. This method defined two functions: one function to enhance the collection and measurement of RSSI signals affected by human motion and a second function that uses a predefined threshold and zone selection method for human motion detection in indoor environments.
In recent years, device-free RSSI-based technology has been developed to recognize human macro-activities, such as sitting, walking, lying, and standing (Gu, Quan, and Ren 2014; Scholz et al. 2013; Sigg et al. 2014). Scholz et al. (2013) presented a device-bound and device-free HAR system that can classify four macro-activities (standing, walking, lying, and sitting). The proposed system used 802.15.4 RSSI for HAR in two different ways. In the first, the target user carries a wireless node (device-bound); in the second, the target user moves in a wireless sensor network (WSN) without carrying a WSN node (device-free). They performed 10-fold cross-validation with all nodes and achieved accuracies of 0.896 (device-bound), 0.894 (device-free), and 0.88 using an accelerometer. However, this method requires the target user to carry wireless devices or requires wireless sensor nodes to be installed in the target environment. Therefore, such a system cannot be considered device-free because its implementation requires special devices to be carried by the target object or to be installed in the test environment. Sigg et al. (2014) presented a human activity recognition system that considers the fluctuation in radio signals caused by human activities. The proposed system can track several human activities such as lying, walking, standing, and crawling. Their system achieved high accuracy under different conditions. Although this method leveraged RSSI to track human activities, it was implemented using software-defined radio with special devices such as universal software radio peripherals. Gu, Quan, and Ren (2014) proposed a WiFi-assisted HAR method that primarily aims to use data mining techniques to extract fingerprints of several activities from the radio signal strength data. They tested sitting, standing, and walking activities in a static environment and achieved an accuracy of 75% using a KNN classifier and 91% using their proposed fusion algorithm (Gu, Quan, and Ren 2014).
Some studies have recently presented micro-activity recognition systems such as hand gesture recognition (Abdelnasser, Youssef, and Harras 2015; Melgarejo et al. 2014). A hand gesture recognition scheme was previously proposed based on readily deployable ubiquitous WLAN devices using a sophisticated WARP v3 board equipped with two RE14P directional patch antennas (Melgarejo et al. 2014). The proposed scheme was implemented and tested in two different scenarios: gesture-based electronic activation from a wheelchair (tested up to 25 gestures and achieved 92% accuracy) and gesture-based control of a car infotainment system (average accuracy of 84%).
WiGest (Abdelnasser, Youssef, and Harras 2015) is a device-free gesture recognition system that leverages the fluctuation in RSSI caused by human hand motions in test movements. WiGest recognized several hand gestures and achieved an accuracy of 87.5% using a single transmitter and an average accuracy of 96% using three overhead transmitters.
However, RSSI-based device-free sensing techniques are limited by the variability of RSSI under environmental changes, which may cause false detections.

CSI
Recently, CSI has been leveraged for indoor localization and activity recognition. A device-free WiFi-based localization method was proposed by utilizing changes in CSI across multiple wireless link subcarriers (Wu et al. 2012). In a CSI-MIMO fingerprint positioning system (Chapre et al. 2014), MIMO information and the CSI amplitude and phase information of each subcarrier were comprehensively utilized to obtain accurate position information. Zhou et al. (2014) modeled the CSI subcarrier amplitude as a histogram. They then applied the empirical mode decomposition (EMD) algorithm for signal classification and constructed a fingerprint database to design a passive omnidirectional human detection system (Omni-PHD), which can effectively detect human presence over the full range of directions. Xi et al. (2014) proposed a device-free crowd counting system (FCC). FCC observes the variation in CSI and its relation with the number of moving people. Han et al. (2018) leveraged CSI to track human motion in an indoor environment using the multiple antenna voting method. A home intruder system based on the CSI of WiFi signals was also presented (Al-qaness et al. 2016b). This method detects door intrusion by tracking the fluctuations caused by human motion in an indoor environment. Lv et al. (2017) also presented an indoor intrusion system based on CSI. This system uses a hidden Markov model to classify human intrusion actions. Wang et al. (2014) proposed the E-eyes indoor activity recognition system. The E-eyes system divides human activities into in-place activities, such as cooking in the kitchen or bathing in the bathroom, and walking activities, such as walking while talking on the phone. E-eyes uses a WiFi access point and common household WiFi devices to construct an activity recognition system that identifies fixed-position activities (i.e. in-place activities) and walking movements.
This system can identify different activities in the same location, such as standing while washing dishes, and human walking activities, such as walking while talking on the phone. This system is based on the fact that different activities have a different CSI amplitude histogram distribution, and this histogram distribution is used to construct human activity feature information profiles in a semi-supervised manner. Then, human activities are identified using pattern matching algorithms. Wei et al. (2015) proposed HAR by analyzing CSI at the receiver end of the communication system to classify four activities (walking, standing, lying, and sitting). They demonstrated the manner in which radio frequency interference (RFI) can impact device-free HAR applications. They proved that conducting experiments in environments with RFI caused significant impact on CSI vectors. In the absence of RFI, different activities yield different CSI vectors that can be differentiated visually. The results obtained in an environment without RFI showed good accuracy; however, in the presence of RFI, accuracy gradually decreased. This system was evaluated using a pair of Wireless Ad hoc System for Positioning (WASP) nodes.
De Sanctis et al. (2015) proposed a HAR system, known as WIBECAM, which recognizes walking, sitting, and standing. WIBECAM periodically collects beacon frames sent by WiFi access points. WIBECAM treats the received beacon frames as snapshots and calculates frequency-domain spectral metrics of each observed frame. Wang et al. (2017) presented a CSI-based activity recognition scheme known as CRAM. CRAM recognizes several human activities, such as running, walking, sitting down, opening a refrigerator, falling, pushing, and boxing. CRAM considers the correlation between the collected CSI value and the performed activity. Dong et al. (2018) proposed a CSI-based HAR scheme that incorporates correlation-based fusion, Doppler spread spectrum, and moving variance segmentation methods. Furthermore, other studies (Al-qaness et al. 2016a; Li et al. 2016) have proposed different methods to handle the fact that each CSI stream has a different sensitivity to human motion: some streams are highly sensitive, whereas others are not sensitive enough. To avoid false detection, insensitive CSI streams must be eliminated. A bad stream elimination algorithm was previously proposed (Al-qaness et al. 2016b) to eliminate such insensitive ("bad") streams. Principal component analysis (PCA) was also applied across all CSI streams. However, CSI-based HAR is still in its nascent stages and requires improvement.
Moreover, CSI has been leveraged in various human micro-activity recognition studies. For example, Nandakumar, Kellogg, and Gollakota (2014) presented a CSI-based hand gesture recognition method that classifies four hand gestures in LOS and simple non-line-of-sight (NLOS) scenarios with average accuracies of 91% and 89%, respectively. He et al. (2015) presented a WiFi-based hand gesture recognition system known as WiG that can classify four hand gestures in both LOS and NLOS scenarios, demonstrating accuracies of 92% and 88%, respectively. Sun et al. (2015) proposed the in-air handwriting recognition system known as WiDraw. They used the angle of arrival of the signals received at the wireless receiver to track the direction of the target hand. WiDraw classified several in-air handwriting actions at an average accuracy of 91%. Another study (Al-qaness and Li 2016) presented WiGeR, a WiFi-based gesture recognition system that extracts CSI and employs wavelet analysis and short-time energy to expose the unique patterns produced by different hand motions in a specific duration. WiGeR classified 13 hand gestures and achieved an average accuracy of 92% across several scenarios. Tian et al. (2018b) presented a hand gesture recognition scheme that leverages the CSI of WiFi signals. The basic idea is to build a virtual antenna from the signals reflected by hand motions. They adopted a support vector machine (SVM) to recognize each hand motion. The proposed scheme was evaluated with 6 hand gestures and achieved an average accuracy of 97%. A hand gesture recognition method based on CSI, namely, WiCatch, was also presented (Tian et al. 2018b). WiCatch also adopted SVM to classify nine hand gestures and achieved an accuracy of 96%.

Methodology
The proposed method includes four stages: data collection and normalization, pattern segmentation, feature selection, and activity classification. Figure 1 shows the workflow of the proposed method. First, CSI is collected at the detection point (DP) (i.e. the wireless receiver), which is a laptop running Ubuntu and the open source CSI-Tool (Halperin et al. 2011). The collected CSI data are then preprocessed to select adequate CSI data and to remove noise by applying a Butterworth filter and PCA. Then, the proposed method extracts several features and builds a feature vector for input to a machine learning classifier. A random forest (RF) algorithm is then used to recognize the proposed micro-activities. Each stage is described in detail in the following sections.

Data collection and normalization
CSI is a collection of information that describes the state of a channel and includes amplitude as well as phase information. The CSI-Tool (Halperin et al. 2011) can be used to extract CSI from commodity wireless network interface controllers (NIC).
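As a minimal sketch of how amplitude and phase are obtained from complex CSI values, the following NumPy snippet splits a complex array into the two quantities. The array layout (packets × streams × subcarriers), the function name `amplitude_phase`, and the random stand-in data are assumptions made for illustration; this is not the CSI-Tool's own parsing code.

```python
import numpy as np

def amplitude_phase(csi):
    """Split complex CSI into amplitude and unwrapped phase.

    csi: complex array, assumed shape (packets, streams, subcarriers).
    """
    amplitude = np.abs(csi)
    # Unwrap along the subcarrier axis so the phase has no 2*pi jumps.
    phase = np.unwrap(np.angle(csi), axis=-1)
    return amplitude, phase

# Random stand-in for measured CSI: 100 packets, 6 streams, 30 subcarriers.
rng = np.random.default_rng(0)
csi = rng.standard_normal((100, 6, 30)) + 1j * rng.standard_normal((100, 6, 30))

amp, ph = amplitude_phase(csi)
print(amp.shape, ph.shape)
```

Unwrapping changes the phase only by multiples of 2π, so the complex value can still be recovered from the returned amplitude and phase.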
Herein, CSI data were collected from a wireless network composed of a WiFi router with two transmitting antennas as an access point (AP) and a laptop equipped with an IWL 5300 NIC and three receiving antennas as the DP. Therefore, with OFDM, each collected packet had 2 × 3 CSI streams (6 streams). Each stream has 30 subcarriers, as reported by the IWL 5300 NIC (Halperin et al. 2011). The raw CSI is denoted by $H$, and each element in the CSI matrix is represented as follows:

$$H_{i,j}(f_k) = \left|H_{i,j}(f_k)\right| \, e^{\sqrt{-1}\,\angle H_{i,j}(f_k)}$$

where $|H_{i,j}(f_k)|$ is the amplitude of the CSI, $\angle H_{i,j}(f_k)$ is the phase information of the CSI, $i$ indexes the streams, and $j$ indexes the subcarriers. The gathered CSI is affected by surrounding electromagnetic noise, as shown in Figure 2; therefore, a low-pass Butterworth filter is applied to filter the noise (Figure 3). Next, PCA is performed on the reported CSI streams to obtain p principal components (PCs) as follows.
Data collection: The system collects N CSI packets and removes noise using Butterworth filters. Then, the system obtains an $N \times S_n$ matrix $H_{t,r}$:

$$H_{t,r} = \left[H_{t,r}(1), H_{t,r}(2), \dots, H_{t,r}(N)\right]^{T}$$

where $H_{t,r}(n)$ represents an $S_n \times 1$-dimensional vector containing the CSI values from all $S_n$ streams at the (t, r) antenna pair for the n-th collected CSI packet.
Covariance matrix: The system normalizes $H_{t,r}$ to obtain the normalized version $Z_{t,r}$ with zero mean and unit variance. Then, the system calculates the covariance matrix.
Eigen decomposition: The system calculates the eigenvectors by eigen decomposition of the covariance matrix of $Z_{t,r}$ (which, because $Z_{t,r}$ is standardized, is also the correlation matrix of $H_{t,r}$).
Principal components: The system projects $Z_{t,r}$ onto the eigenvectors and obtains the top p PCs, where ρ represents the eigenvectors and p represents the number of principal components.
Because the CSI streams are highly correlated, noise is primarily captured in the first PC; thus, the system discards the first component and retains the remaining PCs (Figure 4).
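The preprocessing steps above (low-pass Butterworth filtering, standardization, covariance, eigen decomposition, and discarding the first PC) can be sketched as follows. This is an illustrative sketch only: the filter order, the 5 Hz cutoff, the 100 packets-per-second rate, the function names, and the synthetic 30-column matrix are all assumptions, since the paper does not state these values.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass(H, cutoff_hz, sample_rate_hz, order=4):
    """Zero-phase low-pass Butterworth filter along the packet (time) axis."""
    b, a = butter(order, cutoff_hz, btype="low", fs=sample_rate_hz)
    return filtfilt(b, a, H, axis=0)

def pca_components(H, p):
    """Return PCs 2..p+1 (first PC discarded) and the sorted eigenvalues."""
    # Standardize every column to zero mean and unit variance.
    Z = (H - H.mean(axis=0)) / H.std(axis=0, ddof=1)
    # Covariance matrix of the standardized data.
    C = Z.T @ Z / (Z.shape[0] - 1)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, rho = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, rho = eigvals[order], rho[:, order]
    # Project onto the eigenvectors, drop the first PC, keep the next p.
    pcs = Z @ rho
    return pcs[:, 1:p + 1], eigvals

# Synthetic stand-in: 1000 packets, 30 columns, assumed 100 packets/s.
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
motion = np.sin(2 * np.pi * 1.0 * t)
gains = rng.uniform(0.5, 1.5, size=30)
H = motion[:, None] * gains[None, :] + 0.3 * rng.standard_normal((1000, 30))

H_filtered = lowpass(H, cutoff_hz=5.0, sample_rate_hz=100.0)
pcs, eigvals = pca_components(H_filtered, p=5)
print(pcs.shape)
```

The synthetic data only exercise the code path; which component carries noise and which carries motion in real CSI depends on the measurement, as discussed in the text.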

Pattern segmentation
An envelope extraction method based on the Hilbert-Huang transform (Huang et al. 1998) is used herein to determine the width of the dynamic time window. Huang et al. (1998) proposed the EMD algorithm. EMD is a self-adaptive signal processing method; thus, it is suitable for nonstationary signal processing scenarios. EMD decomposes data into intrinsic mode functions (IMFs) to handle nonstationary data to which the Hilbert transform cannot be applied directly. Each IMF represents a type of oscillatory mode embedded in the signal.
Moreover, each IMF should satisfy the following conditions: (1) the numbers of extrema and zero-crossing points must either be equal or differ by at most one over the entire dataset and (2) the mean value of the envelopes defined by the local maxima and the local minima must be zero at any point in the dataset. The EMD algorithm for a signal X(t) can be represented as follows:
(1) Initialize $r_1(t) = X(t)$ and $i = 1$.
(2) Set $h_{i0}(t) = r_i(t)$ and $k = 0$.
(3) Identify the local maxima and local minima of $h_{ik}(t)$ and interpolate them to form the upper envelope and lower envelope.
(4) Calculate the mean value $m_{ik}(t)$ of the upper envelope and lower envelope; then, let $h_{i(k+1)}(t) = h_{ik}(t) - m_{ik}(t)$.
(5) If $h_{i(k+1)}(t)$ satisfies the above conditions, it becomes an IMF; thus, set $c_i(t) = h_{i(k+1)}(t)$; otherwise, set $k = k + 1$ and repeat steps (3) and (4). The residue is calculated as $r_{i+1}(t) = r_i(t) - c_i(t)$. If $r_{i+1}(t)$ contains at least two extrema, it is treated as the input to derive the following IMF. Otherwise, the process completes and $r_{i+1}(t)$ is denoted as the final residue $r_f$.
(6) The original signal X(t) is thereby decomposed into a set of IMFs $c_i(t)$ and a residue $r_f$ as

$$X(t) = \sum_{i=1}^{n} c_i(t) + r_f$$

where $n$ is the number of IMFs.

Figure 5 shows that the width of the peak is very precise. In the third layer of the envelope curve, the left and right slope zero points are clearly identifiable. The time between the two zero points is the width of the dynamic time window.
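The sifting procedure described above can be illustrated with a simplified EMD sketch. Several choices here are assumptions rather than details from the paper: cubic-spline envelopes pinned at the signal endpoints, an energy-ratio stopping rule in place of a strict check of the two IMF conditions, the requirement of at least two maxima and two minima before sifting, and the synthetic test signal.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def envelope_mean(h):
    """Mean of the upper and lower cubic-spline envelopes, or None."""
    maxima = argrelextrema(h, np.greater)[0]
    minima = argrelextrema(h, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema to build envelopes
    n = np.arange(len(h))
    last = len(h) - 1
    # Pin both envelopes at the endpoints (a simplifying assumption).
    upper = CubicSpline(np.r_[0, maxima, last], np.r_[h[0], h[maxima], h[-1]])(n)
    lower = CubicSpline(np.r_[0, minima, last], np.r_[h[0], h[minima], h[-1]])(n)
    return (upper + lower) / 2.0

def emd(x, max_imfs=6, max_sift=50, tol=0.05):
    """Decompose x into IMFs and a residue with a simplified stopping rule."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and envelope_mean(residue) is not None:
        h = residue.copy()
        for _ in range(max_sift):
            m = envelope_mean(h)
            if m is None:
                break
            # Energy of the envelope mean relative to the current signal.
            ratio = np.sum(m ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h - m
            if ratio < tol:
                break
        imfs.append(h)
        residue = residue - h
    return imfs, residue

# Synthetic nonstationary-like signal: fast and slow tones plus a trend.
t = np.linspace(0.0, 2.0, 400)
x = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 1 * t) + 0.2 * t

imfs, residue = emd(x)
print(len(imfs))
```

By construction, the IMFs and the final residue add up to the input signal, which mirrors the decomposition in step (6).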

Feature extraction and activity classification
The proposed method extracts six features (mean, maximum value, standard deviation, percentiles, median absolute deviation, and entropy) from both the amplitude and phase of CSI to classify micro-activities. The RF classification algorithm (Breiman 2001) is employed to classify the proposed micro-activities. Figure 6 shows the workflow of the RF algorithm. RF is also used to rank the importance of variables in the classification problem.
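A minimal sketch of the six features for one segmented activity window is given below. The specific percentiles (25th and 75th) and the histogram-based entropy estimate with 16 bins are assumptions; the paper names the features but does not give these parameters. The function names are hypothetical.

```python
import numpy as np

def segment_features(x, percentiles=(25, 75), bins=16):
    """Mean, max, std, percentiles, median absolute deviation, entropy."""
    x = np.asarray(x, dtype=float)
    mad = np.median(np.abs(x - np.median(x)))
    # Shannon entropy (bits) of a histogram of the samples.
    counts, _ = np.histogram(x, bins=bins)
    prob = counts[counts > 0] / counts.sum()
    entropy = -np.sum(prob * np.log2(prob))
    return np.array([x.mean(), x.max(), x.std(),
                     *np.percentile(x, percentiles), mad, entropy])

def feature_vector(amplitude, phase):
    """Concatenate the features of the amplitude and phase windows."""
    return np.concatenate([segment_features(amplitude),
                           segment_features(phase)])

features = segment_features([1, 2, 3, 4, 5])
print(features)
```

With two percentiles, each window yields seven numbers per quantity, so the combined amplitude and phase vector has 14 entries under these assumptions.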
To measure variable importance in a given dataset, the RF must first be fitted to the data. During this process, the system records the out-of-bag error for each data point and averages it over the forest. The importance of the j-th feature after training is measured by permuting the values of the j-th feature among the training data and computing the out-of-bag error on the perturbed dataset. Then, the system computes an importance score for the given feature by averaging the difference in out-of-bag error before and after permutation over all trees. This score is normalized by the standard deviation of the differences. Herein, the system ranks features with larger estimated scores as more important than the other features.
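The permutation idea can be illustrated with scikit-learn, with one caveat: `sklearn.inspection.permutation_importance` permutes features on a held-out set rather than on each tree's out-of-bag samples, so this is a closely related stand-in and not the exact out-of-bag procedure described above. The synthetic data (one informative feature, three noise features) and all parameter values are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: feature 0 depends on the label, features 1-3 are noise.
rng = np.random.default_rng(0)
n = 600
labels = rng.integers(0, 3, size=n)
informative = labels + 0.3 * rng.standard_normal(n)
X = np.column_stack([informative, rng.standard_normal((n, 3))])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels)

# oob_score=True also reports the out-of-bag accuracy of the forest.
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0)
forest.fit(X_train, y_train)

# Drop in held-out accuracy when each feature is shuffled in turn.
result = permutation_importance(forest, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking, forest.oob_score_)
```

Features whose shuffling causes the largest accuracy drop appear first in `ranking`; `result.importances_std` gives the spread that a normalized score would divide by.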

Experimental setup
The experiments were conducted in a heavily furnished apartment. The experimental hardware comprised one TP-LINK TL-WR842N router as the AP and one Lenovo laptop running Ubuntu 14.04 as the DP. The AP and DP were placed in the LOS of each other at a distance of 3 m. The DP and AP were placed at the top-left and bottom-right corners of the room, respectively. The proposed activities were performed at a fixed position because the signal values at each position differed. The door and windows were closed to ensure stability in the test environment. We implemented nine micro-activities discussed in the literature (Pu et al. 2013), i.e. push, dodge, strike, pull, drag, kick, circle, punch (twice), and bowl, which were labeled as nine classes (C1-C9, respectively). Table 1 shows additional details about the experimental factors.
The proposed method was evaluated under two scenarios, i.e. LOS and NLOS (Figure 7). In the LOS scenario, the user, AP, and DP were located in the same room. In the NLOS scenario, the user and DP were in the same room and the AP was in another room.

Evaluation results
The evaluation experiments were conducted in the indoor environments described above with up to 10 users performing the proposed micro-activities individually. We collected 300 samples of each micro-activity across 10 sessions. In each session, each user was asked to perform each micro-activity 30 times at a fixed position. The proposed method was evaluated using 10-fold cross-validation. The results are presented as a confusion matrix in Figure 8.
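Assuming the evaluation refers to standard 10-fold cross-validation, the following sketch shows one way to assign the 9 × 300 samples to folds. Stratifying by class (so that every fold holds the same number of samples per activity) and the function name `stratified_folds` are assumptions; the paper does not describe how the folds were formed.

```python
import numpy as np

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index 0..k-1, balanced within each class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        # Shuffle the samples of this class, then deal them out in turn.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold_of[idx] = np.arange(len(idx)) % k
    return fold_of

# Nine classes (C1-C9) with 300 samples each, as in the data collection.
labels = np.repeat(np.arange(9), 300)
fold_of = stratified_folds(labels)

for fold in range(10):
    test_mask = fold_of == fold
    train_mask = ~test_mask
    # A classifier would be trained on train_mask and scored on test_mask.
print(np.bincount(fold_of))
```

Each fold serves once as the test set while the remaining nine folds are used for training, and the per-fold results are then aggregated.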
Figure 5. Activity duration width via envelope extraction (x-axis: packet index; y-axis: amplitude in dB).

The precision, recall, and F-measure metrics were used to evaluate the proposed method. Precision is a positive predictive value calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP and FP represent true positive and false positive, respectively. Recall (or sensitivity) is calculated as follows:

$$\text{Recall} = \frac{TP}{TP + FN}$$

where FN represents false negative. F-measure is the weighted average (harmonic mean) of recall and precision. F-measure is calculated as follows:

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Table 2 shows the precision, recall, and F-measure results for each activity.
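The three metrics can be computed per class directly from a confusion matrix, as in the sketch below. The convention that rows are true classes and columns are predicted classes is an assumption, and the 2 × 2 matrix contains toy numbers for illustration only; it is not taken from the paper's results.

```python
import numpy as np

def per_class_metrics(confusion):
    """Precision, recall, and F-measure per class.

    confusion[i, j]: number of samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp  # belong to the class, but missed
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy confusion matrix (illustrative values only).
toy = [[8, 2],
       [1, 9]]
precision, recall, f_measure = per_class_metrics(toy)
accuracy = np.trace(np.asarray(toy)) / np.sum(toy)
print(precision, recall, f_measure, accuracy)
```

For the toy matrix, the first class has 8 true positives, 1 false positive, and 2 false negatives, which gives a precision of 8/9 and a recall of 0.8.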
In the NLOS scenario, the user and DP (laptop) were in the same room and the AP (WiFi router) was in another room. Figure 9 shows the confusion matrix of the NLOS scenario. The overall accuracy of the NLOS scenario was 89.147%.
Moreover, precision, recall, and F-measure were also used to evaluate the proposed method under the NLOS scenario. Results are shown in Table 3.
Overall, the evaluation results demonstrated that the proposed method achieved high accuracy for each activity in both the LOS and NLOS scenarios. In the future, we plan to extend this study to overcome current challenges, which primarily relate to testing two or more users simultaneously given the sensitivity of CSI to human motion.

Conclusions
This study proposed a device-free CSI-based human micro-activity recognition method that uses the CSI of an indoor ubiquitous wireless infrastructure. We described recent advances in device-free human sensing technology and proposed an effective method to address challenges, including CSI filtering, pattern segmentation, and micro-activity classification. The proposed method was evaluated in a complex indoor environment, and it classified nine human micro-activities. The proposed method was evaluated in both LOS and NLOS scenarios with 10 users, and the empirical results demonstrate that the proposed method reached high accuracy. Currently, device-free WiFi-based sensing technologies (including both RSSI and CSI) achieve good performance when sensing the activities of a single person; however, such technologies are limited when tracking the movements of two or more people. Therefore, tracking the activities of multiple people requires further investigation in the future.