Estimation of the Torques Produced by Human Upper Limb during Eating Activities Using NARX-NN

ABSTRACT Upper limb movement disorders significantly hamper the ability of impaired individuals to perform basic activities of daily living (ADL). Eating is one of the essential ADLs necessary for human survival. To develop a rehabilitation system meant specifically to assist the hand during eating, in-depth knowledge of hand motion and of the forces/torques produced during eating is vital. Since Human Upper Limb (HUL) motion is highly dexterous, a dynamic model of the HUL can be beneficial for predicting the torques during different eating activities. A four-degree-of-freedom (DOF) dynamic model of the HUL, covering the elbow and wrist joints and focusing on elbow flexion/extension, forearm pronation/supination, wrist flexion/extension and wrist adduction/abduction, is formulated using a Nonlinear AutoRegressive network with eXogenous input Neural Network (NARX-NN) for different eating activities. We conducted an experimental validation involving five different food types and two types of cutlery. The torque prediction accuracy of the model is determined by comparing the predicted values with the torques measured by load cells for all eating activities, using the root mean square error (RMSE) as the statistical measure of model performance. The torques predicted by the model track the measured torques closely.


Introduction
Dysphagia and other eating difficulties are common among post-stroke patients and lead to complications such as malnutrition, dehydration, suffocation, and eventually death (Jacobsson et al. 2000; Westergren 2006; Westergren, Hallberg, and Ohlsson 1999; Westergren et al. 2002). Westergren et al. (2008), in research conducted in an urban hospital in Sweden, considered 162 stroke patients over one year and found eating difficulties in 80% of them, while 52.5% of the patients could not eat without assistance. The most prominent eating difficulties encountered in the total sample included: 'eats three-quarters or less of served food' (60.1%), 'manipulating food on the plate' (56.2%), and 'transport of food to the mouth' (46.4%).
Hand motion plays a crucial role in eating (feeding). With roughly 30 degrees of freedom (DOFs), this complex structure can perform intricate tasks requiring varying amounts of force/torque and dexterity. In-depth knowledge of hand motion and of the forces/torques produced during eating is vital for developing a robotic rehabilitation system explicitly meant to assist hand function during eating. Without it, rehabilitation robots for eating activities are difficult to develop effectively. Modeling hand motion (i.e., the motion and the force/torque producing it) during eating can be complicated, since it depends on the type of food (solid or liquid) to be ingested and the type of cutlery (fork or spoon) to be used. A dynamic model of the HUL for estimating the torques produced during different eating activities is therefore pivotal. Dynamic modeling can be beneficial for studying the interactions between humans and rehabilitation systems, to ensure human safety and enhance human performance. This quantification of the subjects' effort (torques) can serve as a guideline (or reference torque input) for developing assistive robotic rehabilitation systems meant for eating activities.
Numerous dynamic models have been formulated using mathematical methods such as Newton-Euler, Lagrange (Buondonno and De Luca 2015; Massa and Vignolo 2016; McGrath, Howard, and Baker 2017) and Kane's method (Hirza, Ariff, and Rambely 2009; Rambely and Fazrolrozi 2012; Rosen et al. 2005; Tumit et al. 2015), and also using Artificial Neural Networks (ANNs). Multi-body mathematical modeling is one of the popular noninvasive methods biomechanists employ to study various human motions and the corresponding torques and forces produced during multiple ADLs. However, the mathematical models can be cumbersome if the system involved has many DOFs and is 3-dimensional, making the inverse dynamics calculation process lengthy and not real-time. This drawback is one of the primary reasons for developing a dynamic model of the limb using the Nonlinear AutoRegressive network with eXogenous input Neural Network (NARX-NN) for instantaneous torque prediction.
Nonlinear Auto-Regressive eXogenous-NN (NARX-NN) models are increasingly used to estimate joint dynamics. The NARX model is based on the linear Auto-Regressive eXogenous (ARX) model, commonly used in time-series modeling. It is a recurrent dynamic network, with feedback connections enclosing several network layers.
A NARX model to decode shoulder, elbow, and wrist movement from EMG signals was developed by Liu et al. (2017). The input training data consisted of the EMG signals from six muscles of the arm. The output training data consisted of the angular motion of the shoulder, elbow, and wrist joints, measured by an exoskeleton robot called IntelliArm while the subjects moved their arm voluntarily in the horizontal plane only. The estimation performance of the model was about 98% for the shoulder, elbow, and wrist joints for both the control group and the impaired group of subjects.
In a similar study, Raj and Sivanandan (2016) proposed a NARX structure-based multiple-layer perceptron neural network (MLPNN) model for estimating the elbow joint angle and elbow angular velocity from Surface Electromyography (SEMG) signals. The training data included the SEMG from the biceps brachii muscle as input, and the elbow angular displacement and elbow angular velocity during extension and flexion of the elbow as outputs. Two time-domain features, Integrated EMG (IEMG) and Zero Crossing (ZC), were extracted from the SEMG. The NARX MLPNN model was trained using the Levenberg-Marquardt algorithm. The average regression coefficient (R) obtained was 0.9641 for elbow angular displacement prediction and 0.9347 for elbow angular velocity prediction. Hence, the proposed model could estimate the elbow joint angle and elbow angular velocity with considerable accuracy.
Several similar studies have successfully estimated human dynamics using NARX models and EMG signals (Akbari and Talasaz 2014; Ayati et al. 2015; Jali et al. 2014a, 2014b). The EMG signal is a commonly used biological signal for human motor intention prediction, which is an essential element in human-robot interaction systems. EMG signals have been used extensively for muscle force estimation for the past few decades. Several torque prediction models using EMG as input have also been developed. Jali et al. (2014a) used a two-layer feed-forward network trained with back propagation (Back Propagation Neural Network, BPNN) to map the EMG signal to the elbow torque value. The EMG signal of the biceps brachii muscle acts as the input of the ANN, while the elbow torque is the desired output. The ANN model with 20 hidden neurons had an MSE of 0.13807 and an average regression of 0.999.
A Hill-type EMG-driven model for ankle-joint estimation was developed by de Oliveira and Luporini Menegaldo (2010). This paper proposed to find the individual-specific muscle maximum force $F_{om}$ by estimating the muscle physiological cross-sectional area (PCSA) using ultrasound, which is then multiplied by a reasonable value of maximum muscle-specific tension to obtain the output ankle torque. EMG signals from three muscles (soleus, gastrocnemius medialis, and gastrocnemius lateralis) acted as inputs. They were acquired in a series of experiments involving eight adult male subjects performing an isometric contraction protocol consisting of 10 s step contractions at 20% and 60% of the maximum voluntary contraction level. Isometric torque was simultaneously collected using a dynamometer and acted as the output. A statistically significant reduction in the root mean square error was observed when the ultrasound-obtained $F_{om}$ was used instead of $F_{om}$ from the literature.
EMG signals are extensively used in human-robot systems to foresee the purpose of the user's motion. Acquisition (electrode placement) of these signals is the most critical step, as the subsequent processes primarily depend on the quality of the signal (Bi, Feleke, and Guan 2019). Electrodes may shift away from the selected part of the muscle (because of dynamic changes in the human body) or may lose contact with the skin's surface. This reduces the amplitude of the quantified signal and thereby affects prediction precision (Ghapanchizadeh, Ahmad, and Ishak 2016; Mesin, Merletti, and Rainoldi 2009). Apart from requiring complex pre-processing to extract human motion intentions, EMG signals may not contain the complete information produced by the motor system, specifically when a preplanned motion is canceled to avoid a catastrophic result (Mirabella 2014; Mirabella and Lebedev 2017). Moreover, an EMG signal collected from part of a muscle does not represent the muscle as a whole (Staudenmann et al. 2010).
From the previous works discussed, it can be concluded that abundant research has been done to predict human motion dynamics using various methods. The novelty of this study is the use of the user's wrist and elbow angular motion to estimate the corresponding joint torques, which has not been explored yet. NARX-NN is used to map the dynamic relationship between the arm motion (elbow and wrist motion) and the torques generated. The network is trained using the Levenberg-Marquardt algorithm.
The modeling of the HUL during different eating activities, considering different food characteristics and different cutlery, has not been studied in detail yet. Perry and Rosen (2006), in their quest to develop a 7-DOF exoskeleton, included an eating task using fork and spoon along with 24 other basic ADLs. However, the focus of that activity was mainly on analyzing the grasping of the spoon by healthy and impaired subjects, rather than on the dynamics of the HUL under the various food characteristics and cutlery involved. The NARX model proposed in this study aims to estimate, in a fraction of a second, the torques produced in the HUL (elbow and wrist joints) during different eating activities from the user's wrist and elbow angular motion, focusing on elbow flexion/extension, forearm pronation/supination, wrist adduction/abduction, and wrist flexion/extension. This has not been studied extensively in previous works. An experiment is then performed to validate the formulated model by comparing its predicted torques with those measured by the load cells of a robotic system during different eating activities using different cutlery. The focus of this study is the torques produced at the wrist and elbow joints only. The torques produced at the shoulder joint will be considered in a future study. This paper is organized as follows: Section 2 describes the experimental design employed in this study. Section 3 presents the experimental results of the NARX-NN model developed. Section 4 presents the discussion of the results, and lastly, the conclusion is drawn in Section 5.

Experimental Setup
An experiment has been performed using a 4-DOF mechanical HUL robotic system (coupled with the human arm), as shown in Figure 1. All the DOFs possess a revolute configuration. The four joints correspond to elbow flexion/extension ($q_1$), forearm pronation/supination ($q_2$), wrist abduction/adduction ($q_3$) and wrist flexion/extension ($q_4$). All four joints have load cells attached, which are used for the corresponding torque measurement. The electrical unit comprises an NI USB-6211 Data Acquisition (DAQ) system, which acts as an interface between the sensors and the MATLAB/Simulink software.
The Xsens Motion Tracker (MTw) has been used for capturing the motion of the wrist and elbow joints during various eating activities, as shown in Figure 2. The orientation data, obtained in the form of roll, pitch and yaw, correspond to elbow flexion/extension ($q_1$), forearm pronation/supination ($q_2$), wrist radial/ulnar flexion (abduction/adduction) ($q_3$) and wrist extension/flexion ($q_4$). Specialized Dynamixel XH430-W350-R servo motors were used for the wrist joint to allow easy movements, resulting in near-zero stiffness and frictional torques. Thus, these parameters were ignored in the model. For the elbow joint, the stiffness and frictional torque of the exoskeleton were previously estimated by another researcher (Mounis, Azlan, and Sado 2020). The torque measurement calibration accounted for the stiffness and frictional torques for each of the load cells attached at these joints. Thus, the measured torques for our motion study were due only to the human input torque, while the exoskeleton's inertia, centripetal, Coriolis, and gravitational torques have all been accounted for in our current Kane's model (Hussain and Azlan 2019).

Experimental Procedure
Five healthy, right-handed subjects, including three males and two females, with an average age of 30 years and an average weight of 70 kg, volunteered for the experiment. The subjects performed five different eating activities, as in our previous HUL motion analysis study (Hussain, Zainul Azlan, and Bin Yusof 2018), while wearing the robotic system and the Xsens Motion Tracker (MTw) on the upper limb. The experiment was performed without actuating the robotic system, which ensured that the torque measured by the load cells was produced by human effort only. With the system actuated, the machine would assist the upper limb during eating, and the torque recorded by the sensors would not accurately represent the torque generated by the HUL while eating. The load cells attached at the four joints measured the torques produced as the subject performed the eating activity, while the Xsens Motion Tracker (MTw) at the wrist and elbow joints simultaneously measured the angles of the HUL. The five eating activities, with varied food characteristics and performed using two different eating tools (fork/spoon), are:
1) Eating rice (solid) with a spoon.
2) Eating vegetable salad (solid) with a fork.
3) Eating noodles (solid) with a fork.
4) Eating soup broth (viscous liquid) with a spoon.
5) Eating thick cereal (not viscous) with a spoon.
Before the experiment, the subjects were trained to perform the eating activities while wearing the robotic system. Each participant performed three trials of each activity, using either the fork or the spoon, while sitting comfortably on a chair with the food on the table. Each trial lasted around 20 seconds. The average of the three trials has been used for the analysis. Each eating trial is divided into three events, as shown in Figure 3. All the eating activities begin and conclude at the origin, where the subject's right upper limb is stationary and rested on the robotic system. The hand and the wrist are in the neutral position, while the elbow joint is extended approximately 100°. Event A occurs when the participant moves their arm from the origin to grasp the cutlery. During event B, the participant grasps the cutlery, digs into the food, and, while holding the cutlery, brings the food to the mouth to eat. Event C occurs when, after eating, the participant releases the grip on the cutlery and brings their hand back to the origin.

NARX Neural Network
The defining equation for the NARX model is

$$y(t) = f\big(y(t-1),\, y(t-2),\, \ldots,\, y(t-n_y),\; u(t-1),\, u(t-2),\, \ldots,\, u(t-n_u)\big)$$

where the next value of the dependent output signal $y(t)$ is regressed on previous values of the output signal and previous values of an independent (exogenous) input signal $u(t)$. That is, in this study, the future torque values of the wrist and elbow joints depend on both the previous torque values and the previous joint angle values. The network uses tapped delay lines to store previous values of the $u(t)$ and $y(t)$ sequences. For this reason, NARX neural network models can learn more effectively, converge faster, and generalize better than other recurrent networks. Numerous studies have reported that the NARX neural network successfully uses its output feedback loop to improve its predictive performance in complex time series prediction tasks and consistently outperforms standard neural network-based predictors, such as the Time Delay Neural Network (TDNN) and Elman architectures (Menezes and Barreto 2008). Lin, Horne, and Giles (1998) and Lin et al. (1996) reported that learning long-term temporal dependencies with gradient-descent techniques is more effective in NARX than in simple multilayer perceptron (MLP) based recurrent models. This is because the NARX model's input vector is built through two tapped-delay lines, one sliding over the input signal and the other sliding over the network's output, which adds to the network's stability and performance.
In this study, the NARX model was developed using the Neural Network Time Series Toolbox in MATLAB 2019a. The orientation data obtained using the Xsens MTw form the input training dataset. The corresponding torques produced during elbow flexion/extension ($T_1$), forearm pronation/supination ($T_2$), wrist adduction/abduction ($T_3$) and wrist flexion/extension ($T_4$) while performing various eating activities are calculated from these data by performing inverse dynamics on the 3D Kane's mathematical model of the upper limb formulated in our previous study (Hussain and Azlan 2016), and are used as the output training dataset (target).
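The tapped-delay-line idea can be illustrated with a minimal Python sketch. It is not the MATLAB toolbox implementation used in this study: the function name `build_narx_regressors`, the delay counts `n_u` and `n_y`, and the toy values are assumptions made for illustration only. For each time step it gathers the previous outputs (torques) and previous inputs (joint angles) that a NARX network would receive when trained on one-step-ahead targets.

```python
from typing import List, Sequence, Tuple


def build_narx_regressors(
    u: Sequence[float],
    y: Sequence[float],
    n_u: int,
    n_y: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build (regressor, target) pairs for one-step-ahead NARX training.

    Each regressor holds the n_y previous outputs followed by the n_u
    previous inputs, i.e. the contents of the two tapped delay lines.
    """
    if len(u) != len(y):
        raise ValueError("u and y must have the same length")
    start = max(n_u, n_y)  # first time index with a full history
    regressors: List[List[float]] = []
    targets: List[float] = []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, n_y + 1)]  # y(t-1) ... y(t-n_y)
        past_u = [u[t - k] for k in range(1, n_u + 1)]  # u(t-1) ... u(t-n_u)
        regressors.append(past_y + past_u)
        targets.append(y[t])
    return regressors, targets


# Toy data (hypothetical values, not taken from the study).
angles = [0.0, 0.1, 0.2, 0.3, 0.4]        # u(t): joint angle
torques = [0.00, 0.01, 0.03, 0.06, 0.10]  # y(t): joint torque
X, T = build_narx_regressors(angles, torques, n_u=2, n_y=2)
print(X[0], T[0])
```

Any nonlinear function approximator fitted to these pairs plays the role of $f$ in the NARX equation; the sketch covers only the data arrangement, not the network or its training.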
Four individual networks were trained, corresponding to the torques predicted during elbow flexion/extension ($T_1$), forearm pronation/supination ($T_2$), wrist adduction/abduction ($T_3$) and wrist flexion/extension ($T_4$), respectively, instead of training a single network to predict all four together, which led to better training results (Section 3.1). Each network has a single input, the joint angle, and a single output, the corresponding torque. The training, validation, and test data sets in all four networks are 70%, 15%, and 15% of the data, respectively. The total training dataset consisted of an average of 3799 target timesteps for each eating activity, with 2659 timesteps used for training (70%), 570 for validation (15%) and 570 for testing (15%). Torques $T_1$, $T_2$, $T_3$ and $T_4$ for each eating activity consisted of an average of 120 input $u(t)$ and output $y(t)$ steps. The number of hidden neurons in the hidden layer and the number of delays in the inputs and outputs are determined by a trial-and-error procedure. Here, better training results were obtained by selecting the number of hidden neurons as 10. The delay is 2 s. The exception is the network for elbow flexion/extension ($T_1$), for which the hidden neurons are taken as 2 for the cereal, rice and noodle eating activities and 5 for the soup and vegetable eating activities. Figure 4 shows the flowchart of the input and target training datasets of the 4 individual NARX models used for torque estimation during various eating activities.
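The split sizes quoted above can be checked with a short calculation. The sketch below assumes that the validation and test counts are rounded to the nearest timestep and that the remainder goes to training; the helper name `split_counts` is hypothetical, and the way the toolbox assigns individual timesteps to each subset is not shown.

```python
from typing import Tuple


def split_counts(
    n_total: int,
    val_frac: float = 0.15,
    test_frac: float = 0.15,
) -> Tuple[int, int, int]:
    """Return (train, validation, test) sizes for a 70/15/15 style split."""
    n_val = round(n_total * val_frac)
    n_test = round(n_total * test_frac)
    n_train = n_total - n_val - n_test  # the remainder is used for training
    return n_train, n_val, n_test


print(split_counts(3799))  # (2659, 570, 570)
```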
Unlike the inverse dynamics approach, in which the corresponding input angles have to be adjusted for each eating activity with every change in the input motion, the four individual NARX networks were trained once, for predicting the torques of elbow flexion/extension ($T_1$), forearm pronation/supination ($T_2$), wrist adduction/abduction ($T_3$) and wrist flexion/extension ($T_4$), respectively. These four networks were then used to estimate the torques for all other eating activities.

NARX-NN Training Performance Validation
The performance of the network is assessed based on the Mean Squared Error (MSE) of the training data and the Regression (R) between the target outputs and the network outputs. MSE is the average squared difference between outputs and targets. Lower values are better, and zero means no error. R values measure the correlation between outputs and targets. An R value of 1 means a close relationship and 0 a random relationship. The training stops when the validation error has ceased to decrease for a set number of iterations. The training performance of all four networks is summarized in Table 1.
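The two measures described above can be written out in a few lines of Python. This is a minimal sketch, assuming R is the Pearson correlation coefficient between network outputs and targets; the function names and the toy values are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def mse(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Average squared difference between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def regression_r(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Pearson correlation coefficient between outputs and targets."""
    n = len(targets)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    return cov / math.sqrt(var_o * var_t)


# Toy values (hypothetical): outputs that closely follow the targets.
toy_targets = [0.0, 1.0, 2.0, 3.0]
toy_outputs = [0.1, 0.9, 2.1, 2.9]
print(mse(toy_outputs, toy_targets), regression_r(toy_outputs, toy_targets))
```

For these toy values the MSE is about 0.01 and R is just below 1, which matches the reading given above: a low MSE and an R close to 1 indicate that the outputs follow the targets closely.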

Elbow Flexion/extension Network Training Performance
This network is trained using the soup eating activity data of one subject. The training dataset includes the angle of elbow flexion/extension ($q_1$) as the input (from the MTw), and the output target ($T_1$) is the corresponding torque, calculated from Kane's mathematical model (Hussain and Azlan 2019). The network was generalizable to all eating activities, with an RMSE of 0.009 Nm.
The best validation performance of the network, an MSE of 2.0875e-06, is obtained at epoch 153. As shown in Figure 5, the MSE decreased continuously until it reached the best validation performance.
Regression values of the training, validation and test data are all very close to 1, hence, showing a good correlation between the outputs and targets, as shown in Figure 6.
The network performance is further validated by checking the error autocorrelation function and the input-error cross-correlation function of the network, shown in Figures 7 and 8. The error autocorrelation function depicts how the prediction errors are related in time. For a perfect prediction model, the autocorrelation function should have only a single nonzero value, occurring at zero lag; that is, the prediction errors are completely uncorrelated with each other. In this network, some correlations fall within the 95% confidence limits around zero, which is acceptable for a model to be adequate. Similarly, the input-error cross-correlation function illustrates how the errors are correlated with the input sequence $u(t)$. For a perfect prediction model, all the correlations should be zero. In this network, all the correlations fall within the confidence limits around zero, as shown in Figure 8. Figure 9 shows the time series response of the network. It displays the inputs, targets, and errors versus time, and it indicates which time points were selected for training, testing and validation. As seen in Figure 9, the red-colored parts show the time points used as test targets and test outputs, while the yellow fluctuations in the error plot show the data points where errors occur.
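The error autocorrelation check can be sketched in Python as follows. This is an illustrative calculation, not the toolbox routine that produced Figure 7: it assumes a mean-removed autocorrelation normalised so that the value at zero lag is 1, and it uses the common large-sample approximation of ±1.96/√N for the 95% confidence limits of an uncorrelated sequence. The function names and the toy error sequence are hypothetical.

```python
import math
from typing import List, Sequence


def error_autocorrelation(errors: Sequence[float], max_lag: int) -> List[float]:
    """Normalised autocorrelation of the prediction errors for lags 0..max_lag."""
    n = len(errors)
    mean = sum(errors) / n
    centered = [e - mean for e in errors]
    c0 = sum(c * c for c in centered)  # zero-lag value used for normalisation
    if c0 == 0.0:
        raise ValueError("errors are constant; autocorrelation is undefined")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]


def confidence_bound(n: int) -> float:
    """Approximate 95% bound for the autocorrelation of white noise."""
    return 1.96 / math.sqrt(n)


# Toy error sequence (hypothetical): strongly alternating, hence correlated.
toy_errors = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
acf = error_autocorrelation(toy_errors, max_lag=2)
bound = confidence_bound(len(toy_errors))
outside = [lag for lag, r in enumerate(acf) if lag > 0 and abs(r) > bound]
print(acf, bound, outside)
```

In this toy case the values at lags 1 and 2 lie outside the bound, which is the pattern an inadequate model would show; for an adequate model, the values at nonzero lags stay within the limits.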
The network performance of the remaining three networks, for the prediction of forearm pronation/supination ($T_2$), wrist adduction/abduction ($T_3$) and wrist flexion/extension ($T_4$), is included in the supplementary data.

Experimental Results
The experiment, performed using the 4-DOF prototype robotic system coupled with the human upper limb, was intended to validate and determine the accuracy of the torques predicted by the formulated NARX-NN in real time. The predicted torques were compared with those measured by the load cells for the motions elbow flexion/extension ($q_1$), forearm pronation/supination ($q_2$), wrist abduction/adduction ($q_3$) and wrist flexion/extension ($q_4$) during various eating activities.
Cereal activity
Figure 10(a) shows the motion angles of subject 1 during the cereal eating activity. The torque validation graphs comparing the NARX-NN model predictions with the load cell readings during the cereal eating activity of subject 1 are shown in Figure 10. As demonstrated in Figure 10, during event A, as the arm moves to grasp the cutlery, all the joint torques and joint angles start fluctuating. During event B, while the subject is eating, the joint torques increase and attain a stable state. During event C, as the subject drops the cutlery and moves their hand back to the origin, the joint torques and corresponding angles start fluctuating again and eventually decrease, and the torques reach a minimum value, as predicted by the NARX-NN model. It was observed from the torque graphs that the maximum torque was generated during event B in both the wrist and elbow joints for the majority of the eating activities. Also, the torques produced during forearm supination ($T_2$), wrist abduction ($T_3$) and wrist flexion ($T_4$) act in the opposite direction to that produced during elbow flexion ($T_1$). This indicates that the net torque produced by the wrist joint acts in the opposite direction to the torque generated by the elbow joint. Hence, the torques predicted by the NARX-NN model are consistent with the torques measured by the load cells.
The motion trajectories and torque prediction graphs of the other four subjects have shown similar results. The torque trajectories for all eating activities also show similar trends, which can be attributed to the similar basic eating action involved.

Result Validation
Root mean square error (RMSE) is used to validate the performance of the NARX-NN model. It is a standard statistical metric used to measure model performance in various fields. It is defined as the square root of the mean of the squared differences between the corresponding elements of the forecasts (f) and observations (o). The smaller the RMSE value, the closer the forecasted and observed values are (Barnston 1992; Chai and Draxler 2014).

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)^2}$$
where N is the number of elements. The average RMSE results of all the subjects are shown in Table 2.
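As a worked illustration of this definition, the RMSE is the square root of the mean squared difference between forecasts and observations. The short Python sketch below computes it directly; the function name `rmse` and the toy torque values are assumptions for illustration and are not data from the study.

```python
import math
from typing import Sequence


def rmse(forecasts: Sequence[float], observations: Sequence[float]) -> float:
    """Root mean square error between forecasts (f) and observations (o)."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must have the same length")
    n = len(observations)
    squared = sum((f - o) ** 2 for f, o in zip(forecasts, observations))
    return math.sqrt(squared / n)


# Toy torques in Nm (hypothetical): predicted versus measured.
predicted = [0.10, 0.20, 0.30, 0.40]
measured = [0.12, 0.18, 0.33, 0.37]
print(rmse(predicted, measured))
```

Because the RMSE carries the same unit as the compared quantity (Nm here), it can be read directly against the magnitude of the measured torques.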
The RMSE results for all torques, which consist of elbow flexion/extension, forearm pronation/supination, wrist adduction/abduction and wrist flexion/extension, indicate that the formulated NARX-NN model fits all the torques well, with an average RMSE of 0.09 Nm for all eating activities. The estimation performance of the NARX-NN model is therefore adequate for torque prediction. This result shows that the formulated method successfully modeled the wrist and elbow joints of the HUL while eating different food types and using various cutlery. It provides an alternative way of predicting joint dynamics, using motion data instead of the commonly used EMG signals, which can be complicated to acquire and process.
The experimental validation results of this study are better than those of the ANN model developed by Jali et al. (2014a), which had an MSE of 0.13807. The NARX-NN model developed by Jali et al. (2014b) to predict EMG-based elbow joint torque has not been validated externally; only the network training validation performance is included. Moreover, in Akbari and Talasaz (2014) and Jali et al. (2014a, 2014b), only 1 DOF (elbow flexion) has been considered, while in this study, 4 DOFs have been included while performing five different eating activities, which, to our knowledge, has not been done before.
The torques predicted by the NARX-NN model can be helpful to study the dynamics of the wrist and elbow joints while performing various eating activities using different cutleries. This quantification of torques can serve as a guideline in designing and improving the assistive robotic and rehabilitation systems meant for eating activities of post-stroke patients and other patients with upper limb disabilities, where the user's muscular efforts need to be considered.
The model developed can determine the torques for 4 DOFs of the human upper limb for various eating activities using two kinds of cutlery, which have not been studied in detail yet. Unlike the present state-of-the-art studies, which have developed NARX-NN models using EMG signals, this study uses the user's wrist and elbow angular motion to estimate the corresponding joint torques. The torques predicted by this model are instantaneous, unlike those of the inverse dynamics mathematical model, which can be complicated and consume considerable computational time and effort.

Conclusion
In this study, a dynamic model of the human upper limb, focusing on elbow flexion/extension, forearm pronation/supination, wrist adduction/abduction, and wrist flexion/extension, has been formulated using NARX-NN for different eating activities and various cutlery. This study has certain limitations, which can be addressed in future work:
1) For the modeling of the HUL, only the wrist and elbow joints have been included, while the fingers have been neglected.
2) Significant improvements are required in the design of the prototype robotic platform used for the model validation. The system should be redesigned as an HUL exoskeleton with the machine joints precisely in line with the axes of the human joints to provide more accurate torque results.
3) The load cells can be replaced by actual torque sensors, which provide more reliable and consistent data.
N. Z. Azlan received her bachelor's degree (Hons.) in Mechatronics Engineering from International Islamic University Malaysia (IIUM) in 2003. She pursued her Master's and Ph.D. studies in Mechanical and Control Engineering at Tokyo Institute of Technology, Japan. She is currently a lecturer at IIUM.