Development of writing task recombination technology based on DMP segmentation via verbal command for Baxter robot

ABSTRACT This paper develops a character recombination technology based on dynamic movement primitive (DMP) segmentation, driven by verbal commands, on a Baxter robot platform. Movements are recorded from a human demonstrator: the operator physically guides the Baxter robot through each movement five times. This training data set is also used for the playback process. Subsequently, dynamic time warping (DTW) is employed to preprocess the data. The DMP is used to model and generalize every single movement. A Gaussian mixture model (GMM) is used to generate multiple patterns after the teaching process. Then the Gaussian mixture regression (GMR) algorithm is applied to reduce the position errors in 3D space after the generation of a synthesized trajectory. A remote PC commands Baxter to record or play back any trajectory via the user datagram protocol (UDP) by typing commands in a text file. In addition, Dragon NaturallySpeaking software is used to convert voice data to text data. The proposed approach is tested by performing a Chinese character writing task with a Baxter robot, where different Chinese characters are written after teaching only one character.


Introduction
Teaching by demonstration (TbD) technology for robotics has recently received much attention because of its ability to transfer human skills to robots efficiently. Through human guidance, robots are able to learn manipulation skills for performing tasks efficiently. Compared to conventional programming methods, TbD has several advantages: (i) it does not require a human instructor with expert skills or knowledge; (ii) human-to-robot skill transfer is achieved in a convenient and efficient manner; and (iii) most significantly, it takes human characteristics such as flexibility and compliance into account (Yang, Zeng, Cong, Wang, & Wang, 2018). These benefits facilitate task accomplishment (Li, Yang, Wan, Annamalai, & Cangelosi, 2017). Intelligent learning based on TbD technology with trajectory matching is a research topic in imitation learning that has received considerable attention in the past few decades. Conventional approaches in this area are spline-based methods (Hu, Zhou, & Sun, 2008), dynamic system methods (Kurosawa, Nakayama, Kato, Jamalipour, & Nemoto, 2007) and probabilistic model methods (Hofmann, 2001).
CONTACT Chenguang Yang cyang@theiet.org
Spline-based methods can generate trajectories quickly. However, they are time-dependent, sensitive to interference, cannot adjust in real time, and must be recalculated when new data are received. The dynamic system approach can model discrete and rhythmic motions, does not rely on time, and can be used for real-time correction. It preserves topological equivalence and is often used as a high-level representation of movement primitives, such as dynamic movement primitives (DMP) (Huang, Yang, Yang, & Li, 2017), for the construction of complex motion primitive libraries (complex motions consist of simple motions represented by a series of primitives). However, DMP is not suitable for direct encoding of complex motions and requires more teaching information (position, speed and acceleration). The probabilistic model method represents motion as a stochastic model, such as a dynamic Bayesian network (DBN), a Gaussian mixture model (GMM) or a hidden Markov model (HMM), which are often used to match trajectories (Li, Yang, Ju, & Annamalai, 2018). Probabilistic models have strong encoding and noise-handling abilities and excellent robustness, and can handle high-dimensional problems. In particular, GMM has a strong ability to encode and reproduce continuous complex trajectories. Compared with DMP, it only needs position-based teaching information and can be used for imitation learning of complex motions.
The basic theory of GMM is that, as long as the number of Gaussian components is sufficiently large, an arbitrary continuous distribution can be approximated to arbitrary precision by a weighted average of Gaussian components. Therefore, it is widely used in trajectory generation for robot imitation learning and has a strong behaviour-encoding capability. For example, in Oh, Volling, and Gonzalez (2015), a GMM is learned from the teaching data to obtain a stable estimate of a multi-dimensional nonlinear dynamic system of the motion, which not only generalizes to unknown scenes but can also be adjusted online in the presence of interference. In Muhlig, Gienger, Hellbach, Steil, and Goerick (2009), a framework for variance-based imitation learning is given in the task space: the motion is modelled using GMM, the expectation maximization (EM) algorithm is used to fit the model, and the trajectory is finally reconstructed using Gaussian mixture regression (GMR) and an optimization evaluator to realize imitation learning. Van Erven and Harremos (2014) and Wang et al. (2016) used a Gaussian process to establish a stochastic forward model of the motion to be imitated, with the Kullback-Leibler divergence as an indicator of imitation performance, in order to learn effectively from the data through distribution prediction, perform trajectory matching, and finally achieve motion imitation.
Recently, content-based retrieval methods have gained significance in motion-captured data retrieval (Lew, Sebe, Djeraba, & Jain, 2006). In the data matching process, the start frame and the end frame of the query sequence are first indexed into the library to select possible candidate segments in the motion capture database, and finally a dynamic time warping (DTW) method is used to calculate the similarity that determines the final search result (Yao, Wen, & Lu, 2015). With the continuous development of robotics research, robot movement behaviour has become more complex and demands a much greater learning ability from the robot. At the same time, traditional algorithms struggle with complex movements that cannot easily be described by movement laws, such as hitting a ball or performing a writing task (Yao et al., 2015). Furthermore, the robot needs the ability to learn in order to enhance its intelligence, so that it can achieve self-compensating correction and can interact with a random dynamic environment to deal with sudden and unknown situations (Yao et al., 2015). The main advantage of robot learning is that it can find effective control strategies for complex motion tasks that are challenging for traditional methods.
Because writing is a complex trajectory motion, the learning process faces the following problems: (1) continuous complex trajectory characterization; (2) discrete trajectory generation. Some scholars have proposed a control chart model (De la Torre Gutierrez & Pham, 2016) and a recurrent neural network (Chung, Gulcehre, Cho, & Bengio, 2014) to learn writing skills; however, neither is able to solve the listed problems effectively. Although the control chart model can generate discrete trajectories, its ability to represent complex trajectories is insufficient, and the recurrent neural network may only be used for the reproduction of simple trajectories. For these reasons, a GMM-based teaching method for complex trajectory characterization is needed for learning writing skills.
The imitation learning process in this paper encodes the teaching data through GMM, extracts the behaviour characteristics, and reconstructs the data through GMR, so as to realize continuous trajectories for Chinese character writing. This method can effectively solve the problems caused by using the DMP alone. Based on the basic DMP, a multi-task extension is applied by teaching the robot every single unit of a Chinese character, so that regrouping tasks can be achieved after segmentation of the trajectories. Using the developed techniques, the Baxter robot has successfully written Chinese characters with non-continuous trajectories, which shows good encoding ability and generalization performance. The framework follows a basic TbD procedure in four phases: demonstration, segmentation, alignment, and generalization (Yang, Zeng, Liang, et al., 2018). An overview of the framework is shown in Figure 1.

Baxter robot
The Baxter robot (shown in Figure 2) was developed by Rethink Robotics in the United States. It is an intelligent collaborative robot intended as an alternative to outsourced manual labour and fixed automation. Baxter enables manufacturers to create cost-effective solutions for small-batch, multi-variety production jobs, freeing the hands of technical staff. Many leading industrial companies worldwide have applied Baxter in their production and have thereby obtained a significant competitive advantage (Wilson, Schultz, Ansari, & Murphey, 2017).

Dragon NaturallySpeaking
Dragon NaturallySpeaking is speech recognition software released by Dragon Systems of Newton, Massachusetts (Altman, 2014). By using Dragon NaturallySpeaking, operators are able to create documents, reports and emails, and to fill in forms and workflows, with verbal commands (Altman, 2014). When the operator speaks to the computer, the words appear as text in the Microsoft Office suite, Corel WordPerfect, and all Windows-based applications. Significantly, operators are able to create voice commands that make the computer run multi-step operations in applications, which saves time.

Dynamic movement primitive
The biomimetic robotics expert Ijspeert proposed a dynamic system method (Schaal, Peters, Nakanishi, & Ijspeert, 2005). The essence of this method is to use a set of linear differential equations, augmented with a learnable autonomous forcing term, to model motion behaviour as a nonlinear dynamic attractor model. This method can model discrete motions (such as writing tasks) and rhythmic movements (such as drumming) (Schaal et al., 2005). DMP utilizes an easy-to-understand dynamic system to express motion trajectories. The basic model of DMP is as follows (Schaal, 2006):
$$\tau \dot{z} = \alpha_z \left( \beta_z (g - y) - z \right) + f$$
$$\tau \dot{y} = z$$
where $\alpha_z$ and $\beta_z$ are constants, $\tau$ is the scale factor, $g$ is the target position, $f$ is a forcing term, which is a nonlinear function built from Gaussian functions, and $y$ and $z$ represent the position and velocity of the primitive during the motion. The system has a point attractor at $(z, y) = (0, g)$, so $y$ eventually converges to $g$.
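The point-attractor behaviour described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it Euler-integrates the standard DMP form $\tau \dot{z} = \alpha_z(\beta_z(g - y) - z) + f$, $\tau \dot{y} = z$ with the forcing term set to zero. The function name `integrate_attractor`, the step size, and the gains $\alpha_z = 25$, $\beta_z = \alpha_z / 4$ (a common critically damped choice) are assumptions made for illustration.

```python
def integrate_attractor(y0, g, alpha_z=25.0, beta_z=6.25, tau=1.0,
                        dt=0.001, duration=2.0, forcing=None):
    """Euler-integrate tau*z' = alpha_z*(beta_z*(g - y) - z) + f, tau*y' = z.

    `forcing` is an optional function of time; with forcing=None the system
    is a plain spring-damper that settles at the goal g.
    The gain values are illustrative assumptions, not taken from the paper.
    """
    y, z = y0, 0.0
    path = [y]
    for step in range(int(round(duration / dt))):
        f = forcing(step * dt) if forcing is not None else 0.0
        z_dot = (alpha_z * (beta_z * (g - y) - z) + f) / tau
        y_dot = z / tau
        z += z_dot * dt
        y += y_dot * dt
        path.append(y)
    return path


# Without a forcing term the position y converges to the goal g.
path = integrate_attractor(y0=0.0, g=1.0)
print(round(path[-1], 4))
```

With $\beta_z = \alpha_z / 4$ the unforced system is critically damped, so the position approaches the goal without overshoot; the learned forcing term then only has to shape the path on the way there.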

Gaussian mixture model
The GMM uses Gaussian probability density functions to quantify samples accurately, decomposing the distribution of the samples into several Gaussian probability density functions. Each Gaussian component is mainly determined by two parameters: its variance and its mean. The learned means and variances directly affect the stability, accuracy, and convergence of the model. When the model is used to represent the background of moving targets, the variance and mean of each Gaussian component need to be updated in real time.
The GMM is given by (Calinon & Billard, 2009):
$$p(\alpha) = \sum_{n=1}^{N} P(n)\, p(\alpha \mid n)$$
where $\alpha$ is a teaching data point, $P(n)$ denotes the prior probability of the $n$th component, $p(\alpha \mid n)$ is the conditional probability, which obeys a Gaussian distribution, and $N$ is the number of Gaussian distributions. Therefore, the entire teaching data set can be represented by the Gaussian mixture model.

Methodology
In this section, the proposed method is described for teaching, playback and the DMP segmentation-based regrouping tasks, including the DMP extended with GMM, after pretreatment using DTW. Here we apply DTW to measure the similarity between the character and the candidate domain, using the weighted distance in both directions as the final distance.

Pretreatment using DTW
DTW is an effective matching method for time series of different lengths and is widely used in time series processing and signal processing (Yao et al., 2015). Its earlier applications were in areas such as speech recognition. Consider a candidate area sample $C$ with a width of $M$ and a character sample $Q$ to be queried with a width of $N$; the size of the candidate area is the same as the size of the character to be queried, so $M = N$. Because the writing task is well structured and continuous, no cross-matching occurs between any two columns and every column finds a counterpart to match, which preserves continuity (Solis, Marcheschi, Frisoli, Avizzano, & Bergamasco, 2007). This restriction ensures that the accumulated distance $D(c_i, q_j)$ between the $i$th column of sample $C$ and the $j$th column of sample $Q$ is jointly determined only by the local distance and the accumulated distances of the neighbouring column pairs, as follows (Yao et al., 2015):
$$D(c_i, q_j) = d(c_i, q_j) + \min\left\{ D(c_{i-1}, q_j),\ D(c_i, q_{j-1}),\ D(c_{i-1}, q_{j-1}) \right\}$$
where $i, j > 1$ and $d(c_i, q_j)$ is the distance function between two samples. In this paper there are five demonstrations, hence there are five feature dimensions $f_0$-$f_4$, and the distance between element $c_i$ in column $i$ of $C$ and element $q_j$ in column $j$ of $Q$ is the Euclidean distance (Yao et al., 2015):
$$d(c_i, q_j) = \sqrt{\sum_{k=0}^{4} \left( c_i^{f_k} - q_j^{f_k} \right)^2}$$
where $c_i^{f_k}$ is the $k$th dimension of the $i$th column in sample $C$ and $q_j^{f_k}$ is the $k$th dimension of the $j$th column in sample $Q$.
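The accumulated-distance recursion described above is the standard dynamic-programming form of DTW. A minimal sketch follows, assuming each column is a tuple of feature values and using the Euclidean distance as the local cost; the names `euclidean` and `dtw_distance` are chosen here for illustration and do not come from the paper.

```python
import math


def euclidean(a, b):
    """Local distance d(c_i, q_j) between two feature columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(C, Q):
    """Accumulated DTW distance D(c_M, q_N) between sequences C and Q."""
    M, N = len(C), len(Q)
    INF = float("inf")
    # D[i][j] is the accumulated distance of the first i columns of C
    # matched against the first j columns of Q (1-based, row/column 0 is padding).
    D = [[INF] * (N + 1) for _ in range(M + 1)]
    D[0][0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            cost = euclidean(C[i - 1], Q[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[M][N]


# A stretched copy of the same sequence has zero DTW distance.
a = [(0.0,), (1.0,), (2.0,)]
b = [(0.0,), (1.0,), (1.0,), (2.0,)]
print(dtw_distance(a, b))
```

The padding row and column let the first cell be handled by the same recursion as every other cell, which keeps the loop free of special cases.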
Although the same character is written slightly differently each time, this difference should remain within a small local area. Therefore, the global path must be constrained to maintain the invariance of the local structure and to accelerate the solution of the problem. It is important to limit the spatial distance between the two columns $c_i$ and $q_j$ to be less than $r$ elements (Zhang, Adl, & Glass, 2012):
$$|i - j| < r, \qquad r = k \cdot \mathrm{seqL}$$
where $k$ is a constant coefficient and seqL is the length of the feature sequence.
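The global path constraint can be sketched as a band around the diagonal of the DTW table. The example below is an illustration under stated assumptions, not the authors' code: the band radius is taken to be the coefficient $k$ times the sequence length, rounded to an integer, and cells with $|i - j| \le r$ are treated as admissible. The names `band_radius` and `banded_dtw` are hypothetical.

```python
import math


def band_radius(k, seq_len):
    """Band radius r derived from coefficient k and sequence length (assumed r = k * seqL)."""
    return max(1, int(round(k * seq_len)))


def banded_dtw(C, Q, r):
    """DTW distance where column i of C may only match columns j of Q with |i - j| <= r."""
    M, N = len(C), len(Q)
    r = max(r, abs(M - N))  # the band must at least reach the final cell
    INF = float("inf")
    D = [[INF] * (N + 1) for _ in range(M + 1)]
    D[0][0] = 0.0
    for i in range(1, M + 1):
        # Only cells inside the band are filled; the rest stay at infinity.
        for j in range(max(1, i - r), min(N, i + r) + 1):
            cost = math.sqrt(sum((x - y) ** 2 for x, y in zip(C[i - 1], Q[j - 1])))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[M][N]


# The unconstrained optimum needs a large warp; a narrow band forbids it.
C = [(0.0,), (0.0,), (0.0,), (0.0,), (1.0,)]
Q = [(0.0,), (1.0,), (1.0,), (1.0,), (1.0,)]
wide = banded_dtw(C, Q, r=5)
narrow = banded_dtw(C, Q, r=1)
print(wide, narrow)
```

Besides preserving local structure, the band reduces the number of cells evaluated from $M \times N$ to roughly $M \times (2r + 1)$.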
Dynamic programming accelerates the computation of the horizontal distance $D(c_M, q_N)$. Since the feature sequence of the queried character $Q$ and that of the candidate domain have the same length, it is not necessary to normalize the accumulated distance $D(c_M, q_N)$ by the sequence length (Zhang et al., 2012). However, the length and width of a character are generally unequal, which makes the distances in the two directions inconsistent. Hence, in this paper, we use a weighted two-way DTW algorithm to calculate the final distance $\mathrm{dist}(C, Q)$ between two characters, where the distance in each direction is weighted by the length of its own sequence. Finally, the candidate samples $C$ are arranged in ascending order of their distance $\mathrm{dist}(C, Q)$ to the queried character $Q$ to obtain a list of candidate regions. Since each character has several key points, overlapping candidate regions need to be eliminated: for any sample in the candidate list, if a sample ranked before it overlaps it with an area ratio exceeding the threshold, it is eliminated. The remaining list is the final result.
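The ranking and overlap-elimination step can be sketched as a greedy filter. This is a simplified illustration, assuming each candidate is a one-dimensional column interval `(start, end, dist)` and that the overlap ratio is the intersection divided by the shorter interval; the paper does not state its exact ratio definition or threshold, so both, along with the names `overlap_ratio` and `filter_candidates`, are assumptions.

```python
def overlap_ratio(a, b):
    """Overlap of two intervals (start, end, ...) relative to the shorter one (assumed definition)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return inter / shorter if shorter > 0 else 0.0


def filter_candidates(candidates, threshold=0.5):
    """Sort candidates by distance and drop any that overlaps a better-ranked one too much."""
    kept = []
    for cand in sorted(candidates, key=lambda c: c[2]):
        if all(overlap_ratio(cand, better) <= threshold for better in kept):
            kept.append(cand)
    return kept


# (start column, end column, dist(C, Q)); the second candidate overlaps the best one.
candidates = [(0, 10, 0.2), (2, 12, 0.5), (20, 30, 0.3)]
result = filter_candidates(candidates)
print(result)
```

Because candidates are visited in ascending order of distance, a candidate is only ever removed in favour of one that matches the query better.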

Trajectory generation
The basic idea of DMP is to use an easily understood dynamic system to express a motion trajectory (Ijspeert, Nakanishi, Hoffmann, Pastor, & Schaal, 2013). Among such systems, the spring-damper system is the simplest, and is given by:
$$\tau \dot{v} = K (g - x) - Z v + (g - x_0) f(t) \tag{9}$$
$$\tau \dot{x} = v$$
where $K$ is the spring coefficient, $Z$ is the damping coefficient, $g$ is the target position, $x_0$ is the initial position, and $g - x_0$ is the proportional coefficient of the trajectory shape (Ijspeert et al., 2013). Here we use the proportional coefficient to scale the trajectory shape when the new target point is farther away than the initial target point of the teaching trajectory; $x$, $v$ and $\dot{v}$ represent position, velocity, and acceleration, respectively; $\tau$ is a time-scale parameter that affects the speed at which the motion is generated (Ijspeert et al., 2013). Equation (9) represents a transformation system. Every independent transformation system corresponds to one degree of freedom (DOF), where $f(t)$ is a nonlinear disturbance force that can be generated through learning. A specific expression is as follows (Pastor, Hoffmann, Asfour, & Schaal, 2009):
$$f(t) = \frac{\sum_i \phi_i(t)\, \omega_i}{\sum_i \phi_i(t)}\, t, \qquad \phi_i(t) = \exp\left( -h_i (t - c_i)^2 \right) \tag{11}$$
$$\tau \dot{t} = -a\, t \tag{12}$$
where $\phi_i$ is the Gaussian function, with centre $c_i$ and width $h_i$. By adjusting the weights $\omega_i$, Equation (11) can be used to express trajectories of arbitrary shape. Here $a$ is a constant and $t$ is a phase parameter whose value converges monotonically from 1 to 0 (Pastor et al., 2009). It follows that the influence of the forcing term decreases as the motion nears the target position $g$, which ensures that the system converges stably to the goal. Equation (12) represents a canonical system, which is used to couple multiple transformation systems and is not directly dependent on time (Pastor et al., 2009). After selecting the starting point $x_0$ and the target $g$ and resetting the canonical system to its initial phase ($t = 1$), integrating the canonical system generates a movement from the weight parameters.
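To make the roles of the transformation system, forcing term and canonical system concrete, the sketch below learns the weights of a one-DOF discrete DMP from a single demonstration and replays it towards the same or a new goal. It is a minimal illustration under assumptions, not the implementation used in the paper: the weights are fitted by locally weighted regression, the damping is set to $Z = 2\sqrt{K}$, the phase is called `s` in the code to avoid confusion with clock time, and the demonstration is a synthetic minimum-jerk profile. Only $\tau = 1$, $K = 20$ and $a = 8$ are taken from the experimental settings; all names are hypothetical.

```python
import math


def min_jerk(T=1.0, dt=0.001, x0=0.0, g=1.0):
    """Synthetic demonstration: minimum-jerk motion from x0 to g (illustrative only)."""
    n = int(round(T / dt)) + 1
    return [x0 + (g - x0) * (10 * u**3 - 15 * u**4 + 6 * u**5)
            for u in (i * dt / T for i in range(n))]


def derivative(seq, dt):
    """Central finite differences (one-sided at the ends)."""
    n = len(seq)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((seq[hi] - seq[lo]) / ((hi - lo) * dt))
    return out


class DMP:
    def __init__(self, n_basis=30, K=20.0, a=8.0, tau=1.0):
        self.K, self.Z, self.a, self.tau = K, 2.0 * math.sqrt(K), a, tau
        # Centres are spread evenly in time, hence exponentially in phase.
        self.c = [math.exp(-a * i / (n_basis - 1)) for i in range(n_basis)]
        gaps = [self.c[i] - self.c[i + 1] for i in range(n_basis - 1)]
        gaps.append(gaps[-1])
        self.h = [1.0 / gap**2 for gap in gaps]
        self.w = [0.0] * n_basis

    def _phi(self, s):
        return [math.exp(-h * (s - c) ** 2) for h, c in zip(self.h, self.c)]

    def forcing(self, s):
        phi = self._phi(s)
        return sum(p * w for p, w in zip(phi, self.w)) * s / (sum(phi) + 1e-12)

    def fit(self, x, dt):
        """Learn the weights from one demonstrated trajectory (requires g != x0)."""
        self.x0, self.g = x[0], x[-1]
        v = [self.tau * xd for xd in derivative(x, dt)]
        v_dot = derivative(v, dt)
        num = [0.0] * len(self.w)
        den = [0.0] * len(self.w)
        for i in range(len(x)):
            s = math.exp(-self.a * i * dt / self.tau)
            f_target = (self.tau * v_dot[i] - self.K * (self.g - x[i])
                        + self.Z * v[i]) / (self.g - self.x0)
            for j, p in enumerate(self._phi(s)):
                num[j] += p * s * f_target
                den[j] += p * s * s
        self.w = [n_ / (d_ + 1e-12) for n_, d_ in zip(num, den)]

    def rollout(self, g=None, duration=1.0, dt=0.001):
        """Integrate the transformation system towards goal g (Euler steps)."""
        g = self.g if g is None else g
        x, v = self.x0, 0.0
        path = [x]
        for step in range(int(round(duration / dt))):
            s = math.exp(-self.a * step * dt / self.tau)
            v_dot = (self.K * (g - x) - self.Z * v + (g - self.x0) * self.forcing(s)) / self.tau
            x_dot = v / self.tau
            v += v_dot * dt
            x += x_dot * dt
            path.append(x)
        return path


demo = min_jerk()
dmp = DMP()
dmp.fit(demo, dt=0.001)
reproduced = dmp.rollout()
generalized = dmp.rollout(g=2.0, duration=3.0)
```

Here `reproduced` follows the demonstrated shape, while `generalized` reuses the same weights with a goal twice as far away, which is the spatial scaling role played by the $g - x_0$ factor.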
The principle of DMP is to obtain the nonlinear transformation function $f(t)$ by learning from the movements of the demonstrator (Pastor et al., 2009). However, the basic DMP is limited when the transformation system has to be built from multiple demonstrated paths, hence we employ the GMM to overcome this issue.
The teaching data are acquired by the motion capture system, which provides the spatial coordinates of the end effector of the teacher. First, dimensionality reduction techniques, such as principal component analysis, are used to preprocess the data, and the three-dimensional data are mapped into two-dimensional space. The teaching data are then input into the learning model as sample points during data encoding. In order to simplify the data processing steps and to focus on the representation learning and generalization of the system, the two-dimensional teaching data of this article, $\alpha = \{\alpha_s, \alpha_t\}$, are obtained directly from the previous section, where $\alpha_s$ and $\alpha_t$ represent the spatial value and the time value of the teaching information, respectively. The GMM consists of multiple Gaussian distributions over the value $\alpha$ of each element in the sample (Li, Ma, Shan, & Zhang, 2011):
$$P(\alpha) = \sum_{i=1}^{K} \omega_{i,t}\, \eta\left( \alpha;\ \mu_{i,t}, \Sigma_{i,t} \right) \tag{15}$$
where $\eta$ denotes the Gaussian probability density function and $K$ is the number of Gaussian functions, with a typical value of 3-5. The larger the value of $K$, the more complex the situations the model can represent; $\omega_{i,t}$, $\mu_{i,t}$ and $\Sigma_{i,t}$ are the weight, mean value, and covariance matrix of the $i$th Gaussian distribution, which need to be determined.
The EM algorithm is used to estimate the parameters of the GMM ($\omega_i$, $\mu_i$ and $\Sigma_i$) by searching the parameter space of the probability model (Ho & Ermon, 2016). The EM algorithm is often applied to parameter estimation with hidden variables, which here is a maximum likelihood estimation problem. The idea of the algorithm is to improve a lower bound of the likelihood function repeatedly and thereby optimize the parameters (Ho & Ermon, 2016). In order to reduce the risk of falling into a local optimum, k-means clustering is used to initialize the parameters. After the parameters ($K$, $\omega_i$, $\mu_i$ and $\Sigma_i$) are determined, a GMM of the teaching data is obtained, and the teaching behaviour is characterized.
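The interplay of k-means initialization and EM can be illustrated on a small example. The sketch below is deliberately simplified to one-dimensional data so that it stays short; the paper's data are two-dimensional, and the function names, iteration counts and variance floor are assumptions made for illustration.

```python
import math


def kmeans_1d(data, K, iters=20):
    """Plain k-means on scalars, started from evenly spaced quantiles."""
    srt = sorted(data)
    centers = [srt[int((i + 0.5) * len(srt) / K)] for i in range(K)]
    groups = [[] for _ in range(K)]
    for _ in range(iters):
        groups = [[] for _ in range(K)]
        for x in data:
            nearest = min(range(K), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers, groups


def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, K, iters=50, var_floor=1e-6):
    """Fit a 1-D GMM with EM; returns (weights, means, variances)."""
    means, groups = kmeans_1d(data, K)
    weights = [max(len(g), 1) for g in groups]
    weights = [w / sum(weights) for w in weights]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), var_floor) if g else 1.0
                 for m, g in zip(means, groups)]
    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(K):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(sum(r[k] * (x - means[k]) ** 2
                                   for r, x in zip(resp, data)) / nk, var_floor)
    return weights, means, variances


# Two well-separated clusters around 0 and 5.
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = fit_gmm_1d(data, K=2)
```

Starting EM from the k-means clusters means the first E-step already assigns most points to a sensible component, so fewer iterations are spent escaping a poor starting point.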
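In this paper, GMR is used to reconstruct the teaching data encoded by the GMM, which yields the generalized output. The time value $\alpha_t$ of the teaching data is used as the query point, and its corresponding spatial value $\alpha_s$ is estimated by GMR. Each component $\eta(\alpha; \mu_k, \Sigma_k)$ is a Gaussian distribution (Calinon & Billard, 2009), where
$$\mu_k = \begin{pmatrix} \mu_{t,k} \\ \mu_{s,k} \end{pmatrix}, \qquad \Sigma_k = \begin{pmatrix} \Sigma_{t,k} & \Sigma_{ts,k} \\ \Sigma_{st,k} & \Sigma_{s,k} \end{pmatrix}$$
and the conditional probability of $\alpha_s$ given $\alpha_t$ and $k$ is also a Gaussian distribution (Calinon & Billard, 2009). Then, following the standard GMR formulation, we have
$$\hat{\mu}_{s,k} = \mu_{s,k} + \Sigma_{st,k} \Sigma_{t,k}^{-1} (\alpha_t - \mu_{t,k}), \qquad \hat{\Sigma}_{s,k} = \Sigma_{s,k} - \Sigma_{st,k} \Sigma_{t,k}^{-1} \Sigma_{ts,k}$$
and the mean $\mu_s$ and variance $\Sigma_s$ over the $K$ GMM components can be calculated as follows (Calinon & Billard, 2009):
$$\beta_k = \frac{\omega_k\, \eta(\alpha_t; \mu_{t,k}, \Sigma_{t,k})}{\sum_{j=1}^{K} \omega_j\, \eta(\alpha_t; \mu_{t,j}, \Sigma_{t,j})}, \qquad \mu_s = \sum_{k=1}^{K} \beta_k \hat{\mu}_{s,k}, \qquad \Sigma_s = \sum_{k=1}^{K} \beta_k^2 \hat{\Sigma}_{s,k}$$
where the mean $\mu_s$ is the required reconstruction of the teaching data ($\mu_s = \alpha_s$). Finally, the generalized data points $\alpha = (\alpha_s, \alpha_t)$ and the variance $\Sigma_s$, which is used to extract the task constraints, are obtained. The generalized data points are not contained in the teaching data, but they encapsulate all the essential features of the teaching behaviour. Under the constraint of $\Sigma_s$, smooth and reliable motion trajectories can be generated to control the robot effectively.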
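The regression step can be sketched for the simplest case of one time input and one spatial output. The code below follows the standard GMR conditioning formulas of Calinon and Billard (2009) as an assumption about what is intended here; the component parameters are made-up numbers rather than fitted values, and the names `Component` and `gmr` are hypothetical.

```python
import math
from collections import namedtuple

# One Gaussian component over (time, space): prior weight, means, variances, covariance.
Component = namedtuple("Component", "w mu_t mu_s var_t var_s cov_ts")


def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmr(t, components):
    """Return the conditional mean and variance of the spatial value at query time t."""
    # Responsibility of each component for the query time.
    h = [c.w * gauss(t, c.mu_t, c.var_t) for c in components]
    total = sum(h) or 1e-300
    beta = [hi / total for hi in h]
    mean = 0.0
    var = 0.0
    for b, c in zip(beta, components):
        # Conditional mean and variance of this component given t.
        cond_mean = c.mu_s + c.cov_ts / c.var_t * (t - c.mu_t)
        cond_var = c.var_s - c.cov_ts ** 2 / c.var_t
        mean += b * cond_mean
        var += b * b * cond_var
    return mean, var


# Illustrative parameters only: two components covering the first and second half of a stroke.
model = [
    Component(w=0.5, mu_t=0.25, mu_s=0.0, var_t=0.02, var_s=0.01, cov_ts=0.01),
    Component(w=0.5, mu_t=0.75, mu_s=1.0, var_t=0.02, var_s=0.01, cov_ts=0.01),
]
trajectory = [gmr(i / 10.0, model)[0] for i in range(11)]
```

Querying the model on a regular grid of time values, as in the last line, yields one smooth trajectory even though the demonstrations were sampled at different, noisy instants.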

Experimental setup
A Baxter robot is used to test the performance of the developed method. Each arm has 7 DOFs: $S_0$ and $S_1$ (two shoulder joints), $E_0$ and $E_1$ (two elbow joints), and $W_0$, $W_1$ and $W_2$ (three wrist joints). The motion of the arm can be controlled by programming under the Ubuntu operating system, which is widely used for the Baxter platform together with the Python language. A marker pen is attached to the gripper of Baxter. The operator physically guides the Baxter to write a Chinese character on a flat sheet of paper by holding the marker pen. The experimental setup is shown in Figure 3. Regarding our experimental platform, Visual Studio 2013 and the OpenCV library are used on a Windows 10 operating system. The experiment is carried out in an adequately illuminated indoor environment. During the teaching process, the operator demonstrates the writing of the complete Chinese character 'Mu' five times. This yields four separate single primitives, which are generalized by DMP and regrouped to form other Chinese characters. A custom program running on a remote PC controls the recording and playback of Baxter's trajectories via UDP, and allows any text to be used to name the locally saved trajectory files. In addition, the software Dragon NaturallySpeaking is installed on this remote PC; it converts the voice signals to text signals, which are used to generate robot motion control commands.
Figure 3. The experimental setup for the Chinese character writing task.
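The text-command link between the remote PC and the robot can be sketched with a plain UDP socket. The paper does not give its message format, host or port, so everything concrete below is an assumption: a two-word message of the form `<action> <trajectory name>`, the action set `record` and `playback`, and the helper names `parse_command`, `send_command` and `receive_command`.

```python
import socket

# Hypothetical command vocabulary; the actual commands are not given in the paper.
VALID_ACTIONS = ("record", "playback")


def parse_command(text):
    """Split '<action> <name>' into a tuple, or return None if the text is not a valid command."""
    parts = text.strip().split()
    if len(parts) != 2 or parts[0].lower() not in VALID_ACTIONS:
        return None
    return parts[0].lower(), parts[1]


def send_command(text, host, port):
    """Remote PC side: send one command as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(text.encode("utf-8"), (host, port))


def receive_command(port, timeout=5.0):
    """Robot side: wait for one datagram and parse it; returns None on timeout or bad input."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        sock.settimeout(timeout)
        try:
            data, _sender = sock.recvfrom(1024)
        except socket.timeout:
            return None
    return parse_command(data.decode("utf-8", errors="replace"))


# Text produced by the speech recognizer is validated before it reaches the robot.
print(parse_command("record mu_stroke2"))
```

Since UDP gives no delivery guarantee, validating each received message and ignoring anything malformed is a reasonable precaution when the text originates from speech recognition.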
The demonstration process is repeated five times with the joint $W_2$ fixed, and we record the values of the joints $S_0$, $E_1$, $W_0$ and $W_1$. The data collected from the demonstrations are used to train the proposed DMP method. The parameters of the DMP model are set as $\tau = 1$, $K = 20$, $a = 8$. The GMM has a strong trajectory encoding ability for complex tasks. This paper uses the tablet to acquire the writing data in the teaching mode, applies GMM-based imitation learning to learn the writing skill, and obtains the generalized output through GMR, which is then performed by the Baxter robot. Based on the strokes demonstrated by the operator, the second, third and fourth strokes of the Chinese character 'Mu' are chosen to be generalized.

Results and analysis
The experimental trajectories are plotted in Matlab. The five recorded movement trajectories for each stroke are saved in Cartesian space, where we use the k-means method to initialize the analysed data and apply the EM algorithm to obtain the GMMs. Afterwards, we use the DTW method to align the five recorded trajectories; here the first curve is chosen as the reference with which the others are aligned. Figure 4(a,c,e,g) shows the five demonstrated trajectories and Figure 4(b,d,f,h) shows the trajectories reconstructed by GMR using Matlab. Here we take the second stroke, which is the horizontal stroke, to be spatially generalized. Using GMM-based imitation learning, the trajectory can be used continuously to write the Chinese character 'Mu'. The blue dotted line is the teaching trajectory obtained by demonstration, the black solid line is the generated trajectory and the red solid line is the generalized result after DMP and GMM encoding. Next, the first stroke and all the other generalized strokes are combined in order, using the verbal commands, to form a new Chinese character 'Bu', as shown in Figure 5. The demonstration process records the data of six joints $S_0$, $S_1$, $E_0$, $E_1$, $W_0$ and $W_1$. The joint $W_2$ is fixed at the value 0. This data set is used to train the modified DMP. Figure 6 illustrates the training result.
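Aligning each demonstration to a reference curve can be sketched by backtracking the DTW path. The example below is an illustration on scalar sequences, not the Matlab code used for the figures: each reference sample receives the average of the demonstration samples matched to it, so every aligned demonstration ends up with the reference length. The name `dtw_align` is hypothetical.

```python
def dtw_align(reference, demo):
    """Warp `demo` onto the time axis of `reference` using the optimal DTW path."""
    M, N = len(reference), len(demo)
    INF = float("inf")
    D = [[INF] * (N + 1) for _ in range(M + 1)]
    D[0][0] = 0.0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            cost = abs(reference[i - 1] - demo[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the last cell, collecting the demo samples matched to each reference index.
    matches = [[] for _ in range(M)]
    i, j = M, N
    while i > 0 and j > 0:
        matches[i - 1].append(demo[j - 1])
        steps = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda step: step[0])
    return [sum(m) / len(m) for m in matches]


# The first demonstration serves as the reference; the second was written more slowly.
reference = [0.0, 1.0, 2.0, 3.0]
slow_demo = [0.0, 0.0, 1.0, 2.0, 2.0, 3.0]
aligned = dtw_align(reference, slow_demo)
print(aligned)
```

Once all demonstrations share the reference time axis, they can be stacked sample by sample as input for the GMM encoding.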
We can conclude that, at certain time points, the maximum and minimum values of the joint angles in the demonstration and in the generalization differ by about 0.04 rad. The range of the arm movement of the robot therefore differs by about 0.04 rad between the two cases, which corresponds to different arm motions at two different positions. The movements of joints $S_0$ to $W_1$ are regenerated through the teaching process, which enables the robot to perform the Chinese character writing task successfully, as shown in Figure 5, and demonstrates the features of our proposed technology. As shown in Figure 4, smooth curves are obtained from multiple demonstrations using the modified DMP. Hence, the robot performs the writing tasks well after being taught by demonstration.

Conclusion
A Chinese character recombination technology based on DMP segmentation using verbal commands for a Baxter robot has been developed in this paper, which performs well in the continuous writing of trajectories with generalization. In the motion generation part, we choose the discrete DMP as the basic motion model, because it is able to generalize motion trajectories. To improve the learning efficiency of the DMP model, we employ GMM and GMR to estimate the unknown function of the model that describes the movement. With this modification, the DMP model is able to represent the movement better in tasks based on multiple demonstrations. Through this method, the simultaneous encoding of multiple demonstrations effectively reduces the influence of noise, which improves the stability of the system. Moreover, our proposed method can be extended to achieve multi-task learning and improve the system's co-processing capabilities. A Chinese character writing experiment, driven by verbal commands through the Dragon NaturallySpeaking software, is carried out on a Baxter robot to examine the efficiency of the designed methods. The results establish an efficient robot learning framework for TbD and facilitate robot learning at a higher level.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was partially supported by Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/S001913/1 and by Royal Society under International Exchanges award IE170247.