A library for constraint consistent learning

This paper introduces the first open-source software library for Constraint Consistent Learning (CCL). It implements a family of data-driven methods that are capable of (i) learning state-independent and state-dependent constraints, (ii) decomposing the behaviour of redundant systems into task- and null-space parts, and (iii) uncovering the underlying null-space control policy. It is a tool for analysing and decomposing many everyday tasks, such as wiping, reaching and drawing. The library also includes several tutorials that demonstrate its use with both simulated and real-world data in a systematic way. This paper documents the methods contained within the library, including the implementations of said methods in tutorials and associated helper methods. The software is made freely available to the community, to enable code reuse and allow users to gain in-depth experience in statistical learning in this area.


Introduction
Constraint Consistent Learning (CCL) is a family of methods for learning different parts of the equations of motion of redundant and constrained systems in a data-driven fashion [1][2][3]. It is able to learn representations of self-imposed or environmental constraints [3][4][5], decompose the movement of redundant systems into task- and null-space parts [2,6], and uncover the underlying null-space control policy [7,8]. CCL enables: (1) learning the constraints encountered in many everyday tasks through various representations; and (2) learning the underlying control behaviour from movements observed under different constraints.
In contrast to many standard learning approaches that incorporate the whole of the observed motions into a single control policy estimate [9], CCL separates the problem into learning (i) the constraint representations and (ii) the underlying control policy. This provides more flexibility to the robot in the reproduction of the behaviour; for example, when faced with demonstration data of one behaviour recorded under different constraints, CCL can learn a single policy that generalises across the constraints [8]. In terms of (1), the constraints may fall into the categories of state-independent or state-dependent constraints. For example, for a state space represented as end-effector coordinates, when wiping a table (see Figure 1(b)), the flat surface acts as a hard restriction on the actions available (motions perpendicular to the surface are eliminated by the constraint), regardless of where the effector is located on the surface, so it represents a state-independent constraint. When wiping or stirring soup in a bowl (see Figure 1(c)), the curved surface introduces a state dependency in the constraint, since the restriction of motion depends on the location of the effector. The ability to predict how constraints influence the outcomes of control actions can speed up learning of new skills and enhance safety (e.g. the learned constraints can prevent exploration actions that cause excessive force). Furthermore, the availability of constraint knowledge can reduce the dimension of the search space when optimising behaviours [10].
In terms of (2), the observable movements may contain partial information about the control policy, masked by the constraint or by higher-priority task objectives [2]. For example, in a reaching task (see Figure 1(a)), humans move their arms towards the target (primary task) while minimising effort, for instance, by keeping the elbow low (secondary objective). The imperative of executing the primary task represents a self-imposed constraint that restricts the secondary objective. Interaction with environmental constraints can also mask the actions applied by the demonstrator. For example, when grasping an object on a table such as a pen, the fingers slide along the table surface, resulting in a discrepancy between the observed motion and that predicted by the applied actions. CCL can help uncover these components, enabling the intended actions to be reconstructed and applied to new situations [11].
Figure 1. (a) A reaching task; in each case, redundancy is resolved in the same way. (b) A table wiping task subject to a state-independent flat environment constraint using a robot arm. (c) A bowl wiping task subject to a state-dependent curvature constraint.
This paper introduces the CCL library as a community resource for learning in the face of kinematic constraints, alongside tutorials to guide users who are not familiar with learning constraint representations and underlying policies. As its key feature, CCL is capable of decomposing many everyday tasks (e.g. wiping [3], reaching [2] and gait [12]) and learning subcomponents (e.g. null-space components [2], constraints [3] and the unconstrained control policy [8]) individually. The methods implemented in the library provide means by which these quantities can be extracted from data, based on a collection of algorithms developed since CCL was first conceived in 2007 [13]. It presents the first unified Application Programmer's Interface (API) implementing and enabling use of these algorithms in a common framework.

Background
CCL addresses the problem of learning from motion subject to various constraints for redundant systems. The behaviour of constrained and redundant systems is well-understood from an analytical perspective [14], and a number of closely related approaches have been developed for managing redundancy at the levels of dynamics [15], kinematics [16,17], and actuation [18,19]. However, a common factor in these approaches is the assumption of prior knowledge of the constraints, the control policies, or both. CCL addresses the case where there are uncertainties in these quantities and provides methods for their estimation.

Direct policy learning
One of the most fundamental approaches to modelling movements is through Direct Policy Learning (DPL) [1,9,20]. Given data of observed movements as states x and actions u, this approach, usually formulated as a supervised learning problem, estimates the policy π through direct regression, i.e. a mapping between states and actions

u = π(x), π : R^P → R^Q,

Figure 2. Policy-based modelling. Demonstration data is given as pairs of states and actions, which are used to form a model of the policy, represented here as a vector field. To then reproduce the movement from the resulting model, a corresponding mapping is defined which maps the demonstrator's states x and actions u to the imitator's x and u [21].
where x ∈ R^P and u ∈ R^Q are the state and action spaces, respectively. Once demonstration data is used in learning, this results in a reproduced movement by an imitator, as illustrated in Figure 2. As shown, data is given as pairs of states and actions. A model of the policy is then produced, resulting in corresponding pairs of state-dependent actions, with the goal being to approximate the policy as closely as possible. One way to obtain a mapping between the states and actions is by minimising the error

E[π̃] = Σ_{n=1}^N ||u_n − π̃(x_n)||²,

where π̃ is an estimate of the policy π.1 As pointed out in [1], when learning policies through DPL, demonstrations are either performed in free space (such as performing sign language) or under consistent constraints (such as interacting with an unchanging environment containing obstacles). These two categories of demonstration suited to DPL are referred to as unconstrained and consistently constrained, respectively. Moreover, DPL is deterministic in nature and suffers from poor generalisation when dealing with undemonstrated states: it assumes that each action leads to a particular state, an assumption that does not always hold in the real world due to the ever-changing environment [20]. Extensions exist to mitigate this lack of robustness, for example exploration policies that typically employ a reward function. However, these raise the issue of balancing exploration against exploitation, i.e. deciding how much more varied data is required versus accepting the policy modelled so far, with the risk of spending excess time improving a policy at a rate that does not justify the effort. Ultimately, other approaches exist which focus on generalisation without the same shortcomings.
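For readers new to DPL, the regression step above can be sketched in a few lines. The following illustrative Python snippet (not part of the CCL library, which is implemented in Matlab and C; the attractor policy and target are hypothetical choices for the example) fits a linear model to state-action pairs:

```python
import numpy as np

# Illustrative DPL sketch: estimate the policy by least-squares
# regression from observed states to actions.
rng = np.random.default_rng(0)

def true_policy(x):
    target = np.array([1.0, 0.5])            # hypothetical attractor point
    return 0.5 * (target - x)

X = rng.uniform(-1, 1, size=(200, 2))        # observed states
U = np.array([true_policy(x) for x in X])    # observed (unconstrained) actions

# Linear model u = W^T [x; 1], fitted by ordinary least squares.
Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias feature
W, *_ = np.linalg.lstsq(Xb, U, rcond=None)

x_query = np.array([0.2, -0.3])
u_pred = np.array([*x_query, 1.0]) @ W       # predicted action at a new state
```

Since the model class contains the demonstrated policy and the data is noise-free, the regression recovers it exactly; with real demonstrations the fit is only approximate.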

Constraint formalism
As discussed in Section 2.1, DPL makes implicit assumptions about demonstrations: they are expected to be either unconstrained or consistently constrained [1]. However, this makes it infeasible for modelling many everyday tasks due to variability in the real world. Interactions with an ever-changing environment tend to involve some type of constraint subject to variability, and this may impede the success of a task, for example when walking on uneven terrain, or when the surface of a table limits the movement of a hand at various angles during a wiping task (Figure 3) [6]. The inability to account for such variability, which should be expected in a real-world environment, is a limitation of DPL [1]. While constraints are generally viewed as impeding a task, they can also be used to aid systems in interactions, for example by exploiting environmental constraints to help facilitate grasping [22].
Starting with the DPL-based approach (Section 2.1), autonomous systems are described in the form

u = π(x), (1)

where demonstration data is given as N pairs of observed states x ∈ R^P and actions u ∈ R^Q, and π is a direct mapping between the two. When including a constraint, the model is subject to a set of S-dimensional constraints (S < Q)

A(x)u = b(x), (2)

where x ∈ R^P represents the state and u ∈ R^Q represents the action. A(x) ∈ R^{S×Q} is the constraint matrix, which projects the task-space policy onto the relevant part of the control space. The vector term b(x) ∈ R^S, if present, represents constraint-imposed motion (in redundant systems, it is the policy in the task space). Inverting (2) results in the relation

u = A(x)† b(x) + N(x)π(x), (3)

where A(x)† denotes the Moore-Penrose pseudoinverse of A(x) and

N(x) = I − A(x)† A(x)

is the null-space projection matrix that projects the null-space policy π(x) onto the null space of A. Note that the constraint can be state-dependent (A(x)) or state-independent (A). Here, I ∈ R^{Q×Q} denotes the identity matrix. v(x) ≡ A(x)† b(x) and w(x) ≡ N(x)π(x) are termed the task-space and the null-space component, respectively. The task space can refer to the degrees of freedom (DoF) required to perform the primary task, such as reaching towards a target point, while the null space controls a second, lower-priority objective in a way that does not interfere with the main task, such as avoiding joint limits, self-collision or kinematic singularities [23,24]. This formalism is generic and works for a wide variety of systems: it applies not only to kinematics, but also to redundant actuation [25] and redundancy in dynamics [26]. Typically, it is assumed that the only directly observable quantities are the state-action pairs (x_n, u_n), n ∈ 1, …, N, which contain an unknown combination of v(x) and w(x), or at greater granularity A(x), b(x), N(x) and π(x). Note that application of learning approaches [27] that do not consider the composition of the data in terms of the constraints is prone to poor performance and modelling errors.
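The decomposition (2)-(3) can be made concrete with a short numerical sketch. The following Python snippet (illustrative only; the constraint, task-space motion and policy are arbitrary choices) builds the null-space projection for a 1-D constraint on a 2-DoF system and composes an observed action from its task- and null-space components:

```python
import numpy as np

# Numerical sketch of (2)-(3): a 1-D constraint (S=1) on a 2-DoF
# system (Q=2), with arbitrary example values.
A = np.array([[1.0, 1.0]]) / np.sqrt(2)      # constraint matrix A, unit row
b = np.array([0.2])                          # constraint-imposed motion b
A_pinv = np.linalg.pinv(A)                   # Moore-Penrose pseudoinverse
N = np.eye(2) - A_pinv @ A                   # null-space projection matrix

def policy(x):
    return -x                                # toy null-space policy pi(x)

x = np.array([0.3, -0.1])
v = A_pinv @ b                               # task-space component v(x)
w = N @ policy(x)                            # null-space component w(x)
u = v + w                                    # observed action, as in (3)
```

Note that A @ u recovers b (the constraint is satisfied) while A @ w = 0 (the null-space component is invisible to the constraint), reflecting the decomposition described above.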
For example, applying direct regression to learn policies where there are variations in the constraints can result in model-averaging effects that risk unstable behaviour [1]. This is due to factors such as (i) the non-convexity of observations under different constraints, and (ii) degeneracy in the set of possible policies that could have produced the movement under the constraint. However, increasingly many methods are being developed that overcome these issues, for example by learning what movements can be performed in the task space [28,29], as well as others that separate the task- and null-space components [2] and then learn the constraint from either the task space [30] or the null space [3,11,31]. Providing an open-source collection of software tools suitable for this class of learning problems can help those working in the field to avoid the common pitfalls; thus the CCL library is presented, which implements CCL methods such that they can be applied directly to such problems. Implementation details are provided in Section 4.

Learning functions
A number of algorithms exist for the estimation of the quantities A, v, w and π. The following provides a brief summary of the data structures, evaluation metrics and methods relating to CCL provided in the CCL library,2 outlining the theoretical basis for each algorithm.

Data structures
CCL methods are tested on data given as tuples {x_n, u_n}, n = 1, …, N, of observed states and constrained actions. It is assumed that all actions u are generated using the same underlying policy π(x). In particular,3

u_n = A_n† b_n + N_n π_n,

where A_n, b_n and π_n are not explicitly known. The observations are assumed to be grouped into K subsets of N data points, each (potentially) recorded under different (task or environmental) constraints.
Learning is more effective when the data contains sufficiently rich variations in one or more of the quantities defined in (3), since methods learn the consistency by teasing out the variations. For instance, when learning v, variations in w are desirable [2]. For learning π, observations subject to multiple constraints (variation in A) are necessary [1]. For learning constraint A, variations in π are desirable [6].

Evaluation criteria
For testing different aspects of the learning performance, a number of evaluation criteria have been defined in the CCL literature. These include metrics comparing estimation against the ground truth (if known), as well as those that provide a pragmatic estimate of performance based only on observable quantities. Table 1 shows an overview of all key functions as well as their respective equations for learning and evaluation which are covered. Moreover, the implementation of these performance measures are discussed in Section 4.6.

Normalised projected policy error (NPPE)
If the ground truth constraint matrix A_n and unconstrained policy π_n are known, then the NPPE provides the best estimate of learning performance [3]. It measures the difference between the policy under the real constraints and the policy under the estimated constraints [3]. The NPPE is defined as

E_PPE = (1 / (N σ_u²)) Σ_{n=1}^N ||N_n π_n − Ñ_n π_n||², (4)

where N is the number of data points, π_n are samples from the policy, N_n is the true projection matrix and Ñ_n is the learnt projection matrix. The error is normalised by the variance of the observations under the true constraints, σ_u². This metric is only used for validating results, as it requires the ground truth π_n and N_n, which are assumed to be unknown.

Normalised projected observation error (NPOE)
In the absence of the ground truth A_n and π_n, the NPOE must be used [3]. In a practical setting, constraint learning may be performed without prior knowledge of π_n or N_n, and thus the NPOE plays a vital role:

E_POE = (1 / (N σ_u²)) Σ_{n=1}^N ||u_n − Ñ_n u_n||². (5)

Null space projection error (NPE)
To evaluate predictions of the null-space component model w̃(x), the NPE can be used [2]:

E_NPE = (1 / (N σ_w²)) Σ_{n=1}^N ||w_n − w̃_n||². (6)

Note that use of this metric assumes knowledge of the ground truth w_n.

Normalised unconstrained policy error (NUPE)
To evaluate the estimated unconstrained control policy model π̃(x), the NUPE is used [8]. It assumes access to the ground truth unconstrained policy π_n and works by directly comparing the true policy against the learnt policy:

E_UPE = (1 / (N σ_π²)) Σ_{n=1}^N ||π_n − π̃_n||². (7)

Normalised constrained policy error (NCPE)
To evaluate the estimated constrained control policy model π̃(x), the NCPE is used [8]. It assumes that the true w_n is known. In some cases, variations in constraints may be insufficient to fully estimate the true policy π. In such cases, where constraints show little variation, it may not even be necessary to uncover the hidden components of the fully unconstrained policy, as these are usually eliminated by the constraints. The following measure is therefore suggested, which computes the difference between the data and the estimated policy when the latter is projected by the same constraints as in the training data:

E_CPE = (1 / (N σ_w²)) Σ_{n=1}^N ||w_n − N_n π̃_n||². (8)
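To make the role of these metrics concrete, the following Python sketch implements simplified versions of the NPPE and NPOE. The scalar variance normalisation used here is an assumption for illustration; this is not the library's implementation (see Section 4.6 for that).

```python
import numpy as np

# Illustrative implementations of the NPPE and NPOE, assuming the
# normalised forms given above. Not the CCL library's own code.
def nppe(Pi, N_true, N_est):
    """Normalised projected policy error between N pi and N~ pi."""
    U = Pi @ N_true.T                       # observations under true constraints
    err = Pi @ (N_true - N_est).T
    return np.sum(err**2) / (len(Pi) * np.var(U))

def npoe(U, N_est):
    """Normalised projected observation error ||u - N~ u||^2."""
    err = U - U @ N_est.T
    return np.sum(err**2) / (len(U) * np.var(U))

rng = np.random.default_rng(0)
Pi = rng.normal(size=(100, 2))              # samples of the policy
A = np.array([[1.0, 0.0]])                  # true 1-D constraint
N_true = np.eye(2) - np.linalg.pinv(A) @ A  # true projection matrix
U = Pi @ N_true.T                           # constrained observations
```

With a perfect estimate both errors vanish; a wrong projection matrix yields a large NPPE, matching the intent of (4)-(5).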

Learning the constraint matrix
Several methods for estimating the constraint matrix A are presented. These assume that (i) each observation contains a constraint A ≠ 0, (ii) the observed actions take the form u = w = Nπ, (iii) the u are generated using the same null-space policy π, and (iv) A is not explicitly known for any observation (optionally, features in the form of candidate rows of A may be provided as prior knowledge, see Section 3.3.2). The methods for learning A under these assumptions are based on the insight that the projection of u also lies in the same image space [3], i.e.

N(x)u(x) = u(x). (9)
If the estimated constraint matrix Ã is accurate, it should also obey this equality. Hence, the estimate can be formed by minimising the difference between the left- and right-hand sides of (9), as captured by the error function

E[Ã] = Σ_{n=1}^N ||u_n − Ñ_n u_n||², (10)

where Ñ_n = I − Ã_n† Ã_n. This can be expressed in the simplified form

E[Ã] = Σ_{n=1}^N ||Ã_n u_n||². (11)

Note that, in forming the estimate, no prior knowledge of π is required. However, an appropriate representation of the constraint matrix is needed.4 As outlined in Section 1, in the context of learning A, there are two distinct cases, depending on whether the constraints are state-dependent or not.

Learning state independent constraints
The simplest situation occurs when the constraint is unvarying throughout the state space. In this case, the constraint matrix can be represented as the concatenation of a set of unit vectors

Ã = (â_1, â_2, …, â_S)ᵀ, (12)

where the sth unit vector â_s is constructed from the parameter vector θ ∈ R^{Q−1} that represents the orientation of the constraint in action space (see [3] for details). A visualisation of the simplest case, a one-dimensional constraint in a system with two degrees of freedom where A = â ∈ R^{1×2}, is shown in Figure 4. It can be seen how θ (in this case, â = (cos θ, sin θ)) corresponds to different estimates of the constraint and its associated null space.
In [3], the use of this method is demonstrated in both simulation and a real-world experiment. It is first evaluated on a toy example with the goal of learning the null-space projections from constrained data. For this, a two-dimensional system with a one-dimensional constraint is set up. This constrained system is tested using three different control policies, which include a linear policy, a limit-cycle policy and a sinusoidal policy; in all cases the NPPE and NPOE errors are below 10^{-5}. The approach is then evaluated in the real world on a 7DoF Kuka Lightweight Robot (LWR-III) to see how it performs when learning from data with higher dimensionality as well as with more complex constraints. All results for experiments following this set-up give an NPPE and NPOE of less than 10^{-3}. Further details on the theory behind this approach and its experiments can be found in [3].
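The following Python sketch illustrates the idea on the setting of Figure 4: a 1-D constraint in a 2-D action space, parametrised as â = (cos θ, sin θ), recovered by minimising the simplified error (11). A grid search over θ stands in here for the optimiser used in the library, and the data-generating policy is an arbitrary choice:

```python
import numpy as np

# Sketch of learning a state-independent 1-D constraint in 2-D using
# the unit-vector parametrisation a(theta) = (cos theta, sin theta).
rng = np.random.default_rng(1)
theta_true = 0.7
a_true = np.array([np.cos(theta_true), np.sin(theta_true)])
N_true = np.eye(2) - np.outer(a_true, a_true)   # null-space projector

X = rng.uniform(-1, 1, size=(100, 2))
U = (N_true @ (-X.T)).T                     # u = N pi(x), with pi(x) = -x

def objective(theta):
    a = np.array([np.cos(theta), np.sin(theta)])
    return np.sum((U @ a)**2)               # E = sum_n ||a^T u_n||^2, cf. (11)

grid = np.linspace(0.0, np.pi, 1801)        # a and -a describe the same constraint
theta_hat = grid[np.argmin([objective(t) for t in grid])]
```

Because every observed action lies in the null space of the true constraint, the objective vanishes only when the candidate row is (anti-)parallel to the true one.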

Learning state dependent constraint
Another method is proposed for learning state-dependent constraints, i.e. where the constraint A(x) depends on the current state x of the robot. For learning A(x), two scenarios may be encountered: (i) there is prior knowledge of the constraint in the form of candidate rows (see [6]), or (ii) no prior knowledge is available.
In case (i), the constraint matrix is represented as

Ã(x) = Λ̃(x)Φ(x), (13)

where Φ(x) ∈ R^{R×Q} is the feature matrix and Λ̃(x) ∈ R^{S×R} is the selection matrix specifying which rows represent constraints. The feature matrix can take the form Φ(x) = J(x), where J is the Jacobian mapping from the joint space to the end-effector task space. Similar to (12), Λ̃(x) is described by a set of S orthonormal vectors

Λ̃ = (λ_1, λ_2, …, λ_S)ᵀ, (14)

where λ_s ∈ R^R corresponds to the sth dimension of the task space and λ_i ⊥ λ_j for i ≠ j. A parameter vector θ_s is used for representing the constraint vector λ_s. Each θ_s is modelled as θ_s = ω_s β(x), where ω_s is the weight matrix and β(x) ∈ R^G is a vector of G basis functions that transform x into a set of feature vectors in a higher-dimensional space. Substituting Ã_n = Λ̃_n Φ_n, (11) can be written as

E[Λ̃] = Σ_{n=1}^N ||Λ̃(x_n)Φ(x_n)u_n||². (15)

The optimal Λ̃ can be formed by iteratively searching for the choice of λ_s that minimises (15). In [6], experiments validating this method are presented on a simulated toy example and a three-link planar arm, as well as a real-world humanoid hand. The three-link system considers constraints affecting the end-effector of a system, where various combinations of constraints are applied in the 2D Cartesian space as well as on the orientation of the system's end-effector. Finally, it is also tested using an AR10 robotic hand holding an air-conditioner remote, in a problem where the task space controls the pressing of a button using the thumb and the null space controls moving the middle finger to a comfortable posture. The goal of this experiment is to extract the task-space component in the face of the distracting null-space movements, so that the action of pressing a button can be generalised to other buttons (details can be found in [6]).

Learning the task and null space components
When the unconstrained control policy is subject to both a constraint and some task-space policy (b(x) ≠ 0), it is often useful to extract the task- and null-space components (v(x) and w(x), respectively) of the observed actions u(x).
Assuming that the underlying null-space policy π(x) and the task constraint A(x) are consistent across the data set, the null-space component w(x) should satisfy the condition N(x)u(x) = w(x), as noted in [2]. In this case, w(x) can be estimated by minimising

E[w̃] = Σ_{n=1}^N ||w̃_n − P̃_n u_n||², (16)

with P̃_n = w̃_n w̃_nᵀ / ||w̃_n||², where w̃_n = w̃(x_n), P̃_n projects an arbitrary vector onto the space spanned by w̃_n, and u_n is the nth data point. This error function eliminates the task-space components v(x) by penalising models that are inconsistent with the constraints, i.e. those where the difference between the model, w̃(x), and the observations projected onto that model is large.
The approach is validated in simulations of a 2D system, a three-link planar arm and a 7DoF Kuka Lightweight arm. It is also tested in a comparative study against the then state-of-the-art DPL, where it is shown to perform better when estimating null-space policies (for details, see [2]).
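The consistency objective (16) can be illustrated numerically. In the following Python sketch (a toy setting with arbitrary constraint, task-space motion and policy; not the library's routine), the true null-space component attains zero error, because projecting each observation onto its own direction recovers it exactly, while a deliberately wrong model does not:

```python
import numpy as np

# Toy illustration of the consistency objective (16).
rng = np.random.default_rng(2)
A = np.array([[0.0, 1.0]])                  # 1-D constraint on a 2-D action space
A_pinv = np.linalg.pinv(A)
N = np.eye(2) - A_pinv @ A                  # null-space projector
X = rng.uniform(-1, 1, size=(50, 2))        # sampled states
V = np.tile(A_pinv @ np.array([0.3]), (50, 1))   # task-space components
W = (N @ (-X.T)).T                          # null-space components, pi(x) = -x
U = V + W                                   # observed actions

def consistency_error(W_model, U):
    """E[w~] = sum_n ||w~_n - P~_n u_n||^2, P~ = w w^T / ||w||^2."""
    E = 0.0
    for w, u in zip(W_model, U):
        P = np.outer(w, w) / (w @ w)        # projector onto direction of w
        E += np.sum((w - P @ u)**2)
    return E

E_true = consistency_error(W, U)            # ~0 for the true component
E_wrong = consistency_error(W[:, ::-1], U)  # component axes swapped
```

The task-space component v is orthogonal to w, so it drops out under the projection; this is how (16) eliminates v without knowing it.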

Learning the unconstrained control policy
Another situation where CCL plays a role is when there is a need to estimate the underlying unconstrained control policy. The methods assume either that (i) b = 0, i.e. no additional task is involved, or (ii) w has been learnt using the method described in Section 3.4.
As shown in [8], an estimate π̃(x) can be obtained by minimising the inconsistency error

E[π̃] = Σ_{n=1}^N (r_n − û_nᵀ π̃(x_n))², r_n = ||u_n||, û_n = u_n / r_n. (17)

The risk function (17) is compatible with many regression models (for details, see [2]).
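When the policy model is linear in its parameters, the inconsistency error (17) becomes a linear least-squares problem. The following Python sketch (illustrative; the constraints, toy policy and linear model π̃(x) = Mx are choices made for this example, not the CCL API) recovers a policy from data observed under two different constraints:

```python
import numpy as np

# Sketch of minimising the inconsistency error (17) with a linear
# policy model pi~(x) = M x, which makes (17) least squares in M.
rng = np.random.default_rng(3)

def pi_true(x):
    return -x                               # toy unconstrained policy to recover

rows, rhs = [], []
for A in (np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])):   # two constraints
    N = np.eye(2) - np.linalg.pinv(A) @ A
    for _ in range(50):
        x = rng.uniform(-1, 1, 2)
        u = N @ pi_true(x)                  # observed constrained action, b = 0
        r = np.linalg.norm(u)
        if r < 1e-9:
            continue                        # skip degenerate samples
        u_hat = u / r
        # (17): minimise (r_n - u_hat_n^T M x_n)^2; vectorise M row-major
        rows.append(np.outer(u_hat, x).ravel())
        rhs.append(r)

m, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
M_hat = m.reshape(2, 2)                     # recovers pi(x) = -x, i.e. M = -I
```

Note that data from a single constraint would leave M underdetermined; as stated in Section 3.1, variation in A is necessary for learning π.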

Tutorials
The API of the CCL library provides functions for a series of algorithms for the estimation of the quantities A, v, w and π. Moreover, the CCL library package provides a number of tutorial scripts to illustrate its use.
The following provides a summary of the implementation, and multiple examples are provided which demonstrate learning using both simulated and real-world data, including learning in a simulated 2-link arm reaching task and learning a wiping task. Included is a toy example, which is described in detail, alongside brief descriptions of other included demonstrations that were tested on real-world systems. Further details of the latter are included in the documentation.
To help put these functions into context, the majority of examples provided in this section are explained for use with a simple, illustrative toy example, in which a two-dimensional system is subject to a one-dimensional constraint. In this, the learning of (i) the null space components, (ii) the constraints, and (iii) the null space policy are demonstrated.
The toy example code is split into three main sections for learning different parts of the equations. Detailed comments on the functions can be found in the Matlab script. The procedure for generating data depends on the part of the equation to be learnt; details can be found in the documentation.

Implementation
A system diagram of the CCL library is shown in Figure 5. The naming convention follows ccl_xxx_xxx, indicating the category and functionality of each implementation. The following provides brief notes on the implementation languages and library installation, and explains the data structures used in the toolbox.

Language and installation notes
The library is implemented in both C and Matlab and is available for download from Github.5 The CCL library is provided under the GNU General Public License v3.0, and documentation is provided online.6 To use the Matlab package, the user simply adds the CCL library to the current path. Installation of the C package has been tested on Ubuntu 14.04 Linux systems and is made easy through use of the autoconf utility. For Windows systems, only the Matlab package is currently tested and supported. The C package's only dependency is the third-party, open-source GNU Scientific Library (GSL).7
A detailed explanation of the functions follows, alongside tutorial examples that are provided to enable the user to easily extend use of the library to their chosen application (see Section 4).

Data structures
The methods implemented in the toolbox work on data that is given as tuples {x n , u n } N n=1 of observed states and constrained actions. The CCL library uniformly stores the data tuples in a data structure reflecting this problem structure. The latter includes the data fields incorporating samples of the input state (x), action (u), actions from unconstrained policy (π), task space component (v) and null space component (w). The major data structure and data fields are listed in Table 2.
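For readers prototyping outside Matlab or C, an analogous structure might look as follows in Python (a hypothetical illustration only; the field names mirror Table 2, but this class is not part of the library):

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional

# Hypothetical Python analogue of the data structure in Table 2.
@dataclass
class CCLData:
    x: np.ndarray                       # observed states
    u: np.ndarray                       # constrained actions
    pi: Optional[np.ndarray] = None     # unconstrained policy samples, if known
    v: Optional[np.ndarray] = None      # task-space components, if known
    w: Optional[np.ndarray] = None      # null-space components, if known

# States and actions are mandatory; the decomposed quantities are
# optional, since they are typically unobserved and only available
# in simulation or after learning.
data = CCLData(x=np.zeros((2, 100)), u=np.zeros((2, 100)))
```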

Learning null space components w
Here, guidance is given on how to use the library for learning w. This method is applicable when demonstrations consist of both v and w. As learning is performed on the null-space component, it is necessary to separate these. However, if all demonstrations are already recorded in the null space, then the data is given as w and it is not necessary to execute this step.
In the CCL library, a helper function is provided whose input is the observed actions U and whose outputs are fun (a function handle) and J (the analytical Jacobian of the objective function (16)).
Minimisation of (16) penalises models that are inconsistent with the constraints, i.e. those where the difference between the model, w̃(x), and the observations projected onto that model is large (for details, see [2]). In the CCL library, this functionality is implemented in the function

model = ccl_learnv_ncl(X, U, model)

where the inputs are X (input states), U (observed actions combining task- and null-space components) and model (model parameters). The output is the learnt model model.
A sample Matlab snippet is shown in Table 3 and explained as follows. First, the user generates training data under the assumption that w is fixed but v varies across the demonstration. The unconstrained policy is a limit-cycle policy [4]. For learning, the centres of the radial basis functions are chosen by K-means, and the variance is set to the mean distance between the centres. A parametric model with 16 Gaussian basis functions is used. The null-space component model can then be learnt through ccl_learnv_ncl.
Table 3. Sample code snippet for learning the null-space component in Matlab.

Learning null space constraints A
Once data is in the correct format, it is important to know how to select the appropriate functions from the library to learn a state-independent A or a state-dependent A(x), where b(x) = 0. If the provided data contains a linear (i.e. state-independent) constraint, then use of ccl_learna_nhat is suggested. On the other hand, if the constraint is nonlinear (i.e. state-dependent), then it is necessary to use ccl_learn_alpha in order to correctly estimate the constraint.

Learning state independent constraints
For learning a linear constraint problem, a constant A is used and ccl_learna_nhat is implemented. The CCL library provides the function

optimal = ccl_learna_nhat(Un)

to form the estimate Ã in this scenario, where Un is the collated set of samples of u (see Table 2) and optimal is the learnt model. The C counterpart8 takes an input argument Un and output argument optimal identical to the Matlab implementation, with additional arguments dim_u and dim_n defining the dimensionality of the array Un.
The function works by searching for rows of the estimated constraint matrix Ã using the function

[model, stats] = ccl_learna_sfa(UnUn, Un, model, search)

where the arguments model and search contain the initial (untrained) model and the parameters controlling the search, respectively, and UnUn is the matrix WᵀW, where W is a matrix containing samples of w; this is pre-calculated prior to the function call for speed. The function returns the learnt model model and the learning statistics stats (see Table 2).
A sample Matlab script is provided in Table 4 and explained as follows. First, it simulates a problem in which the user faces learning a fixed constraint while the system's null-space policy π varies.

Learning state dependent constraints
The objective function for (11) and (15) is implemented in a dedicated routine that takes the feature matrix Phi as an input. The implementation of case (ii), where no prior knowledge is available, is provided through an optimisation routine whose inputs varargin are the objective function (fun), the initial guess of the model parameters (xc, i.e. ω) and the training options (options).
As an example of learning a parabolic constraint, a state-dependent constraint A(x) of the form A(x) = [−2ax, 1] is used in the library. For this, ccl_learn_alpha is implemented. Finally, the NPPE and NPOE are used to evaluate the learning performance.
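The parabolic example can be sketched numerically as follows (illustrative Python; a grid search over the scalar parameter a stands in for the optimisation performed by the library, and the data-generating policy is an arbitrary choice):

```python
import numpy as np

# Sketch of learning the parabolic constraint A(x) = [-2 a x, 1] by
# minimising sum_n ||A(x_n) u_n||^2 over the normalised constraint row.
rng = np.random.default_rng(4)
a_true = 0.5

def constraint_row(a, x1):
    row = np.array([-2.0 * a * x1, 1.0])
    return row / np.linalg.norm(row)        # unit-norm row, so pinv(A) = A^T

X = rng.uniform(-1, 1, size=(200, 2))
U = []
for x in X:
    A = constraint_row(a_true, x[0])[None, :]
    N = np.eye(2) - A.T @ A                 # null-space projector at this state
    U.append(N @ (-x))                      # u = N pi(x) with pi(x) = -x
U = np.array(U)

def objective(a):
    return sum((constraint_row(a, x[0]) @ u)**2 for x, u in zip(X, U))

grid = np.linspace(-1.0, 1.0, 2001)
a_hat = grid[np.argmin([objective(a) for a in grid])]
```

Because the constraint row changes with x, every sample constrains the candidate a differently, which is why state-dependent constraints need the dedicated routine rather than the state-independent one.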

Learning null space policy π
We now consider learning the unconstrained control policy π. Data may be given as motions subject to some constraint, and it may be of interest to separate the control policy of the system from the constraint, such that the policy can be applied to new situations. To do so, ccl_learnp_pi can be used.
This applies to the use case where π is consistent but A varies. The CCL library implements the learning of these models through the function

model = ccl_learnp_pi(X, U, model)

where the inputs X and U are the observed states and actions. The output is the learnt model model.
In the CCL library, the risk function (17) is implemented as a parametric policy learning scheme, where π̃(x) = Wβ(x), W ∈ R^{Q×M} is a matrix of weights and β(x) ∈ R^M is a vector of fixed basis functions (e.g. linear features or Gaussian radial basis functions). A locally-weighted linear policy learning scheme is also implemented in the library to improve the robustness of the policy learning (for details, see [2]). A reinforcement learning scheme (possibly with a deep network structure) can also be used to learn more complex policies [32][33][34].
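The parametric representation π̃(x) = Wβ(x) can be sketched as follows (illustrative Python; the centres, width and toy policy are ad hoc choices, and the weights are fitted here by plain least squares on unconstrained samples rather than via (17)):

```python
import numpy as np

# Sketch of the parametric policy pi~(x) = W beta(x) with Gaussian
# radial basis function features.
rng = np.random.default_rng(5)
centres = rng.uniform(-1, 1, size=(16, 2))  # 16 RBF centres
sigma = 0.5                                  # shared RBF width

def beta(x):
    d2 = np.sum((centres - x)**2, axis=1)
    return np.exp(-d2 / (2 * sigma**2))      # feature vector, shape (16,)

def pi_true(x):
    return np.array([-x[0], 0.3 - x[1]])     # toy policy to be modelled

X = rng.uniform(-1, 1, size=(300, 2))
B = np.array([beta(x) for x in X])           # (300, 16) design matrix
Pi = np.array([pi_true(x) for x in X])       # (300, 2) policy samples
W, *_ = np.linalg.lstsq(B, Pi, rcond=None)   # weight matrix, shape (16, 2)

train_mse = np.mean((B @ W - Pi)**2)         # fit error on training data
```

In practice the centres would be chosen by K-means and the width from the mean inter-centre distance, as in the tutorial scripts described below.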
A sample Matlab script is provided in Table 5 and explained as follows. For learning, 10 radial basis functions are used, with centres chosen using the same K-means algorithm. ccl_learnp_pi is then used to train the model. Finally, the NUPE and NCPE are calculated to evaluate the model's performance. In the library, a locally-weighted policy learning method is also implemented in both Matlab and C; for details, please refer to the documentation.
Other examples, such as a 2-link arm and wiping examples, are also implemented; these follow a similar procedure to the toy example but in higher dimensions, to aid novice users wanting to take advantage of CCL to learn the kinematic redundancy of a robot. Moreover, users can easily adapt these learning methods to their own systems and requirements: the provided demonstrations, such as the wiping and real-data examples, are set up to use the 7DoF KUKA LWR3 and the Trakstar 3D electromagnetic tracking sensor, respectively, and can be reconfigured.

Implementation of evaluation criteria
Once a constraint is learnt, the results can be validated using functions that assess the learnt model's performance. In the context of evaluating the constraints, the CCL library provides implementations of the performance metrics discussed in Section 3.2. The NPPE is computed by a function whose inputs are U_t (the true null-space component W), N_p (the learned projection matrix Ñ) and Pi (the unconstrained control policy π); the outputs are the NPPE, the variance and the (non-normalised) mean-squared PPE, respectively.

NPOE
As mentioned before, the NPOE is used in the absence of the ground truth A_n and π_n; a corresponding function for computing the Normalised Projected Observation Error is provided in the library.

NPE
The NPE is implemented in the function

[umse, v, nmse] = ccl_error_npe(Un, Unp)

where Un and Unp are the true and predicted null-space components. The return values are the NPE, the variance and the (non-normalised) mean-squared projection error.

NUPE & NCPE
The NUPE and NCPE are implemented in functions whose inputs are F (the true unconstrained control policy commands), Fp (the learned unconstrained control policy) and P (the projection matrix). The outputs are the NUPE (respectively, NCPE), the sample variance, and the mean-squared Unconstrained Policy Error (respectively, Constrained Policy Error).

Conclusion
This paper introduced the CCL library, an open-source collection of software tools for learning different components of constrained movements and behaviours. The implementations of the key functions have been explained throughout the paper, and interchangeable and expandable examples of their usage have been demonstrated using both simulated and real-world data. For the first time, the library brings together a diverse collection of algorithms developed over the past ten years into a unified framework with a common interface for software developers in the robotics community. In future work, Matlab and Python wrappers will be released that take advantage of the fast computation routines implemented in C.

Notes
1. For brevity, here and throughout the paper, the notation a_n may be used to denote the quantity a evaluated on the nth sample. For example, if a is a vector quantity computed from the state x, then a_n = a(x_n).
2. Due to space constraints, the reader is referred to the original research literature for in-depth details of the theoretical aspects.
3. For brevity, here and throughout the paper, the notation a_n is used to denote the nth sample of the (matrix or vector) quantity a. Where that quantity is a function of x, the notation a_n denotes the quantity calculated on the nth sample of x, i.e. a_n = a(x_n).
4. Note: while the pseudoinverse operation has conditioning problems close to a rank-deficient configuration, the constraint learning methods deal with this during the inversion by ignoring all singular values below a certain threshold.
5. www.github.com/mhoward3210/ccl
6. nms.kcl.ac.uk/rll/CCL_doc/index.html
7. www.gnu.org/software/gsl/
8. The C library implements all learning functions, but for the remainder of the paper, Matlab functions will be used throughout for consistency.