A generalized stochastic optimal control formulation for heart rate regulation during treadmill exercise

ABSTRACT The stochastic optimal control formulation for heart rate (HR) regulation during treadmill exercise is extended here to encompass low-pass characteristics in the compensator and in the input sensitivity function. The latter governs the response of the treadmill speed command to disturbances arising from physiological heart rate variability (HRV), and it must be shaped to give appropriate control loop behaviour. In a comparative test series involving 20 healthy male subjects, the low-pass compensator was found to give substantially and significantly lower mean intensity of changes in treadmill speed (1.0 vs.  m2/s2, low-pass vs. non-low-pass compensators, ), but at the cost of a significant increase in mean root-mean-square tracking error (2.46 vs. 1.74 bpm, ). The experimental results demonstrated accurate and robust control of HR across the 20 subjects tested, despite a simple pre-existing nominal model having been used for controller design. The results also provide strong evidence that the magnitude of HRV decreases naturally over time. The principal design issue in this application is that of suppression of disturbances arising from HRV, but changes in treadmill speed must remain acceptable to the runner. The theoretical extension of the stochastic optimal control problem formulation derived here allows this to be addressed by appropriate shaping of the key sensitivity functions using two tuning parameters, thus facilitating the tradeoff between tracking accuracy and control signal intensity.


Introduction
Automatic control of heart rate (HR) during treadmill exercise is important because HR is commonly used to prescribe exercise intensity for fitness training programmes (Garber et al., 2011;Pescatello, Arena, Riebe, & Thompson, 2014). Employment of feedback control allows target heart rate profiles to be achieved by automatic determination of treadmill speed, e.g. Hunt & Fankhauser (2016); this obviates the need for the runner to intervene by manually setting speed using the treadmill's control panel.
Recent work has demonstrated that the fundamental dynamic response of HR to changes in treadmill speed can be adequately represented by a simple first-order linear model (Hunt, Fankhauser, & Saengsuwan, 2015). Several empirical studies have, furthermore, demonstrated that highly accurate and robust control of heart rate can be achieved using a single linear time-invariant (LTI) model of this type (Hunt & Fankhauser, 2016;Hunt & Maurer, 2016;Hunt & Liu, 2017;; these studies employed a single pre-existing model from a separate systems identification study, Hunt et al. (2015), and did not require identification of the HR dynamics of the individual runners tested.
It is of course accepted that the HR response to exercise is nonlinear and time-varying, but it does not automatically follow that nonlinear and/or time-varying feedback control is required to achieve high-precision and robust control; in fact, the evidence reviewed above suggests otherwise (Hunt & Fankhauser, 2016;Hunt et al., 2015;Hunt & Liu, 2017;Hunt & Maurer, 2016). In a head-to-head comparison, the performance outcomes for linear and nonlinear controllers were not significantly different for moderate-to-vigorous exercise intensities; furthermore, nonlinear control was found to be overly sensitive at low running speed (Hunt & Maurer, 2016).
This evidence strongly suggests that the main design challenge for HR regulation and control arises from the existence of wide-spectrum, physiologically-based heart rate variability (HRV) (Sassi et al., 2015). The existence of uncertainty in plant model parameters and/or structure (e.g. nonlinearity, time variability) does not appear to be the principal issue to be addressed in the design of feedback controllers for HR. HRV appears in the plant as an output disturbance, therefore care must be taken to ensure that the control signal, viz. the treadmill speed command, is not too-strongly excited in a frequency range which would be negatively perceived by the runner. It has been elucidated in reference Hunt & Fankhauser (2016) that ultra-low frequency HRV, resulting in very slow changes in speed, are acceptable; at the other end of the spectrum, high-frequency disturbances will generally be attenuated above the bandwidth of the feedback loop; but disturbances in the range close to the crossover frequency, classified as very-low frequency HRV (Malik et al., 1996), are critical because unpleasant excitation of the speed signal can occur.
To address this principal design challenge, a method for HR control which explicitly shapes the input sensitivity function, i.e. the transfer function between the HRV disturbance and the control signal (denoted U o in the sequel), has been derived and empirically tested (Hunt & Fankhauser, 2016). This method allows the gain of U o to be set to a specific value at a given critical frequency. As a minimum, for any other feedback design approach for HR control, the frequency response of U o should be carefully examined during the design process. Alternative approaches to treadmill HR control, e.g. references Cheng et al. (2008), Su et al. (2010), Scalzi et al. (2012) and Lu et al. (2016), have neglected the key points discussed above. These studies, moreover, have failed to employ objective outcome measures for evaluation of control performance and have lacked formality and rigour in empirical assessments.
The use of quantitative time-domain performance outcomes arises naturally in the HR control problem, and such measures have been employed in recent work (Hunt & Fankhauser, 2016;Hunt & Liu, 2017;Hunt & Maurer, 2016): root-mean-square (RMS) tracking error, Equation (31) below, describes the accuracy of HR reference tracking. It is also important to quantify the intensity of changes in treadmill speed, which can be done using the average power of the control signal, Equation (32).
Motivated by the importance of these outcome measures, the HR control task has been formulated within a stochastic LQ-optimal control framework (Hunt, 1989;Hunt & Liu, 2017) (here, 'L' signifies a linear plant model and 'Q' a quadratic cost function). That work, Hunt and Liu (2017), identified an explicit link that can be made between the optimal control cost function and the outcome measures: RMSE and control signal power give sample estimates of the quadratic terms in the cost function; their relative importance is set using the control weighting parameter. It was also shown to be convenient to model HRV disturbances as stochastic processes. These factors, it was suggested, make the stochastic optimal method a direct and natural approach to HR control.
However, a limitation of the LQ-optimal controller proposed in Hunt and Liu (2017) is that the compensator transfer function, and therefore also the input sensitivity function U o of the closed-loop system, did not have low-pass frequency-domain characteristics: this was a consequence, viewed in the time domain, of the compensator being merely causal and not strictly causal (this is the standard formulation in the stochastic optimal control approach, Hunt, 1989). Due to this property, the empirical results obtained in that work tended to have relatively high control signal power, because of the effects of the non-attenuated high-frequency HRV as already discussed. In its basic form, therefore, the LQ-optimal approach did not offer sufficient flexibility to allow shaping of U o to deal with HRV in specific frequency bands.
To address this deficit, the novel theoretical contribution of the present work is a reformulation of the stochastic optimal control problem to give low-pass characteristics in the compensator and in the input sensitivity function: this is achieved by constraining the compensator to be strictly causal, by extending the plant model to include a measurement noise component, and by derivation of the corresponding optimal control solution. This contribution is important because, on the one hand, it allows quantitative, natural measures of control system performance to be explicitly incorporated in the objective function for controller synthesis while, on the other hand, the low-pass loop properties ensure satisfactory behaviour of the control signal. These two factors are not, to our knowledge, directly addressed in any existing HR control strategy.
The principal empirical contributions, based on a formal test series with 20 subjects using the two quantitative outcome measures described above, are: (i) a formal statistical comparison of the new low-pass solution and the existing non-low-pass controller; (ii) an investigation of changes in physiological HRV over an extended time period while closed-loop HR regulation is in operation. These contributions are important for two reasons. Firstly, the subject cohort is large enough to allow well-powered, formal statistical analysis (most previous studies reported only a small number of individual cases, and had no statistical analysis). Secondly, possible changes in HRV over time are considered here for the first time.

Plant model
The nominal plant has the discrete time-domain format (see Figure 1) Here, the controlled variable y represents heart rate (HR) while the control signal u is the treadmill speed command. P d = B/A is a strictly causal transfer function linking u and y, A and B are polynomials in q −1 , with A assumed without loss of generality to be monic. Since B/A is taken to be strictly causal, the leading coefficient of B (the coefficient of the term q 0 ) is zero. Figure 1. Plant model and control structure. The controlled variable y is heart rate (HR); the control signal u is the treadmill speed command; and the reference signal r is the target heart rate (HR * ). The term u is the sample-to-sample change in the control sig- The disturbance term d is modelled as a stochastic process ψ d driving a shaping filter C/(∇A). A second stochastic process, ψ n , represents measurement noise, and z is the measurement signal. Transfer functions: P d is the nominal plant, C fb is the feedback compensator and C pf is a deterministic reference prefilter.
Physiological heart rate variability and other sources of uncertainty in the controlled variable are lumped into a disturbance term d, which is modelled as the output of a filter C/(∇A) driven by a stochastic process ψ d ; C and ∇ are polynomials, with 1/∇ integrating ψ d by virtue of the backward difference operator ∇(q −1 ) = 1 − q −1 . The intensity of ψ d is denoted σ d .
As noted previously, Hunt and Liu (2017), this integrating stochastic disturbance model admits various forms of heart rate variability including, but not limited to, random walks, random steps at random times, deterministic steps, or combinations thereof.
The controlled output y is subject to measurement noise represented by a stochastic process ψ n of intensity σ n . The measurement z which is available for feedback is thus given by It has previously been observed that a first-order transfer function P d , when used as the basis of linear time-invariant feedback design, is sufficient for the achievement of accurate and robust control of heart rate (Hunt & Fankhauser, 2016;Hunt & Liu, 2017;Hunt & Maurer, 2016). In this case, P d takes the strictly-causal form Furthermore, a previous system identification study proposed the adoption of a single model for moderate to vigorous exercise intensities, obtained empirically as the average of 48 individually identified models (Hunt et al., 2015). The average model has steadystate gain k = 24.2 bpm/(m/s) and time constant 57.6 s. A sample interval of 5 s was employed here. This meets contemporary guidelines which recommend having approximately 4-10 samples over the rise time (Åström & Wittenmark, 2011): with this sample interval, there are approximately 10 samples during the identified plant time constant of 57.6 s. A sample interval of 5 s has also been employed and found to be appropriate in several previous studies (Hunt & Fankhauser, 2016;Hunt et al., 2015;Hunt & Liu, 2017;Hunt & Maurer, 2016). Using a sample interval of 5 s, the discrete-time model Equation (5) is This pre-existing model is employed in the sequel for controller calculation. Thus, the controllers evaluated here were not specifically designed for any of the subjects tested.

Controller structure
The feedback compensator C fb operates on the intermediate signal e = r − z to compute the control signal u (see Figure 1), where r is the output of an optional reference prefilter C pf which is driven by the reference signal r. In the present study, C pf was implemented simply as a static gain, computed to give unity steady-state gain from r to y.
The feedback compensator C fb is purposely constrained in two ways: integral action is obtained by including the factor ∇(q −1 ) = 1 − q −1 in the denominator, and the compensator is made strictly causal by including a pure delay term q −1 in the numerator. C fb thus takes the form where G and H are polynomials to be determined by the cost-function minimization, with H assumed to be monic. Inclusion of integral action in the compensator via 1/∇ is concordant with the integrating disturbance term 1/∇ in the plant, Equation (2). In line with the Internal Model Principle (Francis & Wonham, 1976), this ensures zero steady-state tracking error for constant command and disturbance signals: expressed in the frequency domain, the factor 1 − z −1 in the denominator gives the compensator infinite steady-state gain, since lim ω→0 |C fb (e −jω )| = ∞ in this case.
The requirement that the compensator be strictly causal (and not merely causal), achieved via the delay term q −1 in the numerator, makes the compensator gain roll off to zero at high frequency, i.e. it has lowpass character. This in turn, as desired, makes the feedback loop insensitive to HRV disturbances at frequencies above the required loop bandwidth. This property can be understood by frequency-domain consideration of C fb (z −1 ): when expressed in terms of z rather than z −1 , the strictly-causal condition gives a compensator transfer function which is strictly proper in z, and not merely proper. This constraint achieves low-pass behaviour because lim ω→∞ |C fb (e −jω )| = 0 results. As noted below (Section 3.3), the input sensitivity function U o , Equation (9), is also low pass in this case.

Frequency-domain analysis
As an aid to controller design and empirical performance interpretation, two closed-loop transfer functions are employed. The input sensitivity function, denoted U o , principally describes the response between the disturbance d and the control signal u (it also defines the response of u to measurement noise ψ n and to the filtered reference r ). Shaping of |U o | therefore governs the effect of the broad-spectrum heart rate variability disturbance in d on the treadmill speed command u. The design goal with respect to u is to ensure that this control signal is not overly excited in frequency ranges which are readily perceptible to the runner.
The sensitivity function, S o , is the transfer function from the disturbance d to the controlled variable y. Thus, |S o | determines the degree to which HRV disturbances are suppressed by the feedback and, therefore, how accurate the HR regulation performance will be.
U o and S o are defined as follows: These expressions anticipate the definition of the closedloop characteristic equation for the optimal compensator, which is given later, in Equation (14), as A∇H + q −1 BG = D c D f . It can be seen that, by virtue of the pure delay term q −1 in the strictly-causal compensator, the input sensitivity function U o in Equation (9) is also strictly causal. Consequently, by a similar argument to that employed in Section 3.2, U o also has low-pass behaviour.

Cost function
The cost function defined below, Equations (12) and (13), penalises weighted sample-to-sample changes in the control signal, as defined by the intermediate signal u (see Figure 1) which is obtained using the inverse of the integration operator (i.e. differentiation), because Compensator polynomials G and H are those which give the minimum value of a cost function J, defined in the time and frequency domains as or, in terms of the signal u, as In the above, ρ is a tuneable control weighting, E is the expectation operator, and the terms φ y etc. represent signal spectral densities. ∇ * is defined as the conjugate polynomial ∇ * (z −1 ) = ∇(z).

Optimal control solution: low-pass behaviour
The optimal control solution presented here is a specialization of the general theory derived in (Hunt, 1989) to the plant model Equation (2) and strictly-causal compensator structure Equation (8). Moreover, an explicit solution to the design equation (14) is derived for the case of a first-order plant.
In general, the polynomials G and H that minimize the cost function J satisfy the linear polynomial Diophantine equation which is the characteristic equation of the feedback loop. D c and D f are strictly stable monic polynomials obtained from the spectral factorisations Under the assumption of a first-order plant with a degree of A of n a = 1 (i.e. Equation (5)) the degrees of D c and D f are seen to be n dc = 2 and n df = 2, respectively. Thus, the characteristic polynomial in (14), which we denote as = D c D f , has degree n φ = 4 and must take the form Consideration of Equation (14) then shows that a unique, minimal-degree solution for G and H must have polynomial degrees n g = 1 and n h = 2, whence the optimal strictly-causal compensator structure for a first-order plant is and the Diophantine equation (14) takes the specific structure An explicit solution is obtained by multiplying out terms on the left and equating coefficients of like powers on the left and right sides, viz.
The optimal compensator is then calculated in four steps: (1) perform spectral factorisations Equations (15) and (16) to obtain strictly stable polynomials D c and D f , (2) calculate as the product D c D f and identify the coefficients φ 1 to φ 4 from the form of Equation (17), (3) calculate compensator coefficients h 1 , h 2 , g 0 and g 1 from Equations (20)-(23), (4) implement feedback compensator as Equation (18).
For both compensators tested in the sequel, a sample interval of T s = 5 s was used, the nominal plant was taken to be that of Equation (6), and C was chosen to be C(q −1 ) = 1. For the low-pass compensator, intensities σ d = 1 and σ n = 435 were selected. The normalized value σ d = 1 was chosen, without loss of generality, because it is merely the relative values of σ d and σ n that affect the result of the spectral factorization Equation (16). The value of σ n was tuned to give a similar degree of rolloff of the input sensitivity function as employed in a previous study with a low-pass compensator, Hunt & Fankhauser (2016): there, as here with σ n = 435, the magnitude |U o | was −35 dB at frequency 0.01 Hz (see Equation (9) and Figure 2(a)). The control weighting was set to ρ = 67,000, which is the same value employed in a previous optimal control study with a non-low-pass compensator, denoted there as C 2 (Hunt & Liu, 2017). With these specifications, the compensator was calculated by following the steps summarized above to be where the suffix 'LQLP' denotes a Linear model, Quadratic cost function, and a compensator with Low Pass characteristics. The input sensitivity and sensitivity function magnitudes for this compensator, |U o | and |S o |, obtained using Equations (9) and (10), are plotted in Figure 2.

Optimal control solution: non-low-pass behaviour
To facilitate a comparative evaluation of the new low-pass and the existing non-low-pass compensators, the latter was designed using the simplified nominal plant model and corresponding optimal control solution derived and documented previously, Hunt and Liu (2017). Briefly, the causal non-low-pass compensator structure was (cf. Equation (8)) the plant model Equation (2) had C(q −1 ) = 1, and no measurement noise ψ n , Equation (4), was included in the model, i.e. σ n = 0. With σ d = 1, these restrictions of the problem formulation result in D f = 1 (Equation (16)) and in the simplified design equation (cf. Equation (14)) which, for a first-order plant model (5), gives the solution with Here, φ 1 and φ 2 are obtained from the spectral factorization (15), since in this case D f = 1 and therefore has the second-degree structure (q Using the plant model Equation (6) and, in common with the low-pass compensator, taking ρ = 67,000, C fb was calculated to be (this is compensator C 2 in Hunt and Liu (2017)). The shorter suffix 'LQ' is used to distinguish between this nonlow-pass compensator and the low-pass compensator C fb,LQLP , Equation (24). The input sensitivity and sensitivity function magnitudes for this compensator, |U o | and |S o |, obtained using modifications to Eqns. (9) and (10) (in this case, U o = AG/D c and S o = A∇H/D c ), are also plotted in Figure 2.

Experimental methods
In order to compare performance outcomes for the optimal non-low-pass (LQ) and low-pass (LQLP) compensators, 20 healthy male subjects were recruited for participation in a formal empirical test series; subject characteristics are summarized in Table 1.
Each subject was tested with both controllers, with the two tests for each subject taking place on separate days. The study design was counterbalanced by changing the order of presentation of the two controllers, i.e. LQ then LQLP vs. LQLP then LQ, for consecutive subjects. In total, therefore, there were 40 individual tests: 10 subjects were tested in the order LQ then LQLP, and 10 subjects were tested in the order LQLP followed by LQ.
Each controller test had a total duration of 45 min (2700 s). The target heart rate HR * (reference signal r) was set to a constant level throughout each test; it was individually determined for each subject to be at the boundary of moderate and vigorous exercise intensities, calculated according to reference Pescatello et al. (2014) as 76.5% of age-predicted maximal heart rate, HR max . Thus, HR * = 0.765 × HR max , where HR max was taken to be HR max = (220 − age) bpm, Shargal et al. (2015).
Real-time control experiments were performed using a computer-controlled treadmill (type Venus, h/p/cosmos Sports and Medical GmbH, Nussdorf-Traunstein, Germany) and a chest belt for heart-rate measurement (model T34,Polar Electro Oy,Kempele,Finland). Further information regarding the equipment employed is given in detail elsewhere, Hunt and Fankhauser (2016).
All procedures performed in this study in regard to the human participants were in accordance with the ethical standards of the local research committee: the study protocol was reviewed and approved by the Ethics Committee of the Swiss Canton of Bern. Informed consent was obtained from all individual participants. Notes: n = 20, all male; SD, standard deviation; BMI, body mass index (mass/height 2 ).

Primary outcome measures and statistical data analysis
The accuracy of output regulation was quantified using the RMS tracking error, calculated over N discrete sample instants from t 0 to t 1 , with N = t 1 − t 0 + 1. In general, y nom (t) is the nominal output obtained by simulating the closed-loop transfer function from reference r to controlled variable y ( Figure 1). Here, in the case of constant-reference regulation, y nom was set to the target heart rate value HR * which was constant over the evaluation interval, i.e. y nom (t) = HR * . The intensity of the control signal was quantified as the average power of sample-to-sample changes in the control signal u using This quantity is referred to in the following as 'average control signal power'. As noted previously, Hunt and Liu (2017), RMSE provides a sample estimate of the expectation E{y 2 (t)} in the cost function, Equation (12), while P ∇u is a sample estimate of E{u 2 (t)}, the second term in the cost function.
An overall performance evaluation was carried out, whereby the primary outcomes RMSE and P ∇u were averaged across all subjects and statistically analysed for differences in mean values between the two controllers tested, LQ and LQLP. For this analysis, an overall evaluation interval from 500 s to 2700 s (duration 2200 s, or 36 min 40 s) was used. The first 500 s were excluded to Table 2. Primary outcome measures for LQ vs. LQLP controllers and p-values for comparison of means (see also Figure 3). 12.2 ± 6.0 1.0 ± 0.6 11.2 (8.5, 13.9) 5.1 × 10 −8 Notes: n = 20; LQ, optimal controller with non-low-pass behaviour; LQLP, optimal controller with low-pass behaviour; MD, mean difference of LQ-LQLP; SD, standard deviation; 95% CI, 95% confidence interval for the mean difference; p-values, paired two-sided t-tests; RMSE, root-mean-square tracking error; P ∇u ; average control signal power; bpm, beats per minute.

Figure 3.
Primary outcomes: data samples for RMSE and P ∇u for all 20 subjects for the controllers LQ and LQLP (see also Table 2). The green lines link the sample pairs from each subject. The red horizontal bars depict mean values (given numerically in Table 2). D = LQ-LQLP is the difference between the paired samples. MD is the mean difference (red horizontal bar), with its 95% confidence interval (CI) in blue. The value 0 is outwith the respective 95% CI in both cases, indicating a significant difference between the means: this conforms with p < 0.05 for these variables (Table 2); the black horizontal bars also mark a significant difference between the means, where * * * * ⇔ p < 0.0001 (Table 2)  In the upper part of each figure, HR * is the heart rate reference (signal r), HR nom is the target nominal heart rate response (simulated, y nom ), and HR is the measured heart rate (controlled variable y). In the lower graphs, u is the control signal, i.e. the treadmill speed command. The thick red horizontal bars mark the overall outcome evaluation interval 500 ≤ t ≤ 2700 s. RMSE: root-mean-square tracking error, Equation (  allow initialization transients to settle. The strength of correlation between RMSE and P ∇u was analysed for both controllers using the sample Pearson correlation coefficient, denoted r. To analyse the evolution of heart rate variability over time, RMSE and P ∇u were also computed across the overall evaluation interval using a moving window of duration 730 s (12 min 10 s); the window duration of 730 s was chosen to be as close as possible to one third of the overall evaluation-interval duration (2200 s), bearing in mind the sample interval constraint of T s = 5 s.
Furthermore, mean values of RMSE and P ∇u for the first and last 730-s windows, obtained using both controllers, LQ and LQLP, were statistically analysed for differences between these two time periods. The first 730-s window, being the first third of the overall evaluation interval, is referred to in the sequel as 'Phase 1', while the last 730-s window is referred to as 'Phase 3' (it is the third third of the overall evaluation interval).
Paired two-sided t-tests were used for all hypothesis testing (normality of the data samples was confirmed in all cases using the Kolmogorov-Smirnov test with Lilliefors correction), the null hypothesis in all tests being that no differences in the mean values existed. For all comparisons of means, mean differences (MD) and their 95% confidence intervals (CI) were computed and are presented. The significance level for hypothesis testing was set to 5% (α = 0.05). Statistical analysis was carried out using the Matlab Statistics and Machine Learning Toolbox (The Mathworks Inc., USA).

Results
There were large and significant differences in both RMSE and P ∇u between the LQ and LQLP controllers (see Table 2 and Figure 3). LQ gave more accurate regulation performance (lower RMSE) but at the cost of a higher control signal intensity (higher P ∇u ): for the LQ vs. LQLP comparison, mean RMSE/(bpm) was 1.74 vs. 2.46 (p = 6.5 × 10 −6 ) and mean P ∇u /(10 −4 m 2 /s 2 ) was 12.2 vs. 1.0 (p = 5.1 × 10 −8 ).
Exemplary test data are presented in Figure 4 for the LQ and LQLP controllers: the figure shows the results having the lowest, median and highest values for RMS tracking error amongst all subjects with each controller.
Strong to very strong correlation between RMSE and P ∇u was observed ( Figure 5): for the LQ controller, there was a strong, positive linear correlation with r = 0.86 (p = 1.1 × 10 −6 ); for LQLP, there was a very strong, positive linear correlation with r = 0.95 (p = 6.6 × 10 −11 ).
The values of RMSE and P ∇u for the LQ and LQLP controllers, calculated on a moving window of duration 730 s and averaged across all subjects, showed a consistent downward trend over time ( Figure 6). Formal statistical comparison of the first and last 730-s windows was conducted (see Table 3 and Figure 7). This analysis showed that, for LQLP, differences in the outcomes were significant: mean RMSE/(bpm) was 2.66 vs. 2.28 (p = 0.031) and mean P ∇u /(10 −4 m 2 /s 2 ) was 1.2 vs. 0.9 (p = 0.030), first vs. last window. For LQ, there was modest evidence of a difference in average control signal power -P ∇u /(10 −4 m 2 /s 2 ) was 13.1 vs. 11.3 (p = 0.082) -but the difference in RMS tracking error was not significant: RMSE/(bpm) was 1.77 vs. 1.68 (p = 0.43).

Discussion
The aims of this work were to derive the stochastic optimal control solution giving low-pass characteristics in the compensator and in the input sensitivity function, to empirically compare the new low-pass solution (LQLP) and an existing non-low-pass controller (LQ), and to investigate changes in HRV over time.
The clear and significant differences which were observed between the LQ and LQLP controllers are in line with theoretical expectations: generally, there is a trade-off between the accuracy of regulation and the cost in terms of control signal intensity; that is, RMSE can be driven down by stronger suppression of plant disturbances, but this requires more activity in the control signal.
Specifically, the LQ controller had significantly lower mean RMSE (1.74 bpm, cf. 2.46 for LQLP), but average control signal power P ∇u was an order of magnitude higher  Table 3, and also depicted in Figure 7). The red dashed lines show overall mean values across the whole evaluation interval (Table 2) (12.2 vs. 1.0 × 10 −4 m 2 /s 2 ). While the improvement in tracking accuracy (i.e. lower RMSE) with LQ was statistically significant, it is likely that, from an applications perspective, the much lower intensity of changes in the treadmill speed command with LQLP would be preferable to most runners and that the higher level of RMSE seen with this controller would be regarded as acceptable; a mean RMSE of 2.46 still represents, in both absolute and relative terms, highly accurate tracking performance.
The observed differences in outcome measures are also consistent with the frequency domain analysis of the input sensitivity functions U o (Equation (9) and Figure 2(a)) and the sensitivity functions S o (Equation (10) and Figure 2(b)) for the two controllers. Since U o is the transfer function between disturbance d and control signal u, the low-pass behaviour of U o in the case of LQLP (Figure 2(a)) results in substantially lower excitation of treadmill speed by the HRV disturbance, whence the lower mean value of average control signal power P ∇u . S o , on the other hand, is the transfer function from disturbance d to controlled variable y. For the LQ controller, the bandwidth of S o is seen to be substantially higher than for LQLP (Figure 2(b)), meaning that HRV disturbances are attenuated over a wider frequency band, thus giving lower mean RMSE. Additionally, the nominal S o for LQLP has a slight amount of peaking (Figure 2(b)), which would further contribute to higher RMSE with this controller.
The LQ and LQLP controllers thus exemplify the trade-off between tracking accuracy and control signal intensity. This gives flexibility in the process of controller tuning, depending on whether, in a given application setting, highly accurate HR tracking is the priority or maintenance of a low-intensity speed command is of greater importance.
It can be shown that the LQLP formulation provides sufficient flexibility to achieve a desirable performance Notes: n = 20; LQ, optimal controller with non-low-pass behaviour; LQLP, optimal controller with low-pass behaviour; Phase 1, first 730-s window; Phase 3, last 730-s window; MD, mean difference of Phase 1-Phase 3; SD, standard deviation; 95% CI, 95% confidence interval for the mean difference; p-values, paired two-sided t-tests; RMSE, root-mean-square tracking error; P ∇u , average control signal power; bpm, beats per minute.   Table 3, and also depicted in Figure 6). D = R1-R3, or P1-P3, is the difference between the paired samples. MD is the mean difference (red horizontal bar), with its 95% confidence interval (CI) in blue. When the value 0 is outwith the respective 95% CI, this indicates a significant difference between the means: this conforms with p < 0.05 for these variables (Table 3); the black horizontal bars also mark a significant difference between the means, where * ⇔ p < 0.05 (Table 3)  It is seen that, as σ n → 0, LQLP → LQ. The value σ n = 435 was employed in the test series described in the text. LQ: optimal controller with non-lowpass input-sensitivity behaviour. LQLP: optimal controller with low-pass input-sensitivity behaviour.
trade-off across the range covered by the specific LQLP and LQ parameterisations tested above. This can be implemented in practice by regarding the stochastic measurement noise intensity σ n as a controller tuning parameter and noting that, as σ n → 0, the LQLP properties tend to those of the LQ controller: when σ n → 0, Equation (16) shows that, under the additional assumptions C = 1 and σ d = 1, D f → 1. Thus, from Equation (14), the closed-loop poles are completely specified by D c alone; furthermore, this solution corresponds to the nonlow-pass LQ controller solution given in Section 3.6. To further illustrate that the LQLP properties converge towards those of LQ as σ n → 0, the input sensitivity functions U o for LQLP with a range of values of σ n are plotted in Figure 8. The strong correlation observed between RMSE and average control signal power P ∇u , with correlation coefficient r around 0.9 ( Figure 5), provides evidence that the magnitude of HRV is an intrinsic physiological property of each subject: presumably, subjects with a higher level of HRV have consistently higher RMSE and P ∇u , and vice versa, because the HRV disturbance is the principal driver, respectively, of both the controlled variable y (via S o ) and the control signal u (via U o ).
Analysis of changes in HRV over time, via both trends in the 730-s moving windows for RMSE and P ∇u ( Figure 6) and formal statistical analysis of the time-related endpoints (Table 3 and Figure 7), gives a strong degree of evidence that the magnitude of HRV decreases over the 45-min duration of exercise at the moderate-to-vigorous intensity level. This finding is consistent with a previous report of substantial and significant decreases in both RMSE and P ∇u during an extended period of running at similar intensity (Hunt & Fankhauser, 2016).
The evolution of RMSE with the LQ controller merits further discussion since, although the trend is clearly downwards (Figure 6(a)), the difference between the endpoint values was found to be non-significant (p = 0.43; Table 3 and Figure 7(a)). This might be explained by the fact that RMSE was already very low in Phase 1 of the exercise (mean RMSE 1.77 bpm, Table 3) as a consequence of the relatively dynamic LQ controller parameterization. This value of RMSE may be close to some form of empirical lower bound, thus making further decreases difficult to observe.

Conclusions
The theoretical extension of the stochastic optimal control problem formulation allows the input sensitivity function to be designed with low-pass characteristics and with a bandwidth that can be influenced directly using noise intensity σ n as a tuning parameter, in addition to control weighting ρ. This development is important because it allows the trade-off between tracking accuracy and control signal intensity to be addressed directly.
The experimental results demonstrated accurate and robust control of HR across the 20 subjects tested, despite a simple pre-existing nominal model having been used for LTI controller design.
The key design issue in this application is that of suppression of disturbances arising from HRV, whereby changes in treadmill speed must remain acceptable to the runner. The results of this work show that this can be addressed by appropriate shaping of the sensitivity and input sensitivity functions using the tuning parameters.
The results of this study point to important areas that can be further investigated in future research. The observed pattern of decrease in HRV magnitude is presumably due to ongoing autonomic regulation of key physiological variables including body temperature, stroke volume and heart rate; further elucidation of these physiological aspects would be very helpful in guiding controller design methods. Separate experimental studies should be designed which are focused on these specific questions. With regard to feedback control strategies, since the magnitude of the HRV disturbance was seen to decline over time, particular attention should be given to controller design and assessment of controller performance during the initial stages of exercise sessions.

Disclosure statement
The authors declare that they have no conflict of interest.