Self-tuning state-feedback control of a rotary pendulum system using adjustable degree-of-stability design

This paper formulates an original hierarchical self-tuning control procedure to enhance the disturbance-rejection capability of under-actuated rotary pendulum systems against exogenous disturbances. Conventional state-feedback controllers generally trade off robustness against control effort in a closed-loop system. To combine these characteristics in a single framework, this paper develops and augments the baseline Linear-Quadratic-Regulator (LQR) with a novel “adjustable degree-of-stability design” module. The augmentation relocates the system's closed-loop poles in the stable (left-half) region of the complex plane by dynamically adjusting a single hyper-parameter that modifies the constituents of the LQR's performance index. The hyper-parameter is adaptively modulated online via a pre-calibrated hyperbolic-secant function that is driven by the state-error variables. The performance of the proposed adaptive controller is benchmarked against fixed-gain controllers via credible hardware experiments conducted on the standard QNET Rotary Pendulum setup. The experimental outcomes indicate that the proposed controller significantly enhances the system's robustness against exogenous disturbances and maintains its stability over a broad range of operating conditions, without inducing excessive peak servo requirements.


Introduction
The formulation of robust control strategies to enhance the performance and resilience of the under-actuated rotary-inverted-pendulum (RIP) systems has posed a great challenge to scientists [1]. The nonlinear characteristics and open-loop instability of such multivariable under-actuated systems require an agile control effort to prevent their performance from deteriorating under the influence of environmental indeterminacies and parametric uncertainties [2]. Extensive research has been done to synthesize robust control strategies for mechanisms belonging to the aforementioned class of under-actuated systems.
Despite its reliability and simplicity, the flexibility of a fixed-gain Proportional-Integral-Derivative controller is limited by the linear weighted sum of the input error-variables [3]. The fuzzy-logic control scheme requires a large number of rules that are contrived in accordance with the expert's knowledge [4]. The imprecise empirical construction of the fuzzy approximation method inevitably degrades the control signal quality under parametric variations [5]. The neural controllers require large sets of training data to devise an accurate inverse model [6]. Despite their robustness, the sliding-mode controllers render highly discontinuous control behaviour which injects chattering into the response [7]. The ubiquitous Linear-Quadratic-Regulator (LQR) renders optimal control decisions by minimizing a quadratic cost-function that captures the variations in the states and the control-input profile [8]. However, its performance degrades under the influence of identification errors and modelling uncertainties [9]. Moreover, the selection of state and control weighting-factors for the LQR's cost-function is an ill-posed problem [10]. The numerical ill-conditioning problem associated with the LQR is generally solved by specifying a predetermined "Degree-of-Stability" (DoS) in its design [11]. Therein, the closed-loop poles of the system are allocated to the left of the line s = −β in the complex s-plane, where s is the Laplace operator and β > 0 is a preset hyper-parameter that defines the DoS [12]. The DoS design of the LQR improves the controller's phase-margin, which enhances the system's damping against nonlinear disturbances and oscillations [13].
The self-tuning adaptive controllers provide a pragmatic approach to strengthen the closed-loop system's immunity against bounded exogenous disturbances, under every operating condition, by dynamically reconfiguring the controller's behaviour [14,15].
They adopt well-postulated state-driven analytical (or logical) rules to automatically modify the controller's operational parameters, which renders a robust control yield [16]. A plethora of adaptive state-feedback control schemes for multivariable under-actuated systems has been proposed in the literature [17]. The model-reference adaptive systems track the output of a reference model to reconfigure the behaviour of the operational controller [18]. However, identifying the adaptation-rates for the Lyapunov gain-adjustment law is a cumbersome task [19]. The gain-scheduling technique dynamically modifies the controller parameters by commuting between a predefined set of distinct linear controllers, each designed to address a specific operating condition, which is usually selected via state-error-dependent look-up table(s) [20]. Postulating distinct linear controllers and guaranteeing their asymptotic stability, for every operating condition, is a laborious task that often leads to the degradation of the control quality [21]. The State-Dependent-Riccati-Equation-based control schemes can generate a robust control effort to regulate the performance of inherently unstable and nonlinear systems [22]. However, the accurate definition of state-dependent-coefficient matrices to fully realize the nonlinear characteristics of the system is difficult due to the system's complex dynamics [23].
The main contribution of this article is the methodical formulation of a hierarchical adaptive state-feedback control strategy for a class of under-actuated systems. First of all, the baseline Linear-Quadratic-Regulator (LQR) is refurbished by using the prescribed DoS design strategy. The DoS design is digitally realized by retrofitting the LQR's cost-function with a time-varying exponential function having a growth rate of β. To further strengthen the robustness of the controller, the DoS-based LQR design is augmented with an online self-tuning strategy that adaptively modulates the growth factor β by using a pre-calibrated Hyperbolic-Secant-Function (HSF) that depends on the state-error feedback. This arrangement modifies the LQR's state-feedback gains after every sampling interval. The HSF waveform is calibrated such that β is enlarged when the system is in a disturbed state, and vice versa. To ensure an asymptotically-stable control behaviour, the HSF bounds are preset such that the adjusted value of β always remains positive. Credible hardware experiments are conducted on the standard QNET Rotary Pendulum board to verify the efficacy of the proposed self-tuning controller. The experimental results indicate that the proposed self-tuning controller manifests rapid transits in the response with strong damping against fluctuations and reasonable control activity, while maintaining the system's stability throughout the operating regime.
The idea of using the proposed adjustable DoS mechanism to self-tune the LQR's state-feedback gains, with the aim to enhance the robustness of under-actuated mechatronic systems against bounded exogenous disturbances, has not been attempted in the available open literature. Hence, this paper mainly focuses on the realization and validation of the aforementioned idea.
The remaining paper is organized as follows. The mathematical model of the pendulum system is presented in Section 2. The baseline LQR is synthesized in Section 3. The augmentation of LQR with DoS design is presented in Section 4. The proposed self-tuning control law is formulated in Section 5. The experimental evaluation of the proposed controllers is presented in Section 6. The article is concluded in Section 7.

Mathematical model of system
The Rotary-Inverted-Pendulum (RIP) is a nonlinear, open-loop unstable, and under-actuated system [24]. Its inherent instability makes it an ideal mechatronic platform to analyse and validate the performance of the proposed control scheme [25]. The hardware schematic of the RIP system is shown in Figure 1. The actuating torque is applied via a permanent-magnet DC geared servomotor. The angular rotation of the motor displaces the pendulum's arm, coupled to its shaft, which energizes the pendulum's rod to swing up and balance itself vertically. The angular displacements of the arm and the rod are denoted as α and θ, respectively. The state-space model of a linear system is given by Eq. 1.

ẋ(t) = Ax(t) + Bu(t),  y(t) = Cx(t) + Du(t)    (1)

where x is the state-vector, y is the output-vector, u is the control-input signal, A is the state matrix, B is the input matrix, C is the output matrix, and D is the feedforward matrix. The state-vector and the control input of the RIP system are identified in Eq. 2.

x(t) = [α(t) θ(t) α̇(t) θ̇(t)]^T,  u(t) = V_m(t)    (2)

where V_m is the voltage signal applied to control the DC motor. The nominal state-space model of the RIP system is defined in Eq. 3 [26], and the corresponding modelling parameters for the QNET RIP system are identified in Table 1 [26]. The open-loop poles of the QNET RIP system are evaluated as −6.04, −0.25, 0.00, +5.99, which clearly indicates that the system is open-loop unstable.
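The open-loop instability check can be reproduced numerically as sketched below. The state matrix here is a hypothetical diagonal stand-in (the actual A of the QNET RIP follows from Eq. 3 and the parameters of Table 1); it serves only to illustrate the eigenvalue test.

```python
import numpy as np

# Hypothetical state matrix used purely for illustration; the actual QNET RIP
# matrix follows from Eq. 3 and the modelling parameters in Table 1.
A = np.diag([-6.04, -0.25, 0.0, 5.99])

# The open-loop poles are the eigenvalues of the state matrix A.
poles = np.linalg.eigvals(A)
unstable = bool((poles.real > 0).any())  # any pole in the right-half plane?
print(np.sort(poles.real), unstable)
```

A single eigenvalue with a positive real part (here +5.99) suffices to declare the open-loop system unstable.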

Baseline control scheme
The LQR belongs to the class of state-feedback controllers that are widely favoured for optimal position-regulation and reference-tracking control of multivariable electro-mechanical systems [10]. The LQR yields optimal control decisions by minimizing a quadratic cost-function, given by Eq. 4, that captures the state variations and the control input associated with the dynamical system [8].

J_lq = ∫_0^∞ (x(t)^T Q x(t) + u(t)^T R u(t)) dt    (4)

where Q ∈ R^4×4 and R ∈ R are the state- and control-penalty matrices, respectively. These matrices are selected such that Q is a positive semi-definite matrix and R is a positive definite matrix. The control-input cost is preset to unity to economize the control activity. The state-weighting factors are tuned by iteratively minimizing the quadratic cost-function, given in Eq. 5, which captures the real-time state-error and control-input variations to minimize the system's position-regulation error and control energy expenditure.

J_e = ∫_0^T (ε_α(t)^2 + ε_θ(t)^2 + u(t)^2) dt    (5)
where ε_α and ε_θ represent the error between the reference and actual angular positions of the pendulum's arm and rod, respectively. The cost-function J_e gives equal weight to the error-minimization and control-minimization criteria. The LQR with the cost-function J_lq, expressed in Eq. 4, provides optimal state-feedback gains with the lowest cost of J_lq. These optimal gains are acquired for a specific set of penalty matrices, Q and R. However, this selection does not necessarily imply a good time-domain performance with respect to J_lq [27]. Hence, in this research, the cost-function J_e is used to tune the state-weighting factors. The iterative tuning procedure aids in fine-tuning the weighting factors [28]. A large state-weighting factor minimizes the state-errors but also increases the actuator's servo requirements, and vice versa. Generally, a compromise is made between the state-error behaviour and the control activity. In this work, the selection of the state-weighting factors is limited within [0, 100]. The tuning process is initiated with Q = diag(1, 1, 1, 1). The iterative algorithm conducts an exhaustive search in the direction of the descending gradient of J_e. In every iteration, the pendulum is allowed to balance for 5.0 s, and the corresponding cost J_e is evaluated and recorded. The iterative search is terminated when the minimum cost is achieved. The tuned Q and R matrices corresponding to this minimum cost are given in Eq. 6.
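The iterative weight-tuning loop can be sketched as follows. The plant, candidate grid, and cost discretization are illustrative stand-ins (a toy double integrator rather than the RIP matrices of Eq. 3, with a simulated rather than hardware-measured cost), assuming a stage cost of the form described above.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy second-order plant standing in for the RIP model (illustrative only).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])  # control penalty preset to unity

def J_e(q, x0=np.array([1.0, 0.0]), dt=0.01, T=5.0):
    """Discretized quadratic cost of the simulated closed-loop response."""
    Q = np.diag([q, q])
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)          # K = R^-1 B^T P
    Acl = A - B @ K
    x, cost = x0.copy(), 0.0
    for _ in range(int(T / dt)):             # forward-Euler simulation
        u = float(-K @ x)
        cost += (x[0] ** 2 + u ** 2) * dt    # state-error plus control cost
        x = x + Acl @ x * dt
    return cost

# Exhaustive search over candidate state-weighting factors within [0, 100].
candidates = [1.0, 5.0, 10.0, 50.0, 100.0]
best_q = min(candidates, key=J_e)
print(best_q, J_e(best_q))
```

On the real setup, each cost evaluation corresponds to a 5.0 s balancing trial instead of the simulation loop above.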
The selected Q and R matrices are used to solve the Algebraic-Riccati-Equation (ARE) shown in Eq. 7 [8].

A^T P + PA − PBR^−1B^T P + Q = 0    (7)
The solution of the ARE is the symmetric positive-definite matrix P ∈ R^4×4. The matrix P is used to evaluate the fixed state-feedback gain vector, K, offline via the gain expression shown in Eq. 8.

K = R^−1B^T P    (8)
Using this expression, the state-feedback gains are evaluated as K = [−6.21 130.56 −4.22 17.83]. The linear control law is expressed in Eq. 9.

u(t) = −Kx(t)    (9)
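The offline gain computation of Eqs. 7 and 8 can be sketched as follows, using an illustrative second-order plant in place of the RIP matrices; the closed-loop poles of A − BK are checked to confirm asymptotic stability.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative plant, not Eq. 3
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                            # positive semi-definite state penalty
R = np.array([[1.0]])                    # positive definite control penalty

P = solve_continuous_are(A, B, Q, R)     # ARE solution (Eq. 7 analogue)
K = np.linalg.solve(R, B.T @ P)          # K = R^-1 B^T P (Eq. 8 analogue)

# Closed-loop poles of A - BK must all lie in the left-half plane.
cl_poles = np.linalg.eigvals(A - B @ K)
print(K, np.sort(cl_poles.real))
```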
This control law is also retrofitted with auxiliary state-variables representing the integral-of-error in α and θ. The integral controllers effectively attenuate the steady-state fluctuations and improve the system's reference-tracking accuracy. The integral control law is given by Eq. 10.

u(t) = −Kx(t) − K_i ρ(t)    (10)
where ρ(t) is the vector containing the error-integral variables associated with α and θ. The integral gains are also tuned by iteratively minimizing the cost-function J_e. The search is initiated from K_i = [−1 −1] and continued in the direction of the descending gradient of J_e. The search is terminated when the minimum cost is achieved. In this research, the integral gain vector that yields the minimum cost is K_i = [−2.05 −7.45 × 10^−6]. The final expression of the Fixed-gain Optimal Controller (FOC) is given by Eq. 11.
If the system is controllable, the solution of ARE yields an asymptotically-convergent control behaviour under every operating condition.

Prescribed DoS-based controller
The fixed-gain LQR is enhanced by retrofitting it with the prescribed Degree-of-Stability (DoS) design technique [29]. The LQR's cost-function, J_lq, is augmented with an auxiliary tool that forcibly relocates all the closed-loop poles of the system to the left of the vertical line s = −β in the s-plane, where β is a prefixed positive hyper-parameter [30]. This technique ensures the asymptotic stability of the controller's operation by placing the eigenvalues of the system matrix A in the left half of the complex s-plane. The conventional quadratic cost-function of the LQR is altered by multiplying its integrand with a time-varying exponential factor of the form e^2βt. The revised cost-function is expressed as follows [13].

J*_lq = ∫_0^∞ e^2βt (x(t)^T Q x(t) + u(t)^T R u(t)) dt    (12)

The augmentation of the LQR's cost-function with the exponential factor aids in shifting all the eigenvalues of the system to the left of the line s = −β. The purpose of including the exponential factor in J*_lq and its contribution to manipulating the eigenvalues is systematically explained as follows. The expression of J*_lq in Eq. 12 can be simplified as shown in Eq. 13.

J*_lq = ∫_0^∞ ((e^βt x(t))^T Q (e^βt x(t)) + (e^βt u(t))^T R (e^βt u(t))) dt    (13)
This simplification suggests that the state-vector and the control-input vector can be updated as shown in Eq. 14 [13].

p(t) = e^βt x(t),  m(t) = e^βt u(t)    (14)
where p(t) and m(t) are the modified state and control-input vectors, respectively. With the substitution of the modified vectors, the cost-function is expressed as follows.

J*_lq = ∫_0^∞ (p(t)^T Q p(t) + m(t)^T R m(t)) dt    (15)
By taking the first derivative of both sides of the p(t) expression in Eq. 14 and then substituting the original state equation (shown in Eq. 1) into the resulting expression of ṗ(t), the revised state-equation of the system is expressed via Eq. 16 [13].

ṗ(t) = (A + βI)p(t) + Bm(t)    (16)
where I is an identity matrix of order 4×4. The revised state-equation transforms the state matrix into A + βI. Hence, the expression of the ARE is also revised as shown below [13].

(A + βI)^T P* + P*(A + βI) − P*BR^−1B^T P* + Q = 0    (17)
The revised ARE expression uses the same Q and R matrices as prescribed in Eq. 6. However, the fixed solution of the revised ARE, P*, now also depends on the preset value of β. The (fixed) state-feedback gain vector is re-computed offline via Eq. 18.

K_d = R^−1B^T P*    (18)
Since K_d depends on P*, β indirectly changes the gain vector as well. In brief, the augmentation of the LQR's cost-function with the exponential factor e^2βt ends up transforming the state matrix of the system from A into A + βI, which clearly shows that the variation in the eigenvalues of the modified system depends on the value of β. In practice, the DoS scheme is realized by adding the term βI to the system's nominal state matrix A. The control law dictated by the DoS-based Optimal Controller (DOC) is shown in Eq. 19.

u(t) = −K_d x(t) − K_i ρ(t)    (19)
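The A + βI realization of the DoS design can be verified numerically: solving the ARE for the shifted state matrix guarantees that the closed-loop poles of the original plant lie to the left of s = −β. The plant and the value of β below are illustrative stand-ins.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative plant, not the RIP model
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
beta = 0.8                               # hypothetical prescribed DoS

# Solve the revised ARE with the shifted state matrix A + beta*I (Eq. 17 analogue).
P_star = solve_continuous_are(A + beta * np.eye(2), B, Q, R)
K_d = np.linalg.solve(R, B.T @ P_star)   # Eq. 18 analogue

# All closed-loop poles of the ORIGINAL plant now satisfy Re(s) < -beta.
poles = np.linalg.eigvals(A - B @ K_d)
print(np.sort(poles.real), bool((poles.real < -beta).all()))
```

The guarantee follows because eig(A − BK_d) = eig((A + βI) − BK_d) − β, and the shifted closed loop is asymptotically stable.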
The DOC is also a fixed-gain controller. Choosing a suitable value of β is an ill-posed problem. A very small value of β brings the eigenvalues closer to the origin and inevitably makes the system sluggish. A relatively larger value of β pushes the eigenvalues farther from the origin and increases the response speed. However, in real-time hardware applications, this arrangement also contributes to highly discontinuous control activity and peak servo requirements, which unavoidably perturbs the system's state-response and induces oscillations in it. The proposed scheme configures a single parameter, β, to relocate the closed-loop poles further to the left and improve the stiffness of the control effort against state variations. Alternatively, similar control behaviour can also be achieved by appropriately manipulating the coefficients of the penalty matrices. However, this technique is avoided because it is computationally intensive. Instead of re-tuning all the coefficients of the Q and R matrices, this section employs a simpler yet effective method to achieve the desired behaviour by tuning the single parameter β.

Adjustable DoS-based controller
The fixed value of β lacks the degree-of-freedom to flexibly manipulate the damping strength and response speed of the control procedure under state variations and disturbances. Hence, this section formulates and retrofits the baseline LQR with an online self-adjusting DoS mechanism. The proposed mechanism indirectly modifies the state-feedback gains of the LQR by dynamically adjusting the factor β via a pre-calibrated nonlinear scaling function that depends on the real-time variations in the state-error variables. The position-regulation error in the arm and the rod is diagnosed by computing the weighted sum of all the state-error variables. The online parameter-adjustment function is configured such that the magnitude of β is enlarged smoothly when the weighted sum of the state-error variables increases, and vice versa. This arrangement forcibly moves the eigenvalues farther away from the imaginary axis to deliver a stiff control effort under a disturbed state, and vice versa. The aforementioned rationale enhances the system's response speed and damping against oscillations caused by exogenous disturbances. It penalizes large deviations of the state-errors more heavily than small ones.
The self-adjusting DoS mechanism is formulated by using an expert adaptive system that complies with the aforementioned rules. Several mechanisms have been proposed in the literature. The fuzzy and neural inference schemes are computationally expensive because they require either a large set of empirically defined logical rules or a large set of training data to accurately update the critical controller parameters [4][5][6]. The online iterative-learning algorithms put an excessive recursive computational burden on the embedded processor [31]. The hyperbolic and sigmoidal scaling functions have also been extensively used to self-tune the controller gains for robotic systems [27,32]. These nonlinear functions can be easily implemented in the control software by programming simple algebraic equations that can be solved online, after every sampling interval, to update the desired parameters [33,34]. Hence, in this research, the nonlinear scaling approach is chosen due to its computational economy for online applications. Apart from complying with the aforementioned meta-rules, the proposed nonlinear function is also required to possess the following properties:
• Continuity: This feature allows a smooth transition of the parameter as the operating conditions change.
• Even-symmetry: This feature allows for online adjustment of the parameter on the basis of the magnitudes of the state-error variables only.
• Boundedness: This feature restricts the updated parameter value between pre-defined limits, which prevents the controller from entering the unstable region of operation.
The Hyperbolic-Tangent-Function is avoided due to its odd symmetry [16]. The zero-mean Gaussian function is avoided because it computes the square of the input variable to establish even-symmetry at every sampling instant, which inevitably increases the command-execution time. Hence, in this work, the online reconfiguration of β is done by means of a Hyperbolic-Secant-Function (HSF) that is driven by the real-time variation in the state-error variables [19,33]. The HSF waveform is shown in Figure 2.
The waveform exhibits the desired properties and ensures a smooth transition of β across the entire operating regime [34]. It contributes gentle parameter variations in the equilibrium state and rapid variations in the disturbed state of the system. The HSF used for the dynamic adjustment of β is formulated in Eq. 20.

β(z, t) = β_max − (β_max − β_min) sech(z(t))    (20)
The parameters β_max and β_min are predetermined positive constants that define the upper and lower bounds of the function, respectively. The selection range of these bounds is [0, 1.0], as prescribed for the fixed β in Section 4. The variable z(t) is the linear weighted sum of the state-error variables, as shown in Eq. 21.

z(t) = σε_α(t) + δε̇_α(t) + με_θ(t) + γε̇_θ(t)    (21)
The parameters σ, δ, μ, and γ are predetermined positive scaling coefficients associated with each state-error variable. This computation is beneficial because it informs the HSF about the real-time changes in the system's state-error variables. It unifies the cumulative effect of the four state-error variables into a single variable, which helps the HSF detect disturbances in the time-domain response. The classical state-error variables (ε_α and ε_θ) inform the HSF about the actual deviation in the state-variables at a given instant. The error-derivative variables (ε̇_α and ε̇_θ) help the HSF anticipate the changes in the classical state-errors and amplify the control action if these changes are consistent. This self-learning capability enables the HSF to adapt β with the objective of improving the controller's response speed and strengthening its damping against overshoots and oscillations. The selection range of the scaling coefficients is experimentally identified for the RIP system with the objective of preventing wasteful control activity; that is, avoiding the application of unnecessarily high control effort in the equilibrium state and insufficient control effort in the disturbed state. Consider the following scenario: under large disturbances, the state-responses deviate from the reference. In the deviation phase, the polarities of the errors and the corresponding error-derivatives are the same. Hence, the positive scaling coefficients allow an increment in the magnitude of z(t). Consequently, the HSF amplifies the value of β to deliver a stiff control action. When the responses converge to the reference, the polarities of the errors and the corresponding error-derivatives are opposite. In this phase, the positive scaling coefficients allow a decrement in the magnitude of z(t). Hence, the HSF reduces β to deliver a soft control action.
The bounds of β(z, t) and the scaling coefficients of z(t) are tuned by iteratively minimizing the quadratic cost-function J_e shown in Eq. 5. For each parameter, a random value is chosen from the specified range and the iterative search is conducted in the direction of the descending gradient of J_e until the candidate solution with the minimum cost is obtained. The tuned parameter values are shown in Table 2. The preset limits of the HSF ensure the asymptotic stability of the adaptive controller. The augmentation of the existing DoS-based LQR with the state-error-dependent HSF, expressed in Eq. 20, adaptively manipulates the location of the system's closed-loop poles to the left of the adjustable vertical line s = −β(z, t). The modified cost-function is expressed as follows.

J*_lq = ∫_0^∞ e^2β(z,t)t (x(t)^T Q x(t) + u(t)^T R u(t)) dt    (22)

Consequently, the modified ARE is presented in Eq. 23.

(A + β(z, t)I)^T P(t) + P(t)(A + β(z, t)I) − P(t)BR^−1B^T P(t) + Q = 0    (23)
The modified ARE is solved online, after every sampling interval, to deliver the updated solution P(t). The time-varying state-feedback gain vector is computed via Eq. 24.

K_f(t) = R^−1B^T P(t)    (24)
The control law representing the proposed Self-Tuning Optimal Controller (STOC), equipped with the adjustable DoS design, is given in Eq. 25.

u(t) = −K_f(t)x(t) − K_i ρ(t)    (25)
The integral gains are kept fixed at K_i = [−2.05 −7.45 × 10^−6], as prescribed originally in Section 3. Their adaptive adjustment is not attempted because the "error-integral" variables were originally introduced in the control law to enhance the integral damping. The proposed adaptive scheme manipulates the solution P(t), which leads to the online modification of the K_f(t) vector only. The block diagram of the proposed STOC is shown in Figure 3. The STOC scheme can be applied to other under-actuated systems as well, provided that the nominal model and a pre-calibrated nonlinear scaling function β(·) for the specific system are available a priori.
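The per-sample STOC update (state-error → HSF → β → shifted ARE → K_f(t)) can be sketched as follows, again on an illustrative plant with placeholder HSF bounds rather than the RIP model and the calibrated values of Table 2.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative plant
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])
beta_min, beta_max = 0.1, 0.9            # placeholder HSF bounds

def stoc_gain(z):
    """One sampling-interval update: HSF -> beta -> shifted ARE -> K_f(t)."""
    beta = beta_max - (beta_max - beta_min) / np.cosh(z)     # HSF sketch
    P = solve_continuous_are(A + beta * np.eye(2), B, Q, R)  # modified ARE
    return beta, np.linalg.solve(R, B.T @ P)                 # K_f(t)

for z in (0.0, 2.0):                     # equilibrium vs. disturbed state
    beta, K_f = stoc_gain(z)
    poles = np.linalg.eigvals(A - B @ K_f)
    # The closed-loop poles are always kept to the left of s = -beta(z, t).
    print(round(beta, 3), np.round(np.sort(poles.real), 3))
```

A larger |z| enlarges β, which pushes the closed-loop poles further left and stiffens the control effort, mirroring the adaptation rationale described above.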

Experimental analysis
This section presents the details regarding the hardware setup and the experimental procedure to analyse the performances of the fixed-gain FOC, DOC, and the STOC in the physical environment.

Experimental setup
The hardware setup of QNET RIP that is used to test the proposed controller via real-time experiments is shown in Figure 4 [26]. The real-time variations in θ and α are measured via onboard rotary encoders. The measurements from each encoder are acquired via the NI-ELVIS II data-acquisition board that digitizes the measurements at a sampling rate of 1000 Hz and then serially transmits it to the LabVIEW-based software control application at 9600 bps. The software is operated on a 2.0 GHz personal computer with 6.0 GB RAM. The customized control application is developed using the "Block Diagram" tool in the LabVIEW's virtual instrument file. The front-end of this control application is used as a graphical-user-interface to visualize and record the real-time state and control-input variations of the system. The control software uses the acquired measurements to update the gains and generate the corresponding control signals. The software uses the built-in real-time clock of the digital computer to execute the programme and schedule the successive updates in the controller parameter(s) after every sampling interval. The control signals are serially transmitted to a motor driver circuit, commissioned on the QNET hardware setup, which translates and amplifies them into pulse-width-modulated signals to drive the DC servomotor.

Tests and results
The performance of the STOC is benchmarked against the FOC and the DOC by conducting five unique hardware-in-the-loop experiments on the QNET RIP platform. The pendulum rod is manually lifted and allowed to balance at the beginning of every trial. The experimental results, depicting θ and α, are plotted in degrees to simplify the graphical visualization. The description of the test-cases and the corresponding graphical results are presented as follows.

Position-regulation
This test-case examines the capability of the proposed controllers to balance the pendulum rod vertically while maintaining the arm at its initial position with minimum deviations, under nominal conditions. In this test-case, no external disturbances or modelling errors are applied. The corresponding variations in the responses of θ, α, V_m, K_d and K_f(t) are shown in Figure 5.

Impulsive-disturbance rejection
The immunity of closed-loop system against exogenous disturbances is assessed by applying a pulse signal directly in the control-input (V m ) of the system. The applied pulse has a magnitude of −5.0 V and time-duration of 0.1 s. The disturbance signal is sequentially injected in the control input at t ≈ 6.0, 9.5, 13.0, and 16.0 s mark. The corresponding variations in θ, α, V m , K d and K f (t) are shown in Figure 6.

Step-disturbance attenuation
The controller's resilience against abrupt parametric changes or step variations in the torque is analysed by applying a step disturbance signal of −5.0 V in the control input (V m ) of the system at t ≈ 8.5 s mark. The corresponding variations in θ, α, V m , K d and K f (t) are shown in Figure 7.

Modelling-error compensation
The controller's robustness against identification-errors and modelling-uncertainties is examined by attaching a mass of 0.10 kg beneath the base of pendulum's arm, via a hook, as shown in Figure 4. This arrangement changes the system's state-space model permanently, and hence, perturbs the pendulum's dynamic behaviour. The additional mass is attached to the arm at t ≈ 8.0 s mark. The corresponding variations in θ , α, V m , K d and K f (t) are shown in Figure 8. It is to be noted that the proposed modification is introduced in the pendulum's hardware setup for this test-case only.

Sinusoidal-disturbance suppression
The performance of the controllers under the influence of a lumped disturbance is examined by applying a sinusoidal disturbance signal in the control input (V_m) of the system.

Discussions
The experimental results are examined based on the following Key-Performance-Indicators (KPIs):
• The root-mean-squared value of error (RMSE_x) in the pendulum's angular responses (θ or α).
• The transient-recovery time (t_s) taken by the response to settle within ±2% of the reference.
• The peak value of the DC motor voltage (V_m,p) under transient disturbances.
The aforementioned KPIs are the standard performance measures that are used in the available literature to analyse the reference-tracking accuracy and control energy expenditure of a pendulum system, under the influence of exogenous disturbances [26,33]. These KPIs examine the average position-regulation error, the transient-recovery response, the controller's damping strength against overshoots (and oscillations), as well as the motor's average and peak servo requirements, respectively. All of these measures help in evaluating the controller's time-domain performance and disturbance-rejection capability under different practical scenarios. The numerical values of these KPIs, for each test-case, are recorded in Table 3.
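The first two KPIs can be computed from a recorded response trace as sketched below. The ±2% band is applied as an absolute band when the reference is zero, which is an assumption made here since a percentage of a zero reference is ill-defined.

```python
import numpy as np

def rmse(y, ref):
    """Root-mean-squared error of an angular response about its reference."""
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((y - ref) ** 2)))

def settling_time(t, y, ref, band=0.02):
    """Earliest time after which the response stays within the +/-2% band."""
    tol = band * (abs(ref) if ref else 1.0)  # absolute band at ref = 0 (assumption)
    outside = np.nonzero(np.abs(np.asarray(y) - ref) > tol)[0]
    if outside.size == 0:
        return float(t[0])                   # already settled at the start
    last = outside[-1] + 1
    return float(t[last]) if last < len(t) else float("inf")

# Example: a decaying transient settles once |y| stays inside the band.
t = np.linspace(0.0, 10.0, 1001)
y = 5.0 * np.exp(-t)
print(rmse(y, 0.0), settling_time(t, y, 0.0))
```

The peak servo requirement V_m,p is simply the maximum absolute value of the recorded motor-voltage trace over the transient window.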
The FOC demonstrates a poor time-domain response in every test-case. However, it exerts significantly less control input than the DOC. The DOC shows a reasonable improvement in the position-regulation accuracy of the arm and the rod as compared to the FOC. However, it improves the robustness at the cost of higher control activity than the FOC and the STOC. The qualitative analysis of the experimental results validates the superior robustness and adaptability of the proposed STOC. In Test-A, the RIP exhibits the minimum RMSE in the responses of θ and α under the influence of the STOC. The control energy expended by the STOC is almost 23.3% and 15.3% less than the energy consumed by the FOC and the DOC, respectively. In Test-B, the STOC demonstrates relatively faster transient recovery with strong damping to attenuate the peak magnitude of the overshoots. The control-input expenditure as well as the peak servo requirement of the STOC is significantly lower than those of the FOC and the DOC. In Test-C, the STOC effectively attenuates the influence of the applied step-disturbance by contributing minimum fluctuations and offset from the reference position in the responses of θ and α, respectively; although the step-disturbance permanently displaces the arm from its reference position, the STOC maintains the minimum offset error from α_ref. In Test-D, despite the perturbations, the STOC manages to effectively compensate for the artificially induced modelling-error and exhibits minimal deviations in the angular responses of θ and α. In Test-E, the STOC effectively damps the sinusoidal disturbance signal. Under the influence of the STOC, the arm accurately tracks the reference position with the minimum RMSE. Furthermore, the peak-to-peak magnitude of the oscillations caused by the disturbance is limited to only 10.25 degrees.
The experimental results clearly validate the superior robustness of the STOC. It surpasses the other two controller variants by exhibiting enhanced response speed, damping strength, noise suppression, and modelling-error attenuation capability. The control energy expenditure of the STOC is lower than that of the DOC and the FOC. Furthermore, the STOC framework maintains the asymptotic stability of the closed-loop system in every testing scenario.

Conclusion
This paper presents a hierarchical adaptive optimal control scheme for a class of multivariable under-actuated systems by retrofitting the LQ state-feedback controllers with an original adjustable degree-of-stability design scheme. The proposed approach significantly enhances the controller's position-regulation accuracy and robustness against parametric uncertainties by dynamically adjusting the location of the eigenvalues (and hence, the controller gains). Apart from improving the controller's robustness, the proposed framework also preserves its asymptotic stability under every operating condition. These propositions are justified via credible hardware experiments conducted on the QNET RIP system. The results also indicate that the proposed system consumes comparatively less control-input energy than the other controller variants while attenuating the bounded exogenous disturbances. The proposed scheme is reliable, simple, and does not put any recursive computational burden on the embedded processor. Thus, it can be easily realized using modern-day digital computers. In the future, the performance of the proposed self-tuning control scheme can be investigated by using soft-computing techniques. Meta-heuristic algorithms can be tested to improve the tuning and parameter-selection procedure. Moreover, the efficacy of the proposed adaptive controller can be investigated by applying it to other under-actuated electro-mechanical systems.

Disclosure statement
No potential conflict of interest was reported by the author(s).