Sampled visual feedback pose estimation and regulation based on camera frame rates

ABSTRACT This paper presents visual feedback 3D pose estimation/regulation methodologies in the sampled-data setting induced by camera frame rates. Vision-based estimation/control problems have been studied by a number of research groups. While most works focus on the limitation of measured outputs, they conduct convergence/performance analysis under the assumption that the visual measurements extracted from a camera are continuously available. However, the camera frame rate, including the image processing time, often cannot be neglected compared with other computation times. In view of this fact, this paper newly proposes visual feedback pose estimation/regulation techniques for the situation in which visual measurements are sampled owing to the frame rate. The problem settings are first provided. Then, pose estimation/regulation methods with sampled visual measurements are proposed. The convergence/performance analysis is conducted by fusing a Lyapunov-based approach with an event-triggered control technique. The present analysis scheme provides guidelines for the design of estimation/control gains guaranteeing desired convergence/performance. The effectiveness of the present technique is verified via simulation and an experiment with real hardware.


Introduction
Vision sensors have been widely leveraged for situation recognition since they provide rich 2D information projected from 3D relative states [1]. Thanks to this utility, vision sensors are also utilized in the robotics and control engineering communities to develop vision-based autonomous control methods [2-12]. In vision-based control, one of the main issues is how to deal with 2D visual information to estimate/control the 3D pose (position and orientation) of a target object relative to a camera [5,10-12]. On the other hand, compared with other computation times, the sampling time for extracting the desired information from an image, i.e. the camera frame rate, might not be small enough to be neglected, especially when complex image processing algorithms are employed. Nonetheless, most research studies, including the references above, focus only on the limitation of measured outputs for estimation/control under the premise that visual information is continuously available.
In view of this fact, this paper investigates visual feedback 3D pose estimation/regulation problems in the sampled-data setting. Here, the objectives are (i) to estimate the 3D target object pose relative to the camera, and (ii) to drive the relative pose to a desired one, using only sampled 2D visual information extracted from a monocular camera with a certain frame rate.
To achieve these goals, we introduce a visual feedback 3D pose estimation mechanism, called the visual motion observer, presented in [13,14] and also utilized for some extended estimation/control objectives [15-17]. In the visual motion observer, a passivity property of rigid body motion plays a central role, and the estimation/control performance is analysed through a Lyapunov-based approach. However, similarly to the references above, the observer assumes continuous measurement of the visual information. This assumption leads to convergence/performance analysis that allows arbitrarily large positive estimation/control gains, but high-gain estimation and its control application often do not work in real experiments with sampled measurements. Therefore, this paper extends the visual motion observer to one with sampled visual measurements.
Since the observer input is formed from the visual measurements, the input is also sampled in the case of sampled measurements. To conduct convergence/performance analysis even in the sampled-data setting, this paper employs an event-triggered control technique [18-20]. The merit of event-triggered control is the possibility of reducing the number of input recomputations and transmissions (i.e. events) while guaranteeing desired performance. Specifically, the inter-event time is investigated to guarantee a monotonic decrease of Lyapunov-like functions for convergence. This technique is suited to the visual motion observer because its convergence/performance analysis is also based on non-increasing properties of Lyapunov-like functions. As a result of the event-triggered approach, we can obtain gain limitations for desired convergence/performance based on given camera frame rates.
In summary, the main contribution of this work is to propose a novel vision-based pose observer and its control application in the sampled-data setting induced by camera frame rates. We first provide the problem settings: a 3D relative motion model between a camera and a target object, a visual measurement model, and definitions of the estimation and control objectives. Then, we newly propose a sampled visual motion observer and give convergence analysis for the stationary target object case and tracking analysis for the moving object case. A pose regulation method based on the proposed sampled visual motion observer is next proposed, and the regulation/tracking analysis is also conducted for the stationary/moving object. Here, we provide the relationship between the frame rates and the estimation/control gains to achieve desired convergence/performance. Specifically, we show that the estimation and control errors are ultimately bounded by a function of the camera frame rate, the estimation/control gains, and the target object velocity. This analysis provides us with guidelines for gain settings. The effectiveness of the proposed methods is demonstrated via simulation and an experiment with real hardware.
The conference versions of this paper are reported in [21,22]. While [21] considers only estimation problems, this work also tackles pose regulation problems. Compared with [22], we greatly reduce the conservativeness of the gain condition by modifying the control law structure, and we newly carry out an experimental demonstration with real hardware.

Problem settings
This section formulates the 3D visual feedback pose estimation/regulation problem, consisting of two rigid bodies (a camera robot and a target object robot) and visual measurements from a monocular camera.

Relative rigid body motion
Throughout this paper, we consider the visual feedback system shown in Figure 1. Let the world frame, camera frame, and object frame be Σ_w, Σ_c, and Σ_o, respectively (see Figure 1(a)). Then, the pose of the origin of Σ_c relative to Σ_w is represented by g_wc = (p_wc, e^{ξ̂_wc θ_wc}) ∈ SE(3), where SE(3) := {(p, e^{ξ̂θ}) : p ∈ R^3, e^{ξ̂θ} ∈ SO(3)}. The orientation is represented by the exponential coordinates of the rotation matrix with the unit axis ξ_wc ∈ R^3 (ξ_wc^T ξ_wc = 1) and the angle θ_wc ∈ (−π, π]. The operator ∧ : R^3 → so(3), so(3) := {S ∈ R^{3×3} : S^T = −S}, is defined so that ξ̂ a = ξ × a for the vector cross product "×", and ∨ : so(3) → R^3 is its inverse operator. For ease of representation, ξ_wc θ_wc is simply written as ξθ_wc in this paper. Similarly, the pose of Σ_o relative to Σ_w is represented by g_wo = (p_wo, e^{ξ̂θ_wo}) ∈ SE(3).
Let us next introduce the body velocity of Σ_c relative to Σ_w as V^b_wc := (g_wc^{-1} ġ_wc)^∨ = [v^T ω^T]^T ∈ R^6. Here, v ∈ R^3 and ω ∈ R^3 are the body translational and angular velocities, respectively. The pose g and body velocity V^b can also be written in the homogeneous representation form as

g = [e^{ξ̂θ} p; O_{1×3} 1] ∈ R^{4×4},  V̂^b = [ω̂ v; O_{1×3} 0] ∈ R^{4×4}

(O_{m×n} ∈ R^{m×n} represents the m × n zero matrix). Notice here that another definition of "∧" is used for the 6D vector V^b. Let us now denote the pose and body velocity of Σ_o relative to Σ_c by g_co = (p_co, e^{ξ̂θ_co}) ∈ SE(3) and V^b_co := (g_co^{-1} ġ_co)^∨ ∈ R^6, respectively. Then, g_co = g_wc^{-1} g_wo holds, and thus the following relative rigid body motion is obtained from V̂^b_co = g_co^{-1} ġ_co:

ġ_co = −V̂^b_wc g_co + g_co V̂^b_wo.  (1)
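To make the wedge and vee operators above concrete, the following is a minimal NumPy sketch (the function names `hat3`, `vee3`, and `hat6` are ours, not the paper's):

```python
import numpy as np

def hat3(w):
    """Wedge operator on R^3: hat3(w) @ x equals the cross product w x x."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vee3(W):
    """Vee operator: inverse of hat3 on skew-symmetric matrices."""
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def hat6(V):
    """Wedge operator for a 6D body velocity V = [v; omega]: the 4x4 twist
    matrix appearing in the homogeneous representation of g^{-1} dg/dt."""
    T = np.zeros((4, 4))
    T[:3, :3] = hat3(V[3:])   # angular part
    T[:3, 3] = V[:3]          # translational part
    return T
```

These operators are exactly the two uses of "∧" in the text: `hat3` for 3-vectors and `hat6` for 6D body velocities.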

Visual measurements
2D visual measurements extracted from a monocular camera are introduced as the measured outputs for the 3D pose estimation/regulation laws. Although only the extraction by perspective projection is introduced in this paper, it can be easily extended to the panoramic camera case as in [15]. Consider the target object with m (≥ 4) feature points. The positions of the feature points relative to the object frame Σ_o are represented by p_oi ∈ R^3, i ∈ {1, ..., m}. Then, these positions relative to the camera frame Σ_c are given by p_ci = g_co p_oi from the coordinate transformation.^1 We next denote the m feature points on the 2D image plane by f := [f_1^T ⋯ f_m^T]^T ∈ R^{2m}. Then, the well-known perspective projection [1] yields the following relationship for each f_i ∈ R^2 (see Figure 1(b)):

f_i = (σ / z_ci) [x_ci y_ci]^T,  p_ci = [x_ci y_ci z_ci]^T.  (2)

Here, σ > 0 is the focal length of the camera.
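As an illustration, the stacked measurement vector f(g_co) can be sketched as follows (a minimal pinhole-projection sketch in NumPy; the helper name `visual_measurements` and the sample points are ours):

```python
import numpy as np

def visual_measurements(g_co, p_o, sigma):
    """Stack the perspective projections f_i = (sigma / z_ci) * [x_ci, y_ci]^T
    of m object-frame feature points into the 2m-vector f(g_co)."""
    m = p_o.shape[0]
    p_h = np.hstack([p_o, np.ones((m, 1))])   # homogeneous coordinates [p; 1]
    p_c = (g_co @ p_h.T).T[:, :3]             # p_ci = g_co p_oi
    f = sigma * p_c[:, :2] / p_c[:, 2:3]      # pinhole projection per point
    return f.reshape(-1)

# four feature points at depth 1 m, identity relative pose (illustrative values)
g = np.eye(4)
pts = np.array([[0.1, 0.1, 1.0], [-0.1, 0.1, 1.0],
                [-0.1, -0.1, 1.0], [0.1, -0.1, 1.0]])
f = visual_measurements(g, pts, sigma=0.0193)   # sigma from the experiment
```

Note that f depends on the relative pose only through g_co, which is what makes f(g_co) a valid measured output for the observer.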
This paper considers the situation in which only the visual measurements f are available to the pose estimation/regulation laws, and supposes that the positions of the feature points in Σ_o (i.e. p_oi) are known a priori. Then, the visual measurements are given by a function of only the relative pose g_co, i.e. f(g_co). Figure 2 illustrates the block diagram of the relative rigid body motion (1) with the perspective projection (2).

Research objectives
For the present visual feedback system, the objectives of this paper are (i) to drive the estimate of the relative pose to its actual value g_co, and (ii) to drive the relative pose g_co to a desired pose, under the situation that only the visual measurements f(g_co) are available and the camera frame rate, including the image processing time, is non-negligible.

Sampled visual feedback pose estimation
We first propose a sampled visual feedback pose estimation mechanism that takes the camera frame rate into account, addressing objective (i).

Estimation error system
Since only the 2D visual measurements (2) are available, we consider the estimation of the 3D relative pose g_co by a nonlinear observer. The estimate of g_co is represented by ḡ_co = (p̄_co, e^{ξ̂θ̄_co}) ∈ SE(3). Similarly to the Luenberger-type observer [23], we build the copy model of the relative rigid body motion (1) as follows:

ġ̄_co = −V̂^b_wc ḡ_co + ḡ_co û_e.  (3)

Here, u_e = [u_ep^T u_eR^T]^T ∈ R^6 is the observer input for the estimation of g_co. We note that the model (3) does not include the target object velocity V^b_wo, because it is unavailable.^2 Notice also that the estimated visual measurements f̄ ∈ R^{2m} can be computed from ḡ_co and (2).
Let us define the estimation error g_ee = (p_ee, e^{ξ̂θ_ee}) ∈ SE(3) between g_co and ḡ_co and its vector form e_e ∈ R^6 as follows: g_ee := ḡ_co^{-1} g_co, e_e := [p_ee^T (sk(e^{ξ̂θ_ee})^∨)^T]^T.
Here, sk(e^{ξ̂θ}) := (1/2)(e^{ξ̂θ} − e^{−ξ̂θ}) ∈ so(3). The estimation error vector has the important property that, for θ_ee ∈ (−π, π), e_e = 0 holds^3 if and only if g_ee = I_4, which is equivalent to ḡ_co = g_co, i.e. objective (i) is achieved. It should also be noted that e_e can be approximately reconstructed from the measurement error f_e := f − f̄ ∈ R^{2m} (refer to [13] for the details). Then, the time derivative of g_ee along the trajectories of (1) and (3) yields the following estimation error system:

ġ_ee = −û_e g_ee + g_ee V̂^b_wo.  (4)
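The construction of e_e from the two poses can be sketched directly from the definitions above (a minimal NumPy sketch; `estimation_error` is our name, and the rotation used in the example is illustrative):

```python
import numpy as np

def sk(R):
    """sk(R) = (1/2)(R - R^T), the skew-symmetric part of a rotation matrix."""
    return 0.5 * (R - R.T)

def vee(W):
    """Vee operator on so(3)."""
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def estimation_error(g_bar_co, g_co):
    """e_e = [p_ee; sk(R_ee)^vee] built from g_ee = g_bar_co^{-1} g_co;
    e_e = 0 iff the estimate matches the true pose (for |theta_ee| < pi)."""
    g_ee = np.linalg.inv(g_bar_co) @ g_co
    return np.concatenate([g_ee[:3, 3], vee(sk(g_ee[:3, :3]))])

# example: pure rotation of theta about z, zero translation error
theta = 0.3
g_true = np.eye(4)
g_true[:3, :3] = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                           [np.sin(theta),  np.cos(theta), 0.0],
                           [0.0, 0.0, 1.0]])
e = estimation_error(np.eye(4), g_true)  # orientation part equals xi*sin(theta)
```

Consistent with note 4, the orientation part of e_e equals ξ sin θ, so it vanishes exactly when the orientation estimate is correct.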
This system can also be written in the vector form (5) with the Adjoint transformation Ad(g) ∈ R^{6×6} (see [13]). It is then shown in [13] that if V^b_wo = 0, the estimation error system (4) has a passivity-like property from the input u_e to the output −e_e with the storage function

U_e := (1/2)‖p_ee‖^2 + φ(e^{ξ̂θ_ee}) ≥ 0,  φ(e^{ξ̂θ}) := (1/4)‖I_3 − e^{ξ̂θ}‖_F^2,

i.e. U̇_e = u_e^T(−e_e) holds. Notice that U_e = 0 is equivalent to e_e = 0, that is, the pose estimation (objective (i)) is achieved. (‖·‖_F represents the Frobenius norm.) Based on this property, Fujita et al. [13] propose the negative feedback law u_e = k_e e_e with a gain k_e > 0 to achieve the pose estimation, where the achievement of the estimation is proved by the direct use of the storage function U_e as a potential function in the Lyapunov-based energy approach. However, this observer input assumes that the visual measurements f are continuously available, although general cameras run at 60, 30, 15, or fewer fps when the image processing time to extract the feature points is included. Therefore, this paper tackles a new visual feedback pose estimation problem explicitly taking the camera frame rate into account.

Pose estimation mechanism
Let us first consider the case in which the camera has a fixed frame rate τ > 0 [fps]; for ease of representation, the frame period 1/τ includes the image processing time to extract the feature points. The variable frame rate case is handled at the end of the main results. We assume that the computation time to calculate the estimation input is small enough to be neglected compared with 1/τ. Then, based on the frame rate τ, we introduce the sampling time sequence {t_0, t_1, t_2, ...} such that t_{i+1} − t_i = 1/τ holds for all i ∈ N_0 (N_0 represents the set of natural numbers including 0). The visual measurements (2) are then extracted at each time instant t_i.
We now propose the following sampled observer input for the relative rigid body motion model (3):

u_e(t) = k_e e_e(t_i),  t ∈ [t_i, t_{i+1}).  (6)

Notice here that the present input remains constant until it is re-computed at the next sampling instant. Figure 3 illustrates the block diagram of the present estimation mechanism, called the sampled visual motion observer. Since the image Jacobian in the present observer consists of the estimates f̄ and ḡ_co as well as the visual measurements f [13], we employ the same samplers for these estimates as for the camera.
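The zero-order-hold structure of the sampled input can be illustrated with a simple surrogate simulation. This is our linearized scalar sketch of the error dynamics ė_e ≈ −u_e with the input k_e e_e(t_i) held between frames, not the full SE(3) observer; `simulate_sampled_observer` and all numeric values are assumptions for illustration:

```python
def simulate_sampled_observer(e0, k_e, tau, T=5.0, dt=1e-3):
    """Euler simulation of a linearized estimation error under a zero-order-held
    input: de/dt = -u, with u = k_e * e(t_i) recomputed only at the frame
    instants t_{i+1} - t_i = 1/tau (static target assumed)."""
    e, u = float(e0), k_e * float(e0)   # input computed at t_0
    t, next_frame = 0.0, 1.0 / tau
    while t < T:
        e -= u * dt                     # hold u constant between frames
        t += dt
        if t >= next_frame:             # new camera frame: resample the error
            u = k_e * e
            next_frame += 1.0 / tau
    return e

final = simulate_sampled_observer(e0=1.0, k_e=1.4, tau=30.0)
```

With a moderate gain the held input still drives the error down; raising the gain far beyond what the frame rate supports makes the sampled loop overshoot and diverge, which is the phenomenon the frame rate conditions below rule out.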

Analysis of pose estimation
This section provides the convergence analysis for a stationary target object (i.e. V^b_wo ≡ 0) and the tracking performance analysis for a moving target.

Convergence analysis
Let us define the sampling error ε(t) ∈ R^6 between the actual estimation error e_e(t) and the sampled one e_e(t_i) as ε(t) := e_e(t) − e_e(t_i), t ∈ [t_i, t_{i+1}). Then, we obtain the following theorem.

Theorem 4.1: Suppose that the target object is static (i.e. V^b_wo ≡ 0). Then, if the camera frame rate satisfies the condition

1/τ ≤ δ/(2k_e(1 + δ))  (7)

for a design parameter δ ∈ (0, 1), there exist a finite time T_a ≥ t_0 and positive scalars a and b such that

‖e_e(t)‖ ≤ a e^{−b(t−T_a)}  ∀ t ≥ T_a  (8)

for the closed-loop system (4) and (6). In other words, the equilibrium point e_e = 0 is exponentially stable after time T_a.
Proof: When V^b_wo = 0 holds, the time derivative of the potential function U_e along the trajectories of (4) and (6) for t ∈ [t_i, t_{i+1}) yields

U̇_e = −e_e^T u_e = −k_e e_e^T e_e(t_i) = −k_e e_e^T (e_e − ε) ≤ −k_e ‖e_e‖^2 + k_e ‖e_e‖ ‖ε‖.

Therefore, if the inequality

‖ε‖ ≤ δ ‖e_e‖,  δ ∈ (0, 1)  (9)

is satisfied, we obtain

U̇_e ≤ −k_e (1 − δ) ‖e_e‖^2.  (10)

We next derive the frame rate condition that guarantees the inequality (9). Notice first that it is enough to consider each time interval [t_i, t_{i+1}), since ε becomes 0 at the next time step t_{i+1}. Motivated by the analysis of event-triggered control [18-20], we consider the ratio χ := ‖ε‖/‖e_e‖ and use the fact that ε̇ = ė_e for t ∈ [t_i, t_{i+1}). Let us now consider ė_e. We first get the following position term from the estimation error system (4) with V^b_wo = 0: ṗ_ee = −u_ep − û_eR p_ee. Before obtaining the orientation term, we note an equality that holds for any vector a ∈ R^3 and any matrix A ∈ R^{3×3}; from this property and the estimation error system (4) with V^b_wo = 0, the orientation term is bounded similarly. It then follows that χ̇ ≤ 2k_e(1 + χ)^2. Hence, if η is the solution of η̇ = 2k_e(1 + η)^2, η(t_i) = 0, we have χ(t) ≤ η(t) for all t ∈ [t_i, t_{i+1}). This means that the time it takes for χ to evolve from 0 to δ is larger than or equal to the time for η (see Figure 4). The time for η is given by δ/(2k_e(1 + δ)), which is obtained as the solution T > 0 of η(t_i + T) = δ. Therefore, if 1/τ ≤ T holds, (9) is guaranteed for all time because ε is reset to 0 before ‖ε‖/‖e_e‖ reaches δ (see Figure 4).

We finally show the exponential stability of e_e = 0 after some time, using the Gronwall-Bellman inequality [24]. Notice that if τ satisfies (7), then the inequality (10) holds, i.e. U̇_e is negative definite for all time. It should also be noted that U_e is continuous and positive definite. Therefore, there exists a finite time T_a > t_0 in the time sequence t_i, i ∈ N_0, satisfying U_e(t) < 1 ∀ t ≥ T_a. Then, φ(e^{ξ̂θ_ee}) < 1 holds for all t ≥ T_a from the definition of U_e. The property φ(e^{ξ̂θ}) ≤ ‖sk(e^{ξ̂θ})^∨‖^2 also holds for φ(e^{ξ̂θ}) < 1.^4 In summary, (10) provides the following inequality for every interval t ∈ [t_i, t_{i+1}), t_i ≥ T_a:

U̇_e ≤ −k_e (1 − δ) U_e.  (11)

Then, from the Gronwall-Bellman inequality, we obtain

U_e(t) ≤ U_e(t_i) e^{−k_e(1−δ)(t−t_i)}  ∀ t ∈ [t_i, t_{i+1}).  (12)

We next consider t_{i+1} = t_i + 1/τ. Similarly to the above discussion, we obtain

U_e(t) ≤ U_e(t_{i+1}) e^{−k_e(1−δ)(t−t_{i+1})}  ∀ t ∈ [t_{i+1}, t_{i+2}).  (13)

Then, since U_e(t_{i+1}) ≤ U_e(t_i) e^{−k_e(1−δ)/τ} holds from (12) and the continuity of U_e, the induction from time T_a gives

U_e(t) ≤ U_e(T_a) e^{−k_e(1−δ)(t−T_a)}  ∀ t ≥ T_a.  (14)

Finally, we derive the inequality (8) from (14). Remember that (1/2)‖e_e‖^2 ≤ U_e ≤ ‖e_e‖^2 holds after time T_a. Then, substituting these inequalities into (14) yields (8). □

The condition (7) implies that the choice of a large feedback gain k_e requires a fast frame rate τ (a small sampling interval). This property is intuitive because large gains increase the influence of the sampling error ε, which might have a poor impact on the estimation. In other words, after choosing a camera with a certain frame rate, one is not free to make the gain arbitrarily large. Although the condition (7) is only sufficient, this analysis provides the significant insight that the camera frame rate or image processing time is not negligible for pose estimation.
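Reading the inter-event time δ/(2k_e(1+δ)) from the proof as an upper bound on the frame period 1/τ, the admissible estimation gain can be sized as follows (a sketch under the assumption that condition (7) takes exactly this form; the helper name `max_estimation_gain` is ours):

```python
def max_estimation_gain(tau, delta):
    """Largest estimation gain k_e compatible with the frame-rate condition
    1/tau <= delta / (2 * k_e * (1 + delta)) from the proof of Theorem 4.1,
    i.e. k_e <= delta * tau / (2 * (1 + delta))."""
    assert tau > 0.0 and 0.0 < delta < 1.0
    return delta * tau / (2.0 * (1.0 + delta))

# e.g. the experiment's worst-case 18 fps pipeline with delta = 0.9
k_max = max_estimation_gain(tau=18.0, delta=0.9)
```

The bound is linear in τ: doubling the frame rate doubles the admissible gain, which is the design trade-off the remark above describes.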

Tracking performance analysis
We next analyse the tracking performance for a moving target. Suppose that ‖V^b_wo(t)‖ ≤ κ ∀ t ≥ t_0 holds for a positive scalar κ > 0, i.e. the target velocity is bounded. We also assume that the value of κ is known a priori, e.g. simply as prior information or due to hardware limitations of the target vehicle. Then, the time derivative of the potential function U_e along the trajectories of (4) and (6) yields

U̇_e = −e_e^T u_e + e_e^T Ad(e^{ξ̂θ_ee}) V^b_wo ≤ −k_e e_e^T e_e(t_i) + ‖e_e‖ ‖V^b_wo‖ ≤ −k_e ‖e_e‖^2 + k_e ‖e_e‖ ‖ε‖ + κ ‖e_e‖.

Here, for ease of representation, Ad(e^{ξ̂θ}) denotes the Adjoint transformation of the pure rotation, differently from Ad(g) in (5). Let us now introduce a performance indicator γ > 0 to evaluate the sampling error ε. Then, if ‖ε‖ ≤ γ is satisfied, we get

U̇_e ≤ −k_e ‖e_e‖^2 + (k_e γ + κ) ‖e_e‖.  (15)

Since the right-hand side of (15) consists only of ‖e_e‖, we get the following theorem by employing ultimate boundedness analysis [24].

Theorem 4.2: Suppose that the norm of the target object velocity V^b_wo is upper bounded by κ. Then, for every initial estimation error e_e(t_0), there exists T_b ≥ t_0 such that the solution e_e(t) of the closed-loop system (4) and (6) satisfies

‖e_e(t)‖ ≤ √2 (k_e γ + κ)/(k_e δ)  ∀ t ≥ T_b  (16)

if |θ_ee| ≤ π/2 and the following frame rate condition (17) hold.

Proof: If ‖ε‖ ≤ γ is satisfied for all time, we get the following inequality for δ ∈ (0, 1) from (15):

U̇_e ≤ −k_e(1−δ)‖e_e‖^2 − k_e δ‖e_e‖^2 + (k_e γ + κ)‖e_e‖.  (18)

Remark 4.2:
The performance evaluation (16) can be rewritten so that the roles of γ and k_e become explicit. It shows that smaller γ and larger k_e achieve better performance. However, both require faster frame rates, because the right-hand sides of (17) are monotonically decreasing in γ and monotonically increasing in k_e. Therefore, as expected, γ can be considered as an indicator of the tracking performance and of the (sufficiently) allowable gains for a designer. For example, choosing a camera with a certain frame rate enables us to design k_e for desired performance related to γ from (16) and (17). A design example is provided in Section 7.
Remark 4.3: Theorem 4.2 provides two frame rate conditions depending on the initial estimation error. However, if we first run the present estimation law (6) before the target moves, only the second condition needs to be considered.
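Following the gain-design discussion in Remark 4.2, the trade-off in the ultimate bound of (16) can be sketched numerically (a surrogate using the bound √2(k_e γ + κ)/(k_e δ); the function name and the numeric values are our illustrative assumptions):

```python
import math

def tracking_ultimate_bound(k_e, gamma, kappa, delta):
    """Surrogate ultimate bound sqrt(2) * (k_e*gamma + kappa) / (k_e*delta)
    on ||e_e|| for a target with ||V^b_wo|| <= kappa and a sampling error
    kept below gamma, as in (16)."""
    return math.sqrt(2.0) * (k_e * gamma + kappa) / (k_e * delta)

# larger k_e shrinks the velocity (kappa) term; smaller gamma shrinks the
# sampling term -- but both demand faster frame rates via (17)
b_slow = tracking_ultimate_bound(k_e=1.0, gamma=0.05, kappa=0.2, delta=0.9)
b_fast = tracking_ultimate_bound(k_e=4.0, gamma=0.05, kappa=0.2, delta=0.9)
```

As k_e grows, the bound approaches √2 γ/δ: the sampling error, not the target velocity, becomes the limiting factor, which is why a fast camera is needed to make γ small.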

Sampled visual feedback pose regulation
We next propose a sampled visual feedback pose regulation mechanism, based on the present sampled visual motion observer, for objective (ii).

Control error system
Similarly to the estimation error system (4) presented in Section 3.1, we build the control error system. The control error g_ce = (p_ce, e^{ξ̂θ_ce}) ∈ SE(3) between the desired relative pose g_d = (p_d, e^{ξ̂θ_d}) ∈ SE(3) and the estimate ḡ_co, and its vector form e_c ∈ R^6, are defined as follows: g_ce := g_d^{-1} ḡ_co, e_c := [p_ce^T (sk(e^{ξ̂θ_ce})^∨)^T]^T.
Notice that for θ_ce ∈ (−π, π), e_c = 0 holds if and only if g_ce = I_4, i.e. ḡ_co = g_d. Then, the time derivative of g_ce along the trajectories of (3) provides the control error system (20), which is also written in vector form. Combining the estimation error system (4) with the control error system (20) yields the total error system (21) in vector form. It is shown in [13] that if V^b_wo = 0, the total error system (21) also has a passivity-like property from the input u_ce to the output ν_ce ∈ R^{12}, where the total control and estimation error vector is defined as e_ce := [e_c^T e_e^T]^T ∈ R^{12}.
Here, e_ce is the total control and estimation error vector, and the corresponding storage function U ≥ 0 is defined as U := (1/2)‖p_ce‖^2 + φ(e^{ξ̂θ_ce}) + U_e, which yields U̇ = u_ce^T ν_ce, i.e. a passivity-like property from the input u_ce to the output ν_ce [13]. We note that U = 0 is equivalent to e_ce = 0 for θ_ce, θ_ee ∈ (−π, π), and e_ce = 0 means g_co = g_d, that is, the pose regulation (objective (ii)) is achieved.
Based on the passivity-like property, Fujita et al. [13] propose the negative feedback law u_ce = −K ν_ce to achieve the pose regulation as in Figure 1(a). Here, K ∈ R^{12×12} is a positive definite gain matrix, and the achievement of the regulation is proved by the direct use of the storage function U as a potential function in the Lyapunov-based energy approach. However, this technique also assumes the continuous availability of the visual measurements f, which allows any positive definite matrix K.

Pose regulation mechanism
Consider the same settings as in Section 3, i.e. the fixed frame rate τ and the sampling time sequence t_i, i ∈ N_0. Then, motivated by the passivity-like property of the total error system (21), we propose the visual feedback pose regulation input (22) based on the sampled visual motion observer. Notice here that only the estimation error e_e(t_i) is held constant until the next sampling time, since we consider the case in which the computation time to obtain the estimate ḡ_co is small enough to be neglected. This structure enables us to greatly reduce the conservativeness of the frame rate condition provided in the conference version of this paper [22]. In [22], ḡ_co is also sampled, i.e. e_c(t_i) and e^{−ξ̂θ_ce(t_i)} are used in the regulation input (22) instead of e_c(t) and e^{−ξ̂θ_ce(t)}. This paper assumes k_e ≤ k_c for better performance in the subsequent discussion. The reason for employing this gain relationship is that only the observer input for the estimation is sampled; as a result, a large gain k_e increases the influence of the sampling error ε(t) = e_e(t) − e_e(t_i), which degrades control performance. The block diagram of the present sampled visual feedback system is illustrated in Figure 5.
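The sampling structure just described can be sketched as follows. This is a structural sketch only: the Adjoint/rotation terms of the actual input (22) are omitted, and the class name and sign conventions are our illustrative assumptions:

```python
import numpy as np

class SampledRegulation:
    """Structural sketch of the regulation law: the control error e_c(t) is
    used continuously, while the estimation error is the zero-order-held
    e_e(t_i) from the last camera frame."""

    def __init__(self, k_c, k_e):
        assert k_e <= k_c, "the paper assumes k_e <= k_c"
        self.k_c, self.k_e = k_c, k_e
        self.e_e_held = np.zeros(6)

    def on_frame(self, e_e_sampled):
        """Called once per camera frame at t_i: update the held error."""
        self.e_e_held = np.asarray(e_e_sampled, dtype=float)

    def inputs(self, e_c_now):
        """Called at the (fast) control rate: e_c is current, e_e is held."""
        u_c = -self.k_c * np.asarray(e_c_now, dtype=float)  # regulation part
        u_e = self.k_e * self.e_e_held                      # held observer part
        return u_c, u_e
```

Only `on_frame` is tied to the camera; `inputs` can run at the much faster control rate, which is exactly why holding only e_e(t_i) reduces the conservativeness compared with [22].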

Analysis of pose regulation
Similarly to Section 4, this section provides the convergence analysis for the stationary target object and the tracking performance analysis for the moving target.

Convergence analysis
Using the same definition of the sampling error ε(t) = e_e(t) − e_e(t_i), t ∈ [t_i, t_{i+1}) ∀ i ∈ N_0 as in Section 4, we have the following theorem.

Theorem 6.1: Suppose that the target object is static (i.e. V^b_wo ≡ 0). Then, if the camera frame rate satisfies the condition (23), there exist a finite time T_c ≥ t_0 and positive scalars a and b such that ‖e_ce(t)‖ ≤ a e^{−b(t−T_c)} ∀ t ≥ T_c for the closed-loop system (21) and (22).
Proof: When V^b_wo = 0 holds, the time derivative of the potential function U along the trajectories of (21) and (22) can be bounded in the same manner as in the proof of Theorem 4.1, and the remaining analysis to obtain the frame rate condition (23) is also the same. □

The condition (23) also implies that large feedback gains k_c and k_e require fast camera frame rates τ.

Tracking performance analysis
By using the same assumption ‖V^b_wo‖ ≤ κ ∀ t ≥ t_0 as in Section 4.2, the time derivative of the potential function U along the trajectories of (21) and (22) is bounded as follows:

U̇ ≤ −k_e ‖e_ce‖^2 + √2 k_e ‖ε‖ ‖e_ce‖ + κ ‖e_ce‖.  (25)
Here, we note that V^b_wo appears only in the estimation part of the total error system (21). Therefore, if ‖ε‖ ≤ γ is satisfied for a certain performance indicator γ > 0, we get

U̇ ≤ −k_e ‖e_ce‖^2 + (√2 k_e γ + κ) ‖e_ce‖.  (26)

Then, we have the following theorem, similar to Theorem 4.2.

Theorem 6.2: Suppose that the norm of the target object velocity V^b_wo is upper bounded by κ. Then, for every initial control and estimation error e_ce(t_0), there exists T_d ≥ t_0 such that the solution e_ce(t) of the closed-loop system (21) and (22) satisfies

‖e_ce(t)‖ ≤ √2 (√2 k_e γ + κ)/(k_e δ)  ∀ t ≥ T_d  (27)

if |θ_ee|, |θ_ce| ≤ π/2 and the following frame rate condition (28) hold.

Proof: If ‖ε‖ ≤ γ is satisfied for all time, the following inequality is obtained from (26):

U̇ ≤ −k_e(1−δ)‖e_ce‖^2 − k_e δ‖e_ce‖^2 + (√2 k_e γ + κ)‖e_ce‖.

Therefore, from Theorem 4.18 of [24] and the property that (1/2)‖e_ce‖^2 ≤ U ≤ ‖e_ce‖^2 holds for |θ_ee|, |θ_ce| < π/2, we can conclude that for every e_ce(t_0), there exists T_d ≥ t_0 such that e_ce(t) satisfies (27).
We next derive the frame rate condition that guarantees the inequality ‖ε‖ ≤ γ. The dynamics of ε are obtained from (21) and (25), and by the same approach as in the proof of Theorem 4.2, we get the frame rate condition (28). □
The condition (28) also implies that smaller γ and larger k_e require faster frame rates. Moreover, a larger k_e requires a larger k_c from the gain relationship k_c ≥ k_e.

Variable camera frame rate case
So far, we have considered the situation in which the visual measurements (2) can be extracted at every fixed sampling interval 1/τ. In practice, however, the actual sampling interval, including the image processing time, is variable. To deal with this issue, we consider the worst case (i.e. the maximum sampling interval).
Suppose that the visual measurements (2) are extracted at time instants {t_0, t_1, t_2, ...}, where t_{i+1} − t_i = 1/τ_i, τ_i > 0, i ∈ N_0. Instead of the fixed camera frame rate, we now suppose that the worst frame rate, denoted by τ_min > 0, is known a priori via pre-experiments of the image processing. Then, by simply replacing τ with τ_min in the conditions (7), (17), (23), and (28), the non-increasing properties of U_e and U are always guaranteed. We thus have the following corollary.

Corollary 6.3: Suppose that τ_i ≥ τ_min is satisfied for all i ∈ N_0. Then, the same statements as in Theorems 4.1, 4.2, 6.1, and 6.2 hold with τ replaced by τ_min.
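Corollary 6.3 suggests a simple practical recipe: measure the inter-frame intervals in a pre-experiment, take the longest as 1/τ_min, and size the gain by the fixed-rate condition with τ replaced by τ_min. A sketch (assuming condition (7) has the form used in the proof of Theorem 4.1; the helper name and timestamps are ours):

```python
def gain_bound_variable_rate(frame_times, delta):
    """Measure inter-frame intervals from timestamps, take the longest one as
    1/tau_min, and return the gain bound delta*tau_min/(2*(1+delta)) obtained
    from the fixed-rate condition with tau replaced by tau_min."""
    intervals = [b - a for a, b in zip(frame_times, frame_times[1:])]
    tau_min = 1.0 / max(intervals)
    return delta * tau_min / (2.0 * (1.0 + delta))

# jittery timestamps: a nominal 120 fps camera whose image processing
# occasionally slows the pipeline to 18 fps (the experiment's worst case)
stamps = [0.0, 1.0 / 120.0, 1.0 / 120.0 + 1.0 / 18.0, 1.0 / 120.0 + 2.0 / 18.0]
k_bound = gain_bound_variable_rate(stamps, delta=0.9)
```

The bound is governed entirely by the slowest frame, matching the worst-case reasoning of the corollary.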

Verification
This section demonstrates the effectiveness of the present sampled visual feedback pose regulation mechanism (22) via simulation and an experiment. Verification of the estimation alone is omitted because it has already been shown in our previous work [21].

Simulation

The reference velocity of the target object is specified in terms of v_wo and ω_wo, whose units are [m/s] and [rad/s], respectively. We artificially set the velocity to 0 after 50 s. This setting enables us to verify both tracking (until 50 s) and convergence (after 50 s) in a single demonstration.
The simulation results are depicted in Figures 6 and 7. Figure 6 shows the time response of the potential function U, and the norm of the total control and estimation error e_ce is depicted in Figure 7. The exponential stability of e_ce = 0 is seen from the behaviour after 50 s. We also see good tracking performance in the behaviour until 50 s, where the potential function U sometimes increases owing to the non-zero target object velocity V^b_wo but remains within the bound shown in the present analysis. In summary, the present sampled visual feedback pose regulation mechanism works successfully.

Experiment
We next show the validity of the present sampled visual feedback pose regulation mechanism (22) via an experiment. Here, two self-developed 2D omnidirectional robots are employed as the camera robot and the target object robot (see Figure 8). In this experiment, the pose estimation is conducted in three-dimensional space, but the pose regulation is demonstrated in the two-dimensional (x-z) plane with a projection from the 6D translational/angular velocity inputs onto 3D ones (i.e. V^b_wc = [v_wc,x v_wc,z ω_wc,y]^T ∈ R^3). Each omnidirectional robot mainly consists of three motors (RA250100-58Y91 from Daisen Electronics Industrial, Co., Ltd.) with omnidirectional wheels (4571398310089 from Vstone, Co., Ltd.) and a single camera (FL3-U3-13S2C-CS from FLIR Systems, Inc.). The focal length of the camera is tuned to σ = 0.0193 [m], and the camera originally runs at 120 fps. However, when we explicitly consider the image processing time to extract the four feature points attached to the target object robot (see Figure 8), the actual minimum frame rate becomes 18 fps in this experiment. We also introduce a motion capture system (OptiTrack Flex 13 from NaturalPoint, Inc.) to obtain ground-truth data for the evaluation of the convergence/tracking, but this information is not used for the control inputs of the camera robot.
The present sampled visual feedback pose regulation mechanism (22) with k_c = 2, k_e = 1.4, δ = 0.9, and the initial estimate p̄_co(0) = [0 0 0.4]^T [m], ξθ̄_co(0) = 0 [rad] is applied to the camera robot to achieve the desired relative pose p_d = [0 0 0.33]^T [m] and ξθ_d = 0 [rad]. The initial relative pose is set as p_co(0) ≈ [0 0 0.37]^T [m] and ξθ_co(0) ≈ 0 [rad], and the feature points are attached at p_o1 ≈ [0.02 0.12 0.10]^T, p_o2 ≈ [−0.02 −0.08 0.10]^T, p_o3 ≈ [0.02 0.08 0.14]^T, and p_o4 ≈ [−0.02 0.12 0.14]^T [m] in Σ_o. The reference body velocity commands of the target robot are given accordingly; here, we artificially set the velocity to 0 until 10 s and after 25 s to verify tracking and convergence in a single demonstration. We also confirmed that the update rate of the observer was about 270 Hz, whose period is small enough to be neglected compared with the 18 fps camera frame rate.

The experimental results are shown in Figures 9-13. Figures 9 and 10 respectively depict the time responses of the potential function U and the norm of the total control and estimation error e_ce, while Figures 11-13 show each input/state behaviour focused on the pose regulation in the 2D experimental field. We see from these figures that the present visual feedback pose regulation mechanism successfully achieves tracking of the moving target, and the convergence to the desired relative pose with respect to the stationary target is almost achieved. The convergence errors and the tracking delay are due to physical elements such as the actual dynamics of the robots, the friction between the wheels and the field, and the distortion of the camera image.

Conclusion
This paper presented a vision-based pose observer and its control application in the sampled-data setting induced by camera frame rates. In the convergence/tracking analysis of the proposed methodologies, we provided the relationship between the frame rates and the estimation/control gains. Specifically, we showed that the estimation and control errors are ultimately bounded by a function of the camera frame rate, the estimation/control gains, and the target object velocity, which provides guidelines for gain settings. The utility of the proposed technique was demonstrated via simulation and an experiment with real hardware.
One of our future directions is to consider robot dynamics. In this regard, previous works [13,16] have already presented passivity-based visual feedback pose regulation mechanisms for rigid body dynamics in the Euler-Lagrange and Newton-Euler forms. We thus plan to extend the current results with the assistance of these techniques.

Notes
1. In this equality, we use the homogeneous representations [p_ci^T 1]^T, [p_oi^T 1]^T ∈ R^4.
2. Pose estimation/regulation problems with target object velocity estimation have been tackled in [16].
3. For ease of representation, we often simply use '0' to denote zero vectors with appropriate dimensions.
4. This is easily shown by the properties that φ(e^{ξ̂θ}) and sk(e^{ξ̂θ})^∨ can be rewritten as 1 − cos θ and ξ sin θ, respectively, and thus φ(e^{ξ̂θ}) < 1 means |θ| < π/2. In this case, (1/2)‖e_e‖^2 ≤ U_e ≤ ‖e_e‖^2 also holds.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1 .
Figure 1. Visual feedback system: (a) coordinate frames and (b) perspective projection model.

Figure 2 .
Figure 2. Block diagram of relative rigid body motion (RRBM) with camera model.

Figure 3 .
Figure 3. Block diagram of sampled visual motion observer.

Figure 4 .
Figure 4. Image of time evolution of χ and η.


Figure 5 .
Figure 5. Block diagram of sampled visual feedback system.

of Systems and Control Engineering, Tokyo Institute of Technology from 2013 to 2020, and a visiting scholar with the School of Electrical and Computer Engineering, Georgia Institute of Technology in 2019. Since 2020, he has been a senior assistant professor with the Department of Electronics and Bioinformatics, Meiji University. His research interests include cooperative control of robotic networks, fusion of control theory and machine learning, and vision-based estimation and control.

Satoshi Nakano received a B.Eng. degree from Nagoya Institute of Technology, Japan, in 2013, and M.Eng. and Ph.D. degrees in mechanical and control engineering from Tokyo Institute of Technology, Japan, in 2015 and 2019, respectively. Since 2019, he has been an assistant professor in the Department of Engineering, Nagoya Institute of Technology, Japan. His research interests include nonlinear control, constrained control, and vision-based estimation and control.

Shunsuke Shigaki received his B.Eng., M.Eng., and Ph.D. degrees in mechanical and control system engineering from Tokyo Institute of Technology, Tokyo, Japan in 2013, 2015, and 2018, respectively. He was a JSPS Research Fellow for Young Scientists (DC1) from 2015 to March 2018, an assistant professor in the Division of Systems Research, Yokohama National University from 2018 to 2019, an assistant professor in the Department of System Innovation, Osaka University from 2019 to 2023, and a visiting scientist with the Max Planck Institute for Chemical Ecology in 2022. Since 2023, he has been an assistant professor in the Principles of Informatics Research Division, National Institute of Informatics, Japan. His research interests include bio-inspired robotics and algorithms, soft robotics, machine learning, and neuroethology.

Takeshi Hatanaka received a Ph.D.
degree in applied mathematics and physics from Kyoto University in 2007. He then held faculty positions at Tokyo Institute of Technology and Osaka University. Since April 2020, he has been an associate professor at Tokyo Institute of Technology. He is the coauthor of "Passivity-Based Control and Estimation in Networked Robotics" (Springer, 2015) and the coeditor of "Economically Enabled Energy Management: Interplay Between Control Engineering and Economics" (Springer Nature, 2020). His research interests include cyber-physical-human systems and networked robotics. He received the Kimura Award (2017), Pioneer Award (2014), Outstanding Book Award (2016), Control Division Conference Award (2018), Takeda Prize (2020), and Outstanding Paper Awards (2009, 2015, 2020, 2021, and 2023), all from SICE. He also received the 3rd IFAC CPHS Best Research Paper Award (2020) and the 10th Asian Control Conference Best Paper Prize Award (2015). He serves/served as an AE for IEEE TCST, Advanced Robotics, and SICE JCMSI, and is a member of the Conference Editorial Board of IEEE CSS. He is a senior member of IEEE.
U̇_e ≤ −k_e δ‖e_e‖^2 + (k_e γ + κ)‖e_e‖, and from the property that (1/2)‖e_e‖^2 ≤ U_e ≤ ‖e_e‖^2 holds for |θ_ee| < π/2, we can conclude that for every e_e(t_0), there exists T_b ≥ t_0 such that e_e(t) satisfies (16). The remaining step is to consider not e_e(t) but e_e(t_i), to reduce the conservativeness. Let us now derive the upper bound of ‖e_e(t_i)‖. We first consider the case that ‖e_e(t_0)‖ ≥ (k_e γ + κ)/(k_e δ). Then, the following inequality holds from (1/2)‖e_e‖^2 ≤ U_e and (18): ‖e_e(t)‖ ≤ √(2U_e(t)) ≤ √(2U_e(t_0)) ∀ t ≥ t_0, which also means ‖e_e(t_i)‖ ≤ √(2U_e(t_0)) ∀ i ∈ N_0.