Vision-aided precise positioning for long-reach robotic manipulators using local calibration

A vision-based guidance methodology is proposed for precise positioning of the tool center point (TCP) of heavy-duty, long-reach (HDLR) manipulators. HDLR manipulators are non-rigid structures with many nonlinearities. Therefore, conventional rigid-body-based modeling and control methods pose challenges for accurate TCP positioning. To compensate for these errors, we compute the pose error between the TCP and an object of interest (OOI) directly in the camera frame, while using motion-based local calibration to find the extrinsic sensor-to-robot correspondence. The proposed pipeline for local calibration is twofold: first, the detected tool is oriented perpendicularly with respect to the OOI. Second, range adjustment is performed in the local (planar) plane by exploiting the visual measurements. Two methods for adjusting the range were examined: a line equation-based method and a trajectory matching-based method. Real-time experiments were conducted using a HDLR manipulator with a 5 m reach, and visual fiducial markers were used as detectable objects for the visual sensor. The experimental results demonstrated that the proposed methodology can provide sub-centimeter positioning accuracy, which is very challenging to achieve with HDLR manipulators due to their characteristic uncertainties.


Introduction
Visual sensors, such as cameras and laser scanners, are becoming an essential part of robotic systems. Traditionally, robots' work tasks have been pre-programmed by utilizing precise joint sensors with high repeatability [1]. However, emerging intelligent algorithms combined with visual detection and pose (3 DOF position and 3 DOF orientation) estimation enable autonomous robotic systems to perform work tasks based on noncontact visual sensing [2]. A typical scenario is visual servoing, in which an object of interest (OOI) is detected using a visual sensor, and the tool center point (TCP) of the manipulator is then guided toward the OOI based on the visual feedback [3,4]. Methods of visual servoing are generally classified into position-based (PBVS) [5], image-based (IBVS) [6], and hybrid systems [7]. In a PBVS system, the control error is defined in Cartesian 3D coordinates, and the control algorithm utilizes the robot's kinematic model along with the camera calibration parameters. In an IBVS system, the control law is defined in the 2D image plane using image features directly. The relationship between the image plane and the robot is established using an image Jacobian matrix, which describes a nonlinear mapping between the image feature errors and the pose of the robot [8]. Hybrid systems attempt to utilize the advantages of both PBVS and IBVS. Visual control can also be realized in an open-loop manner, where the continuous feedback loop is omitted. Applications for vision-based control include grasping, surgical instruments, insertions, pick and place, and so forth.
The first of the two main causes of inaccuracies in a robotic manipulator's TCP positioning results from incorrect variables in kinematic modeling, such as the Denavit-Hartenberg (DH) parameters. The second cause is non-kinematic errors caused by, for example, structural bending and flexibility, thermal effects, backlash, and sensor resolution [9]. The problem of precise TCP positioning in robotics is highlighted in the context of heavy-duty, long-reach (HDLR) manipulators that are utilized in mobile machines. For economic reasons, it is desirable that HDLR manipulators are constructed to be as lightweight as possible while able to handle significant load masses. Such conditions result in high structural flexibility and other uncertainties that are not an issue with small and compact industrial robots. For HDLR manipulators, traditional rigid-body kinematics-based control is not sufficient for precise TCP positioning due to the non-rigid structures. However, precise positioning is an essential requirement for automated work tasks, even with a human-in-the-loop, and eventually for fully autonomous machines. Therefore, HDLR manipulators could benefit considerably from camera-based guidance that minimizes the positioning error between the TCP and an OOI, thus bypassing the weaknesses of the imprecise yet computationally efficient rigid-body-based kinematics, along with non-kinematic error sources.
Attaching a camera near the tip of a robotic manipulator is known as an eye-in-hand configuration [10]. The rigid relation between the camera frame (eye) and the TCP of the manipulator (hand) is described with a transformation matrix estimated using extrinsic calibration [11]. This static mapping allows information obtained using the camera, or another visual sensor, to be expressed in the robot's coordinate system. Typically, the extrinsic calibration procedure requires a predefined object with a set world frame, such as a checkerboard or a circular grid [12-15], which is used to estimate the hand-eye transformation. The procedure also requires taking images from different distances and angles with respect to the calibration object. Consequently, this setup is mostly feasible for structured environments in controlled factory settings. A key differentiation between mobile machines with on-board HDLR manipulators and conventional industrial robots is that the former operate in dynamic, unstructured environments. Thus, using a predefined calibration object to estimate the hand-eye transform in mines, fields, or plantations is not realistic or practical.
Vision-based control in mining applications, including the classic peg-in-hole problem, was discussed in [16-18]. In the peg-in-hole problem, the positioning of the tool to a desired OOI is essentially reduced to a planar positioning problem in the vicinity of the OOI, which is also exploited in our methodology. Visual servoing has been widely employed to solve the peg-in-hole problem in repetitive assembly tasks related to factory automation [19]. Thus, these methods have mostly focused on structured scenes [20], whereas HDLR manipulators work in dynamic and unstructured environments. Other mining-related studies with visual sensors include [21], in which a laser scanner was utilized for collision avoidance. Studies utilizing laser scanners in mines also include [22,23]. In [24], an eye-to-hand configuration was used for position-based visual guidance of a heavy-duty rock-breaking manipulator. Specific markers for calibration purposes were distributed into the workspace, and a considerable number of measurements were conducted to estimate the extrinsic camera-to-robot calibration parameters. In [25], PBVS using an RGB-D camera was investigated for a multi-joint hydraulic manipulator, but the camera was rigidly mounted at a known location along the manipulator's kinematic chain, and no explicit hand-eye calibration was described. In [26], sub-centimeter absolute positioning accuracy was achieved with a HDLR manipulator by using a total station network. This method requires a large space in which to operate and a considerable investment in the sensors.
In this paper, the objective was to drive the tool of a HDLR manipulator to an OOI using a low-cost camera. HDLR manipulators would benefit from vision-based guidance systems especially in auxiliary tasks requiring precise (sub-centimeter) TCP positioning accuracy. Potential tasks for increased levels of automation include, for example, swapping tools or navigating the TCP to a pre-drilled hole. The desired positioning accuracy in this type of application is ±5 mm, which is very difficult to achieve without a highly skilled human operator due to the characteristic uncertainties of HDLR manipulators. To solve this problem, we propose computing the pose error between the tool and an OOI directly in the camera frame (which relates our approach to PBVS), while using motion-based local calibration to align the camera frame with the TCP frame. Notably, the procedure does not require placing dedicated calibration objects in the environment. The pipeline for motion-based local calibration is twofold: First, the tool's torsion axis is oriented perpendicularly w.r.t. the OOI. Then, range adjustment is performed in the local (planar) plane while the perpendicular configuration is maintained. Two methods for adjusting the range are presented: a line equation-based method and a trajectory matching-based method. The first method is simple and easy to implement, while the second method is more specific and complex to implement. Real-time experiments were conducted on a HDLR manipulator with a 5 m reach, and a low-cost stereo camera was attached near the tip. In the experiments, we used ArUco markers [27] as 'generic' representations of a tool and an OOI. The long-term goal is to use real-world objects. A key assumption of our methodology is that the tool and the OOI are visible to the camera, they can be detected, and their 6 DOF poses can be estimated. As the main contribution, it is shown that the image-based positioning error between the tool and the OOI can be reduced to the sub-centimeter range using the proposed methods, which is very challenging to achieve with HDLR manipulators due to their characteristics.
The rest of the paper is organized as follows. Robotics and control-related preliminaries are given in Section 2, the methods used are detailed in Section 3, the experimental setup is described in Section 4, the results are discussed in Section 5, and finally, the paper is concluded in Section 6.

Robotics and control preliminaries
The pose x ∈ R^6 of a robotic manipulator in its operational space describes the position and orientation of the TCP frame w.r.t. the base frame of the manipulator. The transition between the operational space and the joint space can be written using the forward kinematic equations when the individual joint variables q ∈ R^n are known:

x = [p; θ] = f(q),  (1)

where p ∈ R^3 represents the TCP position and θ ∈ R^3 its orientation. The rotation of the TCP frame is expressed using a minimal representation (i.e. Euler angles). The relation between the TCP's linear and angular velocities and the joint velocities is computed as follows:

ẋ = J(q)q̇,  (2)

where J(q) ∈ R^(6×n) is the Jacobian matrix describing the linear mapping from the joint space velocities q̇ ∈ R^n to the operational space velocities ẋ ∈ R^6. If the TCP velocities are known, the joint velocities can be obtained by using the inverse Jacobian as follows:

q̇ = J^(-1)(q)ẋ.  (3)

Assuming that the desired TCP position p_d and the orientation θ_d are known, along with the respective desired linear velocities ṗ_d and angular velocities θ̇_d, the desired joint velocities q̇_d can be computed using Equation (3) as follows:

q̇_d = J^(-1)(q) [ṗ_d + K_p(p_d − p); θ̇_d + K_θ δr],  (4)

where K_p and K_θ are the control gains associated with the incorporated position and orientation feedback. Furthermore, δr denotes the orientation error, expressed in terms of a quaternion. The desired joint positions are then integrated from the desired joint velocities:

q_d = ∫ q̇_d dt.  (5)

The control input vector u is then formulated as follows:

u = K_v(q_d − q),  (6)

where K_v is a matrix containing the joint control gains.
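The kinematic control law above can be sketched numerically. The following is a minimal illustration on a hypothetical 2-link planar arm with position feedback only; the link lengths, gain, and explicit Euler integration are illustrative assumptions, not values from the paper.

```python
import math

# Illustrative sketch of resolved-rate control with position feedback
# (a planar, position-only analogue of Equations (3)-(5)).
L1, L2 = 2.5, 2.5  # assumed link lengths [m]

def fk(q):
    """Forward kinematics: joint angles -> TCP position."""
    return [L1 * math.cos(q[0]) + L2 * math.cos(q[0] + q[1]),
            L1 * math.sin(q[0]) + L2 * math.sin(q[0] + q[1])]

def jacobian(q):
    """Analytic Jacobian J(q) mapping joint velocities to TCP velocities."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [ L1 * c1 + L2 * c12,  L2 * c12]]

def solve2x2(J, v):
    """q_dot = J^-1 v for the 2x2 planar case."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [( J[1][1] * v[0] - J[0][1] * v[1]) / det,
            (-J[1][0] * v[0] + J[0][0] * v[1]) / det]

def step(q, p_d, Kp=1.0, dt=0.05):
    """One control cycle: feedback TCP velocity, inverse Jacobian, Euler integration."""
    p = fk(q)
    v = [Kp * (p_d[0] - p[0]), Kp * (p_d[1] - p[1])]  # desired feedforward velocity set to zero
    qdot = solve2x2(jacobian(q), v)
    return [q[0] + qdot[0] * dt, q[1] + qdot[1] * dt]

q = [0.3, 0.8]        # initial joint angles [rad]
p_d = [3.0, 2.0]      # desired TCP position [m]
for _ in range(200):
    q = step(q, p_d)
```

Iterating the control cycle drives the TCP toward p_d; in the paper, the analogous joint-velocity references are tracked by the hydraulic joint controllers rather than integrated in simulation.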

Methods
The overall procedure for guiding the tool to an OOI using local calibration is to first orient the tool's torsion axis perpendicularly w.r.t. the OOI based on the visual measurements. Then, motion-based range adjustment is conducted in the local (planar) plane while the perpendicular configuration is maintained. Two methods for adjusting the range were examined: a line equation-based method and a trajectory matching-based method.
The line equation-based method utilizes a circular path and a line equation to compute the desired TCP position. The OOI is required to be visible to the camera at all times. This method relies on the alignment of the camera frame and the TCP frame, so that the calibration motion lies in the same plane in both frames. The depth parameter is not directly considered.
In contrast, the trajectory matching-based method employs visual odometry/simultaneous localization and mapping (VO/SLAM). VO/SLAM-based TCP pose tracking of a HDLR manipulator in a confined space was studied in [28]. Because the calibration motion is tracked by VO/SLAM rather than by observing the OOI, this method does not require the OOI to be visible during the calibration. This method is also more robust to camera placement, as the camera's rotation w.r.t. the manipulator's base frame is included in the computed calibration matrix. Furthermore, the depth parameter is considered, although the calibration path is planar.
The objective of the local calibration procedure is to obtain a reference pose x_ref ∈ R^6 for the TCP while the perpendicularity is maintained. Then, using the basic rigid-body-based kinematics and control methods presented in Section 2, the tool is driven to the desired OOI based on the positioning error measured directly from the image. The methodologies for adjusting the orientation and the range are detailed below.

Orientation adjustment
In the case of the ArUco markers, the goal was to orient the tool marker to match the OOI's orientation so that the last joint's torsion axis was perpendicular w.r.t. the OOI. The poses of the OOI frame C T_O and the tool frame C T_T are obtained from image-based computations. The rotation components are formulated as unit quaternions, r_O and r_T. Then, the rotation difference r between the frames is computed as the product of the two quaternions, with the tool frame quaternion inverted:

r = r_O ⊗ r_T^(-1).  (7)

The last joint's torsion axis is oriented in a perpendicular configuration w.r.t. the OOI frame by computing the manipulator's reference TCP pose x_ref in the form of a transformation matrix as follows:

T_ref = [R, p; 0, 1],  (8)

where R ∈ R^3×3 denotes the rotation matrix computed from the quaternion product in Equation (7), p is the current TCP position, and 0 = [0, 0, 0].
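The quaternion difference in Equation (7) can be sketched as follows; the (w, x, y, z) component order and the helper names are illustrative assumptions.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def quat_inv(q):
    """For unit quaternions, the inverse is the conjugate."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotation_difference(r_O, r_T):
    """Equation (7): r = r_O * r_T^-1, the rotation taking the tool frame onto the OOI frame."""
    return quat_mul(r_O, quat_inv(r_T))

def from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` about a unit `axis`."""
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0]*s, axis[1]*s, axis[2]*s)
```

When r_T already equals r_O, the difference is the identity quaternion (1, 0, 0, 0), and the orientation adjustment leaves the TCP orientation unchanged.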

Line equation-based method
In accordance with [29], the transformation matrix relating the base of the manipulator to the TCP frame comprises the unit orientation vectors (n, s, a) and the position vector p:

T = [n, s, a, p; 0, 0, 0, 1].  (9)

As discussed, the control problem in the OOI's vicinity is treated as a planar positioning problem. Thus, the local 2D plane for the range adjustment is defined by the unit orientation vectors, using the current pose of the manipulator resulting from the orientation adjustment.
For the line equation-based method, a circular path is designed using point-to-point motion in Cartesian 3D space as follows (componentwise):

p(φ) = p_init + r cos(φ) n + r sin(φ) s,  (10)-(12)

where r denotes the circle's radius, p_init = [p_1, p_2, p_3]^T denotes the initial TCP position, the unit orientation vectors n ∈ R^3 and s ∈ R^3 are extracted from Equation (9), and φ ∈ [0, 2π]. The first and last points of the path are set to the center of the designed circular path. The path is executed while holding the current TCP orientation. During the path execution, the image-based distance error between the tool and the OOI is maintained in metric form:

d_e = ‖p_x − p_y‖,

where p_x and p_y denote the metric positions of the detected tool and the OOI measured in the camera frame, respectively. After the path is completed, the minimum distance error d_e,min is obtained. The algorithm saves the initial pose of the manipulator (the circle's center) and the pose corresponding to the measured minimum distance error. The normalized unit vector pointing toward the OOI is then computed using the two points as follows:

û = (p_d,min − p_init) / ‖p_d,min − p_init‖.

The reference TCP pose x_ref ∈ R^6 is expressed using the transformation matrix in Equation (16), in which the current TCP orientation is held. The reference position is computed with the line equation using the initial position (the circle's center), the deduced directional unit vector, the circle's radius, and the minimum distance error between the tool and the OOI. The circular path and related variables are illustrated in Figure 1.
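The range-adjustment step above can be sketched numerically. This is a hedged paraphrase of the line equation: the reference position is extrapolated from the circle's center along the unit vector toward the pose of minimum image-based distance error; all numeric values are made-up examples, not experimental data.

```python
import math

def unit(v):
    """Normalize a 3-vector."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def reference_position(p_init, p_dmin, radius, d_min):
    """Line equation sketch: p_ref = p_init + (radius + d_min) * u,
    with u pointing from the circle's center toward the minimum-error pose."""
    u = unit([b - a for a, b in zip(p_init, p_dmin)])
    return [a + (radius + d_min) * ui for a, ui in zip(p_init, u)]

p_init = [4.0, 0.5, 1.2]   # circle center (initial TCP position), illustrative
p_dmin = [4.0, 0.7, 1.2]   # pose of minimum measured distance error, illustrative
p_ref = reference_position(p_init, p_dmin, radius=0.20, d_min=0.05)
```

The exact composition of the line equation is an assumption inferred from the text; the key point is that only the circle geometry and the scalar minimum distance error are needed, not the depth measurement.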

Trajectory matching-based method
Trajectory matching was explored as an alternative method for adjusting the range. This method is more complex but does not require detection of the OOI during the motion-based calibration. It is also not dependent on the orientation adjustment, although the perpendicular configuration is helpful when computing the pose error between the tool and an OOI. First, a point-to-point calibration path in planar space is designed using Equations (17)-(19), where [x_1, y_1, z_1]^T = B p_T, the unit orientation vectors n ∈ R^3 and s ∈ R^3 are obtained from Equation (9), the angle ξ is varied between 0 and π/4, and L_i denotes the length of the ith path segment. The path was designed to be asymmetric to improve the matching outcome.
The asymmetric path is executed, and two pose trajectories are saved: (i) the camera's pose trajectory is estimated using a VO/SLAM algorithm, and (ii) the TCP pose trajectory is obtained based on the kinematic model and the joint sensors. The camera-to-kinematic model transformation matrix is then obtained using robust point set matching. The overall formalism is detailed in [30] and has two steps: (1) coarse frame alignment is used to roughly orient the two point sets containing the pose trajectories, to avoid an incorrect (e.g. mirrored) matching result, and (2) fine matching is performed using probabilistic hybrid mixture model-based point set matching [31]. This method utilizes the full 6 DOF pose data in the trajectory matching procedure, whereas most algorithms utilize only the 3 DOF position data. However, other accurate point set matching methods should also suffice.
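The matching step can be illustrated with a much simpler stand-in: a planar least-squares rigid alignment (2D Procrustes) between a camera-frame trajectory and an encoder-based TCP trajectory. The paper uses a robust 6 DOF probabilistic method [30,31]; this closed-form 2D sketch only conveys the idea of recovering the camera-to-kinematic-model transform from two trajectories of the same motion.

```python
import math

def align_2d(src, dst):
    """Return (angle, tx, ty) such that R(angle) @ p + t maps src onto dst in least squares."""
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    # Cross- and dot-correlations of the centered point sets give the rotation angle.
    s_cross = s_dot = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        sx -= csx; sy -= csy; dx -= cdx; dy -= cdy
        s_cross += sx * dy - sy * dx
        s_dot   += sx * dx + sy * dy
    angle = math.atan2(s_cross, s_dot)
    c, s = math.cos(angle), math.sin(angle)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return angle, tx, ty

def apply_2d(angle, tx, ty, p):
    """Apply the recovered rigid transform to a point."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)
```

Note that an asymmetric trajectory matters even in this toy version: a symmetric point set admits multiple equally good alignments, which is exactly the ambiguity the coarse frame alignment in [30] guards against.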
After the point set matching sequence is completed, information measured in the camera frame can be expressed w.r.t. the manipulator's base frame. The locally calibrated position error between the tool and the OOI is obtained by rotating the camera-frame error with the matrices resulting from the matching (Equation (20)), where R_fm denotes the rotation matrix related to the fine matching, and R_cfa denotes the rotation matrix related to the coarse frame alignment. The transformation matrix C T contains the positioning error between the OOI and the tool frame, measured in the camera frame (Equation (21)), where b is a small offset to avoid collision between the tool and the OOI. Then, the reference TCP pose x_ref ∈ R^6 is obtained using the transformation matrix in Equation (22), in which the tool orientation is held, and the locally calibrated image-based positioning error p_lc ∈ R^3 is incorporated into the current position. While maintaining the local calibration, the reference TCP pose can be updated using the visual sensor and Equation (22) as often as required (within the limitations of the sensor's refresh rate). The principle of the trajectory matching-based method is illustrated in Figure 2.

Vision system
ArUco marker detection and pose estimation were realized using OpenCV functions, which employ monocular methods. The metric scale was obtained based on the known marker size. The experimental setup is illustrated in Figure 3. A single marker representing the tool was attached parallel to the torsion axis of the last joint of the manipulator. Additionally, the marker was set at approximately a 90° angle w.r.t. the axis. A three-marker setup was used to construct the OOI. Specifically, the OOI was in the middle of the three markers, and the OOI detection algorithm was designed so that one detected marker out of the three is sufficient to deduce the pose of the OOI.
The objective was to drive the tool marker to the OOI, with a depth offset to avoid collision. A low-cost ZED2 stereo camera was attached near the TCP and parallel to the torsion axis of the last joint. The left lens was used to detect the markers and estimate the poses. Therefore, the pose error between the tool marker and the OOI was known in the camera frame. As for the VO/SLAM utilized in Subsection 3.2.2, the open-source ORB-SLAM2 stereo algorithm [32] was employed. An adequate surface for the VO/SLAM algorithm's feature extraction was provided by a textured wall. A PC running ROS was used to handle the vision system, and the required information was sent to the manipulator's Beckhoff control system via UDP. The camera settings were set to 720p at 60 FPS, and it was assumed that the intrinsic parameters were known.

Hydraulic manipulator with a 3 DOF wrist
A laboratory-installed HDLR manipulator with an approximately 5 m reach was used in the experiments.
Using the DH convention, the forward kinematic model of the manipulator is formulated. The symbolic DH parameters are shown in Table 1, where 'Link 7' is a dummy frame used to unify the last frame with the tool marker's frame.
The rigid transformation matrix from the base of the manipulator to the TCP is computed as follows:

B T_T = T_1 T_2 ⋯ T_7,  (23)

where T_i denotes the joint-specific transformation matrices, formulated using the respective DH parameters of each joint as follows:

T_i = [cθ_i, −sθ_i cα_i, sθ_i sα_i, a_i cθ_i; sθ_i, cθ_i cα_i, −cθ_i sα_i, a_i sθ_i; 0, sα_i, cα_i, d_i; 0, 0, 0, 1],  (24)

where s = sin, and c = cos. After the desired pose x_ref for the TCP was obtained using Equation (8), (16), or (22), point-to-point paths were generated using a quintic polynomial [33]. Open-loop visual control (looking and then moving) was utilized because, for HDLR manipulators in dynamic environments, safety is a significant concern and cameras are subject to robustness and reliability issues in such conditions; only the joint controllers work in closed loop. Closed-loop visual control (looking and moving) is also possible through continuous use of the camera feed, but it requires constant and reliable vision of the tool and the OOI. By performing several 'looking and then moving' steps, the open-loop method can provide the same final positioning accuracy as the closed-loop method. The control system is described in Figure 4. For joint control, Equation (6) was utilized. For the two joints contributing the most (lift and tilt), PT-1 control was used, for which the transfer function is written as follows:

G(s) = K_p / (1 + τs),  (25)

where K_p is the proportional gain, and τ is a time-delay term, which enables larger gain values compared to pure P-control. This reduces static errors when positioning to a specific target point.
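The per-joint DH transform and its chaining can be sketched as follows; the parameter tuples below are placeholders, not the actual entries of Table 1.

```python
import math

def dh_transform(theta, d, a, alpha):
    """Standard DH homogeneous transform for one joint (s = sin, c = cos)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca,  st * sa, a * ct],
            [st,  ct * ca, -ct * sa, a * st],
            [0.0,      sa,       ca,      d],
            [0.0,     0.0,      0.0,    1.0]]

def mat_mul(A, B):
    """4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def fk_chain(dh_rows):
    """Chain the joint transforms: T = T_1 T_2 ... T_n."""
    T = [[float(i == j) for j in range(4)] for i in range(4)]
    for row in dh_rows:
        T = mat_mul(T, dh_transform(*row))
    return T
```

For example, chaining two joints with zero angles and link lengths a = 1 yields a pure translation of 2 m along the base x-axis, as expected for a stretched-out planar chain.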

Orientation adjustment
The first step in each measurement was to perform the orientation adjustment as in Subsection 3.1. An example result is illustrated in Figure 5, which shows the image-based rotation errors between the tool marker and the OOI. The errors are expressed as XYZ Euler angles for clarity. As shown, two of the errors are successfully driven close to zero by using a single control input. The objective was to achieve zero orientation errors so that the tool's torsion axis is perpendicular w.r.t. the OOI. The perpendicularity is important in the peg-in-hole task and for the line equation-based range adjustment. The measurements were performed repeatedly, with the error behaviors resembling the illustrated example. In general, the orientation signals were not of good quality, and the initial rotation difference was assumed to be small. The orientation adjustment also assumed the alignment of the tool marker's frame and the TCP frame based on the kinematic model. In the experimental setup, some errors existed in this alignment.

Line equation-based method
After the orientation adjustment, the line equation-based range adjustment was initiated by executing the circular path. The circular path in Equations (10)-(12) was realized using 18 points, with the first and last points set at the center of the circle. The radius was set to 20 cm. After the path was successfully completed, the reference TCP pose was obtained using Equation (16). This method was designed as a single point-to-point problem, as it is expected that the OOI will be occluded toward the end of the final approach.
An example execution of the circular path and driving to the reference TCP pose is illustrated in Figure 6. The black line represents the reference path, whereas the red line shows the completed path. Generally, this was the typical result achieved with the experimental setup. As shown, the completed path clearly resembles a circle; however, it is not completely planar, as required to compute the line equation-based range adjustment as precisely as possible.
The image-based position errors between the tool marker and the OOI are illustrated in Figure 7. Notably, this method did not account for the depth parameter (Z-axis), yet it is still shown. The X-axis and Y-axis positioning errors, estimated by the camera, were reduced to a range of a few centimeters. The measurements were performed repeatedly, and the results are documented in Table 2, which also shows the image-based absolute rotation errors resulting from the previous orientation adjustment (the first three columns of Table 2 show the image-based absolute rotation errors after the orientation adjustment, the next two columns show the respective absolute position errors after the range adjustment, and the last column documents the depth along the Z-axis between the tool marker and the OOI in each measured case). The mean absolute rotation errors between the tool marker frame and the OOI frame were reduced to less than 2°, except for the Z-axis, which contained more errors. The mean absolute positioning accuracy w.r.t. the XY plane was less than 3 cm per axis, which did not satisfy the desired sub-centimeter accuracy. In measurement 7, the desired accuracy of the sub-centimeter range was achieved; however, the result was clearly not reliably reproducible. Based on the results, no clear connection between a successful orientation adjustment and the resulting final positioning accuracy can be seen. However, the performance also relies on accurate execution of the circular path. A servo-type implementation variant was also tested by applying the currently measured variables to Equation (16) and then providing a second control input to the system. However, this did not improve the end result, as the dictating factor is the computed directional unit vector.

Trajectory matching-based method
For the trajectory matching-based range adjustment, the manipulator was first aligned with the OOI using the orientation adjustment. However, the trajectory matching-based method itself is not reliant on the orientation adjustment, unlike the line equation-based method.
The motion-based local calibration was conducted by first completing the planar path designed using Equations (17)-(19). The two pose trajectories were obtained using the VO/SLAM algorithm and the kinematic model-based TCP computation with joint encoders. The resulting point sets were matched, and the reference pose for the TCP was obtained using Equation (22). An example of the completed path is shown in Figure 8. The respective mean and maximum absolute errors after the point set matching in a single measurement are documented in Table 3. As shown, the matching errors are small, and the calibrated camera signals correspond to the encoder-based TCP pose. The results were similar for the measured cases. After obtaining the local calibration, the manipulator was driven to the computed reference pose. An example result is shown in Figure 9, which shows the image-based position errors between the tool marker and the OOI. The final positioning is also shown in Figure 10, where the tool marker was driven to the OOI. As with the line equation-based method, the goal was to achieve precise positioning with a single point-to-point control input, meaning the camera would look and then move only once. However, two control inputs were required for precise positioning: The first was given at approximately 8 s, which did not result in the desired sub-centimeter accuracy. The second control input was given at approximately 45 s, which reduced the errors to the desired range, with the exception of the depth variable. The depth had an offset of b = 10 cm in Equation (21) and is allowed more tolerance than the other axes, as positioning in the XY plane holds the most significance.
The measurements were performed repeatedly, starting from the orientation adjustment followed by the point set matching, and each measurement demonstrated uniform behavior, as shown in Table 4. Two updates of the visual control input were required in each measured case to achieve sub-centimeter position errors, meaning the camera was used to look and then move twice. For each case, the second update clearly resulted in sub-centimeter position errors, whereas with just one control update, the desired accuracy was not achieved, especially in the Y-axis direction. This suggests that the image-based pose estimation of the markers becomes more accurate as their distance to the camera is reduced, although the camera is relatively close initially.
Overall, the trajectory matching-based method performed much more reliably than the line equation-based method. The results demonstrate that sub-centimeter positioning accuracy can be achieved using a low-cost visual sensor, despite using the simple rigid-body kinematic modeling and controller structures that are prevalent in the industry. However, the trajectory matching-based method relies on the performance of the VO/SLAM pose estimation, which requires certain conditions, such as sufficient lighting and textured surfaces, for feature extraction. There exists some research on improving the performance of VO/SLAM in challenging environments, for example, [34].

Conclusion
In this paper, a vision-based guidance methodology for precise TCP positioning of HDLR manipulators was examined. For HDLR manipulators, the conventional extrinsic camera calibration procedure is not realistic or practical due to the dynamic, unstructured working environments and inaccuracies in rigid-body kinematic modeling. Instead, local calibration and visual guidance based on direct measurements in the camera frame were proposed. Notably, placing predefined calibration objects in the environment is not required. The presented methodology comprised orientation adjustment followed by range adjustment, for which two methods were explored. A key assumption is that the tool and the OOI are detectable using a camera in the eye-in-hand configuration, and that their 6 DOF poses can be estimated. ArUco markers were used as 'generic' representations of a tool and an OOI. However, for HDLR manipulators in dynamic environments, it is not a realistic option to place markers around the workspace. Thus, for a practical application, the vision-based detection and pose estimation of tools and OOIs have to be realized with application-specific parameters. This includes omitting the markers completely and shifting to real-world objects, which can be a challenging task depending on the complexity of the target objects, such as tools, tool racks, and drill holes. Real-time experiments were conducted using a HDLR manipulator with a 5 m reach, and a low-cost stereo camera was used for vision-based measurements. Open-loop visual control was employed by first looking and then moving. The motivation was to avoid closed-loop visual control due to safety concerns. Nevertheless, sub-centimeter positioning accuracy was achieved using the trajectory matching-based method. For the application of interest, the desired positioning accuracy is ±5 mm, as the tolerances are much larger than in industrial robotics. The line equation-based method suffered from accumulated errors, beginning from the hardware installation and orientation adjustment and extending to the path tracking. Thus, the positioning accuracy with this method showed considerable variation, and the desired accuracy was not reliably achieved in the experiments.
The main advantage of the proposed methods is that computing the pose error between a detected tool and an OOI directly in the image frame, using a low-cost camera, enables precise positioning accuracy. This is typically very challenging to achieve with HDLR manipulators due to their characteristic uncertainties. Moreover, the motion-based local calibration does not require calibration objects placed in the workspace. The main challenges lie in the robustness and reliability of vision-based algorithms. For example, VO/SLAM systems require certain conditions, and the object detection and pose estimation for complex objects can be challenging to realize. Future research should focus on extending the proposed locally calibrated vision-based guidance method into a practical application by replacing the ArUco markers with application-specific objects.

Figure 1 .
Figure 1. The principle of the line equation-based range adjustment.

Figure 2 .
Figure 2. The pipeline for the trajectory matching-based range adjustment.

Figure 3 .
Figure 3. The experimental setup. The OOI was in the middle of the three ArUco markers, and the marker attached at the end of the HDLR manipulator represented a detectable tool.

Figure 4 .
Figure 4. The control loop for the image-based reference pose. Image processing is conducted on a dedicated PC running ROS, whereas the rest of the process is handled on the Beckhoff control system. The numbers refer to the main related equations.

Figure 5 .
Figure 5. Image-based orientation errors between the tool marker and the OOI before and after the orientation adjustment, expressed in XYZ Euler angles. The control input was given at approximately 5.6 s, and the zero errors represent failed samples.

Figure 6 .
Figure 6. An example of the circular path and the computed line equation-based range adjustment. The black line shows the reference path, whereas the red line shows the completed path.

Figure 7 .
Figure 7. Image-based position errors between the tool marker and the OOI. The objective was to reach zero errors w.r.t. the X-axis and the Y-axis, but the depth along the Z-axis is also shown.

Figure 8 .
Figure 8. An example of trajectory matching. The green line denotes the reference path, the black line shows the encoder-based TCP position, and the red line illustrates the VO/SLAM-based camera position after the camera-to-kinematic model local calibration.

Figure 9 .
Figure 9. An example of image-based position errors before and after the trajectory matching-based range adjustment. Two control updates (manually input at approximately 8 s and 45 s) were required to achieve sub-centimeter position errors in the XY plane, with the depth axis error slightly larger.

Figure 10 .
Figure 10. An example of the final positioning result with the trajectory matching-based range adjustment.

Table 1 .
DH parameters of the HIAB033 with a 3 DOF wrist.

Table 2 .
The results for the line equation-based range adjustment. Columns: e_θX [deg], e_θY [deg], e_θZ [deg], e_X [mm], e_Y [mm], e_Z [mm].

Table 3 .
The mean and maximum absolute errors after point set matching.

Table 4 .
Image-based absolute positioning errors between the tool marker and the OOI using the trajectory matching-based method. Notes: Two updates of the visual control input were required in each measured case to achieve sub-centimeter position errors, meaning the camera was used to look and then move twice. The first three columns show the errors after the first update, and the last three columns show the respective errors after the second update.