Digital-twin deep dynamic camera position optimisation for the V-STARS photogrammetry system based on 3D reconstruction

ABSTRACT Photogrammetry systems are widely used in industrial manufacturing applications as assistance measurement tools. Not only do they provide high-precision feedback for assembly process inspection and product quality assessment, but they can also improve the flexibility and robustness of manufacturing systems and production lines. However, with growing global competition and demand, companies are forced to enhance production efficiency, shorten production lifecycles and increase product variety by adopting reconfigurable factory designs that can meet challenging timelines and requirements. Although dynamic facility layout is widely investigated, position selection for the photogrammetry system in dynamic manufacturing environments is usually overlooked. In this paper, the dynamic layout of the V-STARS photogrammetry system is investigated and optimised in a digital-twin environment using deep reinforcement learning. The learning objectives are derived from field-of-view (FoV) evaluation based on point-cloud 3D reconstruction, and from collision detection in the digital twin simulated in Visual Components. The application feasibility of the proposed dynamic layout optimisation of the V-STARS photogrammetry system is verified with a real-world industrial application.


Introduction
Photogrammetry-assisted measurement solutions are widely used in industrial manufacturing applications such as civil engineering (Schuch et al. 2019), additive manufacturing (Bortolini et al. 2020), aerospace (Yelles-Chaouche et al. 2020), experimental solid mechanics (Baqersad et al. 2017; Bortolini, Gabriele Galizia, and Mora 2018) and reverse engineering (Maganha, Silva, and Ferreira 2019). The advantages of the photogrammetry optical solution include its non-contact nature, fast data acquisition rates, large coverage of targets, high point density and high-precision feedback. Also, the photogrammetry system allows the rectification and rework demand to be dramatically reduced, as any deviation and error during production can be instantly detected.
A typical photogrammetry system consists of cameras and retro-reflective targets, and usually requires sophisticated preparation, such as software configuration, camera positioning, image collection and processing, and 3D reconstruction (Skřivanová and Melichar 2019). Although photogrammetry systems hold a large share of the industrial inspection market, automated and well-established inspection setups remain difficult to achieve. More specifically, the position of the photogrammetry cameras is one of the key factors that should be considered when using optical coordinate measurement devices (Zhang et al. 2021). Photogrammetry systems are sensitive to position selection in a work cell, and the quality of the results depends strongly on the coverage of the object surfaces and the precision of the measurement. Current studies on layout position optimisation for photogrammetry devices mainly focus on application-specific and fixed facility layout designs (Ahmadabadian et al. 2014; Barazzetti 2017; Rangel, Costa, and Loula 2019; Tarabanis, Allen, and Tsai 1995). However, the problem addressed in this paper arises when the layout of the work cell is reconfigurable or changes rapidly (Koren et al. 1999). As mentioned before, the configuration of photogrammetry systems often demands complex calibration and testing processes. Repeatedly adjusting the photogrammetry system in the real world, through hardware reintegration and software resetting, therefore requires tremendous labour. Hence, some of the approaches proposed in Zhang et al.
(2021), Rangel, Costa, and Loula (2019) and Barazzetti (2017) for position optimisation are not suitable for a dynamic layout design. Figure 1 shows the virtual V-STARS system in the digital twin. The V-STARS photogrammetry system provides accurate three-dimensional measurement in industrial manufacturing applications, and the virtual V-STARS system in the digital twin in Figure 1(a) duplicates the same FoV coverage as given in Figure 1(b).
In addition, the photogrammetry system is required to assist the manufacturing processes during the entire production horizon, which means that at multiple key timestamps, the photogrammetry system should have sufficient FoV for target object coverage (Mavrinac and Chen 2013). However, it is not easy to configure a photogrammetry system in a compact work cell, as shown in Figure 1(b). Not only should the target features fall in the FoV during the whole production horizon, but collisions among different facilities should also be avoided.
Moreover, most camera position optimisation approaches are designed using computer-aided design (CAD) product data and models (Bergström, Fergusson, and Sjödahl 2018; Sims-Waterhouse et al. 2017). However, throughout the assembly processes, products are handled by conveyors and manipulators. Given that the photogrammetry system is usually fixed during the production horizon, reduced visibility caused by manipulator blockage is highly likely. As in Figure 1(b), the manipulator blocks the visibility of the target frame. Hence, CAD data cannot fully represent a real-world work cell (Nakath et al. 2022).
In this paper, we propose a novel digital-twin-based deep optimisation framework for positioning the V-STARS photogrammetry system (as shown in Figure 1(a)) based on point-cloud 3D reconstruction. There are several reasons why deep reinforcement learning was selected for this application. Firstly, establishing a dynamic photogrammetry camera position model mathematically or from data sets is difficult in the virtual world, because the target object might be blocked by manipulators or other devices during the overall assembly process. Reinforcement learning is a good option as there is no established knowledge or data model, which is the building block required for any supervised-learning strategy.
In addition, for a novel reconfigurable manufacturing system, there is no prior knowledge or data for camera position optimisation. Thus, a feasible solution is to learn by interacting with an established digital twin in a trial-and-error manner using reinforcement learning. During optimisation, reinforcement learning can promote actions that lead to a whole sequence of good solutions and eliminate low-reward solutions.
Finally, PPO (Proximal Policy Optimization), DQN (Deep Q-Learning) and A2C (Advantage Actor-Critic) are well suited because the learning states are chosen as discrete. Note that the optimisation can also be extended to other discrete deep reinforcement learning algorithms such as HER (Hindsight Experience Replay) and QR-DQN, the latter of which builds on DQN and is derived from quantile regression, explicitly modelling the distribution over returns.
During the camera position optimisation, the V-STARS system is considered as an agent which explores the digital-twin environment established in Visual Components. Note that the digital twin is set up with the same FoV parameters (72 deg horizontal and 58 deg vertical) as the V-STARS cameras, as shown in Figure 1(a). Following a discrete-space deep reinforcement learning framework, the objectives are derived from collision detection and from coverage evaluation of the target frame surface based on 3D reconstruction.
The contributions are highlighted as follows: (1) the digital twin of a dual-camera V-STARS photogrammetry system, along with the reconfigurable manufacturing cell, is created with the camera FoV captured in a manufacturing simulation software package, Visual Components; (2) a coverage evaluation method for a given target object based on 3D reconstruction with a dual-camera V-STARS photogrammetry system is proposed, considering both FoV coverage and overlapping; (3) deep dynamic layout optimisation in the digital-twin environment is addressed with deep reinforcement learning algorithms, namely PPO, DQN and A2C; (4) experiments are carried out to demonstrate the application feasibility of the proposed deep dynamic layout optimisation for the V-STARS system.
The remainder of the paper is organised as follows: the related research is summarised in Section 2; the coverage evaluation for the V-STARS system FoV is presented in Section 3; the deep learning framework for layout optimisation is given in Section 4; finally, the evaluation is conducted in Section 5 and conclusions are drawn in the last section.

Related work
In this section, camera position optimisation is investigated in Section 2.1. In addition, the effect of camera position in reconfigurable manufacturing systems is discussed in Section 2.2. Finally, the research status of photogrammetry systems in dynamic layout optimisation and reconfigurable manufacturing systems is analysed in Section 2.3.

Camera position optimisation
In modern advanced manufacturing, photogrammetry systems are extensively used as measurement-assisted tools to improve efficiency and reduce manufacturing costs (Wang et al. 2020). Using a single camera or a pair of cameras is limited to specific machine vision applications and is usually insufficient for process inspection due to self-occlusion and a constrained FoV (Liu et al. 2019). As one of the key factors in photogrammetry applications, FoV coverage is often considered as an optimisation objective. Multiple camera viewpoints can also maximise FoV coverage, which can be realised with relative movement of a network photogrammetry system (Gai, Da, and Tang 2019). However, these methods are application-specific and, once the layout is configured, do not support further modification, which makes them inapplicable to reconfigurable manufacturing system use cases.
Position selection is not only required for large-scale metrology, but is also crucial in microscopic-scale manufacturing (Ren et al. 2019). To improve measurement accuracy, the characterisation of intrinsic and extrinsic camera parameters is discussed in the following works (Liu et al. 2019; Sun, He, and Zeng 2016; Xing, Yu, and Ma 2017). As pointed out in Barazzetti (2017), users who have access to photogrammetry systems often do not have much measurement experience and usually produce crude results at the cost of accuracy. Hence, an efficient solution that automatically optimises the position of the photogrammetry system is highly desirable.
The work in Mason and Grün (1995) and Nakath et al. (2022) addresses the configuration of a sensing system using a knowledge-based expert system for view planning based on CAD models. In Carrivick, Smith, and Quincey (2016), an evolving camera positioning method is proposed to find the optimal position of imaging networks for unmanned aerial vehicles. Instead of using a knowledge-based expert system, the evolving system applies a genetic algorithm to learn the best positions of the imaging network. In addition, the imaging network system proposed in Ahmadabadian et al. (2014) investigates the optimisation problem in four steps: datum definition, optimal distance derivation, viewpoint generation, and finally clustering and selection. The designed system depends on the initial target geometry obtained from a structured-light projection technique.
Other approaches address photogrammetry camera positions using evolutionary algorithms with combined objectives of FoV coverage and viewpoint redundancy (Rangel, Costa, and Loula 2019). Furthermore, in Erat et al. (2019), a real-time online view planning method is discussed based on incoming-view coverage evaluation with a predefined coverage metric. This approach iteratively optimises a sparse network by importing additional camera views. Any such iterative approach usually requires a reasonable initial setting, and the computational expense would significantly impact the real-time measurement speed. In Zhang et al. (2021), the position of the photogrammetry system is optimised using a genetic algorithm based on visible-point analysis derived from the hidden point removal approach.
In summary, current photogrammetry position optimisation frameworks were proposed for fixed layout designs and specific applications, not for reconfigurable facility layouts. Although computer-aided techniques were studied in several publications, they were applied to a single product without considering the other facilities involved during the production horizon.

Effect of camera position in reconfigurable manufacturing systems
As pointed out in Telgen (2017), since reconfigurable manufacturing systems can be configured in many ways in response to different product families, crucial factors including camera position should be taken into consideration. Across different production schemes, the working area usually varies and the target objects are commonly three-dimensional. This makes it more complex to analyse camera visibility of the targets, which could be rotated, upside down or completely out of the camera FoV after reconfiguration.
In Urgo et al. (2016), a zero-point fixture system was designed as a reconfiguration enabler for flexible manufacturing. For measurement and verification of different pallet products, camera visibility of the products was ensured by positioning the product within the viewing volume defined by the camera FoV and the laser fan angle. Although the importance of camera positioning was pointed out, the work mainly focused on single-trial measurement with no further investigation into camera positioning. In the recent work of Wang et al. ('Adaptive, Repeatable and Rapid' 2022) and Wang et al. ('Development of An Affordable' 2022), an adaptive and highly repeatable reconfigurable assembly system and process for a small-box product family were proposed and tested. In this process, a reconfigurable tooling system is designed to support the assembly of winglets, rudders and elevators. Using a photogrammetry system, the repeatability of reconfigurable fixture assembly was realised within ±0.04 mm. However, the authors pointed out that 'although the photogrammetry system is time-efficient in measurement of large-volume point cloud, the measurement accuracy relies heavily on its field of view' (Wang et al. 'Adaptive, Repeatable and Rapid' 2022).
In Martin (2018), a reconfigurable test execution system based on image processing was developed for a series of infrared ceramic heating elements. Although it achieved an accuracy within 500 microns, the measurement repeatability and precision were significantly influenced by camera placement and ambient lighting conditions. Considering the number of singulation units and the pallet exchange time, the objective in Steed (2016) was to develop a simulation-based method for evaluating two control configurations, namely fixed camera and eye-in-hand, in a robotic assembly cell. However, the evaluation was application-specific and camera position optimisation was not addressed. In order to effectively capture human motions within a workstation environment, camera disposition for different views (top, bottom, left and right side) was discussed in Faccio et al. (2019), considering the camera field of view. However, no optimisation was carried out with regard to the camera position in the workstation environment. Similar work was done in Bortolini et al. (2018) for capturing operator movement and gestures in manufacturing activities using optical motion capture technology. Although this work suggested that the camera position of the motion analysis system should be carefully selected to maximise coverage and acquisition precision, only the ideal configuration of the motion analysis system was introduced, with no optimisation approach discussed.
In Drouot et al. (2018), a Nikon K-CMM camera was used in a reconfigurable wing assembly cell to control robotic handling and correct robot kinematic inaccuracy. This embedded K-CMM photogrammetry system achieved an absolute positioning accuracy better than ±0.1 mm (Sanderson et al. 2019). In Xia et al. (2018), a global calibration for multiple cameras based on non-overlapping fields of view and reconfigurable targets was proposed for various vehicle outline detection tasks. The calibration method aims to achieve large-scale vision and high-precision measurement (RMS 0.04 mm). However, the joint benefit of flexibility and accuracy would be impossible without a proper camera setup, yet neither of these works mentions any camera positioning approach.
In summary, metrology-assisted approaches are commonly applied for various purposes in flexible/reconfigurable manufacturing systems. Although most of the literature focuses on a single configuration or application, the importance of camera position and the corresponding calibration effort is widely recognised. Yet there is no systematic approach for the disposition of camera-based metrology systems, and effort in optimising camera position is very limited.

Research status of photogrammetry systems in dynamic layout optimisation and reconfigurable manufacturing systems
According to surveys of dynamic facility layout in reconfigurable manufacturing systems (Benitez, Da Silveira, and Fogliatto 2019; Hosseini-Nasab et al. 2018; Pérez-Gosende, Mula, and Díaz-Madroñero 2021), the issues addressed generally include materials handling cost, rearrangement cost, construction cost, flow distance, flow path length, transport time and workflow. However, camera position has not been investigated as a dynamic facility layout problem, nor in reconfigurable manufacturing systems.
Although camera position optimisation has been investigated for surveillance applications in Piciarelli et al. (2015) and Piciarelli and Luca Foresti (2020), those works mainly focus on FoV coverage maximisation (Asaamoning et al. 2021; Suresh, Narayanan, and Menon 2020) without considering measurement accuracy. For object identification, Nuger and Benhabib (2018) proposed a camera reconfiguration method to detect unknown objects, paying special attention to system latency and recognition approaches. However, these applications do not need to be rapidly reconfigured. In a reconfigurable manufacturing system, not only is good measurement accuracy required, but cameras must also be rapidly adjusted between configurations of different products or product families. Between configurations, robot arm movements, target objects and their locations vary, which makes the optimisation of photogrammetry cameras a dynamic layout problem. Therefore, an automated optimisation framework for camera positions in reconfigurable manufacturing systems is desired.
In summary, in dynamic layout optimisation and facility design of reconfigurable manufacturing systems, the impact of photogrammetry devices is usually neglected. Although the importance of the camera position is highlighted (Płowucha, Jakubiec, and Wojtyła 2016), there is no general or established approach for the issue, which is also one of the major barriers to photogrammetry system implementation and integration in industry (Zhang et al. 2021).

Target-object coverage evaluation based on 3D reconstruction
This section evaluates the target-object coverage based on 3D reconstruction. The digital twin of the V-STARS photogrammetry system, as well as the whole work cell, is presented in Section 3.1. Furthermore, the coverage evaluation procedure, including coarse-to-fine registration and 3D reconstruction, is given in Sections 3.2 and 3.3, respectively.

Digital twin modelling and reconfigurable manufacturing work cell
The digital twin of a work cell is the virtual representation of the physical facility layout. Throughout the whole production lifecycle, the digital twin provides the same digital information as the physical factory. Digital-twin models can be established with software packages such as Process Simulate, Gazebo and Visual Components. In this paper, the digital twin of the overall work cell is created in Visual Components, as given in Figure 2, duplicating the real-world facilities in the profile-board assembly work cell. As presented in Figure 2(c), there are several key resources in the work cell, such as a tool stand, two Kuka robots, the target frame and the profile-board storage rack (profile boards highlighted in yellow and the rack in cyan, respectively). The aim of this assembly cell is to pick and place three profile boards from the storage rack onto the target frame. The V-STARS photogrammetry system consists of two 3D cameras, as given in Figure 2(a,c). Correspondingly, their virtual digital twins are modelled in Figure 2(c) with the same FoV parameters (72 deg horizontally, 58 deg vertically). According to the FoV illustration in Figure 1(a), the system inspects the whole assembly process during the production horizon, as presented in Figure 2(a). In return, the virtual FoV simulation captures 3D data as point clouds and outputs them as ASCII data consisting of point positions and their corresponding colours.
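The ASCII export described above can be loaded with a few lines of code. The exact column layout of the Visual Components export is not specified in the text; the sketch below assumes one point per line in the form "x y z r g b".

```python
import numpy as np

# Parse an ASCII point-cloud export into position and colour arrays.
# The "x y z r g b" column order is an assumption, not a documented
# Visual Components format.
def load_ascii_cloud(lines):
    rows = [list(map(float, ln.split())) for ln in lines if ln.strip()]
    data = np.asarray(rows)
    return data[:, :3], data[:, 3:6]   # positions (N,3), colours (N,3)
```

In practice the positions feed the registration pipeline of Section 3.2, while the colours are only used for visualisation.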
The assembly processes are performed within a multi-product reconfigurable work cell with a supporting reconfigurable tooling system similar to the one presented in Wang et al. (2022), as given in Figure 3(a). With the reconfigurable fixture and tooling system, products of similar size and build philosophy, such as winglets, rudders and elevators, can be assembled through similar processes. Even within a product type, for example the winglets of the Airbus A330 and Boeing 777, the different specifications (dimensions, spar/rib locations, skin thickness, aerofoil profile, etc.) are also accounted for via the tooling system.
The fixture frame and the cell can be reconfigured for different product requirements. The V-STARS photogrammetry system is used for accurately positioning the reconfigurable components on the target frame. The frame is located on an AGV to provide jig mobility and increase reconfigurability. The robot can be positioned anywhere on its grid base plate. The frame orientation (front/back side) can change between products. Different tooling storage to facilitate robot pick-and-place can also be included in, or removed from, the cell. All reconfiguration and assembly processes require good camera visibility of the frame. However, given different product assembly schemes, the FoV of the V-STARS system must be optimised so that it covers the target frame throughout the production horizon to meet accuracy demands. This paper aims to obtain an optimal camera position using the proposed digital-twin-based deep optimisation framework and to validate it in physical experiments, as shown in Figure 2.
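The visibility requirement above reduces, per camera, to a frustum-containment test. A minimal sketch using the FoV angles quoted in the text (72 deg horizontal, 58 deg vertical); the function name and the camera-frame convention (z axis pointing forward) are illustrative assumptions, not part of the V-STARS API.

```python
import math

H_FOV_DEG = 72.0  # horizontal FoV from the text
V_FOV_DEG = 58.0  # vertical FoV from the text

def in_fov(x, y, z, h_fov_deg=H_FOV_DEG, v_fov_deg=V_FOV_DEG):
    """True if the point (x, y, z), in camera coordinates with z forward,
    lies inside the rectangular FoV frustum."""
    if z <= 0:  # behind the camera plane
        return False
    half_h = math.radians(h_fov_deg) / 2.0
    half_v = math.radians(v_fov_deg) / 2.0
    return abs(math.atan2(x, z)) <= half_h and abs(math.atan2(y, z)) <= half_v
```

A target-frame feature is visible to the dual-camera system only if this test passes for at least one camera and no occluder sits on the viewing ray, which is what the 3D-reconstruction-based evaluation of Section 3 accounts for.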

Coarse-to-fine registration
In this paper, two camera positions were investigated, and the two respective views (in the form of point clouds) are combined into a globally consistent model by point-cloud registration. Before registration, two respective standard templates were obtained from the STL model via the digital twin, as given in Figure 4(a). Then, the point cloud obtained from each camera is registered with its corresponding standard template. For example, the point clouds shown in Figure 4(c) are the collection of the standard template in Figure 4(a) and the new camera input in Figure 4(b). Given that the global position of the standard template is known, once the new camera input is aligned, the relative transformation can be solved.
The registration process designed in this paper is divided into two steps. Firstly, the coarse registration of the two point clouds is conducted with fast global registration, whose error objective function is defined as

E(T, L) = Σ_{(p,q)∈S} [ l_{(p,q)} ‖p − Tq‖² + ψ(l_{(p,q)}) ]    (1)

where p and q are matching points satisfying p ∈ P and q ∈ Q, respectively, T is the transformation matrix and L = {l_{(p,q)}} is the set of line-process coefficients. The collection of correspondences is defined as S = {(p, q) | p ∈ P, q ∈ Q}. In addition, the prior ψ(l_{(p,q)}) is denoted as

ψ(l_{(p,q)}) = μ(√l_{(p,q)} − 1)²    (2)

so that setting the partial derivatives of Equation (1) with respect to l_{(p,q)} to zero yields an analytic coefficient update. The optimisation is implemented by alternately optimising the coefficients l_{(p,q)} and the transformation matrix T. While optimising the coefficients, the transformation matrix is locked; the solution for the coefficients is analytical and Equation (2) must be satisfied (Zhou, Park, and Koltun 2016). Similarly, for the optimisation of the transformation matrix, the prior term is fixed and the error function in Equation (1) reduces to a least-squares objective, which can be solved with the Gauss-Newton method.
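The closed-form coefficient step of this alternation can be sketched directly. Zeroing the partial derivative of Equation (1) with respect to l_(p,q), with T fixed, gives l_(p,q) = (μ/(μ + ‖p − Tq‖²))² (Zhou, Park, and Koltun 2016); μ is the scale parameter of the Geman-McClure prior. Function and variable names below are illustrative.

```python
import numpy as np

# Closed-form line-process update for fast global registration:
# with the transform fixed, each correspondence weight is
# l = (mu / (mu + r^2))^2, where r is the current residual.
def line_process_weights(p, q_transformed, mu):
    residuals_sq = np.sum((p - q_transformed) ** 2, axis=1)
    return (mu / (mu + residuals_sq)) ** 2

# Inlier pairs keep weights near 1; outliers are driven towards 0, so
# they barely influence the subsequent Gauss-Newton step for T.
p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
q = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])  # second pair is an outlier
w = line_process_weights(p, q, mu=1.0)
```

This automatic down-weighting is what lets fast global registration run without an explicit outlier-rejection pass.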
The initial correspondence set S is generated with the fast point feature histogram (FPFH). This method is based on the point feature histogram (PFH) and relies on 3D points and their surface normals. Given a point p, its neighbours within a sphere of radius r are selected to generate point pairs. For each point pair (p_i, p_j) with indices i, j, the Darboux frame is denoted as

u = n_i,  v = u × (p_j − p_i)/‖p_j − p_i‖,  w = u × v

where n_i and n_j are the corresponding normals of points p_i and p_j. The angular variations of the normals n_i and n_j are

α = v · n_j,  φ = u · (p_j − p_i)/‖p_j − p_i‖,  θ = arctan(w · n_j, u · n_j)    (3)

Hence, the FPFH for point p can be given as

FPFH(p) = SPF(p) + (1/M) Σ_{m=1}^{M} (1/d_m) SPF(p_m)

The term SPF represents the simplified point feature, which can be obtained from the angular variations in Equation (3). M is the number of neighbours considered for point p, and d_m is the distance between the m-th neighbour p_m and the query point p.
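The angular variations of Equation (3) for a single point pair can be computed as below. This is a sketch of the Darboux-frame construction only, not the full histogram binning used by optimised FPFH implementations such as PCL's; the function name is illustrative.

```python
import numpy as np

# Simplified point feature (SPF) angles for one point pair (p_i, p_j)
# with normals n_i, n_j, following the Darboux frame u = n_i,
# v = u x d_hat, w = u x v, where d_hat is the unit vector p_j - p_i.
def spf_angles(p_i, n_i, p_j, n_j):
    d_vec = p_j - p_i
    d = np.linalg.norm(d_vec)
    u = n_i / np.linalg.norm(n_i)
    v = np.cross(u, d_vec / d)
    w = np.cross(u, v)
    alpha = float(np.dot(v, n_j))                      # alpha = v . n_j
    phi = float(np.dot(u, d_vec / d))                  # phi = u . d_hat
    theta = float(np.arctan2(np.dot(w, n_j), np.dot(u, n_j)))
    return alpha, phi, theta, d
```

Histogramming these three angles over a point's neighbourhood, then blending with the neighbours' histograms weighted by 1/d_m, yields the FPFH descriptor used to propose correspondences.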
In order to implement the coarse registration, the point clouds are down-sampled as shown in Figure 4(c). In the feature extraction stage, the normals presented in Figure 4(d) are estimated, and FPFH features are computed using a KD-tree search (Greenspan and Yurick 2003). The two point clouds (standard template and camera input) can then be aligned with the RANSAC algorithm, as indicated in Figure 4(e).

3D reconstruction and coverage evaluation
After the coarse alignment using fast global registration, the point clouds still need to be refined, starting from the initial estimate obtained previously. As indicated in Figure 5(a), they are further optimised with point-to-plane iterative closest point (ICP) registration, whose error objective function is given as

E(T) = Σ_{(p,q)∈S} ((p − Tq) · n_p)²

with n_p being the normal of point p derived from the normal estimation in the feature extraction step. Given that the global information of the standard templates is known, the two aligned input point clouds can be used to reconstruct the overall work cell, as given in Figure 5(b). Note that, during 3D reconstruction, the combined point clouds need to be re-sampled for frame point-cloud extraction. After that, DBSCAN (density-based spatial clustering of applications with noise) (Khan et al. 2014) is used to divide the collected point cloud of the whole work cell into several clusters, as given in Figure 5(c), given the geometry information from the STL file. As shown in Figure 5(d), the target frame can be identified from the clusters according to geometry information such as its maximal and minimal boundaries.
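The point-to-plane error above can be evaluated in a few lines; residuals are measured along the template normals rather than as raw Euclidean distances, so sliding along a flat surface costs nothing. A sketch of the error term only, not of the full ICP iteration.

```python
import numpy as np

# Point-to-plane ICP error: sum over correspondences of the squared
# projection of (p - Tq) onto the template normal n_p.
def point_to_plane_error(p, q_transformed, n_p):
    residuals = np.einsum('ij,ij->i', p - q_transformed, n_p)
    return float(np.sum(residuals ** 2))
```

Minimising this error over T (typically with a small-angle linearisation) converges faster on smooth surfaces than point-to-point ICP, which is why it is the standard choice for the fine-registration stage.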
Nevertheless, it is still difficult to evaluate the FoV coverage of the target frame, as the point cloud is not directly related to FoV coverage. In this paper, the hidden point removal technique is applied to remove overlapping points behind the front surface. The hidden point removal algorithm consists of two steps: point inversion and convex hull computation. Point inversion maps all points p_i ∈ P inside a bounding sphere to the outside of that sphere and is mathematically defined as

F(p_i) = p_i + 2(R_s − ‖p_i‖) p_i/‖p_i‖

with F(p_i) being the inverted coordinates of p_i and R_s being the bounding sphere radius. Then, the convex hull computation distinguishes the visible points from the hidden points. The FoV coverage of the target frame is evaluated according to the number of visible points in the frame point-cloud cluster, as indicated in Figure 5(e).
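The inversion step can be sketched directly from the equation above (points expressed relative to the viewpoint at the origin). After inversion, points closer to the viewpoint land farther from the origin, so the visible points are exactly those on the convex hull of the inverted set plus the viewpoint; the hull computation itself is left to a library.

```python
import numpy as np

# Spherical inversion step of hidden point removal:
# F(p_i) = p_i + 2 (R_s - ||p_i||) p_i / ||p_i||,
# which maps a point at distance d from the viewpoint to distance 2*R_s - d.
def invert_points(points, radius):
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    return points + 2.0 * (radius - norms) * points / norms
```

Because nearer points invert to larger radii, a point occluded by a closer surface ends up inside the hull of the inverted cloud and is correctly classified as hidden.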

Deep dynamic camera position optimisation of V-STARS system
The camera position optimisation approach for the photogrammetry system is introduced in this section. Firstly, the learning objectives are described in Section 4.1. Then, the proximal policy optimization framework for optimal position selection is presented in Section 4.2.

Learning objectives
Two key considerations were included in the optimisation process, namely collision and FoV coverage. Therefore, the learning objectives are formulated to detect collision and evaluate FoV coverage. Since a dual-camera system is used, FoV overlap between the two respective views is another learning objective. Hence, in this subsection, three indicators are deployed to optimise camera positions.
Collision-free indicator p_cd(ξ_n): firstly, the facilities located in the work cell and their working trajectories should be free from collision. Traditionally, collision is investigated as a 2D layout problem (Bortolini, Gabriele Galizia, and Mora 2018; Guo, Jiang, and Yang 2022; Pérez-Gosende, Mula, and Díaz-Madroñero 2021) examining the footprint of each resource; other publications approximate resources as rectangles or irregular shapes. In the proposed approach, collision is detected by digital-twin simulation. Given that collision can occur at any time during robot movement, it is monitored for the entire assembly process. This is achieved with the collision detection functionality in Visual Components, as indicated in Figure 6. When a collision occurs, the clashing parts are highlighted in yellow and a true boolean signal is returned to the detection monitor, which is inverted to serve as the collision-free indicator in this paper. Its penalty is denoted as

p_cd(ξ_n) = −v_0 if V_d ≠ ∅, and 0 otherwise

with ξ_n being the position of the photogrammetry camera and v_0 being a penalty value defined by experiments. V_d is the collection of detected collisions. As long as V_d is not empty, a penalty is triggered.
FoV coverage indicator p_FoV(ξ_n): in addition, the camera position is evaluated for target coverage as detailed in Section 3. After removing hidden points, the FoV coverage is obtained by counting the points of the 3D-reconstructed target frame. Its reward can be defined as

p_FoV(ξ_n) = r_FoV

where r_FoV is the number of visible points in the target frame point cloud.
FoV overlapping indicator p_ovl(ξ_n): finally, due to the space limitations of the work cell, the two respective FoVs on the target object cannot fully overlap, which is usually the case for photogrammetry applications in manufacturing. However, in order to obtain high-quality measurements, the two respective FoVs must have enough overlapping coverage of the object. In this paper, the third learning objective is the overlapping indicator p_ovl(ξ_n), which is defined as

p_ovl(ξ_n) = (M_l + M_r − M_R)/M_R

where M_l, M_r and M_R are the numbers of points derived from the left camera, the right camera and the 3D reconstruction, respectively. Note that these three point clouds are down-sampled using the same density parameters.
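The three indicators can be combined into a single scalar step reward as sketched below. The indicator inputs and weights are placeholders (in the paper they come from the Visual Components collision signal and the reconstructed point counts), and the overlap term assumes the inclusion-exclusion count M_l + M_r − M_R normalised by M_R, since the three clouds share the same sampling density; that normalisation is an assumption, not stated in the text.

```python
# Combine the collision-free, FoV-coverage and FoV-overlap indicators
# into one step reward. All weights and the penalty v0 are placeholder
# values to be tuned by experiment.
def step_reward(collision, n_visible, m_l, m_r, m_r3d,
                w_cd=1.0, w_fov=1.0, w_ovl=1.0, v0=100.0):
    p_cd = -v0 if collision else 0.0        # collision-free indicator
    p_fov = float(n_visible)                # FoV coverage indicator
    p_ovl = (m_l + m_r - m_r3d) / m_r3d     # assumed overlap ratio
    return w_cd * p_cd + w_fov * p_fov + w_ovl * p_ovl
```

In the learning loop this value is returned once per simulated assembly run, so a colliding layout is penalised regardless of how much of the frame it sees.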

Deep reinforcement learning
The camera positions for the V-STARS photogrammetry system are optimised using proximal policy optimization (PPO) (Schulman et al. 2017). The two camera positions are embedded into a state vector s_t = [x_t^l, y_t^l, x_t^r, y_t^r], in which t is the time step, x and y are the exploration coordinates, and l and r denote the left and right cameras. The exploration action a_t = [a_xt^l, a_yt^l, a_xt^r, a_yt^r] is chosen from a discrete incremental action set, in which each component moves the corresponding coordinate by a fixed increment. Thus, the reward of an exploration step is calculated as

r_t = w_cd · p_cd(ξ_n) + w_FoV · p_FoV(ξ_n) + w_ovl · p_ovl(ξ_n)    (11)

where w_cd, w_FoV and w_ovl are the weight parameters of collision detection, FoV evaluation and overlapping, respectively. Therefore, the overall reward depends on the two camera positions, and the weight parameters are defined before optimisation.
Besides the three indicators, the learning objective should also include a penalty if the camera position is outside the constrained area. Therefore, we rewrite the reward of Equation (11) as

r_t = w_cd · p_cd(ξ_n) + w_FoV · p_FoV(ξ_n) + w_ovl · p_ovl(ξ_n) + w_a · p_a(ξ_n)    (12)

where p_a(ξ_n) is the additional penalty for position constraints and w_a is the corresponding weight. PPO is a policy gradient method with an actor-critic structure. The actor maps observations to actions by collecting a batch of trajectories T_k = {τ_i} from the latest version of the stochastic policy π(θ_k). In addition, the reward-to-go R(s_t, a_t) as well as the advantage estimate Â_t are calculated for updating the policy according to the clipped surrogate objective

L(θ) = E_t[ min(ρ_t(θ) Â_t, clip(ρ_t(θ), 1 − ε, 1 + ε) Â_t) ],  ρ_t(θ) = π_θ(a_t|s_t)/π_θk(a_t|s_t)    (13)

which is optimised via stochastic gradient ascent with Adam (adaptive moment estimation). The advantage function can be estimated using generalised advantage estimation, written as

Â_t = Σ_{l=0}^{∞} (γλ)^l δ_{t+l}^V,  δ_t^V = r_t + γV(s_{t+1}) − V(s_t)    (14)

Regarding the value function, it is fitted by minimising the mean-square error

φ_{k+1} = arg min_φ Σ_{τ∈T_k} Σ_t ( V_φ(s_t) − R(s_t, a_t) )²    (15)

Like all reinforcement learning applications, the proposed camera position optimisation framework requires a learning environment. As detailed in Section 3.1, the digital twin is established in Visual Components. However, regarding the reinforcement learning setup, several settings need to be explained in this section.
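Equation (14) can be computed with a single backward pass over a trajectory. A standard GAE sketch; the discount γ and smoothing λ values are the usual defaults, not parameters reported in the text.

```python
import numpy as np

# Generalised advantage estimation: exponentially weighted sum of TD
# residuals, accumulated backwards over one finite trajectory.
# `values` holds len(rewards)+1 entries (bootstrap value appended).
def gae(rewards, values, gamma=0.99, lam=0.95):
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting λ = 1 recovers the Monte-Carlo advantage and λ = 0 the one-step TD residual; intermediate values trade bias against variance in the policy-gradient update.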
Firstly, according to reinforcement learning theory, the learning environment consists of a digital-twin level and an optimisation level, as given in Figure 7. During each learning step, an action is generated at the optimisation level and then sent to the digital-twin level. Consequently, the learning environment implements the virtual assembly processes and feeds the reward back to the optimisation-level algorithm.
Secondly, the optimisation framework is designed based on the Gym Env class (Brockman et al. 2016). Following the template of Gym Env, the step function is used to trigger an implementation in the learning environment, and the reset function is applied to restore the initial setting. Correspondingly, the digital-twin level responds to the commands (reset and step) sent from the optimisation level. Given the learning objectives, three penalties, i.e. collision detection, FoV overlapping and FoV coverage evaluation, are combined as the reward for each learning step.
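A stripped-down sketch of this environment template (class and attribute names are hypothetical, and the digital-twin feedback is stubbed out; the real implementation subclasses `gym.Env`):

```python
class CameraLayoutEnv:
    """Gym-style environment for the two-camera layout optimisation."""

    N_ACTIONS = 81  # 3^4 discrete incremental moves of the four coordinates

    def __init__(self, initial_state=(0.0, 0.0, 0.0, 0.0)):
        self._initial_state = list(initial_state)
        self.state = list(initial_state)

    def reset(self):
        """Restore the initial layout setting and return the observation."""
        self.state = list(self._initial_state)
        return list(self.state)

    def step(self, action):
        """Trigger one virtual assembly implementation in the digital twin.
        Here the twin is stubbed; the real reward combines the collision,
        FoV-coverage and FoV-overlapping penalties it feeds back."""
        reward = 0.0   # placeholder for the digital-twin feedback
        done = False   # episode termination is decided by the framework
        return list(self.state), reward, done, {}
```

Keeping to the `reset`/`step` contract is what lets off-the-shelf PPO, DQN and A2C implementations drive the digital twin unchanged.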
Finally, the connection between the optimisation framework and the digital-twin environment is realised by socket communication. Two ports, namely the action-sending port and the reward-receiving port, are designed in the overall framework. The action-sending port uses non-blocking mode in a try-except construct, which allows the digital-twin environment to respond to the optimisation command. In contrast, as the optimisation algorithm waits for the reward feedback from the digital-twin environment, its socket is configured in blocking mode, as indicated in Figure 7.
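The two-port design could be sketched as below, assuming a simple UDP exchange on hypothetical local ports (the actual message format between the framework and Visual Components is not specified here):

```python
import socket

def make_action_sender(host="127.0.0.1", port=50007):
    """Action-sending port: non-blocking, wrapped in try-except, so the
    optimiser never stalls while the digital twin is busy."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)

    def send(payload: bytes):
        try:
            sock.sendto(payload, (host, port))
        except (BlockingIOError, OSError):
            pass  # twin not ready; retry on the next learning step
    return send

def make_reward_receiver(host="127.0.0.1", port=50008):
    """Reward-receiving port: blocking, because the optimisation algorithm
    must wait for the reward feedback before its next update."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.setblocking(True)
    return lambda: sock.recvfrom(1024)[0]
```

The asymmetry mirrors the data flow: actions may be dropped and retried, but a learning step cannot proceed without its reward.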

Evaluation
The verification and experiments of our proposed dynamic camera position optimisation framework consist of three steps. Firstly, the V-STARS camera FoV comparison is detailed in Section 5.1. The lifecycle FoV coverage evaluation based on 3D reconstruction is given in Section 5.2. The overall digital-twin-based deep dynamic camera position optimisation is implemented in Section 5.3, along with a real-world demonstration.

FoV comparison
Two key camera parameters, namely the horizontal and vertical angles (H 72 deg., V 58 deg.), characterise the FoV. As the V-STARS system is set to multiple-camera mode, two D12 cameras are employed together to provide spatial information using vision triangulation. Therefore, although the two D12 cameras capture two-dimensional pictures, these can be transformed into three-dimensional measurements.
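From the two angles alone, the footprint of each camera's FoV at a given stand-off distance follows from basic trigonometry (a small illustrative helper, not part of the V-STARS software):

```python
import math

def fov_extent(distance, h_deg=72.0, v_deg=58.0):
    """Width and height of the FoV footprint at `distance` (same units),
    using w = 2 d tan(H/2) and h = 2 d tan(V/2)."""
    width = 2.0 * distance * math.tan(math.radians(h_deg / 2.0))
    height = 2.0 * distance * math.tan(math.radians(v_deg / 2.0))
    return width, height
```

For example, at a 3 m stand-off the D12 angles give a footprint of roughly 4.36 m by 3.33 m, which illustrates why two cameras can jointly cover a large target frame.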
In order to verify the application of the virtual cameras, an initial comparison between the pictures obtained from the V-STARS devices and the scanned point clouds from the virtual cameras is performed, as indicated in Figure 8. Note that both environments used identical configuration settings.
The picture captured by the right-hand side camera in the real world is given in Figure 8(a), whereas the point cloud scanned by the virtual camera is shown in Figure 8(c). As the manipulator occludes the frame, as highlighted with a green circle, the FoV coverage of the target frame in the virtual environment is also blocked by the robot. Similarly, visibility of the lower part of the frame is hindered by the second link of the manipulator in Figure 8(a), which is also indicated by the pink circle in Figure 8(c).
For the left-hand side camera, the less visible parts of the frame and the storage rack are marked with a brown circle (upper left corner), a red circle (upper right corner) and a blue circle (lower middle) in Figure 8(b,d). Given the FoV comparison in Figure 8, the V-STARS camera FoV can be effectively simulated by the configured virtual cameras.

Single lifecycle episode learning
After introducing the virtual learning environment, the single lifecycle episode learning is detailed in this subsection. Given that the digital twin accompanies and evolves with the physical work cell during its lifecycle, the camera position optimisation should consider the key sequential poses of the robot over the overall lifecycle.
The assigned process for this work cell is to repetitively pick and place the profile boards onto the target frame. During the assembly process, the V-STARS photogrammetry system spatially locates the frame and profile board positions by monitoring retro-reflective markers on the surfaces of the frame and profile boards, as given in Figure 2. As in all photogrammetry systems, the visible area, or FoV coverage, is crucial to the measurement accuracy. However, during the assembly processes, the target visibility might be obstructed by the motion of the manipulator. Therefore, it is crucial to give full consideration to the key sequential robot poses in the camera position optimisation.
Therefore, a single lifecycle episode learning is conducted as given in Figure 9, in which the learning processes are presented in four stages, i.e. point cloud acquisition, registration, reconstruction and frame extraction. As shown in the first column of Figure 9, the FoV coverage under five different robot poses is considered. The point clouds, captured from two virtual cameras in Visual Components, are first registered with the standard templates according to the FoV evaluation detailed in Section 3.2, and then the registration results are reconstructed into a new combined point cloud as detailed in Section 3.3. Subsequently, as presented in the second column of Figure 9, the point cloud captured by the left V-STARS camera is given in red, and the point cloud captured by the right V-STARS camera is shown in blue.
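As a toy illustration of the reconstruction stage only (not the registration pipeline of Sections 3.2 and 3.3), two already-registered point clouds can be merged into one combined cloud while collapsing near-duplicate points on a voxel grid:

```python
def merge_registered_clouds(cloud_left, cloud_right, voxel=0.01):
    """Combine two registered point clouds (lists of (x, y, z) tuples) into a
    single reconstruction, keeping one point per occupied voxel as a simple
    stand-in for down-sampling the overlapping region."""
    kept = {}
    for point in list(cloud_left) + list(cloud_right):
        key = tuple(round(c / voxel) for c in point)
        kept.setdefault(key, point)  # first point in a voxel wins
    return list(kept.values())
```

Points seen by both cameras fall into shared voxels, so the merged size reflects coverage rather than raw capture counts.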
Moreover, the clustering results are given in the third column of Figure 9, and the identified frame point clouds are shown in the fourth column. As shown in the fourth column of Figure 9, only part of the frame is captured by the two V-STARS cameras. The V-STARS photogrammetry system optimises the measurement based on bundle adjustment. Although the minimum number of points for the bundle adjustment is three, a higher number is recommended for an accurate measurement. This is why the FoV coverage needs to be maximised in photogrammetry applications.
In order to obtain a thorough understanding of the single episode learning, the point cloud information derived from the results in Figure 9 is presented in Table 1. Note that, as all points are resampled from the reconstructed point cloud captured from the frame surface, the number of points contained, i.e. the size of the point cloud, can be used as an indication of FoV coverage completeness. Hence, it is used for the FoV coverage evaluation given in Table 1. Additionally, the sum of the five 3D reconstruction point cloud sizes is taken as the FoV coverage indicator $p_{FoV}(\xi_n)$ according to Section 4.1, while the sum of the five overlapping point cloud sizes is defined as the overlapping indicator $p_{ovl}(\xi_n)$. It is important to note that the overlapping point cloud size is not proportional to the size of its corresponding 3D reconstruction, especially at robot poses (d) and (e). Hence, the overlapping information cannot be replaced by the 3D reconstruction. The overlapping parameter is extremely useful when the object measurement surface is flat, which is further detailed in the following subsection. With the complete single episode learning, the overall deep camera position optimisation framework is presented in the next section.
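Under this convention, the two lifecycle indicators reduce to sums of the per-pose point-cloud sizes (a trivial sketch; the variable names are ours):

```python
def lifecycle_indicators(reconstruction_sizes, overlap_sizes):
    """p_FoV is the summed size of the per-pose 3D reconstructions;
    p_ovl is the summed size of the per-pose overlapping clouds."""
    p_fov = sum(reconstruction_sizes)
    p_ovl = sum(overlap_sizes)
    return p_fov, p_ovl
```

Summing over the key robot poses is what makes the reward a lifecycle quantity rather than a single-snapshot one.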

Deep camera position optimisation of the V-STARS photogrammetry system
Given the virtual learning environment in Section 4.2 and the single episode learning in Section 5.2, the camera position optimisation is implemented for a constrained area, as shown in Figure 10. Since there are two cameras in the V-STARS photogrammetry system, instead of optimising the camera positions as two separate agents, we embed the two camera positions into a four-element vector. Hence, the observations and the actions are also four-element vectors.
Besides the software details given in Section 4.2, the optimisation is accelerated using an NVIDIA GeForce GTX 1080 GPU. According to the experimental settings introduced above, the V-STARS camera position optimisation is performed with three deep reinforcement learning algorithms, namely PPO, DQN (deep Q-network) (Mnih et al. 2015) and A2C (a synchronous variant of asynchronous advantage actor-critic) (Haarnoja et al. 2018). The aim of comparing PPO against DQN and A2C is to identify whether the convergence is to a local optimum. All three algorithms have been used in industrial applications (Panzer and Bender 2022), such as process control (Guo et al. 2019; Szarski and Chauhan 2021; Yoo et al. 2021), scheduling (Dong et al. 2020; Park et al. 2019, 2021; Rummukainen and Nurminen 2019), dispatching (Cui et al. 2021; Dittrich and Fohlmeister 2020; Kuhnle et al. 2021), logistics (Feldkamp, Bergmann, and Strassburger 2020; Hildebrand, Andersen, and Bøgh 2020), and assembly (Li et al. 2019; Watanabe and Inada 2020). Both PPO and A2C are policy gradient deep reinforcement learning algorithms, which means they update the policy using the gradient of the objective function. A2C only learns from its current policy, hence being more stable but requiring more computational power to converge, while PPO is sample-based and introduces more hyperparameters and complexity during optimisation. Therefore, it is necessary to compare both algorithms in this instance.
As shown in Figure 11, the exploration behaviour of DQN, A2C and PPO is given. Each algorithm is tested five times, and the optimised positions and their error bars are presented together in Figure 11. Both the PPO and DQN algorithms start from a negative reward at the beginning of the search, while the A2C algorithm starts with a positive value. This indicates that a collision was detected in the initial state of exploration. In addition, it shows that PPO converged faster than DQN and A2C, as the reward quickly settled after 100,000 episodes. Given that PPO is a policy-based deep reinforcement learning algorithm, it is sample-efficient and generally converges faster than DQN and A2C.
In contrast, the DQN and A2C optimisations keep exploring after the emergence of the maximum reward (after 200,000 episodes in Figure 11(a) and 170,000 episodes in Figure 11(b)). These explorations do not improve the convergence performance. The PPO learning only performs small-scale exploration after convergence. Also, the maximum reward in Figure 11(a) is smaller than the maximum values in Figures 11(b) and 11(c), which indicates that the camera positions are not fully explored by the DQN algorithm. However, as shown in Figure 11, the converged value learned by PPO is consistent with the maximum reward obtained by A2C (a small deviation within measurement noise), indicating that the learned camera position is the global optimum. Regarding optimisation speed, PPO is much faster than A2C.
Using the PPO algorithm, the optimal positions of the two V-STARS cameras are marked with red stars in Figure 10. In order to verify the optimised positions, we further implement a real-world frame inspection, as given in Figure 12. During the assembly process, the snapshots from the left camera are given in Figure 12(a-c), and the snapshots from the right camera are presented in Figure 12(d-f). Furthermore, according to the V-STARS system, the identified retro-reflective markers are indicated in green, while the unidentified markers (highly reflective objects) are shown in red. During the overall inspection, the green markers are used for high-accuracy measurement. Therefore, further investigation is conducted based on the learned optimal camera positions, as shown in Figure 12.
Furthermore, the measurement report of the frame inspection experiment is summarised in Table 2. Given the optimal camera positions, there are 14 markers (code labels) identified by the V-STARS bundle algorithm. Not only are the 3D coordinates presented in the summary, but their uncertainties (sigma values) are also shown in the table. Additionally, the measurements are evaluated with the root mean square (RMS) and root sum of squares (RSS). The FoV visibility over the frame is indirectly assessed with the ray numbers given in Table 2. For a clear explanation, the relationship between the measurement uncertainties and the ray numbers is investigated in Figure 13. Generally, the sigma value of the Z coordinate, which corresponds to the depth information calculated with multiple view geometry (Hartley and Zisserman 2003), is larger than the uncertainties of the X and Y coordinates, as shown in Figure 13(a-c), especially at the code labels with less visibility (Code 13 and Code 14). In addition, as shown in Figure 13(a), the sigma X of code label 1 is 0.0028, and it is covisible in 38 rays. In contrast, the sigma X of code label 14 is 0.0231, with covisibility in only 2 rays. Similar trends can be found in the remaining four subfigures for sigma Y, sigma Z, RSS and RMS in Figure 13. The ray number directly quantifies the object visibility: fewer rays lead to less coverage of the inspection target.
In theory, the V-STARS photogrammetry system uses bundle adjustment to refine the 3D geometry coordinates, camera optical characteristics and relative motion parameters simultaneously from a set of covisible images. More specifically, the reprojection errors between predicted image points and observed image locations are minimised with a nonlinear least-squares algorithm in bundle adjustment. However, minimising the reprojection error is a typical maximum likelihood estimation, which can easily overfit if the cameras have only small coverage over the object, leading to inaccurate measurements. This is why the FoV coverage indicator is crucial in the camera position optimisation for the V-STARS photogrammetry system.
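The quantity that bundle adjustment minimises can be sketched as a residual function, where `project` is a placeholder for the camera model mapping a 3D point to image coordinates (the names and structure here are ours, not the V-STARS internals):

```python
def reprojection_residuals(points_3d, observations, project):
    """Residuals minimised by bundle adjustment: the image-plane offsets
    between each observed marker location and the projection of its current
    3D estimate. `observations` holds (camera, point_index, u_obs, v_obs)."""
    residuals = []
    for camera, point_index, u_obs, v_obs in observations:
        u_pred, v_pred = project(camera, points_3d[point_index])
        residuals.append((u_pred - u_obs, v_pred - v_obs))
    return residuals
```

A nonlinear least-squares solver then adjusts the 3D points and camera parameters to drive the summed squared residuals towards zero; with few covisible rays, many parameter combinations achieve small residuals, which is the overfitting risk noted above.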

Conclusion, discussion and future work
In this paper, a novel deep dynamic camera position optimisation framework is proposed for the V-STARS photogrammetry system in a reconfigurable manufacturing work cell. Instead of manual configuration, this framework provides an automatic camera position optimisation solution for photogrammetry systems in a virtual digital-twin environment. Moreover, derived from the 3D reconstruction of the cameras, the FoV coverage of the target object and collision detection are taken into consideration throughout the overall product lifecycle. The PPO reinforcement learning framework is utilised to optimise the camera positions given the penalties of collision and the FoV coverage evaluation. Finally, the feasibility of the virtual V-STARS scan is investigated experimentally, followed by a single lifecycle episode learning. The deep optimisation of the V-STARS camera positions is verified with a real-world demonstration.
This paper proposes a generic camera position optimisation framework for rapid changes of photogrammetry devices in reconfigurable manufacturing systems. Although this work is demonstrated with a two-camera photogrammetry system, it can be extended to other metrology devices by customising different learning objectives. Benefiting from digital-twin techniques, the camera position optimisation framework can be implemented in a virtual manufacturing environment, which could significantly enhance the configuration efficiency of the photogrammetry system and avoid repetitive manual work for scaling, calibration and camera position relocation.
DQN is a value-based reinforcement learning method, which applies stored offline data (a replay buffer) to logically separate the experience buffer from the exploration. In order to improve the convergence performance of DQN, the epsilon-greedy approach is used. The epsilon parameter is designed to decrease linearly, which means that initially the deep Q-learning explores aggressively and then gradually reduces the exploration probability towards convergence. Unlike DQN, PPO is a policy-based reinforcement learning method. PPO learns directly from the established digital-twin environment. After obtaining a batch of experiences, it performs a gradient update and clears the batch memory, which makes it easier to tune and sample-efficient. As PPO uses a clipped surrogate objective function to prevent large gradient updates, it generally outperforms A2C in our use case of camera position optimisation. However, as with parallel computing in A2C, distributed computing processes can be used for PPO to further accelerate the learning procedure.
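The linearly decreasing epsilon schedule described above can be written as follows (the start, end and decay-horizon values are illustrative, not taken from the experiments):

```python
def epsilon_schedule(step, eps_start=1.0, eps_end=0.05, decay_steps=100_000):
    """Linearly anneal epsilon for epsilon-greedy DQN: explore aggressively
    at first, then reduce the exploration probability towards convergence."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)
```

The agent then takes a random action with probability epsilon and the greedy (max-Q) action otherwise, so early steps are dominated by exploration and late steps by exploitation.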
In this paper, discrete states are used for the camera positions in the dynamic layout optimisation. Based on the incremental exploration, the step size can be chosen to avoid converging to a local optimal solution. Also, using a discrete state scheme can ensure that the final sequential search arrives at a location surrounded by a group of relatively ideal locations. From a practical point of view, the designed location and the actual location will differ by a small deviation, as the cameras are set up and located manually. Therefore, the optimal position should be insensitive to this small deviation. The optimal position obtained from the accumulated reward scheme can guarantee that the actual camera position is still within the optimal range.
In terms of optimising multiple objectives simultaneously, there are three objectives considered in this work: collision detection, FoV coverage and FoV overlapping. Since a collision is a negative phenomenon, it is penalised with a large weight parameter. The other two objectives, FoV coverage and overlapping, might compete in the camera position optimisation. Nevertheless, manually tuning the weight parameters for these two objectives is not difficult. However, if there are multiple photogrammetry devices, the optimisation would require a multi-agent system strategy, and balancing the weight parameters could become challenging.
Since a photogrammetry system relies heavily on target visibility and the number of good measurement points to construct a highly accurate coordinate system, the realisation of its large-measurement-volume and high-portability benefits also depends on those prerequisites. On the other hand, metrology technologies such as laser trackers can achieve highly accurate measurement, but over a smaller volume and with very limited portability. One could combine the benefits of the two systems by constructing a common measurement coordinate system. Currently, as shown in Figure 14, apart from the digital twin of the V-STARS photogrammetry system established in this paper, the digital twin of a laser tracker is under investigation.
Therefore, our next step is to perform dynamic layout optimisation with multiple metrology devices for sensor fusion and 3D reconstruction. For example, by using both the V-STARS photogrammetry system and a Leica laser tracker, feasibility and comparative studies for large-scale measurement can be carried out.
modelling and simulation, manufacturing informatics and precision manufacture with current research project portfolio in excess of 14 million and publication record of over 220 publications including over 100 journal papers.Svetan is a Fellow of IMechE, member of the IFAC technical committees TC5.1 and TC5.2 and the founding chair of the International Precision Assembly Seminar IPAS.

Figure 1 .
Figure 1. The graphic explanation of the V-STARS camera FoV and its digital-twin model in Visual Components. (a) Graphic explanation of the camera FoV of the V-STARS system (KINEMATICS 2013). (b) Virtual V-STARS system in the digital twin. The V-STARS photogrammetry system provides accurate three-dimensional measurement in industrial manufacturing applications. The virtual V-STARS system in the digital twin in Figure 1(b) duplicates the same FoV coverage as given in Figure 1(a).

Figure 2 .
Figure 2. Digital twin modelling of the V-STARS photogrammetry system. The V-STARS photogrammetry system consists of a laptop, two cameras, two tripods and a power supply, as shown in Figure 2(b). The V-STARS camera FoV parameters are 72 deg. horizontal and 58 deg. vertical. The digital twin of the assembly manufacturing work cell is given in Figure 2(a) and its physical layout is shown in Figure 2(c).

Figure 3 .
Figure 3. The concept of the reconfigurable assembly products and the mobile transport platform. The two products are given in Figure 3(a) (Wang et al. 'Development of An Affordable' 2022), and their assembly can be achieved through a common reconfigurable system with adaptive and rapid auto-reconfiguration processes. The target frame will finally be located on an AGV, as indicated in Figure 3(b), to improve the reconfiguration capability of the work cell. Therefore, an automatic photogrammetry camera position optimisation framework is necessary due to changes of the physical facility layout.

Figure 4 .
Figure 4. Flow chart of the V-STARS FoV coverage evaluation (part one). The first part of the V-STARS FoV coverage evaluation consists of point cloud collection, feature extraction and coarse point cloud registration.

Figure 5 .
Figure 5. Flow chart of the V-STARS FoV coverage evaluation (part two). The second part of the V-STARS FoV coverage evaluation consists of fine point cloud registration, 3D reconstruction and hidden point removal.

Figure 6 .
Figure 6. Collision detection between the photogrammetry system and the other components.

Figure 7 .
Figure 7. The design of the virtual learning environment. The virtual learning environment uses socket communication to exchange data. The virtual environment responds to the action sent from the optimisation level and updates the new layout setting. In addition, given the learning objectives, the virtual learning environment provides the point cloud and collision-detection feedback for the three reward indicators.

Figure 8 .
Figure 8. FoV comparison of the virtual cameras and their physical counterparts. Figure 8(a) shows the image captured in the real world. In contrast, the point cloud captured by the virtual V-STARS camera is given in Figure 8(c). Similarly, the images obtained from the left camera of the physical setup and the virtual environment are given in Figure 8(b,d), respectively.

Figure 9 .
Figure 9. Lifecycle FoV coverage evaluation of a single episode learning. The lifecycle experiment consists of six FoV evaluations regarding different robot poses in a single episode learning.

Figure 10 .
Figure 10. Two-dimensional projection of the work cell and the constrained area for camera positions. In order to provide a clear view of the work cell and the camera position optimisation area, the top view of the work cell is given in the above figure. The two grey regions in which the two cameras are located are also indicated in the projection.

Figure 11 .
Figure 11. The learning process of the V-STARS camera position optimisation. The camera position optimisation is implemented using the DQN, A2C and PPO algorithms, respectively. (a) DQN. (b) A2C. (c) PPO.

Figure 12 .
Figure 12. The V-STARS camera snapshots of the frame inspection experiment. The above figures contain three snapshots from the left camera (Figure 12(a-c)) and three snapshots from the right camera (Figure 12(d-f)), monitoring the profile board assembly process.

Figure 13 .
Figure 13. FoV visibility data analysis. The sigma values of the X, Y and Z axes for the fourteen code labels are presented above, along with the RSS, RMS and ray numbers, which together indicate the visibility of the V-STARS photogrammetry system.

Figure 14 .
Figure 14. The digital twin modelling of the Leica laser tracker in Visual Components. (a) High-precision measurement using the Leica tracker. (b) Metrology view blocked by the ABB robot. If the target can be tracked, the ray is shown in green; otherwise it is shown in red and a Boolean signal is returned. As the position varies in a reconfigurable system, it is crucial to ensure that the target is not blocked during the production process.

Table 1 .
Lifecycle FoV coverage evaluation corresponding to Figure 9.

Table 2 .
The measurement report of the V-STARS photogrammetry system based on the optimal camera position.