Toward Standard Guidelines to Design the Sense of Embodiment in Teleoperation Applications: A Review and Toolbox

ABSTRACT We present a literature review and a toolbox to help the reader find the best method to design for and assess Sense of Embodiment (SoE) in several application scenarios. The main examples are based on teleoperation applications, due to the challenges that these applications present. The three embodiment components that we consider to describe SoE are sense of ownership, sense of agency, and sense of self-location. We relate each embodiment component to the most often used assessment measures, test tasks, and application scenarios. The toolbox is built to efficiently design, test, and assess an embodiment experience, following seven concrete steps. We provide four main contributions: 1) a literature review of the assessment measures and strategies used to measure SoE; 2) a systematic categorization of SoE measures; 3) a categorization of the main test tasks used in SoE assessment; and 4) a toolbox consisting of seven steps as guidance to design SoE. We included several examples and tables to guide the user step by step through the design of an embodiment experience.


Introduction
Sense of Embodiment (SoE) can be defined as the experience that an external body, body part, or object (such as a rubber hand, a mannequin, a robotic device, or a virtual avatar) is perceived as one's own (Kilteni et al., 2012). Throughout the rest of the paper, we will refer to the external body or part of it as the surrogate, while the term operator will be used to refer to the person who goes through an embodiment experience over a surrogate. SoE lacks a standard definition, and the most used and acknowledged definitions (Kilteni et al., 2012; Longo et al., 2008; Metzinger, 2009; De Vignemont, 2011) do not stress the importance of defining SoE through its components and their relation. We consider SoE as characterized by three components. 1) The sense of ownership is the feeling of self-attribution of an external object or device. For example, if an operator is teleoperating a robotic arm, the level of the sense of ownership is defined by how much the operators experience the robotic arm as being part of their body (Krom et al., 2019). 2) The sense of agency is defined as the feeling of having motor, action, and intention control of the surrogate (Blanke & Metzinger, 2009; Kilteni et al., 2012). This component is characterized by the trust that operators put in the fact that their intended actions are mirrored by the surrogate and by the sense of control that they have over it. 3) The sense of self-location refers to the volume of space where one feels located (Kilteni et al., 2012). Usually, self-location and body-space coincide in the sense that one feels self-located inside a physical body (Lenggenhager et al., 2009); out-of-body experiences can be an exception (Ehrsson et al., 2004). Operators should be aware of the remote environment and their position in it. They should feel confident in estimating the distance and position of objects, and (if possible) in navigating in the remote environment (Arzy et al., 2006).
We also extend the current definitions to highlight that SoE has both a subjective and an objective component. The subjective component is related to the individual update of the level of embodiment, which everyone performs intrinsically and unconsciously, mostly on the basis of perceptual cues. The objective component concerns the parameters of the system or the experience (such as the setup, the context, and the tasks) that can be manipulated in order to increase or decrease SoE. We are still unaware of the nature of all the parameters that can affect SoE and of their specific effects.
Disciplines such as neuroscience and psychology (Carruthers, 2008; Sampson, 1996) study SoE to gain more insight into the malleability of the surrogate representation in the brain, but SoE is also studied in applications like virtual reality and robotics (Haans & IJsselsteijn, 2012; Kilteni et al., 2012; Toet et al., 2020). For example, in teleoperation (i.e. the remote control of a machine or device (Niemeyer et al., 2016; Sheridan, 1995)), some studies try to demonstrate that a high level of SoE can improve task performance (Marasco et al., 2018; Sanchez-Vives & Slater, 2005; Schiefer et al., 2015). This is because when a high level of SoE is achieved, the operators' perception of the surrogate as mediator is lower (Cabrera & Wachs, 2017). In other words, the operators do not perceive the surrogate as a third-party object that mediates the interaction between them and the environment, but they feel embodied and in control of the surrogate as they are of their own body.
The guidance provided in this study is applicable to different embodiment experiences and virtual reality (VR) setups. However, there will be a focus on teleoperation since it offers the most difficult challenges for designers and developers.
Although the importance of SoE in the previously cited fields is acknowledged, there is no standard framework to test, measure, and assess it. Results are difficult to replicate and compare across studies. Moreover, we argue that the design of a teleoperation system or an embodiment experience could benefit from a standard procedure to assess them both during development and in post-implementation tests. Therefore, we provide a literature review and a toolbox to guide the reader in designing, testing, and assessing SoE in a system.
In this respect, our work is comparable with previous reviews, but it has some significant differences (Gonzalez-Franco & Peck, 2018; Kilteni et al., 2012; Toet et al., 2020). In (Kilteni et al., 2012), the authors present a literature review of the structure, measures, experimental manipulations, and challenges related to SoE with a virtual body in an immersive virtual environment, but not in other applications like teleoperation in a real environment. The authors define SoE using the same components considered in this work. These same components are considered in (Toet et al., 2020), in which the authors focus only on telerobotics; they also provide a set of guidelines to apply SoE in telerobotics, identifying some important challenges and research topics. Finally, Gonzalez-Franco & Peck (2018) focus only on virtual bodies, and their aim is to define a standard questionnaire to assess SoE. The authors present a literature review of the most commonly used questionnaires and an SoE structure described by more components than the ones considered in this work. In particular, they include the tactile sensation, the external appearance, and the response to external stimuli. We consider these extra components as part of the sense of ownership, following the research line of (Kilteni et al., 2012) and (Toet et al., 2020). Our work considers SoE both in virtual bodies and in telerobotics. Moreover, we provide the reader with a toolbox to design, test, and assess SoE in a teleoperation system or an embodiment experience. We categorize the most used assessment measures and we relate them to each embodiment component, the tasks, and the application scenarios.
Four reference figures and tables are presented in this paper. We labeled, grouped, and categorized the design steps, assessment measures, and perceptual cues on the basis of the reported literature review:
• Figure 1 represents a general guideline to apply the toolbox step by step;
• Figure 2 sums up our findings related to the subcategories of the categorized assessment measures;
• Table 1 associates the perceptual cues considered in this study with the embodiment components that are mostly affected by them;
• Table 2 shows if a specific assessment measure is suitable, nonspecific, or unsuitable for each embodiment component.
The paper is structured as follows. In Section 2 we present the methodology used for the literature review. Section 3 is dedicated to a detailed presentation of the steps of the SoE toolbox. This section is structured following Figure 1. Section 3.9 includes our observations on the most used assessment measures and methodology, and reports the (dis)advantages of each assessment measure. Section 4 gives an overview of the toolbox use and includes two example cases built by applying the toolbox. Finally, in Section 5 we list conclusions and recommendations.

The methodology
We examined the academic literature to provide a comprehensive review of the assessment measures and tasks that can be used to evaluate and test SoE and its three components. After categorizing the assessment measures and tasks, we identified whether they can be applied to assess and test each single component, multiple components together, or whether the distinction is not clear. The aim is to present a first step toward an SoE toolbox, associating the embodiment components with the involved sensory factors, the assessment measures, the test tasks, and the considered application scenarios. Publications on testing and assessing SoE and the most used assessment tasks were identified through a systematic literature review. We included searches from Google Scholar, Elsevier Scopus, IEEE, and Mendeley, from 1991 to 2021 (with a particular focus on papers from 1991 to 2019). The reason for combining the different sources was that Google Scholar is not sufficient on its own for systematic reviews (Giustini & Boulos, 2013). The search terms used were 'sense of embodiment,' 'embodiment,' 'ownership,' 'agency,' 'self-location,' 'evaluation,' 'assessment,' 'measures,' 'metrics,' 'teleoperation,' 'telemanipulation,' 'teleoperation task,' 'teleoperation taxonomy,' 'teleoperation toolbox,' 'user case study,' 'VR setup,' 'virtual reality,' 'embodiment experience,' 'avatar,' 'embodiment illusion,' in various combinations (e.g., adding "and" between the terms or combining them: "teleoperation metrics"). Papers were selected from the domains of engineering, technology, psychology, and neuroscience. The initial search provided about 10300 results. By limiting our search to articles in English mostly reporting empirical studies in peer-reviewed journals, and excluding articles targeting animals and brain-injured populations, we obtained 1800 articles.
Reviewing the title and abstract of those articles, we excluded those that did not meet the inclusion criteria reported above. The search was further refined by adding search terms referring to the perceptual cues, test tasks, and assessment categories (such as 'tactile feedback,' 'visual feedback,' 'questionnaire,' 'proprioceptive drift,' 'proprioception,' 'skin conductance response,' 'SCR,' 'peg-in-hole,' 'time delay'). This resulted in 161 articles, which were reviewed in chronological order to understand the evolution of the evaluation metrics, the test tasks, the research aims, the technical developments, and the related works.
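The way search terms were combined with "and" can be illustrated with a minimal Python sketch. The term subset, the helper name `build_queries`, and the pairwise combination size are assumptions made for illustration; the text does not list the exact queries that were issued.

```python
from itertools import combinations

# Illustrative subset of the search terms listed above (not the full list).
TERMS = ["sense of embodiment", "ownership", "agency", "teleoperation"]


def build_queries(terms, size=2):
    """Join every unordered combination of `size` quoted terms with 'and'."""
    return [
        " and ".join(f'"{term}"' for term in combo)
        for combo in combinations(terms, size)
    ]


queries = build_queries(TERMS)

# Screening funnel as reported in the text (the first count is approximate).
FUNNEL = [
    ("initial search", 10300),
    ("English, mostly empirical, peer-reviewed", 1800),
    ("after refinement with cue/task/measure terms", 161),
]
```

Each entry in `queries` is one candidate search string; `FUNNEL` simply records the three counts reported above so that they can be tabulated or plotted.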

Toolbox and guidance: seven steps to sense of embodiment
In this section, we explain the steps in the toolbox in more detail and provide guidance on how to complete the steps to design and assess SoE.

Step 1. Application scenarios
We focus on four application scenarios and their relation with the components of SoE. These are high-level scenarios that illustrate our general recommendations, though each individual case may have its own considerations.

Social
A social scenario refers to situations in which the operator has to interact with other individuals in a dynamic and, often, partially unknown environment. In this scenario, the operated device must enable tasks such as: shaking hands, giving hugs, making eye contact, expressing gestures, adjusting posture, touching and manipulating everyday objects, moving around in an apartment, expressing emotions, and displaying other cues relevant in social interaction (Breazeal et al., 2016). Usually, the robotic devices are humanoid in this application context. For social robotic telepresence, we agree with (Coradeschi et al., 2011) in drawing attention to four main aspects: 1) the mechanical design, 2) the user interface design, 3) the interaction between the surrogate and the operator's real body, and 4) the subjective perception of the telepresence level of the system. Moreover, both (Coradeschi et al., 2011) and this work consider the comparison between robotic and non-robotic systems. For a social scenario, the relevant embodiment components are the sense of ownership, agency, and self-location.

Industrial
In industrial scenarios, the environment is usually static and the actions predictable. The operated device will have to manipulate tools, move objects, and move around in an environment that is not open air, such as a factory. The environment may not be human-friendly due to, for instance, high temperature, tiny dimensions, or high security risks (Heyer, 2010). Examples of tasks in this scenario are maintenance using tools or robotic arms, and moving large masses. When designing a teleoperation system that has to perform in an industrial scenario, the focus will be mostly on performance and not on SoE. The operator's experience refers to a more classical concept of user experience: efficiency, effectiveness, usability, and ease of use (Quesenbery, 2001). In (Townsend & Guertin, 1999), for example, the authors present a design methodology to improve the operator's performance and experience in industrial scenarios, introducing the new concept of a whole-arm manipulator (WAM). The design goal is that the WAM should be able to control forces and torques robustly along all of the outer link surfaces. It also refers to the degrees of freedom for an operator to manipulate any intermediate structure independently of the end-effector by exploiting redundancy in the kinematics. For an industrial scenario, the relevant embodiment components are the sense of agency and self-location.

Field
Field scenarios include tasks such as inspection & maintenance (Pouliot & Montambault, 2008), and search & rescue (Birk et al., 2009) in unstructured environments. Unstructured environments are dynamic and unpredictable. These kinds of scenarios are also called exploratory robotics. The robotic devices for this scenario are often tracked vehicles or animal-shaped robots with multiple legs. In a literature review by (Halme et al., 1999), the authors compare different complications that could be encountered while teleoperating in a field scenario. The focus is on technical issues, such as data transmission and the choice of the setup and measurements. For a field scenario, the relevant embodiment components are the sense of agency and self-location. The sense of ownership can be relevant if the operator needs to receive multimodal haptic feedback to optimize the teleoperation experience and to improve the performance.

Surgical
Robot-assisted surgery was developed to overcome the limitations of preexisting minimally invasive surgical procedures and to enhance the capabilities of surgeons. The surgeon uses a direct telemanipulator, or a computer control, to control the device and the instruments. Another advantage of using robot-assisted surgery is that the surgeon does not have to be present, leading to the possibility of remote surgery. In this scenario, tasks of microassembly and microteleoperation are common. The main challenge is to create a connection and transparency between the macro world of the operators and the nano world in which they have to teleoperate the system. In particular, the focus is on optimizing the motion control in constrained workspaces (Funda et al., 1996), and on safely increasing dexterity and degrees of freedom (Madhani et al., 1998). For a surgical scenario, especially due to the importance of tasks involving hand-eye coordination, the relevant embodiment components are the sense of agency and self-location.
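The scenario-to-component associations stated in Step 1 can be collected in a small lookup structure. The following Python sketch is only one possible encoding; the names `Component`, `RELEVANT`, `CONDITIONAL`, and `relevant_components` are hypothetical and introduced here for illustration.

```python
from enum import Enum


class Component(Enum):
    OWNERSHIP = "sense of ownership"
    AGENCY = "sense of agency"
    SELF_LOCATION = "sense of self-location"


# Relevant embodiment components per scenario, as stated in Step 1.
RELEVANT = {
    "social": {Component.OWNERSHIP, Component.AGENCY, Component.SELF_LOCATION},
    "industrial": {Component.AGENCY, Component.SELF_LOCATION},
    "field": {Component.AGENCY, Component.SELF_LOCATION},
    "surgical": {Component.AGENCY, Component.SELF_LOCATION},
}

# Components that the text marks as relevant only under a condition.
CONDITIONAL = {
    # Ownership matters in the field scenario if multimodal haptic
    # feedback is needed.
    "field": {Component.OWNERSHIP},
}


def relevant_components(scenario, include_conditional=False):
    """Return the set of components to design for in the given scenario."""
    components = set(RELEVANT[scenario])
    if include_conditional:
        components |= CONDITIONAL.get(scenario, set())
    return components
```

For instance, `relevant_components("field", include_conditional=True)` returns all three components, while the default call returns only agency and self-location.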

Step 2a. Sense of embodiment and the embodiment components
Based on (Kilteni et al., 2012), we use the term SoE as an overarching construct including the sense of ownership (Krom et al., 2019), sense of agency, and sense of self-location (Arzy et al., 2006). Unlike (Gonzalez-Franco & Peck, 2018), as we already mentioned, we include the tactile sensation, the external appearance, and the response to external stimuli as sub-components of the sense of ownership. Numerous studies have found that it is possible to induce a strong SoE over virtual and real extracorporeal objects such as fake limbs, robotic hands and arms, mannequins, virtual bodies, and even empty volumes of space and invisible bodies (Caspar et al., 2015; Guterstam et al., 2015; Kondo et al., 2018; Van Der Hoort & Ehrsson, 2016). While operating a machine remotely with a high level of SoE, the operator's perception of the remote device as mediator decreases (Cabrera & Wachs, 2017), increasing the transparency of the teleoperation system. Starting from this intuition, some studies try to demonstrate that a high level of SoE can improve teleoperation task performance (Marasco et al., 2018; Sanchez-Vives & Slater, 2005; Schiefer et al., 2015). In (Toet et al., 2020), the authors use the predictive encoding theory (Friston, 2012; Friston & Kiebel, 2009; Hohwy, 2013) as a neuroscientific framework to interpret and discuss their findings on how SoE affects teleoperation performance. The predictive encoding theory postulates that the brain produces models at each level of perceptual and cognitive processing to predict what information it should receive from the level below it (i.e., top-down). Then, the brain compares the bottom-up sensory information with the predictions from the model. The discrepancies between the two (the prediction errors) are the only elements that are passed to higher levels, in which they are used to update the model. The model updates are aimed at minimizing or suppressing prediction errors at a lower level.
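The prediction-error loop described above can be made concrete with a toy, single-level sketch in Python. It is not the model used by Friston or by Toet et al.; the scalar state, the function names, and the `learning_rate` parameter are assumptions chosen only to show how repeated error-driven updates shrink the discrepancy between prediction and sensory input.

```python
def update_model(prediction, observation, learning_rate=0.5):
    """Perform one update step driven by the prediction error.

    The error is the bottom-up observation minus the top-down prediction;
    the model moves a fraction `learning_rate` of the way toward the input.
    """
    error = observation - prediction
    return prediction + learning_rate * error, error


def run(prediction, observations, learning_rate=0.5):
    """Apply update_model over a sequence and record absolute errors."""
    abs_errors = []
    for observation in observations:
        prediction, error = update_model(prediction, observation, learning_rate)
        abs_errors.append(abs(error))
    return prediction, abs_errors


# A constant sensory input of 1.0 and an initial prediction of 0.0.
final_prediction, abs_errors = run(0.0, [1.0] * 5)
```

With a constant input, the absolute error halves at every step in this toy setting, which mirrors the idea that model updates aim at minimizing prediction errors.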
This theory can be applied to SoE in the sense that the brain effectively updates the body model in a bottom-up way by minimizing prediction errors between its top-down predictions and actual sensory events. The predictive encoding framework makes it possible to interpret and discuss data from embodiment experiments and to make predictions for embodiment effects in telerobotics. Moreover, in (Toet et al., 2020), the authors present four premises on the relation between SoE and teleoperation that are supported by the literature: 1) the brain can embody nonbodily objects (e.g., robotic hands, animal-shaped robots, non-humanoid virtual avatars), 2) embodiment can be elicited with mediated sensorimotor interaction, 3) embodiment is robust against inconsistencies between the robotic system and the operator's body, and 4) embodiment positively correlates with dexterous task performance. One of the main debates and challenges concerns how to clearly disentangle SoE components. In (Longo et al., 2008), the authors present a psychometric approach to disentangle the embodiment components. Participants had to perform a proprioceptive judgment task while experiencing a rubber hand illusion (RHI). The authors confirmed previous findings that the sense of ownership and the sense of agency reflect dissociable components of embodiment (Gallagher, 2000; Synofzik et al., 2008; Tsakiris et al., 2006, 2010). The same was claimed for the sense of self-location, even if this component presents a strong correlation with the sense of ownership. Another line of thought concerns the strict dependency and the mutual influence that the components, particularly the sense of ownership and agency, have on each other. In (Asai, 2016), the authors used the proprioceptive drift as a measure to assess SoE during an RHI. They concluded that the explicit sense of agency can elicit an implicit measure of the sense of ownership.
In (Gallagher, 2013), the author presents the ambiguity of the sense of agency, focusing on the difficulty of disentangling this component from other factors influencing the embodiment experience. The exact relation between the sense that one's body is one's own (body ownership) and the sense that one controls one's own bodily actions (agency) has been the focus of much speculation, but remains unclear. On the basis of Tsakiris et al. (2010), we can consider two models to describe the relationship between the sense of ownership and the sense of agency. The first is an additive model, in which agency and body-ownership are strongly related because the ability to control actions is a powerful cue to body-ownership, with possible additional sub-components unique to ownership and to agency. The alternative, an independence model, holds that agency and body-ownership are qualitatively different experiences triggered by different inputs and recruiting distinct brain networks. We can divide the brain regions involved into two main groups. The first group of brain regions constitutes a network of sensorimotor transformations and motor control, whereas the second group of brain regions represents a set of hetero-modal association cortices implicated in various cognitive functions. Unfortunately, we still do not know the exact functions and contributions of these brain regions to the sense of agency. Several studies are aimed at identifying the neural correlates of two different judgments of attribution: experiencing oneself as the cause of an action (the sense of agency) or experiencing another person or object as being the cause of that action (Farrer & Frith, 2002; Tsakiris et al., 2008).

Step 2b. Perceptual cues to optimize
The perceptual cues positively affect an embodiment component when they resemble or recreate experiences to which the human brain is accustomed, reducing the prediction error as explained in the previous section. The relevant perceptual cues and the embodiment components they mostly affect can be found in Table 1.

Point of view
The point of view is the perspective from which the operator observes the remote environment and experiences the surrogate. Usually, the main distinction is between first-person perspective (1PP) and third-person perspective (3PP). The extensive literature on this cue shows that a 1PP is sufficient to create SoE, but is not strictly required, as SoE also occurs in 3PP, in the absence of visual cues, or with an incongruent visual perspective. However, as the authors conclude in the literature review from (Toet et al., 2020), 3PP alone (i.e. in the absence of other perceptual cues) is not sufficient to create an embodiment experience. In more detail, the sense of ownership over a virtual body can be obtained both in 1PP (Maselli & Slater, 2014; Petkova & Ehrsson, 2008; Slater et al., 2010) and in 3PP (Gorisse et al., 2019; Lenggenhager et al., 2007, 2009). However, ownership is typically stronger from a 1PP compared to a 3PP (Petkova & Ehrsson, 2008; Pozeg et al., 2015; Slater et al., 2010). Regarding the sense of agency and self-location, 1PP and 3PP induce the same level of SoE (Gorisse et al., 2017). As we already stated, the sense of ownership and self-location are strongly correlated, and 1PP can consistently increase their level during an embodiment experience (Debarba et al., 2017). Among other reasons, this is due to the better perception of the arms and the hands of the operators' avatars (Gorisse et al., 2017). Ownership is less likely to occur when the apparent visual location or orientation of a body part conflicts with its real location (Pavani et al., 2000). Moreover, it was demonstrated that 1PP of a realistic virtual body can induce a strong embodiment illusion even after asynchronous visuo-tactile stimulation (Maselli & Slater, 2014; Serino et al., 2016; Slater et al., 2010). However, 3PP can provide a better awareness of the environment. Therefore, depending on the context of application, it can be preferred over 1PP.
SoE can also be obtained over a distant body, as seen from 3PP, when synchronous visuo-tactile information is provided (Lenggenhager et al., 2007, 2009) or when the surrogate preserves spatial overlap with the real embodiment (Maselli & Slater, 2014).

Field of view
The field of view is the observable area that the operators see without head or eye movements, directly or via an optical device (Fribourg et al., 2020), such as a VR headset. Normally, the human visual field spans a forward-facing horizontal arc of slightly over 210 degrees without eye movements (slightly larger with eye movements included). A high SoE is easier to obtain if the field of view covers an area similar to the human field. This cue can be manipulated in different ways: it can be dependent on the surrogate size or it can be increased or decreased compared to the human field of view (often decreased because of the limited field of view of head-mounted displays). The manipulation of this perceptual cue affects the perception of the remote environment and the judgment of the peri-personal and distant space (Van der Hoort & Ehrsson, 2014). This can affect the level of sense of ownership and self-location, but it can also affect the sense of agency by creating movement impairments (Wenk et al., 2021).
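How closely a display approaches the human horizontal field of view can be expressed as a simple coverage ratio. The sketch below uses the 210-degree figure quoted above as a round reference value; the function name `fov_coverage` and the example headset value of 105 degrees are hypothetical.

```python
# Reference value quoted in the text ("slightly over 210 degrees").
HUMAN_HORIZONTAL_FOV_DEG = 210.0


def fov_coverage(display_fov_deg, human_fov_deg=HUMAN_HORIZONTAL_FOV_DEG):
    """Fraction of the human horizontal field covered by a display, capped at 1."""
    if display_fov_deg <= 0 or human_fov_deg <= 0:
        raise ValueError("field of view must be positive")
    return min(display_fov_deg / human_fov_deg, 1.0)


# Hypothetical headset with a 105-degree horizontal field of view.
coverage = fov_coverage(105.0)
```

A ratio well below 1 indicates a decreased field of view relative to the human one, which is the common case for head-mounted displays noted above.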

View direction control
This cue refers to the amount of control that operators have in directing their gaze in the remote environment. When the control is absent, imprecise, or not intuitive, this can affect the perception and experience of the embodiment. Several studies demonstrated the impact of the avatar control on SoE. In (Fribourg et al., 2020), the authors tested the impact of three perceptual cues on SoE. They demonstrated that a close match between the view direction control of the operators' intentions and the subsequent actions mostly affected the sense of agency, compared to the other two embodiment components. Another example is given in (Kokkinara et al., 2016), where participants experienced a high sense of agency when walking a virtual body, even though they were seated and only head movements were allowed.

Connectedness
Connectedness refers to the perception that the surrogate is attached to the operators' body. The operators perceive the real body as joined to an external object or device, as a continuation of their own body. This cue is especially helpful in increasing the perception and awareness of the joints in space. Feeling the external device as attached to the real body helps the operators in choosing proper trajectories and better managing movements in the remote environment. In (Linebarger & Kessler, 2002), the authors compared the time to accomplish a task in connected and non-connected conditions. The results showed that the presence of connectedness improved the task completion time. In (Perez-Marcos et al., 2012), the authors investigated the importance of four factors on SoE in a virtual rubber hand illusion: visuo-tactile synchronicity while stroking the virtual and the real arms, body continuity, alignment between the real and virtual arms, and the distance between them. The results showed that the subjective illusion of ownership over the virtual arm and the time to evoke this illusion are highly dependent on synchronous visuo-tactile stimulation and on connectivity of the virtual arm with the rest of the virtual body.

Visuo-proprioceptive synchronicity
The manipulation of proprioceptive information is one of the most used SoE evaluation tasks. Proprioception is based on the information from receptors in muscles and joints (capsules and surrounding tissue). These receptors provide information to the central nervous system about the position and movement of body parts (e.g., the angle of a joint or the length of a muscle). Proprioception is the sense that tells the body where it is located in space. It is very important to the brain as it plays a major role in self-regulation, coordination, posture, body awareness, the ability to attend and focus, and speech. To reach high SoE, it is important that the visual cues that the operator receives reflect the remote device and environment accurately. Therefore, the distances, points, and objects in space have to be properly perceived in the remote environment, the perception of joints has to be paired, and the device response to the operator's movement has to be synchronous and, ideally, happen in real time (i.e. with negligible delay; Kouakoua et al., 2020; Krom et al., 2019).

Visuo-tactile synchronicity
Visuo-tactile synchronicity refers to how the operator detects visual and tactile cues (e.g., if the operator touches an object, the touch feedback should be in sync with the visual cues). Asynchronicity between visual and tactile feedback can easily and immediately break the embodiment illusion, giving the operator an impaired perception of the external body. Usually, the asynchronicity is caused by delay in transmitting and receiving information from the operator to the device and vice versa. The asynchronicity makes the telemanipulation of objects and the interaction with the environment inefficient for the operators, as they need to adapt to a sensation of impaired position of the joints and tactile feedback. This perceptual cue is usually affected by time delay in teleoperation or by system lag in VR environments. There is an extensive literature on the effect of synchronous and asynchronous strokes (Aymerich-Franch et al., 2017; Ehrsson et al., 2004; Folegatti et al., 2009; Hogendoorn et al., 2009; Longo et al., 2008) in an RHI setup or when manipulating virtual or robotic limbs. The common finding is that asynchronous stimulation decreases the sense of ownership over the surrogate.
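Visuo-tactile asynchrony caused by delay can be quantified from paired event timestamps. The following Python sketch is a minimal illustration under stated assumptions: timestamps are in milliseconds, events are already paired one to one, and the tolerance is a parameter that the experimenter must supply, since the text does not prescribe a threshold. All function names are hypothetical.

```python
from statistics import mean


def asynchrony_ms(visual_ts, tactile_ts):
    """Return per-event offsets (tactile minus visual) in milliseconds."""
    if len(visual_ts) != len(tactile_ts):
        raise ValueError("timestamp lists must be paired one to one")
    return [tactile - visual for visual, tactile in zip(visual_ts, tactile_ts)]


def mean_abs_asynchrony(visual_ts, tactile_ts):
    """Mean absolute offset between paired visual and tactile events."""
    return mean(abs(offset) for offset in asynchrony_ms(visual_ts, tactile_ts))


def is_synchronous(visual_ts, tactile_ts, tolerance_ms):
    """Check the mean absolute offset against an experimenter-chosen tolerance."""
    return mean_abs_asynchrony(visual_ts, tactile_ts) <= tolerance_ms


# Three strokes where the tactile cue lags the visual cue by 50 ms each.
visual = [0, 1000, 2000]
tactile = [50, 1050, 2050]
lag = mean_abs_asynchrony(visual, tactile)
```

Logging such offsets during a session makes it possible to relate the measured delay to the reported sense of ownership in a given setup.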

Visual likeness of the surrogate
The visual likeness of the surrogate refers to the human-likeness and appearance fidelity of the surrogate with respect to the operator who is manipulating it. Generally, the more similar the controlled device is to the real body, and the more closely the device mirrors the actions that the operator accomplishes, the higher the operator's SoE will be. Regarding the sense of ownership, Shin et al. (2021) showed that, in VR-based teleoperation, using human-like hands increased the risk perception and degraded workers' task performance in the execution of high-risk tasks. Moreover, some studies point to the importance of taking the operators' diversity into consideration while designing the avatar appearance. For example, Schwind et al. (2017) showed that female operators were more sensitive to the manipulation of male or female hands while operating an avatar in a VR setup. Female operators felt less embodied while they manipulated male avatar hands. In particular, this condition negatively affected the sense of ownership. On the other hand, in Mick et al. (2020), the authors tested the impact of human-likeness on the sense of agency and task performance. They showed that this cue has a significant weight only in the initial phase of adaptation to the system: after a first phase of familiarization with the new joints, the operator will no longer perceive a low human-likeness of the surrogate as critical.

Haptic feedback
Haptic feedback is the simulation of physical attributes, such as weight, pressure, and stiffness, which allows the operators to interact directly with virtual or remote objects using touch and to experience their physical attributes. Usually, these attributes are reproduced by very small forces or cues (such as vibration), which are mostly felt only through mechanoreceptors in the skin surface. This cue determines a believable perception of and interaction with the remote environment. This is essential both from a practical point of view, in terms of task performance, and from the point of view of the operator's experience, in terms of the embodiment illusion. This cue is defined by the combination of tactile, kinesthetic, and contact feedback, and the presence and magnitude of contact force. Realistic tactile feedback is complex to obtain with current technologies, but it can make a big difference in task performance, especially in social, field, and surgical scenarios (Burke et al., 2006). In the social scenario, it can make the interaction more believable by providing, for example, a proper skin perception (Dargahi & Najarian, 2004). In the field scenario, it could help in exploring the unknown remote environment by providing important information about objects, such as the temperature, texture, and stiffness (Giguere & Dudek, 2011). Finally, in the surgical scenario, it can improve task performance by providing information about the texture, shape, and consistency of the internal body parts of interest (Bholat et al., 1999; Schostek et al., 2009). The proprioceptive feedback, instead, can provide information about the position and movement of the remote surrogate in the workspace.
The presence of force and contact feedback makes the operator aware of the dimensions and shape of both the remote environment and the surrogate, while the magnitude of contact force is necessary to make the operator aware of the mass of the manipulated objects in the remote environment, and also of the power and strength of the surrogate.

Likeness of the environment
There are different ways to present the remote environment to the operator, such as video streaming or building a virtual environment. To allow the operator to properly interact with the remote environment and to make the experience immersive, the quality of the data transmission and of the reproduced environment has to be high. The most immersive experience, and the easiest to design, occurs when the operator wears a VR headset. However, there are also other solutions, such as big screens (cinema effect), placing the cockpit of the surrogate in a silent and isolated room, or augmented reality glasses. These cues help in moving through the environment and predicting the interaction effect. Simulated environments should also take the social norms applicable to a real-life environment into consideration, if other actors are present in the simulation (Yee et al., 2007). In other words, the likeness of the environment is not only affected by the quality of the realization and transmission of the workspace, but also by the way in which other living beings, external to the operator, can interact with it.

Step 3a. Surrogate
We use the term surrogate to refer to the device, fake body, or avatar embodied by the operator. On the basis of the application context, the surrogate can represent the entire body or just a part of it. Rubber hands (Botvinick & Cohen, 1998; Ehrsson et al., 2004) and mannequins (Carey et al., 2019) are options only in empirical studies that focus on a better understanding of the embodiment components and their relation. VR avatars, instead, can be used both for theoretical studies (Debarba et al., 2017; Slater et al., 2010) and for testing the operator's SoE in a commercial teleoperation system (e.g., VR games). In telerobotics, we distinguish three categories of robotic surrogates: 1) humanoid robots are used in social scenarios; they resemble the human body and their design is based on the concept of bio-mimicking. The humanoid shape, due to the perception of a familiar appearance from the recipients, facilitates the expression of social cues and, therefore, the interaction. 2) Industrial robots are usually used for manufacturing. This category is mostly represented by robotic arms and the main focus, while designing these devices, is the optimization of the task performance. 3) Explorative robots can be divided into two sub-categories: a) open-air refers to animal-shaped robots that are usually used in search & rescue scenarios since their anatomy is effective in hazardous and unpredictable environments; b) nano-world refers to robots applied in micro-surgery, anatomical exploration, or exploration in the out-of-the-body world, such as pipe or tube exploration (Anthierens et al., 2000).
In section 3.1 we described the tasks that these different categories of robots are required to accomplish.

Step 3b. Control device
With control device we mean the physical device, or combinations of devices, used by the operator to control the surrogate. The choice of the control device depends on the context of application and can affect the operator's experience (McEwan et al., 2012). For example, a joystick might be preferred in an industrial scenario to control a robotic arm, while a sensory glove may be a more adequate option to control a robotic hand in a social context. The control devices are part of the operator interface of the system and they are strongly linked to the perceptual cues (needed to achieve high SoE according to the described Step 2 in Section 3.3). In (Salcudean, 1998) and (Cui et al., 2003), several system controls and haptic interfaces are discussed with their challenges and opportunities, which could help deciding on a control device. A more updated and complete literature review can be found in (Hatzfeld & Kern, 2016).

Step 4. Test tasks
Most of the following tasks are suitable both in teleoperation and VR setups.

Positioning objects
This category of tasks requires the operator to place an object at a specific point in the workspace (Hannaford et al., 1991; Lozano-Pérez et al., 1989; Salcudean et al., 1997). Some examples are: 1) the peg-in-hole, a classic robotics task to test the interaction with a mechanical environment. It is a test of the operator's ability to achieve accurate positioning in spite of nonlinear mechanics and imprecise knowledge. It consists of a test participant grasping a peg with the end-effector and then inserting it into a specified hole. The tolerance of fit for the hole is usually varied, and the associated task time for inserting the peg into the hole is recorded (Howe & Kontarinis, 1993; Massimino & Sheridan, 1994). 2) The pick and place consists of grasping an object and then manipulating it (an extra condition is sometimes added, in which the object has to be manipulated delicately), avoiding obstacles and moving along certain lines to avoid collision, and finally placing it at a target point (Aarno et al., 2005; Griffin et al., 2005). 3) The tube task (TT) is a test designed to gather measures of ownership and self-location. Participants are instructed to use a control device (such as a joystick) to adjust the size and position of a virtual tube, to match it with the perceived locations of their ankles. During the task no virtual body is displayed and the virtual tube is the only visible object in the environment.

Telemanipulation of flexible objects
This task consists of the telemanipulation of non-rigid objects, which require more complex control and force feedback strategies than the ones used to manipulate rigid objects. An example is attaching and detaching velcro fasteners. Its hook and loop fastening system has nonlinear mechanical properties that challenge manipulation capabilities. Tele-shaking hands is another example: the challenge is given by the unpredictable impedance of the dynamic system (in this case the hand of the recipient) with which the operator has to interact (Aiple & Schiele, 2017; Bevan & Fraser, 2015; Hannaford et al., 1991; Machida et al., 1998; Song et al., 2005).

Micromanipulation & microassembly
This category of tasks is used to test systems for surgical or precision maintenance purposes. It consists of an operator interface that uses visual, haptic and control devices (macro world), a nanomanipulator, and sensors (nano world), to telemanipulate between macro and nano worlds (Ben-Porat et al., 2000;Codourey et al., 1997;Sitti, 2001;Sitti et al., 1999).

Tracking a sustained contact force
This set of tasks measures the ability to present information for tracking the magnitude of a force over time. It tests how the operator can manage the force feedback, by dosing the force and properly interpreting the information provided by the sensory feedback. An example is a telemanipulation task where a force must be exerted over a period of time, and has a maximum level above which damage and task failure will occur. These tasks are good to determine the presence and magnitude of contact force (Anderson & Spong, 1988;Massimino & Sheridan, 1993;Park & Khatib, 2006;Zhu & Salcudean, 2000). Moreover, they test the operator's awareness of the dimensions, shape, and weight of the surrogate.
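As an illustration of how a sustained-contact-force trial of this kind might be scored, the following minimal sketch computes the root-mean-square deviation from the requested force and flags a trial in which the maximum allowed level is exceeded. The function name, the units, and the choice of RMS error as the tracking metric are assumptions made for this example; they are not taken from the cited studies.

```python
import math


def force_tracking_summary(measured, target, max_force):
    """Summarize one sustained-contact-force trial (illustrative only).

    measured  -- force samples recorded during the trial (assumed newtons)
    target    -- force level the operator is asked to hold
    max_force -- level above which damage and task failure are assumed
    """
    if not measured:
        raise ValueError("measured must contain at least one sample")
    # Root-mean-square deviation of the exerted force from the target.
    rms_error = math.sqrt(
        sum((f - target) ** 2 for f in measured) / len(measured)
    )
    peak = max(measured)
    # The trial fails as soon as any sample exceeds the allowed maximum.
    return {"rms_error": rms_error, "peak_force": peak, "failed": peak > max_force}
```

A lower `rms_error` would indicate that the operator dosed the force more accurately, while `failed` captures the damage threshold described in the example task.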

Changed workspace
This task demonstrates the ability of the operator to deal with a dynamic workspace. It tests the ability to interact with a hazardous environment and unexpected stimuli. The operator is first trained in a workspace, and then this workspace is changed. For example, a new obstacle is placed right in the path of the learned trajectory and the operator has to circumvent the object (Aarno et al., 2005;Szántó et al., 2013).

Motor imagery task
It consists of asking the operators to experience an embodiment illusion with the surrogate, only by imagining a movement (motor imagery) and watching the device performing it. Several studies demonstrate that the timing and accuracy of the performance feedback could improve operators' modulation of brain activities for the motor imagery task (Alimardani et al., 2014, 2015; Beursken, 2012; Debarba et al., 2017). Therefore, the motor imagery skills acquired through the training have long-lasting effects, which improve the operators' performance, especially if they are using Brain Computer Interfaces (BCIs) as control devices. Operators can explore and operate in a remote environment and train their distance perception and proprioceptive level of information to increase the sense of self-location. An example is the mental drop ball (MBD) task (Lenggenhager et al., 2009), in which the participant estimates the time a ball would take to fall down from their hand to the floor. The MBD is meant to address the question of where the self is localized or, more specifically, to detect whether the operator has similar time estimation in 1PP and 3PP. Consistently shorter times in 3PP could indicate a weak sense of self-location or that operators are better at judging distances than depth.

Threat task
This is a passive task that tests the body ownership at the time of the threat. If the operator feels affected by the observation of the threat in the remote environment, this indicates the presence of the sense of ownership, which will probably be ended by the threat (since the operator becomes aware of the illusion; Ehrsson et al., 2007; Yuan & Steed, 2010; Zhang & Hommel, 2016). Designing a threat into the experiment (e.g., hitting the surrogate with a hammer or stabbing it with a knife) is the most used test to assess the sense of ownership with a proper physiological measure (such as skin conductance response or heart rate). A peak in the signals recorded by the physiological measure, at the moment of the threat, is considered evidence of embodiment. It is also a consequential measure of the sense of self-location, even if it is not necessarily considered as such in the papers. We claim that if one feels affected by the threat, it implies that the individual feels located and immersed in the remote environment, and is therefore also affected by its dynamics.

Time delay
For all the previous tasks, imagine an additional condition in which a delay is added. This is a difficulty that can be added to each task and it can be used to test each component. Just to provide some examples, the delay could be added in order to test to what extent the presence of time delay affects the embodiment experience, to test the operator's ability to manage an unstable control device, and to test how much delay the system control of a device can handle before becoming too unstable and dangerous (Massimino & Sheridan, 1993; Sheridan, 1993). The time delay is one of the main issues encountered in teleoperation applications and it can be used to test how its presence affects the sense of ownership and agency of the operator in different contexts and conditions.
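One simple way to add such a delay condition in a simulated or software-mediated setup is to pass the operator's commands through a first-in, first-out buffer. The sketch below is a minimal illustration under the assumption that commands arrive at a fixed sampling rate, so that a delay can be expressed as a whole number of samples; the function name and the `fill` placeholder are hypothetical.

```python
from collections import deque


def delayed_stream(commands, delay_steps, fill=None):
    """Yield each command `delay_steps` samples after it was issued.

    Until the buffer has filled, `fill` is emitted, standing in for
    "no command received yet". The last `delay_steps` commands are never
    emitted, as would happen if the trial ended while they were in transit.
    """
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    buffer = deque([fill] * delay_steps)
    for command in commands:
        buffer.append(command)
        yield buffer.popleft()
```

For example, at an assumed 100 Hz command rate, `delay_steps=20` would correspond to a 200 ms delay, and the same task could be repeated with several values to compare conditions.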

Miscellaneous tasks
There are also mixed and alternative methodologies of stimulation. In (Pavone et al., 2016), for example, the authors explore whether embodying the errors of an avatar seen from 1PP may activate the error monitoring system in the brain of the observer (e.g., looking in 1PP at an avatar who drops an object that it should hold or who does not follow the instructions provided by the experimenter). Other studies manipulate the perceived size of the external body or part of it, and their findings show that the perceived size of objects is determined by the size and by the strength of the body in which the participant feels embodied (Normand et al., 2011; Van Der Hoort & Ehrsson, 2016; Van Der Hoort et al., 2011).

Step 5. Explicit measures
We discern two categories of assessment measures: explicit and implicit measures. Explicit measures are based on explicit ratings and reports made by the user and observers. Implicit measures are based on, for instance, performance, or physiology. This section presents the explicit measures (questionnaires and self-reports) and Section 3.8 describes the implicit measures.

Questionnaire
Questionnaires are the most used explicit measures of embodiment, especially because they can be adjusted and adapted to every kind of embodiment experience. Furthermore, questionnaires make it possible to focus on specific components of the embodiment.
Focus on the sense of ownership. The most well-known and used questionnaire, possibly with variations, is from (Botvinick & Cohen, 1998). Participants submit their responses on a seven-step visual-analogue scale ranging from 'strongly agree' to 'strongly disagree' (e.g., "I felt as if the rubber hand were my hand," "It seemed as if the touch I was feeling came from somewhere between my own hand and the rubber hand"). Even if the items to be evaluated mostly address the sense of ownership, they also cover the sense of self-location, especially concerning the proprioceptive awareness of the operator. Similar studies can also be found in (Van Der Hoort et al., 2011) and (Hohwy & Paton, 2010). In the questionnaire presented in , the focus is on the sense of ownership, while the sense of self-location is measured through the proprioceptive drift. In (Petkova & Ehrsson, 2008), the authors designed a questionnaire with complete focus on the sense of ownership. The participants were asked to complete a questionnaire on which they had to affirm or deny seven possible perceptual effects using a seven-point Likert scale. Three statements were designed to capture the illusory experience of being the artificial body (in this case a mannequin, e.g., "It felt like the mannequin's body was my body"), and the other four served as controls for suggestibility and task-compliance (mostly synchronous or asynchronous stimulation, e.g., "It seemed as the touch I felt was caused by the stick touching the mannequin's body"). In one experiment, participants observed a knife 'cutting' the mannequin's abdomen. There are studies in which the previously reported questionnaires are rephrased, mixed or adjusted in order to be used in a particular embodiment experience, while keeping the same scales and question contents (Bruno & Bertamini, 2010; Lenggenhager et al., 2007; Pavone et al., 2016; Tieri et al., 2015; Tsakiris, 2010; Van Der Hoort & Ehrsson, 2016).
For example, (Slater et al., 2008) proposes a variation of the one presented in (Botvinick & Cohen, 1998), in order to make it applicable also to virtual reality scenarios (e.g., "During the experiment there were moments in which I felt as if the virtual arm was my own arm"). In , some of the questions were derived from previous works (Botvinick & Cohen, 1998;Ehrsson, 2007;Lenggenhager et al., 2007) and others were introduced following interviews with participants in extensive pilot trials. A 13-item questionnaire was answered by the participants. Eight of these questions related to the sense of ownership. The questionnaire scores (between 0 and 10) were recorded into ranges as Very Low (0), Low (1-3), Medium (4-6), High (7-9) and Very High (10), based on the layout of the questionnaire. In (Yuan & Steed, 2010), instead, the questionnaire is re-adapted from (Slater et al., 2008), in order to be applied to virtual reality scenarios. This questionnaire covers both the sense of ownership and agency. There are also studies in which the authors measure the sense of ownership using questionnaires designed for a different purpose. For example, in (Fusaro et al., 2016) participants were asked to fill out two questionnaires aiming at measuring trait-empathy (i.e., the interpersonal reactivity index (IRI; Davis et al., 1980) and the empathy for pain scale (EPS; Giummarra et al., 2015)). Finally, other studies try to design a unique embodiment questionnaire in order to standardize this metric. An example is in (Gonzalez-Franco & Peck, 2018), in which the authors review the questionnaires used in past user studies and propose a standardized embodiment questionnaire based on 25 questions (the ones prevalent in the literature). The questions can be customized and used in studies involving virtual avatars, mannequins, and robotic devices. 
Moreover, the authors encourage administering this questionnaire in future embodiment experiments (especially in virtual reality scenarios) that include first-person virtual avatars. The main aim of the work was to further investigate the embodiment components, and to increase the comparability and standardization of the measurement of embodiment across experiments by providing a standard embodiment questionnaire that is validated and reliable. They confirmed and updated this purpose in their most recent work (Peck & Gonzalez-Franco, 2021), in which they presented new topics of discussion on the embodiment components and an updated questionnaire with a reduced number of questions, from 25 to 16.
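The grouping of 0 to 10 questionnaire scores into ranges mentioned earlier in this subsection (Very Low (0), Low (1-3), Medium (4-6), High (7-9), Very High (10)) can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and integer scores are assumed, since the source does not say how fractional scores were treated.

```python
def score_range(score):
    """Map an integer questionnaire score (0-10) to its reported range label."""
    if not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("score must be an integer between 0 and 10")
    if score == 0:
        return "Very Low"
    if score <= 3:
        return "Low"
    if score <= 6:
        return "Medium"
    if score <= 9:
        return "High"
    return "Very High"
```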
Focus on the sense of agency. In (David et al., 2006), the authors design a questionnaire that evaluates the sense of agency and the sense of self-location; in particular, participants indicate whether they experience a sense of agency during active agency task conditions and how they perform during the 3PP condition. In (Burin et al., 2017), we find an example of a questionnaire that focuses on the sense of ownership and agency. Another well-known and validated questionnaire is the one from (Longo et al., 2008). The authors assess the three components of SoE, but with a particular focus on the sense of agency, especially concerning the experiment design. In particular, they measure five components: 1) embodiment, reflecting feelings that the rubber hand, used as the artificial limb to create the embodiment illusion, belonged to the participant; it comprises three dissociable subcomponents: ownership, agency and self-location; 2) loss of own hand, reflecting feelings of being unable to move one's hand; 3) movement, relating to perceived motion of one's own hand; 4) affect, relating to the experience of the experiment being interesting and enjoyable; 5) deafference, which is related to the experience of perceiving the hand as less vivid than normal due to asynchronous visuotactile stimuli (such as seeing a brush touching the rubber hand but not feeling it on the real hand), which deceives the brain. Participants indicated their agreement or disagreement with 27 statements in each block using a 7-item Likert scale (from "strongly agreed" to "strongly disagreed"). The first two items were always related to the experience being interesting and enjoyable (e.g., "I found that experience enjoyable"); the order of subsequent items was randomized separately for each participant in each condition (e.g., "It seemed like I was in control of the rubber hand," "it seemed like my hand was in the location where the rubber hand was").
Concerning the affect component, there is an ongoing debate on the way in which the experience of the experiment can affect the embodiment components. In Ataria (2015) and Banakou et al. (2020), the authors demonstrated, in two different experimental contexts, that the level of interest and enjoyability of an experience mostly affects the sense of ownership and then, possibly as a consequence in the long term, the sense of agency.

Focus on the sense of self-location.
It is not common to focus a questionnaire on the sense of self-location, because often this component is not assessed independently from the other embodiment components. As reported before, David et al. (2006) assess the sense of agency and self-location in combination. Another example is in (Debarba et al., 2017), in which the authors design a questionnaire on the senses of agency, body ownership, and self-location to assess the effect of congruent visuo-motor-tactile feedback both in active and passive (the participant is just an observer) conditions, and in 1PP and 3PP conditions. It contains 10 questions: two for each component, two for the threat, and two control questions. Questions were formulated based on related experimental protocols (Caspar et al., 2015; Lenggenhager et al., 2007; Longo et al., 2008). The answers were given on a 7-point Likert scale, ranging from "Strongly Disagree" to "Strongly Agree."

Self-report
We define self-reports as participants' reflection on their experience. It is an introspective report that can be either semi-structured or given without any guidelines, following the participant's stream of thoughts. This differs from questionnaires, in which participants have to answer questions or evaluate an experience using a rating scale. The literature related to self-reports can be a good starting point to design more specific quantification metrics. The information obtained from self-reports can be interesting and relevant, but hard to compare among studies and to report outside an exploratory view. Often, self-report data are reported but not analyzed. Usually, self-reports take into consideration all three embodiment components. It is also common to combine questionnaires and self-reports, as in (Armel & Ramachandran, 2003), which combines free response descriptions of the experience and an intensity rating to determine the degree to which participants embodied a fake hand. In (Ehrsson, 2007), we can find a combination of questionnaire and self-report. In this case, Ehrsson reports a few sentences from the participants, who described their experience and feelings, but does not use a scale to assess the reported interview. In contrast, in (Lewis & Lloyd, 2010) the authors analyzed interview data using interpretative phenomenological analysis (IPA; Smith, 1996). The IPA was selected because of its emphasis on the experiences of the participant and how the participant makes sense of these experiences. Structure for the introspective interview was provided by a series of open-ended questions: "Do you have any unusual sensations and can you describe them?," "How do you feel about the rubber hand at the moment?," "How intense is this sensation?," "Compared to the synchronous stroking how does the asynchronous stroking feel?"
Questions comparing the experience during synchronous stroking and the experience during asynchronous stroking were used to aid introspection and help the participants in articulating their experience. Questions were usually presented to the participants in the order reported in the paper; however, the authors claim that participants were encouraged to report out loud any thoughts as they occurred and this could influence what questions were asked and in what order.

Proprioceptive measures
Using proprioception as a measure of SoE is related to the operator's awareness and perception of the size and shape of the surrogate in relation to the remote environment. There are three common ways to apply this measure: 1) by asking the participants to reach a point and then measuring the distance between the target point and the reached one (Normand et al., 2011); 2) by asking participants where they think that a part of their body is located (Longo et al., 2008); or 3) by asking participants if they felt that their location and that of the controlled surrogate were the same (Hoover & Harris, 2016; Lenggenhager et al., 2007). Proprioceptive measures provide useful information about all three components (sense of ownership, sense of agency and sense of self-location).
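The first of these approaches, measuring the distance between the target point and the reached one, can be sketched as follows. This is a minimal illustration that assumes points are given as coordinate tuples in a common reference frame and that Euclidean distance is the error of interest; the function names are hypothetical and the cited studies may define the error differently.

```python
import math


def reaching_error(target, reached):
    """Euclidean distance between the target point and the reached point."""
    return math.dist(target, reached)


def mean_reaching_error(trials):
    """Average reaching error over a list of (target, reached) pairs."""
    if not trials:
        raise ValueError("trials must contain at least one pair")
    return sum(reaching_error(t, r) for t, r in trials) / len(trials)
```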

Reaching-distance judgment
This measure is designed to assess the perception of the peri-personal space, in order to link it with the sense of self-location and also the sense of ownership. In (D'Angelo et al., 2017), participants were asked to stop a confederate at the distance where they thought they could reach her/him.

Heart rate
Heart rate (HR) is the speed of the heartbeat measured by the number of contractions (beats) of the heart per minute (bpm). HR can be used as a measure of embodiment to observe how much an operator is engaged with the surrogate, the remote environment, and the global embodiment experience (i.e., the extent to which, for instance, anxiety and stress are induced by the external environment). Please note that other factors, such as the physical activity of telemanipulation, are also reflected in changes in HR and this should be compensated or controlled for. For example, in (Fusaro et al., 2016), participants were immersed in a VR scenario and they observed a virtual: i) needle penetrating (pain), ii) caress (pleasure), or iii) ball touching (neutral) the hand of an avatar seen from 1PP or 3PP. In  they measured HR deceleration in response to a virtual scenario in which a woman slapped a girl, a parameter that has been associated with reports of aversive stress in the context of picture viewing. However, interoception is not always a good index of SoE, since its variation could also be related to other factors (Buldeo, 2015). In (Fusaro et al., 2016; Slater et al., 2010), the authors do not state or prove a clear disentangling among the three components while using HR as a measure of SoE.
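To make the definition of HR in bpm concrete, the sketch below derives a mean heart rate from beat timestamps and compares an event window against a baseline window. It is a simplified illustration: the function names, the use of timestamps in seconds, and the plain difference of means are assumptions of this example, and it does not control for confounds such as physical activity.

```python
def mean_heart_rate(beat_times):
    """Mean heart rate in bpm from increasing beat timestamps in seconds."""
    if len(beat_times) < 2:
        raise ValueError("at least two beats are needed")
    # Inter-beat intervals between consecutive beats.
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if any(i <= 0 for i in intervals):
        raise ValueError("beat_times must be strictly increasing")
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval


def hr_change(baseline_beats, event_beats):
    """HR in the event window minus HR at baseline (negative = deceleration)."""
    return mean_heart_rate(event_beats) - mean_heart_rate(baseline_beats)
```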

Skin conductance response
The skin conductance response (SCR) is the phenomenon whereby the skin momentarily becomes a better conductor of electricity when physiologically arousing external or internal stimuli occur. In (Armel & Ramachandran, 2003), the authors report that a threat to a rubber hand in the RHI caused a skin conductance response (arousal in response to the expectation of pain) in the synchronous, but not in the asynchronous condition. Also in (Grechuta et al., 2017), the authors use SCR as a measure of the autonomic nervous system (ANS) activity to quantify the experience of agency and ownership over a virtual hand. The ANS is the primary mechanism that regulates involuntary physiological states, such as arousal due to anticipating pain or fear. Participants who experienced the illusion show a marked rise in SCR. In (Ehrsson, 2007), the author registered the SCR as a measure of the emotional response when the illusory body was "hurt" by hitting it with a hammer after a period of synchronous or asynchronous stimulation. Several studies applied the SCR as a physiological measure of embodiment (for all three components, but with particular attention to the sense of ownership; Debarba et al., 2017; Fusaro et al., 2016, 2019; Petkova & Ehrsson, 2008; Riemer et al., 2015; Van Der Hoort et al., 2011; Yuan & Steed, 2010), and in order to maximize the measure they always insert a threat at the end of the embodiment illusion.
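A threat-related rise in SCR is often summarized as the peak conductance after the threat relative to a short pre-threat baseline. The following sketch illustrates that idea only; the window lengths, the function name, and the baseline-corrected peak as the summary statistic are assumptions of this example and are not prescribed by the studies cited above.

```python
def threat_scr_amplitude(samples, sample_rate_hz, threat_time_s,
                         baseline_s=1.0, window_s=5.0):
    """Peak skin conductance after the threat minus the pre-threat mean.

    samples        -- conductance samples (e.g., microsiemens), evenly spaced
    sample_rate_hz -- sampling rate of the recording
    threat_time_s  -- time of the threat relative to the first sample
    baseline_s     -- assumed length of the pre-threat baseline window
    window_s       -- assumed length of the post-threat response window
    """
    threat_idx = int(round(threat_time_s * sample_rate_hz))
    base_len = int(round(baseline_s * sample_rate_hz))
    win_len = int(round(window_s * sample_rate_hz))
    baseline = samples[max(0, threat_idx - base_len):threat_idx]
    window = samples[threat_idx:threat_idx + win_len]
    if not baseline or not window:
        raise ValueError("recording does not cover both windows")
    return max(window) - sum(baseline) / len(baseline)
```

Comparing this amplitude between synchronous and asynchronous conditions would mirror the contrast reported in the rubber hand studies.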

Skin temperature
The skin temperature (ST) variation is a result of a physiological reaction of the body to a stressful situation. It is often used as an embodiment measure to observe the unconscious body reaction to the embodiment illusion, in both virtual and physical conditions. However, the literature presents contrasting opinions on the efficiency of this assessment, especially because there is no standard way to interpret the results and the replication of similar studies leads to inconsistent results. For example, some studies report that any change in the temperature is evidence of SoE. In (Hohwy & Paton, 2010), the ST of participants was recorded to assess how changes in ST are related to presence or absence of the embodiment illusion in different conditions. In (Tieri et al., 2017), the authors investigate whether SoE over a virtual hand is reflected in changes to the physiological mechanism of ST regulation, and whether ST is modulated by the visual appearance of the virtual limb. This study focuses on both the sense of agency and the sense of ownership. However, it is difficult to state that ST addresses a specific component of SoE. Some studies are more specific about the kind of temperature changes, and they report that the sense of ownership is active if the temperature of the real limb decreases during the embodiment experience. In (Moseley et al., 2008), the authors hypothesize that ST in a specific limb can be changed by psychologically disrupting the sense of ownership of that limb. By using an established protocol to induce the RHI, they demonstrated that the ST of the real hand decreases when participants take ownership of an artificial counterpart. Moreover, they showed that the decrease in ST is limb-specific: it does not occur in the unstimulated hand. Also in (Crivelli et al., 2021), the authors explore the relationship between body ownership, thermoregulation, and thermal sensitivity in a mirror-box illusion paradigm.
Results showed a decrease in the hand ST, following the induction of the illusion of ownership toward the participant's reflected hand. Other studies point out the inconsistency of using ST changes as a measure of embodiment. In (De Haan et al., 2017), the authors conducted several studies in which they recorded hand temperature during an RHI in different circumstances, including continuous temperature measurements in a temperature-controlled room. They covered five attempts to replicate the traditional RHI experiment. The results did not show a reliable cooling of the real hand during the RHI. Therefore, they stated that hand cooling in the RHI is not causally related to changes in body ownership. Rohde et al. (2013) attempted to replicate the cooling of the stimulated hand in the classical RHI using an automated stroking paradigm, where stimulation was accomplished by a robot arm. After they found no evidence for hand cooling in two experiments using this automated procedure, they tried a manual stroking paradigm, which is closer to the one applied in the original RHI. With this procedure, they observed a relative cooling of the stimulated hand in both the experimental and the control condition. The subjective experience of ownership, as rated by the participants in the questionnaire, was strictly linked only to synchronous stroking in all three experiments, implying that hand-cooling is not a strict correlate of the subjective feeling of hand ownership in the RHI.

Reaction time
Reaction time (RT) is a measure of how quickly an operator can respond to a particular stimulus. This measure can only be applied to certain kinds of task-oriented user study, because it is strictly task related. A classical example of RT as a measure of embodiment, particularly of the sense of ownership, can be found in (Pavani et al., 2000). Participants had to recognize the position of vibro-tactile stimuli while ignoring the incongruent visual feedback (in this case, distractor lights). The RT was compared between congruent and incongruent stimuli, in order to examine the response conflict. A study with a focus on task switching, instead, is the one from . The authors investigated the relationship between the RHI and higher cognitive functions by experimentally testing task switching through RT measurements. Task switching involves the ability to unconsciously shift attention between one task and another; therefore, the required attention span is high and the RT becomes a valid assessment. A more unusual application of this measure, in the more peculiar form of onset time and temporal dynamics in general, can be found in (Kalckert & Ehrsson, 2017; Lane et al., 2017), where the authors try to detect the sense of disownership from the real hand in an RHI setup. In (Grechuta et al., 2017), the focus is on measuring the sense of agency, even if the other components are involved. In this work, the authors measure the reaction time while manipulating the synchronicity of visuo-tactile stimuli. In (Pavani et al., 2000), the attention is also on the sense of self-location.
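The comparison of RT between congruent and incongruent stimuli can be summarized as a difference of means, sometimes called a congruency effect. The sketch below is a minimal illustration with hypothetical names; it assumes RTs in milliseconds from correct trials only and omits any outlier handling or statistical testing that a real analysis would need.

```python
from statistics import mean


def congruency_effect(rt_congruent, rt_incongruent):
    """Mean RT on incongruent trials minus mean RT on congruent trials.

    A larger positive value indicates a stronger response conflict
    caused by the incongruent visual distractors.
    """
    if not rt_congruent or not rt_incongruent:
        raise ValueError("both conditions need at least one trial")
    return mean(rt_incongruent) - mean(rt_congruent)
```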

Neural activity
Recording and measuring neural activities can provide insights and evidence of SoE experience. In a less invasive way, it is possible to record the electrical impulses in the brain using an electroencephalogram (EEG; Pavone et al., 2016). It was also possible to observe that several brain areas are involved in SoE, thanks to functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), and to directly stimulate them through transcranial magnetic stimulation (TMS; MacDonald & Paus, 2003; Ogawa & Inui, 2007; Seghezzi, 2019). These measures are expensive to design and invasive for operators, therefore they present many constraints in designing the setup, the user study and the tasks. They also require the use of specific materials for the control device and the haptic devices, and they limit the action space of the operator. Several studies tried to investigate the brain mechanisms involved in the sense of ownership over a surrogate. Usually, the ownership is manipulated by making use of a perceptual illusion, such as the RHI (Ehrsson et al., 2004; Fossataro et al., 2018; Limanowski & Blankenburg, 2016b; Preston & Ehrsson, 2016; Zeller et al., 2016). In (Limanowski & Blankenburg, 2016a), the main focus is on the sense of self-location. The authors test the awareness of the body position in space and how the brain model is updated during an embodiment experience.

Assessment measures compatibility and discussion
Whether the embodiment components can be clearly and effectively disentangled is still an open debate; in practice, tasks are usually designed to test them all together. Often, the task design is too generic: it does not allow a clear distinction between the components and, therefore, their individual assessment. Disentangling the embodiment components becomes important when one of them has to be improved individually in a system or an embodiment experience. If it is unclear how to address a specific component, it is not possible to understand which conditions and parameters have to be manipulated, and how, in order to change the level of that component. Sometimes, authors state that a particular measure was used to assess a specific embodiment component but, considering the reported experiment design and tasks, it is hard to disentangle the assessment among all SoE components. For example, self-location is almost never tested individually; it is usually tested indirectly when the sense of ownership is assessed. Moreover, the sense of self-location is rarely tested in big or open spaces, but mostly in the context of the peri-personal space of the surrogate. Table 2 summarizes the compatibility between assessment measures and each embodiment component: 1) when a measure can assess a specific component, 2) when it is nonspecific, and 3) when it is not suitable for a specific component.
Questionnaires and self-reports can address each component specifically, since they can be customized. They can be applied to every kind of user study and context, since they are versatile and easily editable. Moreover, especially for questionnaires, there is a vast body of literature to support them. An advantage of self-reports is the possibility to obtain insights into the operator's experience and to pay attention to unexpected aspects suggested by participants. Among the disadvantages, both measures need to be supported by implicit measures, and they are usually time-consuming for the participants. Another drawback of self-reports is the lack of a unique and comparable way to assess them.
As for the implicit measures, the proprioceptive measures are easy to assess and provide a good indication of the operator's perception of the surrogate. Among the disadvantages, it can be time-consuming to first collect a baseline before the embodiment experience under test. Moreover, it is hard to disentangle the assessment of each embodiment component, since the localization bias of the real body toward the surrogate can be an effect of all three components. Finally, operators' performance could be affected by some aspects of the experiment or setup design, such as the local and remote environment, the sensory feedback, the surrogate, or other factors involved in the design of the system or the embodiment experience.
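The role of the baseline in the proprioceptive measures can be illustrated with a minimal sketch. It assumes, for illustration only, that positions are judged along a single axis in centimeters and that drift is counted as positive when the judged position of the real body part moves toward the surrogate; the function name and sign convention are hypothetical rather than taken from a specific study.

```python
def proprioceptive_drift(baseline_cm, post_cm, surrogate_cm):
    """Signed drift in cm along one axis.

    baseline_cm:  judged position of the real body part before the experience
    post_cm:      judged position after the experience
    surrogate_cm: position of the surrogate on the same axis
    Positive values mean the judgment shifted toward the surrogate.
    """
    shift = post_cm - baseline_cm
    direction = 1.0 if surrogate_cm >= baseline_cm else -1.0
    return shift * direction


# Made-up values: the surrogate sits 15 cm to the right of the baseline judgment.
drift_toward = proprioceptive_drift(baseline_cm=0.0, post_cm=2.5, surrogate_cm=15.0)
# Same shift to the right, but the surrogate is on the left: drift is negative.
drift_away = proprioceptive_drift(baseline_cm=0.0, post_cm=2.5, surrogate_cm=-15.0)
```

The sketch only quantifies the localization bias; as noted above, it cannot by itself say which embodiment component produced it.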
The reaching distance judgment is a good measure of the sense of self-location, but it is difficult to distinguish it from the effect of the sense of ownership, because the judgment about the distance is based on the perception of the surrogate in space. It is not a good measure of the sense of agency: it does not assess agency directly, but only gives an idea of how the operator would interact with the environment. It should therefore be regarded as a method to build predictions about the interaction between the operator and the environment rather than as a measure of the sense of agency. Among the advantages, this measure is versatile and easy to collect, since it does not require expensive equipment, and its application is not time-consuming.
The heart rate (HR) and the skin conductance response (SCR) are the most used implicit measures of SoE. They are easy to measure, but the results could be affected by each embodiment component. A high level of stress could be strongly related to the sense of ownership, but being immersed in the remote environment (sense of self-location) and performing an active task (sense of agency) that requires a high attention span could equally influence the results, without the possibility of understanding to what extent each component contributed to the embodiment experience of the operator. Currently, pupil dilation is studied in comparison with SCR and HR (Falcone et al., 2021), since these three measures, apart from SoE evaluation, are used in similar applications and with similar purposes. They aim at reflecting the functions and reactions of the human body to changes in the outside environment and inside the body (which is why they correlate, for example, with cognitive effort, which can be a purely internal process). Using a combination of physiological approaches to measure human behaviors, not specifically SoE, is not novel (Gutjahr et al., 2019; Wang et al., 2018), and neither is comparing and investigating what information they can provide and how accurately (Hogervorst et al., 2014). However, pupil dilation as a measure of SoE is still novel and needs more validation.
The skin temperature (ST) is usually measured on a specific limb and is used as a measure of the sense of ownership (Moseley et al., 2008). In the current literature, ST is not used as a measure of the other two components. There is an open debate in the scientific community about the use of this assessment measure. Some studies failed to replicate previous findings regarding temperature changes as a consequence of the embodiment illusion (Crucianelli et al., 2018; Rohde et al., 2013). Moreover, it is still unclear 1) whether the skin temperature is a measure of the stress level and 2) whether the temperature variation should be considered locally (on the body part) or globally (on the entire body).
The reaction time makes it possible to design dedicated tasks that assess each component specifically. For example, a task involving cognitive shifting would make it possible to assess the sense of agency by measuring the reaction time. The sense of self-location could be assessed by measuring the reaction time in a changed workspace task.
Finally, the measure of neural activity is versatile and can be used as a measure of all the embodiment components and of the operator's experience. There is a vast body of literature to support it, especially for the noninvasive techniques. Moreover, it can provide interesting and unique insights into SoE and the embodiment experience. Among the drawbacks, both noninvasive and invasive techniques impose many constraints on the setup design, such as the impossibility of using certain kinds of materials and the reduced action space for the operator. At the same time, this is also the category that, until now, has provided the most interesting insights into and explanations of SoE, because it provides direct feedback from the brain.
What emerged from this survey is that the most common and effective way to measure embodiment is a combination of one explicit measure and one or more implicit ones. In our opinion, this combination is the necessary basis of any good evaluation framework, since conscious perception and unconscious processes do not always coincide. What also emerges is the lack of a standard and common definition of SoE, which makes it hard to reuse current SoE designs and assessments and leads to unfocused and vague SoE assessment. As for the physiological measures, for example, any variation from the standard signal is considered proof of SoE, without taking into account the context and conditions in which the embodiment experience is realized. As a final consideration, the initial assumption that achieving a high level of SoE can improve task performance is not always correct: SoE and teleoperation performance are positively correlated in some, but not all, application scenarios and tasks. Indeed, if we think about industrial scenarios and the classical tasks that operators have to carry out in this context (such as maintenance in unhandy or inaccessible environments, or moving heavy objects), operators may prefer, and perform better with, a joystick that teleoperates an industrial arm with a gripper attached to it rather than a sensory glove that teleoperates a humanoid arm and hand. This is because a humanoid hand would not be as effective as a gripper for carrying or moving heavy objects. Moreover, in this context, the main focus is on improving task performance, which, given the current technology, cannot be achieved by a humanoid surrogate.

An overview on the toolbox
Having explained the steps of the SoE toolbox in detail, we provide a global overview of it in this section.
We distinguish three components of SoE: the sense of ownership, the sense of agency, and the sense of self-location. As also depicted in Figure 1, the first step is to define the application scenario (step 1) and the embodiment components involved (step 2a). The components determine the relevant perceptual cues as listed in Table 1. These cues are key points for the setup and system designer to focus on while designing an embodiment experience (step 2b).
The next step is to define the embodiment setup (step 3). The surrogate that can complete the application scenario (as determined in step 1) and the control device that can deliver the key perceptual cues (as determined in step 2b) have to be chosen. The next step (step 4) is to choose a task, or a combination of tasks, to test the system according to the previous choices. Depending on the context of use, some tasks will be more recommended than others. Finally, in step 5, the assessment measures and their customization are selected. We make a distinction between explicit and implicit assessment measures. The explicit measures, such as questionnaires and self-reports, capture what people say they actually experience. The implicit measures include task performance (irrespective of the experience during task execution) and physiological measures. In accordance with (Kaneko et al., 2021), we suggest using a combination of explicit and implicit measures in order to have a complete overview of the operators' experience. Now we present two examples of application of the five steps outlined above.

Example 1: Hugging in a computer game
Step 1. Let us say that we want to introduce the possibility to give and receive hugs in a virtual reality game (step 1: application scenario).
Step 2a. Since we will operate in a social scenario, we want to achieve the best level of all the embodiment components: sense of ownership, sense of agency, and sense of self-location. This information can be deduced from the description of the social scenario provided in Section 3.1.
Step 2b. This means that we want to optimize the perceptual cues that affect the embodiment components addressed in this situation (in this case, all of them). These cues can be taken from Table 1.
Step 3a. In order to do that, it is important to choose the proper surrogate (step 3.1: surrogate). Since the application will be a virtual game, we can design a humanoid virtual avatar.
Step 3b. We need to design the operator interface. We will immerse the user in a VR scenario, using a VR headset with 3D-vision and audition, and a haptic suit that covers the upper body of the operator so we can optimize the perceptual cues listed under 2b.
Step 4. The best tasks to test our system are from the category of sustain contact force (to test the hugging experience), and threat (to test SoE). The guidelines to select the tasks are presented in Section 3.6.
Step 5. The most suitable measures to assess the system are: 1) explicit measures: a questionnaire on SoE and telepresence, and a structured self-report to gain more insights into the experience (e.g., what is still missing for the hug to be believable, what the differences are with respect to a real hug); 2) implicit measures: skin conductance response or heart rate, to measure the level of stress and the emotional response to the threat task chosen in step 4, and the proprioceptive drift, to measure the awareness of distances and space.
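The choices made in Example 1 can also be recorded in a small data structure, which makes it easy to check that a design follows the recommendation of combining explicit and implicit measures. The sketch below is only one possible encoding: the class name, field names, and the check method are hypothetical, and the perceptual cues are left empty because they have to be taken from Table 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EmbodimentDesign:
    """Hypothetical record of the toolbox choices; field names are illustrative."""
    scenario: str                      # step 1
    components: List[str]              # step 2a
    surrogate: str                     # step 3a
    operator_interface: List[str]      # step 3b
    tasks: List[str]                   # step 4
    explicit_measures: List[str]       # step 5
    implicit_measures: List[str]       # step 5
    perceptual_cues: List[str] = field(default_factory=list)  # step 2b, from Table 1

    def combines_explicit_and_implicit(self) -> bool:
        """True when at least one explicit and one implicit measure are selected."""
        return bool(self.explicit_measures) and bool(self.implicit_measures)


hugging_design = EmbodimentDesign(
    scenario="giving and receiving hugs in a virtual reality game",
    components=["ownership", "agency", "self-location"],
    surrogate="humanoid virtual avatar",
    operator_interface=["VR headset with 3D vision and audition",
                        "upper-body haptic suit"],
    tasks=["sustain contact force", "threat"],
    explicit_measures=["questionnaire on SoE and telepresence",
                       "structured self-report"],
    implicit_measures=["skin conductance response or heart rate",
                       "proprioceptive drift"],
)
```

Calling `hugging_design.combines_explicit_and_implicit()` returns `True` for this design; a design with an empty list of implicit measures would return `False`.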

Example 2: Maintenance in a hostile environment
Step 1. We want to design a system to allow the operator to do maintenance in an industrial scenario, particularly moving blocks of different weights in an environment in which the temperature is too high to be tolerated by a human being.
Step 2a. The scenario is industrial and it involves the manipulation of objects in space, therefore the focus will be on the sense of agency and self-location (step 2a: embodiment components involved).
Step 2b. The perceptual cues that affect those embodiment components should be optimized (step 2b: perceptual cues to optimize). We need to prioritize the optimization of task performance over SoE. Haptic feedback is important for task execution because the operator has to distinguish among the different weights of the blocks. In this case, the temperature and the texture of the manipulated objects or environment are less relevant. However, it could be useful to alert the operator with visual or tactile feedback if the temperature reaches a level that can damage the surrogate or the manipulated objects. Force and tactile feedback provide information that helps the operator accomplish the task.
Step 3a. The best surrogate will be a robotic arm able to handle the weight of the blocks and the high temperature (step 3.1: surrogate). Moreover, the surrogate will need enough degrees of freedom to reach all the necessary points of the workspace. In this case, we will choose a gripper attached to an industrial robotic arm, because we do not need an end-effector with multiple degrees of freedom or one designed for precision tasks. The operator will always have to perform the same simple movements, combined differently depending on the situation. Therefore, a setup that supports the operator in accomplishing the allowed movements, by providing a complete visual overview of the workspace, will reduce the risk of making mistakes.
Step 3b. As a control device, a force feedback joystick would be one of the most comfortable choices for this environment (step 3.2: haptic device), since the movement would be intuitive and easy to learn.
Step 4. Positioning objects and changed workspace are two categories of tasks that could be useful to test SoE (step 4: tasks). The first tests the dexterity of the operator, and the second tests how the operator copes with the dynamic environment and the awareness of the surrogate in it.
Step 5. The best assessments would be: 1) explicit measure: questionnaire on the user experience (efficiency, effectiveness, ease of use of the system); 2) implicit measure: proprioceptive drift, reaching distance judgment, and task performance (number of errors, time needed to accomplish the task, number of moves needed to accomplish the task) (step 5: assessment measures).
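The task performance measures listed in step 5 (number of errors, time needed to accomplish the task, and number of moves) can be aggregated over trials with a few lines of code. The following is a minimal sketch with made-up values; the `Trial` record and the `summarize` function are hypothetical names, and averaging per trial is an assumption rather than a prescribed analysis.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Trial:
    errors: int        # number of errors in the trial
    duration_s: float  # time needed to accomplish the task, in seconds
    moves: int         # number of moves needed to accomplish the task


def summarize(trials: List[Trial]) -> Dict[str, float]:
    """Average each task performance measure over the given trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    return {
        "mean_errors": mean(t.errors for t in trials),
        "mean_duration_s": mean(t.duration_s for t in trials),
        "mean_moves": mean(t.moves for t in trials),
    }


# Made-up trials for one operator moving blocks with the joystick setup.
operator_trials = [
    Trial(errors=1, duration_s=42.0, moves=12),
    Trial(errors=0, duration_s=38.0, moves=10),
    Trial(errors=2, duration_s=40.0, moves=11),
]
summary = summarize(operator_trials)
```

Such a summary covers only the implicit task performance part of step 5; it would be reported alongside the questionnaire, the proprioceptive drift, and the reaching distance judgment.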

Conclusions and recommendations
We presented a toolbox to assess SoE, starting with a literature review of the most frequently used assessment measures, test tasks, perceptual cues, and application scenarios, with particular attention to teleoperation, VR setups, and embodiment experiences. Our conclusion consists of the following considerations and recommendations:
• We miss a standard definition of SoE and a clear picture of what we would like to assess while testing it. In this paper, we try to integrate the previous well-known definitions into a single one that, with respect to the previous definitions, highlights the distinction between subjective and objective aspects of SoE and underlines the importance of the role of the embodiment components;
• Authors often create a strict dependency between the task and the measure. This makes searching for and designing a standard measure of SoE difficult;
• Tasks are often designed to test the three SoE components at the same time, even if it is not explicitly stated by the authors. Our recommendation is to have clearly in mind which embodiment components one wants to address before designing or assessing an embodiment experience. This will make the assessment more reliable and valid;
• The most used explicit measures are questionnaires. In (Gonzalez-Franco & Peck, 2018; Peck & Gonzalez-Franco, 2021), the authors present a good starting point toward a standardized questionnaire. It is useful to define specific rules to customize the questionnaire to a particular application scenario and embodiment experience, defining how to choose a subset of questions (e.g., this could be useful in long and repetitive experiments, in which 25 questions would be too many). (Peck & Gonzalez-Franco, 2021), with the new validation studies, addresses some of these points;
• The most used physiological measures are the skin-related ones and the heart rate. However, they can become unreliable if factors that are unrelated to SoE, but can nevertheless affect the measurements, are not taken into account while designing an embodiment experience. Therefore, even if these are the most frequently used implicit measures, it is important to choose the proper assessment on the basis of the setup and system requirements, the task, and the application scenario;
• It is hard to disentangle the assessment of SoE components using physiological measures;
• We suggest measuring SoE using a combination of explicit and implicit measures. The former measure the conscious embodiment experience of the individuals, while the latter measure the intrinsic and unconscious changes in SoE levels.
This study guides the reader in choosing the proper tasks and measures to assess SoE. We also present and underline the reasons that led us to conduct this review, namely the lack of a standard assessment framework. Moreover, this paper aims at providing the first complete SoE toolbox that can guide the application of the existing measures and tasks. An SoE design toolbox will help researchers form a clear idea of what they can and want to assess, test, and obtain from their SoE studies and setups. To facilitate SoE design, we also define Tables 1 and 2. However, further work and investigation are required to confirm the information that we reported and structured in the tables. The proposed toolbox also aims to help the research community compare different studies and, ideally, create a predictive model of the level of SoE in a task-oriented system before its implementation. This predictive model, the object of our current investigations, would become a starting point for the improvement and optimization of new teleoperation systems, VR setups and, more generally, embodiment experiences.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
Sara Falcone is interested in human-machine interaction, teleoperation, robotics, cognitive science, neuroscience, and linguistics; she is a PhD candidate in the Human Media Interaction department of the University of Twente and in the Netherlands Organization for Applied Scientific Research (TNO).
Gwenn Englebienne is interested in robotics, haptics, machine learning, human-machine interaction, modelling behaviors; he is an Assistant Professor in the Human Media Interaction department of the University of Twente.
Jan van Erp is interested in human-computer interaction, human perception, haptics, brain-computer interfaces, robotics; he is a Full Professor in the Human Media Interaction department of the University of Twente and in the Netherlands Organization for Applied Scientific Research (TNO).
Dirk Heylen is interested in robotics, virtual reality, linguistics, communication, artificial intelligence, intelligent systems; he is a Full Professor in the Human Media Interaction department of the University of Twente.