Methodologies for evaluating exoskeletons with industrial applications

Abstract Industrial exoskeletons are being developed, explored, and increasingly implemented in industrial workplaces worldwide. Multiple technical, physical, and psychological aspects should be assessed before they are used daily in various occupational environments. The methodology for evaluating these aspects is not standardised and differs in research objectives, types of analyses, testing procedures, and use cases. The aim of this paper is to provide a matrix comparing the prevalence of different types of analyses combined with their respective research objective(s). A systematic review in the database ‘Web of Science’ identified 74 studies, most of them conducted in laboratory settings, focused on short-term effects, and based on male-dominated samples that poorly represent industrial workforces. The evaluation methodologies applied are further discussed and compared in terms of testing procedure, sample, and research objectives. Finally, relevant aspects for prospectively evaluating industrial exoskeletons in a more harmonised and comprehensive way are suggested. Practitioner summary: Industrial exoskeletons are still inconsistently and insufficiently evaluated in scientific studies, which might hamper the comparability of systems, threaten human health, and block iterative system optimisation. Thus, a comprehensive evaluation methodology with harmonised and multicriteria types of analyses is needed.


Introduction
Industrial interest in implementing exoskeletons in working processes is rising, since their use can ease the workload, reduce the risk of work-related musculoskeletal disorders, or increase the productivity and precision of the workforce (Bogue 2018; Baltrusch et al. 2019). Industrial exoskeletons are typically designed for assisting tasks with high repetition, in non-neutral postures (Hensel and Keil 2019), and with heavy workloads. Thus, exoskeletons can be a promising personal measure if other technological (e.g. cranes, power lifts) or organisational measures (e.g. modified workstations, redesigned work processes, ergonomic training) are not applicable (Madinei et al. 2020a; Pacifico et al. 2020). This might be the case for, e.g. highly flexible workplaces (i.e. in terms of location or content), work locations that are difficult to access, individual work-related disorders, or high capital expenditures for conventional approaches.
To meet specific workplace requirements and characteristics, exoskeletons pursue different system philosophies and feature various morphologies, which can lead to diverse physical, psychological, and cognitive effects on the wearer (Fox et al. 2019; Schroeter et al. 2020; Weidner and Karafillidis 2018). To assess these effects, investigators can choose from many objective and subjective evaluation methods.
The (support) performance of exoskeletons has already been summarised and contrasted in different systematic reviews. For instance, an earlier review scanned 40 papers for actuation type and potential effects on physical workload (period: 01/1995-08/2014). An update with 33 publications (period: 01/2015-06/2020) by Kermavnar et al. (2021) focussed on exoskeletons for back support and summarised study designs, (in)dependent variables, and statistical results. Pinto-Fernandez et al. (2020) identified performance indicators for practically benchmarking lower limb exoskeletons in 187 publications and included motor skills in an overview (period: 01/1989-04/2018). Del Ferraro et al. (2020) dealt with upper limb exoskeletons and summarised investigations on metabolic costs (period: until 09/2020). A further overview identified 67 systems with a maturity level sufficient for (at least) industrial pilot studies (state: 02/2019) and presents the distribution of supported body regions, technical properties, and relational patterns between human and technology (Weidner and Karafillidis 2018).
Thus, this paper focuses on the evaluation methodologies applied in laboratory and field studies with industrially applicable exoskeletons, independent of supported body parts or research objective(s). The prevalence of each type of analyses combined with its respective research objective(s) is reported and contrasted in a matrix. This overview should help to highlight the common variety in evaluations, the focus on specific research objectives, and the potential lack of standardisation with respect to methods, procedures, and considered tasks. In addition, summarised information about prevalence rates or patterns as well as promising individual approaches can be the basis for prospectively deriving best-practice approaches and first steps towards harmonised and comprehensive evaluation methodologies. Besides, evaluators can get specific information about common methodologies and inspiration for applicable analyses and research objectives.

Methods
The systematic review was conducted via online research using the database 'Web of Science Core Collection'. The search considered all types of documents written in the English language, published between January 2000 and December 2020, and indexed in the 'Science Citation Index Expanded' or 'Emerging Sources Citation Index'. The study selection process is presented in Figure 1. NH and GP independently screened the identified records and articles. RW resolved any disagreement between the other two authors. All three authors agreed on the final study selection.
Afterwards, the selected studies were summarised, and the respective characteristics of the evaluated exoskeleton(s) and test setup were extracted. For exoskeletons, the following distinctions were made: maturity level (prototype vs. commercial system), actuation (active vs. passive), and supported body region (lower limb vs. trunk vs. upper limb).
Accordingly, for test setups: use case (laboratory vs. field); sample (size, gender, average age, height, weight, and health status); and applied type of analyses, either objective (applied forces, executed errors, fine motor skills, metabolic costs, modelling, movement patterns, muscular activity, system data, testing machine data, walking speed, working speed), subjective (survey), or objective or subjective (observation).
Concerning the sample, the data were only considered statistically if all of the above-mentioned information was given. Beforehand, any gender-specific values were averaged over the total sample. In case of varying samples in a paper, only the largest sample group was considered. Applied evaluation methods were not considered further if their purpose was principally to design or construct the exoskeleton at an early development stage. Finally, each considered method was matched with its respective research objective(s). Here, the following constructs and items (italicized) were distinguished, although some constructs cannot be strictly separated from each other (duplications in the matrix are possible): Work performance principally addresses productivity with exoskeletons. Common aspects are endurance time, number of errors, number of repetitions in a specific period, and task completion time (Maurice et al. 2020). All these criteria were subsumed under working speed and quality of work. Kinematic aspects principally comprise movability with the exoskeleton, since the user's range of motion should not be impaired (Baser, Kizilhan, and Kilic 2019), nor should motion sequences be changed in terms of movement patterns (Baltrusch et al. 2019) or load transfers (i.e. postural control, risk of falling (Maurice et al. 2020)). Ideally, the system moves in synchrony with the user's dynamics (i.e. acceleration and velocity (Yong et al. 2019)) and motion trajectories (i.e. position, direction, and orientation (Li, Yuan, et al. 2018)). Deviations in the latter can also cause discomfort, since, e.g. misalignments between exoskeleton and human can lead to skin irritations or sores (Pacifico et al. 2020). Comfort is crucial for the user's health and system acceptance, especially in the case of longer wearing periods. It can be assessed in either general or specific inquiries (i.e. whole body, certain body regions, (specific) working tasks).
Since exoskeletons should reduce the user's workload, the relief can be evaluated in either physical (i.e. reduced muscle activity or increased endurance time) or mechanical dimensions (i.e. supporting torques or forces). The support of exoskeletons can also influence the individually perceived task difficulty. Acceptance and usability are two constructs that influence each other and are influenced by many aspects (Hensel and Keil 2019). In particular, the perceived discomfort might be a major factor for system acceptance (Luger et al. 2019). In this paper, system acceptance mainly covers concrete questions about the user's voluntary intention to use the system. Usability comprises factors like user-friendliness and the effectiveness, efficiency, and satisfaction of the system. This includes aspects like ease of use or the effort for donning and doffing. However, reductions in perceived task difficulty might also influence usability ratings.
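The sample-handling rules described above (only fully described samples enter the statistics, and only the largest group per paper is used) could be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the `SampleGroup` record, its field names, and the order of the two rules (largest group first, completeness check second) are assumptions made for this example.

```python
from dataclasses import dataclass, fields
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple


@dataclass
class SampleGroup:
    """One sample group as reported in a paper (hypothetical record).

    None marks a value that the paper does not report.
    """
    size: Optional[int] = None
    males: Optional[int] = None
    females: Optional[int] = None
    mean_age: Optional[float] = None
    mean_height: Optional[float] = None
    mean_weight: Optional[float] = None
    healthy: Optional[bool] = None

    def is_complete(self) -> bool:
        # A sample counts as fully described only if every field is reported.
        return all(getattr(self, f.name) is not None for f in fields(self))


def mean_sd(values: List[float]) -> Tuple[float, float]:
    # The sample standard deviation needs at least two values.
    sd = stdev(values) if len(values) > 1 else 0.0
    return mean(values), sd


def summarise_samples(papers: Dict[str, List[SampleGroup]]) -> dict:
    """Return mean and SD of sample characteristics across papers."""
    kept = []
    for groups in papers.values():
        if not groups:
            continue  # e.g. modelling studies without any sample
        # Assumed order: pick the largest group, then check completeness.
        largest = max(groups, key=lambda g: g.size or 0)
        if largest.is_complete():
            kept.append(largest)
    if not kept:
        return {"n_papers": 0}
    return {
        "n_papers": len(kept),
        "size": mean_sd([g.size for g in kept]),
        "males": mean_sd([g.males for g in kept]),
        "females": mean_sd([g.females for g in kept]),
        "age": mean_sd([g.mean_age for g in kept]),
        "height": mean_sd([g.mean_height for g in kept]),
        "weight": mean_sd([g.mean_weight for g in kept]),
    }
```

Papers that report only a total sample size are dropped from the statistics under these assumptions, which is why the number of fully described samples can be smaller than the number of studies with human subjects.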

Results
The systematic review identified 74 studies. The extraction results are summarised in Table 1. Most of the scanned studies were performed with humans in laboratories (n = 65). Occasionally, work environments were simulated with real working tools, objects, or motion scenarios. Four studies evaluated exoskeletal prototypes in modelled environments (Aoustin and Formalskii 2018; Nelson et al. 2020; Pan, Gao, and Miao 2014; Yan, Zhang, and Qi 2018) without any sample. Five studies were performed in the field [...] (2009) were used in the field, whereas prototypes are predominant in laboratories (ca. 70%). Overall, 45 prototypes and 40 commercial systems were investigated. In terms of system characteristics, 59 passive and 26 active systems were evaluated, but only passive systems in field studies. The total number of systems is higher than the number of scanned papers, since some papers evaluated more than one system (e.g. Alabdulkarim and Nussbaum 2019). Of these systems, 28 supported the upper limb, 37 the trunk, and 20 the lower limb. The sample was fully described in 48 papers (mean size: n = 13.06 ± 7.27 (males: n = 10.19 ± 6.98; females: n = 2.88 ± 4.12); mean age: 27.05 ± 5.06; mean weight: 73.99 ± 5.20; mean height: 176.51 ± 4.06). In general, the sample was small (6% with fewer than five subjects, 19% with fewer than ten subjects), young (79% younger than 30 years), and male-dominated (58% without female subjects, 20% with a gender-balanced sample). The largest sample was found in Luger et al. (2019) with 45 subjects. In more than 82% of all studies with humans (n = 68), it was explicitly mentioned that the tested subjects were healthy or without any musculoskeletal problems. Comparing all field and laboratory studies, the sample size was 13.60 ± 8.43 and 11.28 ± 7.05, respectively.
Inspired by the illustration of Pinto-Fernandez et al. (2020), Figure 2 summarises the prevalence of different types of analyses, partly including several evaluation methods, with their respective research objectives.
The last column exhibits the total number of accessible research objectives with a certain type of analyses, whereas the last line reveals existing focus areas. Since physical support systems are an integration of the human, machine, and activity (Weidner, Kong, and Wulfsberg 2013), each type of analyses is also qualitatively assigned to these superordinate dimensions on the left side.
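A matrix of this kind can be tallied from coded study records. The following is a minimal sketch under stated assumptions: the record format `(study_id, analysis_type, research_objective)` and the function name `prevalence_matrix` are hypothetical, and the margin definitions (distinct objectives per type of analysis, summed counts per objective) are one plausible reading of the last column and last row described above.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (study_id, analysis_type, research_objective)


def prevalence_matrix(records: Iterable[Record]):
    """Count studies per (type of analysis, research objective) cell."""
    cells: Counter = Counter()
    # set() removes repeated codings of the same study for the same cell.
    for _study, analysis, objective in set(records):
        cells[(analysis, objective)] += 1

    analyses = sorted({a for a, _ in cells})
    objectives = sorted({o for _, o in cells})

    # Assumed "last column": number of distinct objectives each type addresses.
    objectives_per_analysis: Dict[str, int] = {
        a: sum(1 for o in objectives if cells[(a, o)] > 0) for a in analyses
    }
    # Assumed "last row": summed cell counts per objective (focus areas).
    # A study using two types of analyses for one objective is counted twice.
    counts_per_objective: Dict[str, int] = {
        o: sum(cells[(a, o)] for a in analyses) for o in objectives
    }
    return cells, objectives_per_analysis, counts_per_objective
```

For example, two studies that both use a survey to assess comfort yield a count of 2 in the cell (survey, comfort), while a survey that also covers usability raises the survey row's number of addressed objectives to 2.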
In total, 193 evaluation methods (161 objective and 32 subjective) were used. More than three quarters of all studies applied more than one type of analyses; about 50% of them combined a subjective and an objective method. The physical relief by the exoskeleton was evaluated most often, followed by the mechanical support. Further key research objectives were movability, motion sequence, usability, and comfort. The most frequently applied type of analyses were surveys with questionnaires or interviews (e.g. Maurice et al. 2020; Smets 2019), which often assess the perceived physical support, comfort, and the system's acceptance and usability. Since they are easily adaptable to different research questions, they were used in almost every category except for kinematic aspects. Usability was also usually evaluated with questionnaires, but Baltrusch et al. (2018) used objective observations of functional performance tests. The analysis of muscular activity with electromyography (e.g. Wang et al. 2020; Pillai, van Engelhoven, and Kazerooni 2020) was predominantly applied for evaluating the physical relief for selected (in particular supported) muscle groups. The analysis of movement patterns (e.g. recorded with force sensors (Baser, Kizilhan, and Kilic 2019), inertial measurement units (Maurice et al. 2020), video camera, or optical marker system (d'Elia et al. 2017)) focussed mainly on kinematic aspects or working speed comparisons. Modelling (e.g. musculoskeletal simulations (Blanco et al. 2019; Nelson et al. 2020; Weston et al. 2018), mathematical/numerical calculations (Aoustin and Formalskii 2018; Han et al. 2020; Pan, Gao, and Miao 2014; Wang et al. 2018)) addressed mainly the system's support as well as the movability and motions during its use. Occasionally, input data from the sample's anthropometry or force sensors were also considered. Other analyses were applied less often.
These include time impacts on workplaces (Dahmen and Constantinescu 2018), the number of executed errors (e.g. with a detection ring (Alabdulkarim and Nussbaum 2019)), fine motor skills with a pegboard (Madinei et al. 2020a), applied forces (e.g. with a contact pressure mat, load cell, dynamometer (Li, Yuan, et al. 2018), force plate (Maurice et al. 2020)), metabolic costs (e.g. with ergospirometry (Baltrusch et al. 2019), oxygen consumption (Junius et al. 2018), heart rate (Maurice et al. 2020), blood lactate concentration (Galle et al. 2014), blood oxygenation (Gams et al. 2013), minute ventilation (Gams et al. 2013)), individual working speeds (e.g. walking on a treadmill (Baltrusch et al. 2019), task execution with maximum acceptable working frequency (Alabdulkarim and Nussbaum 2019)), influences on the workload with sample-independent data (e.g. from a humanoid testing machine (Nabeshima et al. 2018; Ito et al. 2018)), or analysis of the system's motion synchronicity and cycle stability with a joint simulator (Shamaei et al. 2014). Godwin et al. (2009) and Lotz et al. (2009) applied a testing machine assessing the human's maximum remaining strength after exhausting tasks with or without exoskeletal support. Objective observations with measurable criteria were made one-dimensionally or multidimensionally (Baltrusch et al. 2018; Kozinc et al. 2020).

Discussion
In evaluation studies, the research objective generally depends on the system's level of development as well as the evaluator's perspective (Hoffmann et al. 2019).
In studies with exoskeletal prototypes, the evaluation purpose is often a 'proof of concept'. Therefore, technical functionalities (i.e. mechanical or physical support to the user, range of motion, usability) are at the centre of interest. In contrast, field studies, which are mainly conducted with mature systems, focus more on downstream effects like usability and acceptance. Regarding the evaluation horizon, exoskeletons should ideally be evaluated in prolonged sessions to reflect daily use in industry and to detect crucial long-term effects, such as changes in the subjective attitude towards the system or possible learning or adaptation processes. For instance, users can perceive discomfort in certain body areas due to a poor microclimate at the interfaces, increase the total range of motion due to system fit adjustments (Smets 2019), improve the handling and consequently the task execution (Sadler, Graham, and Stevenson 2011), or optimise the self-selected support level (Hensel and Keil 2019). However, the evaluation of short-term exoskeletal effects is still predominant, particularly in laboratory studies. Possible reasons might be limited resources and increased practicability (Rashedi et al. 2014). Thus, short-term results are sometimes cautiously interpreted for a longer timespan (e.g. Pacifico et al. 2020). Due to the selected search pattern in terms of, e.g. database, time horizon, and search terms, the presented matrix is neither comprehensive nor universally applicable. For instance, Spada et al. (2017) assess the task performance, endurance time, and perceived discomfort in a field setting with a large representative sample (n = 31, age: 51.5 ± 4.7 years). Schroeter et al. (2020) focus on cognitive effects (i.e. ability to concentrate, proneness to errors) in the field caused by the physical support of an exoskeleton. In our filtered data set, Maurice et al. (2020) and Li, Cheng, et al. (2018) are the only ones investigating effects on mental support.
In the following, several other aspects and differences for evaluating industrial exoskeletons are objectively discussed concerning testing procedure, sample as well as used methods and criteria. By exemplifying certain publications, the authors of this article never intend to personally address or challenge any author(s) or scientific study. Finally, suggestions for evaluating industrial exoskeletons in a harmonised and comprehensive way are presented.

Testing procedure
It becomes apparent that testing procedures are generally inconsistently documented and processed. For instance, it is assumed that all authors have conducted an individual fitting of the exoskeleton and a familiarisation of the subject with the test setup (i.e. system, tasks, evaluation method) before starting the study, but this is not always explicitly written in the paper. Otherwise, questionnaires with complicated scale classifications submitted to inexperienced young participants might not provide accurate results (Pacifico et al. 2020).
For deriving usage effects, test subjects usually perform different tasks with the exoskeleton (and often, for benchmarking, without the system). The tasks are often reduced from a potentially large scope of activity to selected core activities of an end-user. Static tasks are held in predetermined positions for a longer period (e.g. 30 s), whereas dynamic tasks often show a repeated motion range and a neutral position. Both types are frequently varied in (end-)position or weight to increase industrial representativeness, but without any standardisation across the above-mentioned studies (even when using the same exoskeleton). However, more cycles, longer static postures, or other motions could also appear in real industrial applications, particularly when considering side tasks like walking or resting. In practice, the task selection for studies is often (pre-)determined by the study environment (e.g. Hensel and Keil 2019), the chosen research objective(s) and evaluation method(s), and the anticipated support region of the exoskeleton (i.e. lifting tasks for exoskeletons of the trunk or overhead tasks for exoskeletons of the shoulder-neck region (e.g. Otten, Weidner, and Argubi-Wollesen 2018)).
Since results naturally depend on the test setup, it must be mentioned that authors specify task executions and exoskeletal support levels differently. For instance, task velocity instructions can vary from as fast as possible (Maurice et al. 2020) and self-selected paces (Pacifico et al. 2020) to given executions within a certain period (Hefferle et al. 2020). Other evaluators only name the task (e.g. Pillai, van Engelhoven, and Kazerooni 2020), which simulates real working conditions in a better way but hampers the comparability between different test subjects (Pillai, van Engelhoven, and Kazerooni 2020). Furthermore, most authors do not declare the respective support level that is adjustable for many active (and some passive) systems, although the adjusted support level can influence evaluation results (e.g. reduced electromyography can increase work endurance time (Gillette and Stephenson 2019; Bosch et al. 2016), but more support does not necessarily imply a higher system acceptance (van Engelhoven et al. 2019)). For instance, some authors, such as Madinei et al. (2020a), let test subjects individually select their support level, which can naturally vary between subjects (Pillai, van Engelhoven, and Kazerooni 2020). Alternatively, Pacifico et al. (2020) set the support to the minimum, whereas Maurice et al. (2020) individually adapt the support according to the user's arm weight. In practice, any declaration of support can be difficult for evaluators since support levels are usually quoted in different ways. Alternatively, Sadler, Graham, and Stevenson (2011) and Lotz et al. (2009) adjust the respective task load in accordance with the user's previously measured strength characteristics. Graham, Agnew, and Stevenson (2009) digitise photos of subjects and use a biomechanical computer program to calculate the individually required forces of the actuator.
Furthermore, some authors only evaluate a subsystem to reduce the development effort. For instance, Dong et al. (2019) evaluate an exoskeleton for gait assistance only on the right shank and place an additional weight as a dummy for counteracting imbalances on the other shank. Wang et al. (2018) focus on effects according to the range of motion if exoskeletal degrees of freedom are reduced, but solely construct a kinematic version of the system without any actuation.

Sample
Samples are generally inconsistently described, since some authors only mention the total sample size, whereas others typically add several more distributional details for age, weight, height, and gender. In some descriptions, working experiences or arm dominances are additionally considered, since posture, fine motor skills, and manageable tool mass can heavily vary in these cases (e.g. van Engelhoven et al. 2019;Maurice et al. 2020). Alternatively, Blanco et al. (2019) and Pacifico et al. (2020) consider the arm dominance in the task execution. However, the test subjects of Maurice et al. (2020) always use the right arm. Finally, authors should weigh which sample description is really needed, since the total number of participants can be sufficient for specific questionnaires (see, e.g. Otten, Weidner, and Argubi-Wollesen 2018). For modellings, samples are normally not needed (e.g. Yan, Zhang, and Qi 2018). Here, sample information is only given if these data are used as input data for equations or model building (e.g. Weston et al. 2018). Besides, sample sizes can vary between conducted methods (e.g. Otten, Weidner, and Argubi-Wollesen 2018;Galle et al. 2014) or different phases of intervention (Smets 2019;Kang, Hsu, and Young 2019).
Various authors (e.g. Madinei et al. 2020a; Maurice et al. 2020; Pacifico et al. 2020) concede limitations in the representativeness of their tested sample, since test subjects are often few, young, male-dominated, without pre-existing diseases, and inexperienced with the executed manual tasks and the handling of the exoskeleton. This can have various practical reasons and is justified in different ways. For instance, evaluators can rule out risks by considering only healthy individuals. Initially planned test subjects can ultimately be excluded due to, e.g. massive system aversion or discomfort, medical backgrounds, or unsuitability of the system or the method for the workplace (Hensel and Keil 2019) or the user (e.g. unusual anthropometry, skin wounds close to the interface (Pacifico et al. 2020), no adhesion of electromyography electrodes on the user's skin due to sweating (Pillai, van Engelhoven, and Kazerooni 2020)). Furthermore, the male domination in samples is explained by the arguments that industrial and physically demanding workplaces are primarily occupied by men, that the design and size of certain exoskeletons might not be generally suitable for female anatomy (Luger et al. 2019; Hensel and Keil 2019), that difficulties appear in obtaining approvals by the Institutional Review Board for female participants (van Engelhoven et al. 2019), or that females are unable to complete all experimental conditions (Rashedi et al. 2014). Elsewhere, solely female (Galle et al. 2014; Godwin et al. 2009) and gender-balanced samples appear (e.g. Madinei et al. 2020a; Sadler, Graham, and Stevenson 2011). Here, gender effects on, e.g. task completion time (Madinei et al. 2020a) or handled tool mass can be cautiously derived.

Methods and criteria
Authors used different methods in their studies for various purposes like system design, assessment of usage effects, or observation of task executions. For instance, systems can be designed with the help of motion capture and fitness equipment (Li, Yuan, et al. 2018), kinematic models (Yan, Zhang, and Qi 2018) combined with finite element analysis (Yong et al. 2019), or human simulations based on motion capture (Blanco et al. 2019). Because exoskeletal usage effects can be multidimensional, authors often consider more than one evaluation criterion by combining different evaluation methods. Here, questionnaires are particularly easy to implement additionally in studies. Some authors also try to investigate statistical dependencies between evaluation criteria like muscular activity in supported body regions, error proneness, and working speed ([...] Nussbaum 2019), reductions in physical demand, discomfort, usability, and intention-to-use (Hensel and Keil 2019), or reductions in muscular activity or muscle fatigue and usage decision or system acceptance (Gillette and Stephenson 2019). Other authors use or interpret the results of electromyography multidimensionally for, e.g. calculating supporting moments (Koopman, Näf, et al. 2020), metabolic costs (Grazi et al. 2020), the user's mean power frequency (Rashedi et al. 2014), or the system's efficiency (Yong et al. 2019) as well as detecting changes in working speed (Baltrusch et al. 2018) or in motions (Madinei et al. 2020a). The analysis of more muscle groups than only the directly supported ones can help to detect a physical burden as well as to derive further comfort and usability issues (Maurice et al. 2020; van Engelhoven et al. 2019; Weston et al. 2018). However, certain muscle groups might not be accessible due to overlying exoskeletal components (Alabdulkarim, Kim, and Nussbaum 2019; Madinei et al. 2020b). Other authors monitored the correct task execution with, e.g. motion capture or force sensors to delete any outliers in the sample data for further analyses (Weston et al. 2018). Alternatively, a video camera (Hefferle et al. 2020) or auditory feedback can be used.
Finally, it must be mentioned that several evaluation artefacts may occur, since sensors may deliver wrong data, questionnaires may be prone to personal preferences, and both objective and subjective methods may depend on the interpretation of evaluators (Hoffmann et al. 2019). It also needs to be emphasised that most scanned evaluation studies do not consider any safety aspects to eliminate hazards to users (e.g. climbing stairs, working with certain machines). In the future, evaluations might clarify these safety aspects more thoroughly, along with long-term effects (Theurel and Desbrosses 2019), and result in recommendations on the wearing time, the overall dimensions of the user when wearing the exoskeleton, the connectivity with personal protection equipment and work tools (e.g. forklift trucks), the usability in specific working or climate environments or during side tasks, the maintenance and cleaning effort, possible damping effects for work with vibrating tools, the influence of physical support on mental relief, and the influence of different (industrial) implementation strategies on later system acceptance. Even in laboratory settings, more practical insights might be derivable if work-experienced subjects are considered (see Koopman, Näf, et al. 2020).

Suggested improvements
Based on the findings of this review and the previous discussion of different study examples, it becomes obvious that industrial exoskeletons need to be evaluated in a more harmonised and comprehensive way. Prospectively, established guidelines or standardisations in processing, parametrising, and documenting might generate harmonised evaluation results and enable knowledge- or artificial-intelligence-based system improvements by identifying central exoskeletal core properties, user- or application-related system requirements, and best-practice system developments. Due to detected interdependencies between some evaluation criteria, comprehensive exoskeletal evaluations might prospectively be reducible to selected research objectives and evaluation methods.
Since results naturally depend on the initial situation, it is generally recommended to familiarise test subjects with the exoskeleton, the tasks, and the applied evaluation methods before starting the evaluation procedure. Exoskeletons should also be individually adjusted in terms of fit and support characteristics. Task execution should consider the individual handedness. The test subjects should represent end-users and use cases (industrial main task with side tasks) as closely as possible. Concerning the evaluation itself, Figure 3 summarises six recommended evaluation aspects for deriving the exoskeletal performance and workplace suitability, drawing on the personal experience of the authors from several field and laboratory studies as well as investigations in different research projects like 'smartASSIST (2014-2020)' and 'exo@work' (since 2018). It is based on the following four principles:
1. Consideration of the unit of human, support, and activities: In terms of exoskeletons, the human-machine interaction should be evaluated by considering the activity as well (see Weidner, Kong, and Wulfsberg 2013), since all three dimensions influence each other in support situations. In an occupational context, the respective dimensions are characterised by the user (e.g. with physiological precondition, personal technical affinity, pre-existing diseases), the support/exoskeleton (e.g. support characteristic with, e.g. maximum support level, system handling, path of force, overall dimensions), and the industrial workplace (e.g. with profile of main and side tasks as well as spatial, climatic, and social working environment).
2. Harmonised methodology across studies and investigations: The applied types of analyses (i.e. surveys, (biomechanical) analyses, and modelling) should be expanded as well as synchronised or harmonised with their respective research objectives to reach a profound understanding of the exoskeletal technical functionality and performance. A multidimensional perspective with both objective and subjective evaluation methods is beneficial. For instance, Maurice et al. (2020) suggest evaluating objectively the posture, movement, and effort as well as subjectively the user's perception and system acceptance. Hefferle et al. (2020) propose to evaluate a physical relief objectively with different local and global methods combined with a general subjective evaluation. Certain research objectives should be evaluated repeatedly with a time shift of, e.g. four to six weeks to reveal learning or adaptation processes of the user to the system itself or the workplace.
3. Harmonised test cases: Test courses should be applied, with test series consisting of multiple work tasks with varying tools and motions (see, as an example for load handling, Bostelman et al. 2019). Any standardisation can generally increase the transparency and comparability of the systems' performance. This also implies a comprehensive task description in terms of, e.g. frequency, velocity, range of motion, or weight. It is also recommended to observe and record (correct) task executions to exclude outliers and ease the data interpretation. Intermediate pauses between test cycles and randomisation of test orders (i.e. tasks, use of exoskeleton) can prevent test effects. Since movements with the biggest support of the exoskeleton usually make up only a small percentage of a total work shift, evaluations should also include side tasks. Furthermore, several exclusion criteria for the principal applicability of the system in a targeted use case can usually be checked quickly with common sense. This should be done first, before any elaborate evaluation.
4. Objectivity and general validity due to standardised testing machines: Sample-independent evaluation methods (e.g. testing machines with internal measurement sensors, modelling) should be used due to their objectivity and general validity. Inspired by a testing machine for lifting tasks (Nabeshima et al. 2018), a similar machine seems to be feasible for tasks at head level or above, since exoskeletons for upper limbs were predominantly evaluated in the scanned papers with overhead drilling tasks and a 90° angle in shoulder and elbow, respectively. Derived exoskeletal data sheets can also help to interpret the recorded data in biomechanical analyses.
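The randomisation of test orders recommended under the third principle (tasks and use of the exoskeleton) could be generated as in the following minimal sketch. The function name `randomised_schedule`, the condition labels, and the block structure (all tasks performed within one condition before switching) are assumptions made for illustration; other designs, such as counterbalancing, are equally possible.

```python
import random
from typing import Dict, List, Sequence, Tuple

Block = Tuple[str, List[str]]  # (condition, ordered list of tasks)


def randomised_schedule(
    subjects: Sequence[str],
    tasks: Sequence[str],
    conditions: Sequence[str] = ("with exoskeleton", "without exoskeleton"),
    seed: int = 0,
) -> Dict[str, List[Block]]:
    """Draw a random condition order and task order for each subject."""
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible
    schedule: Dict[str, List[Block]] = {}
    for subject in subjects:
        # Randomise which condition (use of exoskeleton) comes first.
        condition_order = list(conditions)
        rng.shuffle(condition_order)
        blocks: List[Block] = []
        for condition in condition_order:
            # Randomise the task order separately within each condition.
            task_order = list(tasks)
            rng.shuffle(task_order)
            blocks.append((condition, task_order))
        schedule[subject] = blocks
    return schedule


if __name__ == "__main__":
    plan = randomised_schedule(
        ["P1", "P2"], ["lifting", "overhead drilling", "walking"], seed=42
    )
    for subject, blocks in plan.items():
        for condition, task_order in blocks:
            print(subject, condition, task_order)
```

Documenting the seed together with the schedule makes the test order traceable, which fits the call for transparent and comparable test cases.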

Conclusion
The systematic review identified 74 papers and emphasises that previously applied methodologies for evaluating exoskeletons with industrial application are generally neither comprehensive (Maurice et al. 2020) nor consistent (de Looze et al. 2016; Kim et al. 2018). This might have different practical reasons but also biases evaluation results (Hefferle et al. 2020) and hampers any iterative system optimisation when considering, e.g. the transdisciplinary development approach for wearable systems by Weidner et al. (2017). Support situations are always individually formed by the complex interaction of the user, the exoskeleton, and the industrial workplace with their specific characteristics. However, the user can be considered the crucial link between the two other dimensions and should be at the centre of interest. Thus, system developers or manufacturers should continually integrate the targeted end-user in system development processes or should at least conduct comprehensive and harmonised evaluations by considering as many of the aspects suggested in Figure 3 as possible. However, even comprehensive evaluations cannot consider all real application circumstances in advance (Hoffmann et al. 2019). Thus, industrial exoskeletons probably remain a case-by-case treatment, which can lead to guided and individual self-evaluations by end-users. In this context, Ralfs, Hoffmann, and Weidner (2021) present an approach for a decision support matrix that merges task properties, work profiles, and exoskeletal characteristics in every single use case. This should sensitise end-users to a multidimensional decision perspective and finally enable an objective comparison and selection of systems for individual application cases.

Disclosure statement
The authors declare that they have no competing interests and no conflicts of interest with respect to this authorship or the publication of this article. The authors are solely responsible for the manuscript content.

Funding
Parts of this research are funded by the German employers' liability insurance association (BGHW) in the project 'Exo@Work - Influences of Exoskeletons on the workplace'.