A versatile mechanized setup for controlled experiments in archeology

ABSTRACT Experimentation has always played an important role in archeology, in particular to create reference collections for use-wear studies. Different types of experiments can answer different questions; all types should therefore be combined to obtain a holistic view. In controlled experiments, some factors are tested, while the other factors are kept constant to improve the signal-to-noise ratio. Yet, controlled experiments have been conducted with variable degrees of control. Although they seem decoupled from archeological applications, mechanized experiments and the robust causal relationships they measure are critical to answer archeological questions like understanding the processes of use-wear formation. Here we introduce the concept behind using the SMARTTESTER®, a modular material tester, and we present four different setups (linear, rotary, percussion and oscillating) and their potential archeological applications. Such experiments will contribute to our understanding of causality in human tool use.


Introduction
Experimentation has played an important role in archeology for decades (Bradfield 2016; Coles 1979; Eren et al. 2016; Lin, Rezek, and Dibble 2018; Outram 2008). Experiments can be classified into different categories, and each category can answer a different type of question. Here, we provide the reader with a brief overview of the topic relevant to controlled and mechanized experiments; for extensive discussion, we refer to some recent detailed reviews (e.g. Eren et al. 2016; Lin, Rezek, and Dibble 2018; Marreiros et al. https://doi.org/10.1007/s41982-020-00058-1).
Pilot or first generation actualistic experiments allow the identification of the factors influencing the process(es) of interest and their correlations, while controlled or second generation experiments are necessary to disentangle the problems of equifinality and to build cause-effect models (Lin, Rezek, and Dibble 2018; Marreiros et al. accepted).
Controlled experiments refer to experiments in which some factors are tested (i.e. vary between different states), while other factors are controlled (i.e. kept constant). This experimental control leads to an increase in the internal validity of the experiment (Eren et al. 2016; Lin, Rezek, and Dibble 2018; Lycett and Eren 2013; Marreiros et al. accepted). Unfortunately, there is no definition of how controlled an experiment must be to be considered controlled.
Furthermore, it is virtually impossible to control all parameters (e.g. Bebber and Eren 2018). Consequently, controlled experiments have been conducted with variable degrees of control (Lin, Rezek, and Dibble 2018) depending on the study's research questions (compare for example Pedergnana and Ollé 2017; Pfleging, Iovita, and Buchli 2019). If some factors, whether known but ignored or unknown, are not controlled, they become confounding factors and will potentially blur the signal with noise (Lin, Rezek, and Dibble 2018; Marreiros et al. accepted). For example, this might have contributed to the moderate classification success (67% overall) of Ibáñez, Lazuen, and González-Urquijo (2019); better control of raw material, action and duration would, in all likelihood, have increased the classification rates (see also Barceló and Pijoan-Lopez 2004; Barceló, Vila, and Gibaja 1996, 2015). In order to achieve the strict control necessary to discover and measure robust causal relationships, some degree of mechanization cannot be avoided.
Critics of mechanical experiments argue that the latter cannot improve our archeological knowledge (e.g. Rots and Plisson 2014). However, as mentioned above, such experiments are necessary if we want to build causal models. This type of model might seem far from archeological concerns, but we argue that it is critical to answer some archeological questions. Correlations are limited in their explanatory power because one or several other factors might drive the correlation, because of directionality issues, or just because of coincidence. Therefore, a simple correlation cannot be used to interpret data with a high degree of certainty; only cause-effect relationships can be used to make more definitive statements thanks to their high internal validity (Lin, Rezek, and Dibble 2018; Marreiros et al. accepted).
That said, controlled, mechanized experiments are not the solution to all problems either. They can only build causal relationships between a few factors and the observations in laboratory conditions, i.e. unreal, contrived (sensu Lycett and Eren 2013) conditions where variability is excluded (Lin, Rezek, and Dibble 2018; Pettigrew et al. 2015). As a result, the external validity of controlled experiments is limited, potentially leading to distorted archeological interpretations. In Eren et al. (2016)'s words, "experimental control is a strategy in which any perceived benefit from one degree or kind of control necessary comes with an unavoidable cost" (106-107). This is why, once causal relationships are built, third generation actualistic experiments (sensu Marreiros et al. accepted) must be conducted in order to increase the external validity (sensu Lycett and Eren 2013; but see Lin, Rezek, and Dibble 2018). Third generation experiments do not have to be strictly controlled, but a whole range of parameters can be measured (e.g. Gaudzinski-Windheuser et al. 2018; Key et al. 2015, 2017; Milks et al. 2016; Pfleging et al. 2015; Schmitt, Churchill, and Hylander 2003; Stemp, Morozov, and Key 2015). Third generation experiments have sometimes shown that the models built through controlled experiments cannot be transferred to real situations (e.g. Lipo et al. 2012; Pettigrew et al. 2015), calling for more experiments. In such cases, it is likely that the controlled experiments either missed some important parameters (confounding factors; Lin, Rezek, and Dibble 2018), or that the models were extrapolated to situations where other factors are at play. In sum, every generation of experiment is relevant for a specific goal and the three generations are complementary to interpret the archeological record, but the strengths and limits of each type of experiment must be explicitly acknowledged (Eren et al. 2016; Lin, Rezek, and Dibble 2018; Lycett and Eren 2013; Marreiros et al. accepted; Pettigrew et al. 2015).
In addition, an important difference between mechanized and manual experiments is that a machine cannot (yet) be programmed to achieve a specific goal (e.g. scrape the fat off a hide until the whole fatty layer is removed, without damaging the hide); it can only perform an action according to predefined settings/parameters (e.g. given force, duration …). The experimenter must define these settings so that the goal can be achieved to an acceptable degree. In turn, this means that any mechanized setup must be able to vary at least within the appropriate range of the factors of interest. For example, it must be able to apply forces covering the range that past humans could have exerted.
Here we present a modular setup that is highly controlled and can be used in different configurations for different purposes. We intend to explain the concept behind adapting a machine designed for quality control within industrial manufacturing to answer archeological questions. In addition, we supply all technical details and specifications of the different configurations for this machine, which is beyond the scope of a technical appendix. The present paper therefore provides a conceptual and technical framework for future work conducting experiments with this machine.

Comparison of mechanical designs
In this section, we provide a short overview of the three types of mechanical design available, as well as their pros and cons.
Custom, "home-made" designs (e.g. Collins 2008; Dibble and Rezek 2009; Martisius et al. 2018) are usually much cheaper and can be tailored to specific needs. However, they are dependent on the knowledge and workshop capabilities of the person who builds the machine. Finally, they are mainly designed to perform one task, so they are rarely versatile.
Collaborative robots (Pfleging, Iovita, and Buchli 2019; Schmidt et al. 2019) have a varying number of articulations to produce a wide range of movements at the tool end, i.e. where the sample is attached. Most, if not all, of these robots can be programmed to reproduce human-like movements. Many sensors are available for these robots; they are either incorporated in the joints or can be added between the tool end and the tool itself. Because these robots are designed to be operated together with humans (hence the term "collaborative"), they feature safety protocols so that the risk of injury is greatly reduced. Nevertheless, this type of robot has limitations that must be kept in mind for highly controlled experiments. Indeed, the movements are not as precise as one would expect from a robotic action. For example, the command to produce a straight movement will not produce a perfectly straight line, as our preliminary tests on the UR5 (Universal Robots, Odense, Denmark) show (see also the "Linear setup" section; Supplementary Material 1 [https://doi.org/10.5281/zenodo.3752681]). In technical terms, the parallelism is in general lower than for linear drives. While the force applied to the tool can be measured by different types of sensors, the control of this force is limited too, particularly if the robotic arm is fully stretched: the maximum force cannot be applied in this situation and so the applied force might vary along the movement. The successor UR5e is apparently able to correct this issue. Another limitation of collaborative robots is that they are not designed for repetitive percussive actions or impacting tasks, which damage the joints. Last but not least, these collaborative robots are not cheap, although their prices vary a lot between manufacturers and models.
Here we present one material tester, the SMARTTESTER®, which is, to our knowledge, the only modular system available that allows large movements in all directions. It can therefore be used for a wide range of applications, as detailed in the next section.

Smarttester
This section details the overall, basic construction of the SMARTTESTER®, manufactured by inotec AP GmbH (Wettenberg, Germany), as well as the different setups that can be assembled. The specifications of all components are also detailed here.

Basic construction
The SMARTTESTER® hardware is composed of drives and sensors connected to a central controlling/computing unit. The central unit can either be included in a mobile tower (so-called "rack" version; Figure 1) or table ("rig" module), and is powered through 380 V three-phase alternating current. Both electric and pneumatic drives can be connected as long as the corresponding controlling modules are incorporated in the central unit. Movement length, speed and acceleration of electric drives can be more precisely controlled, while pneumatic drives can move much faster and in general with greater force (depending on the pressure). Digital and analog sensors can be controlled and read. These include, but are not limited to, force, torque and distance sensors. All drives and sensors can be mounted either directly on the table or on any other multi-purpose platform with the help of standard building and fixing elements.
The operating software features a graphic user interface. The machine is mainly programmed with the help of the touchscreen: programming elements (drive and sensor actions, loops, conditions …) can be dragged into the scripting window in order to create a testing experiment (Figure 2).

Linear setup
In this setup, three linear drives are mounted so that the tool and worked material can be moved in linear movements along three directions: linear drive #1 moves the tool along the X (horizontal) axis, linear drive #2 moves the worked material along the Y (horizontal) axis, and linear drive #3 lifts and lowers the tool along the Z (vertical) axis (Figure 3; Script #1 of Supplementary Material 2, and Supplementary Material 4 [https://doi.org/10.5281/zenodo.3752681]). Thanks to this combination, the tool can be used in uni- or bidirectional movements (depending on whether the tool is lifted between the strokes or not), either always in the same track, or on a new surface of the worked material thanks to linear drive #2. The way the tool and worked material are fixed and held depends on their shapes; holders usually need to be adapted for each new tool/material. Currently, we use the following design. The tool is attached onto a custom sample holder (Figure 3a-b), which can be oriented in all directions to vary the orientation of the edge relative to the movement and the attack angle. Calibrated weights of up to 12 kg (2 × 6 kg) can be attached to adjust the force applied onto the tool. The worked material is fixed on a supporting table, itself placed on guide-rails with rollers so that friction is limited. The whole construction is fixed on linear drive #2.
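The difference between uni- and bidirectional strokes described above can be sketched as an ordered list of drive moves. This is only an illustration: the `Move` record, the `plan_strokes` function and all numeric values are hypothetical and do not reflect the SMARTTESTER® scripting interface, which is programmed graphically.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Move:
    axis: str          # "X" (tool stroke), "Y" (worked material), "Z" (tool lift)
    target_mm: float   # absolute target position on that axis


def plan_strokes(n_strokes: int, stroke_mm: float, lift_mm: float,
                 bidirectional: bool, track_step_mm: float = 0.0) -> List[Move]:
    """Ordered drive moves for n_strokes; Z = 0 means the tool is in contact."""
    if bidirectional and track_step_mm:
        # Assumption of this sketch: the track is only changed while the tool is lifted.
        raise ValueError("this sketch only changes track in unidirectional mode")
    moves: List[Move] = []
    for i in range(n_strokes):
        if bidirectional:
            # Tool stays down: strokes 1, 3, 5 ... go out, strokes 2, 4, 6 ... come back.
            moves.append(Move("X", stroke_mm if i % 2 == 0 else 0.0))
            continue
        # Unidirectional: work stroke in contact, then return with the tool lifted.
        moves.append(Move("X", stroke_mm))
        moves.append(Move("Z", lift_mm))
        moves.append(Move("X", 0.0))
        if track_step_mm and i < n_strokes - 1:
            # Shift the worked material so the next stroke meets a fresh surface.
            moves.append(Move("Y", (i + 1) * track_step_mm))
        moves.append(Move("Z", 0.0))
    return moves


if __name__ == "__main__":
    for move in plan_strokes(2, stroke_mm=50.0, lift_mm=5.0,
                             bidirectional=False, track_step_mm=2.0):
        print(move)
```

Writing the plan as plain data keeps the experimental protocol explicit and easy to report alongside the results, whatever software finally drives the machine.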
Three sensors are mounted on this setup: force sensor #1 measures the force applied onto the tool by the weights (Z axis), while force sensor #2 measures the friction due to contact between the tool and worked material along the X axis, measured on the table on which the worked material is fixed. The distance sensor is used to measure the penetration depth of the tool into the worked material. The distance measured is the sum of the loss of material from both the tool and the worked material. Finally, a vacuum cleaner can be connected to a small tube mounted in front of the sample; this removes the chippings or abraded material produced during the experiments between the strokes. This setup can be used for archeological experiments testing hypotheses about cutting (Figure 3c), scraping and scratching/engraving tools. The effects of applied force, speed, acceleration, movement length, duration, and attack angle can all be tested individually or in combination with this setup. Therefore, it allows the testing of many hypotheses. Additionally, this setup can be used to test material properties in e.g. scratch and friction tests, similar to Astruc, Vargiolu, and Zahouani (2003).
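Testing factors "individually or in combination" while keeping the others constant amounts to a factorial design. The following minimal sketch enumerates such a design; the factor names and levels are purely hypothetical placeholders, not settings recommended or used here.

```python
from itertools import product
from typing import Dict, List


def factorial_design(tested: Dict[str, list], controlled: Dict[str, object]) -> List[dict]:
    """One run per combination of tested levels; controlled factors stay constant."""
    names = list(tested)
    runs = []
    for levels in product(*(tested[name] for name in names)):
        run = dict(controlled)          # constant settings shared by every run
        run.update(zip(names, levels))  # the combination tested in this run
        runs.append(run)
    return runs


if __name__ == "__main__":
    # Hypothetical example values.
    tested = {"load_kg": [2, 6], "speed_mm_s": [100, 300], "attack_angle_deg": [45, 90]}
    controlled = {"stroke_mm": 80, "n_strokes": 500}
    for run in factorial_design(tested, controlled):
        print(run)
```

With two levels for each of three tested factors, the design contains 2 × 2 × 2 = 8 runs; any factor left out of both dictionaries is a potential confounding factor.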
Our preliminary tests comparing the UR5, as an example of a collaborative robot, and the SMARTTESTER® in terms of parallelism (Supplementary Material 1 and Script #5 of Supplementary Material 2) emphasize three noteworthy differences. First, the parallelism tolerance of the SMARTTESTER® is low for the linear drives (± 10-20 µm; Supplementary Material 3), producing a very straight movement. However, the complicated design of the sample holder probably added some play, so that stick-slip phenomena (e.g. Popp and Stelter 1990) might have led to undulations at the beginning of the movement. In contrast, the scriber was attached directly to the tool end of the UR5, so such play was prevented. Yet, the movement produced with the collaborative robot was not a straight line. Moreover, stick-slip phenomena led to jumps that materialized as discontinuous grooves at the beginning of the movements. Second, even though the force applied should have been almost identical (50 N ≈ 5.10 kg with the UR5 vs. 5 kg with the SMARTTESTER®), the single grooves are much wider with the SMARTTESTER® (407 µm with the UR5 vs. 814 µm with the SMARTTESTER®). There could be two non-mutually exclusive reasons for this discrepancy. (1) We used two different scribers and it could be that the scriber used with the UR5 had a much sharper tip. This should be easy to test by repeating the experiment using the same scriber with both machines. (2) The force could be applied in different ways by each machine: the programmed command to apply 50 N on the tool end with the UR5 might not lead to a full 50 N applied onto the tool, while the physically attached 5 kg dead weights are fully applied onto the tool with the SMARTTESTER®. This could be easily tested by replacing the worked material (aluminum plate here) with a force sensor to measure the force actually applied. Third, the width of the grooves increases more due to repeated strokes with the UR5 (407 vs. 587 µm, i.e. 44% increase) than with the SMARTTESTER® (814 vs. 961 µm, i.e. 18% increase). This could be due to a higher positional repeatability of the SMARTTESTER® (±30 µm for the linear drives) as compared to the UR5 (±100 µm).
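The figures quoted in this comparison can be re-derived with a few lines. The sketch below assumes standard gravity (9.80665 m/s²) for converting the commanded force into an equivalent dead weight; the function names are illustrative only.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2, assumed for the force-to-mass conversion


def newtons_to_kg_equivalent(force_n: float) -> float:
    """Mass whose weight under standard gravity equals force_n."""
    return force_n / STANDARD_GRAVITY


def percent_increase(before: float, after: float) -> float:
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    print(round(newtons_to_kg_equivalent(50.0), 2))   # UR5 command of 50 N in kg
    print(round(percent_increase(407.0, 587.0)))      # UR5 groove widening, in %
    print(round(percent_increase(814.0, 961.0)))      # SMARTTESTER groove widening, in %
    print(814.0 / 407.0)                              # single-groove width ratio
```

The single groove produced with the SMARTTESTER® is thus twice as wide as the one produced with the UR5, although the nominal loads differ by only about 2%.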

Rotary setup
This setup is composed of a rotary drive to rotate the first part of the system (anvil in Figure 4 and Supplementary Material 5). The second part of the system (tool in Figure 4 and Supplementary Material 5 [https://doi.org/10.5281/zenodo.3752681]) is attached to a plate sliding freely in vertical rails; weights can be mounted to adjust the force applied onto it. The tool holder used in the linear setup, with its force and distance sensors, can also be used, although not shown here. For this setup too, sample holders have to be adapted to the shapes of the tools and worked materials.
Experiments related to ground and grinding stones (Figure 4; Script #2 of Supplementary Material 2, and Supplementary Material 5) can be conducted with this setup, testing effects related to force, speed, torque and duration. Additionally, this setup can be transformed slightly to test material properties in experiments similar to pin-on-disk setups (e.g. Davim and Marques 2004; Zdero, Guenther, and Gascoyne 2017). In this scenario, the tool would be a standard material of known shape and properties (e.g. tungsten carbide ball or pin) that would be applied onto the materials of interest (e.g. flint, limestone) with a given force. The loss of volume/weight from each material of interest can be compared to measure their relative wear rates and scratch resistance. In addition, any other material pairing inspired by/taken from archeological evidence may be addressed in this way (e.g. organic-inorganic, etc.).
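How weight loss could be turned into comparable wear figures in such a pin-on-disk-like test is sketched below. The normalization by load and sliding distance (a "specific wear rate") is a common tribological convention introduced here as an assumption, not a procedure described in the text; all function names and numbers are hypothetical.

```python
import math


def volume_loss_mm3(mass_loss_mg: float, density_g_cm3: float) -> float:
    """Convert a weighed mass loss into a volume loss (1 mg at 1 g/cm^3 = 1 mm^3)."""
    return mass_loss_mg / density_g_cm3


def sliding_distance_m(track_radius_mm: float, revolutions: float) -> float:
    """Distance travelled by the contact point on a circular wear track."""
    return 2.0 * math.pi * (track_radius_mm / 1000.0) * revolutions


def specific_wear_rate(volume_mm3: float, load_n: float, distance_m: float) -> float:
    """Volume lost per unit load and sliding distance, in mm^3 / (N m)."""
    return volume_mm3 / (load_n * distance_m)


if __name__ == "__main__":
    # Hypothetical measurement: 26 mg lost by a material of density 2.6 g/cm^3.
    volume = volume_loss_mm3(26.0, 2.6)
    distance = sliding_distance_m(track_radius_mm=20.0, revolutions=1000)
    print(volume, distance, specific_wear_rate(volume, load_n=50.0, distance_m=distance))
```

Comparing volumes rather than raw weights avoids penalizing denser materials when ranking their relative wear.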

Percussion setup
Here, the rotary drive is used to rotate a snail/drop cam that will periodically lift and drop the tool onto the worked material (anvil in Figure 5 and Supplementary Material 6 [https://doi.org/10.5281/zenodo.3752681]). The tool is attached to a sample holder, which freely slides in vertical rails (Figure 5 and Supplementary Material 6). The dropping height can be adjusted either by moving the drive up or down relative to the worked material, or by using cams of different sizes. Weights can be added onto the sample holder in order to increase the impact force applied to the worked material. A piezoelectric press force sensor (Kistler Instrumente GmbH, Sindelfingen, Germany) is positioned below the worked material to measure the impact force when the tool hits the worked material. The impact force readings are precise within a given mechanical setup. Nevertheless, they are not absolute values; they depend on force dissipation within the system as designed, for example in the constructional elements connecting worked material and sensor, or in the overall design of a sample holder on vertical slide rails. In other words, even a minor modification of such a mechanical setup may affect sensor values. Therefore, the readings obtained from different setups are only roughly comparable.
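Because the sensor readings are setup-dependent, it can help to report setup-independent quantities alongside them. The sketch below computes the idealized impact velocity and energy from dropping height and dropped mass, assuming frictionless free fall under standard gravity; it does not predict the force the sensor will read, and the function names and values are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def impact_velocity_m_s(drop_height_m: float) -> float:
    """Velocity after a frictionless free fall from drop_height_m."""
    return math.sqrt(2.0 * STANDARD_GRAVITY * drop_height_m)


def impact_energy_j(dropped_mass_kg: float, drop_height_m: float) -> float:
    """Kinetic energy at impact, equal to the potential energy released."""
    return dropped_mass_kg * STANDARD_GRAVITY * drop_height_m


if __name__ == "__main__":
    # Hypothetical example: holder, tool and weights totalling 2 kg, dropped from 20 cm.
    print(impact_velocity_m_s(0.20))
    print(impact_energy_j(2.0, 0.20))
```

Friction in the vertical rails will make the real values somewhat lower, so these figures are upper bounds rather than measurements.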
This setup serves as a mechanized proxy for several archeological tool-related activities, for example pounding (e.g. to break bones in order to extract the marrow; Figure 5; Script #3 of Supplementary Material 2, and Supplementary Material 6) or knapping. Force applied, dropping height, impact force, impact point and contact angle, among others, are all parameters relevant to understand the processes of bone or stone fracturing.

Oscillating setup
In this setup, the rotary drive, with the help of a kind of crankshaft, moves a tray back and forth along the X axis. The tray slides freely in rails (the same as used in the rotary and percussion setups). It is filled with sediment (e.g. sand). Samples (e.g. flint tools) can be suspended, buried within or placed on top of the sediment (Figure 6; Script #4 of Supplementary Material 2, and Supplementary Material 7 [https://doi.org/10.5281/zenodo.3752681]).
The travel range of the tray and the speed of the movements can be adjusted by changing the diameter of the rotating disc, the length of the arm connecting the drive to the table and the speed of the drive. This setup mimics the action and functionality of commercially available Oscillating Abrasion Testers (e.g. TABER®) and allows the assessment of material properties according to existing standard protocols. Furthermore, hypotheses related to postdepositional processes due to sediment movement can be tested.
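The relation between the crank geometry and the tray movement can be sketched with the textbook in-line slider-crank model. This is an idealization assumed for illustration: the arm is taken to be attached at a fixed radius on the disc and the tray to move along a line through the disc center, which may not match the actual linkage. In this idealized geometry the travel range equals twice the crank radius, while the arm length shapes the motion profile.

```python
import math


def tray_position_mm(angle_rad: float, crank_radius_mm: float, arm_length_mm: float) -> float:
    """Tray position along X for an in-line slider-crank at a given disc angle."""
    if arm_length_mm <= crank_radius_mm:
        raise ValueError("the arm must be longer than the crank radius")
    r, arm = crank_radius_mm, arm_length_mm
    return r * math.cos(angle_rad) + math.sqrt(arm * arm - (r * math.sin(angle_rad)) ** 2)


def travel_range_mm(crank_radius_mm: float, arm_length_mm: float, steps: int = 3600) -> float:
    """Numerically estimated distance between the two extreme tray positions."""
    positions = [tray_position_mm(2.0 * math.pi * k / steps, crank_radius_mm, arm_length_mm)
                 for k in range(steps)]
    return max(positions) - min(positions)


def mean_tray_speed_mm_s(travel_mm: float, drive_rpm: float) -> float:
    """Average tray speed: one revolution covers the travel range twice."""
    return 2.0 * travel_mm * drive_rpm / 60.0


if __name__ == "__main__":
    # Hypothetical geometry: 30 mm crank radius, 120 mm arm, drive at 30 rpm.
    travel = travel_range_mm(30.0, 120.0)
    print(travel, mean_tray_speed_mm_s(travel, 30.0))
```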

Conclusions and future developments
Here we have presented the concept of employing this very versatile mechanical system to test a multitude of hypotheses with many archeological applications and implications. Drives, sensors and building elements can be combined in many ways to achieve these diverse goals, which would normally require several pieces of equipment. We must stress that the material tester represents a significant investment and requires an adequate budget; its price is likely much higher than that of several custom-made designs combined, but all electric and mechanical components conform to the stringent tolerance and quality controls prevalent in the manufacturing industry. In addition, it has the advantage that all setups are integrated and controlled in the same way. Therefore, the adaptation of the machine to new applications is greatly facilitated and most elements can be re-used in different ways. This means that new applications can be designed with few extra costs.
This setup is used to run second generation, controlled experiments. This type of experiment cannot be directly transferred to interpret the archeological record, but is meant to mechanistically, causally link an action to a pattern. However, it is important to remember that the results of such experiments are only valid within the system; extrapolations should be cautious, especially when external validity is low (sensu Lin, Rezek, and Dibble 2018). Nevertheless, the combined results from several well-designed experiments can lead to a thorough, mechanism-oriented understanding of the system that can be applied to a wider context thanks to its increased external validity while keeping internal validity high (sensu Lin, Rezek, and Dibble 2018).
Eventually, the collaborative robotic arms UR3 or UR5 (Universal Robots) can also be connected and controlled with the SMARTTESTER® programming software. The human-like, but controlled and repeatable, movements of the robotic arm will allow, to a degree, the replication of actions and uses of past humans. These experiments will pave the way toward actualistic, third-generation experiments, where variability of human movements will be taken into account to understand the full variability of human actions and behaviors.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This research has been supported within the Römisch-Germanisches Zentralmuseum - Leibniz Research Institute for Archeology by German Federal and Rhineland Palatinate funding (Sondertatbestand "Spurenlabor").

Notes on contributors
Ivan Calandra is a post-doc at the TraCEr lab. He is a paleontologist specializing in surface texture analyses of mammal teeth and archeological artifacts.
Walter Gneisinger is the lab technician of the TraCEr lab. He is the person maintaining, setting up and operating the SMARTTESTER®.
Joao Marreiros is the head of the TraCEr lab. He is an archeologist trained in lithic use-wear analysis.