The field becomes the laboratory? The impact of the contextual digital footprint on the discipline of E/HF

Abstract The increasing prevalence of affordable digital sensors, ubiquitous networking and computation puts us at what is only the start of a new era in terms of the volume, coverage and granularity of data that we can access about individuals and workplaces. This paper examines the consequences of harnessing this data deluge for the practice of E/HF. Focusing on what we term the ‘contextual digital footprint’, the trail of data we produce through interactions with many different digital systems over the course of even a single day, we describe three example scenarios (drawn from health care, distributed work and transportation) and examine how access to data directly drawn in considerable volume from the field will potentially change our application of design and evaluation methods. We conclude with a discussion of issues relevant to ethical and professional practice within this new environment including the increased challenges of respecting anonymity, working with n = all data-sets and the central role of ergonomists in promulgating positive uses of data while retaining a systems-based humanistic approach to work design. Practitioner summary: The paper envisions the impact of new and emerging sources of data about people and workplaces upon future practice in E/HF. We identify practical consequences for ergonomics practice, highlight new areas of professional competence likely to be required and flag both the risks and benefits of adopting a more data-driven approach.


Introduction
As an applied science, Ergonomics and Human Factors (E/HF) has traditionally used data obtained from experimental and real-world settings to inform our understanding of the way in which people work, and thus the way in which we should design work systems, technologies and environments. While there has often been vigorous debate regarding what form these data should take and how and where they are collected and analysed (see Wilson and Sharples 2015a), a general consensus exists that good ergonomics will tend to take a strong focus on investigating the domain of interest and collecting relevant information about activities within that domain, whether in the field or perhaps through relevant laboratory work. Commensurate with this, considerable effort within the discipline has been devoted to the development of an extensive range of data collection and analysis methods (e.g. Salvendy 2012; Stanton et al. 2013; Wilson and Sharples 2015b).
It is well-recognised, and indeed, considered a matter of pride, that as new technologies have appeared and society itself has changed through aspects such as increased automation, the appearance of the service industry sector, globalisation or the emergence of the environmental sustainability agenda, E/HF has responded by extending the scope and nature of its domain interests. However, it is perhaps less often noted that these same changes have also considerably altered the practice of the discipline itself (although see Moray 2008). It is clear that, for example, the availability of desktop computers has radically changed the ease with which E/HF laboratory experiments can be undertaken; similarly, advances in visualisation and communications technology, as well as the development of advanced data analysis tools, have put complex simulations and statistical analyses within the reach of nearly all practitioners.
Today, the most pervasive changes in technology and perhaps society centre on the emergence of the practical implications of widespread networked computation (National Research Council 2014). The advent of mobile and ubiquitous technologies and novel, embedded sensing technologies, alongside distributed data storage, has contributed to the development of the concept of the 'contextual digital footprint'. The contextual digital footprint can be described as the data which we produce throughout our everyday lives, and represents a 'cradle to grave' collection of explicitly and implicitly produced data about ourselves, our families, our interactions, thoughts, behaviours and work. This footprint is a construct that describes a wide range of current and future forms of data collection that may or may not be discoverable by any given individual; as such it defies further formal definition. However, early work into the properties of data-sets containing information that can be tied to given individuals has demonstrated that such data typically have certain characteristics, including for example sparsity and higher dimensionality (that is, individuals can be uniquely identified across related data-sets on the basis of specific features, so-called 'jigsaw identification'; see Brynjolfsson, Hu, and Smith 2003; Narayanan and Shmatikov 2008).
This paper considers the implications of the contextual digital footprint for E/HF science and practice. Up until now, research into the contextual digital footprint has been conducted primarily within the fields of computer science and human-computer interaction. However, the world of business and industry is increasingly becoming aware of and investing in the concept, reflected in initiatives such as customer loyalty schemes or data-driven approaches to personnel management.
We consider which types of data are of relevance for E/HF design and analysis; the characteristics of that data and the way in which it is produced, collected and stored of which we need to be aware; and the ethics of using the contextual digital footprint, considering utopian and dystopian views of the practice, and thus how we as E/HF practitioners should embark on responsible use of this data to design effective and safe work systems. Whether this constitutes a new paradigm within E/HF is open to debate. One might argue that E/HF has always been a data-driven discipline and that consequently simply having more data represents merely a change of degree in practice rather than a change of kind. In the present paper, we make the case that when data about workers and work environments come to exist in sufficient volume, and are increasingly ubiquitous, there is the potential for significant changes not only in the work we study, but also in how we study it and the depth of understanding that can be achieved. At the same time, however, we will also argue that whether we use the digital footprint to 'do better things' or merely 'do things better', this will be done most effectively not by rejecting the values that already guide our practice, but rather by reacquainting ourselves with what our purposes and values actually are.

Contextual digital data in our work and lives
Data exist and are produced throughout our lives. In our personal lives, we might refer to major life events on social media, so that data are stored about when and where we were born, where we went to school and who the members are of our social networks. We might share our location, distributing information about our travelling preferences, choice of leisure activities or consumer selections. We are also likely to interact with commercial systems such as shops or online banks. The 'Future Identities' Foresight report by the Government Office for Science (2013) refers to the 'wealth of personal data which can be mined'.
In our working lives, data about our movements may be collected or sensed to influence the temperature of the buildings we work in, allow us entry to secure areas of work, or provide us with access to IT such as printers. In addition, formal records of our working lives will record our continuous personal development, safety training, pay level and sickness record. Other formal service providers such as our doctors' surgeries, utility companies and transport organisations will also hold information about our health, habits and behaviours.
We term these data 'contextual digital data'. These data are both rich and imperfect. They represent a tremendous set of business opportunities, and are already used in some forms to support applications such as personalised marketing campaigns. There are also examples of the creeping unification of these data sources: for instance, there have been debates about whether social media information should be referred to when considering an individual for a new employment role, and court cases have referred to social media sources when determining a person's eligibility for disability benefits (The Guardian 2012).
As E/HF specialists, we are concerned with how systems that we use in our work and lives are designed in order to ensure comfort, satisfaction, usability and effectiveness. Beyond the wider ethical, policy and privacy debate, we should consider the implications of the existence, use and value of these data for E/HF methods and practice.
In order to support the consideration of these implications, we present three example scenarios in which the contextual digital footprint is of relevance to E/HF practice and interventions. The aim of these descriptive scenarios is: (a) to highlight some different circumstances in which we might encounter contextual digital data, and consider the different technologies that both currently and in the future will enable this data to be collected, stored and interpreted; (b) to provide a basis from which the positive and negative aspects of using contextual digital data to support E/HF analysis can be identified.
The three scenarios are:
• Situated work: An example of where contextual digital data can be used in a confined workplace. The selected example workplace is a hospital, and the collection of data about clinical work using a range of sensing and systems technologies is presented.
• Distributed work: An example where contextual digital data is used in a work setting where employees are geographically dispersed. The selected example is of crowdsourced human computation (HC), where individuals contribute to an overall task by interacting with separate tasks and systems.
• User experience: An example where contextual digital data is used to capture, manage and enhance the experience of users. The selected context for this is travel, where the user will interact with a number of different official and unofficial systems to help them manage and enjoy their journey.

Example 1: Situated work - health care
Many jobs in many workplaces have routinely been monitored or logged. From air traffic control strips recording agreed changes to flight paths, to voice communications in rail control being recorded, it has been accepted as being reasonable (from the employee perspective) and valuable (from the employer and legislative perspectives) to record many types of decisions and actions at work. The introduction of technologies such as efficient data storage, mobile smartphones, location tracking technologies and movement sensors means that the extent to which it is now possible to monitor and record a wide range of aspects of work is vast. A particular context in which this is considered is the out-of-hours care hospital setting. Around 75% of time in hospitals is classified as 'out of hours' care, where a small number of clinicians, many of them often quite junior, are responsible for patients on a range of medical wards. Often these wards are geographically distributed over a wide area. Until recently, if a patient needed to be attended by a clinician, the clinician would be alerted via a mobile pager device. On receipt of this short message, the clinician would speak on the telephone to a nurse coordinator, who would relay the message about the patient, including details of their condition and location. This voice and pager system has in many hospitals now been replaced by a task allocation system that uses smartphones to send and display details of tasks to doctors.
These smartphones link with a system (Blakey et al. 2012) that records the number of tasks performed in a shift. As a doctor is required to 'accept' a task, there is also normally an indication of the current task that is being undertaken. However, a single task, such as replacing a catheter, could actually involve a number of distinct actions that are completed in different parts of a ward. Developments in location-based technologies, both in advanced and discrete sensors that can be worn on the person, mean that the task allocation software can now be combined with location tracking to increase the amount of detail collected about tasks that are completed, building on the knowledge about what tasks are done, to also consider where they are done and when.
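As a rough illustration of what combining task allocation records with location tracking could look like, the sketch below attaches timestamped location readings to whichever task a clinician had accepted at that moment. The record layout, field names, ward labels and timings are all hypothetical; they are not taken from the system described by Blakey et al. (2012), and the 'complete' timestamps are assumed here to be reliable.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    accepted: int   # minutes since start of shift (hypothetical unit)
    completed: int  # minutes since start of shift


@dataclass
class Ping:
    minute: int     # time of the location reading
    zone: str       # hypothetical zone label reported by the tracking system


def zones_per_task(tasks, pings):
    """Map each task name to the ordered zones seen while it was active.

    Consecutive readings from the same zone are collapsed into one entry.
    """
    ordered = sorted(pings, key=lambda p: p.minute)
    result = {}
    for task in tasks:
        zones = []
        for ping in ordered:
            if task.accepted <= ping.minute <= task.completed:
                if not zones or zones[-1] != ping.zone:
                    zones.append(ping.zone)
        result[task.name] = zones
    return result


# Invented example data for one clinician on one shift.
tasks = [Task("replace catheter", 10, 40), Task("review bloods", 55, 70)]
pings = [
    Ping(5, "Ward A"),
    Ping(12, "Ward B"),
    Ping(20, "Store room"),
    Ping(30, "Ward B"),
    Ping(60, "Ward C"),
]
summary = zones_per_task(tasks, pings)
print(summary)
```

In this invented example the single 'replace catheter' task spans two locations, which is the kind of within-task detail that a task count alone would not reveal.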
The above technology is being implemented in a basic form for research purposes now (Brown et al. 2014). It is reasonable to assume that the capability of these technologies will increase (e.g. Wi-Fi coverage will become more reliable, location tracking more accurate) and that new technologies will have the potential to be introduced into this context. Therefore in the future, in addition to detailed monitoring and tracking of movements around a hospital, we may also be able to monitor, in real time, physiological indices of clinicians, which could provide indications of when a doctor is becoming highly stressed or fatigued, for example. We may also be able to record conversations or communications, or provide the ability to allow remote support for diagnosis. This has the potential to introduce efficiencies to patient care, for example, by introducing demand-driven staffing (Brown et al. 2014). The data obtained from such technologies can also be used to support staff training, both through development of best practice guidelines for task allocation, and by providing a reflective tool for clinicians to review their performance on shifts and understand for themselves which types of work strategy are most effective.
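One simple way to picture the raw material for demand-driven staffing is a count of task requests per hour of the night. The sketch below is only an assumption about how such a summary might be produced; the timestamps are invented and the approach is not drawn from Brown et al. (2014).

```python
from collections import Counter


def tasks_per_hour(request_times):
    """Count task requests in each clock hour; times are 'HH:MM' strings."""
    return Counter(int(t.split(":")[0]) for t in request_times)


# Invented request times from a single out-of-hours shift.
requests = ["21:05", "21:40", "22:10", "02:15", "02:20", "02:55"]
demand = tasks_per_hour(requests)

# The hour with the most requests is a first, crude indicator of peak demand.
busiest_hour, busiest_count = max(demand.items(), key=lambda kv: kv[1])
print(busiest_hour, busiest_count)
```

A count of this kind says nothing about task difficulty or duration, so it would need to be read alongside contextual information before informing any staffing decision.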

Example 2: Distributed work - Human Computation
Human Computation (HC) can be described as using the internet population to perform tasks and provide data to address difficult problems that either cannot be solved by computer algorithms alone (Ma et al. 2009). From an E/HF perspective, this paradigm is recognisable as an implication of Fitt's list (Fitts 1951) that asserts humans and machines have different relative strengths (see also de Winter and Dodou 2014, for a contemporary discussion of the applicability of Fitt's insights). The larger part of HC research concerns commercial HC offerings, principle amongst them the Amazon Mechanical Turk that allows workers ('turkers') to participate in online tasks for micropayment (Vakharia and lease 2013). These tasks are typically referred to as Amazon Mechanical Turk Tasks (AMTs) and tasks range from image labelling, participation in surveys and undertaking university research experiments through to language translation, carrying out searches and even content generation (such as 'write a short paragraph explaining why hotels are important to business travellers'). Key to the design of HC is that there is a digital platform that both distributes and manages technologies currently in place are real-time train system status, on-board booking of onward journey steps and use of formally delivered messages in case of disruption. In addition, unofficial sources, such as twitter feeds from passengers travelling on a disrupted service, can be of value both to transport users and providers, but, of course, the quality of data obtained from these informal sources has little or no official verification and can be variable.
Already there has been a move to make more realtime transport information open, and this has led to a series of entrepreneurial apps that have supported user experience. This therefore yields a 'personal digital travel footprint' , where an individual's data about their travel preferences, behaviours and movements can be collected over a long period of time. This not only offers the potential to provide personalised information form and content, but also supports business models such as geographically targeted advertising or varying ticket prices.
This more business-and experience-led example highlights some distinct issues. There are clear opportunities to use such personal footprint data to make travel more efficient and satisfying -two underlying goals of E/HF design. But there are clear questions regarding ownership of data and transparency of decision-making -for example, it may be possible for a transport provider to note that too many people are planning to take a single, congested route, and therefore encourage some individuals to take a longer journey. Should individuals know the rationale behind these decisions? Should they be aware that their suggested route will take them longer to travel?
This example scenario also highlights the need for E/HF to not only consider the work and organisational context of changes to technologies and systems, but also to understand the business implications. Increasingly, we are moving away from dedicated and constrained work settings as the home of E/HF analysis, to more distributed, less controlled contexts. This highlights the need for a new paradigm to our research and practice.
As Table 1 demonstrates, each of these scenarios have the potential for E/HF interventions supported by the contextual digital footprint. These scenarios also being to highlight issues that our E/HF methods need to overcome to ensure that interventions are ethical and effectively influence workplace systems/design.
We now use an alternative framework to consider the challenges and opportunities for the contextual digital footprint in E/HF practice. The following section considers the impact of contextual digital data for five different types of E/HF method that routinely form part of our E/HF toolkit. the work at hand, issuing new work and assessing the quality and aggregating completed work units (typically by cross-checking multiple redundant completions of the same work unit).
Several issues have been raised around use of the Mechanical Turk with regard to its commercial aspects, such as the fair pricing of AMTs for workers, questions about ethical practices and whether aggregate earnings for turkers are reasonable. Some have referred to the Mechanical Turk as a 'digital sweatshop' whereas others have preferred the view that AMT provides remunerated diversions people can undertake in their spare time that are not supposed to replace the typical job (see Kittur et al. 2013 for a discussion). From an E/HF perspective we might also be concerned that implementations of HC represent something of a slippage back from lessons already learned about the design of harmonious and productive sociotechnical work because the driver for work design in this sector is not so much what humans can do, as much as what machines so far cannot. This pattern of work allocation is referred to as left-over automation (see Bainbridge 1983). This pattern of work allocation should concern us as it is inimical to the design of satisfying, meaningful work (see, e.g. oldham 1975, 1980;Vicente 1999). While HC might be seen as a relatively niche form of work at present, it seems reasonable to wonder how much further this paradigm might extend in line with developments in computational intelligence. Might it, for example, be possible to break down the work of a legal professional into a set of small bite-sized chunks that are then crunched by a legal rulebase to render an opinion (indeed, the ergonomics expertise in task decomposition techniques might be crucial to this venture). Kittur et al. (2013) have taken the view that HC may transcend its current limits and indeed actually constitute the future of work itself and ask: 'what will it take for us, the stakeholders in crowd work -including requesters, workers, researchers -to feel proud of our own children when they enter such a work force?' (12).

Example 3: User experience -transport
Transport businesses and infrastructure providers are increasingly becoming aware that there is value in capturing travellers' end-to-end journeys. We know, for example, that the 'last mile' is often the barrier to modality change (Rehrl, Bruntsch, and Mentz 2007). Increasingly, people are planning journeys using technologies, and monitoring status of transport infrastructure in real time, making dynamic journey choices (e.g. to walk or take the bus/tube).
The data available to support these activities can be drawn from both official and unofficial sources, and a key technical challenge here is the integration of data from a range of sources in a range of forms. Examples of (1) Collection of information about people, equipment and environments (2) Methods to support Analysis and design (3) Evaluation of user and system performance (4) Evaluation of demands on and effects experienced by people (5) Management and implementation of ergonomics.
In the following sections, we outline the different ways in which contextual digital data can potentially be used by E/HF practitioners, and begin to consider the specific

Contextual digital data and E/HF methods
In this section, we explain a series of characteristics of contextual digital data, and specifically consider the implications of these characteristics for E/HF methods.
We use an existing classification of E/HF methods (Wilson and Sharples 2015a) to ground this discussion. Wilson and Sharples outline five types of methods, beyond 'general methods' (a grouping that includes generic techniques such as observation, interviews and experiments) that embody the different approaches that an E/HF researcher or practitioner might wish to employ. These are: Data used to discriminate against certain user groups outside profitable target demographic or user profiles Annoyance from interruptions from targeted advertising and messages, distracting from primary task argued to be increasing well beyond the levels required for E/HF design intervention or evaluation. For example, we are already able to measure movement in centre of gravity (used to detect experience of sickness such as might be experienced after a period of using Virtual Reality (see Cobb and Nichols 1999)) to at least a resolution of 0.1 mm. But, realistically, to be distinguishing between an individual who is experiencing symptoms of motion sickness to an extent that it affects their well-being or ability to work, it may well be the case that only measurements to an accuracy of 1 mm are required. Similarly, within the situated work example given above, we may be able to deploy technology that could measure the position of a clinician within a hospital to ±1 cm accuracy, but in fact, only ±1 m accuracy may be needed to inform the design of ward layouts. Secondly, in the past the collection of data about people, equipment and environments was an explicit and targeted activity. Indeed, phenomena such as the Hawthorne effect demonstrated that the explicit nature of data collection had the consequence of potentially changing implications in this paradigm shift. 
Table 2 summarises the issues for contextual digital data for these five different types of E/HF methods, and feeds into recommendations for the use of contextual digital data to enhance E/HF that we present in the conclusion to this paper.

Collection of information about people, equipment and environments
E/HF has a history of using and developing many specific instruments and approaches to measure the characteristics of people, equipment and environments. These measures can be physical (e.g. anthropometry), physiological (e.g. heart rate), environmental (e.g. lightmeter) or perceptual/ cognitive (e.g. visual acuity tests). over the past 50 years, the instrumentation to support these measures have been increasing in accuracy and decreasing in their intrusive nature. It is clear that these trends in instrumentation are continuing. This presents two interesting challenges that represent a paradigm shift, as opposed to an incremental change. Firstly, the sensitivity of instrumentation can be Essential to ensure that data is stored securely and ethical requirements for data collection, storage and use are met methods to support analysis and design Potential to collect data to inform design of specific tasks over longer period of days or weeks more likely to capture variation in task performance, and impact of unusual events.
cost of data analysis, and no clear guidance on how much data would be 'enough' multiple, discrete, data-sets about a single task are more feasible to collect Potential for new insights from combined data-sets Potential for misleading co-ocurrence of data (correlation ≠ causation) and difficulty in assessing reliability of varied data types Potential to increase sample sizes more likely to capture individual differences in task completion strategies and design preferences required samples may still be quite large to achieve required power, and larger samples will present time and cost implications Evaluation of user and system performance more detailed and varied types of information about task completion can be collected Potential to increase detail and quantity of task data than previously feasible Task data (e.g. counts) may not represent performance without contextual information also being captured Evaluation of demands on and effects experienced by people Lower intrusion measures of physiological and psychological response richer and less intrusive data collected from real world setting important to understand meaning of physiological data with respect to E/ HF concepts (e.g. 
workload) reduced reliance on subjective data reporting in real world context opportunity to capture changes in experience at higher resolution than possible with subjective data measured physiological changes may not be meaningful or of concern to the individual management and implementation of ergonomics Ability to monitor the long-term effectiveness of E/HF interventions Potential to collect evidence to support cost benefit analysis of E/HF need to understand the role of the E/ HF intervention (as opposed to other workplace changes/behaviours) when interpreting data opportunity for workforce team members to review data in an open and transparent way, and reflect on their own performance and actions Potential improvement in motivation and commitment to job role from workers need to ensure that appropriate and relevant data are captured, understood and used effectively by managers and griffiths 2005) or influences on human performance (Edwards 2014) are multifactorial. Currently, methods that allow us to examine such phenomena include: expertled qualitative methods, such as structured critical decision-making interviews; knowledge elicitation methods such as card sorts; or laboratory scenarios with multivariable manipulations. However, in the case of the laboratory examples, power analysis often reveals that to obtain data with a reasonable likelihood of detecting any effects that exist, large participant sample sizes are required. This not only presents time and cost implications, but in many cases it is appropriate for those laboratory study participants to in fact be expert operators themselves (e.g. air traffic controllers); thus they are drawn from a limited participant pool. Contextual digital data presents the potential to gather data where we are interested in analysing multifactorial phenomena from real world data. 
However, as with all analysis of this type, it is not always straightforward to obtain the appropriate metrics that directly map on to the influences of interest; in the case of google flu analytics, there were data such as search terms that could be used as indicators of the experiences of users.
As E/HF specialists, we need to consider whether there are equivalent types of contextual digital data that occur during workplace performance that can be used as indicators of multifactorial phenomena. If we are successful in identifying sources of factorial data such as this, we can begin to move away from the constraints presented by multifactorial ANoVA design towards more dynamic epidemiological modelling techniques to understand the development of workplace experiences and effects such as comfort, stress and workload that are at the heart of E/HF.

Evaluation of user and system performance
The challenge of obtaining valid and reliable measures of user and system performance has yielded many methods in E/HF, such as cognitive work analysis (Vicente 1999) or human reliability analysis (Kirwan 1994). There are still however many situations in which there is not a clear and unambiguous measure of work 'performance' , and many laboratory tasks are subject to criticism either that they are too artificial and do not reflect the complexity of real world jobs, or that they are subject to classic experimental artefacts such as the traditional speed/accuracy trade-off. Contextual digital data therefore provides an opportunity to deliver new measures of work performance. For example, as in the situated work example provided earlier, research we are conducting in the hospital context is allowing us to collect data relating to workplace tasks (through the smartphone job allocation system) and track clinician movement around the hospital. This will yield a much larger data-set than would be practicable through participant behaviours once people were aware that they were being measured or recorded. Whilst this might have had an undesirable consequence in terms of the validity of data collected, this has a (perhaps unintended) positive side effect in that participants were clearly aware that data were being collected, and therefore an E/HF practitioner could be confident that the principle of 'informed consent' was being upheld. 1 Now, technologies such as embedded sensors in buildings, or personal devices such as smartphones, mean that participants may not be aware of the presence of sensors, due to their integration into the building infrastructure or technologies that they are routinely using for other purposes. This has the positive consequence of ensuring that collected information is more naturalistic, but presents ethical challenges. 
We have an underlying ethical principle of ensuring that all participants in research and data collection are able to give 'informed consent' -in other words, that they are able to understand the purpose of the data collection, and consent to their participation in data collection. The ability to capture data about people, equipment and environments using our contextual digital footprint therefore demands more formal and explicit confirmation of 'informed consent' to data collection.

Methods to support analysis and design
Methods to support analysis and design include approaches such as task analysis, modelling and expert evaluation. They traditionally depend on the collection of data about a work task or interaction, and either real-time or off-line analysis conducted by an expert, sometimes using tools such as digital human modelling, to evaluate the workplace, its requirements or design implications. Contextual digital data offers the potential for a richer data-set to be used as the basis for this analysis. For example, rather than observing the tasks completed by an individual supermarket checkout operator, all the interactions with the different systems being used could be collected at several workstations over a period of several weeks. In addition, the increased variety in types of data that can be collected offer the opportunity for combining data-sets and making analytical inferences that are only possibly when two sets of data are combined (for example, the relationship between number of interactions required on a till during a shift could be combined with data on absenteeism for a sequence of shifts). In other contexts, this data analytics approach has been used by organisations such as google to predict the onset of flu outbreaks (with varying success -see lazer et al. 2014). In E/HF previous theoretical work has demonstrated that many of the phenomena in which we are interested, such as work-related upper limb disorders (Armstrong et al. 1993), work-related stress (Cox triangulation shows that our methods are agreeing with each other, it does not necessarily help us to interpret the meaning of such data. This is the classic correlation/causation dilemma -for example, if a participant experiences an increase in heart rate as they report high workload, this could mean that the heart rate is as a consequence of workload, or that both heart rate and workload are influenced by the same external phenomenon. 
Non-intrusive physiological monitoring undoubtedly offers significant potential but it is important that E/HF practitioners understand the validity of what data are being collected, and how they relate to the multifactorial phenomena that have previously been established.
This type of data is not necessarily solely physiological. For example, if we consider either the distributed work or the user experience examples given above, the level of stress experienced by a participant could potentially be inferred from typing speed or the number of errors made whilst providing input to either a Mechanical Turk work system or an app being used whilst travelling. This is potentially extremely powerful and sensitive data, but it is essential that its meaning in terms of E/HF concepts such as stress is understood and managed appropriately. For example, in the case of distributed work, we would want an E/HF intervention to be focused around work demand management and support for the worker, rather than a punitive or monitoring regime that in fact increases the stress experienced by the individual.

Management and implementation of ergonomics
We also use methods such as Human Factors Integration (Cullen 2007) to support the effective implementation of E/HF in workplace contexts. Contextual digital data offer two opportunities in this area: (a) the ability to monitor the long-term effectiveness of E/HF interventions and (b) the opportunity for workforce team members to review data in an open and transparent way, and reflect on their own performance and actions.
The first opportunity, to monitor long-term effectiveness, is critical to supporting the cost/benefit analysis of E/HF interventions. This has for a long time been something that the discipline has grappled with, as so many ultimate indicators, such as staff turnover, or absenteeism due to illness, are low in frequency and long term in their development. Contextual digital data make obtaining shorter term indicators of overall workplace realistic. Examples of such data might be detailed analysis of frequency of task completion, or lengths of rest breaks taken. These data do of course have to be very carefully managed and interpreted, and clear strategies for management of privacy and use developed. For example, if a group of workers are noted to be taking frequent short breaks during a task, traditional methods such as observation, and more reliable data than methods such as diary methods. It is not, however, a perfect measure. For example, observations of the technology in use have shown that the users appropriate the technology to help them manage their work tasks. When a clinician is allocated a job, they are required to 'accept' the task that has been allocated to them when they begin that task. This interaction tends to be a fairly reliable indicator of the task being started. However, in order to remove a task from the list, clinicians need to 'complete' a task. Our observations have shown that the task is often in fact left on the smartphone list long after it has actually been completed. This is not because clinicians are deliberately making the system think that a task is taking longer than it actually is, but because a clinician will often keep a task on the list as a reminder, perhaps to check later on in their shift on the result of some medical tests that they have ordered.
Therefore, it is vital that we accompany interpretation of contextual digital data with a clear understanding of how the technology that yields that data is appropriated by its users.

Evaluation of demands on and effects experienced by people
As noted earlier, one particular technological development that is of value to the practice of ergonomics is that advances in sensor technologies make them much less intrusive than in the past. Therefore, it is realistic to imagine, for example, an entire factory workforce wearing devices such as a 'Fitbit' that records physical activity through the day. In addition, technologies such as eye tracking now have the potential to be embedded into standard head gear, and the size of physiological monitors such as heart rate monitors has decreased to such an extent that they can now realistically be worn throughout the working day without being noticeable to the user, which should reduce the extent to which participants change their behaviour because they are aware of being monitored. However, it remains critical that we are not seduced by these vast sets of quantitative data that perhaps represent an Aladdin's cave of previously unobtainable data. It is still important that we understand the validity and meaning of such data; whilst these technologies may enhance the accuracy and availability of such data, there is still a challenge in understanding the meaning of these measures (Parasuraman 2003). Science and engineering colleagues often refer to the concept of 'ground truth' when establishing the accuracy of measures (e.g. in developing new Global Navigation Satellite System technologies); in E/HF it is very rare that we have a 'ground truth'. We tend to rely on triangulation to overcome this limitation, but we must acknowledge that whilst this data from an individual's life more generally, and these data might span multiple workplaces, or combine their work and home life.
If we consider an individual who is experiencing symptoms of an upper limb disorder, for example, we may be able to collect data not only from the way in which they interact with systems and equipment at work, but also from their activities at home: an individual's frequent interaction with a personal smartphone or games technology may be combining with a physical task they do at work to produce the symptoms that are presenting. This results in a much more sophisticated and extended form of our existing concept of 'archival data'.

Anonymous or not?
The collection of large volumes of contextual data poses significant challenges to practitioners in terms of both security (that is, making sure the data remain confidential) and maintaining anonymity. The specific issue of security is outside the scope of this paper, but we will note that barely a week passes without reports of a significant security breach concerning personal data, be it the result of deliberate hacking (as in the case of leaks of photographs stored on the cloud) or some form of simple error (losing a datastick). Indeed, with human error reported as being implicated at some level in 95% of cyber incidents (IBM Global Technology Services 2014), this is in itself an important new facet of safety science research. As the potential custodians and users of contextual data, E/HF practitioners will have increasingly onerous responsibilities in this area.
Maintaining anonymity will also be challenging. Normal ethical practices in this area typically include removing personal identifying information (such as names or addresses) and perhaps coding respondents' data by job role or simply with an index number. It is increasingly clear that such practices will not be sufficient when dealing with contextual data; indeed, risks to anonymity will exist even where no ostensibly identifying data were ever collected by an individual researcher or team. The key to this difficulty lies in the specific characteristics of contextual data linked to its sheer volume as a result of its automated collection. First, we may fail to fully appreciate what identifying information is hidden in our data-set. It may, for example, contain metadata (see Note 2) we are unaware of. Second, it may be possible to carry out inferences over data we did not directly foresee, possibly because the data are captured at a higher spatial or temporal resolution than we were aware of. For example, analysis of time-load data from energy monitoring devices can be used to identify time-varying appliance load signatures that identify when specific electrical items were being used (National Institute of Standards and Technology 2010, 13). This may in itself does this indicate that they are not complying with their work requirements, or is it an indicator of a potential problem with the way in which the task is designed? It could in fact be the case that the regular, short, unplanned breaks are an example of good practice, leading to shared intelligence about the status of work, or team building activities. For example, if a doctor whose movement is being tracked is apparently spending frequent periods of time in the mess room, is this an indicator of an inefficient rota, or a sign that the doctor is able to take sufficient rest during their shift, and thus less likely to become stressed?
Secondly, it is already seen in manufacturing and transport contexts that rapid feedback of performance data is possible. Rather than this being perceived as a 'big brother' concept that intimidates and disenfranchises workers, there is potential for the workforce themselves to take ownership of these data and use the data positively to change their own performance and actions. This therefore requires careful implementation of such data within an organisational structure as formative, rather than summative, feedback.

Characteristics of contextual digital data
As mentioned previously, a principle of research ethics and ethical E/HF practice is that for any data we collect, the participant(s) must be aware that data collection is taking place, and how that data is going to be used and stored. In an experimental context, this is usually quite straightforward to ensure, through the use of consent forms, and influenced by the expectations of the participants, who are aware that they are taking part in a formal process of data collection. E/HF practice has traditionally grappled with the acknowledgement that people change their behaviour in the field once they are aware that they are being monitored or observed. As noted earlier, this was first reported as the Hawthorne effect (Landsberger 1958; but see also Levitt and List 2011) and is a phenomenon that has persisted. Contextual digital data extends the nature of data that can be collected about an individual whilst completing their work or using a system, and presents new challenges for how we ensure that the principle of 'informed consent' is maintained (Eden, Jirotka, and Stahl 2013). Therefore, in addition to the specific impact of the use of contextual digital data within E/HF methods, there are some general, contextual issues of which we need to be aware, as follows:

The blurring of the work-life boundary
Traditionally within E/HF we tend to look at people within a particular context, situation or place of work. Contextual digital data present the opportunity to examine and use nature of the likely intent (identifying a specific individual known to be in the data-set, attempting to identify a specific individual who might be in the data-set, attempting to identify as many people as possible in a data-set) (El Emam 2010).
Another aspect of practice will be to educate users and workers effectively as to the nature of the risks involved and to offer them appropriate levels of control over their own data where possible (ENSIA 2011). In the workplace this would however require an appropriate managerial and cultural viewpoint on whether workers are indeed allowed this kind of privacy or ownership in the first place.

Beyond the sample
Traditionally, we are able to look at constrained contexts, samples which are governed by physical rules. For example, when completing a Cognitive Work Analysis (Vicente 1999), we establish an abstraction hierarchy that outlines the context to which our analysis will apply. Contextual digital data, in principle, allows us to look at complete populations. This presents a tremendous opportunity in terms of coverage of a range of user types, but we (a) must really be sure we have captured the whole population; (b) need to acknowledge that the examination of a complete population represents a change from what is sometimes our normal good practice, since sometimes we deliberately do not try to look at the whole population but consider specific user groups and their needs, on the basis that if their needs are met, others are automatically met (e.g. door height for tall people, button size for big fingers, visibility of contrast for those with visual impairments etc.); and (c) must recognise that this makes the concept of statistical significance tricky. We already see this in the context of correlations when we have large samples, where we need to be cautious in the inferences that we make from statistical tools such as correlation, and remember what the numbers produced from applying a statistical test actually mean. For example, a correlation of 0.3 for a sample of N = 50 will be considered 'significant' at a level of p < 0.05 (i.e. a correlation at least this large would be expected less than 5% of the time if there were no true relationship). However, there is also a meaning to the correlation coefficient of 0.3: by converting it to the coefficient of determination, we know that 9% of the variance in one variable is explained by knowing the other variable. Whilst 9% may be 'significant', the meaningfulness of having explained only 9% of the variance in a variable needs to be acknowledged (and methods used to help capture the nature of the other 91%!).
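The arithmetic in the example above can be checked with a short sketch. It uses the Fisher z-transformation as a normal approximation to the significance test for a correlation (an assumption made here to keep the example self-contained; the paper does not specify a test). The function names and the second, large-sample case (r = 0.01, N = 1,000,000) are hypothetical illustrations and are not drawn from any data-set.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for a Pearson correlation r from n pairs.

    Uses the Fisher z-transformation (a normal approximation), not the exact
    t-test, so the values are indicative only.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def variance_explained(r: float) -> float:
    """Coefficient of determination: the share of variance accounted for."""
    return r ** 2


# (r, N) pairs: the example from the text, then a hypothetical very large sample.
cases = [(0.3, 50), (0.01, 1_000_000)]
for r, n in cases:
    p = correlation_p_value(r, n)
    print(f"r={r}, N={n}: p={p:.3g}, variance explained={variance_explained(r):.2%}")
```

Under these assumptions both correlations fall below the p < 0.05 threshold, yet the large-sample correlation accounts for only a hundredth of one per cent of the variance, which is the sense in which 'significant' and 'meaningful' come apart as N grows.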
The challenge for E/HF is therefore to a) be able to interpret the correlations in massive data-sets in ways which are meaningful and b) to pose hypotheses or offer explanations which can exploit these data-sets. (For further discussion of the notion that n = N in big data contexts, see Drury 2015). be embarrassing or could be used then to produce further inferences about who was at home and their pattern of life. Relatedly, it has been established that the phenomena of higher dimensionality and sparsity tend to exist in data-sets that contain so-called micro-records of individual behaviour as an inevitable consequence of the long-tail distribution (Brynjolfsson, Hu, and Smith 2003;Narayanan and Shmatikov 2008). The practical consequence of this is that once an individual record is considered from multiple dimensions in terms of all the attributes it may contain, far from blending into the crowd, individual specific records can be easily located on the basis of specific diagnostic features (i.e. there will be one or more dimensions in a given micro-record that will disambiguate it from other similar micro-records). Third, even if (a) our data-set is clean of metadata and (b) sparsity and higher dimensionality do not lead practically to 'jigsaw' identification, considerable risks are posed by the existence of other data-sets, particularly where many different types of data-set are already publicly available. A definitive example of this occurred relatively early in the modern history of contextual data where a Massachusetts hospital discharge database was deanonymised by correlating it with a public voter database via postal codes (Sweeney 1997). 
Other famous examples include correlating a publicly released data-set of films watched by supposedly anonymous individuals on Netflix with film reviews posted to the Internet Movie Database: "Using the Internet Movie Database as a source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive personal information" (Narayanan and Shmatikov 2008). This type of anonymity breach is not limited just to online databases; for example, locational privacy can also be broken with reference to social networks as a correlate (Srivatsa and Hicks 2012) or with reference to other kinds of sensor data that modern smartphones collect, such as accelerometer, magnetometer and gyro data (Lane et al. 2012).
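The kind of linkage described above can be illustrated with a toy sketch. All records, names, field names and the choice of quasi-identifiers (postcode, birth year, sex) are hypothetical and invented for illustration; this is not a reconstruction of the method used in any of the cited studies.

```python
from collections import Counter

# Hypothetical "anonymised" release: names removed, quasi-identifiers kept.
released = [
    {"postcode": "NG7", "birth_year": 1975, "sex": "F", "diagnosis": "RSI"},
    {"postcode": "NG7", "birth_year": 1975, "sex": "M", "diagnosis": "stress"},
    {"postcode": "NG9", "birth_year": 1982, "sex": "F", "diagnosis": "back pain"},
    {"postcode": "NG9", "birth_year": 1982, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public register that does carry names.
public = [
    {"name": "A. Example", "postcode": "NG7", "birth_year": 1975, "sex": "F"},
    {"name": "B. Sample", "postcode": "NG9", "birth_year": 1982, "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Tuple of quasi-identifier values used to match records across data-sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released, public):
    """Return {name: released record} where the quasi-identifier key is unique."""
    group_sizes = Counter(key(r) for r in released)
    by_key = {key(r): r for r in released if group_sizes[key(r)] == 1}
    return {p["name"]: by_key[key(p)] for p in public if key(p) in by_key}


matches = reidentify(released, public)
for name, record in matches.items():
    print(name, "->", record["diagnosis"])
```

In this toy case only the person whose combination of attributes is unique in the released data is re-identified; the other named person shares a combination with a second record and so cannot be singled out. The more attributes a record carries, the more likely its combination is to be unique, which is the practical meaning of sparsity and high dimensionality here.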
One might however note that the risk of deanonymisation also implies something positive about the qualities of contextual data, in that it shows that these data tend to be highly specific and thus, in principle, more data means potentially more information. More generally, this possibility can be seen simply as a corollary of the power of data mining to produce insights, albeit in this case ethically dubious ones. The mining of hospital data together with voter records could just as easily generate epidemiological data. The challenge lies in using data ethically and in an informed manner. Useful tools in this area may include ways of categorising data in a risk-based manner based not upon absolute security, but the amount of effort that would be required to deanonymise a data-set and the flexibility in how they transact their labours by providing them with rich sources of data. Indeed, at an extreme, exploitation of the digital contextual footprint could permit the removal of management functions in favour of self-synchronising teams within the workplace, as has been envisioned as a consequence of ubiquitous data sharing in military domains (e.g. Alberts and Hayes 2006). The implication here is that ergonomists may occupy the role of ombudsman with regard to the effect of new technologies on the workplace (Meister 1999; see also Hancock 2009 for an extended discussion). It may be increasingly necessary that we stick to our guns with regard to what we understand as the appropriate ways to design work, and ensure that we understand how contextual digital data can be used to support this dialogue.

Discussion
The emergence of contextual digital data, like most technological developments, presents both opportunities and hazards to the discipline of E/HF at several different levels of analysis. Traditionally, E/HF has been something of a data hungry discipline where practitioners may often have found data collection expensive and time-consuming even assuming easy access to relevant sites and Subject Matter Experts, and sometimes have to accept that their resources may not stretch as far as they might want. The potential for a deluge of rich and seemingly unlimited data about individuals and work systems has clear appeal, signalling the potential to become more confident about the effects of work design on a wider population, and reducing the time and financial costs of data collection. At the same time, E/HF as a newly 'data rich' venture presents numerous fresh challenges in terms of the interpretation of these data, the practical and ethical handling of large data-sets and ultimately, determining how it fits in with the concerns of our discipline and how it should actually be used and what could and should change in the world as a result.

Abstracted empiricism and 'the ergonomic imagination'
Although the contextual digital footprint is a new phenomenon with several distinctive features, this is not the first time a discipline has had to consider its reaction to the availability of a flood of new data, and it is instructive to examine the lessons that were learned. In 1959, the American sociologist C. Wright Mills expressed concerns that have a familial similarity to our own in this paper. Mills had noted that the then-emerging technology of electric computers meant that survey research on public opinion could be rapidly coded onto Hollerith punch cards 'which are used to make statistical runs by means

The uses of E/HF in a contextual data environment
In addition to practical considerations as to how we might use contextual data in E/HF, there is also the significant issue of how E/HF would in turn itself be used in organisations and what part we will play within these ventures. At the present time, the use of such 'business intelligence' has arguably happened ahead of substantial efforts by ergonomists to understand it. Of course, using data within management is hardly new and has not been without its proven benefits and, equally, its discontents, particularly when linked to 'targets culture'. Timecards, for example, have long been a form of employee tracking. Indeed, the overbearing 'big brother' manager (Chaplin's vision predates Orwell's) who tracked his employees even up to the point of tracking and intruding on their bathroom breaks was famously parodied in the Charlie Chaplin film 'Modern Times' (1936). One might feel that a trajectory from registers to punched time cards through to swipe cards and then employee location tracking is merely a quantitative change in the fidelity with which employees can be tracked. However, a key development is that this tracking data is just one of a range of measures that can now be easily applied, and most importantly, the development of computational intelligence to track employees (e.g. Parenti 2001). Recent media attention has been focused on the use of location and activity tracking data as part of employee monitoring at mail-order warehouse and fulfilment centres (BBC 2013) with several workers expressing unhappiness at their perceived lack of control within their workplace: 'Workers are treated more as robots than humans' (Streitfeld 2013).
In wider focus, one of the biggest challenges ergonomists will have to face regards the potential for improvement specifically in terms of production. We have, in a sense, been here before. One of the first responses to having accurate information about employee behaviour (in the form of artefacts like time and motion methods and Frank Gilbreth's filming of the workplace) was so-called scientific management (Taylor 1911). In an echo of the present situation, F.W. Taylor himself was surprised to find that the Ford motor company had implemented methods of scientific management ahead of the involvement of experts, including himself (Sorensen 1956). While the sociotechnical turn corrected for this tendency (e.g. Trist 1981), there is a risk that with the lure of data-driven improvements in efficiency, lessons learned at great cost are once again forgotten, leading to a 'neo-Taylorist' future. At the same time of course there is fantastic potential for the contextual footprint to serve what we might regard as sound sociotechnically informed ends such as permitting job enlargement or even allowing employees newfound
• Training in and provision of appropriate techniques to ensure data security, coupled with methods to ensure that ethical requirements are met.
• Developing methods to store and dispose of digital data
• Making sure procedures are in place to ensure 'informed consent' is feasible whenever data are used as part of ergonomics analysis
• Developing approaches that allow us to maintain participant anonymity, being particularly aware of the hazard of jigsaw identification.
But, in addition to these recommendations regarding the ethical and responsible use of contextual digital data, we should not be blinkered, and should embrace the opportunities presented by these data. Contextual digital data may well provide us with the opportunity to have new insights and advance our theories about causation and response to stimuli. The contextual data footprint, if used responsibly and ethically, has the potential to transform the nature of E/HF analysis and track the impact of design changes informed by E/HF analysis over days, months and years. We can move beyond concerns about the transferability of data from the laboratory to the field and consider the possibility of the field becoming the laboratory.

Notes
1. Whilst in laboratory studies or formal activities such as interviews or focus groups, a standard consent form will be used to confirm informed consent, in many less formal workplace observations, participants do not always give written consent to data being collected, but the E/HF practitioner will clearly verbally explain the reason for their presence and the types of data (e.g. written notes) that will be collected.
2. Additional, explanatory data that is attached to the primary data value.
of which relations are sought. Undoubtedly this fact, and the consequent ease with which the procedure is learned by any fairly intelligent person, accounts for much of its appeal' (Mills 1959, 50). The concern expressed by Mills was that this technology would lead to the distortion of sociology as an academic discipline towards 'abstracted empiricism', where data and method were not appropriately contextualised or integrated with theory, with an endgame emerging where sociology degenerates into the analysis of opinion polling rather than retaining focus on understanding social structures and phenomena. A further corollary of Mills' concerns was that the easy availability of data leads to a potential confusion between what is important and what is easy to measure. Mills' 'abstracted empiricism' of the Hollerith card has much in common with contemporary critiques of big data that emphasise its likewise theory-free interrogation of data, privileging correlational statistics over hypothesis-testing inferential methods, and with several of the concerns expressed in the present paper. Mills' response to this was to invoke the notion of the 'sociological imagination', essentially a call for a sociology that took a three-dimensional, holistic view of society combining macro- and micro-perspectives such that individual experience could be understood in terms of larger, interlinked phenomena, a view not dissimilar, at least by analogy, to the systems ergonomics perspective in E/HF (e.g. Wilson 2014) most typically expressed through ideas such as the onion model (see Wilson and Sharples 2015a) or ergonomics as 'reflective practice' (see Sharples and Buckle 2015). In view of this, we have no apparent need at the present time for a putative New Ergonomics, but it is perhaps ironic that in the consideration of a new paradigm within E/HF, our attention is drawn back to the key pre-existing foundations of our discipline.
Ultimately the safe, positive and effective accommodation of the contextual footprint within our subject will require a recommitment to our core values and concerns.

Using the contextual data footprint to enhance E/HF
Contextual digital data already exists, and is here to stay. As E/HF specialists, it is our responsibility to understand how these data can be used ethically and responsibly to improve the way in which we design systems, technologies and work. We require at least the following:
• Training in methods to handle large data-sets, and retaining a fundamental understanding of statistical inference, so that colleagues are aware of the way in which statistical tests behave with large data-sets.