Investigating call record data using sequence analysis to inform adaptive survey designs

ABSTRACT Researchers have become increasingly interested in better understanding the survey data collection process in interviewer-administered surveys. However, tools for analysing paradata capturing information about field processes, also called call record data, are not yet fully explored. This paper introduces sequence analysis as a simple tool for investigating such data with the aim of better understanding and improving survey processes. A novel approach is to use sequence analysis within interviewers, which allows the identification of unusual interviewer calling behaviours and may provide guidance on interviewer performance. Combined with optimal matching, clustering and multidimensional scaling, the technique offers a way of visualising, displaying and summarising complex call record data. The method is introduced to inform survey management and survey monitoring; it is hence informative for adaptive survey designs and will help to identify unusual behaviour and outliers and to improve survey processes. Sequence analysis is applied to call record data from the UK Understanding Society survey. The findings inform further modelling of call record data to increase efficiency in call scheduling.


Introduction
Many survey agencies nowadays routinely collect survey process data, so-called paradata (Couper, 1998; Kreuter, 2013). For interviewer-administered surveys, including both telephone and face-to-face surveys, data about the fieldwork process, often termed call record or call history data, have received much attention (Bates, Dahlhamer, & Singer, 2008; Laflamme, Maydan, & Miller, 2008; Blom, Jäckle, & Lynn, 2010; Durrant, D'Arrigo, & Steele, 2011, 2013b). Such data may contain information about the outcome of the call or visit and the day and time of the call attempts. Several outcomes may be distinguished, such as non-contact, contact, ineligible, refusal, appointment made and any interviewing done. This string of outcomes of all call attempts to a household is referred to as a call sequence, an ordered collection of activities or states (Piccarreta & Lior, 2010). A large number of different sequences are possible even if the number of positions (i.e. total length) is relatively small. As an example, Table 1 shows a selection of short call sequences observed in the UK survey Understanding Society (http://dx.doi.org/10.5255/UKDA-). This paper introduces the use of sequence analysis for investigating call record data to inform survey monitoring and management processes. The technique is a powerful tool for visualising and summarising call record data, which can be large and complex (see Background section). It may help to identify unusual cases or outliers, which then may require further investigation. In particular, it can build the basis for further statistical modelling. A novel approach here is to use sequence analysis within interviewers, which allows the identification of unusual interviewer calling behaviours and may provide guidance on interviewer performance.
Here, sequence analysis plots are combined with optimal matching, cluster analysis and multidimensional scaling (Bartholomew, Steele, Moustaki, & Galbraith, 2008;Kruskal & Wish, 1978;Piccarreta & Lior, 2010). This allows finding similarities across the contact histories and identifying groups of sequences that are homogeneous. The technique allows the identification of unusual calling behaviours and provides survey managers with tools to display such cases. The paper provides some practical guidance on how to analyse call record data using sequence analysis, for example regarding details on how to implement the method into practice including the use of software, how to detect unusual calling behaviours, outliers, unproductive call sequences and coding errors, and highlights possibilities for cost and efficiency savings.
In this paper, the method of sequence analysis is applied to data from the UK Understanding Society survey (Wave 1), a large-scale household survey, for which extensive call record data have been collected. Although the method is illustrated using data from a face-to-face survey, it can be employed in the same way to telephone surveys. Other sequences arising in survey methodology may also be analysed with this approach, such as sequences of mouse movements in web surveys (Olson & Parkhurst, 2013).
The remainder of the paper is organised as follows. The following section (Section 2) provides further background and reviews the use of sequence analysis and call record data modelling. The methodology section, Section 3, introduces the basic principles of sequence analysis in the context of survey process data. In Section 4 the method is applied to call sequences from Understanding Society, including a separate analysis per interviewer. The paper concludes with a summary of the main findings, limitations and implications for survey practice.

Background
Sequence analysis methods are frequently used in a range of disciplines, such as in medicine and biology to analyse DNA sequences (Miyazawa et al., 1989; Smith et al., 1986), but also in the social sciences, for example in demography to study family-life trajectories (Elzinga & Liefbroer, 2007) and life course trajectories (Halpin, 2003; Aassve, Billari, & Piccarreta, 2007; Wiggins, Erzberger, Hyde, Higgs, & Blane, 2007; Martin, Schoon, & Ross, 2008; Piccarreta & Lior, 2010), and in economics to study employment histories (Malo & Munoz-Bullon, 2003; Pollock, Antcliff, & Ralphs, 2002) and transitions from education into work (McVicar & Anyadike-Danes, 2002). For a recent review of the method in the social science context see Fasang and Liao (2014). Researchers have also started to use sequence analysis in the context of survey methodology with the aim of informing nonresponse adjustment methods based on call histories (Hanly, Clarke, & Steele, 2015; Kreuter & Kohler, 2009). Kreuter and Kohler (2009) and Hanly (2014) found that variables derived from call record data, in part informed by sequence analysis, did not improve nonresponse adjustment methods. Although the method was not as useful as hoped for that purpose, they point out in their discussions that it may have potential for use in survey management and field practice. This paper aims to address this.
Call record histories were collected even before such data became automatically available through technology; Durbin and Stuart (1954), for example, discussed the efficacy of callback strategies. The field of survey practice has changed significantly since then. Technology has driven the data collection process forward, so it is now possible to monitor fieldwork processes using visualisation tools and methodologies that allow the identification of patterns and potentially unusual or erroneous interviewer behaviours.
Although survey researchers have become increasingly interested in understanding data collection processes, it is still often not clear how best to analyse such data, since call record data can be large and may exhibit complex data structures, such as time dependencies and multilevel clustering. For example, calls are made over time, requiring time-dependent analyses such as discrete-time event history analysis; also, calls are normally nested within households and cross-classified within interviewers and areas, requiring multilevel cross-classified models (Durrant et al., 2011, 2013a, 2013b; Hanly, 2014; Hanly et al., 2015; Sinibaldi, 2014). Call record data can also be of significantly lower quality than the survey data themselves, since they are regarded as a by-product: they often do not undergo the same editing and cleaning checks and may exhibit larger proportions of measurement error and missing data. Hence they require further cleaning before analysis. Often ad-hoc methods are used, and usually summary measures are applied to describe outcomes of call sequences (Bates et al., 2008; Groves & Heeringa, 2006), such as the number of non-contact calls or the total number of calls. As noted in Fasang and Liao (2014, p. 644), sequences of categorical states are much more complex than simple numerical variables and cannot be easily summarised as categorical variables with a limited number of categories. In the survey methods literature it has been recognised more recently that it may be important to analyse the contact sequence as a whole (Hanly et al., 2015; Kreuter & Kohler, 2009). It may be the actual interplay between several call outcomes that is informative. A particular call outcome may have a different meaning if analysed separately or if seen as part of a longer sequence.
For example, a non-contact after an appointment may be regarded as a 'hidden refusal', whereas otherwise it may simply be interpreted as a period of absence (e.g. in Table 1 compare household 4 with households 1 and 6). Consequently, modelling procedures, such as discrete-time event history analysis, have been developed to analyse not just the final nonresponse outcome but the process leading to contact, cooperation or refusal as a whole and to recognise the entire contact history (Durrant et al., 2011, 2013b; Wagner, 2013). However, a full modelling approach may not always be necessary or desirable to analyse sequences.
Sequence analysis offers a potentially powerful tool for visualising and summarising call record data and for exploring and reducing the complexity of such data structures. It represents a relatively simple descriptive method which can be easily implemented in practice, not requiring any modelling techniques or distributional assumptions, which should be attractive to many survey agencies. It has the potential to provide survey researchers and field managers with a simple, additional graphical tool to investigate interviewer calling patterns either during or post data collection. Findings from the analysis may then inform future routine monitoring as well as further, more sophisticated statistical modelling.
This benefit has already been illustrated in Durrant, Maslovskaya, and Smith (2015, 2017). They demonstrated the use of the findings from this analysis in survey practice by modelling sequence length and sequence outcome jointly, using a multinomial model, thereby improving on standard response modelling, which only models the response outcome. In the past, researchers used the length of call sequences only as an explanatory variable in nonresponse models; here, they extended the modelling to include the length of call sequences as an outcome variable.
Furthermore, their analysis suggests that modelling the joint outcome improves the fit of the model in comparison to the two separate models for either length or final outcome.

Methodology: using sequence analysis
Sequence analysis consists of a series of routines, plots and outputs. Simple sequence plots show the distribution of sequences. Transition rate matrices indicate the propensity for each possible next outcome following a particular call outcome. Ideally one would like to find similarities among the contact histories to create groups that are homogeneous in their outcome variables and to summarise the different sequence patterns. However, most individual sequences occur with low frequency, which makes summarising them complicated (Piccarreta & Lior, 2010). Given the large number of possible outcomes and patterns, no single characteristic will fully describe the sequences. We do not expect an individual summary measure to describe the sequence as a whole; rather, a combination of multiple characteristics will be necessary to establish an adequate measure of similarity across sequences. In this section, we first describe how a distance matrix can be constructed using optimal matching to compare the sequences on a number of criteria. The distance matrix obtained through optimal matching can then inform cluster analysis and multidimensional scaling. The combination of sequence analysis and multidimensional scaling was proposed by Piccarreta and Lior (2010); this approach is extended here to call record data and discussed at the end of this section.

Creating a distance matrix through optimal matching
A distance matrix is a square symmetrical matrix (i.e. where the element in row i and column j is equal to the element in row j and column i) that indicates distances between a number of pairwise objects (the distance between object i and object j appears in the ith row and the jth column) (Bartholomew et al., 2008). A distance matrix helps to summarise important features of the data and to explore relationships. One way of constructing such a distance matrix is optimal matching (Abbott & Hrycak, 1990;Abbott & Tsay, 2000;Hollister, 2009;Levenshtein, 1966;Sankoff & Kruskal, 1983). This method computes the distance between pairs of sequences by counting the number of basic operations that are necessary to transform one sequence into another (Levenshtein distance). In simple terms, the more operations are required, the larger the differences. Each type of operation may be associated with a weight or cost such that a weighted distance matrix may be constructed. The basic operations considered are insertion, deletion and substitution. The insertion and deletion operations are also referred to as 'indel' operations since a deletion of an element in one sequence is equivalent to the insertion of an element in the other sequence. A substitution operation implies the direct substitution of one element in the sequence with another.
It is important to think carefully through the cost settings of each of the operations involved, as these costs may affect the resulting distance matrix (Pollock et al., 2002). A typical default option for a substitution cost is 2 and for an indel it is 1, although other cost settings may be advocated (see also Abbott & Tsay, 2000; Hanly et al., 2015; Hollister, 2009; Pollock et al., 2002; Wu, 2000). For example, Pollock et al. (2002) calculated costs on the basis of the likelihood of transitions from one employment status to another. According to Abbott and Tsay (2000), practice in cost setting is highly diverse, and some theoretical questions about cost settings have not yet been answered. (However, this is outside the scope of this paper.) Let us briefly consider an example of an indel operation and a substitution that transform the call sequence of household 1 in Table 1 into the call sequence of household 2. Both have two non-contact calls (N) in common, but household 1 responds with an interview (I) after an appointment (A), whereas household 2 refuses (R) at the first contact call. By applying one deletion and one substitution we can convert one sequence into the other:
Household 1: N N A I → delete I → N N A → substitute A with R → N N R (Household 2). The operations are performed between all possible pairs of sequences. This procedure creates a distance matrix with dimension equal to the number of sequences. In total, since the matrix is symmetrical, (n² − n)/2 distance measures need to be calculated. An advantage is that the method takes account of multiple dimensions or characteristics of the sequences, not just of one or two characteristics as is the case for conventional summary measures of call record data. As indicated, a potential difficulty is the adequate specification of the costs (Abbott & Tsay, 2000; Wu, 2000). Results from prior analysis or different datasets may provide an empirical solution; a theoretical solution, with the researcher deciding on the costs for the different types of required operations, is also possible. For further discussions and options on cost settings see Abbott and Tsay (2000) and Hollister (2009). Optimal matching is an intermediate stage which is necessary to perform the sequence analysis techniques of cluster analysis and multidimensional scaling discussed below.
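As an illustration, the weighted edit distance underlying optimal matching can be sketched with the standard dynamic-programming recursion. This is a minimal Python sketch, not the implementation used in the paper (the analysis below relies on the TraMineR library in R); the function name and single-letter state codes are ours:

```python
def om_distance(seq_a, seq_b, sub_cost=2.0, indel_cost=1.0):
    """Optimal-matching (weighted Levenshtein) distance between two
    call sequences encoded as strings, e.g. 'NNAI' vs 'NNR'."""
    n, m = len(seq_a), len(seq_b)
    # d[i][j] = minimal cost of transforming the first i calls of
    # seq_a into the first j calls of seq_b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + indel_cost,   # deletion
                          d[i][j - 1] + indel_cost,   # insertion
                          d[i - 1][j - 1] + sub)      # substitution
    return d[n][m]
```

With the default costs, transforming 'NNAI' into 'NNR' (one deletion plus one substitution, as in the example above) gives a distance of 3. Filling the full distance matrix amounts to evaluating this function for every pair of distinct sequences.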

Cluster analysis and multidimensional scaling
The distance matrix can be used to carry out cluster analysis, to find similarities between groups of sequences based on multiple dimensions (Bartholomew et al., 2008; Everitt, Landau, Leese, & Stahl, 2011; Kaufman & Rousseeuw, 1990). Different clustering algorithms can be used, for example centroid-based, distribution-based and density-based clustering. It should be noted that, in principle, different clustering methods may give different results. In this paper we use the optimised 'partitioning around medoids' (PAM) algorithm (Studer, 2013), which differs from hierarchical algorithms and uses a predefined number k of groups to obtain the best partitioning of the dataset. This algorithm was first introduced by Kaufman and Rousseeuw (1987). It aims to identify the k best representatives of groups, called medoids, where a medoid is defined as the observation of a group with the smallest weighted sum of distances to the other observations in that group. The algorithm then aims to minimise the weighted sum of distances from the medoids.
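The core of PAM can be sketched in a few lines operating directly on a precomputed distance matrix. This is an illustrative Python sketch only: it uses a simple deterministic start, does not handle ties or empty clusters robustly, and omits the optimisations of the implementation by Studer (2013) used in this paper:

```python
def pam(dist, k, max_iter=100):
    """Partitioning around medoids (PAM) on a precomputed symmetric
    distance matrix (list of lists). Returns (medoids, labels)."""
    n = len(dist)
    medoids = list(range(k))  # simple deterministic start (sketch only)
    for _ in range(max_iter):
        # assign every sequence to its nearest medoid
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # each cluster's new medoid minimises the total distance
        # to the members of that cluster
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:   # converged
            break
        medoids = new_medoids
    return medoids, labels
```

For example, on a toy distance matrix with two tight pairs of sequences and k = 2, the procedure recovers the two pairs as clusters.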
It is important to bear in mind that the number of classes generated by cluster analysis is decided by the researcher and is therefore to some extent arbitrary. According to Halpin and Chan (1998), cluster analysis will always provide a solution, even when there is no meaningful structure in the data. Pollock et al. (2002) and Piccarreta and Lior (2010) suggest inspecting different cluster solutions, and the sequences within clusters, closely when deciding on the best solution and evaluating the quality of a partition. Pollock et al. (2002) argue that the best solution is one in which a substantive meaning can be ascribed to each cluster membership.
Another possibility is to analyse the distance matrix by multidimensional scaling (Bartholomew et al., 2008; Kruskal & Wish, 1978). This is a multivariate technique that aims to reveal the structure of the distance matrix by representing the sequences in a small number of dimensions, such as on a one-dimensional scale or in a two-dimensional map or plot. Each sequence is then identified with a location in that space based on the distance matrix. Groups of sequences may be identified based on low, medium or high values in a dimension, and in a two- or three-dimensional scatterplot groups of sequences may be identified visually. Part of the interest in the analysis is to try to uncover which attributes of the sequences appear to carry weight in the similarity measure, i.e. which attributes are the key features that determine the distances between sequences. The multidimensional scaling scatterplot allows the identification of groups of sequences as well as potential outliers.
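To make the idea concrete, the classical (Torgerson) variant of multidimensional scaling can be sketched as a double-centring of the squared distances followed by an eigendecomposition. This is an illustrative Python sketch (the function name is ours; applied work would normally rely on an established implementation):

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: embed objects in n_dims dimensions
    from a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # double-centre the matrix of squared distances
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigendecomposition; keep the largest non-negative eigenvalues
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_dims]
    coords = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
    return coords  # one row of coordinates per sequence
```

When the input distances are exactly Euclidean, the pairwise distances between the returned coordinates reproduce the input matrix; in general the first dimensions capture as much of the distance structure as possible.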
A combination of the two techniques (cluster analysis and multidimensional scaling) was proposed by Piccarreta and Lior (2010) in order to assure the accuracy of the results obtained through cluster analysis and the visual representation of the clusters. The multidimensional scaling can help to examine graphically the quality of the cluster partitioning and of the cohesion within clusters (Piccarreta & Lior, 2010).
Application of sequence analysis to the UK Understanding Society survey data

Design and fieldwork of Understanding Society
Sequence analysis is applied to call record data from the UK Understanding Society survey data (Wave 1). Understanding Society is the UK Household Longitudinal Study of approximately 40,000 responding households in the United Kingdom, covering topics on health, work, education, income, family and social life to help understand the long-term effects of social and economic change as well as policy interventions. The study has many advantages over previously existing datasets in the UK by being exceptionally large and comprehensive. In particular, the study was designed to collect a range of paradata, including call record data. Only interviewers with above average experience and ability were selected for the study.
Data collection for each wave is scheduled across a 24-month period, with interviews taking place annually. Wave 1 data collection took place between January 2009 and March 2011. All interviews at Wave 1 were carried out face-to-face in respondents' homes by trained interviewers using computer-assisted personal interviewing (CAPI). All adult household members (aged 16 and older) were asked to respond, and a household also needed to complete a household questionnaire in addition to all individual interviews. A minimum of six calls was required at each sampled address before it could be considered unproductive, but interviewers were encouraged to make further calls if possible. Interviewers had one month to contact the households allocated to them (McFall, 2012; McFall & Garrington, 2011).
The main survey consists of the two main sample components in Wave 1: the general population sample (GP) and the ethnic minority boost sample (EMB). The survey has a multistage sample design and households are clustered within interviewers and within primary sampling units (PSUs). The details of the sample selection can be found in Lynn (2009). The study benefits from rich paradata which include call record data and interviewer observation variables (Knies, 2014).

Call record data in Understanding Society and analysis sample
Call record data in Understanding Society contain information about the outcome of calls. The call outcome at each call, the key variable of interest here, is recorded by the survey agency as a five-category variable: 'non-contact', 'contact made', 'appointment made', 'any interviewing done' and 'any other status'. A limitation is that the last category combines a range of possible call outcomes, from different types of refusals to different ineligibility statuses. It should be stressed that this coding was provided by the survey agency, and hence it is not possible for us to distinguish the different types of outcome contained in the 'any other status' category. The final outcome variable recorded at the household level has about 50 outcome codes, split into six broad groups, including 'ineligible', 'refusal', 'contact made but no interviewing', 'any interviewing but not completed' (at least one individual interview completed) and 'case completed' (i.e. the household questionnaire and all individual interviews from all members of the household have been completed). An additional category in the call outcome variable, 'interviewing process completed', was also created: it was assigned to the last occurrence of the category 'any interviewing done' in the sequence whenever the final outcome indicated 'case completed', meaning that all household members had responded individually and the household questionnaire had been completed.
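This recoding rule can be sketched as a simple function, assuming single-letter state codes with 'I' for 'any interviewing done' and 'P' for the derived 'interviewing process completed' (both the codes and the function name are ours, for illustration only):

```python
def recode_completed(seq, final_outcome):
    """Relabel the last 'any interviewing done' call ('I') as
    'interviewing process completed' ('P') when the household's
    final outcome is 'case completed'."""
    if final_outcome != "case completed" or "I" not in seq:
        return seq
    i = seq.rindex("I")                 # position of the last 'I'
    return seq[:i] + "P" + seq[i + 1:]
```

For example, a sequence 'NAI' from a fully completed household becomes 'NAP', while the same sequence from a refusing household is left unchanged.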
The analysis sample includes all households from Wave 1 with at least one call, including those which were later classified as ineligible. Cases from the Ethnic Minority Boost sample are excluded, as the rules for the selection of the boost sample differ from those for the main sample (Berthoud, Fumagalli, Lynn, & Platt, 2009). Calls with no recorded outcomes are also excluded (1.2% of all calls). The final analysis sample contains a total of 255,778 calls with 11,143 distinct sequences, clustered within 47,899 households (both responding and nonresponding) and 741 interviewers. The number of non-contact calls is 142,705 (56% of all calls), which is relatively high. The minimum length of a sequence is one and the maximum is 30 (mean length 5.34, median 4).
Application: basic sequence plots and transition rates

Figure 1 shows a basic sequence plot which displays the sequences across calls for every household, colour coded according to the outcome of each call (colour versions of all plots can be found in the online Appendix). Each horizontal line in the plot represents the call record for one household, that is, one sequence. About 10% of households experience only one call, and these end primarily in interview or 'any other status', indicating either ineligibles or refusals (as coded by the survey agency). Surprisingly, a small proportion of households experience a contact call with no further outcome, or even a non-contact call with no further follow-up visits, which does not show adherence to the interviewer guidelines. This trend continues, with the proportion of interviews steadily declining and the longer call sequences being predominantly driven by non-contact calls. Just over 70% of all sequences have a length between 1 and 6 calls. After 10 calls 88% of all households have been completed, and after 15 calls this has increased to 98%. About 8% of all call attempts are still being made after the 10th call (20,032 calls). Figure 1 clearly shows that there are a number of households that do not receive the required minimum of 6 calls, although these households have been coded as a contact with no further outcome or a non-contact. This is the case for almost 8% of households.
It should be noted that a basic frequency plot such as Figure 1 suffers from the problem of overplotting, given the large number of sequences. This may hide particular features and, if not handled carefully, may even be misleading. Solutions offered in the literature include plotting only a subgroup of sequences, plotting unusual cases and outliers, and ordering sequences (see Fasang and Liao (2014) for further discussion). We aim to overcome overplotting by plotting subgroups and by ordering sequences. Figure 2 displays the ten most frequent sequences, sometimes referred to as a sequence frequency plot (Fasang & Liao, 2014). The most frequent sequence, accounting for only 6% of cases, contains two calls: an appointment made at the first call and a completed interview at the next. The second most frequent sequence (4.5%) contains one call with outcome 'any other status' (ineligible or refusal). An interview at the first call accounts for only 3% of all sequences. Sequences resulting in 'non-contact and interview' or 'non-contact, appointment and interview' account for 6% in total.
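A sequence frequency plot rests on nothing more than the relative frequencies of the distinct sequences, which are straightforward to tabulate. A minimal sketch (the function name and example encodings are ours):

```python
from collections import Counter

def sequence_frequencies(sequences, k=10):
    """Return the k most frequent call sequences together with their
    share of all households, ready for a sequence frequency plot."""
    n = len(sequences)
    return [(s, count / n) for s, count in Counter(sequences).most_common(k)]
```

For instance, with the hypothetical sample `["AI", "AI", "R", "NNAI"]`, the sequence 'AI' (appointment then interview) heads the list with a share of 0.5.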
Analysing call record data from the European Social Survey (ESS), Kreuter and Kohler (2009) find a high proportion of short sequences resulting in interview at the first call (22%) or contact and interview (18%); third most frequent is immediate refusals (9%), and fourth is no contact then interview (6%). In their analysis, the UK tends to have longer sequences than the other countries in the sample.
A contributing factor in Understanding Society may be the allowance of long call sequences (at least for the first wave, where survey researchers are keen to keep as many sample members in the sample as possible) rather than the specification of a maximum number of calls. In our sample the sixth group is also interesting, as it finishes with the status 'any interviewing done' but there is no follow-up to complete the interviewing process or a code to indicate 'completed status'. Both graphs are helpful in visualising sequences and can help in assessing compliance with the survey protocol, which prescribes what interviewers should be doing.

Table 2 contains a transition rate matrix, indicating the likely outcome at the next call given a particular call outcome at the current call. Rates in rows add up to 1. Table 2 indicates that a non-contact is very likely to lead to another non-contact (65% of the time) and to a contact call without a further outcome (12%). A contact call is likely to lead to a non-contact (38%), to another contact (20%) or to the end of the sequence (18%), possibly indicating a refusal not coded as such by the interviewer. 'Any other status' is either the end of the sequence (56%) or leads to a non-contact (23%) or another 'any other status' outcome (11%). An appointment leads in almost 24% of cases to an interview; however, 18% of cases also result in a non-contact, indicating a broken appointment and therefore possibly a hidden refusal. The matrix again indicates some unusual interviewer behaviour: for example, in 6% of completed cases, somewhat surprisingly, further calls are made after the entire interviewing process has already been completed (2.4% of all sequences). These cases require further investigation as they may imply unnecessary costs. In total, 18% of cases with a contact and 1% of cases with an appointment are not followed up.
Whilst there may be legitimate reasons for these interviewer behaviours, including coding errors of outcomes, it seems nevertheless worthwhile investigating these unusual calling strategies, including the characteristics of interviewers who conduct such calls. To summarise, outcomes such as noncontact, contact and 'any other status' are likely to lead to nonproductive call sequences, whereas an appointment may indicate a high likelihood for an interview at the next call.
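Transition rates such as those in Table 2 are conditional relative frequencies and can be estimated directly from the call sequences. A minimal Python sketch (the example sequences and state codes are hypothetical; the rates in this paper were obtained with TraMineR):

```python
from collections import defaultdict

def transition_rates(sequences):
    """Estimate the probability of each next-call outcome given the
    current call outcome, from call sequences encoded as strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):   # consecutive call pairs
            counts[cur][nxt] += 1
    rates = {}
    for cur, row in counts.items():
        total = sum(row.values())            # rows then sum to 1
        rates[cur] = {nxt: c / total for nxt, c in row.items()}
    return rates
```

Such rates can also feed the cost settings for optimal matching; as we understand it, TraMineR's TRATE method derives the substitution cost between two states from the transition probabilities between them.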
Another advantage of basic sequence plots is the identification of potential coding errors. For example, initial analysis of the sequence plots for the Understanding Society data (results not shown) found more than 13% of cases with calls after completed interviews. After analysing the call record data further, it became apparent that a number of calls had erroneously been entered in the wrong order. This was subsequently corrected by the survey agency, and also for further waves.

Cluster analysis and multidimensional scaling
Next, cluster analysis based on the optimal matching distance matrix is performed. To implement optimal matching the cost settings need to be specified. The 'constant' method is the default option for optimal matching in the TraMineR library in R (substitution cost 2, indel cost 1). With this method the substitution costs are the same for all possible call outcomes, and a substitution operation is equivalent to two indel operations. However, it is more intuitive to set the costs according to the transition rates, that is, the probability of moving from one call outcome to another. The TRATE method in TraMineR implements this (Gabadinho, Ritschard, Studer, & Mueller, 2010; Gabadinho, Ritschard, Mueller, & Studer, 2011), and this is the method used here. A number of different cost settings were explored, but the overall conclusions remained the same. Cluster analysis was performed exploring different numbers of clusters (k = 3, 4, 5 and 6). Figure 3 displays the four-cluster solution, which was judged the most appropriate choice as it was the most interpretable and parsimonious. The first cluster (17,705 households) contains mainly successful sequences with interviews. These are primarily shorter sequences (around two to three calls) but, interestingly, also include a small number of longer sequences (around 7-13 calls). These longer sequences are also characterised by appointments and interviews, and may belong to households with several household members, where longer call sequences may be expected. The sequences in cluster 1 hardly contain any non-contacts. Cluster 2 (12,768 cases) again contains shorter sequences, with around 2-5 calls, but with predominantly unsuccessful outcomes, including ineligibles, refusals, non-contacts and non-productive contact calls.
Cluster 3 (11,406 households) contains medium to long call sequences (around five to eight calls mostly) with a number of successful calls or an interview during the call sequence, but also many non-contact calls. Cluster 4 (6020 cases) contains long sequences of 10 and more calls, predominantly driven by non-contact calls and mostly unsuccessful call outcomes. To summarise, the cluster analysis yields a categorisation that is driven primarily by call sequence length and outcome. Cluster analysis proves useful in summarising rich call record data into meaningful and interpretable dimensions which can be used for further analysis (see, for example, Durrant et al., 2015, 2017).

We now turn to the results of the multidimensional scaling analysis, based on the optimal matching distance matrix. Figure 4 shows two graphs, presenting (a) the first dimension only and (b) the first and the second dimensions together. Multidimensional scaling orders sequences according to a criterion. In any particular area of the vertical axis of Figure 4(a) the omitted sequences are similar to the ones plotted; the graph implicitly uses the approach of Fasang and Liao (2014) of displaying the middle sequence or medoid. Displaying the sequences according to their ranking in the first dimension of the multidimensional scaling analysis (Figure 4(a)) indicates an ordering primarily according to length, with successful call sequences at the bottom of the vertical axis, characterised by appointments, interviews and fully completed interviews with no non-contacts. These are predominantly short sequences with only some longer ones, similar to cluster 1 discussed above. Next, short sequences are observed with an 'any other status' outcome (ineligible or refusal) without non-contacts. Then, sequences with non-contacts are displayed in increasing order from the bottom to the top of the vertical axis, with sequences driven by non-contacts at the very top.
The one-dimensional graph therefore displays sequences according to length and, to some extent, outcome. Although this first graph has strong similarities with the simpler Figure 1, it differs in that it groups sequences together according to all call outcomes across the whole sequence, not just the outcome of the last call, and it is not simply based on length. In Figure 4(b) each point represents a sequence in the dataset, positioned according to the first and second dimensions. The first dimension, presented on the horizontal axis, orders the sequences according to length and, to some extent, outcome, with length being the driving factor. The vertical axis displays the sequences according to the second dimension, with sequences below the horizontal line representing mostly successful sequences and those above the line mostly unsuccessful ones. The further the sequences are from the horizontal line, the larger the number of non-contacts. The second dimension therefore displays sequences as a mixture of outcome and length, with outcome being the driving factor. Figure 4(b) also displays a number of outliers or unusual cases, such as long sequences consisting predominantly of non-contact calls. In survey practice these could be displayed separately to investigate the mechanisms leading to such patterns.
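One standard way to obtain such low-dimensional coordinates from a precomputed optimal matching distance matrix is classical (Torgerson) scaling. The analysis here used R libraries; the following is a minimal NumPy sketch of the idea, with a hypothetical toy distance matrix:

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) multidimensional scaling: embed points so that
    Euclidean distances between them approximate a precomputed distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n   # double-centring matrix
    gram = -0.5 * centre @ d2 @ centre
    vals, vecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]    # keep the largest eigenvalues
    vals = np.clip(vals[order], 0.0, None)     # guard against tiny negatives
    return vecs[:, order] * np.sqrt(vals)      # one row of coordinates per item

# Toy distance matrix: three points on a line at positions 0, 1 and 3
d = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
coords = classical_mds(d, n_dims=1)
print(np.round(coords.ravel(), 3))  # positions recovered up to shift/reflection
```

The first column of the result gives the ranking used for plots such as Figure 4(a); the first two columns give coordinates for a two-dimensional plot such as Figure 4(b).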

Multidimensional scaling within interviewers
It is well known that interviewers can have significant influences on response outcomes (Durrant & D'Arrigo, 2014; Durrant, Groves, Staetsky, & Steele, 2010; Hox & De Leeuw, 2002; Pickery, Loosveldt, & Carton, 2001; Vassallo, Durrant, Smith, & Goldstein, 2015). Of particular interest is an analysis of calling behaviour per interviewer, which provides an easy and intuitive tool for investigating interviewer performance and adherence to interviewing protocols and guidelines. This is of relevance from both a methodological perspective (analysis of sequences within subgroups) and a substantive perspective, since interviewers play a crucial role in scheduling calls, making contact and establishing cooperation with sample members (Durrant & D'Arrigo, 2014; Durrant et al., 2010). Figure 5 shows multidimensional scaling plots for two selected interviewers. The axes are defined as previously in Figure 4. The call sequences are colour-coded to indicate short and long call sequences (defined as up to six calls and more than six calls respectively, reflecting the interviewer guidelines) and successful and unsuccessful sequences (defined as at least one interview in the household versus no interviews). The plots reveal a clear distinction between short and long, and successful and unsuccessful, calling sequences, as well as significant differences between interviewers. For example, interviewer A made mostly long call sequences (57%, in fact) and experienced a relatively high proportion of unsuccessful sequences (63%), with the plot also indicating a cluster of short and medium-to-long successful sequences. Interviewer B made mostly short call sequences (88%) with a relatively high proportion of successful sequences (62%). Hence, such sequence plots allow the visualisation and comparison of calling patterns between interviewers.
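The short/long and successful/unsuccessful labels used in Figure 5 are straightforward to compute per interviewer. A sketch using the cut-offs stated in the text (six calls; at least one interview in the household), with hypothetical single-letter outcome codes:

```python
def classify(seq, max_short=6):
    """Label one household's call sequence by length and outcome:
    <= max_short calls is 'short'; any interview ('I') makes it
    'successful'. Outcome codes are illustrative."""
    length = "short" if len(seq) <= max_short else "long"
    outcome = "successful" if "I" in seq else "unsuccessful"
    return length, outcome

def interviewer_summary(sequences, max_short=6):
    """Proportions of long and of successful sequences for one interviewer."""
    labels = [classify(s, max_short) for s in sequences]
    n = len(labels)
    return {"prop_long": sum(l == "long" for l, _ in labels) / n,
            "prop_successful": sum(o == "successful" for _, o in labels) / n}

# Hypothetical workload for one interviewer (N = non-contact, C = contact,
# A = appointment, R = refusal, I = interview)
print(interviewer_summary(["CAI", "NNNNNNNR", "NCI"]))
```

Comparing such summaries across interviewers reproduces the kind of contrast described for interviewers A and B, although, as noted below, differences may reflect area and case-mix effects rather than interviewer performance.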
However, it should be noted that interviewer effects in face-to-face surveys can be confounded with area effects (Campanelli & O'Muircheartaigh, 1999; Durrant et al., 2010), and differences in interviewer performance found here may not represent a causal effect. Hence, the plots do not allow a direct evaluation of interviewer performance, since differences may also reflect the difficulty of the cases to be interviewed or the type of area the interviewer works in. Sequence analysis, like all descriptive tools, should therefore be followed up by further investigation of the underlying reasons for the calling behaviour observed. Nevertheless, the findings may point towards tailored treatments, such as incentives, or towards more specialised training for interviewers in certain areas.

Conclusions and implications for survey practice
The paper presents sequence analysis as a tool for investigating call record data to better understand and improve survey processes. It offers a potentially powerful tool for displaying and summarising call record data, which can be large and complex, and for detecting outliers, unusual cases and subgroups. The method proposed can be used for both face-to-face and telephone surveys, and for both cross-sectional and longitudinal surveys. A novel approach is to apply sequence analysis within interviewers, allowing the identification of unusual interviewer behaviour and an initial analysis of interviewer performance. This contributes to the growing body of literature on interviewer performance and evaluation, e.g. Pickery et al. (2001); Durrant et al. (2010); Durrant and D'Arrigo (2014); West and Groves (2013).
Basic sequence plots show the distribution of sequences across households. Transition rate matrices indicate the likelihood of a particular call outcome given the previous outcome. Combining sequence analysis with cluster analysis and multidimensional scaling allows the grouping of sequences with similar features. Multidimensional scaling defines a rank order based on a distance matrix, and sequences can be plotted in one or two dimensions, which helps to reveal the key features of the sequences that drive that ordering. Sequence analysis, in particular the multidimensional scaling plots, may identify groups of sequences, outliers and unusual or unexpected calling behaviour, which may require further investigation by fieldwork managers. Although some standard statistical software packages can nowadays implement the method, the routines are still mainly limited to relatively small numbers of short sequences. Here, we overcame this problem by using the WeightedCluster and Vegan libraries in R.
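The tabulation behind a basic state-distribution plot of call sequences (the proportion of each call outcome at each call position) can be sketched as follows; the outcome codes and example sequences are hypothetical:

```python
from collections import Counter

def state_distribution(sequences):
    """Proportion of each call outcome at each call position, i.e. the
    tabulation behind a state-distribution plot of call sequences.
    Sequences shorter than a given position are simply excluded there."""
    max_len = max(len(s) for s in sequences)
    out = []
    for t in range(max_len):
        counts = Counter(s[t] for s in sequences if len(s) > t)
        total = sum(counts.values())
        out.append({state: n / total for state, n in counts.items()})
    return out

# Hypothetical codes: N = non-contact, C = contact, I = interview, R = refusal
seqs = ["NNI", "NI", "CI", "NNR"]
print(state_distribution(seqs)[0])  # outcome shares at the first call
```

Plotting these proportions call by call gives a quick overview of how outcomes shift over the course of fieldwork, complementing the household-level sequence plots.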
We first list the specific findings from the application of sequence analysis to the Understanding Society data. Although the findings are not the main focus of this paper, they support the general implications below: (1) Despite clear guidance on the minimum number of calls per address, a number of households are identified that received significantly fewer than six calls, even though they were coded neither as a response, a refusal nor as ineligible (a total of 8% of households). Some of those call sequences consisted of non-contact calls throughout. For future rounds, we recommend monitoring such cases, establishing the reasons and, if appropriate, providing further guidance to interviewers to avoid such occurrences. In a small number of cases further calls were made although a case was already coded as completed. Careful consideration and guidance of interviewer work in such circumstances may help to avoid unnecessary calls in the future, improving survey efficiency and reducing costs. (2) The results from the transition rate matrix indicate that a non-contact is likely to be followed by another non-contact. A contact call is likely to lead to a non-contact or another non-productive call. Hence, findings from transition rate matrices may help to predict response outcomes and might help in designing tailored treatments, for example regarding decisions to stop calling. (3) The intuitive notion that sequences are characterised by length and outcome is supported by the substantive findings from both the cluster and the multidimensional scaling analyses. The multidimensional scaling plots showed a clear distinction between short and long, and successful and unsuccessful, calling sequences. (4) The sequence analysis reveals the significance of non-contact calls. In fact, in this dataset a large proportion of all calls are non-contact calls (56%).
If the aim is to increase efficiency and to reduce the number of unproductive calls, it seems advisable to investigate methods for reducing the large number of non-contact calls, for example by targeting time slots with a high likelihood of establishing contact and providing such guidance to interviewers.
(For examples of such techniques, see Durrant et al., 2011; Kulka & Weeks, 1988; Weeks et al., 1980, 1987.) Following on from the specific findings, the more general implications of the method for survey practice are wide ranging: (1) Sequence analysis may be used to inform future (routine) monitoring, to identify unusual calling behaviours and outliers, and to assess adherence to interviewing guidelines, either during data collection or retrospectively once data collection has finished. In a further step, the methods may inform the design of automated flag systems that indicate unusual calling behaviours, inform the choice of summary measures to use in routine monitoring, or derive summary statistics and indicators for future use. Although often used, simple summary measures on their own are generally not sufficient to analyse sequences as a whole.
(2) More specifically, sequence analysis of call record data is another tool for identifying problems or editing errors in the dataset. For this dataset, in fact, sequence analysis identified a number of editing and coding errors that were subsequently corrected by the survey agency in this and future waves. These errors had not initially been picked up by standard summary measures. (3) Related to (1) and (2), the findings from sequence analysis may be used to inform responsive and adaptive survey designs, which represent data-driven tailoring of data collection procedures to different sample members. In a responsive design the survey is modified after the initial design and after the survey has been fielded, with modifications based on the early phases of the fieldwork. If a survey has been conducted a number of times previously, it may be possible to determine different optimal designs for different subgroups on the basis of this past experience (adaptive survey design). (For a range of applications of such designs, including to call record data, see Kreuter, 2013; Schouten, Peytchev, & Wagner, 2017; Luiten, 2013; Tourangeau, Brick, Lohr, & Li, 2017.) (4) Sequence analysis informs further (more sophisticated) modelling going beyond simple summary measures. As a demonstration, Durrant et al. (2015, 2017) model, for the first time, sequence length and sequence outcome jointly, informed by the outcome of the sequence analysis that both indicators are simultaneously important for describing survey processes. As a result, they were able to improve on standard response modelling using a multinomial model. More specifically, having identified sequence length and outcome as (joint) key features of the calling patterns, it is of interest to investigate the correlates and determinants of both. Certain call outcomes early in the sequence are predictive of later call outcomes and sequence length. Durrant et al. (2015, 2017) identify cases with a high likelihood of long unsuccessful calls early in the data collection process to inform more efficient calling strategies.
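As an illustration of the automated flag systems mentioned in implication (1), two of the rules suggested by the specific findings above (a minimum of six calls before a case is abandoned without a final outcome; no further calls after an interview is completed) might be sketched as follows. The outcome codes and rules are hypothetical; real flags would need to reflect the survey's actual coding scheme and protocols:

```python
FINAL_OUTCOMES = {"I", "R", "E"}  # interview, refusal, ineligible (illustrative)

def flag_sequence(seq, min_calls=6):
    """Return review flags for one call sequence: abandoned before the
    minimum number of calls without any final outcome, or further calls
    made after an interview was already achieved."""
    flags = []
    if len(seq) < min_calls and not set(seq) & FINAL_OUTCOMES:
        flags.append("stopped_early")
    if "I" in seq and seq.index("I") < len(seq) - 1:
        flags.append("calls_after_interview")
    return flags

print(flag_sequence("NNN"))  # -> ['stopped_early']
print(flag_sequence("NIC"))  # -> ['calls_after_interview']
```

Note that calls after an interview may be entirely legitimate, for example in multi-person households, so such flags mark cases for review rather than establishing a violation.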
Sequence analysis also has limitations. Although it does not depend on explicit distributional assumptions and does not require modelling techniques, a range of choices have to be made, regarding, for example, the metric, the costs, the number of clusters or even the cluster analysis technique itself. Here, a sensitivity analysis was carried out exploring different settings of the algorithms. There is also the potential problem of overplotting when many sequences are displayed in one plot, which may lead to misrepresentation of the data if plotting is not carried out carefully. There is a technical limit to how thinly each sequence line can be displayed and to what is visible to the human eye (see also Fasang & Liao, 2014). This may be overcome by plotting only a subgroup of sequences or only unusual cases. A number of suggestions have been made, for example using relative frequency sequence plots (Fasang & Liao, 2014), and some of those techniques have been explored here. Software capabilities represent another limitation for sequence analysis. The standard routines in R cannot conduct optimal matching or multidimensional scaling on a large number of relatively long sequences, as the limit of a vector in R is 2^31 - 1 elements. Consequently, optimal matching can be applied in R to a dataset with up to around 35,000 sequences (depending on the length of the sequences). Other statistical software packages, such as STATA and CHESA, suffer from the same limitation. To overcome this problem we used the WeightedCluster (Studer, 2013) and Vegan (Oksanen, 2013) libraries in R. A function is also available in the WeightedCluster library for performing cluster analysis around medoids (the PAM algorithm), which is computationally efficient and is therefore appropriate for analysing very large datasets (Studer, 2013).
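The analysis here used the PAM routine in the WeightedCluster R library. The underlying k-medoids idea (clusters represented by actual sequences, so the method needs only the precomputed distance matrix) can be sketched in Python; this is a simplified alternating version for illustration, not the full PAM swap algorithm, and the toy distance matrix is hypothetical:

```python
import random

def k_medoids(dist, k, n_iter=100, seed=0):
    """Simplified alternating k-medoids on a precomputed distance matrix:
    assign each item to its nearest medoid, then recompute each medoid as
    the member minimising total distance within its cluster."""
    rng = random.Random(seed)
    n = len(dist)
    medoids = rng.sample(range(n), k)
    for _ in range(n_iter):
        assign = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
        new_medoids = []
        for m in medoids:
            members = [i for i, c in enumerate(assign) if c == m]
            new_medoids.append(min(members,
                                   key=lambda c: sum(dist[c][j] for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]

# Toy distance matrix with two obvious groups: items {0, 1} and {2, 3}
dist = [[0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 1],
        [10, 10, 1, 0]]
print(k_medoids(dist, 2))
```

Because each cluster is represented by an observed sequence (the medoid) rather than an abstract average, the output remains directly interpretable as a typical calling pattern.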
To conclude, sequence analysis is a useful technique for summarising and visualising rich call record data. It can be used as a technique for monitoring fieldwork in survey practice as well as for summarising complex call record data. These summaries can then be employed for further analysis of different aspects of survey research, e.g. to inform modelling for the prediction of call outcomes (see Durrant et al., 2015, 2017).