The Political Parties Crosswalk for mapping party codes in cross-national surveys to Party Facts IDs

Abstract
The Political Parties Crosswalk (PPC) maps party codes used in questions about party preferences in European cross-national public opinion surveys to Party Facts IDs, which are commonly used identifiers of parties in political science datasets. The PPC, a data linkage tool, supports research that combines data on party support from surveys with characteristics of parties, and in particular, facilitates research that combines data from different survey projects. PPC v.1 covers surveys conducted in Europe in the following projects: European Social Survey, European Values Study, World Values Survey, Asia Europe Survey, Consolidation and Democracy in Central and Eastern Europe, Integrated and United, Life in Transition Surveys, New Baltic Barometer, New Europe Barometer, and selected waves from the Candidate Countries Eurobarometer, Eurobarometer, and the International Social Survey Programme. In addition to describing the scope and properties of PPC, as well as the steps of data processing and quality assurance, we present case studies that illustrate possible applications in substantive and methodological research.


Introduction
With the growing number of datasets available for secondary analysis, standards for coding certain information become increasingly important, as they facilitate combining data from different sources and thus create new opportunities for research. Among datasets with characteristics of political parties, the Party Facts (PF) dataset (Bederke, Döring, and Regel 2020; Döring and Regel 2019) provides such a standard, and PF IDs are now commonly used in political science datasets such as ParlGov (Döring and Manow 2021) and V-Party (Lührmann et al. 2020). So far, however, the PF ID has not been adopted as a coding scheme by cross-national survey projects, which frequently include questions about the party the respondent voted for, would vote for, or feels close to.
Political scientists are often interested in analyzing the characteristics of voters together with the characteristics of their preferred parties, but linking party-level data with survey data has so far required substantial effort to create the necessary recodes.

Scope
PPC provides mapping of party codes for data from 53 waves of 12 cross-national survey projects conducted in 41 European countries (as defined by the United Nations Statistics Division 2021) between 1981 and 2019. The coverage of countries is quite unequal, ranging from a single survey in Andorra and two surveys in Kosovo to 37 surveys in Hungary, 38 in Slovenia, and 39 in Poland. The survey data include information about the party preferences and/or voting behaviour of over 1 million respondents and over 1200 political parties or coalitions. The projects and their basic characteristics are listed in Table 1, while full information about dataset versions, with references, is provided in the Appendix and in the documentation accompanying the PPC data.
A few other initiatives have also mapped survey party codes to PF IDs. Party Facts itself provides crosswalks to numerous other datasets, including surveys: ESS, WVS, Afrobarometer 2016, Latinobarometer, and the European Elections Survey 2014 (https://partyfacts.herokuapp.com/data/). As part of its documentation, the project 'Old and new boundaries: National Identities and Religion' created a mapping of party codes for selected countries in ESS, EVS, and ISSP (Bechert et al. 2020). Sophie E. Hill has also created a crosswalk for ESS, which is available online (Hill 2020). PPC contributes crosswalks for additional European survey projects, while also proposing procedures for documenting data processing and quality assurance, as well as a crosswalk format that facilitates verification and reuse. As we note later, the existing crosswalks were used as part of the verification strategy for PPC.

Outline of data processing
To create PPC, data processing included the following steps:

Screening survey data and identification of source variables
The scope of the survey data was defined as cross-national survey projects based on samples of the general adult population in European countries that included questions about trust in state institutions. Altogether, we screened 83 data files, which are listed in the PPC documentation. In these datasets, we identified 1106 candidate variables, including questions about the party the respondent voted for in the last election, would vote for, or feels close to. Questions about presidential elections were excluded, because questions about parties dominate in the analyzed surveys. Some of these 1106 candidate variables were ultimately discarded, for example, when they were country-specific variables whose equivalents were also available in a single combined variable, or when they lacked value labels both in the data and in the survey documentation. The final selection includes 974 variables from 41 data files.

Mapping template
The processing of survey metadata, i.e. variable names and labels as well as values and value labels, was performed in R (R Core Team 2020). To create mapping templates, we extracted value labels from SPSS data files with the packages haven (Wickham and Miller 2019) and labelled (Larmarange 2019). We created a separate mapping table for each national survey (project-wave-country). This is particularly important when, in a multi-country dataset, the same codes of one variable have different meanings in different countries, as is the case, for example, in ISSP 1991. Further, separate mappings per survey make it possible to include in the crosswalk the frequencies of responses, which are often useful for identifying parties when value labels are insufficient. The combined mapping templates were exported to a spreadsheet programme. Where value labels with explicit party names were not available in the data files, labels from the printed documentation (codebooks, questionnaires) were added to the spreadsheet manually.
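As a minimal sketch of such a per-survey template, the Python function below counts responses for each value of a party variable separately within each national survey and attaches the survey-specific value label. It is not the authors' R code (which used haven and labelled); it assumes the data have already been read into a list of respondent dictionaries, and the field names, the `build_mapping_template` function, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical key identifying one national survey: (project, wave, country)
SurveyKey = Tuple[str, str, str]


def build_mapping_template(
    records: List[dict],
    variable: str,
    value_labels: Dict[SurveyKey, Dict[int, str]],
) -> List[dict]:
    """Return one row per (national survey, value) with its label and response count.

    Labels are looked up per national survey, because in a multi-country file the
    same code may stand for different parties in different countries.
    """
    counts: Counter = Counter()
    for rec in records:
        value = rec.get(variable)
        if value is not None:  # skip respondents with no value on this variable
            counts[((rec["project"], rec["wave"], rec["country"]), value)] += 1

    rows = []
    for (survey, value), n in sorted(counts.items()):
        project, wave, country = survey
        rows.append({
            "project": project, "wave": wave, "country": country,
            "variable": variable, "value": value,
            # An empty label marks a code to be labelled from printed documentation
            "label": value_labels.get(survey, {}).get(value, ""),
            "n": n,
            "pf_id": None,  # filled in later, during mapping
        })
    return rows


# Toy example: code 1 means a different party in each (hypothetical) country
records = [
    {"project": "SurveyX", "wave": "1", "country": "AA", "party": 1},
    {"project": "SurveyX", "wave": "1", "country": "AA", "party": 1},
    {"project": "SurveyX", "wave": "1", "country": "BB", "party": 1},
    {"project": "SurveyX", "wave": "1", "country": "BB", "party": 2},
]
labels = {
    ("SurveyX", "1", "AA"): {1: "Party A"},
    ("SurveyX", "1", "BB"): {1: "Party B"},
}
template = build_mapping_template(records, "party", labels)
```

Keeping the response count `n` in each row is what later lets a coder prioritize unlabelled or ambiguous codes chosen by many respondents.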

Mapping
The coding was performed manually with a custom-made mapping utility: a Python script based on the PyInquirer package, 'a Python module for collection of common interactive command line user interfaces, based on Inquirer.js' (Oyetoke 2020). The script reads survey metadata from the mapping templates together with the Party Facts data, and displays them side by side in tree-like lists that facilitate matching corresponding items. The coder selects parties from a given country in the mapping table and matches them with parties from Party Facts. The work proceeds country by country. Matched parties are removed from the list, as are countries in which all parties have been matched. The resulting matches are saved to a CSV file. The organization and display of the data, together with the automated removal of matched parties, substantially improve the efficiency of the coding.
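The bookkeeping performed by such a utility (matched labels leave the pending list, a country disappears once all its labels are matched, matches are written to CSV) can be sketched without the interactive interface. The `MappingSession` class below is a hypothetical illustration, not the authors' script: it replaces the PyInquirer prompts with explicit method calls, and the labels and PF ID in the example are made up.

```python
import csv
import io
from typing import Dict, List


class MappingSession:
    """Bookkeeping for manual matching of survey labels to Party Facts IDs (hypothetical)."""

    def __init__(self, pending: Dict[str, List[str]]):
        # country -> survey value labels not yet matched; empty countries are dropped
        self.pending = {c: list(labels) for c, labels in pending.items() if labels}
        self.matches: List[dict] = []

    def countries(self) -> List[str]:
        """Countries that still have unmatched labels, sorted alphabetically."""
        return sorted(self.pending)

    def match(self, country: str, label: str, pf_id: int) -> None:
        """Record one match and remove the label (and, if finished, the country)."""
        # KeyError if the country is already finished; ValueError if the label is not pending
        self.pending[country].remove(label)
        self.matches.append({"country": country, "label": label, "pf_id": pf_id})
        if not self.pending[country]:
            del self.pending[country]

    def to_csv(self, fh) -> None:
        """Write all recorded matches as CSV to an open text file handle."""
        writer = csv.DictWriter(fh, fieldnames=["country", "label", "pf_id"])
        writer.writeheader()
        writer.writerows(self.matches)


session = MappingSession({"AA": ["Party A", "Party B"], "BB": ["Party C"]})
session.match("BB", "Party C", 101)  # hypothetical PF ID
buffer = io.StringIO()
session.to_csv(buffer)
```

Removing finished items from the pending structure, rather than filtering at display time, keeps the list a coder scrolls through as short as possible.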

Verification
The multi-step verification included basic checks on the complete crosswalk, such as screening all survey value labels that were assigned the same PF ID, and investigating cases where a label could not be matched with a PF ID, especially for parties that received many survey responses (see Note 1). We also screened the data for large jumps in support for major political parties; however, such jumps sometimes result from differences in question type (e.g. the party the respondent voted for versus the party the respondent would vote for). We return to this issue in the case studies section. Next, we merged PPC with party-level data from the V-Party dataset and ParlGov's Election dataset to verify the matching against existing high-quality party-level datasets.
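The first two crosswalk-level checks, plus a crude screen for jumps in support, might be sketched as below. This is an illustration under assumptions, not the procedure actually used: the row fields (`survey`, `label`, `pf_id`, `n`) are hypothetical, and the 5% share and 15-percentage-point jump thresholds are arbitrary placeholders. Labels sharing a PF ID are not errors in themselves (a full name and an abbreviation can legitimately map to the same party); the functions only produce lists for manual review.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def labels_sharing_pf_id(rows: List[dict]) -> Dict[int, List[str]]:
    """PF IDs assigned to more than one distinct survey label, with those labels."""
    by_id = defaultdict(set)
    for row in rows:
        if row["pf_id"] is not None:
            by_id[row["pf_id"]].add(row["label"])
    return {pf: sorted(labels) for pf, labels in by_id.items() if len(labels) > 1}


def unmatched_large_parties(rows: List[dict], min_share: float = 0.05) -> List[dict]:
    """Rows without a PF ID chosen by at least `min_share` of respondents in their survey."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["survey"]] += row["n"]
    return [
        r for r in rows
        if r["pf_id"] is None
        and totals[r["survey"]] > 0
        and r["n"] / totals[r["survey"]] >= min_share
    ]


def large_jumps(shares: List[Tuple[int, float]], threshold: float = 0.15):
    """Consecutive (year, share) observations of one party differing by more than threshold."""
    ordered = sorted(shares)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if abs(b[1] - a[1]) > threshold]


rows = [
    {"survey": "S1", "label": "Party A", "pf_id": 1, "n": 50},
    {"survey": "S1", "label": "PA", "pf_id": 1, "n": 10},
    {"survey": "S1", "label": "Party X", "pf_id": None, "n": 40},
]
```

A flagged jump still has to be interpreted against the question type, since 'voted for' and 'would vote for' items can legitimately yield different levels of support.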
Additional checks were performed on the recoded survey data. For example, since supporters of electoral winners are expected to trust parliament more than supporters of electoral losers (e.g. Martini and Quaranta 2019), we checked this pattern in every survey and scrutinized the few instances where it did not hold. Finally, we compared our coding for ESS and WVS with the mapping tables provided by the Party Facts project and found almost perfect agreement in the party codes.
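A per-survey version of the winner-loser check can be sketched as follows. It assumes, hypothetically, that each respondent record already carries a `status` of 'winner', 'loser' or something else (derived from merged party-level data) and a numeric `trust` score for parliament; comparing group means is one simple choice, not necessarily the authors' exact criterion.

```python
from statistics import mean
from typing import Dict, List


def winner_loser_violations(respondents: List[dict]) -> List[str]:
    """Surveys where mean trust among winners does not exceed mean trust among losers."""
    groups: Dict[str, Dict[str, List[float]]] = {}
    for r in respondents:
        if r["status"] in ("winner", "loser") and r["trust"] is not None:
            by_status = groups.setdefault(r["survey"], {"winner": [], "loser": []})
            by_status[r["status"]].append(r["trust"])
    flagged = []
    for survey, g in sorted(groups.items()):
        # Surveys lacking either group cannot be checked and are skipped
        if g["winner"] and g["loser"] and mean(g["winner"]) <= mean(g["loser"]):
            flagged.append(survey)
    return flagged


respondents = [
    {"survey": "S1", "status": "winner", "trust": 3.0},
    {"survey": "S1", "status": "loser", "trust": 2.0},
    {"survey": "S1", "status": "non-voter", "trust": 1.0},
    {"survey": "S2", "status": "winner", "trust": 1.0},
    {"survey": "S2", "status": "winner", "trust": 2.0},
    {"survey": "S2", "status": "loser", "trust": 2.0},
]
flagged = winner_loser_violations(respondents)
```

A flagged survey is a prompt for scrutiny, not proof of a mapping error; the expected gap can genuinely reverse in particular political contexts.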

Limitations, challenges, and opportunities
We briefly mention the main challenges encountered during the mapping. More details and examples are provided in the PPC documentation.

Identifying parties
While in the vast majority of cases it was possible to unambiguously identify the parties in survey value labels and match them to PF IDs, a handful of cases were challenging. Difficulties resulted from the brevity of labels that included only a shortened version of the party name or just its abbreviation (in the local language or in English), from the fact that PF only includes parties that received at least a few per cent of the vote in elections, and from party name changes and the formation of alliances or coalitions.

Coalitions and name changes
There is a tension between accurately reflecting the political scene at the time of the survey and continuity in representing support for political forces. This tension manifests itself in the coding of parties that change their names and in the coding of coalitions, especially those led by one large party accompanied by smaller coalition partners. The problem applies both to the labelling in the survey data and, perhaps to an even larger extent, to the external party-level datasets with which survey data would be merged.
In PPC, we assigned the PF IDs that best match the given survey label. To match survey data, via PPC, to external party-level datasets, the PF IDs assigned in PPC may have to be adjusted to the codes in the external dataset, which has its own rules for dealing with name changes, mergers, and coalitions. For example, the V-Party dataset, while predominantly relying on PF IDs, occasionally uses its own codes for alliances. Thus, even maximal internal consistency within PPC does not guarantee that merging with an external dataset will be seamless. It is also important to consider whether the survey question asks about past or potential voting. Response options to questions about past voting behaviour can reasonably be expected to reflect the parties that contested the election the question asks about. By contrast, response options to questions about potential vote choice reflect the situation at the time of the survey, which may differ from that in both the most recent and the following election. Ultimately, the coding in such cases should be determined by the intended use of the data.
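The two-step logic described here, recoding survey answers to PF IDs and then re-keying some of them before joining an external dataset, can be sketched in Python. The column names, the `adjustments` table keyed by (project, wave, country, PF ID), and all IDs below are hypothetical; PPC's actual file layout is described in its documentation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical crosswalk key: (project, wave, country, variable, value)
Key = Tuple[str, str, str, str, int]


def build_lookup(crosswalk_rows: List[dict]) -> Dict[Key, Optional[int]]:
    """Index crosswalk rows by survey code, refusing conflicting duplicate rows."""
    lookup: Dict[Key, Optional[int]] = {}
    for row in crosswalk_rows:
        key = (row["project"], row["wave"], row["country"], row["variable"], row["value"])
        if key in lookup and lookup[key] != row["pf_id"]:
            raise ValueError(f"conflicting mappings for {key}")
        lookup[key] = row["pf_id"]
    return lookup


def attach_party_data(respondents, variable, lookup, party_data, adjustments=None):
    """Recode answers to PF IDs, apply survey-specific adjustments, then merge party data."""
    adjustments = adjustments or {}
    merged = []
    for r in respondents:
        survey = (r["project"], r["wave"], r["country"])
        pf_id = lookup.get(survey + (variable, r[variable]))  # None: no crosswalk row
        # Optional override, e.g. an alliance coded differently in the external dataset
        party_id = adjustments.get(survey + (pf_id,), pf_id)
        merged.append({**r, "pf_id": pf_id, "party_id": party_id,
                       "party": party_data.get(party_id)})
    return merged


crosswalk = [
    {"project": "P", "wave": "1", "country": "AA", "variable": "vote", "value": 1, "pf_id": 10},
    {"project": "P", "wave": "1", "country": "AA", "variable": "vote", "value": 2, "pf_id": 20},
]
respondents = [{"project": "P", "wave": "1", "country": "AA", "vote": v} for v in (1, 2, 9)]
party_data = {10: {"lr": 3.0}, 99: {"lr": 7.0}}  # external dataset keyed by its own IDs
adjust = {("P", "1", "AA", 20): 99}  # hypothetical: PF ID 20 appears as 99 externally
out = attach_party_data(respondents, "vote", build_lookup(crosswalk), party_data, adjust)
```

Keeping the original `pf_id` next to the adjusted `party_id` makes every override visible, which helps when verifying the merge as the text recommends.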
To facilitate merging the data with the V-Party and ParlGov Election datasets, we provide the necessary adjustments, which match survey responses with the results of the most recent election held before the first day of survey fieldwork. These adjustments affect less than 4% of all responses. We caution data users to carefully verify the merging of survey data, via PPC, with party-level datasets.
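Selecting the most recent election held before the first day of fieldwork is a simple search over sorted election dates. The helper below is a hypothetical sketch using Python's standard `bisect` module; 'before' is taken strictly, so an election on the first day of fieldwork is not selected, which is an assumption about this edge case. The dates in the usage example are illustrative only.

```python
from bisect import bisect_left
from datetime import date
from typing import List, Optional


def last_election_before(fieldwork_start: date, election_dates: List[date]) -> Optional[date]:
    """Latest election held strictly before the first day of fieldwork, or None."""
    ordered = sorted(election_dates)
    # bisect_left returns the number of elections dated before fieldwork_start
    i = bisect_left(ordered, fieldwork_start)
    return ordered[i - 1] if i > 0 else None


elections = [date(2005, 6, 25), date(2009, 7, 5)]  # illustrative dates
reference = last_election_before(date(2010, 12, 1), elections)
```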

Special codes
We used standardized special codes for parties that could not be matched to Party Facts IDs and for response options referring to 'other parties'. Additionally, survey responses that did not refer to parties at all were classified into several categories, including casting a blank vote, supporting independent candidates, being ineligible to vote, and cases where the question was inapplicable, often because the preceding filter question about having voted or planning to vote was answered negatively. Identifying and consistently coding such 'missing value' categories makes it possible to distinguish between non-voters and item non-response, and to analyze these special groups across survey datasets. Details about these special codes are available in the documentation.
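To illustrate why consistent special codes matter, the sketch below collapses recoded responses into analytic groups. The `Special` categories and the grouping are hypothetical stand-ins for PPC's actual special codes, which are defined in its documentation; in particular, treating 'inapplicable' as non-voting assumes the filter question was answered negatively, which the text notes is often, but not always, the case.

```python
from enum import Enum
from typing import Union


class Special(Enum):
    """Hypothetical special categories; PPC's actual codes are in its documentation."""
    OTHER_PARTY = "other party"
    UNMATCHED_PARTY = "party without PF ID"
    BLANK_VOTE = "blank vote"
    INDEPENDENT = "independent candidate"
    NOT_ELIGIBLE = "not eligible to vote"
    INAPPLICABLE = "inapplicable (filtered out)"
    ITEM_NONRESPONSE = "don't know / refusal"


CHOICE_WITHOUT_PF_ID = {Special.OTHER_PARTY, Special.UNMATCHED_PARTY,
                        Special.BLANK_VOTE, Special.INDEPENDENT}
# Assumption: 'inapplicable' means the filter question was answered negatively
NON_VOTER = {Special.NOT_ELIGIBLE, Special.INAPPLICABLE}


def classify(code: Union[int, Special, None]) -> str:
    """Collapse a recoded response into broad analytic groups."""
    if isinstance(code, int):
        return "party"  # a Party Facts ID
    if code in CHOICE_WITHOUT_PF_ID:
        return "choice without PF ID"
    if code in NON_VOTER:
        return "non-voter"
    return "item non-response"  # Special.ITEM_NONRESPONSE or a missing value (None)
```

Without such a distinction, non-voters and respondents who refused to answer would both end up as generic missing values and could not be analyzed separately.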

Case studies
In this section, we present case studies that illustrate selected properties of party support as measured in cross-national surveys. They also allow us to verify the integrity of the party mapping and its suitability for longitudinal analyses of partisanship and for methodological research.

Following changes in party support
When examining the level of support for specific political parties based on survey variables, it is necessary to keep in mind the type of question the respondent answered. PPC provides mapping of responses for different types of variables: the party the respondent voted for in the last election, the party they would vote for if an election were held next Sunday or otherwise soon, the party the respondent feels close to, etc. Some of the differences in the level of support for the same parties, even in the same years, are related to these question types. For example, in the case of the Bulgarian party NDSV, responses to the 'would vote' and 'close to' questions between the 2005 and 2009 elections indicated very low support, which was reflected in the party's 3% vote share in the July 2009 election. In ESS Round 5 (December 2010 to March 2011), just over 1% of respondents declared having voted for NDSV and less than 1% declared feeling close to it. In ESS Round 6, a single respondent declared feeling close to the party.
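Because support levels depend on the question type, shares should be compared within a type rather than pooled across types. A minimal sketch, assuming hypothetical respondent records with a `qtype` field and a recoded `pf_id`; the denominator here is all respondents asked the question, and restricting it to valid party answers would be an equally defensible choice.

```python
from collections import Counter
from typing import Dict, List


def support_by_question_type(responses: List[dict], pf_id: int) -> Dict[str, float]:
    """Share of respondents naming `pf_id`, computed separately for each question type."""
    totals = Counter(r["qtype"] for r in responses)
    hits = Counter(r["qtype"] for r in responses if r["pf_id"] == pf_id)
    return {q: hits[q] / totals[q] for q in sorted(totals)}


responses = [
    {"qtype": "voted", "pf_id": 7},
    {"qtype": "voted", "pf_id": 8},
    {"qtype": "close_to", "pf_id": None},
    {"qtype": "close_to", "pf_id": 7},
    {"qtype": "close_to", "pf_id": 8},
    {"qtype": "close_to", "pf_id": 8},
]
shares = support_by_question_type(responses, pf_id=7)
```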

Poland's Democratic Left Alliance
This case study illustrates the challenges of examining changes in support for parties that often participated in elections as part of coalitions and alliances. We use the example of Poland's Democratic Left Alliance (Sojusz Lewicy Demokratycznej, SLD), whose support trajectory is presented in Figure 2. SLD was created in 1991 as an alliance led by the party Social Democracy of the Republic of Poland, the main successor of the communist Polish United Workers' Party. In 1999, SLD registered as a party under the same name, Democratic Left Alliance. For the 2001 election, SLD formed an electoral alliance with the Labour Union, and this alliance won the election with 41% of the vote. In the next election

Left-right placement of respondents and parties over time
The second case study illustrates over-time changes in the meaning of survey items measuring respondents' left-right self-placement in different countries. We applied PPC to combined data from European survey projects that included questions about individual left-right self-placement, and merged in party characteristics from the V-Party dataset according to the respondent's preferred political party. We examined the correlations between respondents' left-right self-placement and three party characteristics: position on the economic left-right scale, adherence to religious principles, and support for LGBT equality. Correlations for two selected countries, Poland and Slovenia, are presented in Figure 3. In this graph, each point corresponds to the correlation, in one survey, between respondents' left-right self-placement and one of the three party characteristics.
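The per-survey correlations behind a figure like this can be sketched in plain Python: attach the trait of each respondent's preferred party by PF ID, then compute a Pearson correlation within each survey. The field names, the `party_traits` dictionary, and the toy values are hypothetical; in practice V-Party scores vary by election, so the trait lookup would also have to be keyed by the relevant election.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Optional, Tuple


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation; None if undefined (fewer than 2 cases or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def correlations_by_survey(
    respondents: List[dict], party_traits: Dict[int, Dict[str, float]], trait: str
) -> Dict[str, Optional[float]]:
    """Correlate self-placement with one trait of the preferred party, within each survey."""
    pairs: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for r in respondents:
        value = party_traits.get(r["pf_id"], {}).get(trait)
        if value is None or r["lr_self"] is None:
            continue  # no matched party, no trait score, or no self-placement
        xs, ys = pairs[r["survey"]]
        xs.append(r["lr_self"])
        ys.append(value)
    return {s: pearson(xs, ys) for s, (xs, ys) in sorted(pairs.items())}


party_traits = {1: {"econ_lr": 0.0}, 2: {"econ_lr": 1.0}, 3: {"econ_lr": 2.0}}
respondents = [
    {"survey": "S1", "pf_id": 1, "lr_self": 0}, {"survey": "S1", "pf_id": 2, "lr_self": 5},
    {"survey": "S1", "pf_id": 3, "lr_self": 10}, {"survey": "S2", "pf_id": 3, "lr_self": 0},
    {"survey": "S2", "pf_id": 2, "lr_self": 5}, {"survey": "S2", "pf_id": 1, "lr_self": 10},
    {"survey": "S2", "pf_id": None, "lr_self": 3},
]
corrs = correlations_by_survey(respondents, party_traits, "econ_lr")
```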
According to these results, correlations between individual left-right self-placement and parties' economic left-right positions in Slovenia were positive throughout the studied period and increased between 1991 and 2019. A similar pattern holds for correlations with parties' religious character. This means that in Slovenia, people who identify as right-wing tend to support parties with economically right-wing orientations and parties that invoke God or religion. At the same time, left-wing self-identification more often coincides with support for parties that promote LGBT social equality.
In Poland, correlations between individual left-right self-placement and parties' religious character remained strongly positive throughout the thirty-year period under study. Meanwhile, correlations between left-right self-placement and parties' economic left-right character started in the positive range and became negative during the 2010s. In 2019, respondents who identified as right-wing were more likely to support economically left-wing parties. In Poland, the popular understanding of left and right has thus clearly lost much of its economic meaning over time. This example illustrates the limits to cross-national and over-time comparability of the meaning of left-right self-placement scales that have been raised by other researchers (Bauer et al. 2017; Wojcik, Cislak, and Schmidt 2021; Zuell and Scholz 2019).

Modelling winner-loser gaps in political trust
The combined survey data, with respondents' party preferences matched to parties' winner-loser status based on the V-Party dataset, enable the analysis of long time series of political support among electoral winners, losers, and non-voters. Winner-loser gaps reflect the degree to which losers consent to their loss, which is a stronger indicator of the legitimacy of elections and of democracy than the satisfaction of winners (Anderson et al. 2005). Analysing support among non-voters completes the picture with information about the possible political alienation of this group. Figure 4 presents estimated levels of political trust, based on items on trust in the national parliament, political parties, and the justice system, among electoral winners, losers, and non-voters in surveys from Hungary and Poland between the early 1990s and 2019. The estimates were obtained from Bayesian item response theory models, which allowed for different numbers of response-scale points for the trust items across projects (Bürkner 2017; Kołczyńska et al. 2020). This approach enables model-based harmonization of response scales without resorting to simple linear rescaling, which is commonly used but has been found to be problematic (Valgarðsson and Devine 2021). With the probit link function, the resulting estimates are interpretable as standardized z-values.
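The measurement idea can be illustrated with a standard cumulative (ordered) probit item model, in which one latent trust score theta on the standard-normal (z-value) scale generates responses to items with any number of categories: P(y = k | theta) = Phi(alpha * (tau_k - theta)) - Phi(alpha * (tau_{k-1} - theta)), with tau_0 = -inf and tau_K = +inf. Each item, or each project's version of an item, has its own thresholds tau, so a 4-point and an 11-point scale can inform the same theta without linear rescaling. The sketch below does not fit anything and does not reproduce the cited models' priors or hierarchical structure; the threshold values are hypothetical.

```python
from math import erf, inf, sqrt
from typing import List


def std_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF (math.erf handles +/-inf)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def category_probs(theta: float, thresholds: List[float], disc: float = 1.0) -> List[float]:
    """P(y = k | theta), k = 1..K, under a cumulative (ordered) probit item model.

    thresholds: increasing cut points tau_1 < ... < tau_{K-1} for one item/scale.
    disc: item discrimination alpha (> 0); 1.0 gives the plain ordered probit.
    """
    cuts = [-inf] + list(thresholds) + [inf]
    return [
        std_normal_cdf(disc * (upper - theta)) - std_normal_cdf(disc * (lower - theta))
        for lower, upper in zip(cuts, cuts[1:])
    ]


# Hypothetical thresholds for a 4-point and an 11-point trust item
four_point = [-1.0, 0.0, 1.0]
eleven_point = [-2.0 + 0.4 * i for i in range(10)]
p4 = category_probs(0.5, four_point)
p11 = category_probs(0.5, eleven_point)
```

Raising theta shifts probability mass toward the higher categories of every item at once, which is what allows a single z-scaled estimate to summarize trust items with different response formats.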
As could be expected, at all times political trust among winners is higher than among losers and non-voters. In Hungary, the winner-loser gap remained roughly stable during the first two decades post-transition and increased sharply following the 2010 election won by Fidesz. Political trust among supporters of the ruling party after 2010 exceeds 0, which means that positive responses to the trust items are more frequent than negative ones. At the same time, political trust among losers and non-voters is clearly negative.
In Poland, the situation is similar: the winner-loser gap was stable and below 0.5 units on the probit scale until around 2010, and then increased to approach 1 unit in 2019. The year 2010 marks the escalation of the conflict between supporters of the then ruling Civic Platform and the main opposition party, Law and Justice, following the plane crash in Smolensk, Russia, which killed 96 passengers and crew, including President Lech Kaczyński and high-ranking state and military officials, on their way to commemorate the victims of the Soviet Katyń massacre during the Second World War (Etkind et al. 2012). Until 2015, the winner-loser gap increased primarily because of declining political trust among losers. In 2015, Law and Justice won the elections, and their supporters, the new winners, saw a sharp increase in political trust, resulting in the largest winner-loser trust gap since 1990.
In both countries, the sharp increase in the winner-loser gap in political trust coincides with increases in political and societal polarization scores according to the V-Dem data (Coppedge et al. 2021). Whether there is a systematic link between the winner-loser gap in political support and other aspects of social and political polarization remains a question for future research.

Concluding remarks
PPC provides mapping tables for variables recording party preferences in 12 cross-national survey projects carried out in Europe until 2019. The crosswalk format is preferable to the more common approach of providing recode scripts, because it is both human- and machine-friendly and software-agnostic. The crosswalk format makes it easy to filter and subset the data, e.g. in a spreadsheet programme, and thus facilitates verification of the mapping. It also makes crosswalks straightforward to extend by simply adding rows that represent new variables from added datasets. Standardized party codes can be applied by using the crosswalk directly to transform the data, e.g. by merging the crosswalk into the survey data, or as a basis for creating recode scripts in the chosen programming language. In this way, PPC proposes a standard for the collaborative development of resources for cross-national social science research with survey data, including the harmonization of existing surveys and their integration with other data sources. We acknowledge that creating such crosswalks, like any form of ex-post survey data harmonization, is time-consuming and error-prone, and always a second-best alternative to ex-ante harmonization. We therefore encourage survey data producers to consider using Party Facts IDs to record responses to party-related survey questions, or to provide crosswalks from survey codes to Party Facts IDs in new datasets.

Note
1. Not all parties are present in the Party Facts data, because the criteria for inclusion require at least 5% of the vote in elections, applied with some degree of arbitrariness (cf. https://partyfacts.herokuapp.com/documentation/codebook/). Parties that received just a few responses and for which we could not find a corresponding Party Facts ID were thus not as suspicious as unmatched parties chosen by sizeable shares of respondents.

Disclosure statement
No potential conflict of interest was reported by the author(s).