Bias in the eye of the beholder? 25 years of election monitoring in Europe

ABSTRACT Building on an original corpus of OSCE monitoring reports, the article analyses a quarter of a century of election monitoring in Europe and assesses the congruence of OSCE written assessments with expert views. We show that, overall, the OSCE monitoring reports are highly correlated and congruent with expert assessments. More importantly, the level of congruence between the two increases with time. However, we also identify various forms of bias rooted in strategic interests and institutional preconditions. Mainly, we show that the OSCE has a strong positive bias towards Russia and its allies in its election assessments, indicating defensive and lenient stances. We theorize this mechanism as a pushback effect and show that although Russia's effort to cripple the activities of the OSCE in the past two decades was not successful, the OSCE was effectively forced into a defensive position, producing less critical assessments than reality warrants.


Introduction
In the past three decades, election monitoring has become such an important factor in regimes' credibility that even authoritarian elites have started to feel obliged to invite international observers, mimicking the effort of fulfilling their democratic commitments. 1 In an environment where media, governments, and international organizations listen carefully to what election monitors have to say, the official monitoring reports have increasingly affected countries' international standing, leading to various political as well as economic ramifications. With this much influence, international monitors have started to be dragged into thorny political entanglements, often accompanied by accusations of political bias questioning the overall integrity of the monitoring missions and their goals. 2 How is this reflected in the monitoring practices in Europe? What kind of bias (if any) prevails in a region with a comparatively rich history of election monitoring, varying democratic qualities, and a (sometimes) unjustified superiority complex?
The goal of the article is to assess 25 years of election monitoring in Europe as conducted by the Organization for Security and Co-operation in Europe (OSCE) and its Office for Democratic Institutions and Human Rights (ODIHR). The aim is to explore whether reports produced by the OSCE contain any sort of bias and whether this bias is systematically present in the evaluation of elections in certain contexts. As such, the article intellectually builds on the seminal work of Judith Kelley, 3 yet goes beyond the original period covered, analysing full-fledged final reports and introducing new methods into the study of election monitoring. To this end, we use the wordscores scaling algorithm with guided bootstrap sampling to analyse the positions of 303 monitoring reports, comprising over 8,700 pages of raw text, on a latent scale of free and fair elections and to explore how they fare against internationally accepted standards. Moreover, the focus on the OSCE/ODIHR explores the relevance of election bias in a context where election monitoring has a long tradition and has gone through a well-documented development. As such, the history of OSCE/ODIHR election monitoring tells the story of election monitoring in post-Cold War Europe and the dynamics that accompany it.
When it comes to the existing literature, scholars have identified a number of factors potentially driving the biased assessments of international monitors, ranging from political to economic and strategic motivations. 4 In almost all of these settings, the observing authority is presented as the one with the upper hand, pursuing its political, economic, or strategic goals. While evaluating the relevance of these assumptions in the European context, the article explores the existence of a specific type of reverse mechanism, under which international observers (the OSCE) with high credibility are systematically pushed by the party being monitored into a more submissive position, producing potentially favourable reports. The article conceptualizes this mechanism as a pushback effect and shows how it works in the context of Russian pressure on OSCE/ODIHR monitoring activities.
The overall results show that the OSCE monitoring reports are highly correlated and congruent with independent expert views, which validates the methodologies employed and their common reference to the universally recognized standards of free and fair elections. More importantly, the level of congruence between the two increases with time. However, we also identify various forms of bias rooted in strategic interests and institutional preconditions. We find that higher GDP, a higher GDP proportion of total natural resources rents, official development assistance (ODA), and legislative elections are associated with more positive assessments. On the other hand, the size of the observation mission is associated with a more negative assessment. More importantly, we show that the OSCE has a strong positive bias towards Russia and its allies, indicating persistent defensive and lenient stances. We theorize that although Russia's effort to cripple the election monitoring activities of the OSCE/ODIHR in the past two decades was not successful, the OSCE was effectively pushed into a defensive position, producing less critical assessments of some of the post-Soviet countries than reality warrants. This pushback effect presents an additional perspective on election monitoring, its biases, and the underlying drivers explaining them. As such, our article contributes to the literature on election monitoring and election observer bias, as well as power relations in the European context.

International election monitoring and its contested bias
As election monitoring has started to play a prominent role in the international acceptance of all sorts of regimes, scholars as well as practitioners have raised important questions concerning its impact and credibility. 5 Often, tensions have been highlighted between the proclaimed aim to improve elections through reliable and accurate assessments and the realities of balancing this goal with other objectives. 6 Specifically, the concern has been voiced that election assessments are more positive or negative than reality merits in order to serve certain (geo)political, security, and economic goals. 7 Kavakli and Kuhn 8 even argue that the calculus of outside observers depends not only on whom they wish to see in power, but also on whom they want to keep from power.
Scholars and practitioners of election monitoring agree that independence and impartiality are the hallmarks of a good election monitoring body, affecting both its credibility and its positive influence. The authority enjoyed by such bodies rests on their adherence to the highest standards of accurate and unbiased election monitoring. 9 Nevertheless, election monitors often face numerous practical obstacles that may hinder their ability to report on elections accurately. For instance, due to often limited resources, decisions have to be made on how many observers can be deployed, how many interlocutors they can speak to, how many polling stations they can visit, where to visit them, and for how long they can stay in the country. The existing literature suggests that this has led to a disproportional monitoring of urban polling stations at the expense of stations situated in remote and rural areas. 10 Additionally, large countries often host proportionally fewer observers than smaller ones, undermining the statistical significance of the sample of visited polling stations. Some authors also argue that cultural factors such as the observers' nationalities may have an influence on assessments made in the field. 11 Relatively overlooked remains the effect of the host country's counter-actions, which might range from diplomatic squabbles to strategic threats. In the context of OSCE monitoring missions, this "pushback" behaviour is most often associated with post-Soviet countries led by the Russian Federation, which has been criticizing OSCE monitoring missions since the late 1990s. 12 That said, it must be noted that significant efforts have been made to overcome these shortcomings in reaction to the rising competition among credible international monitoring actors, who have started to find themselves at an increasing risk of harming their reputation and effectiveness through inaccurate assessments. In short, monitoring organizations that lack credibility also lack influence. 13
As a result, there has been a substantial increase in adherence to universally accepted principles for international election observation and codes of conduct for election observers. 14 Moreover, more sophisticated observation methodologies have been developed, aimed at improving the reliability of election assessments in general, often with contributions and feedback from independent electoral experts. 15 Consequently, well-established actors with a transparent observation methodology, such as the EU, the OSCE, or The Carter Center, are believed to produce election observation reports that are increasingly accurate and objective. 16 The proposed mechanism has been further accelerated by an increase in the number of international election monitoring bodies and a greater emphasis on the credibility of international standards. 17 Building on these theoretical claims, we assume that the scholars' assessments regarding the observed improvements should be mirrored in textual data as well, providing evidence that international standards are indeed the basis of the written monitoring reports and, more importantly, that the adherence of the monitoring reports to these international standards increases over time. 18 This leads us to our first hypothesis:

H1: The congruence between OSCE election monitoring reports and experts' views strengthens over time.
While improvements to the methodology can reduce internal sources of bias originating from, for example, a lack of resources or the nationality of the monitors, they are less effective in shielding monitoring bodies from political pressure. These pressures can come from host governments, third countries, or member states in the case of inter-governmental election monitoring organizations (IGOs) such as the OSCE. Kelley 19 points out that while most of the time election monitors provide genuine and uncontested assessments, the political and economic relationships between the monitored country and the member/funding countries of the monitoring organizations may influence the assessments. This is particularly the case for countries that are recipients of aid or military/trade partners of sponsoring states. 20 In our case, arguably, the OSCE represents an IGO mostly dominated by the West, if not as a whole, then at least in the human rights and democracy promotion activities of the organization, including election monitoring. This claim leans on the fact that, to a large extent, the democracy promotion activities are politically, financially, and personally supported by countries integrated into or closely associated with Western structures such as the European Union (EU) or NATO, which make up a majority among OSCE states. 21 A notable part of this dominance, besides the possible political leverage of these governments within the organization, lies in the overwhelming number of staff working in the OSCE/ODIHR election monitoring missions who originate from EU and NATO member states or other closely associated countries. 22 Arguably, this establishes a link through which some OSCE member states may exert leverage over the activities of the OSCE in the area of election monitoring, or at least create incentives and channels for socialization to certain norms.
This inherent political tilt may then translate into favouring a set of (geo)political interests within the organization that is not necessarily shared by all member states. 23 These may range from geopolitical interests in countries such as Georgia or Ukraine and their role in regional security systems to economic interests in countries such as Azerbaijan with its vast natural resources.
We merge these theoretical expectations with the real-world dynamics of OSCE monitoring, which over the years has been challenged multiple times, yet only one line of criticism has prevailed almost throughout the whole period under study: the allegations of political bias against the Russian Federation and its allies. When it comes to Russia, the post-Cold War era has been increasingly affected by the West/East divide, fuelling confrontation in political, economic, as well as military arenas. 24 It has become standard that the monitoring of elections in Russia itself or in countries with strong ties to Russia is contested on political grounds, accompanied by allegations of unfair treatment.
The situation worsened with a diplomatic feud that dates back to the aftermath of the colour revolutions in the 2000s, when the OSCE/ODIHR played an important role in uncovering election fraud in some of the monitored states, thus contributing to public mobilization against the non-democratic regimes. 25 Divergent opinions on election monitoring had been voiced by Russia, however, at least since 1999, criticizing the OSCE for privileging the human rights dimension over other principles. 26 Specific objections to OSCE election observation started to be raised in 2003 with a document prepared by the delegations of Russia, Belarus, Kazakhstan, and Kyrgyzstan. It was a reaction to an "apparent intrusion" of OSCE practices and institutions, including election observation, into the internal affairs of the participating states. 27 With the colour revolutions and the consequences they had in the post-Soviet region, a coalition of post-Soviet countries led by the Russian Federation started to systematically contest how the OSCE/ODIHR operated. Russian rhetoric intensified, and demands turned to an overhaul of OSCE election observation and its basic principles. In 2004, this led to a common declaration by the presidents of Armenia, Belarus, Kazakhstan, Kyrgyzstan, Moldova, Russia, Tajikistan, Ukraine, and Uzbekistan (later endorsed also by Turkmenistan) condemning OSCE/ODIHR election observation practices and accusing the OSCE of applying "double standards". 28 The initiative, later transforming into another open declaration known as the "Astana Appeal" and its successors, represents the most systematic attempt to question the integrity of election monitoring in Europe. Although other states may have questioned the OSCE/ODIHR election assessments occasionally, none of them has transformed the criticism into an actual coordinated policy.
Although the effort to modify the core focus of monitoring missions eventually failed and a number of states reinterpreted their support for an international audience, the argument of biased assessments has not disappeared and re-emerges regularly with potentially critical assessments the organization publishes. 29 The question, however, remains whether the allegations are justified. This leads us to our second hypothesis, which tests whether the OSCE/ODIHR monitoring reports are negatively biased against signatory and affiliated countries of the Astana Appeal, an umbrella term we use for the Russian-led coalition of post-Soviet states questioning the integrity of OSCE election monitoring:

H2: OSCE assessments of elections organized in signatory and supporting countries of the Astana Appeal are harsher/more negative than expected.

Data and methods
To test the aforementioned hypotheses, we analyse an original corpus of 303 OSCE monitoring reports we collected, covering the period 1995-2020. 30 It is a mix of elections monitored in different parts of the Europe and Eurasia region, with a few additions from North America, covering Western democracies, post-communist countries, and post-conflict societies with different levels of economic development and democratic qualities (see overview in Figure 1). Capitalizing on the advancements of natural language processing and computational linguistics in general, we approach the analysis of textual data from a corpus-based perspective, utilizing the bag-of-words logic together with the popular wordscores scaling algorithm.
The unit of analysis (n = 303) is a final report representing a comprehensive assessment of a specific election made by a monitoring mission. Although not all documents cover all possible dimensions of election monitoring, their general assessment framework is consistent and focuses on whether and to what degree an election or its part met the international standards of free and fair elections. Hence, we assume that each report we analyse represents a valid approximation of how the OSCE assessed an election in time and space. Apart from standard cleaning and preprocessing, 31 we use a pre-trained named entity recognition (NER) model provided by the Allen Institute for AI to extract and remove any context-specific references to named entities, to avoid a potential location-based bias. 32 The raw corpus after preprocessing consists of 1,528,314 words and 6,584 unique tokens.
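As a minimal illustration of this step, the sketch below strips entity mentions, lowercases, and tokenizes a report fragment. The `preprocess` helper and the explicit entity list are hypothetical stand-ins: in the article, entity spans are detected automatically by the pretrained AllenNLP NER model rather than supplied by hand.

```python
import re

def preprocess(text, entities):
    """Strip named-entity mentions, lowercase, and tokenize a report fragment.

    `entities` is a plain list standing in for the spans a pretrained NER
    model would detect (countries, cities, party and person names).
    """
    for ent in entities:
        text = text.replace(ent, " ")  # drop location-revealing mentions
    # keep only alphabetic tokens after lowercasing (bag-of-words logic)
    return re.findall(r"[a-z]+", text.lower())
```

Removing such mentions keeps the scaling model from learning that a country name itself predicts a good or bad assessment.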
Using the wordscores scaling algorithm, the goal is to scale the corpus in order to uncover a latent continuum that defines the overall assessment of the quality of elections, i.e. the extent to which the OSCE considers an election to be free and fair. In terms of underlying logic, the wordscores algorithm estimates the positions of documents using reference scores for texts whose positions on well-defined a priori dimensions are "known". 33 We combine this approach with guided bootstrap sampling, a method we propose in order to overcome the problem of selection bias, which inevitably occurs when the reference texts are selected based on close reading (we present a full algorithmic description in the Appendix).
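The core of the wordscores logic can be sketched as a toy implementation: words are scored by the weighted average of the reference-text scores they appear in, and an unscored ("virgin") text is placed at the frequency-weighted mean of its word scores. The reference scores of -1 and +1 below are illustrative anchors, not the values used in the article.

```python
from collections import Counter

def word_scores(ref_texts, ref_scores):
    """Score each word by the weighted average of reference-text positions.

    ref_texts: list of token lists; ref_scores: their a priori positions.
    """
    freqs = []
    for tokens in ref_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        # relative frequency of each word within one reference text
        freqs.append({w: c / total for w, c in counts.items()})
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        fs = [f.get(w, 0.0) for f in freqs]
        denom = sum(fs)
        # P(r|w): probability of reading reference text r given word w
        scores[w] = sum(a * f / denom for a, f in zip(ref_scores, fs))
    return scores

def score_text(tokens, scores):
    """Place a 'virgin' text at the frequency-weighted mean of its word scores."""
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    return sum(scores[w] * c / total for w, c in counts.items())
```

A virgin report dominated by words typical of the "free and fair" anchor thus lands near the positive end of the scale, and vice versa.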
As monitoring reports are highly complex, choosing the reference documents is always somewhat arbitrary. Moreover, testing has shown that choosing just one pair of documents produces a scale that is not stable and often varies across different pairs. To mitigate this effect, we bootstrap pairs from a pool of the ten potentially best and ten potentially worst monitoring reports 34 selected based on close reading of the collected documents, in order to benefit from a good knowledge of the corpus and at the same time to accommodate alternative selection preferences. We choose one election per country, potentially covering the different nuances of good and bad qualities monitoring reports may focus on in different settings, and train 100 wordscores models using all combinations of potential pairs in order to stabilize both the scaling scores and the standard errors.
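The guided bootstrap can be sketched as a loop over all candidate reference pairs. Here `scale_fn` is a placeholder for any one-pair scaling routine (e.g. a wordscores model fitted on that pair); the function name and signature are illustrative, not the article's actual code.

```python
import statistics

def bootstrap_scale(virgin, best_pool, worst_pool, scale_fn):
    """Stabilize a report's position over all best/worst reference pairs.

    scale_fn(virgin, good_ref, bad_ref) -> position of `virgin` on a scale
    anchored by one positive and one negative reference report.
    Returns the mean estimate and its spread across pairs.
    """
    estimates = [
        scale_fn(virgin, good, bad)
        for good in best_pool   # e.g. ten candidate "best" reports
        for bad in worst_pool   # e.g. ten candidate "worst" reports
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)
```

With ten candidates on each side, this trains 10 x 10 = 100 models, matching the setup described above, and the averaging absorbs the arbitrariness of any single pair choice.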
The process of bootstrapped scaling gives us stabilized scores (hereafter referred to as the OSCE election scores) we can use as an approximation of the latent scale of free and fair elections. These scores, however, need to be validated extrinsically against a proper benchmark. While an objective evaluation of an election is perhaps impossible to make, we use the expert assessments collected by the V-Dem project as a form of empirical yardstick that should tell us how well the selected algorithms perform on a simple scaling task. We use the Clean Elections Index (v2xel_frefair) as a standardized score capturing the dimension of free and fair elections while covering the whole studied period. The index is not perfect, but arguably it is still superior to any available alternatives in terms of rigour, transparency, methodology, and time span. 35 However, as a robustness check, we provide an additional validation of the scaling results using both the Freedom House and the Polity IV index in the Appendix (Table A2). The results are substantially the same, despite the fact that both tested indices focus on general democratic qualities rather than elections per se.
Although we cannot argue that experts are not exposed to the reports under study or do not project their own hidden biases, 36 a systematic-level bias in favour of monitoring missions that would problematize the whole expert survey is improbable. First, the coders provide their assessment on a highly aggregated level. This means that even a source-specific bias is effectively flattened into a number or a code that is an abstraction of the much wider range of sources a person with country expertise is exposed to over time (e.g. media, research articles, social networks, and monitoring reports). Second, V-Dem's selection criteria for choosing country experts, the cross-validation of assigned scores and their weighting, and the mitigation of expert biases are thoroughly addressed in the survey's methodology. 37 Third, although our results do not support this argument, we acknowledge that differing information environments across countries might result in an increased reliance of some of the independent experts on the findings of OSCE/ODIHR reports. However, this scenario is not prevailing or exclusive. This is most evident under authoritarian regimes in countries such as Russia, Belarus, and Azerbaijan during elections not observed by the OSCE but still critically evaluated by country experts; there is no indication that the quality of expert assessment significantly deteriorates. Finally, academics and experts themselves occasionally criticize monitoring reports for being biased, inherently recognizing their inconsistencies and problems. 38 This makes us believe the Clean Elections Index, although not perfect, provides a sufficient benchmark we can use for validating the modelled scores.

Empirical congruence: monitoring reports vs expert views
To validate the scaling outcome, we compare the OSCE election scores with the V-Dem election scores to assess how well the scaling matches the coders' judgement. Figure 2 plots the OSCE election scores produced by the wordscores algorithm against the V-Dem election scores. As we can see, there is generally a great deal of consistency between the OSCE and the V-Dem election scores, despite the fact that they employ entirely different approaches to assessing elections (expert surveys vs monitoring missions). Further inspection shows that, rather than clustering geographically, countries are indeed scaled based on the qualities of their elections, which empirically covers various political processes such as democratization in many Central and Eastern European countries. This is confirmed in Table 1, which reports the regression of the V-Dem scores on the OSCE scores. Model 1, showing the results of a linear regression, confirms this, giving us reasonably strong confidence that the modelled index captures the latent dimension of free and fair elections quite well. 39 Although the analysis in Model 1 provides strong evidence of a high congruence between scaled monitoring reports and expert assessments and confirms the expectation that the OSCE is a trusted IGO often providing genuine and uncontested assessments, 40 we can also see plenty of cases where the scaled scores and expert views disagree. These outliers raise valid questions about whether monitoring reports provide harsher/more lenient assessments for certain contexts than the expert baseline does, or whether it is just noise produced by the scaling algorithms. As the theoretical section suggests, we believe it is the former.
That being said, given the aforementioned methodological differences, it is uncertain, and even unlikely, that the relation between the V-Dem and the OSCE scores is linear. This suspicion is confirmed in Model 2, which reports the results of a polynomial regression model. The R² is noticeably higher than in the first model, something visually represented in Figure 2.
To analyse the substantive difference between the V-Dem and the OSCE scores, we calculate the residuals from Model 2 in Table 1 and use them as an approximation of potential bias. These residuals, the main dependent variable in our analyses, indicate when and to what extent the OSCE election scores present a more positive assessment of an election (positive values) or a more negative one (negative values) than is merited by the V-Dem scores, which thus serve as the benchmark against which to compare the OSCE assessments. 41 This approach takes into account the fact that both scores rely on different methodologies and that some differences are inevitably of an instrumental nature. Our residual-based approach explicitly models these instrumental effects, creating a baseline of a "normal" (given the methodological differences) relation between the OSCE and V-Dem scores. The values of our dependent variable thus indicate to what degree and in which direction cases deviate from this normal relation.
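Under the simplifying assumption that the OSCE scores are fitted on a second-degree polynomial of the V-Dem scores (mirroring Model 2), the residual computation can be sketched as:

```python
import numpy as np

def bias_residuals(vdem, osce, degree=2):
    """Residuals of OSCE scores around their polynomial fit on V-Dem scores.

    Positive residual: the OSCE report is more positive than the expert
    benchmark predicts; negative: more critical than expected.
    """
    coefs = np.polyfit(vdem, osce, deg=degree)  # fit the "normal" relation
    return osce - np.polyval(coefs, vdem)       # deviation from that baseline
```

Because the polynomial absorbs the systematic (instrumental) part of the OSCE-V-Dem relation, only the case-specific deviations remain in the residuals.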
The first main independent variable we test for is the year of the election. If the first hypothesis (H1) holds, OSCE reports on more recent elections should be more consistent with the V-Dem scores. For the second hypothesis (H2), the main independent variable is a dummy for being a signatory or supportive state of the Astana Appeal and related initiatives (Astana). As the appeal, as well as other declarations and initiatives associated with it, refers to systematic bias also prior to 2004, we expand the relevant window back to 1999, when the first Russian attempts to criticize the OSCE's activities and profile can be traced. As Ukraine has politically broken with Russia in recent years, we exclude the country from the group after 2013 (the post-Euromaidan era). Apart from the main independent variables, we further control for two sets of independent variables. First, we focus on contextual factors of election monitoring, combining socioeconomic and demographic factors with potential strategic interests of the OSCE monitors (see discussion above). More specifically, we control for GDP and total trade measured as a percentage of GDP as proxies for important markets, the volume of official development assistance (normalized per capita) as an indicator of dependency on the international community, the proportion of the population living in urban areas and population density as indicators of how well the monitors can cover elections in urban and rural areas, and the GDP proportion of total natural resources rents as an indicator of strategic relevance. The second set of control variables focuses on contextual factors of the monitored elections, taking into account mission-specific aspects as well as the domestic political climate.
We control for the deployment of a full observation mission and its size (the number of observers relative to the size of a country) as an indicator of the administrative strength of a mission, as well as the regional affiliation of the head of mission (Western Europe; Eastern Europe; US/Canada) as a proxy for cultural affiliation. Monitored elections are contextualized through variables of transitional election as an indicator of major political change, turnover election as an indicator of a handover of power, and legislative election as a contextualizing factor of the race type (see Appendix for an overview of coding rules). Table 2 summarizes the descriptives. 42

Analysis
The first hypothesis predicts that the disagreements between the OSCE and V-Dem election scores will decrease over time due to increased competition and an increasingly elaborate monitoring methodology. Table 3 tests this hypothesis. 43 In this analysis, however, we take as the dependent variable the absolute value of the residuals. This is because the first hypothesis concerns itself only with the absolute level of bias, regardless of its direction. Therefore, the models test whether the residuals are smaller (closer to zero) as opposed to larger (either under- or over-estimating the quality of an election). This ensures that we are testing whether OSCE scores are more likely to conform to the normal relation with the V-Dem scores if an assessment was made more recently. Model 1 in Table 3 tests the overall relation between time and OSCE election score bias. Models 2 and 3 gradually make the model more stringent by controlling for the socioeconomic and strategic factors and the political factors of the monitored elections. In all three models, we find evidence of a significant and strong decrease in the absolute size of the residuals over time (Figure 3 visualizes this trend). In other words, recent OSCE election monitoring reports are more in line with the assessments of experts. This supports hypothesis one, suggesting that the efforts made to elaborate the methodology of election monitoring pay off and increase the quality of the OSCE election evaluations.
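The core of this time-trend test can be illustrated with a minimal sketch: regress the absolute residuals on the election year and inspect the sign of the slope. This is a hypothetical stand-in for the full models in Table 3, which add the control variables; a negative slope corresponds to the convergence H1 expects.

```python
import numpy as np

def convergence_slope(years, residuals):
    """Slope of the absolute residuals regressed on election year.

    A negative slope means OSCE scores track the expert benchmark
    more closely for more recent elections (the H1 expectation).
    """
    slope, _intercept = np.polyfit(years, np.abs(residuals), deg=1)
    return slope
```

Taking absolute values first matters: raw residuals of opposite signs would cancel out and mask the level of disagreement.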
Table 3. Analyses of the absolute bias in OSCE election scores.

Regarding the control variables, only the distinction between legislative and presidential elections is statistically significant at p < 0.01, with the former showing smaller deviations from the norm than the latter. This can be explained by the nature of legislative races, defined by potentially less tension and more open competition with multiple mandates being contested, as opposed to presidential elections where there is only one winner. 44 Presidential elections are thus defined by a zero-sum logic with no consolation for second place. 45 The overall more positive assessment of parliamentary elections by both the OSCE and the expert scores is in line with the existing literature, which agrees that proportional and multi-mandate elections are less prone to fraud as the incentives for electoral misconduct are lower. 46 Thus, while parliamentary elections can be expected to meet at least some minimal standards of free and fair competition, presidential elections represent much more of a riddle, with potential large-scale fraud coming into play.

Because the dependent variable here is the absolute value of the residuals, we are unable to deduce in which direction this bias goes. Therefore, in Table 4, we use the regular values of the residuals. The first model includes only the Astana variable, and Model 2 and Model 3 add the socioeconomic and strategic factors and the political factors of the monitored elections. In the first model, the distinction between the Astana and the other countries is not statistically different from zero, but with the addition of the covariates, a difference begins to emerge. In Model 3, we see that the Astana countries experience significantly more positive evaluations of their elections by the OSCE than merited by V-Dem. Figure 4 visualizes the difference between the Astana group and other countries.
This is in contrast to what Hypothesis 2 predicted (and what the Russian-led coalition would have hoped for). 47 The conclusions of this finding are twofold. First, the data provides empirical evidence showing that the criticism of alleged negative bias pronounced by Russia and the other countries supporting the Astana Appeal is unfounded. This is in line with the majority of the literature, which has interpreted Russia's criticism as part of a broader effort to delegitimize election observation by the OSCE in order to fend off unfavourable assessments and preserve authoritarian regimes that fit Russia's geopolitical interests in the region. 48 The finding, however, also points to a second and more important perspective. It shows that a bias on the side of the OSCE does exist, but in the opposite direction, meaning that the OSCE produced reports that were more positive about the assessed elections than reality warrants. This can be explained by the fact that the Russian-led Astana Appeal and the activities that followed represented a major challenge to the functioning of the OSCE/ODIHR, to which the institution reacted in various ways. 49 Our data indicate that part of the response the OSCE/ODIHR took in reaction to the raised allegations was to moderate negative assessments of elections in the concerned countries in order to accommodate the Russian-led criticism and avoid the proposed structural reforms (see Appendix for an example of a lenient assessment towards Russia). In other words, the organization was effectively pushed into a more submissive position accommodating the critical voice of a powerful actor (or a coalition of actors).

The pushback effect the Russian-led coalition has successfully imposed on the monitoring activities of the OSCE/ODIHR can be characterized as systematic pressure combined with an abuse of structural shortcomings in the functional organization of the OSCE as an IGO (e.g. the functioning of its executive bodies).
More generally, the effect is the result of political pressure leading to a change of position that is seen as unwanted or less preferable. Although different forms of bias might stem from external pressure, the pushback effect captures a specific dynamic that is long-term, political in nature, and focused on changing the core principles of election monitoring.
Apart from the pushback effect, it is important to acknowledge that, at least in some cases, an additional factor might contribute to the observed leniency. As a number of authoritarian regimes regularly alter their strategies of election manipulation, the capacity of election monitors to verify and document them is regularly challenged as well. If international monitors are not able to keep up with the advancements in election manipulation, this too might lead to a more positive assessment than reality warrants. 50

Regarding the control variables, we see that higher GDP, a higher share of natural resource rents in GDP, official development assistance, and legislative elections are associated with more positive assessments. These variables tell three distinct stories. First, there is a positive bias towards strategic markets, whether in terms of mere size (GDP) or their importance (natural resources), which is in line with the criticism that economic interests might interfere with international organizations' monitoring goals. 51 Second, the positive bias towards recipients of ODA appears to be in line with the argument that IGO member states may attach particular importance to countries that receive more foreign aid and treat them more leniently as a reflection of their commitment. 52 Lastly, the positive assessment of parliamentary elections can be explained by a relatively higher level of competitiveness (i.e. more seats to compete for) than we observe in zero-sum contests such as presidential elections, where there is no consolation for attaining even one vote fewer than one's rival. 53

Only one variable shows a significant and negative effect (i.e. negative bias): the size of the observation mission. We theorize that it mostly reflects the relative number of observers who carry out the observations. In this context, more observers being deployed can spot more irregularities and hence provide, on average, a more critical assessment.
Moreover, bigger monitoring missions are probably more often allowed in countries where the OSCE assessment can be critical. This is in line with the practice of monitoring elections in countries with authoritarian or repressive governments, where monitoring missions are more often than not smaller than the size of the country would require (e.g. Russia).

Conclusion
The article analyses a quarter of a century of election monitoring in Europe. Based on an original corpus of OSCE monitoring reports, we explore the existing biases in raw textual data and assess them against independent expert views. Our results show that the OSCE is highly consistent with expert opinions in assessing whether and to what degree an election can be considered free and fair. In this context, the OSCE conducts the overwhelming majority of its assessments with a high degree of professional integrity and has continued to improve the quality of its work over time (H1). However, our analysis has also identified several biases, of which the most relevant is a positive bias towards Russia and its allies, effectively revealing the defensive position the OSCE takes when it comes to election monitoring in these countries (H2).
The article should be read as a test of existing theoretical arguments using original data, with an aspiration to understand how political bias works in a predominantly (but not exclusively) European context. As we showed in the previous section, the story of bias is both political and strategic. On the one hand, the congruence of reports and expert assessments increases with time, showing that the overall standards of election monitoring are continuously improving. On the other, we show that some contexts are more prone to biased assessment than others. Important markets as well as countries with strategic interests receive more lenient assessments, which problematizes the legacy of OSCE election monitoring, whose impartiality seemingly reaches its limits when it comes to the economic and political realities of the OSCE region. Furthermore, we show that the double standards alleged by the Russian-led coalition of states do exist, but in the opposite direction than pictured by the concerned governments. This indicates a defensive position the OSCE was pushed into over years of political squabbles, which apparently helped Russia and its allies receive more moderate assessments than reality warrants. We conceptualize this mechanism as a pushback effect, which explains the lenient assessments as a result of systematic pressure imposed by the Russian-led coalition on the OSCE as an IGO.
Overall, the article presents a complex picture of the OSCE's legacy of election observation missions in Europe over the past quarter of a century. Despite its declared high standards, OSCE/ODIHR has not always delivered on the principles of impartiality and accuracy of assessment when confronted with the vested interests of OSCE member states and the complex geopolitical realities of the OSCE area. Although it is not surprising that an international organization composed of national governments yields to political pressures and concealed national interests, the existence of bias is not justifiable considering the practical implications that election assessments have for domestic and international audiences. We believe that the evidence presented here offers more insight into the workings of international monitoring organizations, the output they produce, the political goals they seek to balance, and the pressure they might face.
Notes
36. Martínez i Coma and van Ham, "Can Experts Judge Elections? Testing the Validity of Expert Judgments for Measuring Election Integrity."
37. See the lengthy discussion of the selection procedure and profiles of country coders in Coppedge et al., "The Methodology of 'Varieties of Democracy' (V-Dem)."
38. Kohnert, "Election Observation in Nigeria and Madagascar"; Mendelson, "Democracy Assistance and Political Transition in Russia," 104; Fawn, "Battle over the Box," 1136-38.
39. Given the seemingly non-linear relation between the OSCE and V-Dem election scores, we tried various alternative specifications of the regression model (e.g. log-transforming the OSCE scores), but the base model showed the greatest fit with the data.
40. Kelley, Monitoring Democracy.
41. To be sure, we checked whether alternatives to the V-Dem scores yield the same results by repeating the analyses with the Freedom House score. The results, reported in the Appendix, are substantively the same (see Table A3 and Table A4).
42. In order to avoid confounding effects, we do not include the variable "Astana" in the models of Table 3. As a robustness check, we repeat the analysis with it in Table A5 (Appendix).
43. The models in Table 3 as well as Table 4 account for the fact that the observations are not independent but can come from the same country through clustered standard errors.
44. Dawson, "Electoral Fraud and the Paradox of Political Competition."
45. Linz, "Transitions to Democracy."
46. Birch, "Electoral Systems and Electoral Misconduct"; Lehoucq and Kolev, "Varying the Un-Variable"; Ruiz-Rufino, "When Do Electoral Institutions Trigger Electoral Misconduct?"
47. For the same reason as we excluded the variable "Astana" in Table 3, we leave out the variable "Year" in the models of Table 4. Here too, however, we ran a robustness check which did include it as a covariate. The model reported in Table A6 supports the results shown here. As another robustness check, Table A7 in the Appendix presents the models with the Astana variable using the period after 2003 instead of 1999.
48. Evers, "OSCE Election Observation"; Zellner, "Russia and the OSCE"; Ghebali, "Growing Pains at the OSCE."
49.

Notes on contributors
Michal Mochtak is a Post-doctoral Research Associate at the Institute of Political Science, University of Luxembourg. He focuses on the existing challenges to democracy in Central and Eastern Europe with a special emphasis on election-related conflicts, political violence, and modern forms of authoritarian rule. He is the author of "Electoral Violence in the Western Balkans: From Voting to Fighting and Back" (Abingdon, New York: Routledge). See more at www.mochtak.com.
Adam Drnovsky is an election expert working for the Congress of Local and Regional Authorities of the Council of Europe. His research focuses on elections in the context of regime transition, electoral law, voting rights, and election observation. He contributes to election-related publications released by the OSCE and the CoE and works as an election observer for OSCE/ODIHR and the European Union.