Monte Carlo simulation of vote counts from Nigeria presidential elections

Abstract This study was designed to examine the application of Monte Carlo simulation to analyse vote counts from the 2011 and 2015 presidential elections in Nigeria. The study adopted a simulation approach and the data were simulated using the R programming language. The actual and simulated data were analysed using frequency and percentage distribution as well as Spearman Rank Correlation tests. Findings revealed that the vote counts for People’s Democratic Party (PDP) and Congress for Progressive Change (CPC) in the 2011 presidential elections as well as vote counts for All Progressives Congress (APC) and PDP in the 2015 presidential elections do not conform to the distributional pattern of the respective simulated vote counts. A manipulation-free data generating process is expected to produce vote counts that are close to a random distribution. The study recommends the use of forensic analysis by the Independent National Electoral Commission (INEC) for future electoral reform in Nigeria.


PUBLIC INTEREST STATEMENT
Election forensics is a nascent field in social science. It uses statistical methods to verify whether election results are free from anomalies. The techniques adopted in election forensics are based on finding patterns in the electoral returns data and possible deviations of election data from the expected distribution of these patterns. The study used secondary data gathered from the electoral body in Nigeria. We adopted an agent-based approach (using the Monte Carlo method) to detect anomalies in Nigeria's 2011 and 2015 presidential election results. The findings show that the elections conducted in Nigeria are filled with anomalies, as the collated votes for the two leading political parties in both elections deviate from the simulated votes. The study concludes with recommendations and suggestions for policy makers and government on the way forward and the steps that could help to avoid future election malpractices.

Introduction
An election is a formal decision-making process by which a population chooses an individual to hold public office by voting. The history of Nigeria's democratic experiments demonstrates that elections and electoral politics have generated so much animosity that, in some cases, the corporate existence of the country has been threatened (Animashaun, 2010). Much of this animosity could be traced to the unreliability of electoral result returns. Studies show that all elections conducted during the fourth republic in Nigeria (1999-2019) are filled with anomalies (Ogbeidi, 2010; Osinakachukwu & Jawan, 2011; European Union Election Observation Mission, 2015).
The 1999 general election was generally considered to be violence-free but not credible due to massive irregularities. There was a wide disparity between the number of voters observed at the polling stations and the final result that was reported from several states (Ogbeidi, 2010). The 2003 election presented the first peaceful civilian transition in the history of Nigerian politics. However, the elections were said to be more pervasively and openly rigged than the flawed 1999 polls (Onebamhoi, 2011). Moreover, this apparently phantom election recorded extraordinarily high turnout figures, generally in excess of 90 percent (Osinakachukwu & Jawan, 2011).
The 2007 elections were described as the worst in Nigeria's history, ranking among the worst conducted anywhere in the world in recent times (Onebamhoi, 2011). The elections were characterized by late arrival of electoral materials in the various polling units, inadequate polling materials, voter registration problems, no secrecy of the ballot, ballot paper problems, snatching of ballot boxes and destruction of ballot materials, violence, use of security agencies to intimidate the voters and rig elections, no voting in some polling centres, use of government officials to commit electoral fraud, and omissions of some parties' logos and candidate names on the ballot paper to disenfranchise their opponents' supporters (Kia, 2013). Although the 2011 presidential election was commended by observers as one of the most successful in Nigeria's political history, cases of stuffing ballot boxes, underage voting and outright falsification of election results were also reported in some states (Yusuf & Zaheruddin, 2015). Likewise, the European Union Election Observation Mission (2015) reported that the 2015 general elections were marred by malpractices, despite being largely peaceful.
To address the issues of election anomalies, scholars have suggested the application of election forensics to analyse election data (Breunig & Goerres, 2011; Levin & Alvarez, 2013; Mebane, 2006). Election forensic techniques focus on finding patterns in the electoral returns data and possible deviations of election data from the expected distribution of these patterns (Levin & Alvarez, 2013). Many of the common election forensic techniques are based on Benford's Law, while a few others focus on agent-based modelling.
Agent-based modelling is a modelling technology that is ideally suited to investigate outcomes that may emerge when large numbers of boundedly rational agents, using adaptive decision rules selected from a diverse portfolio of possibilities, interact with each other continuously in an evolving dynamic setting (MacGregor et al., 2006). Agent-based modelling has been deployed to analyse electoral outcomes (Cantu & Saiegh, 2011;Quratul-Ann, 2013). There are several approaches adopted in simulating with agent-based models. One of these methods is the application of game theory which is a decision taking tool that analyses the choice of optimum strategy in different conflicting situations for achieving individual objects out of the common goals of competitors in an election (Nagaraju et al., 2012). Other methods are the application of scripting techniques based on object-oriented methods (Wilensky & Rand, 2011) and the application of the Monte Carlo analysis (Cantu & Saiegh, 2011). According to Phelps (2012), agent-based models are often sufficiently complex that deriving explicit solutions for quantitative aspects of their macroscopic behaviour is often impractical if not impossible; hence, they are often analysed using discrete-event simulation and Monte-Carlo methods. The Monte-Carlo approach has been specifically adopted in detecting fraudulent electoral data (Kobak et al., 2014).
Monte Carlo analysis is a general term that refers to research that employs random numbers, usually in the form of a computer simulation (Johnson, 2011). This article focuses on the application of Monte Carlo approach of the agent-based model to analyse election data in Nigeria. In this article, the authors examined if the actual vote counts of PDP and CPC in the 2011 presidential elections deviate from the simulated vote counts from the Monte Carlo simulation; and if the actual vote counts of APC and PDP in the 2015 presidential elections deviate from the simulated vote counts from the Monte Carlo simulation. In order to determine the correlation between the actual vote counts and their respective simulated vote counts from the Monte Carlo simulation, we hypothesised that:

An overview on election forensics
Election forensics is a name coined to describe a nascent field of social science that intends to develop data analysis tools that can be used to detect discrepancies in election outcomes (Levin & Alvarez, 2013; Mebane, 2010). The fundamental difficulty with the study of election fraud is its measurement: it may take many forms, and those involved typically wish to hide these illicit activities. Scholars generally rely upon assessments by election observers to measure electoral fraud for quantitative cross-national studies and use media reports and petitions filed by aggrieved parties for single-country studies. But these measures are generated by participants with different interests, expectations, and standards across elections, which raises concerns about consistency and bias. Consequently, current measures used in cross-national studies may underestimate the extent of electoral fraud in new democracies, including those that appear to have fairly clean elections. This underlies the need for substantial caution about the robustness of empirical findings in this emerging large-n literature (Birch, 2007; Ichino & Schundeln, 2012; Kelley, 2012; Lehoucq, 2003).
The operational problem in uncovering fraudulent elections is identifying the characteristics that make them distinct from valid ones. Still, it is usually impossible to be absolutely certain about the legitimacy of an election. A particularly appealing option to overcome this difficulty is to tease out possible evidence of fraud from the available data using mathematical algorithms (Cantu & Saiegh, 2011). Monte Carlo simulation is one such algorithmic approach that can be used to simulate and analyse election data.
Monte Carlo analysis is a technique in agent-based modelling. An agent-based model can be described as a collection of multiple, interacting agents situated within a model or simulation environment, such as an artificial world (Heppenstall et al., 2012). Agents can be representations of animate entities, such as humans, that can roam freely around an environment, or inanimate objects that have fixed locations but can change state. Both animate and inanimate agents can possess rules that affect their behaviour and their relationships with other agents and/or their surrounding environment. Rules are typically derived from published literature, expert knowledge, data analysis or numerical work and are the foundation of an agent's behaviour. One rule set can be applied to all agents, or each agent (or category of agents) can have its own unique rule set. These rules are typically based around "if-else" statements, with agents carrying out an action once a specified condition has been satisfied. Agents can interact with each other and with the environment. This relationship may be specified in a variety of ways, from simply reactive (that is, agents only perform actions when triggered to do so by some external stimulus) to goal-directed (that is, seeking a particular goal). The behaviour of agents can also be scheduled to take place synchronously or asynchronously.
Although there is no universal agreement on the precise definition of the term "agent," definitions tend to agree on more points than they disagree. The fundamental feature of an agent is the capability of the component to make independent decisions. This requires agents to be active rather than purely passive (Macal & North, 2006). While introducing the need for agent-based model, Macal and North emphasised that the systems to be analysed and modelled are becoming more complex in terms of their interdependencies. More so, traditional modelling tools are no longer as applicable as they once were. Agent-based model has its direct historical roots in complex adaptive systems (CAS) and the underlying notion that systems are built from the ground-up, in contrast to the top-down systems view taken by Systems Dynamics. CAS concerns itself with the question of how complex behaviours arise in nature among myopic, autonomous agents. In addition, agent-based model tends to be descriptive, with the intent of modelling the actual or plausible behaviour of individuals, rather than normative such as traditional operations research, which seeks to optimise and identify optimal behaviours (Macal & North, 2006).
The first social agent-based simulation was developed by Thomas Schelling (Schelling, 1978). Schelling applied cellular automata to study housing segregation patterns, in which agents represent people and agent interactions represent a socially relevant process. The Schelling model showed that it is possible to have patterns that are not necessarily implied or consistent with the objectives of the individual agents. Some years later, Epstein and Axtell extended the notion of modelling people to growing entire artificial societies through agent simulation called Sugarscape model. Sugarscape agents emerged with a variety of characteristics and behaviours, highly suggestive of a realistic, although rudimentary and abstract, society. Emergent processes were observed including death, disease, trade, wealth, sex and reproduction, culture, conflict and war, and externalities such as pollution (Epstein & Axtell, 1996;Macal & North, 2006). Laver and Sergenti (2012), on their own, started with the twin premises that understanding multiparty competition is a core concern for everyone interested in representative democracy, and that we must understand multiparty competition as an evolving dynamic system, not a stationary state. Given these premises, they investigated the dynamics of multiparty competition using computational agent-based modelling. This allowed them to model decision-making by party leaders, in what was clearly an analytically intractable setting, in terms of the informal rules of thumb that might be used by real human beings, rather than the formally provable best response strategies used by traditional formal theorists. Their study was fundamentally about decisions made by party leaders.

Application of agent-based model and Monte Carlo technique
In another study, Fowler and Smirnov (2005) developed an agent-based model of dynamic parties with social turnout, built upon developments in different fields within social science. They described and analysed an agent-based model (ABM) of repeated elections in which voters and parties behaved simultaneously. They placed voters in a social context and let them interact with one another when choosing whether or not to vote. The researchers also let parties choose the platforms they offered, and these choices might change from election to election depending on feedback from the electorate. Their model yielded significant turnout, divergent platforms, and numerous results consistent with the rational calculus of voting model and the empirical literature on social turnout.
Adopting the Monte Carlo technique, Oleg (2011) simulated a total sample of 80,000 precincts in Russia and discovered that the higher the turnout, the less opportunity for falsification. Likewise, Kobak et al. (2014) hypothesised that the frequency of reported round percentages should be increased if election results are manipulated or forged. They analysed raw data from seven federal elections held in the Russian Federation during the period from 2000 to 2012. They used Monte Carlo simulations to confirm the high statistical significance of man-made fraud in all elections since 2004. They discovered that the number of polling stations reporting a turnout and/or leader's result expressed by an integer percentage, as opposed to a fractional value, was much higher than expected by pure chance.
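The integer-percentage idea described above can be illustrated with a minimal Python sketch. It is not the procedure of Kobak et al. (2014): the station sizes, the returns, the pooled-share binomial null model and every function name are assumptions made for illustration, and the data are made up. The sketch counts stations whose result is an exact integer percentage and compares that count with counts obtained from simulated, manipulation-free returns.

```python
import random

STATION_SIZES = [250, 300, 350, 450]  # hypothetical registered-voter counts


def is_integer_percentage(votes, registered):
    """True when votes / registered is a whole-number percentage."""
    return (100 * votes) % registered == 0


def count_integer_percentages(votes, registered):
    """Number of stations reporting an exact integer percentage."""
    return sum(is_integer_percentage(v, n) for v, n in zip(votes, registered))


def draw_binomial(n, p, rng):
    """One binomial draw: n independent voters, each voting with probability p."""
    return sum(rng.random() < p for _ in range(n))


def monte_carlo_p_value(votes, registered, n_sims=100, seed=1):
    """Share of simulated elections with at least as many integer percentages.

    Assumed null model: every station shares the pooled vote share.
    """
    rng = random.Random(seed)
    pooled_share = sum(votes) / sum(registered)
    observed = count_integer_percentages(votes, registered)
    at_least = 0
    for _ in range(n_sims):
        simulated = [draw_binomial(n, pooled_share, rng) for n in registered]
        if count_integer_percentages(simulated, registered) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_sims + 1)


def make_hypothetical_returns(n_clean=35, n_rounded=25, seed=0):
    """Made-up returns: some stations binomial, some forced to exactly 70%."""
    rng = random.Random(seed)
    votes, registered = [], []
    for _ in range(n_clean):
        n = rng.choice(STATION_SIZES)
        registered.append(n)
        votes.append(draw_binomial(n, 0.6, rng))
    for _ in range(n_rounded):
        n = rng.choice(STATION_SIZES)
        registered.append(n)
        votes.append(n * 70 // 100)  # exact for every size in STATION_SIZES
    return votes, registered
```

Calling `monte_carlo_p_value(*make_hypothetical_returns())` returns the observed count of integer percentages and a Monte Carlo p-value; a small p-value indicates more round percentages than the assumed null model produces by chance.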
In a similar study, Ananyev and Poyker (2018) examined the role electoral fraud plays in nondemocracies. They focused on data obtained from the 2011 parliamentary elections in Russia and a regionally representative public opinion survey. Using indicators based on empirical studies and Monte Carlo simulation, they discovered that more manipulation occurred in the areas where the regime was more popular. In a separate study carried out in developing democracies, Ferrari et al. (2018) used Monte Carlo methods, nonparametric Bayes and path sampling methods to investigate fraud in the 2013 presidential elections in Kenya and 2014 presidential elections in Brazil. The electoral data for the Brazil election was aggregated to town level while that of Kenya was at ward levels. Their model shows signs of fraud in the Kenya election with no sign of fraud in the Brazil election.
In another related study, Cantu and Saiegh (2011) created a set of simulated elections using the Monte Carlo method as part of the tools to diagnose electoral irregularities in Argentina. Their study indicated that the Monte Carlo method can help in diagnosing electoral fraud using recorded vote counts. Also, Rivest and Shen (2012) carried out research on post-election auditing based on the Bayes audit. The Bayes audit has the structure of two nested loops: an outer, "real-world" loop that audits the ballots one by one, and an inner "simulation" loop that uses Monte Carlo simulation to estimate the probability that the Bayesian model would generate an election upset (compared to the originally reported election outcome). Their study gave experimental evidence of the effectiveness and efficiency of the Monte Carlo technique.
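The two-loop structure of the Bayes audit can be sketched in Python as follows. This is a simplified illustration and not the algorithm of Rivest and Shen (2012): the uniform Dirichlet prior, the allocation of unaudited ballots in proportion to the drawn shares, the 5% stopping threshold and all function names are assumptions introduced here.

```python
import random


def dirichlet_draw(alphas, rng):
    """Draw vote shares from a Dirichlet distribution via gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def upset_probability(audited, n_remaining, reported_winner, n_sims, rng):
    """Inner simulation loop: estimate P(reported winner does not win)."""
    candidates = sorted(audited)
    # Assumed prior: one pseudo-ballot per candidate.
    alphas = [audited[c] + 1 for c in candidates]
    upsets = 0
    for _ in range(n_sims):
        shares = dirichlet_draw(alphas, rng)
        # Simplification: unaudited ballots are split exactly by drawn share.
        totals = {c: audited[c] + s * n_remaining
                  for c, s in zip(candidates, shares)}
        if max(totals, key=totals.get) != reported_winner:
            upsets += 1
    return upsets / n_sims


def bayes_audit(ballots, reported_winner, risk_limit=0.05, n_sims=500, seed=0):
    """Outer real-world loop: audit ballots one by one until risk is low."""
    rng = random.Random(seed)
    audited = {c: 0 for c in set(ballots) | {reported_winner}}
    p_upset = 1.0
    for n_seen, ballot in enumerate(ballots, start=1):
        audited[ballot] += 1
        p_upset = upset_probability(
            audited, len(ballots) - n_seen, reported_winner, n_sims, rng)
        if p_upset < risk_limit:
            return n_seen, p_upset  # enough evidence; stop auditing
    return len(ballots), p_upset
```

`bayes_audit` returns the number of ballots examined before stopping and the last estimated upset probability. With a clear reported winner the audit typically stops after a small fraction of the ballots, which is the efficiency argument made for this family of audits.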

Research method
Monte Carlo simulation, a technique in agent-based modelling, was adopted for the research design.

Data analysis
This section focuses on the simulation of the vote counts in the 2011 and 2015 presidential election results using the Monte Carlo approach. For each of the 2011 and 2015 presidential elections, a total of 37 surrogate vote counts were generated for each political party, one per state, to represent the 37 states (including the FCT), by simulating with a PERT distribution. The PERT distribution uses the minimum, expected and maximum values of the observed election data. The analysis was carried out on the two leading political parties in the presidential election results.
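A minimal Python sketch of this kind of PERT-based surrogate generation is given below. The study itself used R, and its exact code is not reproduced here. The sketch assumes the standard PERT parameterisation as a rescaled Beta distribution with shape parameter 4, and it assumes that the "expected" value is the mean of the observed counts, used as the most likely value. The function names and the example counts are hypothetical; the counts are not INEC figures.

```python
import random


def pert_sample(minimum, most_likely, maximum, rng, shape=4.0):
    """Draw one value from a PERT distribution (a Beta rescaled to the range)."""
    if not minimum <= most_likely <= maximum or minimum == maximum:
        raise ValueError("need minimum <= most_likely <= maximum, minimum < maximum")
    span = maximum - minimum
    alpha = 1.0 + shape * (most_likely - minimum) / span
    beta = 1.0 + shape * (maximum - most_likely) / span
    return minimum + span * rng.betavariate(alpha, beta)


def simulate_surrogates(observed_counts, n_units=37, seed=2015):
    """Generate one surrogate vote count per state for a single party."""
    rng = random.Random(seed)
    lowest, highest = min(observed_counts), max(observed_counts)
    # Assumption: the "expected" value is the mean of the observed counts.
    expected = sum(observed_counts) / len(observed_counts)
    return [round(pert_sample(lowest, expected, highest, rng))
            for _ in range(n_units)]


# Made-up counts for illustration only; these are not INEC figures.
hypothetical_counts = [120_000, 310_000, 95_000, 540_000, 260_000, 410_000]
surrogates = simulate_surrogates(hypothetical_counts)
```

Every surrogate lies between the observed minimum and maximum, so an actual count that sits far from its surrogate signals a state whose result is unusual relative to the other states, not one that falls outside the simulated range.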

Monte Carlo Simulation of the 2015 presidential election's result
This sub-section presents the Monte Carlo Simulation of the vote counts for APC and PDP (the two leading/major political parties) in each of the geopolitical zones and all the states in the 2015 presidential election. Figure 1 presents the Monte Carlo simulated votes (surrogate) and the actual votes for APC and PDP in the South West geo-political zone. The figure shows that the vote counts for APC and PDP differ from the simulated votes in all the states in the South West geo-political zone. Figure 1 also reveals that the actual votes for APC and PDP in Lagos State are far beyond the simulated votes for both parties.
Key: APC = All Progressives Congress; PDP = People's Democratic Party
Figure 2 presents the Monte Carlo simulated votes (surrogate) and the actual votes for APC and PDP in the South East geo-political zone for the presidential election in 2015. Figure 2 shows that the simulated votes and actual vote counts for APC are low and close in Abia State, Anambra State, Ebonyi State and Enugu State. However, the simulated votes and actual vote counts differ for APC in Imo State and for PDP in all the states in the South East zone.
Yobe State. On the contrary, the actual vote counts were extremely high for APC in Bauchi State when compared with the simulated votes.
Key: APC = All Progressives Congress; PDP = People's Democratic Party
Figure 6 shows the Monte Carlo simulated votes (surrogate) and the actual votes for APC and PDP in the North Central geo-political zone for the presidential election in 2015. The actual votes and the simulated votes are close for PDP in Kogi State and Kwara State, as well as for APC in the FCT. However, the actual vote count for APC in Niger State is extremely high when compared with the simulated votes.

Monte Carlo simulation of the 2011 presidential election's result
This sub-section presents the Monte Carlo Simulation of the vote counts for PDP and CPC (the two major political parties) in each of the geopolitical zones and all the states in the 2011 presidential election. Figure 8 presents the Monte Carlo simulated votes (surrogate) and the actual votes for PDP and CPC in the South West geo-political zone. Figure 8 shows that the actual vote counts and the surrogate votes are higher for PDP in all the states in the South West zone. Figure 8 also reveals that PDP recorded more votes than the expected simulated votes in the South West, with the exception of Ekiti State. Also, the actual vote counts and the simulated votes are very low for CPC in Ekiti, Ogun, Ondo and Osun states.
Figure 9 shows the Monte Carlo simulated votes (surrogate) and the actual votes for PDP and CPC in the South East geo-political zone. Figure 9 reveals that the actual vote counts and the surrogate votes are higher and very close for PDP in all the states in the South East zone. Figure 9 also reveals that the actual vote counts and the surrogate votes are very close and very low for CPC in all the states in the South East zone.

Test of hypotheses
In this section, results of the tests of hypotheses are presented.
H01: There is no significant correlation between the digital distribution of vote counts of the 2011 presidential election results in Nigeria and the digital distribution of their respective simulated votes from the Monte Carlo simulation.
Table 1 shows the correlation between the actual vote counts and the respective simulated votes in the 2011 presidential election. We infer from Table 1 that there is a significant correlation between the simulated votes and the actual votes of PDP (p-value < 0.05) and CPC (p-value < 0.05) in the 2011 presidential elections. Therefore, the null hypothesis is rejected.
The correlation between the actual vote counts and the respective simulated votes in the 2015 presidential election is presented in Table 2. We infer from Table 2 that there is a significant correlation between the simulated votes and the actual votes of APC (p-value < 0.05) and PDP (p-value < 0.05) in the 2015 presidential elections. Therefore, the null hypothesis is rejected.

Key: APC = All Progressives Congress; PDP = People's Democratic Party
The results of the hypothesis tests imply that there is a correlation between the simulated votes and their respective actual vote counts. This also implies that the simulated votes represent (conform to) the actual vote counts for each of the political parties.
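For readers who wish to see how such a test can be computed, the following Python sketch calculates a Spearman rank correlation between two paired series and attaches a p-value. It is an illustration under stated assumptions and not the authors' procedure: the study does not report how its p-values were obtained, so a permutation test is substituted here, and all function names are hypothetical.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(actual, simulated):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(actual), average_ranks(simulated))


def permutation_p_value(actual, simulated, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling one series (assumed test choice)."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(actual, simulated))
    shuffled = list(simulated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(actual, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under this sketch, a p-value below 0.05 would lead to rejecting a null hypothesis of no correlation, which is the decision rule applied to Tables 1 and 2.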

Discussion
The tests of hypotheses show that there is a significant relationship between the digital distribution of vote counts of election results in Nigeria and the digital distribution of the Monte Carlo simulated votes. Monte Carlo analysis employs random numbers to simulate data in a way that mimics the actual data as closely as possible. The assumption is based on the expectation that an election without fraud should have vote counts that are approximately random (Johnson, 2011; Kobak et al., 2014; Oleg, 2011). The results of the Spearman rank correlation tests on the relationship between the vote counts and the Monte Carlo simulated votes show that the surrogate votes represent (or mimic) the actual vote counts from both the 2015 and 2011 presidential elections.
The PERT distribution was used to simulate the election data because the state-collated results released by the Independent National Electoral Commission contain few details. Cantu and Saiegh (2011) were able to separate clean elections from fraudulent ones by creating a set of simulated elections, using the Monte Carlo method to simulate vote counts and a Bayes classifier as the learning algorithm. Similarly, in this study, a comparison of the pattern of the PERT distribution for the simulated data with the pattern of distribution of the election data shows that some of the vote counts differ from the expected simulated votes. This variation in the pattern of distribution suggests that the actual vote counts do not represent a random distribution.
The findings corroborate reports from the European Union Election Observation Mission (2015) on the 2015 presidential elections and the submission of Yusuf and Zaheruddin (2015) on the 2011 presidential elections. Both studies revealed that the 2011 and 2015 presidential elections were marred by electoral malpractices. However, the Monte Carlo analysis alone might not provide sufficient evidence to back up the suggestion that the deviation in pattern represents a fraudulent process in the vote counts of the 2011 and 2015 presidential election results. Hence, it is important to compare the results of the Monte Carlo analysis with other election reports, such as election audit reports and election observers' reports.
This research has contributed to the literature by showing that Monte Carlo simulation, using the PERT distribution approach, can be used to analyse election data with a view to detecting irregularities in the data. Based on the findings of this study and the reports reviewed in the literature, the 2011 and 2015 presidential election results were marred by irregular data.

Conclusions and recommendation
This paper presents the application of Monte Carlo simulation to analyse vote counts from the 2011 and 2015 presidential elections in Nigeria. The results reveal that the digital distribution of the vote counts for PDP and CPC in the 2011 presidential elections as well as vote counts for APC and PDP in the 2015 presidential elections do not conform to the respective distributional pattern of the simulated vote counts. The digital distribution of the vote counts for each of the political parties is expected to produce a pattern which shows that the counts are independently generated. The findings, therefore, suggest that the electoral process did not produce independent and randomly generated votes in the 2011 and 2015 presidential elections. It is recommended that the Independent National Electoral Commission should intensify effort in making elections in Nigeria closer to the hitch-free elections obtained in more stable democracies. There should be a reform of the electoral act to produce a framework that addresses issues of vote buying and other election fraud, litigations, prosecution of electoral offenders within the shortest possible period, and structural reform of the electoral bodies. The electoral reform should also allow for use of forensic analysis of electoral data by plaintiff or defendant at the election petitions and tribunals. In addition, relevant authorities should ensure that the security forces deployed to maintain peace and order during elections do not interfere in the election process or participate in any form of election fraud.