A Misuse of Statistical Reasoning: The Statistical Arguments Offered by Texas to the Supreme Court in an Attempt to Overturn the Results of the 2020 Election

Abstract In December 2020, Texas filed a motion with the U.S. Supreme Court claiming that four battleground states (Pennsylvania, Georgia, Michigan, and Wisconsin) did not conduct their 2020 presidential elections in compliance with the Constitution. Texas supported its motion with a statistical analysis purportedly demonstrating that it was highly improbable that Biden had more votes than Trump in the four battleground states. This article points out that Texas's claim is logically flawed and that the analysis submitted violated several fundamental principles of statistics.


Introduction
On December 7, 2020, Texas filed a motion requesting the U.S. Supreme Court to allow it to file a bill of complaint against the states of Pennsylvania, Georgia, Michigan, and Wisconsin. The complaint would assert that the defendant states "will appoint electors based on unconstitutional and deeply uncertain election results" 1 and ask the Court to enjoin them from certifying the electors pledged to vote for President-elect Biden. In support of its motion, Texas presented two probability calculations stating that the probability Vice President Biden won the four states was less than one in a quadrillion.
On December 11, 2020, the Court rejected Texas's attempt to deny the voters of the four defendant states their chosen electors. 2 Because the analysis presented to the Court is logically flawed and violated basic principles of statistical reasoning, it is worthwhile pointing out the erroneous reasoning offered to the Court, so courts can reject such "statistical analyses" at an early phase of a case.
The article is organized as follows: the statistical analyses referred to in the motion, the expert's affidavit, Pennsylvania's reply, and Texas's response are summarized in Section 2. Section 3 shows how the analyses reviewed in Section 2 are logically flawed and violate several basic principles of statistical reasoning.

Texas's Statistical Argument
The motion filed by Texas argued that the four defendant states did not conduct their elections appropriately and supported this claim with two probability calculations. The first was based on the fact that then-President Trump had an early lead during the vote count, but the later votes were sufficient to make President Biden the ultimate winner. The analysis said that "The probability of former Vice President Biden winning the popular vote in the four Defendant States-Georgia, Michigan, Pennsylvania, and Wisconsin-independently given President Trump's early lead in those States as of 3 a.m. on November 4, 2020, is less than one in a quadrillion, or 1 in 1,000,000,000,000,000." 3 The second analysis compared the numbers and percentages of votes for President Biden in 2020 with those for former Secretary of State Clinton in 2016. The expert's declaration concluded "the statistical improbability of Mr. Biden winning the popular vote in these four States collectively is 1 in 1,000,000,000,000,000." 4
These probabilities were based on the Z-scores of standard statistical tests comparing the total votes and vote percentages between Clinton and Biden, and between early and later tabulations, in the four battleground states. Those Z-scores are reported in Table 1. For example, for the state of Georgia, the expert first tested the hypothesis that the performances of Biden and Clinton were statistically similar by comparing both the total votes and the voting percentages. The resulting Z-scores are 396.3 and 108.7, respectively, enabling the expert to "reject the hypothesis many times more than one in a quadrillion times that the two outcomes were similar." 5 The expert also compared the percentages of votes counted by 3 a.m. on November 4th (48.91% for Biden and 51.09% for Trump) with the final results (50.14% for Biden and 49.86% for Trump) announced by the state on November 18th. 6 Then he tested the equality of the percentage of the votes counted by 3 a.m. (early) that Biden received to his percentage of the votes counted after 3 a.m. (late). The Z-score for this comparison is 1891 and consequently the two time periods "could not remotely plausibly be random samples from the same population of all Georgia ballots tabulated." 7
The expert further notes that Georgia had counted 95% of all votes by 3 a.m. and that comparable figures for the initial phase of counting in Pennsylvania, Wisconsin, and Michigan were 75%, 89%, and 69%, respectively, and they "are large enough to expect comparable percentages and vote margins for random selections of ballots to tabulate early and later." 8 Besides the above probability calculations, the expert also showed that early votes increased significantly in 2020 relative to 2016 in all four battleground states. For the state of Georgia, the expert also showed that a much smaller percentage of absentee ballots were rejected in 2020 (0.3634%) than in 2016 (6.42%). The motion claimed that the modifications in the state's treatment of absentee ballots made in March of 2020 led to the large decline in rejected absentee ballots. 9
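To make concrete how Z-scores of this size can arise, the following minimal sketch computes a pooled two-sample Z statistic for the equality of two proportions. It assumes the "standard statistical tests" are of this pooled form, which the declaration as summarized here does not spell out, and every count in it is hypothetical rather than taken from the declaration or from any election return. It illustrates that, for a fixed gap between two percentages, the Z statistic grows with the square root of the sample size.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-sample Z statistic for the null hypothesis p_a == p_b."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error


# Hypothetical counts: the same one-point gap (51% versus 50%) at two scales.
small = two_proportion_z(510, 1_000, 500, 1_000)
large = two_proportion_z(2_550_000, 5_000_000, 2_500_000, 5_000_000)

print(f"Z with 1,000 ballots per group:     {small:.2f}")
print(f"Z with 5,000,000 ballots per group: {large:.2f}")
```

With millions of ballots per group, even a one-point difference yields a Z-score far beyond any conventional threshold, so the size of Z by itself says only that two percentages differ, not why they differ.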

The Reply of Pennsylvania and Texas's Response
The reply brief of Pennsylvania addressed Texas's probability calculations. For the early versus late comparison, Pennsylvania pointed out that the expert's calculation assumed that early and late votes were randomly drawn from the same population, whereas those votes were clearly not "randomly drawn from the same population of votes." 10 For the comparison of Biden with Clinton, Pennsylvania noted that the calculation was based on the assumption that voters in a state would vote the same way in two consecutive elections. Because the elections were separate events, any analysis based on this assumption is worthless. 11 In its response brief, Texas questioned the criticisms in Pennsylvania's brief. 12 Concerning the comparison of early versus late votes, Texas claimed that its expert "did take into account the possibility that votes were not randomly drawn in the later period" but that he was unaware of any data that would support such an assertion. With regard to the comparison of Biden with Clinton, the brief refers to a subsequent analysis by its expert that showed that Biden underperformed Clinton in the top-50 urban areas in the country by 1.4%, 13 but received a larger percentage of votes in four of the five urban areas in the defendant states. The expert claimed that this pattern was unusual and deserves more scrutiny. 14

How the Analysis Submitted by Texas is Logically Flawed and Violates Some Basic Statistical Principles
This section explains why the logic and statistical reasoning underlying Texas's analysis are incorrect.

The Comparison of Total Votes and Vote Percentages between Biden and Clinton
The expert found that in all four battleground states, the increases in total votes and in percentages of votes for Biden over Clinton are "statistically incredible if the outcomes were based on similar populations of voters supporting the two Democrat candidates," thereby raising doubts about the 2020 election outcomes. 15 These comparisons between Clinton and Biden are logically wrong. As stated later in this section, the circumstances in 2016 and 2020 were substantially different. One therefore expects different numbers of voters, and different percentages of them favoring the Democrats and Republicans, in the two years; that is, the populations of voters supporting the Democratic candidate were not expected to be similar. The statistical significance of the test of the equality of the percentages voting for Biden and Clinton simply confirms that the political preferences of voters differed in the two years; it does not raise any doubt about the vote count of the 2020 election. Following the principles of Mallows (1998), there should be a reason for conducting a study. However, there is no political or historical justification for assuming that the political preferences of voters in 2020 and 2016 should be similar. Not only were Biden and Clinton different candidates with different histories and styles, but the state of the nation also differed substantially in 2020 compared to 2016. The 2020 election occurred in the midst of the worst pandemic the nation had experienced in 100 years.
11 Id. at 8.
12 Reply in Support of Motion for Leave to File, No. 22O155, December 11, 2020, pages 2-3.
13 Id. at 156a. The expert removed cities in the four battleground states in determining the top-50 cities.
14 Id. at 157a.
15 Expert declaration at 4a.

Table 2. Percentages of the popular vote received by the opponents of Presidents who ran for reelection.
President (elections)      First election   Second election   Difference
[...] (1904, 1912)         37.6             41.8               4.2
Wilson (1912, 1916)        27.4*            46.1              18.7
Hoover (1928, 1932)        40.8             57.4              16.6
Roosevelt (1932, 1936)     39.6             36.5              −3.1
Roosevelt (1936, 1940)     36.5             44.8               8.3
Roosevelt (1940, 1944)     44.8             45.9               1.1
Eisenhower (1952, 1956)    44.4             42.0              −2.4
Nixon (1968, 1972)         42.7             37.5              −5.2
Carter (1976, 1980)        48.0             50.7               2.7
Reagan (1980, 1984)        41.0             40.6              −0.4
Bush (1988, 1992)          45.6             43.0              −2.6
Clinton (1992, 1996)       37.4             40.7               3.3
Bush (2000, 2004)          48 [...]

The national unemployment rate was 6.9%, 16 much higher than the 4.6% in November 2016. 17 Furthermore, a much larger number of voters participated in the 2020 election than in 2016. According to the United States Elections Project, the turnout rate as a percentage of the voting-eligible population was 59.2% 18 in 2016 and 66.7% 19 in 2020, an increase of 7.5 percentage points. The fact that over 22 million Americans moved to a different state between 2017 and 2019 also affected the pool of eligible voters in the states. 20 Moreover, even if the populations of voters in the two elections were nearly identical, political preferences can certainly change even in a two-year period. For example, while both President Obama and President Trump started their terms with a majority of the House of Representatives being from their respective parties, after the mid-term elections the opposing party held the majority of the House. Historically, when a president runs for reelection, the percentages of voters who voted for their opponents usually are not the same in both elections. Table 2 lists the percentages of the popular vote received by the opponents of those Presidents who ran for reelection, beginning with President McKinley in 1896.
Even when the President ran against the same candidate in both elections (for example, President Eisenhower ran against Stevenson both times), the percentage of votes the opponent received changed, owing to changes in social, political, and economic conditions between the elections, the current President's approval ratings and, in some cases, the presence of a serious third-party candidate in one of the elections. The differences between the opponents' percentages in the second and first elections range from −5.2 to 18.7 percentage points. In fact, Table 2 shows that in the 16 pairs of elections where an incumbent ran for reelection, there were only three where the percentages of votes the opponent received were within 1 percentage point of each other. 21 Because the historical data clearly contradict the idea that the percentages of votes received by candidates from the same party should be the same in successive elections, the hypothesis tested by the expert has no subject matter justification.
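The range quoted above can be checked with a few lines of arithmetic. The sketch below uses only the Table 2 rows that are fully legible in this copy of the article (eleven named rows), so it cannot reproduce the count of three close pairs: two of those pairs, McKinley and Bush (2000, 2004), fall in rows that are missing or truncated here. The variable names are illustrative.

```python
# (President, first election, second election,
#  opponent % in first election, opponent % in second election)
# Only the Table 2 rows that are fully legible in this copy are included.
rows = [
    ("Wilson", 1912, 1916, 27.4, 46.1),
    ("Hoover", 1928, 1932, 40.8, 57.4),
    ("Roosevelt", 1932, 1936, 39.6, 36.5),
    ("Roosevelt", 1936, 1940, 36.5, 44.8),
    ("Roosevelt", 1940, 1944, 44.8, 45.9),
    ("Eisenhower", 1952, 1956, 44.4, 42.0),
    ("Nixon", 1968, 1972, 42.7, 37.5),
    ("Carter", 1976, 1980, 48.0, 50.7),
    ("Reagan", 1980, 1984, 41.0, 40.6),
    ("Bush", 1988, 1992, 45.6, 43.0),
    ("Clinton", 1992, 1996, 37.4, 40.7),
]

# Second-election percentage minus first-election percentage, in points.
differences = {
    (name, first, second): round(pct_second - pct_first, 1)
    for name, first, second, pct_first, pct_second in rows
}

smallest = min(differences.values())
largest = max(differences.values())
within_one_point = [key[0] for key, diff in differences.items() if abs(diff) <= 1.0]

print(f"Range of differences: {smallest} to {largest} percentage points")
print(f"Pairs within one point (legible rows only): {within_one_point}")
```

Among the legible rows, the opponent's share moved by more than one percentage point in all but one pair of elections, which is the pattern the text relies on.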
In addition to the logical flaw in the Biden versus Clinton comparison, the application of the standard statistical hypothesis tests is questionable for several reasons.

Assumptions of the Statistical Tests are Violated
The calculations of the Z-scores in the expert declaration assume the data are random samples from a common population, 22 but the expert's affidavit provides no justification for this assumption. Indeed, it is likely that voters are more interested in politics or hold stronger political views than nonvoters, so the population of voters differs from the population of eligible voters. Consequently, the assumption underlying the standard statistical tests, that the sample (those who actually voted) consists of randomly chosen members of the population of eligible voters, is false. Meng (2018) showed that seemingly small violations of randomness can seriously bias results, which implies that the expert's conclusions are unreliable.

The Conclusions from the Hypothesis Tests are Wrong
Even under the questionable assumptions made by the expert, the large Z values reported in the affidavit simply show that the total votes and the percentages of those votes cast for Biden and Clinton are significantly different. As noted earlier, the two populations of voters being compared are different, and the large Z-scores simply confirm this. They do not cast doubt on the election results.

Other Factors that Might Influence Voters' Preference are Not Controlled for
Since the economic and public health circumstances were substantially different at the time of the 2016 and 2020 elections, the analysis should include a control population, for example, all the other states or possibly a subset of other states where the vote was expected to be close. 23 Therefore, the simplistic analysis only comparing Biden to Clinton in the four battleground states is meaningless. Because the statewide elections are of primary importance, changes in voter turnout as well as the Biden versus Clinton comparison in all 50 states and the District of Columbia will be reviewed.
21 McKinley (1896, 1900), Reagan (1980, 1984), and Bush (2000, 2004).
Biden versus Clinton: Table A1 in the Appendix shows that the percentages of individuals who voted for Biden in 2020 were higher than those for Clinton in 2016 in all 50 states and the District of Columbia, not just in the four battleground states. Even in states that Trump won, for example, Kansas, Idaho, and Utah, Biden received at least 5% more votes than Clinton.
Voting for Other Candidates: In all 50 states and the District of Columbia, the percentages of voters who voted for other candidates on the ballot were lower in 2020 than in 2016 (see Table A1 in the Appendix).
Voter Turnout: Table A2 in the Appendix shows that in all 50 states and the District of Columbia, the voter turnout rates as a percentage of the voting-eligible population in 2020 are higher than those in 2016. The turnout rates in 2020 in the four battleground states, Georgia, Pennsylvania, Wisconsin and Michigan, were 8.6%, 7.4%, 6.3%, and 9.2% higher than the 2016 turnout in those states. These are similar to the national increase of 7.5%.
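The comparison with the national figure is simple arithmetic, sketched below using only the numbers quoted in the text: the national turnout rates of 59.2% and 66.7% and the four state increases. The assignment of each increase to a state follows the order in which the states are listed in the preceding paragraph; the variable names are illustrative.

```python
# National turnout rates (percent of voting-eligible population) quoted in the text.
turnout_2016 = 59.2
turnout_2020 = 66.7
national_increase = round(turnout_2020 - turnout_2016, 1)

# Turnout increases (percentage points) for the four battleground states,
# in the order the text lists them.
state_increase = {
    "Georgia": 8.6,
    "Pennsylvania": 7.4,
    "Wisconsin": 6.3,
    "Michigan": 9.2,
}

mean_state_increase = sum(state_increase.values()) / len(state_increase)
gap_vs_national = {
    state: round(increase - national_increase, 1)
    for state, increase in state_increase.items()
}

print(f"National increase: {national_increase} points")
print(f"Mean increase in the four states: {mean_state_increase:.3f} points")
for state, gap in gap_vs_national.items():
    print(f"{state}: {gap:+.1f} points relative to the national increase")
```

Each of the four states is within two percentage points of the national increase, which is the sense in which the text calls them similar.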
Because the Biden versus Clinton comparisons in the four battleground states were similar to the same comparisons in other states, the fact that Biden received more votes and a higher percentage of votes than Clinton in the four battleground states does not raise a credible doubt about the election results in those four states. Indeed, had the expert used a control group, which is standard statistical practice, he would have seen that this pattern occurred in all 50 states and the District of Columbia.

Texas's Additional Analysis in its Response
In the reply filed by Texas, the expert claimed that, after removing the four battleground states, Clinton outperformed Biden by 1.4% in the top-50 cities, but Biden won four out of five major urban counties in the battleground states. He infers that this conflict is unusual and justifies further investigation. 24 The expert's declaration does not report or cite the source of the data used to support the claim. In fact, the claim is wrong: Biden outperformed Clinton in a majority of the largest urban areas. The National Review examined the data in 36 of the top-50 urban areas and found that in 29 of them Biden received a higher percentage of the votes than Clinton. 25 This was known in November 2020, before the motion was filed.

Comparing Early and Subsequent Tabulations
The expert reported that Trump led the vote count before 3 a.m. on November 4, 2020. When additional ballots were included, Biden won all four battleground states. The expert tested the hypothesis that the percentages of votes for Trump tabulated in the two time periods were equal. From the result that they were statistically significantly different, he concluded that the votes counted in the two periods "could not remotely plausibly be random samples from the same population of all Georgia ballots tabulated," 26 thereby casting doubt on the election results in those four states. The logic of this conclusion is wrong.
24 Page 156a of the expert's Supplemental Declaration.
The ballots counted in the two time periods are not random samples from all votes in each state; that is, the assumption that they are random samples, on which the standard statistical tests are based, is violated. Most jurisdictions count the ballots cast by absentee and early voters after they count the votes cast in person on Election Day. 27 According to the New York Times, 28 mail ballots tend to take longer to process than in-person votes, and millions more people voted by mail in 2020 than ever before. Because a higher proportion of registered Democrats requested absentee ballots, this trend, combined with the expected delays in tabulating absentee ballots in several battleground states, implies that the in-person votes counted soon after polls closed on Election Day were likely to show early Republican leads, and the absentee votes tallied later were likely to show Democrats gaining ground. 29 Even if the assumptions underlying the tests were satisfied, the statistically significant results would only mean that the percentages of the total vote Trump received in the two time periods were different. They do not imply that the early and later votes are not random samples from the same population. On the contrary, only when the two samples are randomly selected can one use the Z-score to test whether the candidate preferences in the votes counted early and later are statistically similar. In other words, random sampling is one of the assumptions required for the validity of the statistical test, not an inference or conclusion drawn from it.
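The point can be illustrated with a small deterministic calculation. In the sketch below every number is hypothetical and chosen only for illustration: in-person ballots favor one candidate, mail ballots favor the other, and most in-person ballots are tabulated early while most mail ballots are tabulated late. The pooled two-proportion Z statistic is assumed here as the form of the "standard" test. All ballots in this toy example are legitimate, yet the early versus late comparison produces an enormous Z-score purely because the two batches differ in their mix of voting methods.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for the null hypothesis p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical electorate (not data from any state).
IN_PERSON_BALLOTS = 3_000_000
MAIL_BALLOTS = 2_000_000
TRUMP_SHARE_IN_PERSON = 0.60   # assumed preference among in-person voters
TRUMP_SHARE_MAIL = 0.35        # assumed preference among mail voters
EARLY_FRACTION_IN_PERSON = 0.90  # share of in-person ballots counted early
EARLY_FRACTION_MAIL = 0.20       # share of mail ballots counted early

early_in_person = round(IN_PERSON_BALLOTS * EARLY_FRACTION_IN_PERSON)
early_mail = round(MAIL_BALLOTS * EARLY_FRACTION_MAIL)
late_in_person = IN_PERSON_BALLOTS - early_in_person
late_mail = MAIL_BALLOTS - early_mail

early_total = early_in_person + early_mail
late_total = late_in_person + late_mail

# Expected Trump votes in each batch given the assumed preferences.
early_trump = round(early_in_person * TRUMP_SHARE_IN_PERSON
                    + early_mail * TRUMP_SHARE_MAIL)
late_trump = round(late_in_person * TRUMP_SHARE_IN_PERSON
                   + late_mail * TRUMP_SHARE_MAIL)

early_share = early_trump / early_total
late_share = late_trump / late_total
z = two_proportion_z(early_trump, early_total, late_trump, late_total)

print(f"Trump share of early count: {early_share:.1%}")
print(f"Trump share of late count:  {late_share:.1%}")
print(f"Z-score, early vs. late:    {z:.0f}")
```

In this toy setting the early count covers 62% of all ballots, and the large Z-score reflects nothing more than the order in which different kinds of ballots were tabulated.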
Furthermore, the expert's declaration stated that "Georgia had tabulated 95% of the ballots cast by 3 a.m. EST. The comparable initial tabulations in Pennsylvania, Wisconsin, and Michigan were 75%, 89% and 69%. These are large enough to expect comparable percentages and vote margins for random selections of ballots to tabulate early and late." 30 In a comparison of the candidates' percentages of the votes counted early and later, the fact that the ballots counted early were a large proportion of all the votes does not imply that they were randomly selected from all ballots cast. The votes counted in the two time periods reflected the different preferences of in-person and mail-in voters. Indeed, it was known before Election Day that the percentages of votes favoring either candidate in the two voting options would not be similar.
Because the expert's declaration discussed the situation in Georgia in detail and the state had to carry out two recounts, it is worthwhile to examine the data in detail. Table 3 reports the way Georgians cast their ballots in the election.
The data in Table 3 demonstrate that Biden's strong majority of the absentee-by-mail ballots was essential to his winning the state. Because these votes were counted after the ballots cast on Election Day, the votes tabulated earlier in the process favored Trump, while the absentee votes reversed his lead. Vote counting was also delayed in Fulton County (Atlanta, Georgia) because a pipe burst in State Farm Arena just above the room where the absentee ballots were being counted. 31 Thus, absentee ballots from Atlanta, which is predominantly Democratic, were counted later. On December 2nd, the Secretary of State, who oversees the voting process, announced that a second recount confirmed that President Biden received more votes. 32

Percent Increase in Early Ballots between 2016 and 2020
The expert showed that the four battleground states had a significant increase in early balloting in 2020 compared to 2016. 33 Even though an explicit conclusion was not drawn from these increases, reporting them for only the four battleground states suggests that they were unusual, thereby casting doubt on the election results in those four states. The increase in early ballots in 2020 was expected because many states gave voters more opportunity to vote absentee or early in response to the pandemic. Table A3 in the Appendix lists the early ballots for 2016 and 2020 for all 50 states and the District of Columbia along with the 2020/2016 ratio, expressed as a percentage. The table is arranged in descending order of the 2020/2016 ratio and shows that 2020 early ballots increased in all 50 states and the District of Columbia. 34 The ratios of the four battleground states are ranked 4, 16, 17, and 31 among the 50 states and the District of Columbia and are in line with those of other states. If one desires to draw a sound statistical conclusion from the 2020/2016 early ballot ratios for the four battleground states, one should compare them to the ratios of the other states.

Final Remarks and Conclusion
This article points out that the analysis of the 2020 election submitted by Texas is logically flawed and violated several major statistical principles. First, the populations being compared, 2016 and 2020 votes, are by definition different. Historical data also showed that the percentage of votes received by a party's candidate in one election usually differed from that in the succeeding election. Second, one cannot draw a sound inference by examining only data in the four battleground states. Only when the trends in the four battleground states differ from those of comparable states can one believe those trends are meaningful. This is especially true in the present context, as the economic and health situations in 2020 were dramatically different from those in 2016 throughout the nation. Third, a major assumption in testing the equality of two binomial proportions is that both proportions were obtained by random sampling. The individuals who voted in 2016 and those who voted in 2020 are not random samples from all eligible voters. Thus, the large Z-scores and the corresponding extremely small p-values reported in the expert declaration are calculated under an erroneous assumption, and conclusions drawn from them are not reliable. Fourth, the p-values do not convey information on whether or not the samples are random. With respect to the 2020 election, several sources 35 show that the votes tabulated in the two time periods were not random samples and hence one cannot expect comparable percentages from the two time periods. Thus, the expert's conclusions from the early versus late comparison are not sound.
31 Brash, B. (2020). Fulton County election results delayed after pipe bursts. The Atlanta Journal-Constitution (November 3rd).
32 Biden to Carry Georgia After Second Recount: State Election Official https://www.usnews.com/news/top-news/articles/2020-12-02/bidenwill-carry-georgia-after-second-recount-secretary-of-state.
33 Table 1 on page 5a of expert's original declaration.
34 The increase was noticeable except in the three states that mailed ballots to registered voters in both 2016 and 2020.