Optimal Legislative County Clustering in North Carolina

North Carolina's constitution requires that state legislative districts should not split counties. However, counties must be split to comply with the"one person, one vote"mandate of the U.S. Supreme Court. Given that counties must be split, the North Carolina legislature and courts have provided guidelines that seek to reduce counties split across districts while also complying with the"one person, one vote"criteria. Under these guidelines, the counties are separated into clusters. The primary goal of this work is to develop, present, and publicly release an algorithm to optimally cluster counties according to the guidelines set by the court in 2015. We use this tool to investigate the optimality and uniqueness of the enacted clusters under the 2017 redistricting process. We verify that the enacted clusters are optimal, but find other optimal choices. We emphasize that the tool we provide lists \textit{all} possible optimal county clusterings. We also explore the stability of clustering under changing statewide populations and project what the county clusters may look like in the next redistricting cycle beginning in 2020/2021.


Legislative Districting in North Carolina
North Carolina elects its general assembly consisting of 50 senators and 120 representatives by districts. Article 2 of North Carolina's constitution states [Nor] that (1) Each [legislator] shall represent, as nearly as may be, an equal number of inhabitants, the number of inhabitants that each [legislator] represents being determined for this purpose by dividing the population of the district that he represents by the number of [legislators] apportioned to that district; (2) Each [legislative] district shall at all times consist of contiguous territory; (3) No county shall be divided in the formation of a [legislative] district; (4) When established, the [legislative] districts and the apportionment of [legislators] shall remain unaltered until the return of another decennial census of population taken by order of Congress. Additionally, districts must comply with the federal Voting Rights Act (VRA), which mandates the construction of several districts where minority votes have a reasonable chance to effect the election outcome. This translates into requiring some districts with higher concentrations of minority votes. For the purposes of this paper, it is assumed that it is always possible to comply with the VRA once the clusters are set, so we will ignore this constraint.
Constraint (1) was clarified in 2002 in the case Stephenson v. Bartlett [Lak02], when North Carolina's Supreme Court ruled that all districts must be single-member districts, and that the population of each district must be within 5% of the ideal population, which is 1/50th of the state population for Senate districts and 1/120th of the state population for representative districts.
The definition of "contiguous" in constraint (2) was clarified in 2003 in Stephenson II [Lak03]. In North Carolina, districts which are contiguous are not allowed to be connected by just one point (e.g. if four counties meet at a corner, a district cannot go across the corner to connect opposite counties). However, districts are considered contiguous even if they are only connected by water. Constraint (3) is known as the "Whole County Provision," or WCP [Lak02]. Due to the population distribution of North Carolina, it is impossible to satisfy constraint (1) while keeping every county whole. For example, several counties contain significantly more than 1/120th of the state population, so they must be split into multiple House districts. Counties may also have too much population to be one district but too little to be split into two districts. In this case, a district must be drawn which crosses county lines.
The process chosen to comply with both of these constraints is to first group counties together into county clusters, then draw districts that do not cross cluster lines. The more clusters formed and the fewer counties in each cluster, the fewer districts must be split between counties. 2 North Carolina's Supreme Court ruled that a legal clustering is one that contains the most single-county clusters, then among those options, one that contains the most two-county clusters, and so on [Dic15]. As mentioned earlier, we assume that it is always possible to construct VRA districts regardless of the clustering chosen, so we have only considered NC Constitution constraints (1-3) in this paper.

Mathematical Overview
The legal definition that concluded the previous section may be recast into mathematical terms in the following way Definition 1. Consider two county clusterings A and B. If clustering A has a 1 single-county clusters, a 2 two-county clusters, etc. and clustering B has b 1 single-county clusters, b 2 two-county clusters, etc., clustering A is preferred over B if a n > b n for some n and a m = b m for all m < n. A clustering is a legal county clustering (equivalently optimal county clustering) if no clustering is preferred over it.
We develop an algorithm to determine all optimal county clusterings. At the highest level, the basic algorithm can be described as follows: First, identify which single counties can be divided into an integer number of districts, with each district being within 5% of the ideal district population dictated by the "one person, one vote" principle, and such that the remaining counties can still be grouped into legal clusters. Then among the remaining counties, we look for all pairs of contiguous counties which contain an integer number of districts within 5% of the ideal population, where remaining counties can still be grouped into legal clusters. This process is then performed with groups of three contiguous counties, then four, and so on. Eventually all of the counties are placed in a county cluster, with each cluster having an assigned number of districts it should be subdivided into. Pseudocode for the algorithm is presented in Algorithm 1. Algorithm 1: High-level pseudocode for the county clustering algorithm Input: Populations, adjacencies, number of districts. Output: Optimal clustering. Let all counties be unassigned to a cluster; Let n = 1; while any county is unassigned do Let S be the largest disjoint set of n-county clusters made from unassigned counties, such that unassigned counties not in S can still be made into legal clusters ( * ); Assign counties according to S; Increment n; return the county cluster assignment; The most difficult part this procedure is verifying that the remaining counties might still form a collection of legal county clusters (marked ( * ) in Algorithm 1). To find such a set of clusters, we construct a search tree and use a branch-and-bound depth-first search method to traverse it. Each level of the tree represents adding one county cluster consisting of n counties, so the depth of the tree corresponds to the number of clusters with n counties we have added. The leaves at the deepest level of the tree represent all largest sets of clusters containing n counties. In constructing this tree, we must guarantee that it is still possible to cluster all remaining unassigned counties at each level. The following theorem allows us to quickly characterize whether it is possible to cluster these remaining counties. A more verbose formulation of the theorem is given and proven in Appendix A.2.
Theorem 1 (Short Version). A set of counties C can be clustered into D districts if and only if each contiguous subset of C, c k , can be divided into d k districts subject to the constraint that d k = D, where D and d k are all positive integers.
The search tree is far too large to search entirely. We can reduce the search space dramatically using two mathematically-rigorous bounds on "how good" a solution under a particular branch could be. These optimizations allow us to search only the parts of the tree which could lead to optimal solutions. The full details behind the algorithm and bounds can be found in Appendices A.3 and A.4.

2017 Clusterings
Using the clustering algorithm described in Section 2, we determine the optimal county clusterings using the 2010 census data. We find that the clusterings constructed in 2017 and used in the 2018 elections are optimal. However, there are other optimal county clusterings for both the Senate and House districting plans; we find four optimal clusterings in the Senate and two in the House. All possible optimal clusterings are depicted in Figure 1 for the Senate and House. In each figure, a map of the entire state demonstrates the clusters (colored) that are found in all optimal maps. Regions which have multiple options are not colored (kept white) in this map and labeled with a letter in parentheses; each option is shown in subsequent maps in the figures. Clusters are labeled with the number of districts they contain. The clusters used in the maps enacted in 2017 are the first option in each region with possible choices.
Although the choices are all equivalent under the definition of optimality, some may be preferable over others in fulfilling the legal requirements for districtings. For example, the deviation from ideal population is not the same between choices. The population deviations in the clusters in each option are summarized in Figure 2. The numbers shown are the percent difference between each district's population and the ideal population if all districts within a cluster were drawn to have the same population.  In both the Senate region (B) and House region (A), the alternative, non-enacted, clustering option leads to districts that are closer to the ideal population compared to the enacted clustering. The most striking example is House region (A). The northernmost cluster, which contains 7 districts, had 4.996% more population than ideal. In fact, one district in the enacted districting slightly exceeds the 5% threshold [Per17]; this could have been avoided by choosing the other optimal clustering.

Stability of Clusterings Over Time
Using the Census Bureau population estimates [Ann19], we found the optimal clusterings for each year from 2010 to 2018. Note that the Census Bureau estimates the population at July 1 of each year, but the 2010 census represents the population at April 1. Hence, two results are given for 2010 which differ by 3 months.
In Figure 3 and Figure 4, the optimal clusterings for each year are depicted. Clusters which remain the same between adjacent years are colored light gray. Most years had multiple optimal options; the one shown is the one most similar to the previous year.
We use several measures to quantify the distance between clusterings. The first measure is the percentage of clusters which are different between the two clusterings (abbreviated DC for "different clusters"). If the clusterings are the same, the number of different clusters is 0%, and if every single cluster changed, it is 100%.
A more sophisticated measure takes into account how much each cluster changes. One way to do this is by the following observation: we choose a county at random and compare its cluster between two clusterings; if those clusters contain many of the same counties, the clusterings are similar. On the other hand, if the clusters have few counties in common, the clusterings are different.
This intuitive notion is made rigorous with variation of information (abbreviated VI). The variation of information between two identical clusterings is 0, and a higher number indicates very different clusterings, i.e. a low chance that a randomly chosen county is in a similar-looking cluster. The units of the variation of information are bits/county. This measure was previously used in other redistricting papers [FHIT18]. 3 The final measure we consider is variation of information divided by the average population change (abbreviated VI/APC). The average population change measures the average difference in county population between two data points. Thus, a high VI/APC is achieved when the clustering changed significantly (high variation of information) despite a small change in population (low average population change). This number   Figure 5. We present three metrics that quantify the minimal amount county clusterings would change across consecutive sets of population data. We find that, on average, roughly a third of the clusters change every time new data is collected (mostly annually, with the exception of a 3 month lapse in 2010); see the black line labeled DC (different clusters). The other two metrics, variation of information (VI) and variation of information by the average population change (VI/APC) are highly correlated with the first. All July dates are census estimates, whereas the April date is the decennial census count.
is a measure of the fragility of clusterings. The units of average population change are %/county so the units of VI/APC are bits/%.
There are generally multiple optimal clusterings in each year. Hypothetically, if the clusterings were redrawn when the new population estimates were released each year, one might desire that the new clustering is the one that preserves the most clusters from the previous clustering. This is how we report the stability metrics: starting with the enacted plans, we update the clusters each time the census bureau updates the population estimate by choosing the clustering which minimizes the number of different clusters; if there are multiple clusterings which have the same number of different clusters, we prefer the ones with a smaller variation of information.
The mathematical definitions of different clusters (DC), variation of information (VI), and average population change (APC) can be found in Appendix B. The results from the three metrics are shown in Figure 5.
As can be seen in the figure, the number of different clusters (DC) and the variation of information (VI) are highly correlated. The average population change (APC) of each year was close to 1%, so the numerical values of VI and VI/APC are close, with the exception of change between the two 2010 data sets. 4 Despite the smaller population difference, the clusters changed significantly in both the House and Senate, causing VI/APC for that data point to be very large. In other words, the clusters constructed using the 2010 census data are very sensitive to population change. Furthermore, on average more than one third of the clusters change each year, meaning that the clusters utilized in any districting cycle may be highly sensitive to timing and quality of the census data.

2020 Projected Clusters
Based on the NC OSBM County Population Projections [Cou], we predict the 2020 clusters. The results are shown in Figure 6I and Figure 6II. This data projects the population in July 2020, and may be far from the actual April 2020 population that the Census will measure due to both uncertainty in the projection and the 3-month time difference. As demonstrated before, a small difference in population can still result in a substantial change in the clusters, so it is very unlikely these predictions will be exactly correct, and it is possible they could be very far from what the actual clusters will be. We expect the majority of the county cluster to change in the upcoming redistricting cycle. In the Senate, we find only 2 clusters containing 3 districts that would potentially be preserved in the next redistricting cycle given the current population projections. 5 In the House, we find 11 clusters containing 24 districts that would be preserved in the next redistricting cycle; 6 Wake and Mecklenburg both are projected to remain single county clusters, however both are projected to increase in the number of districts. In short, we project that a majority of the state's clusters will be redrawn or have a different number of districts by the next redistricting cycle.

Another Possible Interpretation of the Whole County Provision
The courts could have interpreted the Whole County Provision in ways that would have further reduced the number of county splits and that would have led to different optimal county clusterings. One idea is that instead of maximizing the number of n-county districts at each step, one should instead maximize the total number of clusters. As the following theorem shows, this metric is essentially equivalent to minimizing the number of "county splits," where a county with all or parts of d districts counts as d − 1 county splits. A more precise formulation of this theorem is given and proved in Appendix C.
Theorem 2 (Basic Version). A clustering which maximizes the number of clusters also minimizes the number of county splits.
Does the court's metric give the clusterings with the most total clusters? In most cases, the answer is no, i.e. there is usually a clustering which has more clusters than the ones deemed optimal by the courts that would lead to fewer county splits. To determine this, we expand our search tree to include clusterings which are slightly suboptimal at the n-county cluster level, but which may lead to better solutions at the statewide level, therefore "relaxing" the algorithm.
In the Senate with the current provisions, there are four ways to generate 29 clusters. With the relaxed algorithm, we consider clusterings where the number of n-county clusters is allowed to be up to 3 fewer than the court-optimal solutions. 7 By relaxing the criteria in the Senate, we find over 25000 clusterings with 29 clusters.
In the House there are two ways to generate 41 clusters with the current provisions; by relaxing the criteria as above, we find 191 clusterings with 42 clusters. In 84 of these House clusterings the number of singleton counties is the maximum possible; we display these 84 clusterings in Figure 7. The alternative clusterings show that there are many more possible ways to cluster the counties. Each of the 84 county clusterings 5 The possibly preserved Senate clusters are the Lincoln-Cleveland-Gaston county cluster, which is the southeastern light blue 2-district cluster of the leftmost option (A) in Figure 6I, and the Pitt-Greene county cluster which is the brown central 1-district cluster of the leftmost option (B) in Figure 6I 6 The preserved House clusters are the Ashe-Watauga (1 district), Caldwell (1 district), Caswell-Orange (2 districts), Union-Anson (3 districts), Buncombe (3 districts), Macon-Cherokee-Clay-Graham (1 district), Lincoln (1 district), Davidson (2 districts), Iredell (2 districts) Guilford (6 districts) and Alamance (2 districts) clusters; on the map these county clusters are found in Figure 6II to be the orange-yellow northern most district with 1 district to the west, the brown single district cluster immediately south of the previous cluster, the northern-most central light blue two district cluster, the green three district cluster on the southern border, the green three district cluster toward the west of the state, the western-most green one district cluster, orange-yellow one district cluster next to the brown 13 district cluster, the central brown 2 district cluster, the bright yellow 2 district cluster to the west of the previous cluster, the bright-yellow 6 district cluster, and the green two district cluster immediately east of the previous cluster, respectively. 7 See Appendix C.2 for the exact alterations we make to the algorithm.  These are not optimal under the court's metric, as they all contain two fewer 2-county clusters than the court-optimal clusterings, in return for more total clusters. There are at least 107 more House clusterings with 42 clusters than the ones shown; however, these other clusterings use fewer 1-county clusters.
contain two fewer 2-county clusters than the enacted plan, but more 3-and 4-county clusters, and the other 107 House clusterings we found use fewer 1-county clusters. This means they are not optimal according to the court's criteria. However, they admit one fewer county split than the enacted plans because they contain one more cluster. This shows that the court-optimal House clusterings do not maximize the total number of clusters. In addition to the 2010 census data, the relaxed algorithm was run on all annual population estimates from 2010 to 2018. The results are summarized in Table 1, where they are compared to the optimal clusterings according to the metric defined by the courts. In most cases, the relaxed algorithm finds clusterings that lead to fewer county splits than the court-defined optimal clusterings. It also usually finds vastly more possible clusterings.
The relaxed algorithm does not exhaustively search the space, and therefore does not guarantee the clusterings found lead to the highest number of clusters and therefore the fewest number of county splits. Significantly more computational power and algorithmic improvements would be necessary to provably find all clusterings which maximize the total number of clusters. The amount of time it takes to run the algorithm grows exponentially with the degree to which the search space is expanded. Nevertheless, a sufficiently relaxed algorithm would find all clusterings which maximize the total number of clusters. See Appendix C.2 for the exact alterations we make to the algorithm, further discussion of the extent to which the algorithm must be relaxed to find all optimal clusterings under this modified metric, and the computational feasibility of such relaxation.
Given the computational difficulty of finding the optimal clusterings under the alternate interpretation, it is not clear as a matter of public policy if it is better to have fewer split counties without an guarantee of optimality, or more split counties with a guarantee of optimality within the more narrow optimization procedure given by the court. Furthermore, we have found that the number of possible cluster tends to The relaxed algorithm seeks to maximize the number of clusters (therefore minimizing the number of county splits per Theorem 2), whereas the original algorithm implements the definition of optimality given by the courts. The relaxed algorithm does not guarantee to find all optimal solutions, nor guarantee the solutions it finds maximize the number of clusters; the original algorithm does guarantee that it finds all court-optimal solutions. The Senate and House clusterings found using the relaxed algorithm did not use the same parameters; see Appendix C.2 for further details.
increase dramatically under the expanded sense of optimal clusterings, begging the question of which to choose.
6.1. Minimizing traversals. In addition to the whole county provision, map makers for the North Carolina legislature must minimize number of county traversals [Lak02,Leg11]. A district is said to traverse a boundary between two counties when there is a path within the district across the boundary. In minimizing county traversals, one must not have more than one connected component of a district within a county. Like minimizing the number of county splits, minimizing the overall number of county traversals is equivalent to maximizing the total number of clusters. We state and prove a more precise formulation of the following theorem in Appendix C.
Corollary 3 (Basic Version). A clustering which maximizes the number of clusters also minimizes the number of county traversals.
In short, all of the above work we have done to minimize the number of county splits is fully consistent with minimizing the total number of county traversals.

Discussion
We have developed an algorithm to cluster counties in a way that is consistent with court-adopted criteria. This algorithm has shown that the 2017 clusterings enacted by the North Carolina legislature are optimal, however it has also revealed that these clusterings are non-unique. We have developed and made our code available so that the clusterings adopted in the 2020 census cycle may be independently validated and compared with other possible choices [Cod]. We have also predicted what the 2020 county clusterings may look like based on projected census data. We have investigated the stability of the clusterings and have found that even modest population changes can lead to significantly different possible county clusters. This suggests that clusterings will be sensitive to when census data is collected and shows that the optimal clustering is not a fixed object. Finally, we have considered an alternative criteria to adhere to the Whole County Provision, and have found that the other criteria generally leads to fewer split counties and significantly more possible county clusterings. We have not, in this work, predicted how the choice of county clusterings, or county clustering criteria, may effect the partisan tilt in redistricting plans. We hope to consider this in future work.

A. Mathematical and Algorithmic Details
A.1. Definitions and Notation. The county graph G is a graph where each vertex represents a county, and there is an edge between two vertices if the corresponding counties are adjacent. In North Carolina, the ruling is that counties can be adjacent by water, but cannot be adjacent by a single point (i.e. catty-cornered, or rook, rather than queen, adjacency). If S is a set of counties, pop(S) is the sum of the populations of S's counties and |S| is the number of counties in S. pop(G) is thus the population of the entire state, and |G| = 100 for North Carolina.
G is to be divided into D districts such that the population of each district is within some fraction ε of the ideal district population, given by 1/Dth of the total state population. In North Carolina, ε = 0.05, i.e. up to 5% deviation is acceptable, and D = 50 for the state Senate and D = 120 for the state House.
A county clustering is a partition of G into contiguous sets of counties with information about how many districts these sets should contain, called county clusters. In other words, a county cluster is a tuple (set of counties, # of districts), and a clustering is a set of clusters. A cluster is called valid (or is said to have valid population) if its population divided by the number of districts it contains is within the error tolerance. In other words, if G is to have D districts, a cluster (C, d) is valid if where d is the number of districts in the cluster and C is the set of counties in the cluster. Note that in general the county set C may have multiple choices of d which satisfy these inequalities, especially if C has very large population. A clustering is valid if all of its clusters (C 1 , d 1 ), (C 2 , d 2 ), . . . are valid and As we have stated in Definition 1, the courts have defined a clustering A to be preferred over another B if A has more 1-county clusters than B, or, if they have the same number, if A has more 2-county clusters than B, and so on. In other words, if A n is the number of n-county clusters of A and similarly for B n , A is preferred over B if there is a k such that A k > B k and A j = B j for all j < k. An optimal or legal clustering is one that is either preferred over or ties with all other clusterings.
A.2. When Can Counties Be Clustered? We now continue the discussion started in Section 2. We give a more precise version of Theorem 1 from that section as well as its proof.
Theorem 1 (Enlarged Version). Let G be the county graph and S be an induced subgraph of G. If G is to contain a total of D districts, S has a valid clustering with a total of d ≤ D districts if and only if both of the following are true: • All connected components S k of S have valid population, i.e. for all S k there exists an integer d k such that • d lies between the sum of the minimum number of districts each connected component S k of S can contain and the related maximum, i.e.
Proof. Let {c 1 , c 2 , . . . } be the set of connected components of S. First, given the two statements of the theorem, we must prove that S can be clustered into d districts. We can rewrite the first statement of the theorem by noting that d k satisfies and since d k is integral, this is equivalently written It is then easy to see that given the second statement in the theorem, there is some choice of {d * k } which satisfy (1) for all k and where k d * k = d. Then {(S k , d * k )} is a valid clustering of S using d districts. Now for the reverse direction: given a valid clustering of S using a total of d districts, we must prove the two statements of the theorem. Let this valid clustering be the set of clusters L = {(C 1 , d 1 ), (C 2 , d 2 ), . . . }.
Then because the clustering L is valid, for all i, Then summing across connected components, has a valid population for all S k , which is the first condition in the theorem. Now solve (2) for (Ci,di)∈L k d i and add in floor and ceiling symbols, as we did to find (1). The result is Then summing over all k, and the theorem is complete.
Using this, we can quickly check each branch of the tree to see if the proposed clusters could possibly lead to a solution down the line.
A.3. Optimizing the Search Tree. In addition to the optimization given by the previous section, which effectively removes all branches which do not give solutions, two more major optimizations can be made to skip branches that give solutions that are known to not be optimal.
Let S be the set of counties remaining at some branch of the tree, and let V = {v 1 , v 2 , . . . v J } be the set of all valid n-county clusters with counties contained in S. Explicitly, v i ∈ V is a cluster made up of n counties, and the counties of v j may be shared with the counties of v i ; i.e. the county clusters of this set are not pairwise disjoint. Furthermore, the counties in the elements of V do not cover S in the general case, i.e. there may be some county in S for which is not contained in a valid n-county cluster found in S.
Suppose we are finding the largest set of disjoint n-county clusters S can contain, for some subgraph S ⊆ G, and we have already assigned as many 1-, 2-, ..., (n − 1)-county clusters as possible. Let {S k } be the set of connected components of S. Then for all S k where n < |S k | < 2n , we can remove from V all v j where v j ⊂ S k . This is because if any n-county cluster was assigned on S k , there would be less than n left-over counties, but we already know we cannot form any clusters of less than n counties.
Let V ⊆ V be the set of n-clusters that have not been pruned by the above optimization. Construct a new graph whose vertex set is counties in any cluster of V ; counties c i and c j share an edge if there is a v k ∈ V such that c i , c j ∈ v k . Let {V k } be the connected components of this new graph. Then if we know that there is a collection made up of T n-county clusters and the remaining unassigned counties, S, in the search tree is at depth t < T , we can skip the branch if we are unable to reach depth T based on V . To determine this, we note that each connected component of V , V k may be broken up into, at most, |V k |/n n-county clusters. Therefore we can skip the branch if as it will be impossible to find T − t more n-county clusters. Another way to look at this bound is that we first connect each county to other counties that could be part of the same n-county cluster. Connected components on this new graph represent disjoint sets of counties that don't interact at the n-county cluster level. Then, we skip any branch where the sum of the maximum number of n-county clusters of each connected component does not make up the difference between T and t, so any leaf under that branch is necessarily a worse solution than we have already found.
A.4. Complete Algorithm. The complete algorithm we use is as follows. First, we define Algorithm 2, which uses depth-first search and the optimizations described in the previous subsections. This performs the ( * ) operation noted in Algorithm 1. Then we have the iterative procedure defined in Algorithm 3 which outputs the set of all optimal clusterings, which is a fleshed out version of Algorithm 1. A 1 , A 2 , . . . and clustering B made of clusters B 1 , B 2

Consider clustering A made of clusters
Algorithm 2: Recursive depth-first search to find largest valid n-county cluster sets Input: (always the same) Populations, adjacencies, minimum and maximum district population. Input: Set of counties S, number of districts d, cluster size n. Input: (for recursive calls only) Set of valid clusters V , deepest point reached so far T (i.e. the number of n-county clusters we have found). Output: Set of all largest sets of n-county clusters with valid district numbers. if not called recursively then Let V be the set of all valid n-county clusters on S (these can be found simply by enumerating contiguous n-county subsets of S and checking for validity); Let T = 0; if the check given by Theorem 1 fails or the checks given in Appendix A.3 fails then return ∅; // Note that the checks in Appendix A.3 may mutate V .
Algorithm 3: Complete algorithm Input: Populations, adjacencies as a graph G, number of districts D. Output: All optimal clusterings. Let S be the vertex set of G, i.e. the set of all counties; Increment n; return O n−1 ; and the log is base 2 so the units are bits/county. The summand is defined to be 0 if A i and B j are disjoint. Note that V I and DC are both symmetrical in their arguments, and they are zero if A = B. V I is a true metric [Mei07], but DC does not satisfy the triangle inequality.
Consider year X where the populations of counties are X 1 , X 2 , . . . , X n and year Y where the respective populations are Y 1 , Y 2 , . . . , Y n . The average population change is defined as .
The reason that the denominator of the summand is the average of the two populations rather than the earlier population (as population changes are typically expressed) is to make the function symmetrical.

C. Other Optimality Metric
We now return to Theorem 2 which was previously discussed in Section 6. We give and prove a more precise version of this theorem C.1. Proof of Theorem 2 and Corollary 3. Suppose in a districting ξ, county c i contains parts or all of m i districts. Then the number of county splits D has is defined to be f (ξ) = i (m i − 1).
We introduce the notion of a minimal cluster. A cluster is minimal if it is valid and cannot be split into two smaller valid clusters. Note that the clustering which maximizes the total number of clusters contains only minimal clusters. Now we have the theorem: Theorem 2 (Enlarged Version). The clustering which maximizes the number of clusters, when districted optimally, also minimizes the number of county splits, with the exception of rare circumstances which impact the optimal districting. Proof. We proceed by "growing" a cluster to show that minimal clusters with d districts, when districted optimally, usually contribute d − 1 county splits. Consider a cluster (C, d), where C = {c 1 , c 2 , . . . c n }, a set of counties and d is a number of districts. The set C induces a subgraph of the county graph G. On this subgraph, draw a rooted spanning tree T . Order the vertices such that each vertex comes before its parent in the tree, and the root node is last. We will use this spanning tree and ordering to construct a districting with few county splits.
Consider the vertices in the order decided. We assign districts to a vertex according to three cases: the vertex is a non-root leaf, the vertex is a non-root non-leaf, and the vertex is the root.
Consider a leaf of the spanning tree c i . Let it contain some number of whole districts and exactly 1 "partial" district, which is shared with other counties. Draw the whole districts such that the remainder of the county is connected and contains at least part of the border with c i 's parent in the spanning tree.
Consider a county c j which is not a leaf and not the root. If the partial districts of two children of c j combine to less than a valid district and those children are "next to" one another, extend their partial districts inside c j , connecting them. Consider this combination as one "child" from this point forward. The requirement that the two children are "next to" each other means that, upon connecting them, c j is not broken into multiple disconnected components. Continue connecting children until as many as possible are connected. When we connect two children in this manner, we call it a good combine.
Next use c j 's population to complete as many partial districts as possible. After doing this, there are two cases: either all partial districts were completed and c j has population left over, or there are some number of uncompleted children and c j 's remaining population is less than the population of one district. In the former case, draw as many whole districts on c j as possible, leaving the remainder of c j connected and bordering its parent. The latter case has two subcases: there is either one child left, or multiple children left. If only one child, combine this child with the rest c j , with the new partial district bordering c j 's parent. If more than one child is left, it must be the case that partial districts which are next to each other sum to more than one district even without any of c j 's population. Call two adjacent children x and y. Use a small amount of c j to add to y's partial district ξ y and bridge over to x. Then, reapportion part of x's partial district ξ x to now be part of the ξ y , leaving ξ y with valid population and ξ x with less population. Note that ξ y now stretches across three counties: x, c j , and y. On the other hand, ξ x contains part of x and borders c j .
What we have done is combined children x and y, but added one county split. We will call this a bad combine. After doing a bad combine, it may be possible to do more good combines. Continue combining children, first doing as many good combines as possible, then one bad combine, then good combines, etc. until only one child remains. Combine this child with the rest of c j , whose partial district will end up bordering c j 's parent.
Repeat this process until just one county, the root county, remains. For this county, as before, connect children, complete as many partial districts as possible, then connect children, and so on. As all districts need only to have valid population in some range, rather than a specific value, there may be slightly too much or too little population in the root county to complete the districting. In these cases, return to lower layers of the tree and tweak boundaries slightly while staying with good population and not changing which districts are part of which counties, until the districting can be completed.
We will now compute the number of county split this process resulted in. Each edge of the spanning tree represents exactly one time that a district was part of multiple counties, with the exception of edges that were used in bad combines, which contribute one more for each bad combine. Given a districting which minimizes county splits, one can construct a spanning tree, vertex ordering, and sequence of combines such that this process results in the given districting, so performing this process for all spanning trees, orderings, and sequences of combines is guaranteed to minimize county splits. Although it is not difficult to construct examples where bad combines are necessary in all cases, 8 we think it is rare to require any in most real-world scenarios. Hence, if a cluster contains d districts, it usually can be districted using d − 1 county splits.
On the other hand, d − 1 county splits is a lower bound for minimal clusters containing d districts. For any districting, we can consider the subgraph generated by the districting, where two counties are adjacent in the subgraph if they are adjacent in G and a district contains parts of both of them. Because the cluster is minimal, this graph must be connected (otherwise the cluster could be split via the connected components into multiple valid clusters). Hence, it contains at least E ≥ |C| − 1 edges. By a similar calculation to above, the number of county splits is at least d + E − |C|, which is at least d − 1.
Now the total number of county splits in all clusters is counties ci The first term in the right-hand expression just the total number of districts and is constant. We suspect the inequality is almost always tight in real life; if this holds, then maximizing the number of clusters minimizes the number of county splits. 8 The smallest example contains four counties: one in the middle and three surrounding it, which receives two districts total.
None of the three outer counties touch each other, only the inner county. Then if the outer counties have populations 0.9, 0.9, and 0.2 of a district, with the inner county having negligible population, one bad edge combine is required. For a bad edge combine to be required, it is necessary that (1) the cluster receives multiple districts, and (2) for all spanning trees on the cluster, there is a vertex with "low population" and degree at least 3.
Corollary 3 (Enlarged Version). The clustering which maximizes the number of clusters, when districted optimally, also minimizes the number of traversals, with the exception of rare circumstances which impact the optimal districting.
Proof. Each edge in the spanning tree in the proof of Theorem 2 represents a district traversing a county-tocounty boundary (i.e. one traversal). The only additional requirement when optimally redistricting is that when partial districts in the children are combined with sections of the parent, the district within the parent must form a connected component. The only exception to an edge adding only one traversal are the bad combine's which introduce two traversals. The number of edges in a spanning tree is one less the number of vertices # of traversals in cluster = |C j | − 1 + (# of bad combines), where |C j | is the number of counties in cluster j. Therefore, the total number of traversals is The first term in the right-hand expression just the total number of counties and is constant. As mentioned in the previous proof, we suspect bad traversals to be rare, making the inequality almost always tight in real life; if this holds, then maximizing the number of clusters minimizes the number of traversals.
C.2. Fuzziness. The alterations to the algorithm are fairly straightforward. First, we change the metric used to bound branches, because simply comparing the number of clusters does not account for counties remaining. As an example, consider a partial clustering which uses 1 single-county cluster and no two-county clusters, compared to a partial clustering which used no single-county clusters but 2 two-county clusters. Which of these clusterings is preferred? Although the second clustering uses more clusters, it has fewer counties left to cluster so it may lead to fewer clusters in the long run. In fact, the first case has the potential to have one more three-county cluster than the second case. If the second case instead had 3 two-county clusters, or if it was known that the first case did not lead to one more three-county cluster, the second case would be better. With this in mind, the metric we now used is (n + 1)(# of clusters) + (# of unassigned counties) , where n is the number of counties in the clusters just assigned; in the previous example, n = 2. Note that adding one cluster containing n counties increases the first term by n + 1 and decreases the second term by n, for a net increase of 1. Under this metric, the two cases described previously would be equal after n = 2, though if n increased without the first case "catching up," the second case is better.
Next we introduce a "fuzziness" f . At each iteration, instead of taking all solutions which maximize this measure, we take all solutions that are at least f smaller than the best solution under this measure. Almost no changes are needed to adapt the bounds and optimizations to this measure. The first optimization in Appendix A.3 can still be done; although it is not guaranteed that all possible n-county clusters are taken at each step, at no future point in the algorithm will any other n-county clusters be created. The second optimization calculates the maximum number of clusters that could be added. Since each cluster increases the measure by 1, the calculation is exactly the same, though now we only skip a branch if we cannot make up the difference between t, the current depth, and T − f , the greatest depth reached minus f . The other alteration to the algorithm is in updating O (in the notation of Algorithms 2 and 3); at each step we now set O to be all elements of O and R which are at least as good as the best element of those sets minus f . C.2.1. Details of our analysis. In our analysis, a fuzziness of 3 was used for the Senate and a fuzziness of 2 was used for the House. This is because the House has a much larger number of valid clusters, leading to much larger search trees than the Senate.
Additionally, for both the Senate and House except in the 2010 census House, an optimization was made which reduces the number of cases considered, which is as follows. Consider a case where there are two regions, A and B, that each have two options, (A1 and A2) and (B1 and B2). Only two clusterings are needed to reconstruct all four possibilities, namely the clustering with A1 and B1 and the one with A2 and B2 (or A1, B2 and A2, B1). For larger sets, such as those dealt with by the relaxed algorithm, a significant amount of time and memory is saved by considering only the reduced set of clusterings which contain enough information to reconstruct the whole set.
The complete set can be reconstructed by comparing each pair of clusterings and checking for situations like in the above paragraph describes. Additional solutions that were reconstructed must also be compared against all other solutions, and so on until no more additional solutions are found.
C.2.2. How much fuzziness is required? Clearly, having a fuzziness of 100 (the number of counties in North Carolina) would suffice to provably find all maximum-cluster clusterings. However, the same result can be accomplished with much less fuzziness. First, consider the House using 2010 census data. It is known that there exists a clustering with 42 clusters. The legal precedent requires taking all 1-county clusters, of which there are 12. If a proposed partial clustering had no 2-county clusters, it could have at most 12+ 88/3 = 41 clusters. In fact, at least two 2-county clusters must be taken in order to tie 42; at least 5 must be taken to get 43. The most possible 2-county clusters is 17 for the House, so a fuzziness of 17 − 5 = 12 suffices to prove that 42 clusters is optimal.
In fact, this can be improved by noting that not all counties can be part of 3-county clusters, and some must be part of 4-or 5-county clusters. In particular, the westernmost county Cherokee must combine with at least 3 others, as do many of the eastern counties with low population. In fact, one can show that at least 14 counties must be part of a cluster with at least 4 counties. All this means a fuzziness of 17 − 10 = 7 suffices to prove 42 clusters is optimal (43 clusters could be reached under these constraints, for example, by 12 1-county clusters, 10 2-county clusters, 18 3-county clusters, 2 4-county clusters, and a 6-county cluster).
It is harder to find good bounds for the Senate. In particular, the additional analysis in the previous paragraph is far more complicated as a result of an increased number of possibilities. Here is a simpler analysis. There is a unique 1-county cluster (Mecklenburg). In the case that no 2-county clusters are taken, at most 16 3-county clusters can be taken, resulting in at most 29 clusters (11 4-county and one 7-county in addition to Mecklenburg and the 16 3-county clusters), so a fuzziness of 13 (based on the maximum number of 2-county clusters) suffices.
Reaching a fuzziness of 7 for the House or 13 for the Senate is out of reach without major alterations to the algorithm. Both the time and space complexity are empirically at least exponential in fuzziness, so it would take orders of magnitude longer and more memory to run the algorithm when compared to the analysis we completed using fuzziness 2 and 3. More research would also be required to develop a good general algorithm for bounding fuzziness in order to extend this algorithm to other population datasets.