A Mathematical Connection Between Single-Elimination Sports Tournaments and Evolutionary Trees

Summary. How many ways are there to arrange the sequence of games in a single-elimination sports tournament? We consider the connection between this enumeration problem and the enumeration of “labeled histories,” or sequences of asynchronous branching events, in mathematical phylogenetics. The possibility of playing multiple games simultaneously in different arenas suggests an extension of the enumeration of labeled histories to scenarios in which multiple branching events occur simultaneously. We provide a recursive result enumerating game sequences and labeled histories in which simultaneity is allowed. For a March Madness basketball tournament of 68 labeled teams, the number of possible sequences of games is $\sim 1.91 \times 10^{78}$ if arbitrarily many arenas are available, but only $\sim 3.60 \times 10^{68}$ if all games must be played sequentially in the same arena.

The National Collegiate Athletic Association men's and women's basketball tournaments, colloquially known as "March Madness" after the month during which most of their games take place, are single-elimination sports tournaments with 68 teams from colleges across the United States. Each team is assigned an initial opponent, with subsequent opponents determined by the outcomes of a sequence of specified games. A team that loses a game plays no subsequent games, so that 67 games are played until a single winning team remains.
In typical years, the games are played in multiple, distant locations. Games are scheduled in many arenas, often concurrently. The teams in a tournament are divided into four regional groups of approximately equal size (18, 18, 16, and 16 in the 2019 men's tournament, for example), and they play their games within the regional groups until four teams remain, one from each region. The "Final Four" teams play the last three games in a single arena, revealing the champion of the tournament.
The 2020 tournaments were canceled due to the COVID-19 pandemic. For the 2021 tournaments, with the pandemic continuing, the organizers sought to limit teams' travel, arranging for all the games to be played in Indiana in the men's tournament and San Antonio in the women's tournament. This circumstance inspires a question: Suppose all the games in a single-elimination sports tournament are played sequentially in the same arena. In how many possible sequences can the games be played?
In an actual March Madness tournament, the sequence of games is divided into "rounds." First, the "First Four" round is played, reducing the 68 teams to a fully symmetric arrangement of 64 teams. Next, for each $i$ from 1 to $k - 1$, with $k = 6$, each remaining team plays its $i$th game (not counting the "First Four" games) before any team plays its $(i + 1)$st game. However, an equally valid sequence would play all the games in one of the four disjoint regional groups of teams before any game is played in the other three groups; such a sequence would identify one member of the "Final Four" before the teams in other regions have played any games at all. Many different sequences of games leading to the "Final Four" can be envisioned.

Figure 1. A bracket with $2^3 = 8$ teams. Teams labeled T1 to T8 appear as circles, and games appear as squares. The number of games played is $2^3 - 1 = 7$. Note that our use of the term bracket refers to a tournament structure after teams are assigned to the leaves, but before any games are played; in the setting of sports tournaments, a bracket can also include predicted or actual winners of the games.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.

MSC: 92D15
We will see that the question of enumerating sequences of games reveals connections between a familiar sporting event and problems of evolutionary biology. Although the actual 2021 tournaments used multiple arenas rather than a single one (all in Indiana in the men's tournament, all in San Antonio in the women's), the difference from a typical year (with games spread across 14 arenas in 11 states plus the District of Columbia in the 2019 men's tournament, for example) suffices to inspire a mathematical connection.

Symmetric single-elimination tournaments
We make the enumeration problem precise. For now, ignore the "First Four" games, and consider a symmetric arrangement of $2^k$ distinguishable teams, each of which has the potential to win the tournament by winning $k$ games. Assign each team its sequence of potential opponents. Next, bijectively assign each of the $2^k - 1$ scheduled games a label from $\{1, 2, \ldots, 2^k - 1\}$, subject to a constraint. In particular, a game can only be played after its teams have been determined: the label for a game must exceed the labels for games that determine its teams. We term the tree whose leaves encode the teams and whose interior nodes encode the games a bracket (Figure 1).
First, we consider a scenario with a single arena. In how many possible sequences can the $2^k - 1$ games of a bracket be played in a single arena? The problem consists in counting permissible bijections between the $2^k - 1$ games and the labels $\{1, 2, \ldots, 2^k - 1\}$.
To develop an understanding of the problem, we examine small $k$ up to $k = 6$, corresponding to the March Madness men's tournaments from 1985 through 2000 and the women's tournaments from 1994 through 2021; the number of men's teams was increased from 64 to 65 in 2001 and then 68 in 2011, and the number of women's teams was increased from 64 to 68 in 2022. For $k = 1$, there is a unique ordering of one game. For $k = 2$, either of the two non-championship games can be played first, then the other, and then the championship game, so that there are 2 sequences.
For $k = 3$, consider the bracket in Figure 1; for convenience, we index games with letters rather than numbers. First, note that any number of games among {A, B, C} in the "left" sub-bracket can be played before the first game in the "right" sub-bracket (necessarily D or E). Having fixed the schedule in each sub-bracket, we can count the number of ways to sequence the games from the sub-brackets to form a complete sequence of all games; this number is the same irrespective of the sequences within sub-brackets. For illustration, suppose the order of games for the left sub-bracket is B, then A, then C. For the right sub-bracket, suppose the order is D, then E, then F.
For fixed sequences of games in the left and right sub-brackets, the number of ways of sequencing the games from the right sub-bracket in relation to the games from the left sub-bracket can be counted by choosing which 3 ranks among {1, 2, 3, 4, 5, 6} are assigned to games in the left sub-bracket. For example, suppose the ranks 1, 2, 6 are assigned to the left sub-bracket. With the sub-bracket game orders B, A, C and D, E, F already chosen, the complete game schedule is B, A, D, E, F, C, followed by G, which must be played last.
Hence, $\binom{6}{3} = 20$ valid game sequences exist for each pair of choices for the order of games in the left and right sub-brackets of $2^2 = 4$ teams. We have already seen, for $k = 2$, that 2 valid sequences of games exist for each of these sub-brackets. Multiplying the number of ways of sequencing the games in the left and right sub-brackets in relation to each other by the numbers of sequences of games in the left and right sub-brackets themselves, the number of possible sequences of games is $20 \times 2 \times 2 = 80$.
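The count of 80 for k = 3 can be checked by brute force. The following Python sketch enumerates all 7! orderings of the seven games and keeps those in which every game follows the games that determine its teams. The prerequisite table is an assumption reconstructed from the text (A and B feed C, D and E feed F, and C and F feed the championship game G); the names `PREREQS` and `is_valid` are illustrative only.

```python
from itertools import permutations

# Assumed structure of the 8-team bracket: each game maps to the games
# that must be finished before it can be played.
PREREQS = {
    "A": [], "B": [], "C": ["A", "B"],   # left sub-bracket
    "D": [], "E": [], "F": ["D", "E"],   # right sub-bracket
    "G": ["C", "F"],                     # championship game
}

def is_valid(order):
    """True if every game appears after all of its prerequisite games."""
    position = {game: i for i, game in enumerate(order)}
    return all(
        position[pre] < position[game]
        for game, pres in PREREQS.items()
        for pre in pres
    )

# Keep every ordering of the seven games that respects the constraints.
valid = [order for order in permutations(PREREQS) if is_valid(order)]
print(len(valid))
```

Under these assumptions, the list `valid` contains the schedule B, A, D, E, F, C, G discussed above, and its length agrees with the product 20 × 2 × 2.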
This example leads to a recurrence. Let $S(k)$ denote the number of valid game sequences for a tournament of $2^k$ teams. We saw $S(1) = 1$, $S(2) = 2$, and $S(3) = 80$ via $S(3) = \binom{6}{3}[S(2)]^2$. For $2^k$ teams, there are $[S(k-1)]^2$ pairs of sequences for the two sub-brackets that produce the two teams in the championship game. The binomial coefficient that counts the number of ways that the games of one sub-bracket can be placed in relation to the games of the other sub-bracket is $\binom{2^k - 2}{2^{k-1} - 1}$. Hence, we have

$$S(k) = \binom{2^k - 2}{2^{k-1} - 1}\,[S(k-1)]^2. \tag{1}$$

This recurrence, with $S(1) = 1$, has solution

$$S(k) = \frac{(2^k - 1)!}{\prod_{i=1}^{k} (2^i - 1)^{2^{k-i}}}. \tag{2}$$

Using equations (1) or (2), we find that $S(4) = 21{,}964{,}800$, $S(5) \approx 7.48 \times 10^{22}$, and $S(6) \approx 2.61 \times 10^{63}$. By a nice coincidence, the quantity $S(6)$, counting valid sequences of games in a tournament of 64 teams, has 64 digits. Note that no particular requirement exists that a bracket have $2^k$ teams, and the problem of counting possible sequences of games in a bracket can be considered for arbitrary single-elimination tournament structures.
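As a numerical check, the following Python sketch evaluates S(k) in two ways: by the recurrence S(k) = C(2^k − 2, 2^(k−1) − 1) · [S(k − 1)]^2 with S(1) = 1, and by the closed form (2^k − 1)! / ∏_{i=1}^{k} (2^i − 1)^(2^(k−i)), which is the form that follows from iterating the recurrence. The function names `s_recursive` and `s_closed` are illustrative and do not come from the article.

```python
from math import comb, factorial

def s_recursive(k):
    """S(k) from the recurrence, with base case S(1) = 1."""
    if k == 1:
        return 1
    interleavings = comb(2**k - 2, 2 ** (k - 1) - 1)
    return interleavings * s_recursive(k - 1) ** 2

def s_closed(k):
    """S(k) from the closed-form product over bracket levels."""
    denominator = 1
    for i in range(1, k + 1):
        # 2^(k-i) games each decide a sub-bracket of 2^i teams.
        denominator *= (2**i - 1) ** (2 ** (k - i))
    return factorial(2**k - 1) // denominator

for k in range(1, 7):
    print(k, s_recursive(k), s_recursive(k) == s_closed(k))
```

Python integers have arbitrary precision, so the values for k = 5 and k = 6 are computed exactly rather than approximately.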

Labeled histories in evolutionary biology
In fact, the problem of counting sequences of games in a bracket is equivalent to a problem of evolutionary biology, that of enumerating the labeled histories that are compatible with a labeled topology.

Figure 2 (caption fragment). … (B, A, D, E, F, C, G). Note that in (A) and (B), the temporal sequence of non-leaf nodes is not needed for describing labeled topologies, and labels for those nodes are omitted.
In evolutionary biology, species are related to each other by descent from a common ancestor. The ancestor-descendant relationships can be represented by a tree structure. Consider a rooted binary tree $T$ with $y$ leaves, bijectively labeled by the elements of a label set containing $y$ elements. In the context of evolution, each label represents the label for a "taxon," or a distinctive biological group such as a species. The tree describes the descent relationships among the taxa. A rooted labeled binary tree is termed a labeled topology (Figure 2A,B). With the precise definition of "bracket" given above, a bracket is exactly a labeled topology, replacing the leaf labels for biological taxa with the team names in the tournament.
A second structure of interest in evolutionary biology is that of a labeled history. Consider a rooted binary tree $T$ as a directed graph with edges that point from the root toward the leaves. A node $w$ of $T$ is said to be ancestral to a node $v$ if $w$ lies on the path from the root to $v$; $v$ is said to be descended from $w$. Trivially, a node is both ancestral to, and descended from, itself. Given a labeled topology for a rooted binary tree $T$ with $y$ leaves, a labeled history for $T$ is a bijection $\sigma$ from $\{1, 2, \ldots, y - 1\}$ to the internal (non-leaf) nodes of $T$, satisfying the constraint that if node $v$ is descended from node $w$ in $T$, then $\sigma^{-1}(v) \leq \sigma^{-1}(w)$ (Figure 2C,D).
Observe that this constraint on labeled histories is precisely the constraint that makes a sequence of games valid for a tournament: valid game sequences compatible with a tournament correspond to labeled histories compatible with a labeled topology. Hence, we can use ideas from the mathematical study of evolutionary trees, or mathematical phylogenetics, for sports tournaments, and vice versa.
Note that in an evolutionary tree, the branching of one lineage into two is usually treated as taking place instantaneously. In the analogy with sports tournaments, it is convenient to also assume that each game is played instantaneously. Note also that for alignment with the sports analogy, in our definition of labeled histories, the numbering of internal nodes increases from leaves toward the root, but in mathematical phylogenetic studies, the opposite convention is often adopted.

Review of mathematical phylogenetics results.
The problem of enumerating labeled histories compatible with a labeled topology can be solved recursively [15]. Consider a labeled topology $T$ with subtrees $L$ and $R$ immediately descended from the root, where one subtree ($L$) is arbitrarily assigned to be the "left" subtree and the other ($R$) is arbitrarily assigned to be the "right" subtree. Let $|T|$ be the number of leaves of $T$, so that $T$ has $|T| - 1$ internal nodes. For each pair consisting of a labeled history for $L$ and a labeled history for $R$, we must count the number of ways of placing into a sequence the $|L| - 1$ internal nodes of $L$ and the $|R| - 1$ internal nodes of $R$. Reserving the label $|T| - 1$ for the root node of $T$, we have that $|L| - 1$ of the numbers $\{1, 2, \ldots, |T| - 2\}$ must be assigned to the internal nodes of $L$ and the rest to the internal nodes of $R$. This assignment can be made in $\binom{|T| - 2}{|L| - 1}$ ways. Hence, multiplying by the numbers of labeled histories for $L$ and $R$, and noting that the number of labeled histories is $N(T) = 1$ for labeled topologies $T$ with 1 leaf (and for those with 2 and 3 leaves), we have a recurrence:

Theorem 1. For a labeled topology $T$, the number of labeled histories $N(T)$ is

$$N(T) = \begin{cases} 1, & |T| = 1, \\ \binom{|T| - 2}{|L| - 1}\, N(L)\, N(R), & |T| \geq 2. \end{cases}$$

This recurrence can produce a non-recursive formula [5]. For subtrees $L$ and $R$, let their subtrees be $L_\ell$, $L_r$ and $R_\ell$, $R_r$, respectively. Applying the recurrence,

$$N(T) = \binom{|T| - 2}{|L| - 1} \binom{|L| - 2}{|L_\ell| - 1} \binom{|R| - 2}{|R_\ell| - 1}\, N(L_\ell)\, N(L_r)\, N(R_\ell)\, N(R_r).$$

Iterating until each subtree in the expression has 1, 2, or 3 leaves, we have a product of binomial coefficients, one for each internal node of $T$. To state the result precisely, denote by $V^0(T)$ the set of internal nodes of $T$, including the root. For each $v \in V^0(T)$, denote by $m(v)$ the number of leaves in the subtree rooted at $v$, and denote by $\ell(v)$ the number of leaves in the left subtree of the subtree rooted at $v$. Then

$$N(T) = \prod_{v \in V^0(T)} \binom{m(v) - 2}{\ell(v) - 1}.$$

In this product, each internal node other than the root appears in the numerator of one binomial coefficient and the denominator of another; the root appears only in a numerator. Multiplying by $(|T| - 1)/(|T| - 1)$, the expression can be simplified.

Theorem 2. For a labeled topology $T$, the number of labeled histories $N(T)$ is

$$N(T) = \frac{(|T| - 1)!}{\prod_{v \in V^0(T)} [m(v) - 1]}.$$
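The two theorems can be compared numerically. The sketch below assumes a bracket encoded as a nested tuple, where a leaf is `None` and an internal node is a pair `(left, right)`; leaf labels are omitted because they do not affect the count for a fixed topology. It evaluates N(T) by the recurrence N(T) = C(|T| − 2, |L| − 1) · N(L) · N(R) and by the product form (|T| − 1)! / ∏_v [m(v) − 1]. The encoding and all function names are illustrative choices, not part of the article.

```python
from math import comb, factorial

def leaves(t):
    """Number of leaves |T| of a nested-tuple bracket."""
    return leaves(t[0]) + leaves(t[1]) if isinstance(t, tuple) else 1

def n_recursive(t):
    """N(T) from the recurrence over the left and right subtrees."""
    if not isinstance(t, tuple):
        return 1
    left, right = t
    ways = comb(leaves(t) - 2, leaves(left) - 1)
    return ways * n_recursive(left) * n_recursive(right)

def internal_sizes(t):
    """List of m(v), the leaf count below each internal node v."""
    if not isinstance(t, tuple):
        return []
    return [leaves(t)] + internal_sizes(t[0]) + internal_sizes(t[1])

def n_product(t):
    """N(T) as (|T| - 1)! divided by the product of m(v) - 1."""
    denominator = 1
    for m in internal_sizes(t):
        denominator *= m - 1
    return factorial(leaves(t) - 1) // denominator

def symmetric(k):
    """Fully symmetric bracket on 2^k teams."""
    return None if k == 0 else (symmetric(k - 1), symmetric(k - 1))

def caterpillar(n):
    """Maximally unbalanced bracket on n teams."""
    t = None
    for _ in range(n - 1):
        t = (t, None)
    return t

for k in range(1, 7):
    t = symmetric(k)
    print(2**k, n_recursive(t), n_recursive(t) == n_product(t))
print(n_recursive(caterpillar(8)))
```

The caterpillar bracket is included as a contrast: each game there must wait for the previous one, so only one order is possible.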
As an example of the theorem, consider the case in which $T$ is the labeled topology that corresponds to the symmetric bracket for 64 teams (Figure 3). Considering all internal nodes $v$ in $T$, the tree contains one node with 64 descendant leaves, two nodes with 32 descendant leaves, four nodes with 16 descendant leaves, eight nodes with 8 descendant leaves, 16 nodes with 4 descendant leaves, and 32 nodes with 2 descendant leaves. Hence, the product over internal nodes in Theorem 2 is $63^1 \cdot 31^2 \cdot 15^4 \cdot 7^8 \cdot 3^{16} \cdot 1^{32}$, so that

$$N(T) = \frac{63!}{63^1 \cdot 31^2 \cdot 15^4 \cdot 7^8 \cdot 3^{16} \cdot 1^{32}}.$$

The full March Madness men's tournament bracket from 2021, with the "First Four" games included, adds two games each to two of the four sub-brackets that produce the "Final Four" teams, so that two sub-brackets contain 18 teams and the other two contain 16 teams.
This modification of the symmetric bracket for 64 teams produces 4 more internal nodes (Figure 4). It also changes the numbers of descendants for many of the nodes in the resulting tree $T$. Considering all internal nodes $v$, $m(v)$ now takes values 68, 36, 32, 18, 16, 9, 8, 5, 4, 3, and 2. The numbers of nodes with these numbers of descendants are 1, 1, 1, 2, 2, 4, 4, 4, 12, 4, and 32, respectively, for a total of 67 internal nodes, one for each game. The expression in Theorem 2 becomes

$$N(T) = \frac{67!}{67^1 \cdot 35^1 \cdot 31^1 \cdot 17^2 \cdot 15^2 \cdot 8^4 \cdot 7^4 \cdot 4^4 \cdot 3^{12} \cdot 2^4 \cdot 1^{32}} \approx 3.60 \times 10^{68}.$$

Uses of labeled histories.
Labeled histories, sometimes termed coalescence sequences or ranked labeled trees, appear frequently in mathematical phylogenetics. They are among the main classes of tree structures used in assessing probabilistic outcomes of assumptions about evolution [28, p. 47]. It is often convenient for evolutionary models to assume that each sequence of branching events that could produce a rooted binary tree for a set of labeled species is equally likely; this assumption, that of the Yule or Yule-Harding model in phylogenetics [1, 13, 15, 20, 27, 28, 30, 33, 35], produces a uniform distribution on labeled histories.
Computations concerning features of tree shape for evolutionary trees often evaluate the probability that such features are produced under the Yule-Harding model, so that they directly or indirectly examine the fraction of labeled histories on $y$ leaves that possess a given feature, or the probability distribution of a quantity across labeled histories [3, 5-8, 10, 12, 16, 18, 22, 24, 25, 29, 31, 34, 36, 37]. Mathematical phylogenetics computations have used combinatorial results on the set of labeled histories for $y$ species, for example employing a space of labeled histories with a notion of distance between them [26] and a characterization of the labeled topologies that possess the largest number of labeled histories [9, 11, 14]. In some situations, phylogenetic research reports results on labeled histories that have been obtained in equivalent scenarios in computer science, involving concepts such as binary search trees [

Simultaneous games, simultaneous binary mergers
Not only does the biological setting introduce a result for the sports tournament sequence enumeration problem, the sports context also introduces a new idea that has not often been considered in the biological setting of evolutionary trees: simultaneity. If the games of a tournament are played in multiple arenas, as is true of March Madness in typical years, then games can be played simultaneously.
Suppose $\ell$ arenas are available. A sequence of games is now permitted to possess "ties," where a tie represents games played in different arenas at the same time. How many tie-permitting sequences are possible for a tournament with bracket $T$ if $\ell$ arenas are available? We call this quantity $N_\ell(T)$; $N_1(T)$ gives the case with one arena, denoted by $N(T)$ in Theorems 1 and 2. We enumerate tie-permitting sequences in the "infinite-arenas" setting, obtaining $N_\infty(T)$. The number of available arenas need not actually be infinite. It need only satisfy $\ell \geq \lfloor |T|/2 \rfloor$, where $|T|$ is the number of teams in bracket $T$, as no more than $\lfloor |T|/2 \rfloor$ games can ever be played simultaneously in $T$.

Figure 3. A labeled topology with 64 labeled leaves. This labeled topology corresponds to a 64-team bracket, with teams numbered 1 to 64; the root node is the championship game.

Figure 4. A labeled topology with 68 labeled leaves. This labeled topology corresponds to the 68-team bracket that was used in the March Madness men's tournament in 2021.
For convenience, we refer to a collection of simultaneous games as an "event." A feature of the infinite-arenas context is that events induced by two game sequences, one for the left sub-bracket of a node and the other for the right sub-bracket, can be combined into a joint event when forming the complete game sequence, without occupying more arenas than are available. By contrast, if, for example, $\ell = 2$, then a 2-game event from the left sub-bracket cannot be simultaneous with an event from the right sub-bracket, as this joint event would occupy at least 3 arenas.
To state a succinct recurrence for $N_\infty(T)$, we let $E_\infty(T, n)$ be the number of tie-permitting sequences on the bracket $T$ which consist of exactly $n$ events, so that the sequence occupies exactly $n$ distinct points in time. Note that $E_\infty(T, |T| - 1)$ is equal to $N_1(T)$, or, equivalently, the quantity denoted by $N(T)$ in Theorems 1 and 2. Let $\delta(T)$ denote the depth of $T$, the maximum length of a path from the root of $T$ to one of its leaves. We have

$$N_\infty(T) = \sum_{n = \delta(T)}^{|T| - 1} E_\infty(T, n). \tag{3}$$

We can now give the following recurrence for $E_\infty(T, n)$, noting that for a trivial bracket $T$ consisting of a single team, we have that $E_\infty(T, 0) = 1$ and $E_\infty(T, n) = 0$ for $n \neq 0$.
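Theorem 3. Let $T$ be a bracket with left sub-bracket $L$ and right sub-bracket $R$.

(i) If $|L| = |R| = 1$, then $E_\infty(T, 1) = 1$ and $E_\infty(T, n) = 0$ for $n \neq 1$.

(ii) Otherwise,

$$E_\infty(T, n) = \sum_{a = \delta(L)}^{\min(|L| - 1,\, n - 1)} \; \sum_{b = \max(\delta(R),\, n - 1 - a)}^{\min(|R| - 1,\, n - 1)} X \, E_\infty(L, a) \, E_\infty(R, b),$$

where $X$ is the trinomial coefficient

$$X = \binom{n - 1}{n - 1 - b,\; n - 1 - a,\; a + b - n + 1} = \frac{(n - 1)!}{(n - 1 - b)!\,(n - 1 - a)!\,(a + b - n + 1)!}.$$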
Proof. For (i), if a bracket $T$ has only one team in the left sub-bracket and one team in the right sub-bracket, then a single game is played ($n = 1$), and trivially, only one sequence exists for this game. Hence, $E_\infty(T, 1) = 1$ and $E_\infty(T, n) = 0$ for $n \neq 1$.
For the recursive case (ii), we count sequences that merge a sequence of games from the left sub-bracket L and a sequence of games from the right sub-bracket R. The last event in any such sequence is a single game, corresponding to the root node of T .
In a sequence of $n$ events, the number $a$ of events in $L$ can range from $\delta(L)$ to $|L| - 1$, and the number $b$ of events in $R$ can range from $\delta(R)$ to $|R| - 1$. We additionally require $a, b \leq n - 1$ to ensure that at most $n - 1$ events are used for the bracket $T$, excluding its root, and $a + b \geq n - 1$ to ensure that at least $n - 1$ events are used.
The number of sequences for the left sub-bracket containing $a$ events is $E_\infty(L, a)$, and the number for the right sub-bracket containing $b$ events is $E_\infty(R, b)$. It remains to prove that the trinomial coefficient correctly counts the number of possible ways to form a sequence of $n - 1$ events for the bracket $T$ (excluding the root) by combining the $a$ events of the left sub-bracket and the $b$ events of the right sub-bracket in an order-preserving, tie-permitting sequence. Each of the $n - 1$ events must be formed in one of three ways: an event among the $a$ events of the left sub-bracket occurs and is not simultaneous with an event among the $b$ events of the right sub-bracket, an event among the $b$ events of the right sub-bracket occurs and is not simultaneous with an event among the $a$ events of the left sub-bracket, or an event among the $a$ events of the left sub-bracket is simultaneous with an event among the $b$ events of the right sub-bracket. The numbers of events in these three disjoint categories are $(n - 1) - b$, $(n - 1) - a$, and $a + b - (n - 1)$, respectively. Because the orders of events within $L$ and within $R$ are fixed, a merged sequence is determined by choosing which of the $n - 1$ positions fall in each category, and the number of such choices is the trinomial coefficient $X$.
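The counting argument in the proof can be turned into a short computation. The Python sketch below assumes the same nested-tuple encoding of brackets (a leaf is `None`, an internal node is a pair) and computes, for each bracket, the whole vector of values E∞(T, n) for n = 0, …, |T| − 1 by merging a events from the left sub-bracket with b events from the right into m = n − 1 events, weighted by the trinomial coefficient m! / [(m − b)! (m − a)! (a + b − m)!]. The 68-team bracket is an assumption reconstructed from the m(v) values given in the text (each 18-team region splits as 9 + 9, each 9 as 5 + 4, each 5 as 3 + 2), since Figure 4 is not reproduced here; all function names are illustrative.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def event_counts(t):
    """Tuple whose entry n is E_inf(t, n), for n = 0 .. |t| - 1."""
    if not isinstance(t, tuple):
        return (1,)  # a single team: one (empty) sequence with 0 events
    left, right = event_counts(t[0]), event_counts(t[1])
    counts = [0] * (len(left) + len(right))  # indices 0 .. |T| - 1
    for a, e_left in enumerate(left):
        for b, e_right in enumerate(right):
            if e_left == 0 or e_right == 0:
                continue
            # m = n - 1 events precede the root game: max(a, b) <= m <= a + b
            for m in range(max(a, b), a + b + 1):
                x = factorial(m) // (
                    factorial(m - b) * factorial(m - a) * factorial(a + b - m)
                )
                counts[m + 1] += x * e_left * e_right
    return tuple(counts)

def n_infinite(t):
    """N_inf(t): total over all possible numbers of events."""
    return sum(event_counts(t))

def symmetric(k):
    """Fully symmetric bracket on 2^k teams."""
    return None if k == 0 else (symmetric(k - 1), symmetric(k - 1))

def region_of_18():
    """Assumed 18-team region: two 9-team halves, each with one play-in game."""
    play_in = (None, None)
    five = ((play_in, None), symmetric(1))
    nine = (five, symmetric(2))
    return (nine, nine)

march_68 = ((region_of_18(), region_of_18()), symmetric(5))

print(event_counts(symmetric(3)))   # by number of events, 8-team bracket
print(n_infinite(symmetric(3)))
print(event_counts(march_68)[-1])   # single-arena count, 68 teams
print(n_infinite(march_68))         # arbitrarily many arenas, 68 teams
```

Computing the full vector once per sub-bracket, with caching on the tuple itself, avoids recomputing identical sub-brackets; the last entry of each vector is the single-arena count, since every event is then a single game.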
In the biological context, the setting of Theorem 3 corresponds to labeled histories with ties, in which, looking backward in time, multiple pairs of lineages can coalesce simultaneously. How many tie-permitting labeled histories are possible for a labeled topology if arbitrarily many pairwise coalescences can occur simultaneously? In mathematical evolutionary biology, models that permit simultaneous pairwise coalescences, or simultaneous binary mergers, are sometimes studied [2, 23]. Such models relax the assumption of the Yule-Harding model for tree shape that coalescences must be asynchronous. They are useful when considering genealogies of genetic lineages sampled in small populations in discrete time; in such settings, it is not improbable that multiple pairs of lineages will coalesce in the same discrete time step. The problem of counting sequences of games for single-elimination tournaments when multiple arenas are available is the problem of counting tie-permitting labeled histories when arbitrarily many simultaneous binary mergers are permissible. Hence, Theorem 3, allowing simultaneous binary mergers in the enumeration of labeled histories, generalizes the recursion of Theorem 1 that counts labeled histories in the standard setting when ties are not permitted.
For tournaments with $2^k$ teams, we can compare the number of sequences of games in the arbitrary-arenas case in Theorem 3 to the single-arena case in equation (2) (Table 1). In the case of $2^2 = 4$ teams, the arbitrary-arenas case permits one additional sequence that cannot occur with only a single arena: a sequence in which two games are played simultaneously in two arenas. For $2^3 = 8$ teams, 365 sequences are possible when arbitrarily many arenas are available, compared to the 80 possible with only one arena.
The number of sequences in the arbitrary-arenas case grows rapidly in relation to the number in the single-arena case. For $2^4 = 16$ teams in a symmetric bracket, as used for the single-elimination round of World Cup soccer, the number of sequences is 1,323,338,487 for the arbitrary-arenas case compared to 21,964,800 sequences for the single-arena case. For $2^7 = 128$ teams in a symmetric bracket, as used for Grand Slam tennis tournaments, the number is approximately $5.84 \times 10^{182}$ for the arbitrary-arenas case compared to $4.10 \times 10^{163}$ for the single-arena case. For March Madness, the 68-team design in Figure 4 gives $1.91 \times 10^{78}$ sequences compared to $3.60 \times 10^{68}$.
Tables 2 and 3 show the numbers of game sequences for all tournament designs with 8 or fewer teams, that is, the numbers of tie-permitting labeled histories for all labeled topologies with at most 8 taxa. In these tables, for a tree $T$, the number of sequences for the single-arena case can be seen in the column for $n = |T| - 1$ events, as each of the $|T| - 1$ games represents a distinct event.
In Tables 2 and 3, for fixed trees, we can compare the numbers of sequences with different numbers of events. The number of sequences tends to increase as the number of events increases from its minimum, $\delta(T)$, reaching a peak before decreasing as the number of events reaches its maximum, $|T| - 1$. For example, for the fully symmetric bracket with 8 teams, the number of sequences increases from 1 with 3 events to 22 with 4 events, 102 with 5 events, and 160 with 6 events, before declining to 80 with 7 events. The minimal number of events, $\delta(T)$, introduces a constraint that requires many specific games to be played simultaneously; somewhat larger values for the number of events are less constrained, allowing larger numbers of game sequences. It will be of interest to more formally explore this pattern of change with $n$ for fixed $T$.
For a specified number of teams, considering different trees, the number of tie-permitting sequences tends to increase with an increasing amount of "balance" in the tree structure. For example, with 8 teams, the "caterpillar" bracket in the first row of Table 3 possesses only one sequence, whereas the fully symmetric bracket in the last row possesses the largest number, 365. This observation, that the number of tie-permitting sequences increases with tree balance, considered informally, follows a pattern seen in the single-arena case [14]. It suggests the possibility of searching for results concerning the tree shapes that produce the largest number of labeled histories when ties are permitted.

Conclusion
We have illuminated a connection between labeled histories in phylogenetics and sequences in which the games of a single-elimination tournament can be played in a single arena. By also obtaining a recursion that counts the number of sequences of games for single-elimination tournaments when arbitrarily many arenas are available, we have identified the equivalent problem of counting tie-permitting labeled histories when simultaneous coalescence events are permitted, and we have provided a recursive solution.
On April 4, 2021, Stanford defeated Arizona 54-53 at the Alamodome in San Antonio in the last coalescence of the 2021 March Madness women's tournament. The following day, Baylor defeated Gonzaga 86-70 at Lucas Oil Stadium in Indianapolis in the corresponding last coalescence for the men's tournament. Interestingly, the asynchronicity of these championship games, enabling audiences to watch both of them, suggests a more general question of counting game sequences that either asynchronously interleave the games of multiple tournaments or that permit synchronous games across tournaments: the question of counting labeled histories, without and with simultaneous binary mergers, for forests of labeled topologies.
An amusing problem concerning a resourceful attempt to conduct sporting events under the constraints induced by the COVID-19 pandemic has revealed new results for combinatorial structures in evolutionary biology. And if the schedulers of March Madness ever find themselves in circumstances that demand that all the games in a 68-team tournament (as in Figure 4) must be played in one arena, they can rest assured that although the number of available game sequences for a specified bracket is greatly reduced from the 1,905,458,855,466,636,787,971,925,146,177,334,793,473,753,765,414,856,950,607,419,556,152,726,849,614,067 that are permissible in the case that arbitrarily many arenas are available, the single-arena scenario still leaves them with 360,410,120,625,822,474,490,741,822,944,015,962,624,736,196,480,481,624,064,000,000,000,000 possibilities.

TABLE 1:
The number of sequences of games for symmetric brackets of $2^k$ teams, $k = 1, 2, 3, 4, 5, 6, 7$, with one arena (Theorem 2) or arbitrarily many arenas (Theorem 3), or, equivalently, the number of labeled histories for symmetric labeled topologies of $2^k$ taxa, either disallowing or allowing ties in coalescence times. For $k = 5$ to $k = 7$, the values are approximate. For the 68-team bracket in Figure 4, the number of game sequences is $\sim 3.60 \times 10^{68}$ in one arena and $\sim 1.91 \times 10^{78}$ for arbitrarily many arenas.

TABLE 2:
The number of sequences of games for brackets with at most 7 teams. For convenience, each bracket $T$ is depicted as unlabeled, so that the leaf labeling is omitted. The entries represent the terms $E_\infty(T, n)$ in Theorem 3, with sum $N_\infty(T)$ (equation 3).
Column headings: number of teams (number of taxa); bracket (labeled topology); number of game sequences with $n$ events (number of labeled histories with $n$ events), for $n = 1, 2, 3, 4, 5, 6$.

TABLE 3:
The number of sequences of games for brackets with 8 teams. For convenience, each bracket $T$ is depicted as unlabeled, so that the leaf labeling is omitted. The entries represent the terms $E_\infty(T, n)$ in Theorem 3, with sum $N_\infty(T)$ (equation 3).