Collective decision-making algorithm for the best-of-n problem in multiple options

The best-of-n problem [Valentini G, Ferrante E, Dorigo M. The best-of-n problem in robot swarms: formalization, state of the art, and novel perspectives. Frontiers in Robotics and AI. 2017;4(9):1–18.] is a collective decision-making problem in which many robots (agents) select the best option among a set of n alternatives; it is studied in the fields of distributed autonomous robotic systems and swarm robotics. To realize an intelligent system that can solve more complicated problems, it is desirable to develop a collective decision-making algorithm that works even when there are many social-behavioural alternatives (n > 2). However, previous studies have mainly focused on binary collective decision-making scenarios (n = 2). In this paper, we propose a collective decision-making algorithm for the best-of-n problem with a large number of options, which uses short-term experience memory together with a trial-and-error approach at the group level. After proposing this decision-making process, we show its typical behaviour. Next, we show the convergence of the algorithm when a quadratic function is used for the bias that corresponds to the individual characteristic. With the proposed bias distribution, the equilibrium point at which all candidates have the same number of supporters is unstable, and the consensus states are stable fixed points. Therefore, the dynamics are expected to converge towards a consensus. Simulation results and mathematical analysis show that the average time required to find the best option is nearly proportional to the number of options and does not depend on the number of robots.


Introduction
"Flocking" [1-4] refers to a state in which many individuals are in physical proximity. In the following, a set of flocking individuals that can share information is called a group. A group can be expected to solve tasks that far exceed the capabilities of its individuals, but this requires that the group can reach consensus appropriately. The method of achieving consensus differs greatly depending on the presence or absence of a leader. This paper deals with situations where there is no leader. Such problems are commonly referred to as group decision-making or decentralized decision-making problems, and they have been studied extensively.
Collective decision-making problems are generally organized into two categories: consensus achievement, where agents aim at making a common decision, and task allocation, where agents allocate themselves to different tasks with the objective of maximizing the performance of the collective [5]. In addition, consensus achievement is decomposed into two classes: continuous and discrete. An example of the continuous class is the selection of a common direction of motion by a flock of agents [2], for which the LCP algorithm [3] is well known. Here, we deal with the discrete class, which is generally called the best-of-n (BSTn) problem. In the BSTn problem, a group of robots has to select the best option among a set of n alternatives without a leader [6].
The preconditions for BSTn dealt with in this paper are as follows. First, this paper assumes that the options are symbolized beforehand and that each robot can explicitly select. The option selected by each robot can be transferred to other robots within the group, and the number of agents who choose an option can be counted by each robot. We suppose that information about the outcome of each option, such as its goodness, value, or effectiveness, is unknown in advance, and the robot only realizes the value of this option after executing that option.
We have summarized the related works in Table 1.

Table 1. Related works on the best-of-n problem (study | number of options n | approach | summary).

Value-sensitive
Parker et al. [7] | 2 | Message flooding | The robot has six states, and a group of the robots reaches a consensus in three steps. It collectively corrects individual errors by verifying an alternative with multiple robots. The first alternative that passes to the committing phase becomes the collective unanimous alternative by message flooding. Experiments using actual robots are conducted.
Montes et al. [8] | 2 | Majority-rule [9] | Proposes a method for a shortest-path problem by introducing latency to the majority rule presented by Krapivsky and Redner [9]. The supervisor creates k teams of three randomly selected robots, and the members of each team unify their options using a majority rule. Under the rule that a robot choosing a better option updates its own state more frequently, the group converges to a state where everyone has the better option.
Seeley et al. [10] | 2 | Honeybee population model (HPM) | Solves the deadlock problem using cross inhibition by stop signals in a nest-site selection scenario by honeybee swarms.
Pais et al. [11] | 2 | HPM | Improves the decision-making capabilities by introducing sensory noise to the cross inhibition of [10].
Valentini et al. [6] | 2 | Weighted voter model | Proposes a weighted voter model for a site-selection scenario. The higher the quality of the option, the more likely it is to influence the votes of other agents.
Reina et al. [12] | 2 | HPM | For the problem of finding the shortest path (n = 2), proposes a collective decision-making model based on the home selection behaviour observed in honeybee colonies [10]. The correct decision is taken in at least 90% of cases when n = 2 and the resolution R > 0.15, for every group size except very small ones.
Reina et al. [13] | ≥ 2 | HPM | Generalizes and extends the model of [12] to the best-of-n case. The correct decision is taken in at least 75% of cases when n ≤ 7.
Hasegawa et al. [14] | ≥ 2 | The supervisor selects by majority decision | Applies a sensory discrimination mechanism using yes/no agents with differential thresholds to a model for making a collective decision among multiple options. The proportion of correct choices is more than 80%.

Value-free
Iwanaga and Namatame [15] | [...]
This paper | Proposes a collective decision-making model of BRT [19] with short-term experience memory (SEM), which reduces the time required to find a suitable option compared with [19]. A quadratic bias function is also proposed to enable the analysis of the BRT model.

The methods for the best-of-n problem can be roughly classified into value-sensitive methods and value-free methods. A value-sensitive method is a distributed decision-making method derived from social insects; in short, it divides a group so that the number of agents supporting each option is proportional to the goodness of that option. For example, assume that there are 100 agents and two options A and B, and that the goodness is 60/100 agents and 40/100 agents, respectively (x/y means that x agents out of y agents say the option is good).
Since the goodness of either option is unknown in the initial state, the supporters of each option are about 50% of the population. After that, by comparison with the other option, the supporters of A gradually increase, and in this case it can be expected that 60 agents will select A. However, it is difficult for all 100 agents to reach the state of selecting A, and an additional mechanism is needed to overwhelm the minority opinion. The value-sensitive method is not covered in this paper because the algorithm for this part is highly dependent on each application. On the other hand, a value-free method expects behaviour suitable for the group to emerge on the assumption that all members select the same behaviour [15,17,21]. For example, Iwanaga and Namatame [15] proposed an individual decision-making method inspired by Schelling's critical mass [16], and they clarified that the group quickly converges to a state where everyone selects one of the two options.
We have also proposed a value-free approach, the BRT model (Bias and Rising Threshold model), in which a give-up term is introduced into the individual decision-making process of [15] so that the approach works even if there are more than two options [19,20,22-24]. At the time of convergence, the group learns the quality of the selected option. If the quality is high enough, the selected option becomes the final result of the collective decision-making. Otherwise, individuals can quit selecting this option and start selecting another option in order to find a better one. However, these previous studies dealt with problems in which the number of options n is relatively small, and the dynamics of the model have not been clarified.
In this paper, we propose an extended version of the BRT model [19,20,22] and show its characteristics through mathematical discussion and computer simulations. For the analysis, we introduce a quadratic bias function for the action selection of the BRT model. In the BRT model, each agent selects one option out of a set of n options and notifies the other agents of its choice. The agents change their choices based on the ratio of supporters for each option and their own bias θ. We distinguish two states of the group: an agreed interval, in which all agents select the same option, and a competitive interval, in which they do not. Ordinarily, agreed intervals and competitive intervals alternate until a good option is discovered.
Generally, if the number of options is large, the competitive interval is expected to continue for a long time. If it takes a long time to reach an agreement, it is difficult to apply the proposed model to a group of robots that operate in a dynamically changing environment. It is therefore extremely important to clarify how the number of robots N and the number of options n affect the time needed to reach an agreement. In the following, we analyse the requirements for, and the approximate time needed to, agree on a suitable option.
The major difference between this paper and the previous models [19,22-24] is that, in addition to introducing a short-term memory mechanism, we propose a design method for the bias θ under which convergence to an agreement can be expected. Even in the case of n ≥ 2, this bias design guarantees that the agreed states are stable fixed points and that the typical competitive states are unstable points.
Also, the time required to reach an agreement is almost independent of the number of agents N. Furthermore, computer experiments confirmed that robots can reach the appropriate option when the number of options was 20 or less and that the average time required to reach the agreement increased almost linearly with an increase in the number of options n.
The remainder of this paper is organized as follows. In Section 2, we explain the proposed method and its characteristics. In Section 3, we analyse the convergence and the optimality of the proposed model mathematically. In Section 4, we verify the characteristics and indicate the effectiveness of the proposed method using computer experiments.
Finally, we state the assumptions of this paper. The agents can communicate with nearby agents without errors or noise, and communication is much faster than their action selection processes. Therefore, the distribution of the ratio of supporters for each option, which is global information used in their decision-making, can be obtained instantly and correctly through local communication. As in many past works [14,25-27], this assumption is introduced in order to analyse the dynamics of the whole system without noise. The effects of a limited communication range of agents are discussed in [28].

Problem statement
In the following, the number of options is written as M instead of n to avoid confusion. Consider a system of N agents and M options. Among the M options, there is one superior option (the suitable option) and M − 1 inferior options (unsuitable options). The task is for all of the agents to discover the superior option.
This paper deals with the most basic case. It is assumed that no supervisor determines what to do. After each agent selects its action, it executes the selected action immediately. We assume an environment in which a change occurs only when the options of all agents coincide. This situation corresponds, for example, to transporting a heavy load that can only be moved by the combined efforts of all agents.
In addition, this paper supposes no supervisor who judges the suitableness of the options of a group of agents. Every agent evaluates the suitableness of their options on their own judgment function. On the other hand, in this paper, we also suppose that all agents have the same judgment function. If all agents attempt to perform the same option, this option will cause a change in the environment. The agents observe this change with their own sensors and obtain the same evaluation. However, in the experiment described below, the system will tell all agents if their actions match one of the pre-selected options, as described above. We do not deal with realistic and complex decisions, such as those that require a detailed judgment of the target situation, since these decisions vary from problem to problem, but there is research on the use of BRT in complex environments, such as [29].
On the other hand, the BRT algorithm described below requires each agent to know whether everyone has selected the same option. We assume that each agent finds this out by observing the others with its sensors and by communicating with them.

The approach
As in the method of Wessnitzer and Melhuish [17], in BRT all individuals evaluate an option once it has been agreed upon, and they then judge collectively whether the option is good or not. When all their options match, we suppose that each agent evaluates the option based on its own criteria and decides that it is good if the criteria are exceeded. An option that everyone decides is good is called a suitable option. If the criteria levels are low, multiple options may become suitable options, but here, to simplify the problem, only one option out of M satisfies the criteria. The process by which an agent judges the goodness of an option is omitted because it strongly depends on the application.

Iwanaga and Namatame model [15]
The research by Iwanaga and Namatame [15,30] was inspired by the critical mass model of Schelling [16]. As shown in Figure 1, they stated that individual decision-making depends not only on personal philosophy and personal preferences but also on the atmosphere of the whole group ($p(t)$). Each individual takes the position of agreeing or disagreeing with an option and has a personal attribute, the bias value $\theta_i$ (Figure 1 A), which is used in this decision-making process (Figure 1 C). Each individual observes the ratio $p(t)$ of the members who agree with the option (Figure 1 B).
As shown in Equation (1), an individual $i$ agrees with an option if the ratio of agents who agree with this option, $p(t)$ (the agreement ratio), is higher than the bias value $\theta_i$ at time $t$. After all individuals have made their decisions, the agreement ratio changes to $p(t+1)$ at time $t+1$ (Figure 1 D), so each individual makes its decision again (Figure 1 C). Iwanaga and Namatame [15] proposed a method to analyse the dynamics of the agreement ratio $p(t)$ in this decision-making process. If the distribution of $\theta$ is determined as shown in Figure 2, the cumulative distribution function $F(\theta)$ forms an S-shaped curve as shown in Figure 3. If the agents make their decisions according to Equation (1), the ratio $p(t)$ evolves as

$$p(t+1) = F(p(t)),$$

and the solutions $p^*$ of

$$p^* = F(p^*) \tag{4}$$

are called fixed points. There are two types of fixed points: stable points and unstable points. If the bias values $\theta_i$ follow a bell-shaped distribution, one unstable point ($E_2$) and two stable points ($E_1$ and $E_3$) are generated, as shown in Figure 3. In other words, the behaviour of a group can be classified into the following three patterns.
(1) The case of $p(t) = 0.5$, which corresponds to $E_2$ in Figure 3. The state of the group does not change because $F(p(t)) = p(t)$. However, if $p(t)$ changes for some reason, the state of the group will transit to one of the following two states.
(2) The case of $p(t) > 0.5$ ($E_2 < p(t) \le 1$). In this case, because $p(t+1) = F(p(t)) > p(t)$, the state of the group transits rapidly to the state where all agents agree with the option ($p(t) = 1$). In Figure 3, because $p(0) < p(1) < p(2) < p(3) \approx 1$, the agreed state is reached in approximately three units of time from the almost equilibrium state.
(3) The case of $p(t) < 0.5$ ($0 \le p(t) < E_2$). In this case, because $p(t+1) = F(p(t)) < p(t)$, the state of the group transits rapidly to the state where all agents disagree with the option ($p(t) = 0$).
If the cumulative distribution of the bias values $\theta_i$ is S-shaped, the number of iterations until convergence does not depend on the number of individuals. Thus, convergence can be expected to be achieved promptly even with a large number of individuals. However, [15] contains no mathematical discussion of the convergence, and scalability to a number of options M > 2 has not been considered.
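The three patterns above can be illustrated with a minimal sketch that iterates $p(t+1) = F(p(t))$. The Gaussian form of $F$, its mean 0.5 and standard deviation 0.15, and the function names are assumptions chosen only for illustration; they are not values taken from [15].

```python
import math

def normal_cdf(x, mean=0.5, std=0.15):
    """S-shaped CDF F of the bias values (assumed Gaussian, illustrative parameters)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def iterate_agreement_ratio(p0, steps=50, cdf=normal_cdf):
    """Iterate p(t+1) = F(p(t)) from p(0) = p0 and return the whole trajectory."""
    trajectory = [p0]
    for _ in range(steps):
        trajectory.append(cdf(trajectory[-1]))
    return trajectory

above = iterate_agreement_ratio(0.55)    # starts slightly above the unstable point
below = iterate_agreement_ratio(0.45)    # starts slightly below the unstable point
balanced = iterate_agreement_ratio(0.5)  # starts exactly on the unstable point
```

With these assumed parameters, a start slightly above 0.5 rises to a value close to 1 within a few iterations, a start slightly below 0.5 falls to a value close to 0, and a start at exactly 0.5 stays there. Because a Gaussian has tails outside [0, 1], the two stable points lie very near, rather than exactly at, 0 and 1.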

BRT model with short-term experience memory (SEM)
Based on the above, in the following, we propose an extended model of our study in [19] in which agreement can be promptly reached despite a large number of options (M > 2) and call it the BRT model.
Here, we assume that there are $N$ agents $P_1, \dots, P_i, \dots, P_N$ and a set $A$ of $M$ options $a_1, \dots, a_M$. $A_i(t)$ denotes the option selected by agent $P_i$ at time $t$, and $n_i(a_j)$ is the number of neighbours selecting option $a_j$. When every agent chooses the same option, the group agrees on that option. In [17], for example, this corresponds to the entire population heading to one of multiple feeding grounds.
If the state of the group at the time of agreement is suitable for the environment, the consensus option is called a suitable option. We denote the set of suitable options by $A_{goal}$ ($A_{goal} \subset A$). The agents do not know $A_{goal}$ in advance; only when all agents have agreed on an option do they learn whether $A_{goal}$ includes it. Figure 4 shows the decision-making flow of agent $P_i$ based on the BRT model with short-term experience memory. Agent $P_i$ decides its option $A_i(t+1)$ at time $t$ as follows:

$$A_i(t+1) = \begin{cases} A_i(t), & \text{if Equation (6) holds} \\ a_l \in A \setminus A_i(t), & \text{otherwise} \end{cases} \tag{5}$$

$$\frac{n_i(A_i(t))}{N_i} \ge \theta_i + \tau \cdot c_i(t) \cdot (t - t_{i,last}(t)) \tag{6}$$

Equation (6) is the condition for continuing the current option at the next time step. $N_i$ is the number of neighbours of $P_i$. In the following, we suppose that all agents can communicate with all others, as described at the end of Section 1, and therefore write $N$ for $N_i$ for simplicity. $\tau$ is a constant representing the prediction rate of the increase in the number of supporters. $t_{i,last}(t)$ is the time at which agent $P_i$ last changed its option, and $(t - t_{i,last}(t))$ is the time span for which the same option has been continuously selected. $c_i(t)$ is a function that is equal to 0 when the consensus option is a suitable option and to 1 otherwise:

$$c_i(t) = \begin{cases} 0, & \text{if } agreement_i(t) = \text{true and } A_i(t) \in A_{goal} \\ 1, & \text{otherwise} \end{cases} \tag{9}$$

Here $agreement_i(t)$, which is true when all agents select the same option, is obtained by local communication or by monitoring the neighbourhood. If the ratio of agents who select the same option is lower than the right side of Equation (6), the agent selects a new option $a_l$ ($a_l \ne A_i(t)$) with probability $Prob(a_l)$. The second term on the right side of Equation (6) is a term whose value increases over time, called the BRT term. We explain the BRT term in detail in the next section.
At time $t$, if all agents agree on option $a_u$ ($a_u \in A$, i.e. $u \in \{1, \dots, M\}$), that is, $agreement_i(t) = \text{true}$, but this option is not suitable, the short-term experience memory $B_i$, which holds a value $b_j$ for each option $a_j$, is updated (Equation 12): the entry of the most recent unsuitable option is set to $b_u = 1$, and the memories of the other options are forgotten little by little. $\beta$ ($0 \le \beta < 1$) is a constant representing how well the memory is retained. In this paper, we adopt $\beta = 0.2$. In previous research [19], if the condition of Equation (6) is not satisfied, an option other than $A_i(t)$ is chosen stochastically and becomes $A_i(t+1) \in A \setminus A_i(t)$, so an agent may repeatedly select an option that was found unsuitable only a short while ago. In this study, we introduce $B$ to memorize options that were collectively agreed upon but were not suitable, in order to improve the efficiency of discovering the suitable option. Because $\beta$ is small, this short-term memory can remember only very recent options.
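The decision flow described above can be sketched for a single agent as follows. This is only an illustration under stated assumptions: the continuation test compares the support ratio with the bias plus a term growing by $\tau$ per step, the memory decays multiplicatively by $\beta$, and a new option is drawn with weight $1 - b_l$. The exact forms of the memory update and of $Prob(a_l)$ are not recoverable from the text, so both are hypothetical, as are all function names.

```python
import random

def keeps_option(support_ratio, theta, tau, held_steps, suitable_consensus):
    """Continuation test: keep the option while the ratio covers bias plus BRT term."""
    c = 0 if suitable_consensus else 1  # the BRT term is switched off at a suitable consensus
    return support_ratio >= theta + tau * c * held_steps

def update_memory(memory, rejected, beta):
    """Assumed update: the rejected option is set to 1, all others decay by the factor beta."""
    return [1.0 if j == rejected else beta * b for j, b in enumerate(memory)]

def choose_new_option(current, memory, rng):
    """Assumed Prob(a_l): draw an option other than `current` with weight 1 - b_l."""
    candidates = [j for j in range(len(memory)) if j != current]
    weights = [1.0 - memory[j] for j in candidates]
    if sum(weights) == 0.0:
        return rng.choice(candidates)  # every alternative was just rejected: pick uniformly
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
memory = update_memory([0.0, 1.0, 0.5], rejected=0, beta=0.2)
next_option = choose_new_option(0, [1.0, 1.0, 0.0], rng)
```

Under these assumptions, an option that has just been rejected has weight 0 and is not drawn again immediately, whereas with $\beta = 0$ all older rejections are forgotten at the next update, which corresponds to the memoryless choice of the earlier model.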

Proposed distribution of bias θ i
In previous research [19,22], the bias value $\theta_i$ was generated from a bell-shaped curve based on a Gaussian distribution. In this paper, we adopt a distribution given by a quadratic function, which makes the mathematical discussion easier.
We set $\theta_i$ as follows: where $N$ is the number of agents, $n(\theta_i)$ represents the number of agents taking $\theta_i$, $\mu = 1/M$ is the expectation of the distribution, and $k_1 = \frac{3M^3}{2}$ is a normalization term. Thus we derive the following equation: Figure 5 illustrates three examples of the distributions of $\theta$ determined by Equation (13). As can be seen, the bias distribution has a single peak whose shape is determined only by the number of options $M$.
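As a hedged illustration of such a quadratic bias distribution, the sketch below assumes a downward parabola on $[0, 2/M]$ with mean $1/M$, normalized to integrate to 1. This parametrization, its constant $\frac{3M^3}{4}$ and the function names are assumptions for illustration; the exact form of Equation (13) may differ.

```python
def bias_density(theta, M):
    """Assumed parabolic density on [0, 2/M] with mean 1/M; zero elsewhere."""
    if theta < 0.0 or theta > 2.0 / M:
        return 0.0
    return 0.75 * M**3 * theta * (2.0 / M - theta)

def bias_cdf(theta, M):
    """Closed-form cumulative distribution F of the assumed density."""
    if theta <= 0.0:
        return 0.0
    if theta >= 2.0 / M:
        return 1.0
    return 0.25 * (M * theta) ** 2 * (3.0 - M * theta)

def midpoint_integral(f, a, b, n=10000):
    """Midpoint-rule integral of f over [a, b], used only to check the normalization."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

M = 4
total_mass = midpoint_integral(lambda t: bias_density(t, M), 0.0, 2.0 / M)
mean_bias = midpoint_integral(lambda t: t * bias_density(t, M), 0.0, 2.0 / M)
```

For this assumed form, the total mass is 1, the mean is $1/M$, $F(1/M) = 1/2$, and the density at the peak is $3M/4$, so the peak becomes narrower and taller as the number of options grows.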

Behaviour and characteristics of the BRT model up to the agreement
In the following, the behaviour of BRT is explained separately for (1) reaching an agreement and (2) finding a suitable agreement option.
Here, we explain the function of the BRT term of Equations (5) and (6) to realize the best option discovery ability. Agents repeat the same decision-making mechanism, but apparently, group behaviour can be divided into three phases: the competitive phase, the evaluation phase, and the revoting phase.
First, Figure 6 shows the typical behaviour for the agreement of a group of the proposed agents. The behaviour consists of three phases: competitive phase, evaluation phase, and revoting phase. The vertical axis represents the ratio of agents selecting each option and the horizontal axis represents the elapsed time step. At the initial step, each agent randomly chooses its option.
Next, agent $P_i$ selects its behaviour according to Equation (5). It decides whether to continue or quit the current option $A_i(t)$ based on its personal attribute ($\theta_i$) and the surrounding atmosphere ($n_i(A_i(t))/N$), and it informs the whole group of its decision. As time passes, a certain option is selected by all agents and becomes the agreement option (option $a_1$ in this figure).
As Figure 6 shows, this is the end of the competitive phase, and $T_v$ is the time required for the competitive phase. Next, since it is necessary to evaluate whether this agreement option is suitable for the group, the agreement option is maintained for a while. This is called the evaluation phase, and $T_a$ is the duration of the evaluation phase.
If the agents accept the current option as a suitable option, the best-of-n procedure finishes successfully. On the other hand, if the current agreement option is unsuitable, the agents switch to other options and a revoting phase starts. By repeating this process, the group can be expected to discover the suitable option among many options.
Next, we explain how the BRT term brings about an agreed state. If Equation (6) is satisfied, agent $P_i$ continues to select the current option $A_i(t)$. As time elapses, the value of the BRT term on the right side of Equation (6) increases by $\tau$ per step. When the value of the right side of Equation (6) exceeds the left side ($n_i(A_i(t))/N$), agent $P_i$ stops selecting its current option and selects another option as $A_i(t+1)$. As this is repeated, $n_i(A_i(t))/N$ becomes 1 and the whole group selects only one option as its agreement option.
Figure 7 shows a simple example of this collective decision-making process. $A = \{a_1, a_2\}$ is the set of options with $M = 2$. There are $N$ agents $P_1, \dots, P_N$ with bias values $\theta = \{\theta_1, \theta_2, \theta_3, \theta_4, \dots, \theta_i, \dots, \theta_N\}$. In the figure, only $P_1, P_2, P_3, P_4$ are represented for simplicity. The height of the blue vertical bar represents the bias value. At the initial time, we assume that the agents selecting option $a_1$ are $P_1, P_2, \dots$, and the agents selecting option $a_2$ are $P_3, P_4, \dots$. $n(a_j)/N$ is the ratio of agents selecting option $a_j$. As shown in Figure 7(B), at $t = k$, $\tau \cdot k$ is added to the bias $\theta_i$ of each agent (Equation 6). At this time, since the value $\theta_4 + \tau \cdot k$ of agent $P_4$ exceeds the value of $n(a_2)/N$, at the next time step $t = k + 1$ (Figure 7(C)) agent $P_4$ stops selecting option $a_2$ and starts selecting option $a_1$. Thus $n(a_1)/N$ increases while $n(a_2)/N$ decreases accordingly. In this way, as time elapses, the group collectively selects option $a_1$.
From previous experiments, it is known that the behaviour changes considerably with $\tau$ and the number of options $M$. When $M$ is large, the group needs more time to compete for an agreement. When $\tau$ is small, voting proceeds gradually and the agents take a long time to reach an agreed state. If $\tau$ is large, the agents may give up the option that they selected at the initial time too quickly, which makes it difficult to reach an agreement.
Namely, from Equation (5), an agent with $\theta = 0$ changes its support to another option if no agreement is reached within

$$\frac{1}{\tau} \tag{15}$$

steps. Therefore, it is expected to be difficult to reach an agreement if $\tau$ is not small enough.
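The give-up time implied by the rising threshold can be worked out with a small sketch. It assumes that an agent quits at the first step $k$ at which $\theta + \tau k$ strictly exceeds the support ratio of its option and that this ratio stays constant in the meantime; the function name is hypothetical.

```python
import math

def steps_until_give_up(theta, support_ratio, tau):
    """Smallest integer k >= 0 with theta + tau * k > support_ratio (ratio held constant)."""
    if theta > support_ratio:
        return 0  # the bias alone already exceeds the support ratio
    return math.floor((support_ratio - theta) / tau) + 1

unbiased = steps_until_give_up(theta=0.0, support_ratio=1.0, tau=0.25)
biased = steps_until_give_up(theta=0.5, support_ratio=1.0, tau=0.25)
minority = steps_until_give_up(theta=0.0, support_ratio=0.5, tau=0.25)
```

The agent with $\theta = 0$ facing full support holds out for about $1/\tau$ steps, which matches the bound above, while agents with a larger bias or weaker support give up sooner. A large $\tau$ therefore shortens every waiting time, which is why agreement becomes harder to reach.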

Typical behaviour to discover the suitable option
This section describes the revoting phase and the behaviour up to the discovery of the suitable option. If the selected agreement option is not suitable, $A_i(t) \notin A_{goal}$, the value of $c_i(t)$ remains at 1 (Equation 9) and all the agents keep selecting this option for a while in order to evaluate it. As time passes, the right side of Equation (6) continues to increase steadily. Since each agent has its own $\theta_i$, the time at which the right side reaches 1 varies between agents. An agent whose value on the right side of Equation (6) exceeds 1 stops selecting the current option and selects another option. This triggers a revote that converges to another option. By repeating this process, a group can switch the agreement option until the suitable option is discovered.
Here, we show by computer simulations the typical behaviour until a suitable option is discovered. A suitable option is discovered by repeating the agreement process described in the last section.
We set the number of agents $N = 1000$, the number of options $M = 3$ (i.e. $\mu = 1/3$), and $\tau = 0.003$. Two typical initial conditions are considered: (i) each agent selects its option randomly, and (ii) all agents select an unsuitable option at the initial time. We assume that $A_{goal} = \{a_3\}$. Figure 8 shows an example of the behaviour in the first case. The horizontal axis is time and the vertical axis is $n(a_i)$, $i = 1, 2, 3$. All the agents selected and agreed on option $a_2$ at approximately time step 10. Since $a_2$ is not the suitable option, the second term of Equation (6) kept increasing. As a result, everyone changed its opinion and agreed on another option, $a_1$, at around time step 160, on option $a_2$ again at around time step 310, and on option $a_1$ at around time step 470. They finally discovered the suitable option $a_3$ at around time step 620. Figure 9 shows an example of the second case. The group agreed on option $a_1$ in the initial state, switched its option at around time steps 160, 310 and 470, and discovered the suitable option at around time step 620.
In both of these cases, we found that the group can make a collective decision and discover the suitable option using the proposed method even when the number of options is more than 2. Figure 10 shows the cumulative probability distribution of the time taken to find the suitable option when $\beta$ in Equation (12) is varied from 0.0 to 0.8. The number of options is $M = 5$ and the number of agents is $N = 30$. At least 10,000 trials were conducted for each $\beta$. The line for $\beta = 0.0$ (dotted line) corresponds to the conventional method [19], and the lines for $\beta > 0.0$ (solid lines) correspond to the BRT with SEM. In both cases, the CDF is staircase-shaped, with steps corresponding to the number of agreements made. This experiment also showed that, as the number of agreements increases, the time required for discovery becomes shorter for the model with SEM.

The advantage of the SEM mechanism
In the conventional method ($\beta = 0.0$), more than 5% of the trials require more than 3500 steps before everyone agrees on the suitable option. This time is unavoidable if the environment is highly dynamic, where the suitable option at the moment is different from the one agreed upon in the previous round. However, if the environment is not so highly dynamic, it is reasonable not to choose an option that was judged to be unsuitable several times before. We have found that by adjusting $\beta$, the SEM mechanism can significantly reduce the time for discovery.

Mathematical analysis
In this section, we analyse the convergence of the proposed model and show, for any $M$, the stability of each agreed state and the instability of the equilibrium state in which all options are equally supported. Furthermore, using the case of $M = 3$, we show that there are other fixed points when $M > 2$ and that these points are also unstable. From this, it is expected that the model converges to a consensus state over time and that it is at least not trapped in the state where all options are equally supported. Next, we estimate the time needed to find a suitable option and show that this time does not depend on the number of agents.

The agreed state and the equilibrium states in time evolution dynamics of agent ratio
Here, we discuss the time evolution dynamics of the ratio of agents who select each option and the stability of the state in which agents agree on a specific option. In this section, $\tau = 0$ is set for simplicity. Let $x_j(t)$ be the ratio of agents selecting option $a_j$ at step $t$.
Since the sum of the agents' ratios is 1, so that $x_M(t) = 1 - \sum_{k=1}^{M-1} x_k(t)$, the state of the discrete dynamical system that describes the time evolution of $x_j(t)$ is represented by the $(M-1)$-dimensional vector $X(t) = (x_1(t), \dots, x_{M-1}(t))^T$.
Let $G_j$ be the mapping that determines $x_j(t+1)$ from $X(t)$ as follows:

$$x_j(t+1) = G_j(X(t)) \tag{19}$$

The time evolution mapping $G_j$ is defined by the following equation:

$$G_j(X(t)) = x_j(t)\,F(x_j(t)) + \frac{1}{M-1} \sum_{k \ne j} x_k(t)\,\{1 - F(x_k(t))\} \tag{20}$$

Here, $F$ is the cumulative distribution function of the parameter $\theta$ of each agent, and the sum runs over all options $a_k$ ($k = 1, \dots, M$) other than $a_j$. In Equation (20), the first term on the right side corresponds to the ratio of agents who select option $a_j$ again in the next step among the agents who currently select $a_j$. The second term corresponds to the ratio of agents who change to option $a_j$ in the next step among the agents who have selected an option other than $a_j$. The sum of these two terms is the ratio of agents who select option $a_j$ in the next step. Regarding the first term, $F(x_j(t))$ is the ratio of agents whose parameter $\theta$ is smaller than $x_j(t)$, and it includes agents who select options other than $a_j$. Therefore, we obtain the ratio of agents who select option $a_j$ and whose parameter $\theta$ is smaller than $x_j(t)$ by multiplying $F(x_j(t))$ by $x_j(t)$. This is the ratio of agents who do not change from option $a_j$ in the next step. For the second term, since $1 - F(x_k(t))$ is the ratio of agents whose parameter $\theta$ is greater than $x_k(t)$, multiplying $1 - F(x_k(t))$ by $x_k(t)$ gives the ratio of agents who change their option from $a_k$ in the next step. Dividing by $M - 1$ gives the ratio of agents who change from option $a_k$ to $a_j$, since there are $M - 1$ possible options to change to. By summing this ratio over the options other than $a_j$, we obtain the ratio of agents who change to option $a_j$ from the other options in the next step.
The mapping $G$ from the $(M-1)$-dimensional vector to the $(M-1)$-dimensional vector is defined by the following equation:

$$G(X) = (G_1(X), G_2(X), \dots, G_{M-1}(X))^T \tag{21}$$

The time evolution dynamics of the agent ratio can be described as an $(M-1)$-dimensional discrete dynamical system given by the following equation:

$$X(t+1) = G(X(t)) \tag{22}$$

Let $X^*_j$ be the agreed state in which all agents agree on option $a_j$ ($j = 1, 2, \dots, M$). The agreed state $X^*_j$ ($j = 1, 2, \dots, M-1$) is defined as a vector whose $j$th component is 1 and whose other components are 0. For example, the agreed state $X^*_1$ is given by the following vector:

$$X^*_1 = (1, 0, \dots, 0)^T \tag{23}$$

The state $X^*_M$, in which the agents agree on $a_M$, has all components equal to 0 and is given by the following vector:

$$X^*_M = (0, 0, \dots, 0)^T \tag{24}$$

Next, let $X^*_E$ be the equilibrium state in which all options are selected with the same ratio $M^{-1}$. The equilibrium state $X^*_E$ is given by the following vector:

$$X^*_E = (M^{-1}, M^{-1}, \dots, M^{-1})^T \tag{25}$$

It is trivial that the agreed states $X^*_1, X^*_2, \dots, X^*_{M-1}$ are equivalent in stability due to the symmetry of the dynamics. Therefore, in the next section, we analyse the stability of the agreed states $X^*_1$ and $X^*_M$ and of the equilibrium state $X^*_E$.
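The mean-field dynamics described above can be sketched numerically. For convenience the sketch keeps all $M$ ratios instead of the reduced $(M-1)$-dimensional vector, and it assumes a quadratic-type bias distribution whose cumulative distribution function is $F(\theta) = \frac{(M\theta)^2}{4}(3 - M\theta)$ on $[0, 2/M]$. That closed form and the function names are assumptions for illustration, not the paper's exact parametrization.

```python
def quadratic_cdf(theta, M):
    """Assumed CDF of a parabolic bias density on [0, 2/M] with mean 1/M."""
    if theta <= 0.0:
        return 0.0
    if theta >= 2.0 / M:
        return 1.0
    return 0.25 * (M * theta) ** 2 * (3.0 - M * theta)

def step(ratios):
    """One update of all M ratios: stayers plus an equal share of those leaving other options."""
    M = len(ratios)
    stay = [x * quadratic_cdf(x, M) for x in ratios]  # first term: x_j F(x_j)
    leave = [x - s for x, s in zip(ratios, stay)]     # x_k (1 - F(x_k))
    total_leave = sum(leave)
    return [stay[j] + (total_leave - leave[j]) / (M - 1) for j in range(M)]

def iterate(ratios, steps):
    """Apply the map repeatedly and return the final ratios."""
    for _ in range(steps):
        ratios = step(ratios)
    return ratios

agreed = step([1.0, 0.0, 0.0])               # an agreed state is left unchanged
balanced = step([1.0 / 3.0] * 3)             # the equal-support state is left unchanged
perturbed = iterate([0.4, 0.3, 0.3], 200)    # a small lead grows into a consensus
```

Under these assumptions the agreed state and the equal-support state are both mapped to themselves, while a state in which one option has a small lead evolves towards agreement on that option. This is the qualitative behaviour that the stability analysis makes precise.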

Stability analysis of the agreed states and the equilibrium state
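First, we show that $X^*_1$, $X^*_M$, and $X^*_E$ are fixed points of the dynamics of Equation (22). Here, the boundary values of the cumulative distribution function, $F(0) = 0$ and $F(1) = 1$, are used. Substituting each state into Equation (20) gives

$$G(X^*_1) = X^*_1, \qquad G(X^*_M) = X^*_M, \qquad G(X^*_E) = X^*_E.$$

From the above equations, the agreed states $X^*_j$ $(j = 1, \ldots, M)$ and the equilibrium state $X^*_E$ are fixed points of the mapping $G$.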
Next, we show the Jacobian matrix $T(X)$ of $G$:

$$T(X) = \left[\frac{\partial G_j}{\partial x_l}\right]_{j,l = 1, \ldots, M-1}. \tag{31}$$

Since Equation (22) is a discrete-time dynamical system, the fixed point $X^*_1$ is stable if the absolute value of every eigenvalue of $T(X^*_1)$ is less than 1. Similarly, the fixed point $X^*_M$ is stable if the absolute value of every eigenvalue of $T(X^*_M)$ is less than 1. On the other hand, the fixed point $X^*_E$ is unstable if at least one eigenvalue of $T(X^*_E)$ has an absolute value greater than 1. As will be described later, the eigenvalues are real numbers in these three cases.

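Using $x_M = 1 - \sum_{k=1}^{M-1} x_k$, each element $T_{jl}(X) = \partial G_j / \partial x_l$ of Equation (32) is calculated as

$$T_{jj}(X) = F(x_j) + x_j F'(x_j) - \frac{1}{M-1}\bigl[1 - F(x_M) - x_M F'(x_M)\bigr] \tag{33}$$

for $l = j$, and

$$T_{jl}(X) = \frac{1}{M-1}\bigl[1 - F(x_l) - x_l F'(x_l)\bigr] - \frac{1}{M-1}\bigl[1 - F(x_M) - x_M F'(x_M)\bigr] \tag{34}$$

for $l \neq j$. Substituting $X^*_1$ for $X$ in Equations (33) and (34), we obtain $T(X^*_1)$ as the following matrix:

$$T(X^*_1) = \begin{pmatrix} 1 + F'(1) - \dfrac{1}{M-1} & 0 & \cdots & 0 \\ -\dfrac{1 + F'(1)}{M-1} & -\dfrac{1}{M-1} & & 0 \\ \vdots & & \ddots & \\ -\dfrac{1 + F'(1)}{M-1} & 0 & & -\dfrac{1}{M-1} \end{pmatrix}.$$

Since $T(X^*_1)$ is a lower triangular matrix, its eigenvalues are its diagonal elements:

$$\lambda_1 = 1 + F'(1) - \frac{1}{M-1}, \qquad \lambda_2 = \cdots = \lambda_{M-1} = -\frac{1}{M-1}.$$

If the absolute values of these eigenvalues are less than 1, the fixed point $X^*_1$ is stable. Since $0 \le F'(x)$ by definition, $\lambda_1 > -1$ always holds, and we obtain the stability condition of the agreed states $X^*_1, \ldots, X^*_{M-1}$ as the following inequality:

$$F'(1) < \frac{1}{M-1}. \tag{38}$$

$T(X^*_M)$ is also calculated by substituting $X^*_M$ for $X$ in Equations (33) and (34). Its diagonal elements are $F'(1)/(M-1)$ and its off-diagonal elements are $(1 + F'(1))/(M-1)$.

The eigenvalues of $T(X^*_M)$ are equal to the eigenvalues of $T(X^*_1)$ (Equations 36 and 37). Therefore, the agreed states $X^*_1, X^*_2, \ldots, X^*_M$ are stable under the same condition shown in Equation (38).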
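Next, we calculate $T(X^*_E)$ by substituting $X^*_E$ for $X$ in Equations (33) and (34). Since $x_l = x_M = M^{-1}$ for every $l$, all off-diagonal elements vanish, and we obtain $T(X^*_E)$ as the following matrix:

$$T(X^*_E) = \lambda_E\, I_{M-1},$$

where $I_{M-1}$ is the identity matrix. Since $T(X^*_E)$ is a diagonal matrix with identical diagonal elements, all eigenvalues are equal and given by

$$\lambda_E = \frac{M\,F(M^{-1}) + F'(M^{-1}) - 1}{M-1}. \tag{40}$$

From $F'(M^{-1}) \ge 0$ and $F(M^{-1}) \ge 0$, we have $\lambda_E \ge -1/(M-1) \ge -1$, so the equilibrium state $X^*_E$ becomes unstable when the following inequality is satisfied:

$$\frac{M\,F(M^{-1}) + F'(M^{-1}) - 1}{M-1} > 1.$$

Simplifying this inequality, we obtain the instability condition of $X^*_E$ as follows:

$$M\,F(M^{-1}) + F'(M^{-1}) > M. \tag{44}$$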
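The structure of the Jacobian at the agreed and equilibrium states can be checked numerically. The sketch below is only an illustration under stated assumptions: it differentiates the reduced map by central finite differences and compares the result with closed-form values obtained by differentiating the map described for Equation (20), namely $1 + F'(1) - 1/(M-1)$ at $X^*_1$ and $(M F(M^{-1}) + F'(M^{-1}) - 1)/(M-1)$ at $X^*_E$. The function names are hypothetical, and the test distribution $F(x) = x^2$ is not the paper's bias distribution; it is chosen because $F'(1) \neq 0$ makes every matrix entry visible.

```python
def G(X, F):
    """Reduced map on X = (x_1, ..., x_{M-1}); x_M is implied by sum = 1."""
    x = list(X) + [1.0 - sum(X)]
    M = len(x)
    leave = [v * (1.0 - F(v)) for v in x]
    total_leave = sum(leave)
    return [x[j] * F(x[j]) + (total_leave - leave[j]) / (M - 1)
            for j in range(M - 1)]


def jacobian(X, F, h=1e-5):
    """Central-difference estimate of T(X), with T[j][l] = dG_j / dx_l."""
    n = len(X)
    T = [[0.0] * n for _ in range(n)]
    for l in range(n):
        plus, minus = list(X), list(X)
        plus[l] += h
        minus[l] -= h
        g_plus, g_minus = G(plus, F), G(minus, F)
        for j in range(n):
            T[j][l] = (g_plus[j] - g_minus[j]) / (2.0 * h)
    return T


# HYPOTHETICAL test CDF (not the paper's): F(x) = x**2, so F'(x) = 2x.
F = lambda v: v * v
dF = lambda v: 2.0 * v
M = 4

T1 = jacobian([1.0, 0.0, 0.0], F)      # agreed state X*_1
TE = jacobian([1.0 / M] * (M - 1), F)  # equilibrium state X*_E

# Closed-form values derived from the map (assumed forms, see lead-in).
lam_agreed = 1.0 + dF(1.0) - 1.0 / (M - 1)
lam_equil = (M * F(1.0 / M) + dF(1.0 / M) - 1.0) / (M - 1)
print(T1[0][0], lam_agreed)
print(TE[0][0], lam_equil)
```

Because the test distribution is a polynomial, the finite differences remain well defined even when a perturbation pushes a ratio slightly below zero at the boundary state; a clipped distribution would need one-sided differences there.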

Characteristics of the agreed states and the equilibrium state for any M
The proposed bias distribution $f(\theta)$ of Equation (13) is restated here as Equation (45). For the given bias distribution $f(\theta)$, $F(x)$ and $F'(x)$ are defined as follows:

$$F(x) = \int_0^x f(\theta)\,d\theta, \qquad F'(x) = f(x).$$

The values of $F(M^{-1})$ and $F'(M^{-1})$ calculated for this distribution are given in Equations (48) and (49), and the value of $F'(1)$ in Equation (50). From Equation (50), the stability condition (Equation 38) of the agreed states is satisfied. From Equations (48) and (49), the instability condition (Equation 44) of the equilibrium state is also satisfied. Therefore, the agreed states are stable and the equilibrium state is unstable for the proposed bias distribution $f(\theta)$ shown in Equation (45).
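As a worked check of this argument, the sketch below evaluates both conditions for one concrete quadratic density. Everything specific here is an assumption for illustration: the density is taken to be a concave parabola on $[0, 2/M]$ normalised to integrate to one, which may not coincide with Equation (13), and the two conditions are taken in the forms $F'(1) < 1/(M-1)$ for the agreed states and $M F(M^{-1}) + F'(M^{-1}) > M$ for the equilibrium state, which follow from differentiating the map described for Equation (20). The function names are hypothetical.

```python
def parabola_pdf(x, M):
    """ASSUMED bias density: concave parabola on [0, 2/M], zero elsewhere."""
    a = 2.0 / M
    return (6.0 / a**3) * x * (a - x) if 0.0 <= x <= a else 0.0


def parabola_cdf(x, M):
    """CDF of the assumed density above."""
    a = 2.0 / M
    if x <= 0.0:
        return 0.0
    if x >= a:
        return 1.0
    return (6.0 / a**3) * (a * x**2 / 2.0 - x**3 / 3.0)


checks = []
for M in range(2, 11):
    F_eq = parabola_cdf(1.0 / M, M)   # F(1/M)
    f_eq = parabola_pdf(1.0 / M, M)   # F'(1/M)
    f_one = parabola_pdf(1.0, M)      # F'(1)
    agreed_stable = f_one < 1.0 / (M - 1)
    equil_unstable = M * F_eq + f_eq > M
    checks.append((M, F_eq, f_eq, f_one, agreed_stable, equil_unstable))
    print(M, F_eq, f_eq, f_one, agreed_stable, equil_unstable)
```

For this assumed density the peak sits at $\theta = M^{-1}$, so half of the probability mass lies below the equilibrium ratio and the density there grows with $M$, while the density at $\theta = 1$ vanishes. Both conditions therefore hold for every $M$ in the tested range.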

Characteristics of the agreed states and the equilibrium state for M = 2
For the case of $M = 2$, the time evolution dynamics of the agent ratio is given by the following one-dimensional mapping:

$$G_1(x) = x\,F(x) + (1 - x)\bigl(1 - F(1 - x)\bigr).$$

Here, $G_1(x)$ has the following symmetry for any given $F(x)$:

$$G_1(1 - x) = 1 - G_1(x).$$

If $F(x)$ has a similar symmetry,

$$F(1 - x) = 1 - F(x), \tag{55}$$

then

$$G_1(x) = x\,F(x) + (1 - x)\,F(x) = F(x).$$

Therefore, for the case of $M = 2$, if the cumulative distribution function has the symmetry shown in Equation (55), the time evolution dynamics is given by the cumulative distribution function itself. The stability condition of the agreed state (Equation 38) is

$$F'(1) < 1. \tag{57}$$

The instability condition of the equilibrium state (Equation 44) is

$$2\,F(1/2) + F'(1/2) > 2. \tag{58}$$

If $F(x)$ has the symmetry of Equation (55), then $F(1/2) = 1/2$ and the instability condition (Equation 58) becomes

$$F'(1/2) > 1. \tag{59}$$

For the proposed bias distribution (Equation 13) with $M = 2$, the stability condition of the agreed state (Equation 57) and the instability condition of the equilibrium state (Equation 59) are both satisfied, as can be checked by evaluating $F'(1)$ and $F'(1/2)$ for this distribution.

Characteristics of the agreed states and the equilibrium state for M = 3
For the case of $M = 3$, the time evolution dynamics of the agent ratio is given by the following two-dimensional mapping, with $x_3 = 1 - x_1 - x_2$:

$$G_1(x_1, x_2) = x_1 F(x_1) + \tfrac{1}{2}\bigl[x_2\,(1 - F(x_2)) + x_3\,(1 - F(x_3))\bigr],$$

$$G_2(x_1, x_2) = x_2 F(x_2) + \tfrac{1}{2}\bigl[x_1\,(1 - F(x_1)) + x_3\,(1 - F(x_3))\bigr].$$

We can find a fixed point other than the agreed states and the equilibrium state by solving the following equation, which corresponds to the case of $x_1 = x_2 = x$:

$$G_1(x, x) = x. \tag{64}$$

Figure 11 shows a 3D plot of $G_1(x_1, x_2)$. The orange curved surface is $G_1(x_1, x_2)$, the blue plane is $x_1$, the green plane is $x_2$, and the red plane is $1/3$. The blue and green planes intersect along the line $x_1 = x_2$, and the points where this line meets the orange curved surface are the fixed points with $x_1 = x_2$. It can be seen that there is a fixed point near $x_1 = x_2 = 0.5$ in addition to the fixed point $x_1 = x_2 = 1/3$ (the equilibrium point). Figure 12 shows the corresponding graph for the case $x_1 = x_2$.

For the case of $x_1 = x_2 = x \in [1/3, 1/2]$ and $x_3 = 1 - 2x \in [0, 1/3]$, expanding Equation (64) with the proposed bias distribution (Equation 45) yields a quartic equation in $x$ (Equation 65). Equation (65) has two real solutions and two complex solutions. One real solution is the equilibrium point $x = 1/3$, and the other (Equation 66) is the additional fixed point.

The Jacobian matrix (Equation 31) for $M = 3$ is the $2 \times 2$ matrix given in Equation (67). When $x_1 = x_2 = x$, its eigenvalues $\lambda_1$ and $\lambda_2$ can be written explicitly, which gives the conditions under which $|\lambda_1| > 1$ and $|\lambda_2| > 1$ for $M = 3$. If either of the eigenvalues $\lambda_1$ and $\lambda_2$ has an absolute value greater than 1, the fixed point is unstable. For the fixed point of Equation (66), this is the case, so this fixed point is unstable.

In this section, we have clarified the conditions for the stability of the agreed states and the instability of the equilibrium state for any $M$ and any bias distribution. We also used the case of $M = 3$ to show that a new fixed point exists when $M > 2$, and showed that this fixed point is unstable for $M = 3$. From these results, the ratio of agents can be expected to converge over time to one of the agreed states, which are stable. While a rigorous analysis is difficult when $M \ge 4$, at least the following two propositions have been clarified. (1) The equilibrium state in which all options have equal support is unstable, so this balance does not persist. (2) In the neighbourhood of an agreed state, the agents converge to that agreement.
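The expected behaviour, departure from the equilibrium state followed by convergence to an agreed state, can be illustrated by iterating the map for $M = 3$. The sketch below is a hedged illustration, not a reproduction of the paper's experiments: the bias distribution is assumed to be a concave parabola on $[0, 2/M]$ (the actual Equation (13) may differ), the starting point is an arbitrary small perturbation of the equal-support state, and the function names are hypothetical.

```python
def parabola_cdf(x, M):
    """CDF of an ASSUMED bias density: a concave parabola on [0, 2/M]."""
    a = 2.0 / M
    if x <= 0.0:
        return 0.0
    if x >= a:
        return 1.0
    return (6.0 / a**3) * (a * x**2 / 2.0 - x**3 / 3.0)


def step(x, F):
    """One update of all M option ratios, as described for Equation (20)."""
    M = len(x)
    leave = [xj * (1.0 - F(xj)) for xj in x]
    total_leave = sum(leave)
    return [x[j] * F(x[j]) + (total_leave - leave[j]) / (M - 1)
            for j in range(M)]


M = 3
F = lambda v: parabola_cdf(v, M)

# Start close to the equal-support state (1/3, 1/3, 1/3).
x = [0.36, 0.33, 0.31]
history = [x]
for _ in range(100):
    x = step(x, F)
    history.append(x)

for t in (0, 1, 5, 10, 20, 100):
    print(t, [round(v, 4) for v in history[t]])
```

Under these assumptions the option with the largest initial share gains support at every step while the others lose it, which is the qualitative picture suggested by propositions (1) and (2) above.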

Approximation of time required to find the suitable option
Here, we consider the average time $\bar{T}$ required for discovering the suitable option. As mentioned earlier, a competitive state and an evaluation state may emerge repeatedly until the suitable option is discovered. The notation used below is summarized in Table 2.
Figure 13 illustrates a preliminary experiment. It shows the measured $T_D$, that is, the sum of the durations of the competitive phase and the evaluation phase, obtained from 10,000 experiments for each combination of $\tau$, $M$, and $N$.
In this figure, the whiskers of each bar indicate the corresponding standard deviation, and the variance of $T_D$ is very small for every pair of $M$ and $\tau$. Therefore, in the following, we assume that the time required for an agreement is a constant $T_D$. Assume that there is only one suitable option, $a_x$, among the $M$ options. When voting for the first time, the probability of selecting $a_x$ is $1/M$. From the second agreement onward, the agents know that one option was not $a_x$. We assume that the next agreed option is determined almost at random. Under these assumptions, Table 3 gives the estimated time required to discover the suitable option. Thus, the time required for discovery $\bar{T}$ is the product of the average number of agreements required for the discovery, $\bar{E}$, and the time per agreement:

$$\bar{T} = \bar{E}\,T_D.$$

The average number of agreements required for the discovery can be estimated to be almost linear in $M$ for large $M$.

Table 2. List of notations used for the analysis in this section.
Symbol                      Definition
$\bar{T}_D$, $\hat{T}_D$    Average time for an agreement (i.e. $T_v + T_a$) and its estimated value
$\bar{T}$, $\hat{T}$        Average time required to discover the suitable option and its estimated value
$\bar{E}$                   Average number of agreements required to discover the suitable option
The time required for an agreement, $T_D$, is the time from the initial time to the time when the competitive phase ends. $T_D$ differs from trial to trial depending on the initial conditions, and its characteristics are analysed below. In the $q$th attempt, agent $P_i$ exits the agreement when both sides of Equation (6) are equal, so its agreed time $T^q_{D,i}$ is determined by $\theta^q_i$, the bias of agent $P_i$ during this trial.
The agent with the largest bias in the $q$th run, $\theta^q_{\max} = \max_i(\theta^q_i)$, is the first agent to quit the agreement, so the shortest time span $T^q_D = \min_i(T^q_{D,i})$ is obtained from $\theta^q_{\max}$. The bias of the proposed method, $\theta^q_i$, is stochastically generated between 0 and $2/M$. Therefore, if the number of agents $N$ is large enough, the following approximation holds:

$$\theta^q_{\max} \approx \frac{2}{M}.$$

On this basis, we propose $\hat{T}_D$ (Equation 83) as an approximation of $T_D$. Using this and Equations (78) and (83), we can derive the approximate value $\hat{T}$ of the average time required for discovery (Equation 85). From this, we can expect that the average time required for discovery $\bar{T}$ increases linearly as the number of options $M$ increases. This estimate also shows that $\bar{T}$ does not depend on the population size $N$. In the following, the accuracy of this approximation is verified by computer experiments.

Experimental results
In this section, we conduct verification experiments on the BRT model with an SEM mechanism and the new bias generation method for cases with many options. In particular, the estimation performance of Equations (83) and (85) is clarified by computer experiments.

Scalability in group size N
Here, we clarify how the agreement time scales with the group size $N$. To confirm that the time required for an agreement $T_D$ does not depend on the group size $N$, we conduct verification experiments. The results are shown in Figure 14. The horizontal axis represents k 2 and the vertical axis represents the average time required for an agreement $\bar{T}_D$. The red dotted line shows $T_D = 1/\tau$. The figure shows that, except for very small values of k 2 (k 2 = 1, 2), the average time required for an agreement $\bar{T}_D$ is almost constant as $N$ increases.
When k 2 is so small that the agents can hardly be regarded as a group, the time required for an agreement is larger than when k 2 is large. If k 2 is small, i.e. $N$ is small, the generated biases do not reproduce the shape of the proposed distribution, as shown in Figure 15. This is why such a group could not reach agreements promptly.
From the above, if τ is small enough for the number of options M, the proposed method can enable groups to make agreements promptly and the time required for an agreement does not depend on the group size.

Best option discovering
Next, we show by computer experiments that a group can discover the suitable option using the proposed method. The experimental setup is as follows: the number of agents k 2 = 10 and $\tau = 0.003$. Each agent randomly selects its option at the initial time, and we verified whether the group could select the suitable option while changing the number of options $M$. One option $a_x$ belonging to $A$ was selected in advance and set as the suitable option ($|A_{goal}| = 1$). For each value of $M$, the number of cases in which all agents discovered the suitable option $a_x$ was counted over 10,000 trials, and the cumulative distribution of the success ratio of the discovery was determined. The experimental results are shown in Figures 16 and 17.
We define the success ratio $E_{success}$ as the proportion of the 10,000 trials in which the agents found the suitable option:

$$E_{success} = \frac{n_{success}}{n_{trials}},$$

where $n_{success}$ is the number of cases in which the agents find the suitable option and $n_{trials}$ is the number of trials. Figure 16 shows the cumulative distribution of the success ratio of the discovery for different $M$. The horizontal axis represents the time required for the discovery and the vertical axis represents the cumulative distribution of the success ratio. As can be seen, for every value of $M$ the cumulative success ratio reaches a very high value (≥ 99%). Therefore, the group could discover the suitable option $a_x$ even when $M$ is large. Figure 17 shows the average time required for the discovery together with the estimate $\hat{T}$. From the above, we can conclude that, if $\tau$ is small enough, the proposed method enables groups to discover the suitable option, that the average time required for the discovery is nearly proportional to $M$, and that this time does not depend on the population size $N$.

Conclusion
In this paper, we proposed a BRT model with short-term memory, which is an agent decision-making algorithm for the best-of-n problem. BRT can generate group-level trial and error dynamics. By repeating the competitive and evaluation phases, the group can find the best option among many options. In this study, we proposed a bias generation method using a quadratic function, and clarified the stability condition of the agreed states and the instability condition of the equilibrium state. For M = 2 and M = 3, we showed that the agreed states are stable and that the equilibrium state is unstable. In addition, although a new fixed point occurs when M > 2, the proposed method is expected to converge to an agreed state because the equilibrium state is unstable. We also proposed a method for estimating the time required to find a suitable option. We validated this estimate by computer experiments and showed that the time needed to find a suitable option increases almost linearly with the number of options. We also found that this time does not depend on the number of robots.
Future tasks include application to real multi-agent systems, such as the cooperative behaviour of multiple robots, and application to the analysis of the collective behaviour of living beings.