Trusting: Alone and together

We study the problem of an agent continuously faced with the decision of placing or not placing trust in an institution. The agent makes use of Bayesian learning in order to estimate the institution's true trustworthiness, and makes the decision to place trust based on myopic rationality. Using elements from random walk theory, we explicitly derive the probability that such an agent ceases placing trust at some point in the relationship, as well as the expected time spent placing trust conditioned on its discontinuation. We then continue by modelling two truster agents, each in their own relationship to the institution. We consider two natural models of communication between them. In the first (``observable rewards'') agents disclose their experiences with the institution to one another, while in the second (``observable actions'') agents merely witness the actions of their neighbour, i.e., placing or not placing trust. Under the same assumptions as in the single agent case, we describe the evolution of the beliefs of agents under these two different communication models. Both the probability of ceasing to place trust and the expected time in the system elude explicit expressions, despite there being only two agents. We therefore conduct a simulation study in order to compare the effect of the different kinds of communication on the trust dynamics. We find that a pair of agents in either communication model has a greater chance of learning the true trustworthiness of an institution than a single agent. Communication between agents promotes the formation of long-term trust with a trustworthy institution, as well as the timely exit from a trust relationship with an untrustworthy institution. Contrary to what one might expect, we find that having less information (observing each other's actions instead of experiences) can sometimes be beneficial to the agents.


Introduction
In this paper we are interested in the process by which individuals decide to place their trust in an institution. The importance of trust and distrust in governments around the world was highlighted by the COVID-19 pandemic, during which the distrust in government held by a group resulted in fewer vaccinations among individuals of that group [1]. Similarly, distrust in science is strongly correlated with vaccine hesitancy [2]. The prevalence of fake news during the COVID-19 infodemic [3] points toward peer-to-peer communication as a culprit for widespread distrust. Although these results were gathered in relation to the COVID-19 pandemic and vaccine hesitancy, one expects similar dynamics to apply to a whole host of different social phenomena. While the personal experience of an individual with the institution is important, it seems that the interactions with any individual's peers play a crucial role as well. We therefore study the dynamics of individual-to-institution trust in the context of peer-to-peer relationships between the individuals placing the trust.
A trust situation is a situation in which individuals are required to take a 'risk' in order to observe how trustworthy the other party is [4, Ch. 1]. In some cases this risk concerns the placement of resources at the disposal of the other. Relationships to institutions are characterised by trust: the truster in such a relationship typically builds an intuition of how trustworthy the institution is. For example, we could consider the relationship between an individual and a social media company. The individual has placed their data (resources) in the hands of the company, and in return can post content and has access to the posts of others. The level of trustworthiness the individual observes is determined by their daily experience using the social network, manifested in the (lack of) news that the social media company has been selling their information to third-party advertising companies. As an alternative example, consider the decision of whether or not to get vaccinated for a virus. For those individuals who believe scientific consensus to have their best interest at heart, this decision may not seem like something requiring trust at all. For other individuals who are skeptical of the institution of science and/or healthcare governance, it may seem like a risk to their health and possibly their personal information to get vaccinated at a government vaccine station. This type of example has motivated us to phrase the problem of deciding to place trust as a learning problem. The idea that agents learn about the trustworthiness of their interaction partner is also supported by sociological theory. The chapter by Buskens and Raub [5] provides a good overview of trust as a learning problem, as well as its own experimental evidence thereof. Furthermore, individuals are rarely isolated when faced with the decision of whether or not to place trust. Instead, human actions are embedded in a network which is likely to play a role in the decision making behaviour of its individuals. We therefore consider the effect of peer-to-peer communication on the dynamics of the trust problem, which is also supported in the empirical literature (cf. [6,7]). In particular, Wang and Yu [8] find evidence that this influence acts both when experiences are communicated and when agents simply observe one another's behaviour. We restrict our study to two agents, as this showcases the essence of the underlying mechanism in the agent-to-agent communication. This allows us to construct the model with exact Bayesian learning¹ under basic assumptions of myopic² rationality, instead of exogenously imposing model assumptions on the effect of a signal sent from one agent to the other. This approach leads to delicate intricacies already in the two agent case, stemming from agents interpreting their neighbour's actions whilst knowing that these actions were influenced by the actions they themselves have taken thus far. As a result of this mutual influence, in the two agent setup the evaluation of key quantities in our model eludes simple and explicit expressions. In fact, evaluating them requires many numerical integrations, which has led us to use a cluster of a large computing facility in our numerical experiments.
1.1. Related literature. Our investigation connects to two streams of literature: that of social network learning and that of social network trust. We proceed by providing an admittedly non-exhaustive, brief account of the related literature. The work relevant to our investigation spans various research disciplines, each with their own approach and set of "reasonable assumptions." The fields in question contain, but are not limited to, economics, theoretical sociology, socio-physics and machine learning. Beyond the modelling of these dynamics, the empirical investigations of network influence on trust even extend to the field of marketing.

¹ Work in biological perception has shown that humans often behave in a Bayesian manner. For a discussion of the implications hereof and a list of results we refer to Knill and Pouget [9].
² Not only is myopic decision making a common assumption in the literature (cf. [10,11,12,13,14]), but there is also some experimental evidence that points to semi-myopic behaviour in humans [15,16].
1.1.1. Social network learning. Social network learning concerns agents who try to optimize an unknown objective function by choosing an action, and who learn via private signals and/or signals from other agents. Under various modelling decisions, the interest is in the speed at which a group of agents learns the best course of action, as well as whether or not, under assumptions of rationality, the group may take a sub-optimal action, often as a result of social influence. The modelling decisions relate to the degree of rationality in learning, the communication between the agents, and whether or not the private signals of the agents are conditional on the actions they take. Table 1 summarises the modelling decisions of the papers [12,17,14,18], all of them considering a population of agents in the social network learning framework. The second-to-last column, headed "Communication", refers to what the agents share with one another. This could be their private learning signal per round, their belief distribution, or simply the actions they take. The last column, headed "Signals", refers to whether the reception of private signals by the agents is independent of or conditional on the actions the agents take.

Paper               | Rationality      | Communication       | Signals
Bala and Goyal [12] | myopic           | actions and signals | conditional
Molavi et al. [17]  | imperfect-recall | belief              | independent
Harel et al. [14]   | myopic           | actions             | independent
Huang et al. [18]   | non-myopic       | actions             | independent

Table 1. Modelling decisions in selected social network learning literature.
The agents in the paper of Bala and Goyal [12] are myopic, and in addition they do not infer anything about the outcomes of the neighbours of their own neighbours based on the actions taken by their immediate neighbours. The authors find that in connected networks the agents' behaviour converges asymptotically. Molavi et al. [17], Harel et al. [14] and Huang et al. [18] all consider models in which the private signals received by the agents are independent of the actions they take. Within that context, Molavi et al. [17] show how imperfect recall in the context of Bayesian learning relates to the seminal model by DeGroot [19]. In models of a fully connected population of agents, Harel et al. [14] and Huang et al. [18] relate the learning rate of the model with communication restricted to actions to a reference model where communication between agents is open to signals and beliefs.
Departing from the network setting, Correa et al. [20] and Fu and Le Riche [21] consider a model in which sequentially arriving agents from a population choose whether or not to buy a product, based on a Bayesian belief on the product's quality that has been built on the observations of the preceding agents. Correa et al. [20] compute the probability of incomplete learning, which occurs when the agents falsely believe the product's quality to be low. An important difference between [20] and our setup is that the agents in [20] know that the quality of the product comes from a set of two known values. This means that in [20] the agents are only tasked with identifying which of the two values the quality takes, whereas in our paper the relevant parameter (in the sequel referred to as the trustworthiness) may take any value in the unit interval. Fu and Le Riche [21] incorporate a similar learning problem into an endogenous growth market model in which the agents compare a new product of unknown quality to an old one of known quality, leading to the outcome that there are attainable equilibria in which the product's true quality remains unknown.
There also exists a body of literature focused on conditions which ensure social learning. In these papers agents typically receive only one signal and take sequential actions which are observed by some or all of the remaining agents. In the case of Banerjee and Fudenberg [22], the observation structure takes the form of a representative sample while the population is a continuum. Acemoglu et al. [23] impose a network structure to determine which actions are observed by agents. For an overview of models in which agents are represented by a fully connected graph, the interested reader is referred to Bikhchandani et al. [24].
To the best of our knowledge, the following relevant setup has not been studied in the social network learning literature: a model in which agents are myopically rational, observe their neighbours' actions or signals, and have private signals which are conditional on the actions taken. This kind of model also relates to the social network trust literature, considering that trust relationships require a dependency between signals and actions.
1.1.2. Social network trust. Social network trust typically concerns the behaviour of pairs of truster and trustee agents playing an iterated trust game. The interest here lies in the likelihood of trust breaking down and in describing the rational strategies of the truster and the trustee. Particularly relevant to our work are studies in which there is a population of trustworthy trustee agents who always honour trust, and opportunistic trustee agents who play strategically. In such models, it is a natural extension to have agents learn about the proportion of trustworthy trustee agents.
Important examples of this literature stream have been presented in Bower et al. [25], Buskens [26], and Frey et al. [27]. Bower et al. [25] describe equilibrium strategies for a sequence of 2-round trust games in which a new pair of truster and trustee agents is matched to play two rounds of the trust game. In addition to learning about the true proportion of trustworthy trustees, there is learning from the first round to the second round within a match-up between truster and trustee agent. Buskens [26] extends this analysis by considering a pair of truster agents that are both in a relationship to the same trustee. The main finding is that trust placement and honouring (in equilibrium strategies) increases with the probability of sharing information between truster agents, only if both agents are sharing information with a high probability. These findings have been corroborated by experimental work by Buskens et al. [28]. Frey et al. [27] extend the theoretical work further by letting the link between truster agents be bought at a cost. The authors determine which game parameters include and exclude investments in such a connection for equilibrium strategies.
In contrast to this stream of literature, we consider a trustee agent such as an institution whose behaviour is not modelled strategically, and who interacts with more than one truster agent. Following e.g. [26,27,28], one trustee interacting with two trusters is a natural setting to consider.
1.1.3. Relevance and perspective. In this section we elaborate on the relevance of the two different streams of literature to the model we present. Thematically we are based in the social network trust literature, while methodologically our approach bears resemblance to those used in social network learning. We first present the ideas in both streams and subsequently describe how our work fits into this landscape.
The social network trust literature offers detailed descriptions of the interactions between strategic trusters and one of two types of trustees, e.g. "friendly" trustees who always honour trust and "strategic" trustees who try to "fool" the truster in order to be able to abuse trust at some point (cf. [25,26,27]). The proportion of "friendly" trustees may be known (as in [26,27]) or unknown (as in [25]).
The trusters are learning about what type of trustee they are interacting with, and only sometimes also learning about the proportion of "friendly" trustees. As soon as a trustee abuses trust once in the peer-to-peer setting, they reveal that they are not to be trusted. A prevalent example in this literature is that of people buying a second-hand car (cf. Buskens [4, Ch. 6]) from a loose acquaintance or via an online peer-to-peer service, or that of hiring an informal house sitter without a contract, both situations in which the truster and the trustee are peers in some way.
In the social network learning literature, there is an environment which creates an ordering on the actions the agents may take and emits a signal. The agents take actions and receive a signal. They use the signal to update their belief about the environment with the goal of taking the best action. It is social learning in the sense that the agents communicate with one another about their actions and/or signals. The modellers in turn are interested in the probability of learning the best action as a group (cf. [12,20,21]) and the speed of convergence to this best action under different forms of communication (cf. [14,18]). The social network learning results are applicable to situations in which agents are consistently updating their belief and trying to take optimal actions. One can think of prediction in markets (and setting the corresponding price to maximise profit), adoption of opinions (trying to fit in with the group), and the dissemination of information (keeping up to date with the latest information).
We are motivated by the question of trust in institutions and the influence of peer-to-peer communication thereupon. Thus the social network trust literature paints the thematic backdrop for our work. We are interested in the event that trust is lost, primarily in the "asymmetric" case of trusters interacting with institutions. In this setting it is less natural to incorporate the strategic behaviour of the trustee: they are not actively taking part in the interactions, but rather "passively" providing a service. Instead of modelling trustee behaviour as strategic, we let them simply draw an action from a distribution.
We investigated the social network learning literature because of the technical similarity between our model and the models found therein. Our goal was to look for work that handles our setting (possibly under different nomenclature). In the process we observe that our model also fills a gap in the social network learning literature. Furthermore, we are encouraged in our decision to compare the effects of different forms of communication on outcomes.
In the social network trust literature, the agents are assumed to always communicate their experiences or their belief distribution fully. One can argue, though, that communication of experiences does not necessarily provide the right perspective: typically actions are readily observable, while the internal belief or personal experience may not always be shared. This has motivated us to compare the two.
The resulting modelling framework naturally applies to the setting we are interested in: individuals trusting institutions in the context of peer-to-peer influence.
1.2. Contribution. Our model aims to fill the above-mentioned gaps in the literature. In order to model the effect of peer-to-peer communication on the dynamics of a trust problem between an individual and an institution, we draw from both streams of literature.
Learning signals are dependent on the actions taken, following the social network trust literature. We follow the social network learning literature by implementing myopically rational decisions and a Bayesian learning procedure. We study two of the communication models between agents seen in both streams of related literature. In the first ("observable rewards") agents disclose their experience with the institution to one another, while in the second ("observable actions") agents merely witness the actions of their neighbour, i.e., placing or not placing trust. In both models of communication, we describe rational usage of the information in updating beliefs. The extent of the rationality plays a role as a benchmark rather than a description of actual human behaviour. The computations involved become complicated quickly, but provide a useful indication of perfectly rational information usage in terms of belief updating.
We impose no assumptions on the motivations of the trustee, who simply acts honourably with some probability ϑ ∈ (0, 1), thus generalising the setting where this probability takes one of two values as in [20]. We are interested in a setup in which truster agents interact with institutions whose behaviour cannot be modelled strategically at the level of individual interactions, unlike [25,26,27,29]. We also consider a more general version of the communication seen in [26,27], in which agents have access to their neighbours' actions and the outcomes thereof.
We describe the information usage and myopic decision making without depending on signals independent of the actions taken, as seen in [17,14,18]. The asymmetry of the actions, natural to the trust problem, means that incomplete learning is possible. Thus we pay attention to the probability of convergence to the truth rather than only the rate thereof.
In our work we analyse the dynamics of the single agent case using techniques from the field of random walk theory. Such analytic techniques, however, do not extend to the two agent models, so we analyse these relying on Monte Carlo simulation. As the numerical integrations required for agents to interpret each other's actions take a prohibitive amount of computation time on personal computing machines, we have to use the Lisa cluster of the computing facility SURFsara.
We observe that two agents with communication between them tend to make the "correct" decision sooner.In other words: typically, sample paths in which the relationship helps the agents outweigh the sample paths in which "bad luck" for one agent implies "bad luck" for both due to the communication between them.
Our experiments reveal that the observable rewards model is not always "better" than the observable actions model. Which model is most helpful to the agents depends on whether one is interested in making the correct decision in the long run or in being sure to end a relationship with an untrustworthy institution as quickly as possible. Moreover, we identify a parameter setting in which the probability of quitting is lower in the observable actions model than in the observable rewards model. This means that, contrary to what one might expect, having less information available might be beneficial to the agents, and there is no monotone ordering between the two models.
1.3. Organisation of the paper. In §2 we describe the basic elements of the model along with relevant interpretations. Thereafter, in §3 the model is described further and analysed in the case of a single agent (with proofs provided in Appendix A). Here we pay special attention to subcases which allow for analytic results regarding the probabilities of such a trust relationship ceasing. In §4 we formulate the two models for two agents in a trust relationship with the same institution: observable rewards and observable actions. In §5 we discuss the experimental setup used to investigate the two agent models and in §6 we discuss the results of this experimentation. We conclude this paper with a discussion of the implications of the results and their relevance to the present literature in §7.

Model for a single agent
In this section we present the model for a single agent, which forms the core of the two agent models discussed in later sections. We consider the situation in which an agent has repeated opportunities to place trust in an institution. The institution's behaviour is modelled by a single parameter ϑ ∈ [0, 1], the true trustworthiness, defined as the probability at which trust is honoured (so that its complement 1 − ϑ is the probability that trust is abused). If trust is not placed, then the institution has no action to take. This behaviour can be interpreted as the efficacy of the institution honouring trust, implicitly assuming that this is what they are attempting in each round. Note that we acknowledge the high level of abstraction taken in regard to the institution; the interest is mainly in the agent's behaviour in such a situation. In each round t ∈ N the agent chooses an action from the action set A = {0, 1}, in which A_t = 1 indicates that the agent places trust in round t while A_t = 0 indicates that the agent quits the trust relationship. We define the random variable X_t, for all t ∈ N, indicating whether the institution's action in round t is (would be) that of honouring or abusing any trust that may or may not have been placed:

    X_t = 1 (trust honoured) with probability ϑ,
    X_t = 0 (trust abused) with probability 1 − ϑ.

The random variables {X_t : t ∈ N} are i.i.d., in particular independent from the agent's actions. Importantly, the agent only observes X_t in rounds when A_t = 1. If trust is honoured the agent gains utility r (reward), while if trust is abused the utility gain is −c (cost). We assume that c and r are positive integers; note that if r, c ∈ Q we simply multiply both by the product of their denominators to get integers. The utility for the agent placing trust in the institution in round t is thus

    u_t = r 1{X_t = 1} − c 1{X_t = 0}.

As is widely adopted in the learning literature (see, e.g., Sebenius and Geanakoplos [10], Parikh and Krasucki [11], Bala and Goyal [12], Keppo et al. [13] and Harel et al. [14]), we let the agent act with myopic rationality. This means that in every round t they only consider the expected utility of the immediate action, and they do not take into account the possible utility of actions taken in rounds t + 1, t + 2, …, or the utility of information gained by taking action A_t = 1. The agent places trust if they believe the utility to have a nonnegative expected value. Furthermore, the agent starts with a Beta distributed prior belief P_0 with parameters α, β ∈ N, such that they initially believe the probability density of ϑ is given by

    p_0(θ) = θ^{α−1} (1 − θ)^{β−1} / B(α, β),   θ ∈ [0, 1],

with B(α, β) the Beta function. The initial estimate of the expectation of ϑ is ϑ_0 = E[B(α, β)] = α/(α + β). As more information becomes available (trust is placed and subsequently honoured or abused during rounds t > 0), the agent updates this belief distribution in a Bayesian fashion. Let

    S_t = Σ_{s=1}^{t} A_s 1{X_s = 1}

be the number of times that the agent observes that trust was honoured until time t. Similarly, let

    F_t = Σ_{s=1}^{t} A_s 1{X_s = 0}

be the number of times that the agent observes that trust was abused until time t. The belief distribution held by the agent at the end of round t is then found by applying Bayes' rule, so as to obtain the Beta distribution with parameters α + S_t and β + F_t. Denoting the estimated trustworthiness at time t by ϑ_t, for t = 1, 2, … we have

    ϑ_t = (α + S_t) / (α + β + S_t + F_t),

and the myopic decision rule reads

    A_{t+1} = 1{r ϑ_t − c(1 − ϑ_t) ≥ 0}.   (7)

2.1. Quantities of interest. We are interested in the agent's quitting, which happens when they stop placing trust: {A_t = 0}. The random variable τ denotes the number of rounds in which trust was placed until the first 'do not place trust' action:

    τ = min{t ≥ 0 : A_{t+1} = 0}.   (8)

By the definition of τ in (8) and A_t in (7), we have A_t = 0 for all t > τ. We thus note that ϑ_t = ϑ_τ for all t > τ, which arises quite naturally from the model dynamics considering that once the agent has taken the action to not place trust, they also do not get to observe the outcome of the action and thus do not adjust their estimate. This links the agent's actions to the estimate and vice versa. An institution can have a true trustworthiness ϑ such that rϑ − c(1 − ϑ) > 0, implying that an agent aware of the true value of ϑ would place trust forever. It is also possible in such cases that the agent's estimate ϑ_t continues to adhere to the condition in (7), which means that they never lose trust. In such cases quitting is not a given, and a particularly interesting probability to study is that of the event {τ < ∞}. We denote this by

    p_quit = P(τ < ∞).

Furthermore, we are interested in the expected time spent in the system before quitting, conditioned on quitting:

    q = E[τ | τ < ∞].

We now recapitulate the dynamics with reference to the graphical representation in Figure 1a. The two elements determining the agent's belief distribution at the end of time t are their prior P_0 as well as the history of their interaction contained in the random variables S_t and F_t. Together these result in a trustworthiness estimate ϑ_t of the institution, by which the agent makes the choice to either place trust or to quit the trust relationship in round t + 1. If the agent places trust (thus continuing the process) there is a response from the institution of either honouring the trust (with probability ϑ) or abusing the trust (with probability 1 − ϑ).
Note that all of the model's randomness stems from the response of the institution.
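To make these dynamics concrete, the following minimal Python sketch (our own illustration, not part of the formal model; all parameter values are arbitrary) simulates the Beta-Bernoulli learner with the myopic decision rule, and estimates p_quit and q by Monte Carlo, truncating at a finite horizon as a proxy for never quitting:

```python
import random

def simulate_agent(theta, r, c, alpha, beta, horizon=10_000, rng=random):
    """One sample path: returns tau (rounds of trust placed) if the agent
    quits before `horizon`, else None (proxy for tau = infinity)."""
    s = f = 0  # observed honoured (S_t) and abused (F_t) counts
    for t in range(horizon):
        est = (alpha + s) / (alpha + beta + s + f)   # posterior mean
        if r * est - c * (1 - est) < 0:              # myopic rule: quit
            return t
        if rng.random() < theta:                     # institution honours
            s += 1
        else:                                        # institution abuses
            f += 1
    return None

def estimate(theta, r, c, alpha, beta, n_paths=5_000, horizon=10_000, seed=1):
    """Monte Carlo estimates of p_quit and q = E[tau | tau < infinity]."""
    rng = random.Random(seed)
    taus = [simulate_agent(theta, r, c, alpha, beta, horizon, rng)
            for _ in range(n_paths)]
    quit_times = [t for t in taus if t is not None]
    p_quit = len(quit_times) / n_paths
    q = sum(quit_times) / len(quit_times) if quit_times else float("nan")
    return p_quit, q
```

For ϑ below c/(c + r) the estimated p_quit is close to one, while for ϑ well above that threshold a substantial fraction of paths place trust until the horizon.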

Trusting alone
In this section we analyse the dynamics of the single agent model described in §2. Our interest lies in the probability of quitting, p_quit, and the expected time to quitting, q (conditioned on this occurrence). We consider the dynamics that unfold between a single agent who periodically places (or does not place) trust in the institution. We start this section in §3.1 with a re-interpretation of the model in terms of a random walk with an absorbing barrier and (potentially asymmetric) step sizes as well as (potentially asymmetric) step probabilities. Subsequently, in §3.2 we analyse this random walk model for a number of special cases, focusing on the evaluation of the metrics p_quit and q.
Figure 1. (A) The single agent learner model illustrated conceptually (prior belief, action of trusting or quitting, and institution response) and (B) two example paths of the model dynamics in the random walk interpretation, with B(ϑ) denoting the Bernoulli distribution with parameter ϑ.

3.1. Model re-interpretation. We interpret the model described in §2 as a one-dimensional random walk with step sizes −r and +c occurring with probabilities ϑ ∈ (0, 1) and 1 − ϑ, respectively. For an agent holding an initial Beta belief distribution with parameters α and β, the decision criterion to place trust becomes

    r(α + S_t) − c(β + F_t) ≥ 0.

By rearrangement we find that in order to place trust the agent needs

    Z_t := c F_t − r S_t ≤ rα − cβ,

which we interpret as a random walk starting at Z_0 = 0, and having an absorbing barrier at the "critical" position

    u_crit = rα − cβ + 1.   (15)

Note that different values of α and β correspond to the same behaviour in terms of p_quit and q as long as the value of u_crit is the same. The additional unit in (15) is due to the decision criterion to place trust including equality (the expected reward must be zero or more), while the formulation of an absorbing barrier describes the first position for which trust is not placed. Note that all the influence of the prior is contained in u_crit. This also allows us to reformulate the probability of quitting as the probability of the random walk reaching the absorbing barrier. For ease of reference we refer to c/(c + r) as θ_crit, because an equivalent condition to Z_t ≥ u_crit is that ϑ_t < θ_crit.
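As a sanity check on this re-interpretation, the short Python sketch below (our own illustration, using exact rational arithmetic; the helper names are ours) verifies that the belief-based decision rule and the random-walk barrier condition coincide for every history (S_t, F_t):

```python
from fractions import Fraction

def u_crit(alpha, beta, r, c):
    # Barrier position: quit once Z_t = c*F_t - r*S_t >= r*alpha - c*beta + 1.
    return r * alpha - c * beta + 1

def trusts(alpha, beta, r, c, s, f):
    # Belief view: place trust iff r*E[theta | history] - c*(1 - E[theta | history]) >= 0.
    est = Fraction(alpha + s, alpha + beta + s + f)
    return r * est - c * (1 - est) >= 0

def below_barrier(alpha, beta, r, c, s, f):
    # Random-walk view: trust is still placed iff Z_t < u_crit.
    return c * f - r * s < u_crit(alpha, beta, r, c)

# Agreement over a grid of histories, for the parameter values of Example 1:
assert u_crit(8, 2, 1, 2) == 5
for s in range(40):
    for f in range(40):
        assert trusts(8, 2, 1, 2, s, f) == below_barrier(8, 2, 1, 2, s, f)
```

Exact fractions matter here: the decision rule includes equality, and floating point could misclassify the boundary cases.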
For illustrative purposes we provide two sample paths that such a random walk might take in Figure 1b.
Example 1 (Random walk interpretation). Consider the case with an absorbing barrier at u_crit = 5, arising from parameter values c = 2, r = 1, α = 8 and β = 2. We interpret this as having a critical trustworthiness θ_crit = 2/3 ≈ 0.67. The first such path (depicted as solid arrows) hits the absorbing barrier at t = 9, while the second path (depicted as dotted arrows) does not hit the absorbing barrier in the time steps shown and so has τ > 10. The values of S_t and Z_t are shown in Table 2, along with the respective estimated trustworthiness ϑ_t. ♣ Table 2. Sample paths and agent beliefs as random walks.
Path with τ > 10 (dotted line in Figure 1b):

t    | 0   | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11
S_t  | 0   | 0    | 1    | 2    | 3    | 3    | 4    | 5    | 6    | 7    | 7    | 8
Z_t  | 0   | 2    | 1    | 0    | −1   | 1    | 0    | −1   | −2   | −3   | −1   | −2
ϑ_t  | 0.8 | 0.73 | 0.75 | 0.77 | 0.79 | 0.73 | 0.75 | 0.76 | 0.78 | 0.79 | 0.75 | 0.76

Regarding the probability of quitting p_quit, we consider first the case where ϑ < c/(c + r). Note that in this case the actual expected utility of placing trust is negative and that the process Z_t has a drift toward the absorbing barrier, meaning that the barrier will be reached eventually with probability 1.

Lemma 1 (Certain quitting). Suppose ϑ < θ_crit = c/(c + r). Then the random walk reaches the absorbing barrier in finite time with probability 1, i.e., p_quit = 1.
The proof (given in Appendix A) relies heavily on the law of large numbers. The rest of our investigation in this section takes place in the (more interesting) case where ϑ > θ_crit. This case is particularly relevant as it represents situations in which the optimal action for the agent would be to place trust indefinitely, yet they do not necessarily do so. The random walk is investigated for general absorbing barriers u ≥ 0 in order to find recurrence relationships between p_quit for different values of u. We extract the probability of quitting the trust relationship and the time at which this occurs by using the appropriate u for the parameter values in question.
Considering the model thus far described, an agent either continues to place trust indefinitely and learns the true value of the trustworthiness or quits at some time t < ∞.

Lemma 2 (Converge or quit). A single agent partaking in the trust relationship described, with ϑ > θ_crit, either (A) quits at some time τ < ∞, i.e.,

    A_t = 0 for all t > τ,

or (B) continues to place trust indefinitely and has their estimate converge to the true trustworthiness, i.e.,

    A_t = 1 for all t = 1, 2, …, and lim_{t→∞} ϑ_t = ϑ.

This entails that

    P(τ = ∞, lim_{t→∞} ϑ_t = ϑ) = 1 − p_quit.

The proof follows from the definition of the estimated trustworthiness value and the law of large numbers, and is found in Appendix A.

3.2. Results.
In this subsection we study the probability of the agent quitting the trust relationship.We achieve this by finding the absorption probability of the random walk at some level u.We define by π(u) the probability of hitting an absorbing barrier at u in terms of the distance from the starting point Z t = 0 to this absorbing barrier at u ≥ 0: Here we suppress the dependence on c, r and ϑ for ease of reading.Note that u can equal 0, corresponding to a scenario that absorption is certain.The analysis of π(u) is divided into three cases with respect to r and c.Two of the cases we characterise analytically, while we present a numerical approximation for the third.Using u = u crit gives the probability of quitting, i.e., p quit (α, β, c, r, ϑ) = π(u crit , c, r, ϑ).
In §§3.2.1–3.2.2 we provide the above-mentioned analysis of two cases: 1) r ∈ N and c = 1, and 2) r = 1 and c ∈ N. An approximation scheme for the case r, c ∈ N is presented in Appendix B. The split is a result of the fact that it is not possible to find a closed-form result for the general case. The techniques used in the two cases also differ substantially and should therefore be viewed separately.

3.2.1. Case c = 1 and r ∈ N. In this case the random walk exhibits a useful memorylessness property. To see this, observe that the walk can only go up levels one at a time, while on its way down it can skip levels. We therefore first define ϱ := π(1), the probability of the random walk ever climbing by one level. As a consequence of the strong Markov property, the dynamics after gaining one level are independent of the history by which this was done. This means that the probability of going up u levels is simply the probability of sequentially going up one level u times, so that π(u) = ϱ^u. A formal expression of the probability of quitting, including a characterisation of ϱ, is presented in the following lemma, which is proved in Appendix A.
Lemma 3 (Quitting probability when c = 1 and r ∈ N). Suppose ϑ ≥ 1/(1 + r). The probability of the corresponding random walk with parameter values c = 1, r ∈ N reaching an absorbing barrier at u satisfies π(u) = ϱ^u, in which ϱ(ϑ) = π(1, 1, r, ϑ) is the unique solution in the range [0, 1) to equation (24). To find the probability of quitting we use u = u crit. In Figure 2 we plot numerical results of the quitting probability p quit (α, β, c, r, ϑ) = π(u crit , c, r, ϑ) for the subcases r = 1, 2, 3. The Beta prior belief distribution is given by shape parameters α = β = 2. The lines are theoretical results obtained by Lemma 3, and the dots (with confidence intervals) correspond to simulated results, which corroborate the analytical results. The shape of the quitting probability is explicable by noting that a lower trustworthiness ϑ consistently leads to a higher quitting probability. The results of Lemmas 1 and 3 are illustrated: the probability of quitting is unity where ϑ < θ crit and follows (25) thereafter. We now turn our interest to the expected time an agent spends in the system. In terms of the random walk notation, the time of quitting is τ(u), the first time the walk reaches the barrier at u. We are interested in the expectation of τ(u) conditioned on the event {τ(u) < ∞}, which we now know occurs with probability π(u). We use techniques from probability generating functions in order to describe this expectation. As such we inspect the expected time to reach the first level, τ(1), and note that the expected time to reach the u-th level is simply u times the expected time to reach the first level; this (again) follows from the fact that all levels {1, . . ., u − 1} need to be passed first in order to reach u.
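Numerically, the quitting probability for c = 1 can be sketched as follows. The fixed-point equation used below, ϱ = (1 − ϑ) + ϑϱ^(r+1), is our reconstruction of the first-step decomposition behind equation (24); for r = 1 it reduces to the classical gambler's-ruin value ϱ = (1 − ϑ)/ϑ.

```python
def rho(theta, r, tol=1e-12):
    """Smallest fixed point in [0, 1) of x = (1 - theta) + theta * x**(r + 1),
    obtained by monotone fixed-point iteration started from 0."""
    x = 0.0
    while True:
        x_new = (1 - theta) + theta * x ** (r + 1)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new

def p_quit(alpha, beta, r, theta):
    """Quitting probability pi(u_crit) = rho**u_crit for the case c = 1."""
    u_crit = r * alpha - beta + 1
    return rho(theta, r) ** max(u_crit, 0)
```

For example, with r = 1 and ϑ = 3/4 the iteration converges to ϱ = 1/3, and with the prior Beta(2, 2) we have u crit = 1, so p quit = 1/3.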
Lemma 4 (Expected time to quitting when c = 1 and r ∈ N). If r ∈ N, c = 1 and ϑ ≥ 1/(r + 1), then for all u ∈ N the conditional expectation of τ(u) satisfies the expression in (32), where φ′(z) is the derivative from below of the function φ(z), which, for any given |z| ≤ 1, is the unique solution to the associated functional equation, and where ϱ is the unique solution to equation (24).

Figure 3. Expected time in the system conditioned on the agent quitting at some t < ∞, plotted against ϑ for different values of r, with α = β = 2 and c = 1, on a log-linear axis. Analytical results are plotted as lines, while simulated results are shown as points with 95% confidence intervals.
To find the expected quitting time (conditional on quitting), one starts by solving (24), with the appropriate value of r, for the unique solution ϱ ∈ (0, 1). This is then substituted into (32) along with u = u crit , resulting in the desired expression for τ (u crit , 1, r, ϑ). For details, see the proof in Appendix A.
We plot numerical as well as simulated results of the expected time in the system, conditioned on the agent's eventual quitting, q(α, β, 1, r, ϑ), in Figure 3 for the subcases r = 1, 2, 3 on a log-linear axis. The Beta prior belief distribution is characterised by shape parameters α = β = 2. We notice that the expected time increases toward an asymptote positioned at ϑ = 1/(1 + r) = θ crit. The position of this asymptote is due to a property of the model: quitting is still guaranteed for ϑ = θ crit , but the time it takes has an infinite expectation (cf. the symmetric Bernoulli walk). The initial increase makes sense, as the probability of making a detour away from quitting increases with ϑ, even though the walk cannot escape forever in this region. The subsequent decrease for ϑ > θ crit results from the fact that, as ϑ increases, if quitting takes place then it does so earlier. This is because more time in the relationship exposes the agent to more unbiased information, which is likely to indicate that the institution is trustworthy.
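The conditional expected quitting time can also be estimated by direct Monte Carlo simulation of the walk (an illustrative sketch; the `t_max` cap on non-quitting paths and the sample size are our own choices). For c = r = 1, u = 1 and ϑ = 3/4, the walk conditioned on absorption drifts toward the barrier and E[τ(1) | τ(1) < ∞] = 2, which the estimate reproduces.

```python
import random

def conditional_quit_time(theta, c, r, u, n_runs=30_000, t_max=200, seed=1):
    """Monte Carlo estimate of E[tau(u) | tau(u) < infinity] for the walk
    Z_t stepping +c w.p. (1 - theta) and -r w.p. theta, barrier at u."""
    rng = random.Random(seed)
    total, hits = 0, 0
    for _ in range(n_runs):
        z = 0
        for t in range(1, t_max + 1):
            z += c if rng.random() < 1 - theta else -r
            if z >= u:        # absorbed: the agent quits at time t
                total += t
                hits += 1
                break
    return total / hits
```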
3.2.2. Case r = 1 and c ∈ N. In this case the walk Z t loses its memorylessness property: steps upward now have size c ∈ N, so it is possible that levels are skipped along the way. This means that a relationship of the type π(u) = ϱ^u no longer holds. There is no guarantee of which levels are reached on the way to quitting, and therefore one cannot use the approach that we relied upon in the c = 1 case. We nevertheless obtain the following result. In this case the appropriate value of u is given by u crit = α − cβ + 1.
Lemma 5 (Quitting probability when r = 1 and c ∈ N). If r = 1, c ∈ N and ϑ ≥ c/(c + 1), then the probability of quitting is given by π(u crit , c, 1, ϑ), where π(u, c, 1, ϑ) satisfies π(u) = ξ^(u)(0)/u!; here ξ^(u)(0) is the u-th derivative of the function ξ(w) in w = 0, where ξ(w) is an explicit rational function of w. The proof (in Appendix A) uses a translation between the maximum of our random walk on the one hand, and the minimum of a random walk with c = 1 and r ∈ N on the other.
We observe that ξ(w) is of the form P(w)/Q(w), where P(w) and Q(w) are polynomials of degree c + 1 and c + 2, respectively. We evaluate π(u) by differentiating both sides of Q(w)ξ(w) = P(w) u times and substituting w = 0 in order to obtain the expression for π(u).
Here ξ(w) can be interpreted as a probability generating function with coefficients a u = π(u), which allows calculation of the ruin probabilities by differentiating u times, setting w = 0 and dividing by u!; a standard procedure from the theory of probability generating functions.
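This procedure can be carried out in exact arithmetic: differentiating Q(w)ξ(w) = P(w) u times at w = 0 is equivalent to the convolution recursion below. As the explicit ξ of Lemma 5 is not reproduced here, we illustrate with the geometric generating function ξ(w) = 1/(1 − ϱw) from the c = 1 case (with ϱ = 1/3) as a stand-in.

```python
from fractions import Fraction

def pgf_coeffs(P, Q, n):
    """First n+1 power-series coefficients a_0..a_n of P(w)/Q(w) (Q[0] != 0),
    obtained from the convolution identity Q(w) * sum_u a_u w^u = P(w)."""
    P = list(P) + [Fraction(0)] * (n + 1 - len(P))   # pad P with zeros
    a = []
    for u in range(n + 1):
        shift = sum(Q[j] * a[u - j] for j in range(1, min(u, len(Q) - 1) + 1))
        a.append((P[u] - shift) / Q[0])
    return a

# Example: xi(w) = 1/(1 - w/3) has coefficients pi(u) = (1/3)**u.
coeffs = pgf_coeffs([Fraction(1)], [Fraction(1), Fraction(-1, 3)], 3)
```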
The resulting quitting probabilities are plotted in Figure 4 for the first two values of c. As expected, quitting remains certain until ϑ > c/(c + 1), the stability condition. The prior belief parameters were chosen in order to ensure that the agent does not quit in round 0.
The effect of the prior belief parameters is that u crit = 3 for c = 1 and u crit = 2 for c = 2. As before, we are also interested in the expected time that an agent spends in the system placing trust, conditional on them quitting eventually. Similar hurdles arise here as a result of the potentially larger steps toward the barrier, though in this analysis they are handled without much extra machinery. The following lemma expresses the expected time that an agent places trust until quitting, conditioned on their quitting eventually.
Lemma 6 (Expected time to quitting when r = 1 and c ∈ N). Suppose ϑ > c/(c + 1) and the utility parameters of the agent are given by c ∈ N and r = 1. Let w(z), for any given z ∈ (−1, 1), denote the corresponding unique solution for w ∈ (−1, 1). The expected time for the corresponding random walk to hit an absorbing barrier at u then satisfies the stated expression in terms of the generating function Φ(w, z). In order to fully express Φ in terms of only w, ϑ and z, one first uses the fact that a root of the denominator is necessarily also a root of the numerator (to keep Φ(w, z) bounded), and uses this to express φ(z, 1) in terms of z. The proof, presented in Appendix A, makes use of the object φ(z, u) := E[z^τ(u) 1{τ(u) < ∞}], which is embedded in a generating function for w ∈ (−1, 1), from which we obtain the result.
In Figure 5 we plot numerical as well as simulated results of the expected time in the system conditioned on the agent's eventual quitting, q(α, β, c, 1, ϑ). We do so for the instances c = 1, 2 using the Beta prior belief distribution determined by shape parameters α = 3 and β = 1, i.e., the same parameter choices as those used in Figure 4. As in the case with c = 1 and r ∈ N, we notice that the expected time increases toward an asymptote, now positioned at ϑ = c/(c + 1). After this critical ϑ there is a decrease, as quitting becomes concentrated on the early time steps. More time in the system implies that the agent is exposed to more unbiased experience with the institution, and therefore has a greater chance of remaining in the trust relationship.

Trusting together
In this section we extend the model from one truster agent to two truster agents. To this end, we enhance the model with a communication mechanism (between the truster agents, that is) that retains the assumption of myopic rationality.
We consider agents who communicate with one another as frequently as they make decisions to place or not place trust. This interaction takes one of two forms. In the first, the agents completely share their own interaction histories with each other. In the second, the agents do not communicate about their outcomes at all, and only observe which action is taken by the other agent in each round.
4.1. Shared modelling aspects. In both mechanisms we have two agents, each with their own interaction with the institution. As such, in each round we have a decision for each agent from the action set A = {0, 1}^2, in which A i,t = 1 if agent i ∈ {1, 2} places trust in round t.
For each round t and agent i there is an outcome X i,t , where X i,t = 1 indicates that the trust that was placed in that round was subsequently honoured by the institution. Similarly to the single-learner case, the agents use S i,t to keep track of how many times trust was honoured to them in rounds when they placed trust, and F i,t to count the number of times trust was abused to agent i = 1, 2. The agents are equipped with the same Beta prior belief distribution as before, with shape parameters α i , β i ∈ N for agent i. For simplicity, from now on we consider only the case of homogeneous priors (i.e., α 1 = α 2 and β 1 = β 2 ). It should be noted that one can deal with non-homogeneous priors in the exact same way, where one should specify what each of the agents knows about their neighbour's prior. One could assume that they know it exactly, or hold some belief distribution on it; the formulation becomes more tedious but is conceptually straightforward.
As we are only considering the homogeneous case, we drop the subscript on the prior belief distribution, calling it simply P 0 (·). Let the information received by agent i in round t be denoted I i,t (θ). The precise formulation of this information is a modelling outcome of the choice of communication; see (50) of §4.2 for the first mechanism, and (59) of §4.3 for the second. The resulting belief update rule is given in (47); the difference between (47) and (2) lies in the presence of the information received by the agent. Subsequently the agents use the mean of their belief distribution as a point estimate for the true trustworthiness of the institution, and use this to make the decision whether or not to place trust in round t + 1. Writing ϑ i,t for the mean belief held by agent i at the end of round t, agent i takes the action A i,t+1 in round t + 1 as prescribed by (49). Note that the condition in (49) can be rewritten as ϑ i,n ≥ θ crit , ∀n ∈ {0, 1, . . ., t − 1}, with θ crit := c/(c + r). In the next two subsections we discuss the two communication mechanisms in more detail.
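A numerical sketch of this update (the midpoint-rule integration and the function names are our own choices) computes the posterior mean under an arbitrary information term I(θ); with I ≡ 1 it reduces to the single-agent Beta mean (α + S)/(α + β + S + F).

```python
def posterior_mean(alpha, beta, S, F, info=lambda th: 1.0, n=20_000):
    """Mean of the belief proportional to th^S * (1-th)^F * P_0(th) * I(th),
    with P_0 the Beta(alpha, beta) prior, via the midpoint rule on (0, 1)."""
    num = den = 0.0
    for k in range(n):
        th = (k + 0.5) / n
        weight = th ** (S + alpha - 1) * (1 - th) ** (F + beta - 1) * info(th)
        num += th * weight
        den += weight
    return num / den
```

For instance, passing info=lambda th: th (one extra honoured interaction reported by the neighbour) shifts the Beta(2+3, 2+1) mean 5/8 up to 6/9.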

4.2. Observable rewards (OR). This case considers agents that communicate fully with one another about their experiences with the institution. The dynamics of this model are represented graphically in Figure 6. The extension from the single-learner model (represented in Figure 1a for comparison) consists of the information shared between agents, which comprises the response of the institution to trust that was placed.
Each agent thus has the same information available to them, and under homogeneous prior beliefs they have the same estimate of the true trustworthiness. In this case we define the information agent i receives from their neighbour j ̸= i in round t in (50). This information contains exactly the interaction history of the other agent; hence the line from agent i's "Trusting" interaction to agent j's "Belief" distribution in Figure 6.
As mentioned, the agents' estimated values of the true trustworthiness are the same. Thus we can drop the subscript i from ϑ i,t ; the estimate is based on the pooled interaction counts of both agents. Defining Z t = c(F 1,t + F 2,t ) − r(S 1,t + S 2,t ), similarly to the single-agent model we obtain a random walk, now taking two steps per round, with an absorbing barrier at u = rα − cβ + 1.
In the case of heterogeneous priors the formulation of this model does not change dramatically. The fates of the agents would no longer be tied, and so there would simply be two random walks defined as in (52), one for each agent. Each random walk would have its own absorbing barrier defined by u i = rα i − cβ i + 1. Recall that α i , β i , r and c are taken from N, and so u i too is an integer for both i. If at some time t q one of the agents quits, then the remaining agent simply continues according to the dynamics of the single-agent model for t > t q .
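Under homogeneous priors, the OR dynamics reduce to the single pooled walk described above; a minimal simulation sketch (the function name and the `t_max` cap are our own choices):

```python
import random

def simulate_or_pair(alpha, beta, c, r, theta, t_max=500, seed=None):
    """Two agents sharing rewards (OR): one pooled walk
    Z_t = c*(F1+F2) - r*(S1+S2), barrier u_crit = r*alpha - c*beta + 1.
    Returns the round in which both agents quit, or None."""
    rng = random.Random(seed)
    u_crit = r * alpha - c * beta + 1
    z = 0
    for t in range(1, t_max + 1):
        for _ in range(2):                       # one outcome per agent
            z += -r if rng.random() < theta else c
        if z >= u_crit:                          # shared estimate drops below
            return t                             # theta_crit: both quit
    return None
```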
4.3. Observable actions (OA). In this setting the agents do not communicate explicitly by sharing information about their interaction history. Instead, they observe the actions of their neighbouring agent, and use this to infer something about the possible histories their neighbour may have experienced leading to such a result. This is where subtle intricacies arise, as rational agents need to keep in mind that their own actions are observed by their neighbour and thus will also affect the decisions made by their neighbour.
The extent of the communication between the two agents in this model is binary, indicating whether or not trust was placed at the end of round t. This information is only incorporated into the agents' belief distributions for round t + 1. This model is depicted graphically in Figure 7. The crux of this model is the line from one agent's "Action" decision to the other agent's "Belief" distribution. The action of not placing trust implies that the agent exits the system forever. The quitting agent does not continue to update their belief based on their neighbour's actions, and the information sent by the quitting agent is used as the information received by the remaining agent for the rest of that agent's tenure in the system. The nature of the communication necessitates separate definitions for the information sent with a "yes" signal, when the agent places trust, and with a "no" signal, when the agent does not place trust. As such we define these for i = 1, 2, j ̸= i and t ∈ N, where τ j is the round in which agent j quits. We formally define Y i,t and N i,t in (59) and (61), respectively. By the rationality assumption, the agents use the binary information along with the knowledge that their own outbound information up until round t − 1 was used rationally by the other agent. With this they infer a range of possible histories that may have led to the decision taken by their neighbour.
We define the belief distribution R i t formed by taking only the report received by agent i (the signal sent at time t by agent j to agent i) into account. We then define the auxiliary belief distribution D t (x) resulting from a combination of R i t and θ^x (1 − θ)^(t−x), where x is a possible number of positive experiences. This allows us to define the set of permitted positions of agent i, J i t . The distributions D t (x) for x ∈ J i t are the possible belief distributions held by agent i. These possible belief distributions are then summed in I j,t , weighted according to w x (t − 1), the number of ways in which these x positive experiences may have been realised (i.e., the number of histories resulting in x positive experiences by time t). The weights w x (t) are defined recursively in (58) for x ∈ [0, t] and t ∈ N, with initial condition w x (1) = 1 for all x ∈ J 1 . The information received by agent i from agent j, given that agent j's decision was to place trust in round t, is defined in (59). We can interpret w x (t) as the number of walks from (0, 0) to (x, t − x) that retain x τ ∈ J τ for τ = 1, 2, . . ., t. Figure 8a illustrates the permitted walks for a "yes" signal, with the boundary between x in and out of J t represented as the thick horizontal line.
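The walk-counting interpretation of w x (t) can be sketched as a lattice-path recursion. The recursion used below, w x (t) = w x (t − 1) + w x−1 (t − 1) restricted to the permitted sets, is our reading of (58); when no positions are forbidden it reduces to the binomial coefficients.

```python
def permitted_walk_counts(J, t):
    """w_x(t): number of outcome histories with x honoured rounds out of t
    whose partial success counts stay permitted, i.e. x_tau in J[tau] for
    tau = 1..t.  J maps each round tau to its set of permitted counts."""
    w = {x: 1 for x in J[1] if x in (0, 1)}       # w_x(1) = 1 for x in J_1
    for tau in range(2, t + 1):
        # a count of x at time tau arises from x or x-1 at time tau - 1
        w = {x: w.get(x, 0) + w.get(x - 1, 0) for x in J[tau]}
    return w

# With no restriction (J[tau] = {0, ..., tau}) the counts are binomial.
J_free = {tau: set(range(tau + 1)) for tau in range(1, 5)}
counts = permitted_walk_counts(J_free, 4)
```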
In order to define the information sent with a "no" signal, we define K t , the range of values which would result in "yes" signals up until round t − 1 and a "no" in round t. Subsequently, the information sent in a "no" signal is given in (61), where w x (t) is defined as above. The number of permitted walks to a "no" signal at time t is simply the number of permitted walks to a "yes" signal at time t − 1 which are also one negative experience away from quitting. This is illustrated in Figure 8b, with a horizontal line indicating the boundary between x in and out of J t and a dotted line indicating the boundary between x in and out of K t . Note that the information sent by a "no" signal in subsequent rounds (after the first such signal) does not change, as we allude to in (54).
The agents finally construct an estimate by combining their own information with the signal information, i.e., the mean of the belief distribution proportional to y^(S i,t) (1 − y)^(F i,t) P 0 (y) I j,t (y) on y ∈ (0, 1), and place trust in round t + 1 if this estimate is at least θ crit .

4.4. Illustration of the differences between the models presented. In this subsection we present a set of example experiences for two agents. These serve to demonstrate the workings of the dynamics in the OA model and the OR model.
We consider both dual-agent models with parameters c = 2, r = 1 (and therefore θ crit = 2/3) and u crit = 2 (α = 5, β = 2). Regardless of whether these interaction outcomes are witnessed, let us presume that the responses by the institution for the first four rounds are (X i,1 , X i,2 , X i,3 , X i,4 ) = (0, 0, 1, 1) for i = 1, together with a corresponding sequence for i = 2. We observe that the first agent would have quit in the single-agent model after the first interaction, but in the OR model they continue past this point. The OA model is not so easily represented visually and so requires more elucidation. We offer Table 3, summarising the variables over time for the single-agent models for agents 1 and 2, as well as for the dual-agent models. Note that agents 1 and 2 have the same estimate throughout in the OR model, because they start with the same prior and are privy to the same information each round. The first agent's estimate stops changing after the first round as a result of their quitting, which is the case in the single-agent model as well as in the OA dual-agent model. This is because, by the time they take the decision to not place trust, they have not yet received a signal containing information from agent 2.
The reactions of the institution X i,t for i = 1 and t ≥ 2 are thus not witnessed and not used to update agent 1's belief. In the first round of the OA model both agents behave exactly as they would in the single-agent model, because their first action is non-informative. After this first round, the first agent quits and takes the "not place trust" action in round 2 and all subsequent rounds. This indicates to the second agent that trust was abused in agent 1's first round. In this case w x (1) = 1 (as part of the initial condition of w x ) and I 2,2 = θ^0 (1 − θ)^1, while their personal S 2 equals 2, which provides the updated belief distribution; the mean of this belief distribution is used as the point estimate. For illustrative purposes we also show the information sent out by agent 2 via their action in round 3, i.e., I 1,3 . First they construct R 2 2 , the belief formed by taking into account the report received by agent 2 by round 2: R 2 2 (θ) ∝ P 0 (θ)I 2,2 (θ). Subsequently the auxiliary distribution D 2 is constructed. This distribution is then used to construct the set J i=2 t=2 = {2}, considering that this is the only x for which the mean of D 2 (x) is at least 2/3. Considering that x = t, we know that w 2 (2) = 1, which determines I 1,3 . This example illustrates an information cascade, as the second agent would not quit in the first 4 rounds in any model except the OR model. The effect for the first agent, on the other hand, is that instead of quitting in round 1, as in the single-agent model as well as in the OA model, they quit in round 2 of the OR model.
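Agent 2's resulting belief can be verified in exact arithmetic; the sketch below reflects our reading of the example (prior Beta(5, 2), personal record S 2 = 2, F 2 = 0, and signal information I 2,2 (θ) = 1 − θ), under which the updated belief is Beta(7, 3) with mean 7/10 ≥ θ crit = 2/3, so agent 2 keeps placing trust.

```python
from fractions import Fraction

# Our reading of the worked example: the belief is proportional to
# theta^(4 + 2) * (1 - theta)^(1 + 0 + 1), i.e. a Beta(7, 3) distribution.
a, b = 5 + 2, 2 + 0 + 1              # Beta shape parameters after the update
estimate = Fraction(a, a + b)        # posterior mean
theta_crit = Fraction(2, 3)          # since c = 2, r = 1
keeps_trusting = estimate >= theta_crit
```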

Setup of simulation experiments
In this section we describe the setup of our simulation experiments. The results of these experiments are presented in §6, while in §7 we discuss and interpret them. The primary goal of the experiments is to assess the two communication mechanisms in the two-agent case, in terms of the probabilities of quitting and the expected quitting times (conditional on quitting).

5.1. Choice of c and r. In order to get an idea of how the model parameters influence the probability (and timing) of agents quitting, we choose the following five combinations of (c, r): (1, 1), (3, 2), (2, 3), (2, 1), (1, 2), so that c = r is the "base case" and c = 2r and r = 2c are the "extremal ends". There are two reasons for choosing this range of ratios. The first consideration is that if the ratio gets pushed further in either direction, the simulation results become harder to obtain. The critical trustworthiness θ crit is shifted toward 1 with an increase of c, which means that only a small range of ϑ has a quitting probability less than unity. Within this small range there is a sharp drop in quitting probabilities, because with very large ϑ the probability of quitting is low, as quitting requires an abuse of trust. In the other extreme, with an increase of r, θ crit decreases, so that increasingly many parameter settings have to run over a relatively long time interval in an already slow simulation. This is because of the large number of numerical integrations required in the observable actions model, used in constructing J i t , D t and R i t for each i = 1, 2.
The second reason is that this is a sufficiently extensive range of cost-to-reward ratios, especially in the context of trust problems.

5.2. Choice of α and β. Intuitively, the parameters α and β can be considered as a number of interactions with the institution prior to the dynamics we consider, in which trust was honoured α times and abused β times. The greater α is compared to β, the more the prior belief distribution of the agents is skewed toward greater values of ϑ; similarly, for lower α compared to β, the prior belief distribution is skewed toward lower values of ϑ. The prior belief parameters α and β, in combination with the choice of c and r, determine the instance's u crit = rα − cβ + 1. Borrowing from the analogy of the one-dimensional random walk of the single-agent model, the value of u crit represents a "distance" to quitting. If we would like consistency in the interpretation of u crit , we require a conversion: take u* = u crit /c, which is the minimum time in which an agent can quit in the single-agent model. There is an asymmetry in the model between the cases c > r and c < r. In case c > r, it is possible for an agent to experience an honouring of trust in the first round yet still quit after the second, if the step size for an abuse of trust is such that c > u crit + r; in case r > c this scenario is not possible. We choose values of α and β such that, in combination with the values of r and c, we cover the range u* = 1, 2, 3. The choice to stop at u* = 3 is made to keep the probability of quitting high enough to facilitate efficient simulation.

Iterations and simulation length.
In our simulation study, our goal is to produce reliable estimates for the quantities under study. We run each of the models for 4 000 truster agents in total, i.e., the single-agent models are run 4 000 times while the dual-agent models are run 2 000 times. We run each simulation for a maximum of 500 time steps, with the exception of simulations with ϑ ∈ {0.84, 0.9}, in which case we run the model for a maximum of 200 time steps. These choices were made with considerations of confidence interval width in mind, as well as execution feasibility. Considering that in the two-agent models the dynamics are sped up (as there is more information available per time step), this should be sufficient time for both two-agent models as well.
In the interactions with the highest ϑ the probability of quitting is very low and concentrated on the first couple of time steps. We can make an educated (pessimistic) estimate as to how many time steps would be required in order not to miss any relevant "probability mass". We do this by relying on Markov's inequality applied to the expected time to quitting in the single-agent model. As an illustration, consider the parameter setting c = 2, r = 1, α = 5 and β = 2 (such that u* = 1) at ϑ = 0.84. In the single-agent model the expected time to quit is E[τ | τ < ∞] ≈ 3.17. By Markov's inequality we know that the probability of quitting at time τ ≥ 200, conditional on quitting, is bounded by E[τ | τ < ∞]/200. Hence, in the N experiments performed, the expected number of times we miss the quitting event is bounded accordingly; taking N = 4 000 this bound is small. The single-learner case is a conservative benchmark, as there is less communication than in the two-agent models, so we anticipate agents in the two-agent models to quit sooner. The extent to which this estimate is conservative is illustrated by the fact that, for the parameter settings described, the latest quitting in the single-agent model occurred at round 45, for the dual-agent OA model at round 40, and for the dual-agent OR model at round 17.
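This back-of-the-envelope calculation can be reproduced as follows (a sketch: the explicit p quit factor, bounding the probability that a run quits at all, is our own addition to make the miss-count bound explicit):

```python
def markov_miss_bound(expected_tau, t_cap, n_runs, p_quit=1.0):
    """Markov's inequality: P(tau >= t_cap | tau < inf) <= E[tau|tau<inf]/t_cap.
    Returns the per-run tail bound and the resulting bound on the expected
    number of quitting events missed across n_runs experiments."""
    tail = expected_tau / t_cap
    return tail, n_runs * p_quit * tail

# Illustration from the text: E[tau | tau < inf] ~ 3.17, cap 200, N = 4000.
tail, missed = markov_miss_bound(3.17, 200, 4_000)
```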

Results of experiments
In this section we present the results of the experiments described in §5. We plot the results for specific values of (c, r) and u* in Figures 10a–15b; the numerical output of all experiments is given, in tabulated format, in Appendix D. The plots cover the settings (c, r) = (1, 1), (2, 1), (1, 2) and u* = 1, 3, where we add the synthetic points p quit (α, β, c, r, ϑ = θ crit ) = 1 and p quit (α, β, c, r, ϑ = 1) = 0 to the curves shown in the figures in order to show the dynamics beyond the capability of the simulation. To be able to see subtle differences in the results, we show the probability of quitting predominantly in the region in which quitting is not guaranteed.

6.1. The probability of quitting. The probability of quitting in the regime ϑ < θ crit should remain unity: the same and more information is made available to the agents in the dual-agent models as in the single-agent model. The values of 0.999 in Tables 5a, 6a and 7a could have been remedied at the cost of more simulation effort (i.e., by working with more runs and more time steps).
In the (more interesting) regime ϑ > θ crit , for most parameter settings the probability of quitting increases from the OR model to the OA model and again to the single-agent model. This trend is more pronounced for u* = 1 than for u* = 2 and u* = 3. There are two exceptions to this trend in Tables 6a and 7a: for ϑ = 0.65, 0.66 at the parameter settings c = 3 and r = 2, with u* = 2, 3. However, we observed overlap in the original confidence intervals.
To determine whether such a difference is significant, we conducted more simulations for the faster case with u* = 2; for the computationally lighter OR model we conducted a larger number of runs. The resulting confidence intervals for the points ϑ = 0.65, 0.66 are depicted in Figure 16. We observe the, perhaps unexpected, result that the OA model produces a lower probability of quitting at this parameter setting. Hence, the OR model does not always outperform the OA model in terms of the probability of quitting in the regime ϑ > θ crit .
Part of the explanation for the lower probability of quitting in the OA model relates to the timing of the communication, which occurs at the end of a round. In the OR model, as soon as an agent has their own reward, they also observe the reward of their neighbour; this amounts to two pieces of information per round. In the OA model, however, the agents only observe the actions of their neighbour once they have already taken their own action for that round. That means that an agent who observes their neighbour not placing trust for the first time in round t can only use the information contained therein to inform their action in round t + 1. In the meantime, as they have not yet quit in round t, they are privy to at least that round's outcome before they have to make another decision. We elaborate on this effect in Appendix C.

6.2. Expected time to quit. In general (in the regime ϑ < θ crit ) the expected time to quit is lower in the dual-agent models than in the single-agent model. Furthermore, the OR model tends to have the lowest expected time to quit (i.e., the greatest effect), though with a fair amount of crossing with the OA model. The key exception to this trend occurs at c = r = 1 with u* = 1, depicted in Figure 10b. At this setting, the OA model performs better than both others, which show a similar time to quitting. Furthermore, there are exceptions at c = 2, r = 1 and u* = 1, depicted in Figure 14b, and at c = 3, r = 2 and u* = 1, shown in Table 5b.
In the regime ϑ > θ crit there is a trend (with exceptions) that quitting in the OR model occurs sooner than in both other models. In the OR model the agents are exposed to two outcomes per round of interaction, and so receive twice as much unbiased information as in the single-agent model. In this regime, the unbiased information thus received is likely to indicate that the institution is trustworthy, and so quitting becomes less likely with more time (i.e., quitting must occur quickly or not at all). This trend is most pronounced for u* = 3, as shown in Figures 13b and 15b as well as in Table 7b.

Discussion
In this section we draw conclusions from the numerical experiments, discuss how these findings relate to those in the literature, and present directions for future research.

7.1. General conclusions. In all of the models a similar pattern in the expected time to quitting holds: the expected time to quit increases with ϑ until θ crit and decreases afterwards. This specifically entails that a long average tenure of customers at an institution does not indicate that this institution is to be trusted; put differently, there is no way of knowing which side of the critical value the institution might be on with only this information. An institution can simply have a trustworthiness that is just high enough to keep customers placing their trust in them long enough to get positive net utility from that relationship, yet not actually high enough to warrant an indefinite relationship. An institution with many ongoing relationships and a couple of very short concluded ones, on the other hand, might be a good indication that the institution is indeed trustworthy.
By comparing the probability of quitting for the same c and r at different u * (for example in Figures 10a and 11a) we see that the difference between the three models is relatively small for u * = 2, 3.This shows that starting optimistically allows agents to secure good chances of trusting a trustworthy institution in the long run.
We encountered the (perhaps) remarkable phenomenon that the effects of the different two-agent communication mechanisms are not monotone. This is highlighted in the cases (a) c = r = 1 and u* = 1, depicted in Figures 10a and 10b, and (b) c = 3, r = 2 and u* = 2, depicted in Figure 16. In (a), the OR model outperforms the OA model in terms of the probability of trusting a trustworthy institution. At the same time, however, the OA model outperforms the OR model in making the decision to quit a trust relationship with an untrustworthy institution sooner. In (b), we see the OA model outperforming the OR model in terms of the probability of trusting a trustworthy institution for ϑ = 0.65, 0.66. These results show that we cannot state that the OR model is always "better" than the OA model. In fact, this depends on the parameter setting as well as on which criterion one finds most important: making the correct decision in the long run, or being sure to end a relationship with an untrustworthy institution relatively quickly. This surprising result is partially due to the "timing" of the underlying dynamics: agents make their decisions to place trust simultaneously, and so can only use information from their neighbour's actions one round later. We illustrate this in a two-round setting in Appendix C.
We summarise our insights from the numerical results:
• Communication always helps: it increases the probability of never ceasing to trust a trustworthy institution, and it decreases the expected time until quitting a relationship with an untrustworthy institution, both compared to the single agent model.
• The OA model can be better or worse than the OR model, depending on the performance measure of interest (probability of quitting, or expected time to quit). There are instances in which having less information is beneficial to the agents.
• A good way to increase the chances of trusting a trustworthy institution in the long run without a social network is to start with a more optimistic prior.
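To make the quitting dynamics summarised above concrete, the following is a minimal Monte Carlo sketch of the single agent model (our own illustrative code, not the simulation code used for the figures): an agent with a Beta(α, β) prior places trust as long as the posterior mean stays at or above θ crit = c/(c + r), and we record whether and when it quits.

```python
import random

def simulate_agent(theta, alpha, beta, c, r, max_t=10_000, rng=None):
    """One trust relationship: return the round in which the agent first
    declines to place trust, or None if it still trusts after max_t rounds."""
    rng = rng or random.Random()
    theta_crit = c / (c + r)
    honours = 0   # S_t: honoured interactions so far
    placed = 0    # number of rounds in which trust was placed
    for t in range(1, max_t + 1):
        estimate = (honours + alpha) / (alpha + beta + placed)
        if estimate < theta_crit:      # myopic rule: trust iff estimate >= theta_crit
            return t
        placed += 1
        if rng.random() < theta:       # institution honours trust w.p. theta
            honours += 1
    return None

# An untrustworthy institution (theta < theta_crit): agents should quit.
rng = random.Random(0)
runs = [simulate_agent(0.3, 2, 2, 1, 1, rng=rng) for _ in range(2_000)]
quit_frac = sum(t is not None for t in runs) / len(runs)
```

Estimating the probability of quitting on a grid of ϑ values in this way should reproduce the qualitative pattern described above, with the conditional time to quitting peaking around θ crit.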
7.2. Reflection and context. In this paragraph we compare our model conceptually to those found in the literature streams; in the subsequent paragraphs we compare the outcomes of our model to those in the literature. We presented a model of trust which includes both learning from interactions with the institution and learning from communication between agents. Thematically this work is related to the social network trust literature, where the focus is on the possible loss of trust. Methodologically this work is related to the social network learning literature: the agents in our model are learning about the environment (the trustworthiness of the trustee). Our model extends ideas in both of these streams in a natural way. The dependence structure between the actions taken by the agents and the signals they subsequently use to learn extends, in a realistic way, the work in the social network learning literature, which typically does not cover this complication. Finally, the observable actions model of communication adds realism to the relationships between agents that is not present in the more common observable rewards model of communication in the social network trust literature.
The results of our investigation, akin to those of Buskens [26], show that trust increases in the models with truster agents sharing information about the trustee. In our model, however, this effect extends to settings in which a trustee dishonouring trust in one round does not immediately indicate that they are not to be trusted. Furthermore, we extended the type of communication between the truster agents beyond complete information sharing. We find that the positive effect of agent communication, though stronger under complete information sharing, is also present in the model in which communication is limited to observing actions. Hence, even when individuals do not hold extensive discussions with their peers about the institution, simply observing their actions provides a significant benefit, as has previously been shown for more extensive communication in the social network trust literature [26, 27, 28].
In both models of communication that we consider, and for all parameter settings, we observe that rational use of social network information increases the agents' chances of learning the true trustworthiness when it is rational to place trust. We also observe that, for both models of communication and in all parameter settings, the expected time to quitting is shorter, which is especially beneficial when it is rational to quit. Our model includes realistic assumptions about agent communication and about the dependence of signals on actions, and shows that communication between agents is an aid to learning and trusting. The simplistic assumption of signals independent of actions cannot be imported into models of trust and learning: it is the nature of a trust problem that resources have to be placed at the liberty of the trustee agent in order to see how they respond. However, as in the social network learning literature (cf. [14, 18]), our work shows that more communication leads to faster dynamics, but, due to the dependence of signals on actions, the result is sometimes convergence to the sub-optimal action (i.e., not placing trust).
7.3. Future work. We see from the developed model that agents communicating with one another is beneficial. There are, however, situations in which an agent that might otherwise have stayed on the course of placing trust "wrongfully" stops placing trust as a result of what they hear from their neighbour. It seems that the overall dynamics are not dominated by this effect. A natural question is then whether this beneficial effect persists in a model with more than two agents. The present model is limited to two agents partially as a result of the intense computational work involved: agents in the OA model perform numerous numerical integrations per round in order to identify which histories of experiences are plausible given the communication received from their neighbour.
It is an open question whether there is a scalable approach to performing these computations. Alternatively, one could circumvent this by relaxing the agents' rationality when it comes to interpreting their neighbours' actions. This would allow the investigation of a greater pool of agents who have some sophistication in learning from their private signal, but comes at the cost of simplifying the model.
Another line of future work concerns asymmetric communication between two truster agents.
One of the truster agents may be modelled as a news outlet which only sends information without receiving any in return. In the same spirit, one agent might be malicious, spreading misinformation about the institution. This asymmetry between agents may make technical results more attainable. For instance, one can condition on the private signal of the only sending truster and observe the dynamics of the receiving truster. The dynamics of the purely sending truster agent conveniently follow the dynamics we have presented in the context of the single-agent trust model. Modelling a malicious actor is also possible by deciding a priori what signal they will send, or by introducing a truthfulness parameter: the truster agent communicates their actual experience with probability η and communicates that trust was abused (regardless of the truth) with probability 1 − η. The honest agent in such a model may then need to learn not only about the trustworthiness of the institution but also about the reliability of the information they receive from their network.
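As a quick illustration of the truthfulness parameter, consider a receiver who naively takes reports at face value. Under the reporting rule above, "honoured" is reported only when the experience was indeed honoured and the sender is truthful, so honour reports arrive with probability ηϑ, and a naive estimate converges to ηϑ rather than ϑ. A minimal sketch (the naive receiver is our own illustrative assumption, not a model analysed in this paper):

```python
import random

def naive_estimate_from_reports(theta, eta, n, rng):
    """Fraction of n reports reading 'honoured', when the sender reports
    truthfully w.p. eta and claims abuse (regardless of truth) w.p. 1 - eta."""
    honour_reports = 0
    for _ in range(n):
        honoured = rng.random() < theta    # true experience with the institution
        truthful = rng.random() < eta      # sender truthful in this round
        if honoured and truthful:          # only then an honour is reported
            honour_reports += 1
    return honour_reports / n

rng = random.Random(42)
est = naive_estimate_from_reports(theta=0.8, eta=0.75, n=100_000, rng=rng)
# est is close to eta * theta = 0.6, well below the true theta = 0.8
```

This is precisely why the honest agent would need to learn about η alongside ϑ: without correcting for the channel, even a trustworthy institution can appear untrustworthy.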
The agent estimates the trustworthiness in round t by ϑ̂_t = (S_t + α)/(α + β + t). This means that there exists a t for which ϑ̂_t < c/(c + r). At this value of t, the expected utility of the interaction is negative, which implies that the agent takes action A_t = 0, a contradiction. □

A.2. Proof of Lemma 2 (Converge or quit).
Proof. The agent makes use of a Bayesian belief update, starting from a beta distributed prior belief with shape parameters α and β. Given that S_t = Σ_{s=1}^{t} X_s 1{A_s = 1}, where the X_s are binary random variables taking the value 1 with probability ϑ and 0 with probability 1 − ϑ, and where ϑ is the value which the agent is trying to estimate, the belief distribution P_t at time t is a beta distribution with shape parameters α + S_t and β + Σ_{s=1}^{t} 1{A_s = 1} − S_t. Defining τ as the first t ∈ N such that ∫₀¹ θ P_t(θ) dθ < c/(c + r) (possibly ∞), we have for all t < τ that the estimate of ϑ is given by ϑ̂_t = (S_t + α)/(α + β + t). Then there are two possible cases: τ < ∞ and τ = ∞. In the first case (τ < ∞) the agent stops placing trust, and by the assumption on ϑ, namely that ϑ ≥ c/(c + r) = θ crit, it is clear that the agent has not converged to the true value of ϑ: quitting requires the estimate to drop below θ crit, whereas by assumption ϑ ≥ θ crit. In the second case we have τ = ∞. Here we define S̃_t, which drops the dependence on the actions of the agent: S̃_t = Σ_{s=1}^{t} X_s. We note that when τ = ∞ the relationship S_t = S̃_t holds, and therewith ϑ̂_t = (S̃_t + α)/(α + β + t).

Proof. Recall the assumption that θ ≥ c/(c + r) and the definitions π(u) = P(∃t : Z_t ≥ u) and ϱ = P(∃t : Z_t ≥ 1). Observe that (A) by the strong Markov property the random walk after having hit level 1 is independent of what happened before, and (B) on the path to level u any level in the set {1, . .
., u − 1} has been attained. We thus obtain the 'memoryless property' π(u) = ϱ^u. Now consider the first step of the random walk starting at 0: it either hits the first level immediately, with probability 1 − θ, or it drops to the level −r, putting it at a distance r + 1 from level 1. Using (80) we find the identity ϱ = (1 − θ) + θϱ^{r+1}, that is, F(ϱ) = G(ϱ) with F(w) := (1 − θ) + θw^{r+1} and G(w) := w. We can prove that this equation has exactly one solution in the interval [0, 1). Note that F(•) is convex. A trivial solution is of course ϱ = 1, which accounts for one of the at most two intersections between a convex function and a linear function. It thus remains to show that there is another intersection, and that it lies in [0, 1). This is indeed the case because F′(1) = θ(r + 1), which is larger than G′(1) = 1 due to the condition that θ > c/(c + r), after substituting c = 1 and rearranging. Observing that F(0) = 1 − θ > 0 = G(0), this entails that there is another root located somewhere between ϱ = 0 and ϱ = 1. □

A.4. Proof of Lemma 4 (Expected time to quitting when c = 1, and r ∈ N).
Proof. Let τ(u) = inf{t : Z_t = u} be the first passage time of the walk Z_t to u, and recall that π(u) is the probability of this event occurring in finite time (i.e., π(u) := P(∃t : Z_t ≥ u)).
Then we are interested in the expectation of τ(u) conditioned on its finiteness, E[τ(u) | τ(u) < ∞]. Note the relationship τ(u) = τ(1) + τ′(u − 1), where τ′(u − 1) is an independent copy of τ(u − 1), which holds by precisely the same 'memoryless argumentation' as in the proof of Lemma 3. We define φ(z, u) := E[z^τ(u) 1{τ(u) < ∞}], which by the relationship (83) also allows for the self-referential expression φ(z, u) = φ(z, 1)^u. Finally, note that we can get the expectation of interest out again by taking the derivative with respect to z and substituting z = 1: E[τ(u) | τ(u) < ∞] = φ′(1, u)/φ(1, u). We now manipulate these expressions in order to get the final statement. Writing φ(z) := φ(z, 1) and conditioning on the first step, we observe that φ(z) = z(1 − θ) + zθφ(z)^{r+1}. Differentiating with respect to z, substituting z = 1, and noting that φ(1) = π(1) = ϱ, with ϱ as defined in (22), we obtain, by (24), φ′(1) = ϱ + θ(r + 1)ϱ^r φ′(1), so that φ′(1) = ϱ/(1 − θ(r + 1)ϱ^r). Multiplying this by u/ϱ gives the final result. □

A.5. Proof of Lemma 5 (Quitting probability when r = 1, and c ∈ N).
Proof. We introduce a time G which follows a shifted geometric distribution: P(G = t) = (1 − f)^t f for some f ∈ (0, 1). We investigate the behaviour of the walk, specifically the probability of reaching u before the geometrically triggered time G, as well as the probability ϱ[f] of reaching Z_n = 1 for some n ≤ G. Furthermore, we define the minimum of the walk until time n, m_n := min{Z_0, Z_1, . . ., Z_n}, as well as the maximum attained until time n, M_n := max{Z_0, Z_1, . . ., Z_n}. Due to the Wiener-Hopf factorisation [30, 31], E[w^{Z_G}] = E[w^{M_G}] E[w^{m_G}], as is seen by rotating the walk 180° and considering Z_G as the starting point of the new, identically distributed process. Therefore, if we manage to identify E[w^{Z_G}] and E[w^{m_G}], we have found E[w^{M_G}]. In pursuit of E[w^{Z_G}], let x be the number of times the process jumps upward in the time interval t ∈ {1, . . ., G}. Recalling that G is geometrically distributed, this yields an explicit expression for E[w^{Z_G}]. In order to find E[w^{m_G}], we note that the minimum m_G is identically distributed to the negative of the maximum of a random walk B_G that takes steps up of size one with probability ϑ and steps down of size c ∈ N with probability 1 − ϑ. These two walks and their step descriptions are shown in Table 4. The walk B_G is simply a variable substitution of the walk analysed in

Table 4. The step sizes and their probabilities for the two walks (table layout not recovered).

a previous subsection, so we have an expression for this maximum. By the relationship between B_G and Z_G, we then also have an expression for the law of the minimum of Z up to time G, with ϱ[f] solving (94). This, together with expression (93) and the Wiener-Hopf factorisation (91), yields an expression for the transform of the maximum. Considering this as f ↓ 0, it is first noted that ϱ[0] = 1. Then, defining σ(u) := P(max_{t≥0} Z_t = u), what remains is to identify φ(z, 1) for each z ∈ (−1, 1). The key principle is that, because their ratio is a finite number, any root of the denominator in (112) must also be a root of the numerator. Indeed, supposing there is a unique w(z) ∈ (−1, 1) which yields a root of the denominator, this root must cancel against the numerator, which identifies φ(z, 1). This means that we are done if we can show that for any z ∈ (−1, 1) the denominator in (112) has a unique root w(z) in (−1, 1). It is directly seen that this root satisfies w = z(ϑ + (1 − ϑ)w^{c+1}); write F(w) := z(ϑ + (1 − ϑ)w^{c+1}) and G(w) := w. Observe that for z = 0 we obviously have the unique root w = 0.
Subcase z < 0. For c odd the argument works analogously to the subcase z > 0: F(0) = zϑ < 0 = G(0) and F(−1) = z > −1 = G(−1). From the concavity of F(•), the existence of a unique root w(z) ∈ (−1, 1) follows (this root lies between −1 and 0). In case c is even, we still have F(0) = zϑ < 0 = G(0), but now it should be noted that F(−1) = z(2ϑ − 1), which is, for z ∈ (−1, 0) and ϑ ∈ (0, 1), larger than −1 = G(−1) (equality would require the boundary values ϑ = 1 and z = −1). The existence of a unique root w(z) ∈ (−1, 1) now follows from the fact that F(w) is decreasing and G(w) is increasing for w ∈ (−1, 0) (and this root lies between −1 and 0). □

B. Appendix: Single agent case r, c ∈ N

In this case no exact formula can be obtained for the probability of quitting or for the expected time to quitting conditioned thereon; recall that in the other two cases we exploited the fact that the walk, either in the upward or the downward direction, had step size 1. We briefly sketch a numerical scheme which allows accurate approximation of the probability of quitting. The relationship, for all u > c, is π(u) = (1 − ϑ)π(u − c) + ϑπ(u + r); for 0 < u ≤ c the first term reduces to (1 − ϑ), since π(v) = 1 for v ≤ 0. We supplement this with an approximation for large enough u > u_0 by setting π(u) = ηγ^u for η ≥ 0 and γ ∈ (0, 1). In combination with (115) we know that γ satisfies 1 = (1 − ϑ)γ^{−c} + ϑγ^r. Using this approximation for π(u) from u ≥ u_0 onward, one obtains a finite system of equations which can be solved numerically. We propose approximating the probability of absorption (and consequently a termination of trust) by the solution to the system (116) at the appropriate u = rα − cβ + 1. The value of u_0 should be large enough to give a good approximation, and small enough to keep the computation manageable.
We sketch a similar approximation scheme for the expected time in the system. The crux of the scheme is to approximate the expected time to quitting, at high enough u, by the number of rounds it would take to move directly toward absorption without taking any steps away: ⌈u/c⌉. In terms of φ(z, u) one arrives at a system of equations similar to (116), which can be solved numerically. This provides an approximation to φ(z, u crit), which in turn can be used to obtain an approximation of E[τ(u crit) | τ(u crit) < ∞] by taking the derivative at u crit: φ′(1, u crit)/φ(1, u crit). Notice the similarity to (40).
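The scheme for the quitting probability can be implemented in a few lines. The following is our own illustrative implementation (value iteration on the truncated system rather than a direct linear solve; the choices of u_0, the tolerances, and the iteration caps are ad-hoc). As a sanity check, for c = 1 the tail exponent γ coincides with the root ϱ of ϱ = (1 − ϑ) + ϑϱ^{r+1} from Appendix A.3, so the output can be compared with that case.

```python
def gamma_root(theta, r, c, tol=1e-12):
    """Root in (0, 1) of 1 = (1-theta)*g**(-c) + theta*g**r, by bisection.
    Assumes theta > c/(c+r), so that the walk drifts away from absorption."""
    h = lambda g: (1 - theta) * g ** (-c) + theta * g ** r - 1
    lo, hi = 1e-9, 1 - 1e-9        # h(lo) > 0 and h(hi) < 0 under the assumption
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def quit_prob(theta, r, c, u, u0=80):
    """Approximate pi(u), the probability that the walk (up c w.p. 1-theta,
    down r w.p. theta) ever reaches level u, via the truncated system with
    the geometric tail pi(v) = pi(u0) * g**(v - u0) for v > u0."""
    g = gamma_root(theta, r, c)
    pi = [0.0] * (u0 + 1)          # pi[v] approximates pi(v) for v = 1..u0

    def val(v):
        if v <= 0:
            return 1.0             # levels at or below 0 count as reached
        if v <= u0:
            return pi[v]
        return pi[u0] * g ** (v - u0)   # geometric tail beyond u0

    for _ in range(100_000):       # value iteration up to a fixed point
        delta = 0.0
        for v in range(1, u0 + 1):
            new = (1 - theta) * val(v - c) + theta * val(v + r)
            delta = max(delta, abs(new - pi[v]))
            pi[v] = new
        if delta < 1e-13:
            break
    return val(u)

p1 = quit_prob(0.75, 2, 1, 1)      # c = 1, comparable with the exact root of A.3
```

For ϑ = 0.75, r = 2 and c = 1, the fixed-point equation ϱ = 0.25 + 0.75ϱ³ factors as (ϱ − 1)(ϱ² + ϱ − 1/3) = 0, giving the exact root ϱ = (√(7/3) − 1)/2 ≈ 0.264, which the sketch matches closely; the analogous scheme for φ(z, u) differs only in the boundary and tail conditions.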
C. Appendix: An illustration of how the OA model can beat the OR model

In this appendix we elaborate on the peculiar outcome that, in the parameter setting with c = 3, r = 2, and u * = 2, the OA model yields a lower probability of quitting than the OR model at the trustworthiness values ϑ = 0.65, 0.66.
We consider a pair of agents in precisely this setting (c = 3, r = 2, u * = 2 and ϑ = 0.65). Furthermore, we presume that, of the first four times trust is placed in the institution (by either agent), trust is honoured only once.
In the OR model, this means that at most two rounds of interaction can take place. Either the first round includes abuses of trust towards both agents, in which case they quit in round 1, or trust is honoured once in round 1 but abused twice in round 2, so that the agents do not place trust in round 3.
In the OA model, the agents involved have two rounds of interaction regardless of their first rewards. The act of placing trust at the start of round 2 yields no information to the neighbour. At the start of round 3, however, the agent (labelled 1) who observed two abuses does not place trust. This signals to the neighbouring agent (labelled 2) that two abuses of trust have certainly occurred. At the point of placing trust in round 3 (before observing the outcome thereof), agent 2 has access to information which brings their estimate below the critical value. However, the agent has already placed their trust and so will see the outcome of that interaction. This outcome is of crucial importance: if the trust is honoured (which is more likely, as ϑ = 0.65 > 0.5), agent 2 stays in the relationship, while if the trust is abused they quit.

Figure 2. The probability of a single agent quitting, p_quit(α, β, c, r, ϑ), plotted against ϑ for different values of r, with α = β = 2 and c = 1. Analytical results are plotted as lines, while simulated results (4 000 iterations) are shown as points with 95% confidence intervals.

Figure 4. The probability of a single agent quitting plotted against ϑ for different values of c, with α = 3, β = 1 and r = 1. Analytical results are plotted as lines, while simulated results (4 000 iterations) are shown as points with 95% confidence intervals.

Figure 5. Expected time in the system conditioned on the agent quitting at some t < ∞, plotted against ϑ for different values of c, with α = 3, β = 1 and r = 1. Analytical results are plotted as lines, while simulated results are shown as points with 95% confidence intervals.

Figure 8. An illustration of the weighting w_x(t) used in the interpretation of the observed action.
The resulting random walk interpretations within the single agent model as well as the observable rewards model are depicted in Figure 9.

Figure 9. The random walk interpretation of the OR model, as well as the individual sample paths of the respective agents.

Figure 16. The results of additional simulation runs for the probability of quitting. In these runs c = 3, r = 2, and u * = 2.

and we know by the law of large numbers that the estimate converges to ϑ < c/(r + c) (almost surely).
Figure 6. The observable rewards model of learning.

Table 3. Estimates ϑ̂_t in the respective models.
First inspecting S̃_t/t, we know by the law of large numbers that it converges to ϑ almost surely. This shows that the agent's estimate has indeed converged to the true ϑ almost surely. □

A.3. Proof of Lemma 3 (Quitting probability when c = 1, and r ∈ N).