Emotion-involved human decision-making model

ABSTRACT This study proposes a computational human decision-making model that handles emotion-induced behaviour. The proposed model determines a rational or irrational action according to a probability distribution obtained by mixing the optimal policy of a partially observable Markov decision process with a probability distribution evolved by novel emotion dynamics. The emotion dynamics, under consecutive negative observations, cause emotion-induced irrational behaviours. We clarify, via two theorems, the conditions under which the proposed model computes rational and irrational actions in terms of some of the model parameters. A numerical example based on a Japanese court record is used to confirm that the proposed model imitates a human decision-making process. Moreover, we discuss possible preventive measures for avoiding the murder case scenario. This study shows that if the traits of a decision maker can be modelled, the proposed model can support human interactions to avoid an emotion-driven murder case scenario.


Introduction
Every day of our lives is filled with decision-making opportunities, and we occasionally exhibit irrational behaviour and have irrational thoughts. For example, a person may strike another person in anger, whereas another person may act to help other people's children without regard for any danger involved. These actions are products of irrational decision-making processes; therefore, taking into consideration the concept of rationality, as well as deviations from it, in the process of decision-making facilitates the prediction of human behaviour. The concept of irrationality is discussed in various fields, each of which has its own technical term. In game theory [1][2][3], cognitive science [4][5][6], and political economy [7], irrationality has been discussed as a deviation from the maximization of an agent's expected utility. In the field of strategic management [8], irrationality has been discussed as a deviation from rational decision-making caused by various biases and heuristics, including overconfidence. In policy making [9], an irrational decision is one that people make based on emotions, gut feelings, habits, and beliefs, deciding quickly without pursuing clear goals while prioritizing certain types of information. Additionally, a business textbook [10] discusses the competitive situation in which people engage in an activity that is clearly irrational in terms of the expected outcomes to both sides, called competitive irrationality.
Emotion, as defined in [11], is a specific type of automatic appraisal process influenced by our evolutionary and personal past. In [12], emotions were shown to consist of primary and secondary emotions, where primary emotions are the (reactive) emotions we experience early in life, and secondary emotions are the (complex) emotions we experience as adults. Secondary emotions arise from complex cognitive processes, such as primary emotions, prospect, and the evaluation of the likelihood of outcomes; they are believed to be unique to humans [13]. An example of primary emotions, introduced in [12], is that a baby chick in a nest does not know what eagles are, but promptly responds with alarm and by hiding its head when wide-winged objects fly overhead at a certain speed. An example of secondary emotions is that a man may feel fear when he is told of the unexpected death of his colleague. Moreover, emotions lead to both rational and irrational decisions [12]. Emotions are related to other factors that drive human (rational and irrational) decisions, such as the absence of information [14], bias [15], and risk-taking behaviour [16]. Therefore, emotions must be considered in human-mimicking decision-making models intended to predict rational and irrational behaviours. Indeed, several studies have discussed how emotions influence human decision-making processes. For example, a human decision-making process is considered to be an integration of conscious and unconscious (emotional, reflexive, and impulsive) reasoning, which is referred to as dual-process theory [17][18][19]. Moreover, the somatic marker hypothesis holds that somatic markers, feelings in the body that are associated with emotions, strongly influence decision-making processes [12].
The review paper by [20] includes the claim that the somatic marker hypothesis is consistent with the results of studies indicating that mood, affect, and emotions play significant roles in decision-making processes. Regarding the decision-making processes of older adults, the results of a review paper [16] and a finance questionnaire [21] suggest that older adults possess emotional regulation skills that help them cope with negative events, regulate their emotions, and control negative moods. As a result, these skills help older adults actively manage their emotions in a way that encourages positive self-evaluation. Thus, we can conclude that emotions are related to decision-making and may lead to irrational decision-making.
There are textbooks [12,22] that provide detailed accounts of how emotions are categorized and what affects their intensity, as well as guidelines for modelling emotions and their intensity. The Ortony-Clore-Collins (OCC) theory, presented in [22], describes several categories of emotion types; information about these categories is useful for establishing a computational model of emotion-induced decision-making processes, because the intensities of the emotion types provided by the OCC theory can be quantified. In [23], a reinforcement learning algorithm was proposed that considers an agent's artificial emotions based on the OCC theory and the agent's drives inspired by Lorenz's hydraulic model. The survey paper [24] introduced several computational models of emotions based on the somatic marker hypothesis and the OCC theory in robot- and agent-based reinforcement learning methods. In contrast, a valence-arousal model was proposed in [25] to express group emotions that are adjusted by a sensor stimuli control system. In [26], to develop socially assistive robots, a Markov model was used to achieve artificial emotions in a robot.
Furthermore, as stated in [27], the fundamentally dynamic nature of emotions is increasingly being taken into consideration in the development of human-mimicking decision-making models. Because our daily lives are dynamic, our emotional responses to our environment are also dynamic. That is, our environment changes every moment, and therefore every time we make a decision, the resulting response may differ from past ones. The authors of [27] organized the core principles of emotion dynamics in terms of contingency, inertia, regulation, and interaction, after conducting research to investigate how emotions evolve. To the best of the authors' knowledge, there are no computational frameworks that describe the dynamics of emotions during decision-making processes. However, if there were such a computational framework, human-mimicking decision-making and the prediction of human behaviour could be realized and applied for various purposes, such as the realization of an affective artificially intelligent (AI) character, a humanoid robot, or a persuasive dialogue system, or the enhancement of human-computer interaction. The study of [28] formulated dynamic emotions in belief-desire-intention agents using difference equations, in which the authors simulated bushfire evacuations in Australia as an example. An AI model was also developed and reported in [29]; through the implementation of Newton's laws of motion, the researchers demonstrated that it could imitate the affective character of humans and exhibit the dynamical characteristics of emotions. In [30,31], the authors reported on an OCC theory-based model that implemented a mass-spring model and cyclical appraising and reappraising of stimuli as the basis for emotion dynamics. Consequently, they developed architectures to reproduce an affective character. Additionally, the authors of [32] developed differential equations to express known romantic feelings.
The feelings of love and hate between Romeo and Juliet are quantified, and their time evolution is expressed as a coupled system of ordinary differential equations.
The objective of this study is to propose a computational human decision-making model that considers emotionally driven irrational behaviours of humans. The proposed model consists of a rational decision-making process using a partially observable Markov decision process (POMDP), a novel emotion-dynamics architecture that computes the emotion-driven probability distribution over actions, and a conclusive decision process that chooses an actual action. The emotion dynamics are described using a dynamical equation of emotions based on relevant studies [12,22,31] to allow the emotions to guide the decision-making. The resulting equation is described in a discrete-time nonlinear state-space representation, a familiar formulation in modern control theory, where significant concepts and tools are available for the analysis and design of dynamical systems expressed as differential and difference equations. Secondary emotions are defined as a state variable of the model, and primary emotions and imagination are considered exogenous inputs to the state evolution. This study also clarifies, via two theorems, the conditions under which the proposed model computes rational and irrational actions in terms of some of the model parameters. This study defines an action computed by the POMDP as rational, and the action opposite to the rational one as irrational. The definition is similar to [1][2][3][4][5][6][7] in that irrationality is the deviation from individual utility maximization. The main contribution of this study is a computational decision-making model created via a mathematical formulation that involves emotion dynamics to induce irrational behaviour. The conventional research most relevant to this study is [31], where a POMDP was used to compute the intensity of emotions, but it does not address irrational decision-making.
A numerical example is presented in which a Japanese court record of a murder case is modelled as an emotion-induced decision-making process. The results show that the modelled human decision-making reproduces the agent's behaviours appearing in the court record. Moreover, we discuss possible preventive measures for avoiding the murder case scenario. In addition, there are studies on criminal acts based on mathematical modelling, such as a crime prediction approach based on kernel density estimation using Twitter-derived information [33] and an agent-based model for simulating the occurrence of crime [34]; it should be noted, however, that these have not addressed the decision-making process.
The rest of this paper is organized as follows. Section 2 proposes the decision-making model, which includes rational decision-making using a POMDP and dynamic emotions. Section 3 analyzes the conditions under which rational and irrational actions are considered in the proposed model. Section 4 demonstrates the modelling of a human decision-making process using a court record as an example and discusses possible preventive measures for avoiding a criminal action in that example. Section 5 concludes this paper.

Proposed decision-making model
This section proposes a human-mimicking decision-making model of an agent that includes rationality and emotion-induced irrationality. This study considers a situation in which suboptimal behaviour of a human, in the sense of payoffs, arises due to emotions. The proposed model to imitate this situation is shown in Figure 1, where the symbols are listed in Table 1. It consists of three components: a POMDP, emotion dynamics, and action choice. In the POMDP, an agent decides an action that is defined as rational, and the action opposite to the rational action is defined as irrational. The emotion-dynamics component computes and updates an emotion-induced probability distribution over the rational and irrational actions. The action-choice component determines the actual action using a linear combination of the rational action and the emotion-induced probability distribution. The structure of the proposed model, including its dynamic emotions, was inspired by [31].
Table 1. List of symbols.
π_t : a policy
V*_t : the maximum value of the ETD reward
μ^{π*}_t : a resulting optimal policy
ε_t : secondary emotions
ε^1_t : secondary emotion for the action a_0
ε^2_t : secondary emotion for the action a_1
ζ_t : primary emotions
η_t : an imagination
μ^E_t : an emotion-induced probability distribution over actions
P^{E1}_t : an emotion-induced probability for the action a_0
P^{E2}_t : an emotion-induced probability for the action a_1
α : an attenuation rate of the secondary emotions
β : a rate between the primary emotions and the imagination
V^{*†}_t(â_t) : an ETD reward at step t−1 with a semi-optimal policy
a_t : an actual action
μ^D_t : a probability distribution in the action-choice component
P^{D1}_t : a probability for the action a_0
P^{D2}_t : a probability for the action a_1
δ : a mixture rate of rationality and irrationality of an agent
σ : a threshold

Rational decision-making process
This study developed a rational decision-making process using a POMDP, which is defined as a tuple ⟨S, A, Ω, T, O, R⟩, where S, A, and Ω are finite, time-invariant sets of states s_i, i ∈ {0, 1, 2}, actions a_i, i ∈ {0, 1}, and observations ω_i, i ∈ {0, 1, 2}, respectively. The state transition function T: S × A × S → [0, 1] returns the transition probability that state s_t transfers to s_{t+1} after an agent chooses action a_t, that is, T(s_t, a_t, s_{t+1}) = P(s_{t+1} | s_t, a_t), where t ∈ {0, 1, 2, …} denotes a step and P returns a probability. The observation function O: S × A × Ω → [0, 1] returns the observation probability that an agent receives observation ω_t after the agent selects action a_t and the state transfers to s_{t+1}, that is, O(s_{t+1}, a_t, ω_t) = P(ω_t | s_{t+1}, a_t). Finally, R: S × A → ℝ is the reward function.
Because the POMDP assumes that an agent does not directly observe a state s, this study adds a probability distribution over S, denoted as μ^b ∈ Δ_S and called a belief, to the POMDP, where Δ_S is the set of beliefs. The model considers that the last-step actual action â_t := a_{t−1} and the observation ω_t are used to compute the belief μ^b_{t+1} of the next step t+1 through a Bayes-rule update. Let us define the expected total discounted (ETD) reward for a policy π_t given the belief μ^b_t at step t, denoted as V^{π_t}_t(μ^b_t) ∈ ℝ, in the model predictive control fashion, where μ^b_{t|t} := μ^b_t is the initial belief in evaluating the ETD reward, π_t = {π_{t|t}, π_{t|t+1}, …, π_{t|t+N−1}} is a policy (a sequence of probability distributions over actions), N > 0 is a constant finite horizon, and γ ∈ [0, 1] is the discount factor. The maximum value of the reward V^{π_t}_t is denoted as V*_t. The ETD reward is computed using the following dynamic programming backward recursion: initialize V*_{t|t+N}(μ^b), and then recurse for k = N−1, …, 0, where the expected reward is given by the value function for any initial belief μ^b ∈ Δ_S [35]. The POMDP component outputs the first component of the optimal policy, μ^{π*}_t := π*_t(1) = π*_{t|t}. Notably, if the optimal policy μ^{π*}_t expresses a_0, this is written as μ^{π*}_t = a_0, which means μ^{π*}_t = [1, 0]^T; similarly, μ^{π*}_t = a_1 means μ^{π*}_t = [0, 1]^T. This study uses the POMDP-solver [36] to compute the optimal policy and the ETD reward. The resulting optimal policy μ^{π*}_t is passed to the emotion-dynamics and action-choice components, and the resulting ETD reward V*_t is passed to the emotion-dynamics component.
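The belief update is not written out above; assuming the standard POMDP Bayes-filter form (consistent with the definitions of T and O), it can be sketched as follows. All numerical values here are hypothetical and chosen for illustration only; the paper's actual model has three states and three observations and uses the POMDP-solver of [36].

```python
import numpy as np

# Hypothetical 2-state, 2-action, 2-observation POMDP (illustration only).
T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),   # T[a][s, s'] = P(s' | s, a)
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}
O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # O[a][s', w] = P(w | s', a)
     1: np.array([[0.5, 0.5], [0.5, 0.5]])}

def belief_update(mu_b, a_hat, omega):
    """Bayes update of the belief using the last actual action a_hat
    and the received observation omega."""
    prior = T[a_hat].T @ mu_b           # prediction through the dynamics
    post = O[a_hat][:, omega] * prior   # correction by the observation
    return post / post.sum()            # normalization

mu_b1 = belief_update(np.array([0.5, 0.5]), a_hat=0, omega=0)
```

Receiving the observation that is likely in state 0 shifts belief mass toward state 0, as expected from the correction step.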

Emotion dynamics
The following mathematical formulation, consisting of Equations (1a)–(1d), is proposed to describe the emotion dynamics [27] of interest, where ζ_t ∈ ℝ² and ε_t ∈ ℝ² are the primary and secondary emotions, respectively; ε^1_t and ε^2_t are the respective intensities of the secondary emotions for actions a_0 and a_1; α ∈ (0, 1) and β ∈ (0, 1) are parameters; η_t ∈ ℝ² is an imagination that expresses a degree of goodness; and μ^E_t := [P^{E1}_t, P^{E2}_t]^T ∈ Δ_A, where P^{E1}_t and P^{E2}_t are the respective emotion-induced probabilities for actions a_0 and a_1. A past prospect updated by the last-step actual action â_t is defined through V^{*†}_t(â_t), which expresses the ETD reward at step t−1 with the semi-optimal policy {â_t, π*_{t−1|t}, π*_{t−1|t+1}, …, π*_{t−1|t+N−2}}. Thus, the proposed emotion-dynamics component quantitatively handles the intensity of each emotion. The inputs of the emotion dynamics are the observation ω_t, the optimal policy μ^{π*}_t, and the ETD reward V*_t. The output of the dynamics is the emotion-induced probability distribution over actions μ^E_t, which is sent to the action-choice component. The relevance of the presented novel emotion dynamics in Equations (1) to existing studies is explained as follows. The authors created the mathematical formulation from the descriptions of emotions in [12,22], using the mathematically defined intensity of desirability presented in [31]. The secondary emotions in Equation (1a) are constructed to have inertia [27] and the intensities of 'hope' and 'fear' explained in [22], and to depend on the current primary emotion and imagination, based on the knowledge in [12] that the primary emotion affects the secondary emotion. The primary emotions in Equation (1b) are constructed to respond instantaneously to an observation, as described in [12], with the intensities of 'joy' and 'distress' explained in [22].
The second term on the right-hand side of Equation (1b) corresponds to the mathematically defined intensity of desirability. Equation (1c), regarding the imagination, is a modified version of the original desirability of [31]. Finally, Equation (1d) converts the values of the emotions and the imagination into a probability distribution over actions using a softmax function.
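Of Equations (1a)–(1d), only the softmax conversion of (1d) is fully recoverable from the proofs in Section 3; the updates (1a)–(1c) are therefore not reproduced here. A minimal sketch of the softmax step, with hypothetical emotion intensities:

```python
import numpy as np

def emotion_distribution(eps):
    """Equation (1d): softmax over the secondary-emotion intensities
    (eps[0] for a_0, eps[1] for a_1), yielding (P^E1_t, P^E2_t)."""
    z = np.exp(eps - eps.max())   # shift by the max for numerical stability
    return z / z.sum()

# With a stronger secondary emotion for a_1, probability mass moves to a_1.
mu_E = emotion_distribution(np.array([0.2, 0.9]))
```

Because the softmax depends only on the difference ε^2_t − ε^1_t, the threshold conditions of Section 3 can be stated in terms of that gap alone.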

Action choice
In the action-choice component, an agent determines an actual action using μ^{π*}_t ∈ Δ_A and μ^E_t ∈ Δ_A. This study introduces the probability distribution μ^D_t := [P^{D1}_t, P^{D2}_t]^T ∈ Δ_A, given by the mixture μ^D_t = δ μ^{π*}_t + (1 − δ) μ^E_t, and the actual action a_t at step t is determined by the discriminant of Equation (2): a_t = a_1 if P^{D2}_t > σ, and a_t = a_0 otherwise. Here, P^{D1}_t and P^{D2}_t are the probabilities for actions a_0 and a_1, respectively, δ ∈ (0, 1) is a parameter that expresses the mixture rate of rationality and irrationality of an agent, and σ ∈ (0, 1) is a threshold. When σ is set closer to 0, the proposed model simulates the behaviours of an aggressive agent; when it is set closer to 1, the model simulates the behaviours of a patient agent.
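The mixture-and-threshold step can be sketched as follows, assuming the discriminant thresholds P^D2_t at σ (a form consistent with the relationships used in the proofs of Section 3); the emotion distribution is a hypothetical value, while σ and δ match those of the later numerical example.

```python
import numpy as np

def choose_action(mu_pi, mu_E, delta, sigma):
    """Action choice of Equation (2): mix the optimal policy and the
    emotion-induced distribution, then threshold P^D2_t against sigma."""
    mu_D = delta * mu_pi + (1 - delta) * mu_E   # mixture of Equation (2)
    a_t = 1 if mu_D[1] > sigma else 0           # 1 encodes a_1, 0 encodes a_0
    return a_t, mu_D

# Rational policy a_0, but the emotions lean strongly toward a_1.
a_t, mu_D = choose_action(np.array([1.0, 0.0]), np.array([0.3, 0.7]),
                          delta=0.350, sigma=0.427)
```

Here P^D2_t = 0.65 × 0.7 = 0.455 > σ, so the emotion-driven action a_1 overrides the rational action a_0.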

Model analysis
This section analyzes the proposed model, consisting of Equations (1) and (2) with the optimal policy, to clarify the conditions under which rational and irrational actions are chosen.

Theorem 3.1. Assume that the optimal policy μ^{π*}_t ∈ Δ_A is a_0 ∈ A. Then the threshold σ ∈ (0, 1) and the parameter δ ∈ (0, 1) satisfy the inequality condition δ ≥ 1 − σ (3) if and only if the actual action is a_0 regardless of the emotion state, that is, a_t = μ^{π*}_t. Furthermore, assume that the optimal policy μ^{π*}_t ∈ Δ_A is a_1 ∈ A. Then σ and δ satisfy the inequality condition δ > σ (4) if and only if the actual action is a_1, that is, a_t = μ^{π*}_t.

Proof. Assume that the optimal policy is a_0, that is, μ^{π*}_t = [1, 0]^T (= a_0), and that δ ≥ 1 − σ is satisfied. Equation (2) provides P^{D1}_t = δ + (1 − δ)P^{E1}_t, and because (1 − δ)P^{E1}_t ≥ 0, the assumption leads to the inequality P^{D2}_t = (1 − δ)P^{E2}_t ≤ 1 − δ ≤ σ, where the relationship P^{D2}_t = (1 − δ)P^{E2}_t of Equation (2) is used. In this case, the discriminant of Equation (2) returns the actual action a_t = a_0. In contrast, assume that the optimal policy is a_1, that is, μ^{π*}_t = [0, 1]^T (= a_1), and that δ > σ is satisfied. Equation (2) provides P^{D2}_t = δ + (1 − δ)P^{E2}_t ≥ δ > σ. In this case, the discriminant of Equation (2) returns the actual action a_t = a_1. This completes the proof.

Theorem 3.1 implies that if we want to create a rational agent that always chooses the optimal policy calculated by the POMDP, then we should design the parameters σ and δ such that Equations (3) and (4) are satisfied. Moreover, this study clarifies the conditions regarding irrational decision-making as follows.
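Theorem 3.1 can be checked numerically. The sketch below sweeps the emotion-induced probability P^E2_t over [0, 1] under hypothetical σ and δ satisfying conditions (3) and (4), assuming the mixture-and-threshold form of Equation (2).

```python
import numpy as np

def actual_action(mu_pi2, P_E2, delta, sigma):
    """P^D2_t = delta * (second policy component) + (1 - delta) * P^E2_t,
    thresholded at sigma as in Equation (2); returns 1 for a_1, 0 for a_0."""
    P_D2 = delta * mu_pi2 + (1 - delta) * P_E2
    return 1 if P_D2 > sigma else 0

sigma = 0.6                              # hypothetical threshold
emotions = np.linspace(0.0, 1.0, 101)    # sweep every emotion state

# Condition (3): delta >= 1 - sigma forces a_0 whatever the emotions are.
always_a0 = all(actual_action(0.0, p, 0.5, sigma) == 0 for p in emotions)
# Condition (4): delta > sigma forces a_1 whatever the emotions are.
always_a1 = all(actual_action(1.0, p, 0.7, sigma) == 1 for p in emotions)
```

With δ = 0.5 ≥ 1 − σ = 0.4, the emotion term can contribute at most 1 − δ = 0.5 < σ, so a_0 is never overridden; with δ = 0.7 > σ, the rational mass alone already exceeds the threshold.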

Theorem 3.2.
Assume that the optimal policy μ^{π*}_t ∈ Δ_A is a_0 ∈ A and that σ + δ < 1 holds. Then the secondary emotions ε_t ∈ ℝ² satisfy the inequality ε^2_t − ε^1_t > log(σ / (1 − (σ + δ))) (5) if and only if the actual action a_t is a_1, where σ and δ are included in (0, 1). Furthermore, assume that the optimal policy μ^{π*}_t ∈ Δ_A is a_1 ∈ A and that σ > δ holds. Then the secondary emotions ε_t satisfy the inequality ε^2_t ≤ ε^1_t + log((σ − δ) / (1 − σ)) (6) if and only if the actual action a_t is a_0.
Proof. Suppose that μ^{π*}_t = [1, 0]^T (= a_0) and that σ + δ < 1 holds. Assume that ε^2_t − ε^1_t > log(σ / (1 − (σ + δ))) is satisfied. Since σ / (1 − (σ + δ)) > 0, the assumption, combined with the softmax of Equation (1d), leads to the inequality P^{D2}_t = (1 − δ)P^{E2}_t > σ, where the relationship P^{D2}_t = (1 − δ)P^{E2}_t of Equation (2) is used. In this case, the discriminant of Equation (2) returns the actual action a_t = a_1. Furthermore, suppose that μ^{π*}_t = [0, 1]^T (= a_1) and that σ > δ holds. Assume that ε^2_t ≤ ε^1_t + log((σ − δ) / (1 − σ)) is satisfied. Since (σ − δ) / (1 − σ) > 0, the assumption leads to the inequality P^{D2}_t = δ + (1 − δ)P^{E2}_t ≤ σ, where the relationship P^{D2}_t = δ + (1 − δ)P^{E2}_t of Equation (2) is used. In this case, the discriminant of Equation (2) returns the actual action a_t = a_0. This completes the proof.

Theorem 3.2 implies that we cannot develop an agent model in which an irrational action is always chosen by tuning only the parameters σ and δ. This is because the conditions in Equations (5) and (6) also involve the secondary emotions, which evolve with each step.
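The equivalence in condition (5) can be checked numerically; the sketch below assumes the softmax form of Equation (1d) and the mixture of Equation (2), and uses the σ and δ values appearing in the later numerical example.

```python
import math

def flips_to_a1(e1, e2, delta, sigma):
    """Under an optimal policy a_0: P^D2_t = (1 - delta) * P^E2_t with
    the softmax P^E2_t of Equation (1d); True when a_1 is chosen."""
    P_E2 = 1.0 / (1.0 + math.exp(e1 - e2))   # softmax of two intensities
    return (1 - delta) * P_E2 > sigma

delta, sigma = 0.350, 0.427                      # values of Section 4
bound = math.log(sigma / (1 - (sigma + delta)))  # right-hand side of (5)

# The discriminant and condition (5) agree on a grid of emotion gaps.
agrees = all(flips_to_a1(0.0, gap, delta, sigma) == (gap > bound)
             for gap in [k / 100 for k in range(-100, 201)])
```

The agreement over the whole grid illustrates that the flip to the irrational action is governed entirely by the gap ε^2_t − ε^1_t once σ and δ are fixed.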
A graphical explanation of which actual action is chosen in the proposed model is presented in Figure 2, which summarizes the conditions of the theorems. From the theorems, the relation between δ and σ influences an agent's characteristics, and Figure 2 provides a guideline for modelling an agent with rational or irrational properties. In the figure, the parameters δ and σ are taken within a line segment between 0 and 1. Figures 2(a) and (b) show the conditions when σ < 0.5 and σ ≥ 0.5 hold, respectively. A rational action is an action computed by the POMDP, and an irrational action is the action opposite to the rational action. The blue area corresponds to Theorem 3.1, where the actual action is rational; the red area corresponds to Theorem 3.2, where the actual action is not determined by δ and σ alone. These conditions help us model an agent. For example, if we want to create a rational agent, then the parameters are set such that δ ≥ 1 − σ and δ > σ hold. If we want to create an agent that has the possibility of making an irrational decision, then the parameters are set such that δ < 1 − σ and δ ≤ σ hold.

Numerical example
This section demonstrates the modelling of a human decision-making process to confirm the ability of the proposed model to reproduce human decision-making processes that result in irrational behaviour. The court record of an actual murder case that took place in Shiga, Japan, in April 2018 was used [37]. The modelling is detailed in Section 4.1, and, assuming that the obtained model is sufficiently valid to predict the agent's behaviours, Section 4.2 discusses possible preventive measures for avoiding the murder action.

Modelling and simulation
A summary of the court record by the authors is presented as follows.
The defendant, who was a police officer, worked at a police station with the victim, who was his immediate boss, and other police officers. He received daily instructions and reprimands from the victim. As he received these orders, he began thinking that the victim's orders were unfair and unreasonable, which made him feel inferior and offended his self-esteem. Thus, he began to disapprove of the victim's attitude. A few days later, a fellow officer's unexpected admission to a hospital led to the defendant spending more time alone with the victim. Five days later, the defendant was excessively reprimanded for not completing all the tasks assigned to him by the victim. He was also told that his poor performance was reflective of poor parenting. With that as a trigger, the defendant's antipathy and resentment toward the victim increased, and he then shot and killed the victim to clear himself and others of blame.
Based on this record summary, this study attempts to model the defendant's behaviours, including rational and irrational actions, up to the murder of the victim, where a step represents an hour, and the defendant and the victim are replaced with the agent and the boss, respectively. A timeline of the boss's observed behaviours in the numerical example is illustrated in Figure 3. In the figure, the red mark expresses the boss's behaviour extracted from the record summary, and the black marks express additional behaviour of the boss that the authors assumed from the record summary. The assumed behaviour is a reprimand of the agent every 24 steps from day 1 to day 4. This is because 'the agent was instructed daily and reprimanded on the work by his boss,' 'the agent worked alone with his boss,' and 'the agent had increased antipathy and resentment for his boss.' Based on the prescribed situation, let us define the actions and observations of the agent. The agent's actions of work and murder are denoted as a_0 and a_1, respectively. The boss's praise, neutral attitude, and reprimand are denoted as ω_0, ω_1, and ω_2, respectively, where the neutral attitude ω_1 is assumed to be observed except when the reprimand ω_2 is observed. As for the parameters of the POMDP, the transition function T(s_t, a_0, s_{t+1}) over states was set such that, under the work action a_0, it is easier for a state to transition to the dissatisfaction state s_2 than to the satisfaction state s_0. The transition function quantitatively expresses the character of the agent, namely that he experiences dissatisfaction easily and quickly. The observation functions of the POMDP were set such that the agent receives the praise ω_0 from the boss when the agent performs the action of work a_0 in satisfaction s_0, and such that the agent receives the reprimand ω_2 when the agent performs the action of work a_0 in dissatisfaction s_2 or when the agent performs the action of murder a_1.
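The numerical matrices of the transition and observation functions did not survive extraction and are not reproduced here. The following matrices are purely hypothetical (the authors' actual values are unknown) and serve only to illustrate the stated qualitative properties over the states (s_0 satisfaction, s_1 neutrality, s_2 dissatisfaction) and observations (ω_0 praise, ω_1 neutral, ω_2 reprimand).

```python
import numpy as np

# Hypothetical T(s_t, a_0, s_{t+1}): rows are s_t, columns are s_{t+1}.
# Mass is skewed toward s_2, matching "easier to transition to
# dissatisfaction than to satisfaction" under the work action a_0.
T_a0 = np.array([[0.20, 0.40, 0.40],
                 [0.10, 0.50, 0.40],
                 [0.05, 0.15, 0.80]])

# Hypothetical O(s_{t+1}, a_0, w): rows are s_{t+1}, columns are w.
# Praise is likely in s_0 and reprimand is likely in s_2.
O_a0 = np.array([[0.80, 0.15, 0.05],
                 [0.10, 0.80, 0.10],
                 [0.05, 0.15, 0.80]])

# Dissatisfaction s_2 is at least as easy to reach as satisfaction s_0.
skewed = bool((T_a0[:, 2] >= T_a0[:, 0]).all())
```

Any matrices with these row-stochastic and skew properties would reproduce the described character of the agent; the specific numbers above are illustrative.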
A reward function R(s_t, a_t) was defined such that when the agent in satisfaction s_0 performs the action of work a_0, he obtains a positive reward greater than the reward achieved when working in neutrality s_1, and such that when the agent in dissatisfaction s_2 performs the action of murder a_1, he obtains a greater reward than when murdering in satisfaction s_0. In the ETD reward, the discount factor γ was set to 0.90, and the evaluation horizon N was set to 73 steps (three days). The initial value of the belief was μ^b_0 = [0.20, 0.50, 0.30]^T. In terms of the parameters of the emotion dynamics, α = 0.97 and β = 0.60, and the initial values of the emotions and the imagination were set to zero. In the action-choice component, σ = 0.427 and δ = 0.350, and then log(σ / (1 − (σ + δ))) = 0.650. The profile of the observation is as follows: ω_t = ω_2 if t = 24τ, ∀τ ∈ {1, 2, 3, 4, 5}; otherwise, ω_t = ω_1. Using these parameters, obtained via a trial-and-error method, this study obtained the simulation results shown in Figure 4. Figure 4 shows the time responses of the probability distribution μ^D_t and the secondary emotions ε_t over five days. In Figure 4(a), the blue and red lines express the probabilities of choosing a_0 and a_1, respectively, and the dotted line is the threshold. In Figure 4(b), the blue and red lines express the intensities of the secondary emotions ε^1_t and ε^2_t, respectively. From the figures, every time the observation is the reprimand ω_2, the value of P^{D2}_t exhibits a spike close to the threshold, and simultaneously the secondary emotion ε^1_t declines. The explanation is that the agent is reprimanded by his boss and endures the reprimand at the threshold of emotion, and the agent's enthusiasm for the work a_0 then tends to decrease. On the fifth day, the value of P^{D2}_121 reaches 0.428, which exceeds the threshold for the first time.
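As a quick consistency check on the quoted numbers (no new modelling is involved), the bound of condition (5) can be recomputed from the stated parameters:

```python
import math

# Parameters of the example: threshold sigma and mixture rate delta.
sigma, delta = 0.427, 0.350
bound = math.log(sigma / (1 - (sigma + delta)))   # about 0.650, as quoted

# The reported spike P^D2_121 = 0.428 exceeds sigma = 0.427, which is
# why the discriminant switches to a_1 at step 121.
flips = 0.428 > sigma
```

The bound evaluates to approximately 0.650 with a natural logarithm, matching the value quoted in the text.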
As a result, the agent chooses the action of murder a_1 to kill the boss based on the action-choice rule in Equation (2), as shown in Figure 4(c). Moreover, comparing Figure 4(c) with Figure 4(d), the POMDP determined the action of work a_0 as the optimal policy μ^{π*}_{24τ+1} for τ = 5, whereas the action-choice component chose the opposite action a_1 at that step; that is, the rational action is flipped to an irrational one. This is because the corresponding inequality condition (5) of Theorem 3.2 is satisfied, the details of which are shown in Table 2. In other words, the agent could not bear the reprimand of his boss on the fifth day. This is a feature of the proposed model that enables the handling of rational and irrational behaviours depending on the dynamic emotions of the agent.
Figure 4. Transitions of (a) the probability of an action a_1 used in deciding an actual action and (b) the agent's emotional behaviours. Histories of (c) the resulting actual action and (d) the second component of the optimal policy corresponding to a rational action.

Application to preventive measures
This section discusses whether the proposed model can help prevent an agent from choosing the action of murder, assuming that the proposed model simulates the agent's behaviours sufficiently well. In this case, the developed computational model makes it possible for the boss to predict the agent's actions leading to the killing a few days in advance. This example considers a situation in which the boss praises the agent for his achievement at step 60. That is, the following profile of the observation is considered: ω_t = ω_0 if t = 60; ω_t = ω_2 if t = 24τ, ∀τ ∈ {1, 2, 3, 4, 5}; otherwise, ω_t = ω_1. Figure 5 shows the time responses of the probability distribution μ^D_t and the secondary emotions ε_t over five days. The meanings of the line types and colours in Figure 5 are the same as those in Figure 4.
Figure 5. Transitions of (a) the probability of an action a_1 used in deciding an actual action and (b) the agent's emotional behaviours. Histories of (c) the resulting actual action and (d) the second component of the optimal policy corresponding to a rational action.
Although the value of P^{D2}_t spikes close to the threshold every time the observation is the reprimand ω_2, the value of P^{D2}_121 reaches 0.424, which does not exceed the threshold; therefore, the action of work a_0 is chosen. Because ε^2_t − ε^1_t = 0.630 < 0.650 at step 121, the inequality condition (5) in Theorem 3.2 is not satisfied. Additionally, the POMDP determined the action of work a_0 as the optimal policy for all steps, as shown in Figure 5(d). Therefore, this example suggests that if the traits of the agent can be modelled, the proposed computational model can help support a human interaction to avoid an emotion-driven murder case scenario.

Conclusion
In this study, we proposed a decision-making model that enables the handling of rational and irrational behaviours driven by emotion dynamics. The proposed model consists of a POMDP component that decides a rational action from the optimal policy maximizing the expected total discounted reward, an emotion-dynamics component that computes an emotion-induced probability distribution over actions, and an action-choice component that yields an actual action by mixing the optimal policy and the emotion-induced probability distribution and applying a threshold method. This study clarified the conditions regarding the action choice via two theorems in terms of the model parameters. Using a numerical example of a court record of a murder case, we illustrated the contributions of the proposed model and confirmed that the record was simulated with parameters obtained through a trial-and-error method. Furthermore, we confirmed that if the traits of the agent can be modelled, the proposed computational model can help support human interaction to avoid an emotion-driven murder case scenario. Therefore, this study establishes a computational model of rational and irrational decision-making using the developed emotion dynamics expressed as difference equations.
The proposed model shares with the conventional models [28][29][30][31] the computation of dynamic emotions; unlike those models, however, it incorporates the emotion dynamics into the decision-making process. In particular, emotion dynamics are beneficial for enhancing the lifelikeness, believability, and perceived intelligence of a game character, as stated in [25,26]; such a game character may react more dynamically and make the game more fun. Therefore, incorporating dynamic emotions into the decision-making model is essential, as it leads to improved representation and prediction of human-like behaviour.
In future work, the authors will consider a systematic method of estimating and learning the model parameters from the history of actions and observations, extending the threshold σ and the mixture rate δ to time-dependent parameters. In particular, the threshold and the mixture rate are important factors of the model that characterize agent behaviours. The authors aim to clarify which data contribute to the model as agent characteristics in the presented emotion dynamics, and to specify an index that quantitatively expresses human-mimicking behaviours. Furthermore, validating the proposed model on several records of human decision-making processes is also potential future work. The authors plan to develop a computational model with numerous actions and observations to express richer situations in human-mimicking decision-making, and to extend the proposed model to capture characteristics of national culture and emotional regulation skills. In addition, other factors, such as cognitive capabilities, limited information-processing capabilities, and limited foresight, can be incorporated to make the proposed model more precise.