What can be learned about gambling from a learning perspective? A narrative review

Abstract
Gambling is a field that harbors both harmless recreational activities and pathological varieties that may be considered an addictive disorder. It is also a field that deserves special interest from a learning theoretical perspective, since pathological gambling represents both a pure behavioral addiction involving no ingestion of substances and behavior that exhibits extreme resistance to extinction. As the field of applied psychology of learning, or behavior analysis, espouses a bottom-up approach, the basis of understanding begins in basic research on behavioral principles. This article provides a narrative review of the field of laboratory experiments conducted to disentangle the learning processes of gambling behavior. The purpose of this review is to give an overview of learning principles in gambling that have been demonstrated under lab conditions and that may be of importance in the development of clinical applications when gambling has become a problem. Several processes, like the importance of delay and probability discounting, reinforcement without actual winning, and rule-governed behavior have been experimentally verified. The common denominator appears to be that they impede extinction. Other areas, especially Pavlovian conditioning, are scarce in the literature. Our recommendations for the future would be to study Pavlovian and instrumental conditioning in interaction. Treatment programs should profit from strategies that serve to enhance extinction learning. We also conclude that online gambling should provide a promising environment for controlled research on how to limit excessive gambling, provided that the gambling companies are interested in that.


Introduction
Gambling is a popular recreational activity that offers momentary excitement and pleasure. The diverse array of gambling options includes a variety of card games, dice, slot machines, betting, and casino games that people play to win (or, more frequently, lose) money. However, excessive (or pathological) gambling poses a serious risk to both individuals and society. In Sweden, the estimated population prevalence of current pathological and problem gambling is 2.1% (95% confidence interval, CI [1.8, 2.4]), and the estimated point prevalence for moderate-risk gambling is 2.2% (CI [1.9, 2.5]; Abbott, Romild, & Volberg, 2018). As a condition, a gambling disorder is associated with destroyed careers, broken marriages, financial ruin (Blaszczynski & Nower, 2002), and increased risk of suicide (Newman & Thompson, 2003). In comparison to the recreational gambler, the pathological gambler displays a different array of behaviors, such as "loss-chasing" (Campbell-Meiklejohn, Woolrich, Passingham, & Rogers, 2008), repeated unsuccessful efforts to stop gambling, and frequent lies (Denis, Fatséas, & Auriacombe, 2012). At the same time, gambling serves as a trouble-free, exciting recreational activity for millions of people and, as an industry, provides employment and tax revenues, generating economic contributions that benefit the community (Mawhinney, 2006). Thus, gambling is a societal phenomenon whose consequences are unevenly distributed: A lot of people benefit, but some pay a high price. The adverse effects of gambling disorder on health and general living conditions have spurred an interest in research on both the factors contributing to the disorder and its treatment (Hodgins, Stea, & Grant, 2011).
From a theoretical point of view, the fact that a person can become addicted to gambling deserves special interest. Since gambling does not involve any chemical agent, it may be considered a pure addiction from a psychological perspective (Lyons, 2006). It is addiction as a behavior or activity. Within the field, this addiction is described using several models, including a learning theoretical approach (e.g., James & Tunney, 2017; Weatherly & Flannery, 2008). Studying gambling harbors special value for psychology and, especially, for learning theory, since gambling may show extreme resistance to extinction and may persist despite aversive consequences that should reasonably be expected to decrease the likelihood of the behavior. A better understanding of the psychology of gambling would, thus, be informative for learning and behavioral change processes in general.
The study of gambling has received growing attention in behavior analysis and experimental psychology of learning (Dixon, Whiting, Gunnarsson, Daar, & Rowsey, 2015b). On a superficial level, there is a resemblance between the gambler's repetitive and homogenous response when pulling the lever on a slot machine and that of a laboratory animal's behavior in the operant chamber (Porter & Ghezzi, 2006). This may have contributed to the general acceptance of the idea that intermittent reinforcement schedules (in which reinforcers only sometimes follow the same behavior) are responsible for gambling behavior. This was an idea originally put forth by Skinner (1953). This idea, that intermittent reinforcement is responsible for the addictive properties of gambling, has been generally accepted outside the strict behavioral approach (e.g., Brevers & Noel, 2013). This is also true of the idea that gambling, through big wins, contains a potentially powerful conditioning effect that accounts for people getting hooked on an unhealthy habit (e.g., Lesieur & Custer, 1984). In general, though, behavior analytic research tends to fall outside the realm of mainstream clinical psychology. This is in spite of its dedication to using clearly defined dependent and independent variables that allow experimental manipulation (Cooper, Heron, & Heward, 2007), a dedication that is relevant to the task of treatment program development. The field of gambling is an example of a field where a multitude of treatment interventions have been proposed (Gooding & Tarrier, 2009), but from a behavioral perspective the theoretical rationale may appear unclear, and the interventions lack a base in research on the processes assumed to be responsible for their effect (Weatherly & Flannery, 2008).
Behavior analysis espouses a bottom-up process in which elementary responses are to be understood prior to complex behaviors and the basic principles of learning are to be understood under controlled conditions before they are applied to the challenges in everyday life (Cooper et al., 2007). This approach actualizes the question of what can be studied under lab conditions, which offer high levels of control and the possibility for fine-grained observation and moment-to-moment analyses of the critical behaviors under experimental manipulation. When it comes to gambling, lab conditions do not offer the possibility of getting rich or losing large amounts of money, nor do they offer the tantalizing milieu of a casino or the escape from daily chores into a world of excitement. This is a problem intrinsic to gambling research: the problem of adequate and detailed study of problematic behaviors in their natural context, on one hand, and realistically studying gambling behaviors in an artificial environment, on the other (Barrett, Collins, & Stewart, 2015).
As a part of a research program mainly aimed at the development of treatment models that should be based on, or at least informed by, experimentally verified learning principles, we searched the literature on laboratory experiments on gambling and learning theory. We started broadly by entering the search term "gambling" together with "behavior analysis," "respondent conditioning," "operant conditioning" and the like, and then went on from the sources found to secondary sources. One purpose of this article was to highlight insights from the behavior analytic research within the field of gambling that may be of value for the development of treatment programs and that we think deserve to be acknowledged outside a behavior analytic audience. This implies that we used a broad approach in order to capture learning principles applicable to gambling in a wide sense, rather than looking at a more circumscribed area for a state-of-the-art review.
The focus of this article is on studies that experimentally manipulate the environment that reinforces gambling behavior. Though several studies focus on the psychological characteristics of the gambler, our attention is on the contextual variables, since that is where the experimental manipulations take place. However, we also refer to studies in which personal characteristics interact with the manipulation. We will start by discussing operant learning, move on to verbal and rule-governed behavior, and last address Pavlovian conditioning. This structure runs contrary to the customary principle of moving from more basic learning processes to those that are considered more advanced. However, there is a scarcity of research on Pavlovian conditioning and gambling (Weatherly & Flannery, 2008). It is also identified as a promising area for the future.

Learning processes in gambling
Schedules of reinforcement
In addictions, the behavior of using a certain drug produces a reinforcing effect with a certain likelihood every time it is ingested. Gambling differs in that the supposed effect occurs at a low and unpredictable rate (James & Tunney, 2017). Gambling is frequently described in psychology textbooks (e.g., Cooper et al., 2007) as a behavior maintained by a variable-ratio (VR) schedule of reinforcement. That is, the delivery of a reinforcer depends on a variable number of responses that operate independently of the passage of time. Skinner (1953) used gambling as an example of VR schedules, writing: "… the efficacy of such schedules in generating high rates has long been known to the proprietors of gambling establishments" (p. 104). However, the term variable ratio is not fully correct. Slot machines and many other gambling activities are programmed according to a random-ratio (RR) schedule, in which every response has a constant probability, not contingent on prior draws, of being followed by a reinforcer (Madden, Ewan, & Lagorio, 2007). Researchers have conducted animal studies in experimental chambers to examine the choice between gambling-like RR schedules and fixed-ratio (FR) schedules, in which the number of responses per food reward is held constant (Fantino, 1967; Logan, 1965; Madden et al., 2007). The general finding is that experimental subjects strongly prefer a gambling-like source of food. Madden, Dake, Mauel, and Rowe (2005) conducted an experiment with four pigeons that pecked a key to earn food rewards according to an FR and an RR condition. The results showed that the RR and FR schedules were equally effective in maintaining behaviors at low ratio values. However, as more behavior was required per food pellet, the RR schedule maintained a significantly higher rate of responding than the FR schedule.
Two of the pigeons even failed to maintain a healthy weight under the FR schedule, but gained weight and earned more food due to increased behavioral responses when the schedule was switched to an RR. The result of this experiment may have implications for an important question regarding tolerance: that is, gradually and continuously spending more resources despite continuous losses. Tolerance is frequently exhibited among problem gamblers and is also part of the DSM-5 diagnostic criteria for gambling disorder (American Psychiatric Association [APA], 2013). Specifically, do RR schedules produce behavior that is more resistant to gradual increases in the amount of work required per reward? As the pigeons in Madden et al.'s (2005) study pecked for food, the response requirement was gradually increased by approximately 20%. When over 100 responses were required per pellet, the pigeons' daily food intake fell by 70%. Under these low-income conditions, the pigeons under the FR schedule substantially decreased their behavior, resembling a strike; however, the pigeons under the RR schedule continued to gamble at high rates.
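The difference between the FR and RR schedules described above can be made concrete with a small simulation. The following is only an illustrative sketch, not the procedure used in any of the cited experiments: the function names, the ratio of 30, the number of simulated reinforcers, and the random seed are all assumptions chosen for the example. It counts how many responses are needed to obtain each reinforcer under the two schedules when both have the same average requirement.

```python
import random
from statistics import mean

def responses_until_reinforcer_fr(ratio):
    """Fixed ratio: the reinforcer always follows exactly `ratio` responses."""
    return ratio

def responses_until_reinforcer_rr(p_win, rng):
    """Random ratio: every response pays off with the same probability p_win,
    regardless of how many responses have gone unreinforced so far."""
    count = 1
    while rng.random() >= p_win:
        count += 1
    return count

# Hypothetical parameters: both schedules require 30 responses on average.
rng = random.Random(0)
fr_runs = [responses_until_reinforcer_fr(30) for _ in range(10_000)]
rr_runs = [responses_until_reinforcer_rr(1 / 30, rng) for _ in range(10_000)]

print("FR mean/min/max:", mean(fr_runs), min(fr_runs), max(fr_runs))
print("RR mean/min/max:", round(mean(rr_runs), 1), min(rr_runs), max(rr_runs))
```

Under these assumptions the average work per reinforcer is about the same for both schedules, but only the RR schedule occasionally pays off after one or two responses and occasionally only after a very long run of unreinforced responses.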
Animal studies have been criticized for shortcomings in mirroring the experiences of human pathological gamblers. For example, animal subjects are often given a constant amount of food outside the experimental chambers as a standard procedure to keep motivation constant between conditions (Madden et al., 2007). Human problem gamblers, who continue to gamble despite large losses, on the other hand, cannot be sure of securing free money outside of the gambling setting. To further mimic the gambling situation and examine whether the choice to gamble is affected by income, Kendall (1989) conducted an experiment with two pigeons and a closed economy procedure, with a longer session duration and no supplemental food. Pecking the gambling key (RR schedule) had a 10% probability of earning several low-cost food rewards and a 90% risk of initiating a lengthy timeout, consisting of a darkened chamber and no food rewards. If the pigeons chose not to gamble, they were assured of obtaining a food reward after completing 30 (FR schedule) key-pecks. The result showed that the pigeons still chose to gamble on most study trials, even when their daily food intake declined by 64%.
These findings are relevant because gambling research on reinforcement schedules in human subjects is scarce. Hurlburt, Knapp, and Knowles (1980) compared VR and RR schedules in an experiment in which 20 psychology students played a computer-simulated slot machine task. However, they found no differences in terms of behavior, choice of game, or strategy. The distinction between VR and RR schedules has also been proposed as important for the gambler's fallacy, or the false belief that a random event is less likely to occur if the event has occurred recently (Suetens & Tyran, 2012). Following repeated exposure to gambling machines, gamblers could rationally expect to win after a series of losses and develop a strategy of continuously raising stakes to achieve a reward. This strategy is a fallacy when applied to an RR schedule, but not a VR schedule (Haw, 2008).
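Why the expectation of a win after a losing streak is rational under a VR schedule but not under an RR schedule can be shown with a short calculation. This is a sketch under stated assumptions: the VR requirement is taken to be drawn uniformly from 1 to a maximum value after each reinforcer, which is one simple way of constructing such a schedule and not necessarily the one used in the cited studies, and the function names and parameter values are hypothetical.

```python
from fractions import Fraction

def rr_win_probability(p_win, losses_so_far):
    """RR schedule: the chance of winning on the next response is constant
    and does not depend on the number of preceding losses."""
    return p_win

def vr_win_probability(max_requirement, losses_so_far):
    """VR schedule (assumed here): the response requirement is drawn uniformly
    from 1..max_requirement, so each loss rules out one possible requirement
    and the chance that the next response is reinforced increases."""
    remaining = max_requirement - losses_so_far
    if remaining <= 0:
        raise ValueError("a reinforcer must already have been delivered")
    return Fraction(1, remaining)

# Hypothetical example: both schedules average 30 responses per reinforcer.
for losses in (0, 20, 40, 58):
    rr = rr_win_probability(Fraction(1, 30), losses)
    vr = vr_win_probability(59, losses)
    print(f"after {losses:2d} losses: RR {float(rr):.3f}, VR {float(vr):.3f}")
```

In this example the RR probability stays at 1/30 throughout, whereas the VR probability rises with every loss and reaches certainty once all shorter requirements have been ruled out.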
Horsley, Osborne, Norman, and Wells (2012) examined whether high-frequency gamblers were more sensitive than low-frequency gamblers to the partial reinforcement extinction effect (i.e., behaviors that have been rewarded intermittently persist for longer periods of non-reward than behaviors that have been rewarded continuously). They conducted a computer-based experiment in which 19 high-frequency and 21 low-frequency gamblers were exposed to partial or continuous reinforcement, while persistence of responding in extinction was measured. Compared to the low-frequency gamblers, the high-frequency gamblers showed a larger partial reinforcement extinction effect and made the target response a greater number of times in extinction following partial reinforcement.
In all, the research on reinforcement schedules offers credible animal models, but is more limited when it comes to human subjects. However, the effects of reinforcers do not rely solely on actual schedules; they also rely on how the individual relates to the presentation of reinforcers. This is an area with far more studies of human subjects.

Discounting the consequences
For the past 30 years, many researchers have explored the choices humans make when instructed to choose between various amounts of a reward delivered at different time intervals or probability rates. From a behavior analytical perspective, this has also been proposed as an important aspect of understanding gambling behavior (Petry, 2012; Weatherly & Dixon, 2007). As the temporal distance to gain access to an outcome or reward increases, people tend to prefer smaller, sooner rewards. This phenomenon is called temporal or delay discounting. An increase in delay produces increased, or steeper, discounting of larger, later rewards. An adjacent construct, with topographical similarities to gambling itself, is probability discounting, which describes the devaluation of an outcome that is obtained probabilistically. Shallow probability discounting describes the tendency to take risks, preferring higher rewards despite much lower probabilities of attainment (Madden & Bickel, 2010; Madden, Petry, & Johnson, 2009).
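The terms steep and shallow discounting can be illustrated numerically. The review does not commit to a functional form, so the sketch below assumes the hyperbolic form that is widely used in the discounting literature, with the delay in days and the probability expressed as odds against winning. The function names and the parameter values `k` and `h` are hypothetical and chosen only to contrast a steep and a shallow discounter.

```python
def delay_discounted_value(amount, delay_days, k):
    """Assumed hyperbolic delay discounting: the subjective value of a
    delayed reward falls as the delay grows; larger k means steeper discounting."""
    return amount / (1 + k * delay_days)

def probability_discounted_value(amount, p_win, h):
    """Assumed hyperbolic probability discounting on the odds against winning;
    smaller h means shallower discounting, i.e., more risk taking."""
    odds_against = (1 - p_win) / p_win
    return amount / (1 + h * odds_against)

# Hypothetical parameter values for illustration only.
steep_k, shallow_k = 0.05, 0.005
steep_h, shallow_h = 2.0, 0.5

# Subjective value of $1,000 delivered in one year.
later_steep = delay_discounted_value(1000, 365, steep_k)
later_shallow = delay_discounted_value(1000, 365, shallow_k)

# Subjective value of a 10% chance of receiving $500.
risky_steep = probability_discounted_value(500, 0.10, steep_h)
risky_shallow = probability_discounted_value(500, 0.10, shallow_h)

print(round(later_steep, 2), round(later_shallow, 2))
print(round(risky_steep, 2), round(risky_shallow, 2))
```

Under these assumptions, the steep delay discounter assigns the delayed $1,000 a much lower present value than the shallow discounter, while the shallow probability discounter values the unlikely $500 more highly than the steep discounter, which corresponds to the combination of impatience and risk taking discussed in this section.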
The frequently described impulsive gambler in the gambling literature (cf. Petry, 2012) is typically conceptualized, from a behavior analytical perspective, as having an inability to tolerate delay (steep delay discounting) and a propensity to take risks (shallow probability discounting). Petry (2012) examined the association between discounting and treatment outcome in 226 pathological gamblers who completed a probability and delay discounting task before entering treatment. A propensity for more shallow probability discounting among pathological gamblers at baseline was associated with greater reductions in the amounts wagered during treatment and the likelihood of gambling abstinence at end of treatment and during follow-up, whereas delay discounting was not predictive of treatment outcome.
Several quasi-experimental studies have shown that pathological gamblers discount future rewards more steeply (Dixon, Marley, & Jacobs, 2003; MacKillop, Anderson, Castelda, Mattson, & Donovick, 2006; Petry & Casarella, 1999; Reynolds, 2006) and probabilistic outcomes more shallowly (Holt, Green, & Myerson, 2003; Madden et al., 2009) than nonpathological gamblers. For example, Dixon et al. (2003) compared the discounting of delayed rewards by 20 pathological gamblers and 20 matched non-gambling control participants. All participants completed a delay discounting task in which they made repeated choices between an immediate reward ($1-$1,000) and a delayed reward of $1,000 (one week to 10 years). The results showed that the pathological gamblers discounted the delayed rewards more steeply than the non-gambling control participants. See Figure 1 for a graphical presentation.
Furthermore, as a dependent variable, discounting has been shown to be susceptible to experimental manipulation. Dixon and Holton (2009) explored whether conditional discrimination training could affect future and past delay discounting. In a multiple-baseline design, five pathological gamblers completed a task designed to alter the functional properties of once-neutral stimuli associated with a choice option for smaller and larger reinforcers, as well as a delay discounting task before and after the training. All participants discounted past rewards and future rewards similarly, and both types of delay discounting occurred less often following the conditional discrimination training, indicating that the magnitude of discounting can be altered. Dixon, Jacobs, and Sanders (2006) examined the relative impact of gambling and nongambling contexts on delay discounting. They assigned 20 pathological gamblers a delay discounting task that required them to make repeated choices between immediate and delayed (one week to 10 years) hypothetical amounts ($10-$1,000). The delay discounting task was presented in two conditions: in and out of the context in which the participants regularly gambled. For 16 of the 20 participants, the difference in context changed the subjective value of the delayed rewards, indicating that most pathological gamblers discounted delayed rewards to a greater degree in a gambling context.
The effect of recent history of gambling outcomes on probabilistic discounting was examined in 38 undergraduate students randomized to a simulated dice-rolling task with three conditions (Witts, Ghezzi, & Weatherly, 2011): (1) win more than chance, (2) lose more than chance, and (3) break even. Before and after the dice-rolling task, participants completed a probabilistic discounting task, which required them to repeatedly choose between smaller, guaranteed amounts (from $1 to $499) and larger, uncertain amounts (from a 5% to a 95% chance of receiving $500). Discount patterns before and after the dice-rolling task were analyzed. The results supported the notion that probability discounting can be manipulated by gambling outcomes and indicated that those who are winning may experience a greater effect on probabilistic discounting than those who are losing.
In all, these studies, including a recent meta-analysis (Kyonka & Schutte, 2018), are highly indicative of the importance of consequence discounting processes in understanding gambling as an addictive phenomenon. They also indicate that these processes are amenable to change.

Reinforcer value
It is reasonable to assume that the reinforcing consequences of gambling explain why so many people enjoy gambling and, thus, continue to engage in this behavior. However, reinforcement aspects, in themselves, do not explain why some people develop pathological gambling behaviors, while others do not. This is for the simple reason that the reinforcement schedules for a particular game are the same for all gamblers. Individual differences in behavioral trajectories upon exposure to the same reinforcement schedule must be explained in terms of individual differences in the effectiveness of the reinforcer for the particular individual. This is dependent on phenomena that precede the behavior, referred to as antecedents. Antecedents can be divided into those that signal the availability of a reinforcement or punishment, referred to as discriminative stimuli, and those that impact the value of the reinforcement or punishment, referred to as setting events or establishing operations (Michael, 1982).
Demographic and psychological risk factors, most prominently male sex, young adult age, low socioeconomic status (SES), and comorbid alcohol and substance misuse (Buth, Wurst, Thon, Lahusen, & Kalke, 2017), can be viewed as examples of setting events and establishing operations that moderate the value of consequences (Weatherly & Dixon, 2007). For example, a gambler with low SES will likely perceive a win of a certain amount as more rewarding than someone with higher SES. However, the fact that these risk factors are (more or less) stable traits means that they are typically beyond the realm of experimental manipulation. Other, more temporally proximal establishing operations can be put under experimental control, but are difficult to study in naturalistic settings. Such operations include reinforcement history, idiosyncratic win/loss gambling record, and inter-individual variation, which demonstrates that, while all gamblers face the same reinforcement schedule, some will lose and others will win at different times. Although reinforcement history has been argued to be an important aspect in explaining inter-individual differences in gambling behavior (Lyons, 2007), experimental support is mixed. However, the study mentioned earlier on discounting by Dixon et al. (2006) provides an excellent example of how the gambling context may act as an establishing operation for one of the problematic gambling behaviors (discounting) among problem gamblers.
Earlier models favored the so-called "Big Win hypothesis," which states that gamblers who experience a large win early in their gambling career will form an unrealistic expectation that such a win will occur again, promoting pathological gambling (Custer, 1984). On the contrary, according to an operant model of gambling, an early big win could have the opposite effect: since the discriminative contrast between the early big win and subsequent losses is greater than in a typical intermittent reinforcement schedule, an early big win should promote extinction, rather than hinder it. Of the three experimental studies that have randomized participants to experiencing an early big win or not, two have found no impact on subsequent gambling behaviors (Kassinove & Schare, 2001;Mentzoni, Laberg, Brunborg, Molde, & Griffiths, 2012), and one found an early big win to be associated with less pathological gambling and earlier extinction (Weatherly, Sauter, & King, 2004).
Together, these findings suggest that an early big win is not associated with pathological gambling behaviors, at least not within the same gambling session and at levels that can be feasibly studied in experimental studies. This is in accordance with research featuring experimental roulette gambling showing that the probability of placing a high bet (another pathological gambling behavior) is not influenced by the number of preceding win trials, or so-called winning streaks (Studer, Limbrick-Oldfield, & Clark, 2015). This research did, however, find an association between the probability of high-bet gambling and the number of preceding loss trials, consistent with the "chasing losses" behavioral phenomenon that is considered pathological.

Reinforcement without winning
It may seem obvious that wins are the reinforcers of gambling behavior and thereby responsible for the enjoyment, preference, and prolonged duration of gambling behavior. However, the research on reinforcers has some fascinating instances that show that winning is not necessary in order to reinforce gambling behavior.
A prominent concept in the gambling literature is the near miss or, perhaps more correctly labeled, the near-win phenomenon: an event in which the presentation of the gambling outcomes is perceived to indicate that the player was close to winning, even though each outcome combination or component shares the same statistical probability (Clark, Lawrence, Astley-Jones, & Gray, 2009). A near win could be two identical symbols on the pay-line and a third one just off the pay-line of a slot machine, or a roulette ball landing in a pocket just adjacent to the one bet upon. In a strict sense, this is a non-reinforced trial; however, the function seems to be the opposite. For gambling behavior, a near-miss event may actually decrease the rate of extinction. Players exposed to a series of near misses in video lottery significantly prolonged their gambling in a subsequent phase during which they did not win, compared to players who were not exposed to near misses (Côté, Caron, Auber, Desrochers, & Ladouceur, 2003). In the study mentioned above, Kassinove and Schare (2001) examined the effects of near misses and early big wins on 180 participants playing a computerized slot machine. They were interested in the persistence of gambling behavior, defined as the number of gambling trials performed under extinction (i.e., not followed by any wins). It was found that a 30% near-miss condition led to the greatest persistence (compared to 15% and 45%). As mentioned above, big wins had no effects. This persistence effect was replicated in a subsequent study (Daugherty & MacLin, 2007); however, here, the 45% condition showed greater resistance to extinction. Especially illuminating is a study by Habib and Dixon (2010), who used functional magnetic resonance imaging (fMRI) to compare activated brain regions in pathological and non-pathological gamblers under winning, losing, and near-miss conditions in slot machines.
Near-miss outcomes activated brain regions associated with wins for the pathological gamblers, but regions associated with losses for the non-pathological gamblers. This implies that, for the former, a near miss is functionally similar to winning. Near misses have also been associated with significantly higher psychophysiological responses (skin conductance and heart rate deceleration) than either wins or losses (Dixon et al., 2011).
The findings discussed above raise a topic for debate: Do near wins or near misses function as rewards, or do they create frustration? Either way, they could serve as motivators that prolong gamblers' behaviors, despite not winning. When the effects of near-miss outcomes were investigated on a computer-simulated slot machine, on which the win frequency was also manipulated, latencies were found to be longer following near misses (Daly et al., 2014). Longer latencies tend to follow reinforced trials, but not unreinforced ones. The authors also found that higher rates of near misses were associated with greater behavioral sensitivity to win frequency. They argued that both findings are indicative of near misses functioning as conditioned reinforcers.
A systematic review by Barton et al. (2017) concluded that most studies have shown that near misses motivate continued play, but that results on the emotional state or betting behavior of the gambler show a more varied pattern. While most studies that included skin-conductance levels reported increased levels, near misses were found to be aversive events in a majority of studies where valence was rated. Near misses have also been shown to lead to overestimation of the actual frequency of winning. As cogently put by Foxall and Sigurdsson (2012), the "intriguing feature of near-miss outcomes in slot-machine gambling is that, while they are objectively losses, they motivate further play. The 'near-miss effect' contradicts standard reinforcement theory in which failure should punish, rather than reward, responding" (p. 5).
A closely related phenomenon is Losses Disguised as Wins (LDWs). Especially in the now-popular multi-line slot machines, visual and auditory stimuli can be used to signal something akin to a win, even though the actual result is that the player loses money (Graydon, Dixon, Stange, & Fugelsang, 2019). Studies have shown that when LDWs occur in the gambling session, gamblers tend to overestimate their win frequencies (Dixon, Collins, Harrigan, Graydon, & Fugelsang, 2015a; Dixon, Harrigan, Sandhu, Collins, & Fugelsang, 2010; Templeton, Dixon, Harrigan, & Fugelsang, 2015) and show longer post-reinforcement pauses (Templeton et al., 2015), the latter being an indication of the reinforcing effects of LDWs. Over a range of studies, sound has been found to be an especially important aspect of the reinforcing properties of LDWs (Barton et al., 2017). Sounds can be used to influence game preference (Dixon et al., 2014), but it has also been found that sounds can effectively be used to unmask the disguised losses (Dixon et al., 2015a). Sharman, Aitken, and Clark (2015) studied the impact of near misses where half of the subjects were exposed to near misses that were also combined with LDWs. Valence and motivation ratings were collected after each round. The LDW group reported increased valence ratings compared to the no-LDW group. Within the LDW group, trials with LDWs also resulted in increased enjoyment compared to trials without LDWs. LDWs were also found to exacerbate the motivational effects of near misses.
In all, there is compelling evidence that near misses and LDWs can be used to reinforce gambling behavior which, in effect, makes gambling behavior less dependent on actual wins to be reinforced. It also makes it less costly for the provider since the gambler behaves under control of consequences that do not require payment in order to be reinforcing.

Verbal rules and derived relational responding
As demonstrated above, different mechanisms, relating both to the psychological characteristics of the gambler and to the characteristics of the different games, may contribute to answering the pivotal question from a learning perspective: How does gambling behavior become so notoriously resistant to extinction in some individuals, and why does it persist despite the aversive consequences associated with losing money? Manufacturers of slot machines use a variety of stimuli, such as sound, color, and special effects, to attract gamblers (Griffiths, 1993). The presence of these kinds of contextual stimuli in the gambling context may not only function as reinforcers (as discussed above); they may also overshadow the omission of reinforcers, thereby increasing resistance to extinction (Dixon et al., 2015a). Another hypothesis is that verbal rules and contingencies may compete with actual reinforcement contingencies (e.g., Dixon, Hayes, & Aban, 2000). A verbal contingency can be defined as a case in which descriptions or verbal rules come to govern behavior by postulating a relation between a behavior and reinforcers. A defining feature of verbal or rule-governed behavior in humans is that, once established, it tends to dominate over contingency-shaped behavior (Hayes, Brownstein, Zettle, Rosenfarb, & Korn, 1986).
A series of experiments has shown that non-pathological gamblers may establish a preference for a certain gambling machine, such that they place the majority of their bets on one of two available machines, even though the machines have equal payoffs. This can be established by preferences derived from another task in which a contextual stimulus has been established as a cue for "better than." In one experiment, participants were trained to select stimuli of differing physical quantities in the presence of two contextual cues (colors) that indicated the verbal relations more than and less than (Hoon, Dymond, Jackson, & Dixon, 2008). After this training, the participants were given the opportunity to play on two concurrently available slot machines with equal probabilities of winning. Despite identical payout, the participants allocated most of their responses to the slot machine that shared properties with the contextual cue for more than. These findings were later replicated by Hoon and Dymond (2013), who found that, once established in accordance with contextual cues, preferences could also be reversed by further training. Another study found that, when subjects were exposed to different payout probabilities in slot machines (i.e., one high and one low probability), in later trials, the subjects tended to prefer slot machines according to trained relations with stimuli presented together with the machines (i.e., nonsense syllables that had been matched through training; Dymond, McCann, Griffiths, Cox, & Crocker, 2012), despite non-reinforcement or equal probabilities of wins. These findings underscore how prior experiences, rather than reward frequencies, may impact the characteristics of the gambling behavior and, in all, contribute to an insensitivity to actual reinforcement contingencies. Instead, the behavior falls under the influence of preferences not based on direct experience of the contingencies but, rather, derived in accordance with previous experience.
For example, if a person has established blue as a lucky color, he or she may perceive better opportunities of winning when gambling on blue machines. This may function as a rule (e.g., "blue means luck") that competes with the extinction expected from the experience of non-winning, based on the premise that verbal contingencies (i.e., the rule) may come to dominate other influences on the behavior under study. Further, Dixon et al. (2000) demonstrated that the best predictor of when participants ceased gambling was the rules (i.e., instructions) they were provided, not the outcomes they experienced (winning or losing) while playing. If verbal contingencies can contribute to an insensitivity to actual contingencies, it raises the possibility that they could be put to work in reverse. Weatherly and Meier (2008) let 18 non-pathological gamblers play a slot machine for money, then gave them accurate information about the independence of turns programmed by a slot machine, the negative rate of return of a slot machine over time, or both. The results showed that accurate information significantly decreased gambling, but did not eliminate it completely. These findings are important given that many treatment programs rely on verbal behavior and on providing adequate rules in order to counter problem gambling. Describing the programs in these behavior analytic terms is uncommon, though.

Pavlovian conditioning
Thus far, we have considered gambling from an operant perspective, according to which gambling behaviors are governed by their reinforcing consequences and from the perspective of derived relational responding or rule-governance. However, when approaching a phenomenon like gambling from a learning perspective, one would reasonably expect Pavlovian (often labeled classical or respondent) conditioning to be covered in the extant literature. Surprisingly, there is a striking dearth of research on the role of Pavlovian conditioning, especially experimental conditioning (Weatherly & Flannery, 2008), in gambling. According to the Pavlovian account, if exposure to an unconditioned stimulus (US), gambling, with all its associated features of excitement and pleasure, is paired with related neutral stimuli (visual, auditory, olfactory, or tactile), these latter stimuli could be established as conditioned stimuli (CS) that evoke a conditioned response (CR). The CR could be conceptualized as the organism's preparation for the US (i.e., the actual gambling) and may, in turn, serve as discriminative stimuli for operant behavior. According to prominent models of addiction in general (e.g., Siegel, 1983), the responses that follow gambling are physiological and psychological arousal. If a conditioning process were to occur, stimuli in the environment would be conditioned to this arousal, thereby serving as predictors capable of evoking preparatory responses and associated emotions. Although there is empirical support for gambling cues evoking arousal (Sharpe, Tarrier, Schotte, & Spence, 1995) and this cue-evoked arousal decreases after successful cognitive behavioral therapy (Freidenberg, Blanchard, Wulfert, & Malta, 2002), we are not aware of any research that has experimentally manipulated the arousal response to investigate whether it moderates successful conditioning to gambling-related stimuli or impacts subsequent gambling behaviors.
The literature on aversive conditioning is similarly scarce and has focused on identifying antecedents that may explain individual differences in the risk of developing problem gambling, rather than on experimentally manipulating the actual learning processes. Gambling by design includes monetary losses, and monetary losses can act as secondary (learned) punishers (Delgado, Jou, & Phelps, 2011; Delgado, Labouliere, & Phelps, 2006). Thus, impairments in the ability to learn associations between stimuli and aversive outcomes may result in failed avoidance of such stimuli. Congruently, one study found that individuals who show impaired aversive conditioning on an unrelated learning task show less risk-avoidance in a subsequent simulated gambling task (Brunborg et al., 2010). In a follow-up study, pathological gamblers, as expected, were found to show impaired aversive conditioning.
If the Pavlovian account of pathological gambling and other addictions holds true, then the clinical analog of Pavlovian extinction, cue exposure therapy (CET), ought to be an efficacious treatment of addictions. In CET, repeated exposure to the CS in the absence of the US should extinguish the CS-US association, such that the CS is less likely to evoke the CR, which, in turn, should reduce gambling behaviors. For both substance use disorders (Conklin & Tiffany, 2002) and alcohol use disorder (Mellentin et al., 2017), however, meta-analyses have shown that CET does not appear to be an efficacious treatment. Fewer studies have explored CET for pathological gambling, but clinical results are not promising (Jimenez-Murcia et al., 2012).
A more promising approach to integrating Pavlovian conditioning into the study of gambling behaviors could be the Pavlovian-to-instrumental transfer (PIT) phenomenon, which thus far has not been studied with gambling-related stimuli, behaviors, or consequences (arguably, the closest being a stock market paradigm; Allman, DeLeon, Cataldo, Holland, & Johnson, 2010), but can be reliably induced experimentally in both animals and humans (Cartoni, Balleine, & Baldassarre, 2016). PIT describes the phenomenon in which a cue (stimulus) becomes hierarchically associated with an operant contingency that features a behavior that predicts the same (or similar) outcome as in a Pavlovian contingency, despite the two being acquired (learned) separately. In other words, in the presence of a stimulus S (which has already been separately associated with the outcome O1), organisms will be more likely to engage in behavioral response R1 (which has also been separately associated with O1) than R2 (associated with O2). Importantly, once a hierarchical S:R-O association has been established, it becomes resistant to extinction through separate instrumental and Pavlovian processes (Hogarth et al., 2014). This phenomenon may explain why gambling behaviors become resistant to extinction, assuming that there are, indeed, carry-over effects from previous Pavlovian conditioning on subsequent instrumental behaviors. In the case of gambling, there is an abundance of stimuli that may enable the PIT phenomenon. However, research has yet to show that PIT occurs in real-world gambling or the experimental equivalent.

Discussion
This narrative review has covered experimental research on gambling behavior, some performed with non-human subjects, some performed with people with no gambling problems, and some performed with problem gamblers. Our intention was to search the field of basic experimental research on the learning processes that underpin gambling as a behavior, including what makes it a potentially addictive phenomenon. From a commonsense perspective, it is easy to understand the occasional gambler's drive to win, but harder to understand the compulsive gambler's inability to learn from the overwhelming number of non-successful trials, the latter constituting part of a pathological condition that may pose a serious threat to the health of the individual as well as a social problem.
As we have shown, objections can be made to the popular idea that gambling as a behavior builds on an intermittent VR schedule of reinforcement. Gambling should, instead, be treated as an RR schedule of reinforcement (Madden et al., 2007). While there is a lack of experiments on human subjects unequivocally demonstrating the importance of reinforcement schedules for gambling habits to develop, animal research has provided compelling laboratory analogs showing that behavior maintained through RR reinforcement is highly similar to gambling behavior in humans. There seems to be a preference for a gambling-like distribution of reinforcers (e.g., Kendall, 1989; Madden et al., 2005). A second popular notion has been the idea of an early big win increasing the risk of developing pathological gambling. Although seemingly congruent with learning theory, in that the event would powerfully condition cues associated with gambling as appetitive stimuli that would evoke gambling responses, there is, in fact, no experimental evidence to support this hypothesis (Kassinove & Schare, 2001; Mentzoni et al., 2012; Weatherly et al., 2004). As we have argued, these findings would make sense from an operant perspective, since the singular big win provides a more salient contrast to extinction (no wins), and thereby facilitates discrimination.
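The difference between the two schedules can be illustrated with a short simulation. This is a minimal sketch, not a reconstruction of any cited experiment: the function names, the reinforcement probability of 0.1, and the list of ratio values are assumptions chosen for illustration, and the VR schedule is assumed to draw its response requirements from a fixed, bounded list, as is a common laboratory arrangement. Under RR, each response is reinforced independently with a constant probability, so the number of responses per reinforcer has no upper bound; under this VR arrangement, it can never exceed the largest programmed ratio.

```python
import random


def rr_run_lengths(p, n_reinforcers, rng):
    """Random ratio: every response is reinforced independently with probability p.

    Returns the number of responses emitted for each reinforcer obtained.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    runs = []
    for _ in range(n_reinforcers):
        responses = 1
        while rng.random() >= p:  # this response was not reinforced
            responses += 1
        runs.append(responses)
    return runs


def vr_run_lengths(ratios, n_reinforcers, rng):
    """Variable ratio (as assumed here): each response requirement is drawn
    from a fixed, bounded list of programmed ratios."""
    return [rng.choice(ratios) for _ in range(n_reinforcers)]


if __name__ == "__main__":
    rng = random.Random(0)        # fixed seed so the run is repeatable
    ratios = list(range(1, 20))   # illustrative list of ratios with mean 10
    rr = rr_run_lengths(0.1, 10_000, rng)
    vr = vr_run_lengths(ratios, 10_000, rng)
    print("RR mean and longest run:", sum(rr) / len(rr), max(rr))
    print("VR mean and longest run:", sum(vr) / len(vr), max(vr))
```

Both schedules are set up to deliver one reinforcer per ten responses on average, but only the RR schedule can produce arbitrarily long sequences of unreinforced responses, which is the property that makes it the closer analog of a game of chance.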
From a learning point of view, processes that impede extinction (i.e., not learning from unsuccessful trials) may provide important clues to why gambling behavior can persist over time and despite negative consequences for the individual, both direct (e.g., financial loss) and indirect (e.g., relationship problems). Here, one finds compelling evidence for experimentally verified learning theoretical principles. There seems to be a consensus that discounting consequences is a process that is central to the understanding of gambling behavior (Petry, 2012; Weatherly & Dixon, 2007). It is a phenomenon that tends to be activated in gambling contexts (Dixon et al., 2006). The exact role of discounting processes in pathological gambling has yet to be specified, but they might provide a clue to the individual trajectories that lead, in some people, to a gambling disorder. Discounting processes seem to be amenable to change by targeted interventions. These interventions may contain training in orienting behavior toward more distant and realistically attainable consequences. Discount training procedures have been used in school settings (Schweitzer & Sulzer-Azaroff, 1988), but could be adapted to a gambling context. This would imply training the individual to discriminate adverse long-term consequences and the minimal likelihood of appetitive consequences. Preferably, this should be trained in close proximity to the gambling context, since this context may increase the likelihood of discounting.
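Discounting of this kind is often formalized with hyperbolic functions. The sketch below assumes the commonly used hyperbolic forms, V = A / (1 + kD) for a consequence of amount A delayed by D, and V = A / (1 + h * odds_against) for a consequence obtained with probability p, where odds_against = (1 - p) / p. This review does not commit to a particular equation, so the choice of model, the function names, and the parameters k and h are illustrative assumptions.

```python
def delay_discounted_value(amount, delay, k):
    """Hyperbolic delay discounting (assumed model): V = A / (1 + k * D).

    A larger k means that delayed consequences lose subjective value faster.
    """
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    return amount / (1 + k * delay)


def probability_discounted_value(amount, p, h):
    """Hyperbolic probability discounting (assumed model).

    V = A / (1 + h * odds_against), with odds_against = (1 - p) / p.
    A smaller h means that improbable outcomes keep more of their value.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if h < 0:
        raise ValueError("h must be non-negative")
    odds_against = (1 - p) / p
    return amount / (1 + h * odds_against)
```

In these terms, the training described above would aim to lower the rate at which delayed adverse consequences are discounted and to bring the value assigned to improbable wins closer to their actual likelihood.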
It is customary to distinguish between operant and respondent (or Pavlovian) conditioning, the latter of which is implicated in emotional learning. From a common-sense perspective, gambling means excitement and thrill, but also anxiety. Nevertheless, research on respondent conditioning processes in gambling is scarce. This means that neither the basic principles of how people attach emotional significance to gambling nor the contextual cues that signal the availability of gambling are covered in the experimental literature. Nor has the interaction between instrumental and respondent conditioning been studied to any extent. We also failed to identify experiments targeting the process of "loss-chasing," which has been proposed as a key symptom of gambling disorder (Chamberlain, Stochl, Redden, Odlaug, & Grant, 2017). This must be considered a major shortcoming if researchers wish to understand the basic learning processes of gambling. In relation to the emotional aspects of gambling, we can also conclude that laboratory analogs of the clinical phenomenon lack the negatively reinforced functions that arise when gambling behavior reduces aversive thoughts and emotions. This also suggests that the manipulation required to realistically recreate situations analogous to clinical phenomena is neither possible nor ethically justifiable. When the clinical perspective is taken into account, the interaction between Pavlovian and instrumental conditioning, and especially the transfer effects, opens the possibility that effective treatment should involve a dual extinction process, respondent as well as operant.
Two other phenomena that complicate the consequential side of learning are near misses and losses disguised as wins. These are events where an omission of what is expected to be a reinforcer actually functions as a reinforcer for gambling behavior (Barton et al., 2017; Daly et al., 2014; Foxall & Sigurdsson, 2012). This phenomenon is highly relevant not only for understanding gambling behavior, but also for understanding learning processes in themselves, since it appears to contradict the keystones of learning theory. The event is experienced as a signal that one was close to winning, despite all outcomes having equal probabilities in every trial and the result, objectively, being a loss. The outcome could be considered an example of derived relational responding (Dymond & Roche, 2010), in which the experienced proximity to winning is derived from arbitrary rules, rather than an experience of actual contingencies. Contextual factors in the gambling environment may also contribute to gambling through the human tendency to behave in accordance with verbal rules and descriptions of possible outcomes. One thing that seems to withstand scrutiny in the experimental situation is the tendency to allocate gambling behavior to a context associated with better chances of winning, despite the chances for different contexts being equal (Dymond et al., 2012; Hoon et al., 2008; Hoon & Dymond, 2013). If the hypothesis that gambling behavior may be largely rule-governed withstands further empirical examination, it suggests a process that could decrease the sensitivity of gambling behaviors to actual contingencies and impede extinction in situations in which this could reasonably be expected (Catania, 2007; Hayes et al., 1986).
However, research has shown that stimuli such as sounds can be used to highlight that LDWs actually are losses (Dixon et al., 2015a). If near misses and LDWs are used by manufacturers of gambling machines to reinforce gambling behavior, learning principles could likewise guide the development of responsible gambling tools. Again, the clinical task in interventions for problem gambling lies in facilitating discrimination of the actual distribution of rewards.
A very general principle of psychological treatments could be formulated as follows: "Help the client discriminate between current functional classes of responding and the problematic consequences produced by that responding" (Törneke, Luciano, Barnes-Holmes, & Bond, 2015, p. 258). This aligns very well with the principles described above. Effective treatment should contain elements in which the subject learns to determine whether he or she is operating under extinction and realizes that near misses, discounted consequences, and beliefs and rules may be detours from an adaptive, chosen long-term orientation of behavior. As gambling behavior appears to be at least partially rule-governed, this may involve learning to apply accurate rules, for instance, rules that convey the actual chances of winning rather than offering illusory prospects. It may also involve relating the gambling behavior to the self and developing an alternative repertoire of responding that is in accordance with chosen long-term goals and values (Törneke et al., 2015).
The validity of an experimental approach rests upon the potential to effectively manipulate the environment to influence behavior. However, this is not solely an experimental end; experimentally verified principles can also be used to influence gambling behavior in practice. Within the industry, the concept of gaming productivity can be defined as accelerated play, extended duration, and an increase in total expenditure (Schüll, 2012). As clinical psychologists, we are interested not only in understanding the behavior, but also in finding an effective remedy for the clinical problem: in gambling, the downsides of accelerated play, extended duration, and increased total expenditure.
While the overall conclusion of this review is inevitably that there is a vast field yet to be researched to achieve a fuller understanding of the learning processes involved in gambling, it is nevertheless true that the field does provide some clues to interventions that could be grounded in basic research. An important role for these basic lab studies on learning would be to point out the most probable processes for maintaining gambling behavior. This should serve as a guide to develop treatment interventions and assessment procedures and to point out phenomena with clinical implications for empirically informed psychological treatments. Our suggestion for the future would be that they stress discrimination learning in order to support the extinction process that ought to be the natural consequence of losing, rather than winning.
Last, we conclude that the relative dearth of naturalistic experimental and observational research on gambling behaviors is ironic, given the abundance of online gambling wherein every single stimulus presentation, gambling behavior, and outcome for every user is automatically logged, providing real-life Big Data that are, unfortunately, unavailable to most researchers (Schüll, 2012). While certain aspects of gambling services are typically strictly regulated by an independent authority (e.g., slot machines' return-to-player percentage), other aspects of interest in behavioral analysis are not, and could, thus, be manipulated and evaluated within an A/B testing framework during actual online gambling. Online gambling is also unique among addictions in that the addictive behavior takes place in the very same medium by which digital interventions could be provided, allegorically akin to having bartenders deliver treatment for problem drinking. The idea that the gambling platform itself would perform behavioral tracking and intervene is not unrealistic: so-called responsible gambling tools, often with voluntary features such as self-tests with feedback and the possibility to set spending limits and freeze services, are already common among online gambling service providers and are even mandatory in many jurisdictions (Forsström, Hesser, & Carlbring, 2016). A novel generation of responsible gambling tools featuring behavioral interventions grounded in learning theory, carefully designed to identify and extinguish excessive but not recreational gambling, has the potential to make a great public health impact, but will likely not be implemented voluntarily by the gambling industry.
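As a rough illustration of what an A/B evaluation of this kind might involve, the sketch below compares the proportion of sessions that meet some predefined criterion (for example, exceeding a self-set spending limit) between two variants of a responsible gambling tool, using a pooled two-proportion z-test. The function name, the outcome definition, and the counts in the demonstration are hypothetical and are not drawn from any real platform or study; an actual evaluation would also have to address ethical review, randomization, and repeated sessions from the same user.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test (hypothetical helper, for illustration).

    events_a / n_a and events_b / n_b are the proportions of sessions meeting
    the criterion under variants A and B. Returns (z, two-sided p-value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    if not (0 <= events_a <= n_a and 0 <= events_b <= n_b):
        raise ValueError("event counts must lie between 0 and the sample size")
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the outcome; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    z, p = two_proportion_z_test(events_a=50, n_a=100, events_b=30, n_b=100)
    print("z =", z, "two-sided p =", p)
```

A simple test of this kind treats sessions as independent observations, which is an additional assumption; logged gambling data with many sessions per user would more plausibly call for a model that accounts for clustering within users.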