Incidental learning of trust from eye-gaze: Effects of race and facial trustworthiness

ABSTRACT Humans rapidly make inferences about individuals’ trustworthiness on the basis of their facial features and perceived group membership. We examine whether incidental learning about trust from shifts in gaze direction is influenced by these facial features. To do so, we examined two types of face category: the race of the face and the initial trustworthiness of the face based on physical appearance. We find that cueing of attention by eye-gaze is unaffected by race or initial levels of trust, whereas incidental learning of trust from gaze behaviour is selectively influenced. That is, learning of trust is reduced for other-race faces, as predicted by reduced abilities to identify members of other races (Experiment 1). In contrast, converging findings from an independently gathered set of data showed that the initial trustworthiness of faces did not influence learning of trust (Experiment 2). These results show that learning about the behaviour of other-race faces is poorer than for own-race faces, but that this cannot be explained by differences in the perceived trustworthiness of different groups.

With just a brief glance at a person, the basic physiognomy of the face provides sufficient information for its rapid classification. In a very short space of time (100-200 ms), people make inferences about whether a face belongs to their own or another social group and whether a person can or cannot be trusted (e.g., Caldara, Rossion, Bovet, & Hauert, 2004; Willis & Todorov, 2006). However, basing our behavioural decisions on such physiognomy can be harmful, leading to rigid responses that reinforce discrimination. In addition, it can render us incapable of responding to the dynamics of behaviour and adjusting our behaviour towards others in light of incoming information. The central focus of this paper is to investigate the relationship between face physiognomy that enables a rapid classification of a face in terms of race or levels of trust, and how the subtle dynamic changing behaviours that signal deception can be learned.
Previous work has indeed shown that changeable aspects of faces, such as view direction or emotion, can alter judgments of trustworthiness. For example, faces expressing a smile are trusted more than faces expressing anger (e.g., Caulfield, Ewing, Burton, Avard, & Rhodes, 2014; Sutherland, Young, & Rhodes, 2016). Importantly, within such studies these properties of a face are present and therefore attended to whilst trustworthiness judgments are being made. However, there are subtle dynamic facial behaviours that could be learned, even when they are not explicitly attended to. These behaviours might affect later trustworthiness judgments even though the face does not possess these properties at a later judgment time. To investigate this issue, we explore the incidental learning of patterns of gaze shifts made by a face that is being ignored.
When we see the eyes of a face re-fixate to another point in space, our own attention follows their gaze. This results in enhanced processing for objects in that area of space, and poorer or disrupted processing for objects elsewhere (Friesen & Kingstone, 1998; for review, see Frischen, Bayliss, & Tipper, 2007). Previous research has shown that this gaze cueing effect is powerful and difficult to inhibit: people show cueing effects even when they know that gaze direction is uninformative or that the likelihood of invalid cues is high (Driver et al., 1999), and they are sensitive to misleading gaze cues even when they know the true future location of the target.
Given that these cues are such a salient and powerful form of nonverbal communication, they can be used to direct an interaction partner's attention around the environment, either helpfully (directing them to appetitive or aversive stimuli that merit attention) or deceitfully (misdirecting them away from such stimuli). Furthermore, observers are able to detect consistent gaze patterns, and use this information to guide subsequent decisions: faces that provide consistently valid (helpful) cues are both rated and treated as more trustworthy than those that provide consistently invalid (deceitful) cues (Bayliss & Tipper, 2006; Manssuer, Pawling, Hayes, & Tipper, 2016; Manssuer, Roberts, & Tipper, 2015; Rogers et al., 2014; Strachan & Tipper, 2017; Strachan, Kirkham, Manssuer, & Tipper, 2016).
This paradigm, which measures incidental learning of trust in that participants are instructed to ignore the faces during gaze cueing, has been used multiple times to explore different facets of this trust learning process. The effect has been shown to be durable (Strachan & Tipper, 2017), related to involuntary facial muscular and electro-cortical activity associated with emotional responses, and sensitive to the emotion of the cueing face (Bayliss, Griffiths, & Tipper, 2009; Strachan et al., 2016). However, one outstanding question is how a fixed property of the cueing face that conveys higher order social information, such as its race, may affect (or not affect) this learning.

Racial group membership
Humans are social creatures, and we rely on social groups to survive. These social groups can range from a smaller personal scale (e.g., a close circle of friends) to a much larger societal scale (our national identity, gender, race; Lickel, Hamilton, & Sherman, 2001). People prefer individuals who are members of their in-group over their out-group, even when this group distinction is a new category that has been learned in the laboratory (Allen & Wilder, 1975; Tajfel, Billig, Bundy, & Flament, 1971).
One of the most studied and historically important social group categories is that of race. Aside from showing preferential biases towards own-race over other-race faces (Dasgupta, McGhee, Greenwald, & Banaji, 2000), people are also better at recognizing the posed emotions of own-race than other-race faces (Elfenbein & Ambady, 2002a, 2002b), remember own-race faces better than other-race faces (Meissner & Brigham, 2001), and are more sensitive to gaze cues provided by own-race than other-race faces (Dalmaso, Galfano, & Castelli, 2015; Pavan, Dalmaso, Galfano, & Castelli, 2011). There is also evidence that these own-race biases are linked to decisions about trustworthiness in both explicit ratings and economic games (Stanley, Sokol-Hessner, Banaji, & Phelps, 2011).
In Experiment 1, we explore whether trust learning is the same for individuals who differ in terms of racial group membership. To this end, we test British Caucasian participants using images of White (own-race) and East Asian (other-race) faces in the cueing paradigm and ask participants to rate them in terms of trust at the beginning and end of the experiment.
We predict that learning about other-race faces from subtle and incidental gaze cues will be impaired. The trust learning task requires that gaze patterns be associated with a specific face identity. That is, when a particular face produces consistent gaze behaviour, for example always looking away from targets, this behaviour has to be associated with the cognitive representation of that individual. However, if, as previous research suggests (Dalmaso et al., 2015), people can be less susceptible to gaze cues provided by other-race faces, and less efficient at remembering other-race faces (Meissner & Brigham, 2001; Sessa & Dalmaso, 2016), then learning these faces would also be impaired. Therefore, we expect that incidental learning of trust will be weaker in other-race as compared to own-race faces.

Experiment 1
This experiment explored whether the original trust learning effect (Bayliss & Tipper, 2006) emerges similarly for faces that belong to real-world social in-groups and out-groups by using White and East Asian faces with Caucasian British participants.

Participants
In Experiment 1, there were 30 participants in total (all Caucasian British, 26 female, M age 22.23). Sample size was decided on the basis of previous studies that investigated incidental social learning from gaze cues which used 20-30 participants (Manssuer et al., 2016, Experiment 2; Rogers et al., 2014; Strachan et al., 2016; Strachan & Tipper, 2017). Given that our manipulation involved a more complicated design than previous experiments (with race as an additional independent variable), we opted for the larger end of this range with 30 participants. All participants provided written consent and the study was given ethical approval by the Ethics Committee of the University of York Psychology Department.

Stimuli
Target stimuli for the object categorization task were kitchen and garage object images used in Bayliss and Tipper (2006). There were 13 unique objects in each category (kitchen/garage) and these appeared in both left and right orientations. All stimuli were coloured in blue. In total there were 52 individual images used in the experiment.
Face stimuli throughout the experiment were taken from the MR2 Face Database (Strohminger et al., 2015). This database comes with a set of ratings for each face on a range of attributes, including trustworthiness on a scale of 1 (untrustworthy) to 7 (trustworthy); these are publicly available on the Open Science Framework (OSF; https://osf.io/uwk4v/). The eight (four female, four male) East Asian faces and eight (four female, four male) White faces were selected on the basis of these ratings to be similar in terms of apparent trustworthiness (East Asian faces: M = 4.13, SD = 0.14; White faces: M = 4.12, SD = 0.21). These images were then edited in Adobe Photoshop CS6 to remove the grey background and edit the direction of eye-gaze to create three versions of each face: straight, left, and right gaze. These stimuli were used in the gaze-cueing procedure, while faces with unedited eyes were used in the trustworthiness ratings and one-back procedure.
For each participant, the cueing behaviour of faces was set such that each face would provide a valid or invalid cue 100% of the time, and face validity was manipulated orthogonally to race, such that there were four conditions of faces: East Asian Valid, East Asian Invalid, White Valid, and White Invalid, with four identities in each condition (see Figure 1(a)). The validity of faces was counterbalanced across participants.
Figure 1. (a) Examples of the four different conditions in which faces were presented in Experiment 1: out-group valid, out-group invalid, in-group valid, in-group invalid. In Experiment 2, the conditions were faces that were previously rated as high and low in trustworthiness. (b) Schematic of two gaze cueing trials: out-group valid (top row) and in-group invalid (bottom row). The duration of each trial event is displayed along the bottom for Experiment 1 and Experiment 2. If participants made a mistake, an error tone would play between the last two trial events.
The study was run on an Intel Core i5 PC with a 21.5-inch monitor. The experiment was presented using E-Prime 2.0 software with a white background throughout and the resolution set to 1024 × 768 pixels. Participants were seated approximately 60 cm from the display, and during trustworthiness ratings the face stimuli had a visual angle of 19.29° horizontally and 20.97° vertically, while during gaze-cueing the face stimuli had a visual angle of 13.36° horizontally and 14.93° vertically.
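The visual angles reported above follow from the standard relation between stimulus size and viewing distance, angle = 2 × arctan(size / (2 × distance)). The following is a minimal Python sketch of that relation, not the authors' procedure; the function names are illustrative, and the physical stimulus sizes are not reported in the text, so the second function simply inverts the relation for a given angle and the stated 60 cm viewing distance.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by a stimulus of size_cm viewed from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical size (in cm) that subtends angle_deg at distance_cm (inverse of the above)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Illustration only: on-screen width implied by the reported 19.29 degree
# horizontal angle at 60 cm (the actual stimulus sizes are an unknown here).
implied_width_cm = size_for_angle_cm(19.29, 60.0)
```

Under these assumptions, a face subtending 19.29° at 60 cm would be roughly 20 cm wide on screen.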

Design and procedure
Participants initially completed a one-back recognition task at the beginning of the experiment with all the faces used later as stimuli. This was done because Strachan and Tipper (2017) have shown that greater familiarity with faces can improve trust learning. In that experiment, participants were asked to match faces across images that varied in expression and viewpoint, but the MR2 database does not provide any such variation. Therefore, in this experiment we included a one-back recognition task, where faces were presented in sequence and participants had to respond with the SPACE bar if they saw the same face repeated twice in a row. This encourages participants to encode details of the faces and store them in working memory, at least until the next face is shown, and with repeated exposures this should allow participants to become familiar with the face identities.
Participants were told that they would be asked to perform an object categorization task on images of objects that appeared on the left or right side of the screen, and to respond with whether these were garage or kitchen objects. They were also told that the central face images were irrelevant and to be ignored. Before the experiment, participants were allowed to study printed versions of the kitchen/garage items, in order to familiarize themselves. This was done firstly to ensure that participants knew what each object was, and secondly to ensure that early responses from the first trial block were not confounded by uncertainty as to the object categories of the targets.
Each trial began with a 600 ms fixation cross in the centre of the screen, which was then replaced by a face showing a direct gaze for 1500 ms. The face then shifted gaze either to the left or the right for 500 ms before the target stimulus appeared on either the same (valid) or opposite (invalid) side of the gaze direction. The target stimulus remained either until the participant's response was logged or until 2500 ms had passed, following which participants received feedback from an error tone that would sound if an incorrect response were logged. The face then shifted back to direct gaze for another 1000 ms. A blank screen followed for 500 ms before the next trial began. The trial structure is shown in Figure 1(b).
The object categorization responses were the H key and the space bar of a keyboard, chosen because the H key appears directly above the space bar on QWERTY keyboards and this direction was orthogonal to the possible location of the target. Participants were instructed to respond with their index finger on the H key and thumb on the space bar. For half the participants, H represented kitchen objects, while for the other half it represented garage objects.
In total, there were five blocks of 32 trials each, and each face appeared twice in each block, once gazing left and once right (10 times in total across the experiment; five left, five right, but always either valid or invalid depending on the identity). The order of faces was randomized, as were the order of target objects, the side that the target appeared, and the order of valid and invalid trials.
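The block structure described above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions rather than the E-Prime implementation: the face labels and the particular validity assignment are hypothetical, and target objects are omitted. It shows how fixing validity per identity while presenting each face once gazing left and once gazing right yields 32 trials per block and 10 presentations per face.

```python
import random

# Hypothetical identities: 16 faces, half assigned to always-valid cueing.
# (In the experiment, validity was crossed with race and counterbalanced.)
FACE_VALIDITY = {f"face_{i:02d}": (i % 2 == 0) for i in range(16)}


def build_blocks(face_validity, n_blocks=5, seed=None):
    """Return n_blocks shuffled lists of trials, two trials per face per block."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        trials = []
        for face, is_valid in face_validity.items():
            for gaze in ("left", "right"):
                # Valid faces look at the target side; invalid faces look away from it.
                if is_valid:
                    target_side = gaze
                else:
                    target_side = "right" if gaze == "left" else "left"
                trials.append(
                    {"face": face, "gaze": gaze, "target_side": target_side, "valid": is_valid}
                )
        rng.shuffle(trials)  # randomize trial order within the block
        blocks.append(trials)
    return blocks


blocks = build_blocks(FACE_VALIDITY, seed=1)
```

Because validity is a property of the identity rather than of the trial, the only way to learn it is to associate the gaze behaviour with the specific face.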
At the beginning and the end of the experiment, participants were shown all 16 faces (un-manipulated original images) in a random order and asked to rate how trustworthy they found them. A calibration screen would appear with the question, "How TRUSTWORTHY do you think this person is?" with the word "START" written beneath. Participants had to click the word "START" with the mouse to progress the trial, after which the face would appear for 1000 ms. Following this, the face would disappear and a screen with an uninterrupted rating scale appeared. Participants were instructed to click along the scale with the computer mouse at the point that corresponded to how trustworthy they thought the person was. The scale recorded responses between −100 and +100, calculated by the distance from the centre of the line of the participants' mouse click. Responses to the left of the centre of the line were coded as negative, while those to the right were coded as positive (these were indicated on the screen with a − and + sign at either end of the scale).
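The mapping from a mouse click to a rating can be illustrated with a short Python sketch. The pixel coordinates of the scale ends are assumptions for illustration (they are not reported), and the linear mapping is one plausible reading of "distance from the centre of the line".

```python
def click_to_rating(click_x: float, line_left_x: float, line_right_x: float) -> float:
    """Map a horizontal click position onto a rating between -100 and +100.

    The centre of the line maps to 0, the left end to -100 and the right end
    to +100; clicks beyond either end are clamped to the scale limits.
    """
    centre = (line_left_x + line_right_x) / 2
    half_length = (line_right_x - line_left_x) / 2
    rating = 100 * (click_x - centre) / half_length
    return max(-100.0, min(100.0, rating))


# Hypothetical scale spanning x = 212 to x = 812 on a 1024-pixel-wide display.
example_rating = click_to_rating(362, 212, 812)  # a click halfway towards the left end
```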

Data analysis
Before the data were analysed, participants' responses were filtered to remove all error trials (where participants reported the incorrect answer) and reaction time (RT) outliers: RTs below 250 ms (too short to process the stimuli; 0.10% of trials) and above 2500 ms (indicating that participants had not given a response in the allotted time; 0.44% of trials). The number of remaining trials was then compared with the original number of trials to check that all participants retained at least 70% of their total trials and had not scored below 70% total correct on any one condition. No participants were removed on this basis.
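The filtering steps above can be sketched in Python. The trial records and field names below are hypothetical; the thresholds (250 ms, 2500 ms, 70%) are those stated in the text, and timed-out trials are represented here as missing RTs.

```python
def filter_trials(trials, rt_min_ms=250, rt_max_ms=2500):
    """Keep correct trials whose RT lies between rt_min_ms and rt_max_ms (inclusive)."""
    return [
        t for t in trials
        if t["correct"] and t["rt_ms"] is not None and rt_min_ms <= t["rt_ms"] <= rt_max_ms
    ]


def passes_retention_check(all_trials, kept_trials, min_proportion=0.70):
    """True if at least min_proportion of the original trials survive filtering."""
    return len(kept_trials) / len(all_trials) >= min_proportion


def accuracy_by_condition(trials):
    """Proportion of correct responses for each condition label."""
    totals, correct = {}, {}
    for t in trials:
        cond = t["condition"]
        totals[cond] = totals.get(cond, 0) + 1
        correct[cond] = correct.get(cond, 0) + (1 if t["correct"] else 0)
    return {cond: correct[cond] / totals[cond] for cond in totals}
```

A participant would be retained only if `passes_retention_check` holds and every value returned by `accuracy_by_condition` is at least 0.70.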
As well as RT filters, we also examined participants' pre-ratings. Participants' ratings to in-group and out-group faces were averaged and examined to ensure that the average for neither group exceeded 70 on the 100-point scale in either direction. This was done because an average to one group that exceeded 70 suggested that participants gave ratings to multiple faces that used the far ends of the scale before any trustworthiness induction was performed, resulting in a floor or ceiling effect where any effect of our manipulation would be masked. For example, if one participant rated other-race faces as appearing extremely untrustworthy (using the far left of the trustworthiness scale) then they would be excluded as any trust learning would be subject to a floor effect. No participant was removed on this basis in Experiment 1.
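As a minimal Python sketch of this exclusion rule, assuming pre-ratings are stored per face group (the group labels and data layout are hypothetical), a participant is flagged when the mean pre-rating for either group lies more than 70 points from the centre of the scale:

```python
from statistics import mean


def exceeds_prerating_limit(pre_ratings_by_group, limit=70):
    """True if the mean pre-rating of any group exceeds +/- limit on the -100..+100 scale."""
    return any(abs(mean(ratings)) > limit for ratings in pre_ratings_by_group.values())


# Hypothetical participant who rates other-race faces near the floor of the scale.
flagged = exceeds_prerating_limit(
    {"own_race": [10, 25, -5, 30], "other_race": [-90, -85, -70, -95]}
)
```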
Gaze cueing was examined in terms of RTs and accuracy rates separately using 2 × 2 repeated measures ANOVAs, with race (own-race/other-race) and validity (valid/invalid) as factors. Incidental trust learning was tested with a 2 × 2 × 2 repeated measures ANOVA with time (before/after the experiment), validity and race as repeated measures factors and trustworthiness rating as dependent variable. All analysis was run using the ez package in the statistical software R.
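Effect sizes in this paper are reported as partial eta squared, which can be recovered from an F statistic and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below is offered only as a reader's cross-check of reported values, not as part of the authors' R analysis; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Cross-check against the main effect of time reported for Experiment 1,
# F(1,29) = 12.29, for which the paper gives an effect size of 0.30.
time_effect = round(partial_eta_squared(12.29, 1, 29), 2)
```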

Trustworthiness ratings
The results of trustworthiness ratings at the beginning and end of Experiment 1 are shown in Figure 3 (top row). A 2 × 2 × 2 ANOVA looking at time, validity and race found a main effect of time (F(1,29) = 12.29, p = .001, ηp² = 0.30). There was also a main effect of validity (F(1,29) = 5.18, p = .030, ηp² = 0.15), as invalid faces were rated as less trustworthy than valid, but none of race (F(1,29) = 0.64, p = .430, ηp² = 0.02), as own-race faces were not rated as more trustworthy over the course of the whole experiment than other-race faces. A significant interaction of time and validity was found (F(1,29) = 9.07, p = .005, ηp² = 0.24), indicating that there was significant learning of trust over time as a function of gaze cueing behaviour. There was also a significant interaction of time and race (F(1,29) = 4.84, p = .036, ηp² = 0.14), which appears to be driven by the fact that a slight own-race bias in pre-ratings was less evident in post-experiment ratings, and a non-significant interaction of validity and race (F(1,29) = 3.41, p = .075, ηp² = 0.11). Importantly, there was a three-way interaction of time, validity, and race, indicating that trust learning over the experiment was affected by race (F(1,29) = 4.45, p = .044, ηp² = 0.13). We broke this down into separate 2-way ANOVAs that looked at the effects of validity and race at the beginning and the end of the experiment, separately. At the beginning of the experiment, there was no main effect of validity (F(1,29) = 0.68, p = .415, ηp² = 0.02), as participants had not been exposed to faces' valid or invalid behaviours at this point. Participants did rate own-race faces on average as more trustworthy (M = 9.89, SD = 30.53) than other-race faces (M = 4.73, SD = 25.55) but this was not significant (F(1,29) …
There are two key findings from Experiment 1.
First, gaze cueing where participants follow the gaze direction of another person is unaffected by whether the viewed face is a racial in-group or out-group member. This is surprising given previous research that shows that race can affect susceptibility to gaze cues (Chen & Zhao, 2015; Chen et al., 2017; Dalmaso et al., 2015; Pavan et al., 2011). However, some of these findings suggest that this effect of race is mediated by a sense of inter-group threat, that is, when participants feel that out-group faces appear threatening (due to their out-group status), people are less susceptible to gaze cues. Other studies that have found effects of race often use Black faces (rather than East Asian faces), which for White participants often carry a threatening connotation. In contrast, stereotype content for East Asian identities tends to be more nuanced (Lin, Kwan, Cheung, & Fiske, 2005), and may be less likely to be spontaneously perceived as threatening. Indeed, evidence from electroencephalography suggests that race may affect face processing more when faces show direct gaze, rather than averted (Sessa & Dalmaso, 2016), which suggests that some additional context is required for participants to spontaneously use race to inform gaze following.
Figure 2. Reaction time in milliseconds to valid (light grey) and invalid (dark grey) trials in Experiment 1 (top plot; own-race trials on the left, other-race trials on the right) and Experiment 2 (bottom plot; highly trustworthy faces on the left, low trustworthy faces on the right). Error bars show ± 1 within-subjects standard error.
Table 1. Accuracy rates (percent correct with standard error) averaged across subjects in Experiment 1 (own-race/other-race faces) and Experiment 2 (high/low trustworthy faces) for valid and invalid trials.
Second, and in contrast to attention cueing effects, incidental learning of trust from the predictive gaze patterns of ignored faces was influenced by race. That is, trust learning was larger and more robust for own-race faces. As noted above, there is a wealth of previous literature that suggests we might see a difference in incidental learning processes between faces of different races (Dasgupta et al., 2000; Elfenbein & Ambady, 2002a, 2002b; Meissner & Brigham, 2001). However, that this learning occurred even without differences in attentional cueing suggests that this effect does not arise as a result of differences in sensitivity to gaze leading to different disruptions of processing fluency (cf. Strachan et al., 2016).
Figure 3. Trustworthiness ratings from Experiment 1 (top row) with own-race (left plot) and other-race faces (right plot), and Experiment 2 (bottom row) with faces high in trustworthiness (left plot) and low in trustworthiness (right plot). Ratings are shown over time separately for valid (dotted lines) and invalid (solid lines) trials. Error bars show ± 1 within-subjects standard error.
There are two potential explanations for this effect: this result is driven by out-group homogeneity, as participants are less likely to individuate other-race members than own-race members; or participants more efficiently encode and store face identity for own-race than other-race members. However, as noted above, there is evidence that other-race faces are trusted less than own-race faces (Stanley et al., 2011). In Experiment 1, participants showed a nonsignificant bias in pre-experiment trustworthiness ratings to judge own-race faces as more trustworthy than other-race faces, even though faces were initially matched for trustworthiness when they were first selected. Although this was not significant, it is still possible that subtle differences in preconceptions about trustworthiness could have driven different strategies of learning for different identities. To investigate this, we report data from an earlier, independently run, experiment that directly tests the role of trustworthiness in this incidental learning effect.

Experiment 2
Much research has investigated the physical features of a face that predict how trustworthy it is perceived to be (Todorov, 2008; Todorov, Baron, & Oosterhof, 2008; Todorov, Pakrashi, & Oosterhof, 2009). Physiognomic facial configurations such as wider jaws, lower brow ridges and other signals that resemble emotional expressions are processed quickly and automatically. Reliable ratings of attributes such as trust can be observed after only 100 ms (Willis & Todorov, 2006), and the features used to make these decisions are consistent enough that they can be visualized and predicted using image-based analysis of ambient (i.e., not posed) images (Vernon, Sutherland, Young, & Hartley, 2014). Experiment 2 was designed and run independently of Experiment 1, and aimed to address how these physiognomic features may affect trust learning. In the experiment used to collect these data we manipulated the baseline trustworthiness of the face (high/low trustworthiness) and tested trust learning (in a similar way to race in Experiment 1). Such a manipulation would create expectations (e.g., that trustworthy people will cooperate while untrustworthy people will deceive) and these expectations may interact with incidental learning of trust from eye-gaze behaviour. Given the expectation that trustworthy people are better social partners, participants may be more inclined to incidentally learn about their behaviour for reference in future interactions. If differences in social learning from own-race and other-race faces are due to different levels of trust, we would expect that this independent experiment would have found the same profile of learning as Experiment 1 (that is, greater learning for trustworthy than untrustworthy faces).

Participants
Participants were 30 students from Bangor University (29 female, M age 20). No participants were removed on the basis of RT filters (as detailed in Experiment 1). The study was given ethical approval by the Bangor University ethics committee. Details of participants' racial identity were not collected for these data.

Stimuli, design and procedure
The face stimuli were taken from the Karolinska Directed Emotional Faces database (KDEF; Lundqvist, Flykt, & Öhman, 1998). All faces were female and selected based on ratings from Oosterhof and Todorov (2008). Sixteen faces were selected, eight of which were the faces rated highest for trust and eight of which were rated lowest for trust.
Some details of this experiment differed slightly from Experiment 1, because the two experiments were run independently of each other. At the beginning of the trial, a fixation cross appeared for 1500 ms followed by a directly gazing face for 1500 ms. The face then changed gaze direction and remained for 500 ms, after which an object appeared to the left or right side of the face and remained until a response was made or 3000 ms had elapsed. When a response was made, the object disappeared and the face gazed directly ahead again for 2000 ms. These timings are shown in Figure 1(b). During trustworthiness ratings, the scale was labelled with "Very untrustworthy" and "Very trustworthy" (respectively). The faces subtended approximately 7.57° horizontally and 10.23° vertically from a distance of 60 cm. Stimuli were displayed at a screen resolution of 800 × 600 pixels in the cueing phase and at 640 × 480 pixels in the rating phases. The experiment was displayed on a 19-inch Iiyama Vision Master CRT display. All stimuli were presented on a grey background. All other details were the same as those described in Experiment 1.

Data analysis
All details of data analysis are identical to those outlined in Experiment 1, with the exception that in this experiment no participants were removed on the basis of pre-processing filters, and face trustworthiness replaced race as a factor in all analyses.

Trustworthiness ratings
The results of trustworthiness ratings at the beginning and end of this experiment are shown in Figure 3 (bottom row). A 2 × 2 × 2 ANOVA looking at time, validity and trustworthiness found no main effect of time (F(1,29) = 0.00, p = .962, ηp² = 0.00) or validity (F(1,29) = 4.09, p = .053, ηp² = 0.12), but did find a main effect of face trustworthiness on judgements (F(1,29) = 123.59, p < .001, ηp² = 0.81). A significant interaction of time and validity was found (F(1,29) = 11.11, p = .002, ηp² = 0.28), indicating that there was significant learning of trust over time as a function of gaze cueing behaviour. However, no other interactions were significant, including the crucial three-way interaction of time, validity and trust (F(1,29) = 0.01, p = .940, ηp² = 0.00; all other Fs < 1).
We broke this down into separate 2-way ANOVAs that looked at the effects of validity and trustworthiness at the beginning and the end of the experiment, separately. At the beginning of the experiment, there was no main effect of validity (F(1,29) = 1.29, p = .266, ηp² = 0.04), as participants had not been exposed to faces' valid or invalid behaviours at this point. There was, however, a large difference in pre-ratings of trust assigned to high-trustworthiness (M = 21.28, SD = 38.94) and low-trustworthiness faces (M = −24.65, SD = 37.30) and, as expected, this effect was significant (F(1,29) = 122.01, p < .001, ηp² = 0.81). There was no interaction of validity and trustworthiness (F(1,29) = 0.01, p = .919, ηp² = 0.00). At the end of the experiment there was a main effect of validity (F(1,29) = 8.08, p = .008, ηp² = 0.22) due to incidental learning of patterns of gaze behaviour, and again a main effect of trustworthiness (F(1,29) = 44.78, p < .001, ηp² = 0.61). However, importantly, in this experiment there was no significant interaction between the two (F(1,29) = 0.00, p = .974, ηp² = 0.00), confirming that incidental learning of trust is equivalent for high and low trustworthy faces. At the end of the experiment there were significant differences between valid (M = 28.24, SD = 24.58) and invalid identities both with trustworthy faces (M = 9.65, SD = 32.86; t(29) = 2.40, 95%CI …
The results of this experiment offer several points of interpretation. First, shifts of attention caused by gaze cues are not affected by the trustworthiness of the face. These gaze cueing effects are similar to those of Experiment 1 where no differences in attention cueing were observed between own-race and other-race faces.
Second, although trustworthiness judgements are heavily driven by the physical appearance of the face, in line with previous research (Sutherland et al., 2013; Todorov, 2008; Todorov et al., 2008; Vernon et al., 2014), this has no effect on the incidental learning of trust from eye-gaze behaviour.
Therefore, we can conclude that the contrast in incidental learning between own-race and other-race faces observed in Experiment 1 is not determined by differences in levels of trustworthiness. In Experiment 2, a direct manipulation of trust based on physiognomic properties did not detect any effects on trust learning from gaze behaviour when viewing only Caucasian faces. Therefore, the hypothesis that trust learning is reduced in other-race faces compared with own-race faces due to differences in participants' initial feelings of trust is not supported.

General discussion
The current study reports the results of two experiments exploring how the identity of a cueing face (and the higher order social information that this carries) can affect orienting of attention and the incidental learning of trust from gaze cues. In both experiments we observe that cueing of attention to the right and left by eye-gaze is unaffected by the nature of the face, whether race or trustworthiness.
This supports previous evidence that gaze cues orient attention in a very fast and automatic manner that is difficult to inhibit (Driver et al., 1999; Freebody & Kuhn, 2016; Frischen & Tipper, 2004). Although previous research has found that cueing can be mediated by factors such as social status, dominance (Jones et al., 2010), familiarity (Deaner, Shepherd, & Platt, 2007), race (Dalmaso et al., 2015; Pavan et al., 2011) and trustworthiness (Petrican et al., 2013; Süßenbach & Schönbrodt, 2014), we found no evidence that participants spontaneously considered either of the latter features when processing gaze, suggesting that these mediating effects may rely on the context (e.g., perceived threat) in which participants find themselves experiencing gaze cues. This is particularly striking in Experiment 2, where we failed to replicate previous research that shows that trustworthiness can affect gaze cueing (Petrican et al., 2013; Süßenbach & Schönbrodt, 2014). There may be a variety of reasons for these contrasts. With regard to Petrican et al., we recruited young adults for all experiments reported here, whereas they found that trustworthiness affected gaze cueing only in older adults. While Süßenbach and Schönbrodt also used younger adults, their design differed from ours in three respects: they used affectively valenced target stimuli, whereas both of the current experiments used neutral household items; they used a simpler left/right discrimination task, whereas ours was a more demanding category identification judgement; and (perhaps most importantly) they used familiar faces that were known from background information to be trustworthy or untrustworthy (characters in films), whereas ours were unknown faces that differed in perceptual physiognomic features.
Further research would be needed to identify which of these design features contributes to whether trustworthiness affects the magnitude of gaze cueing effects, but our present findings certainly suggest that if people are susceptible to trustworthiness information during gaze cueing, they do not invariably use it spontaneously whenever it is available.
However, the main focus of our study was to examine how the nature of the face, whether own-race or other-race (high-trust or low-trust), would influence the learning of trust from gaze behaviour. We predicted that trust learning would be less efficient when viewing other-race faces. This was based on the idea that during the task where faces are irrelevant and to-be-ignored, an association has to be learned between a specific face identity and the pattern of eye-gaze it produces, and that this would be processed more efficiently for own-race than other-race faces. Although no differences were found in susceptibility to gaze cues, we nonetheless found that race affected how participants learned about the trustworthiness of individuals. This suggests that these processes use different underlying mechanisms: a fast, attention-orienting mechanism that processes gaze cues and does not spontaneously take the race of the face into account, and another mechanism that reviews gaze behaviour and incorporates this into a stable representation of that particular identity for use in future social decisions.
It follows that the association between identity and gaze behaviour will be more easily learned if there is a strong/specific representation of the face identity. Strachan and Tipper (2017) confirmed this by manipulating the strength of face identity representations, demonstrating that stronger representations resulted in greater learning of trust from gaze behaviour. There is extensive prior research demonstrating that other-race faces are identified and remembered less efficiently than own-race faces (see Meissner & Brigham, 2001, for a review). As such, weaker identity representations for other-race faces are a plausible explanation for the reduced learning of trust, and one that future research may look to investigate further.
It would also be interesting for future research to investigate how participants may use racial group membership differently depending on their own identity. In Experiment 1 we used exclusively Caucasian participants. We did not include East Asian participants as a contrast because within the sample population (undergraduate students at the University of York), East Asian participants are a minority group, and there is some evidence that people process group dynamics differently on the basis of whether their in-group is a majority or minority (Elfenbein & Ambady, 2002b). However, it would be interesting to explore whether the status of participants (as members of majority or minority groups) influences their sensitivity to gaze behaviour in an incidental learning scenario. Such research may wish to contrast social learning in a minority population with that in a matched majority (e.g., Chinese participants living in the UK compared with Chinese participants living in China).
With the inclusion of Experiment 2, which manipulated face trustworthiness, we were able to examine the role of trust in such learning processes. Previous work has reported less trust of other-race individuals (e.g., Stanley et al., 2011), although there were only trends for this pattern in the current study. Although we hypothesized that these subtle differences in preconceptions about trust between racial groups could still have played a role in the different learning profiles seen in Experiment 1, analysis of this independent dataset demonstrated that initial trust of a face was not influential. That is, learning of trust from gaze was not impaired in low-trust faces, suggesting that the results of Experiment 1 cannot be explained by different levels of trust associated with in-group and out-group members.
Consequently, our findings confirm that while the fixed physiognomic properties of a face are a strong predictor of trustworthiness judgements, and while incidental learning of cueing contingencies has an effect, it does not override or interact with this more salient perceptual information, making it unlikely that this feature can explain the results of Experiment 1. Rather, our gaze manipulation moderates initial trust ratings in similar ways. However, note that the gaze learning is incidental while faces are ignored and we are examining effects on judgements at a later time where there are no visible cues to prior deception. This is in sharp contrast to other more in-the-moment manipulations of trust such as face emotion (which does affect trust learning; Bayliss et al., 2009; Strachan et al., 2016), which are salient physical properties of a face that are present while trust judgements are actually made.
In conclusion, our studies demonstrated a number of features of incidental learning of trust from gaze cues. Learning is incidental in that participants are ignoring the faces, and hence these demanding learning conditions are influenced by the robustness of the representations that have to be associated. In Experiment 1, learning is impaired for other-race faces that have weaker representations of each identity. In contrast, Experiment 2 demonstrated that the race effects are probably not driven by initial trustworthiness of own versus other-races. When the faces are all Caucasian, large differences in trust do not influence incidental learning from gaze behaviour.

Notes
1. The primary manipulation in this experiment was that of racial group, but our stimuli also included faces that varied according to gender. This also gave an additional orthogonal group membership factor of face gender that could have affected results. Previous research has found no evidence that gender affects subsequent trust learning, but it was possible that in this paradigm the presence of an additional group dimension (race) also made gender a salient distinction. A 2 × 2 × 2 repeated measures ANOVA with time, validity and gender in place of race as fixed factors found no main effect of gender (F(1,29) = 2.54, p = .122, ηp² = 0.08) and no interactions of gender with either time (F(1,29) = 0.02, p = .880, ηp² = 0.00) or validity (F(1,29) = 1.51, p = .229, ηp² = 0.05) and no three-way interaction (F(1,29) = 0.01, p = .935, ηp² = 0.00). The same held true when examining only female participants (26/30): there was no main effect of gender (F(1,25) = 1.50, p = .232, ηp² = 0.06) and no interactions of gender with either time (F(1,25) = 0.01, p = .918, ηp² = 0.00) or validity (F(1,25) = 2.68, p = .114, ηp² = 0.10) and no three-way interaction (F(1,25) = 0.04, p = .849, ηp² = 0.00).
2. Gaze cueing effects looked largely similar across both experiments. A 2 × 2 mixed ANOVA on collapsed data from these experiments, with experiment (1/2) as a between-subjects factor and validity (valid/invalid) as a within-subjects factor found a main effect of validity (F(1,58) = 34.03, p < .001, ηp² = 0.37), and a main effect of experiment (F(1,58) = 7.17, p = .010, ηp² = 0.11). This effect was driven by the fact that RTs were longer overall in one experiment than the other. However, this effect did not interact with gaze cueing across the different experiments (F(1,58) = 0.30, p = .589, ηp² = 0.01), meaning that sensitivity to gaze cues did not differ significantly across the two experiments.
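The effect sizes reported alongside each F statistic in the note above can be checked by hand, because partial eta squared follows directly from F and its degrees of freedom: F × df_effect / (F × df_effect + df_error). The following is a minimal sketch of that check, not the authors' analysis code; the function name `partial_eta_squared` and the dictionary layout are illustrative assumptions, and only the F values, degrees of freedom and reported effect sizes are taken from the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported effect size) for the cross-experiment ANOVA, all with df = (1, 58).
# The labels are hypothetical names chosen for this sketch.
reported = {
    "validity": (34.03, 0.37),
    "experiment": (7.17, 0.11),
    "validity x experiment": (0.30, 0.01),
}

for name, (f_value, reported_eta) in reported.items():
    computed = partial_eta_squared(f_value, df_effect=1, df_error=58)
    print(f"{name}: computed {computed:.2f}, reported {reported_eta:.2f}")
```

Under this formula the computed values agree with the reported ones to two decimal places, which also shows why the non-significant interaction still carries a small non-zero effect size.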

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Economic and Social Research Council [grant number ES/000012/1].