The impact of weapons and unusual objects on the construction of facial composites

ABSTRACT The presence of a weapon in the perpetration of a crime can impede an observer’s ability to describe and/or recognise the person responsible. In the current experiment, we explore whether weapons when present at encoding of a target identity interfere with the construction of a facial composite. Participants encoded an unfamiliar target face seen either on its own or paired with a knife. Encoding duration (10 or 30 s) was also manipulated. The following day, participants recalled the face and constructed a composite of it using a holistic system (EvoFIT). Correct naming of the participants’ composites was found to reduce reliably when target faces were paired with the weapon at 10 s but not at 30 s. These data suggest that the presence of a weapon reduces the effectiveness of facial composites following a short encoding duration. Implications for theory and police practice are discussed.

The presence of a weapon during the commission of a crime can lead to a weapon-focus effect (WFE). This phenomenon is signalled by a marked decrease in eyewitness memory performance for crime-related details when a weapon is present at the time of encoding (see Fawcett et al., 2013 for review and meta-analysis). Laboratory testing has shown that this deficit in performance results in reduced recall for details of the event, aspects of the crime scene and the perpetrator's clothing, general appearance, and face (e.g. Fawcett et al., 2013; Loftus et al., 1987; Maass & Köhnken, 1989). The most widely documented effect of weapon presence is on identification accuracy: performance in a line-up or identity parade reduces for a participant observer exposed to a weapon (cf. no weapon seen; e.g. Erickson et al., 2014; Fawcett et al., 2013; Loftus et al., 1987). In contrast, surveys of real cases have demonstrated a weaker effect of weapon presence compared to other factors such as perpetrator race and duration of encoding (e.g. Tollestrup et al., 1994; Valentine et al., 2003; Yuille & Cutshall, 1986).

Theoretical mechanisms behind weapon focus
Witnessing or falling victim to a crime are stressful experiences. Some early research into eyewitness memory has its theoretical foundations in the effect of various levels of arousal on performance. Some stress is required for optimal performance at any given task, with lethargy and overwhelming perturbation each yielding worse performance (e.g. Yerkes & Dodson, 1908). Deffenbacher et al. (2004) applied Fazey and Hardy's (1988) catastrophe model of stress directly to eyewitness identification research, finding that increases in somatic anxiety result in gradual increases in performance up until the system reaches breaking point. After this point, increases in anxiety result in an abrupt and discontinuous drop in performance. In the brain, stress-related glucocorticoid levels modulate memory performance in a way that mirrors these patterns (Lupien et al., 2007). Perceptually, the arousal explanation is also supported by the cue-utilisation hypothesis (Easterbrook, 1959), proposing that attention allocates to central rather than peripheral details of a visual scene under heightened arousal. Weapons brandished during a crime are likely to increase arousal and stress, narrowing attention to themselves as central features of the scene. Paying more attention to a weapon means that less attention is allocated to a perpetrator's face, which would lie in the periphery, and so may not be encoded as effectively as when a weapon is absent.
It can be difficult to conclude what may be 'central' and 'peripheral' details in a real case given the variability in stress responses from person-to-person and from event-to-event (Christianson, 1992). Nonetheless, memory is less accurate for violent events than for nonviolent variants of the same events (Loftus & Burns, 1982), events wherein weapons are highly visible yield less accurate memory than events wherein weapons are less visible (Kramer et al., 1990), and, of most relevance, eyewitnesses are more likely to recall descriptions for and surrounding the weapon (e.g. the hands of the perpetrator) than eyewitnesses who do not see individuals holding threatening objects (Maass & Köhnken, 1989). All of these findings support the notion that the effect of weapons on attention may be mediated by a mechanism involving physiological arousal.
Another account of the underlying mechanism of the weapon-focus effect contends that weapons compete for visual attention since they are unusual stimuli in most contexts (e.g. Hope & Wright, 2007; Pickel, 1998; Pickel et al., 2008). This novelty-type explanation is based on early perceptual experiments wherein unusual features of a visual scene, such as an octopus in a barnyard, attract attention more quickly after onset and for longer intervals than contextually-relevant aspects, such as a bucket in a barnyard (Loftus & Mackworth, 1978). In theoretical terms, unusual objects are likely to disrupt voluntary control of attention and increase exogenous (i.e. stimulus-driven) control that may be difficult for a witness to overcome (Yantis, 2000). In the developed world, weapons are not common objects in most people's lives and so, when seen during the course of a crime, attract and hold attention at the expense of encoding a perpetrator's face and/or details of the crime.
Eyewitness experiments manipulating the WFE have tested this possibility by examining the context in which a weapon appears (e.g. a shooting range vs. a baseball park; Pickel, 1999) and the unusualness of handheld objects themselves (e.g. perpetrators holding a non-threatening object such as a rubber chicken, or a gun; Erickson et al., 2014). Such research reveals that weapons in contextually-appropriate environments do not elicit a WFE, whereas unusual objects held during crimes do. Carlson, Pleasant et al. (2016) extended the paradigm by presenting participants with scenarios in which normally non-threatening objects were used as weapons. Contrary to the unusualness hypothesis, these items did not reduce line-up accuracy to the level observed with traditional weapons. It remains likely that the degree to which an object is threatening and unusual both impact on facial memory, albeit in differing ways (Carlson & Carlson, 2012).

Weapon focus and the search for suspects
Whether through elevating arousal or distracting attention with novelty, weapons diminish eyewitness memory. What does not appear to be in dispute is that the WFE is greater for recall than for facial identification (Fawcett et al., 2013; Kocab & Sporer, 2016). This result presents a forensically relevant problem that we investigate in the current study. Eliciting a description of a criminal event and its perpetrator during an interview relies upon verbal recall. Investigators rely on these descriptions being accurate when searching for suspects (e.g. Technical Working Group for Eyewitness Evidence, 1999; Wells et al., 2020), and therefore a weapon's greatest effect in real cases may come before the police have even located a potential suspect for a witness to identify. This observation is particularly relevant to the current work as witnesses engage in both face recall and face recognition processes for the construction of facial composites.

Facial-composite images
When a perpetrator of a crime is not known, the police may ask witnesses to construct a facial composite of the perpetrator's face, the result of which is an image used as a reference for the police and as a tool to generate investigative leads from both the police and the public. Such images have historically been created by artists specially trained to generate facial images based on eyewitness reports (e.g. Davies & Little, 1990; Frowd, Carson, Ness, Richardson, et al., 2005; Lampinen et al., 2012; Laughery & Fowler, 1980). More standardised methods have used a feature-based 'kit' where witnesses work with a trained technician to assemble a composite from individual prefabricated features, commonly deployed now via a computer interface (e.g. E-FIT; FACES; Identikit 2000; PRO-fit) (e.g. Davies & Christie, 1982).
More recently, a radically different method has emerged for creating facial composites. The basic approach is that witnesses repeatedly select whole faces (or whole-face regions) from arrays of alternatives, with an evolutionary algorithm combining witness choices, to allow a composite to be 'evolved' (e.g. Frowd et al., 2004; George et al., 2008; Tredoux et al., 2006). Also, global aspects of an evolved identity can be changed, such as by altering face weight, age and health (e.g. Frowd et al., 2010). As such, faces are now created in line with holistic face perception (that is, perceiving a face as a whole entity) as opposed to the previous, non-natural approach of constructing the face by recalling and selecting individual features (for reviews see Frowd, 2011, 2017, 2021; Frowd, Skelton et al., 2012).
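The select-and-recombine loop described above can be illustrated with a toy simulation. This is only a sketch under strong assumptions and is not the algorithm used by EvoFIT or any other holistic system: faces are reduced to short vectors of hypothetical shape and texture coefficients, the human witness is replaced by a `simulated_witness` function that picks the faces closest to a remembered target, and the array size, number of choices and mutation level are arbitrary illustrative values.

```python
import random


def distance(face, target):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((a - b) ** 2 for a, b in zip(face, target))


def simulated_witness(faces, target, n_choices):
    """Stand-in for the human witness: pick the faces closest to the memory."""
    return sorted(faces, key=lambda f: distance(f, target))[:n_choices]


def breed(parents, array_size, rng, mutation_sd=0.1):
    """Build the next array: carry over the choices, then add mutated crossovers."""
    children = [list(p) for p in parents]  # selected faces are kept unchanged
    while len(children) < array_size:
        mum, dad = rng.sample(parents, 2)
        # Each coefficient comes from one parent at random, plus a small mutation.
        child = [rng.choice(pair) + rng.gauss(0.0, mutation_sd)
                 for pair in zip(mum, dad)]
        children.append(child)
    return children


def evolve(target, array_size=18, n_choices=4, generations=5, seed=1):
    """Return the best face found and the best distance at each step."""
    rng = random.Random(seed)
    n_coeffs = len(target)
    faces = [[rng.gauss(0.0, 1.0) for _ in range(n_coeffs)]
             for _ in range(array_size)]
    history = []
    for _ in range(generations):
        chosen = simulated_witness(faces, target, n_choices)
        history.append(distance(chosen[0], target))
        faces = breed(chosen, array_size, rng)
    best = simulated_witness(faces, target, 1)[0]
    history.append(distance(best, target))
    return best, history


best, history = evolve(target=[0.5] * 6)
```

Because the selected faces are carried into the next array unchanged, the best match in `history` can never get worse from one generation to the next, which is the sense in which the composite is 'evolved' towards the witness's memory in this toy version.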
There are a number of holistic systems, such as EFIT-6 (previously EFIT-V), EvoFIT and ID, all of which have similar face construction procedures.
The system selected for use in the present study, EvoFIT, has been the subject of intense research and development (for reviews, see Frowd, 2017, 2021) and now creates composites that are more identifiable than those made by previous systems, both in the laboratory (e.g. Frowd, Bruce, Ness, et al., 2007; Frowd et al., 2010, 2013) and in tests of actual police investigations (e.g. Frowd et al., 2011; Frowd, Pitchford, et al., 2012). Therefore, this method shows considerable promise for theoretical investigation and forensic application, including exploration of potential moderators that alter composite effectiveness.
Composite construction can be influenced by myriad factors that affect eyewitness memory. To date, research on holistic face systems has considered what Wells (1978) terms system variables, factors over which law enforcement has direct control when investigating crimes. Examples include style of interview (Brown et al., 2020; Frowd, Nelson, et al., 2012; Giannou et al., 2021), use of image-enhancement techniques (Brown et al., 2019; Davis et al., 2011, 2015; Frowd, Skelton, Atherton, Pitchford, et al., 2012; Valentine et al., 2010), and procedural methods (e.g. Frowd et al., 2015; Frowd, Bruce, Ness, et al., 2007; Frowd, Park, et al., 2008; Frowd et al., 2009). Theoretically, these techniques improve the effectiveness of central, internal features of the face, the region including eyes, nose and mouth that is important for familiar-face recognition (e.g. Ellis et al., 1979), that is, for recognition of the composite (Frowd, 2021). However, limited research has examined the influence of Wells's estimator variables (Frowd, White et al., 2014; Richardson et al., 2020), aspects of a crime over which law enforcement has no control. Presence of a weapon and exposure duration are two widely documented estimator variables in eyewitness research, the focus of the present study.
As discussed, the presence of a weapon during a crime should exert the strongest effect on recall. A 'feature' approach to face construction requires a witness to recall and assemble facial features, and so should be strongly influenced by presence of a weapon, although correct naming of these composites is usually low under forensically relevant conditions, in particular following a long retention interval (a day or more) from face encoding to construction (e.g. Frowd, Carson, Ness, McQuiston, et al., 2005; Frowd et al., 2010), making assessment difficult. However, a holistic system such as EvoFIT requires minimal description to create its first generation of faces (screens from which witnesses select faces to begin construction), and correct naming of its composites is usually good. In a sense, a witness recognises best likenesses, and recognition memory should be affected by weapon presence to a lesser degree than recall. However, research does also indicate that face recall is still involved to some extent, particularly later in the construction process (e.g. Brown et al., 2020; Fodarella et al., 2021; Frowd, Bruce, Ness, et al., 2007); here, the ability to make fine adjustments to an evolved face can improve identification, such as by moving the eyes closer together or making the mouth larger.
Although presence of a weapon has itself not been investigated for a forensically-relevant design in relation to composite likeness, Hancock et al. (2011) examined the degree of stress at encoding. Half of their face constructors played a psychological action thriller computer game that included a briefly presented target; the other half took a passive 'onlooker' stance to the game, viewing the same content and target face. EvoFIT composites created by game players were correctly named half as often as those constructed by passive observers. The authors suggested that elevated state anxiety for the players at encoding had a large negative impact on accurate recall and recognition required for effective composite construction, although players' divided attention may also have been involved, resulting in higher cognitive load, or (as suggested above) use of a less global, less-effective type of face processing (e.g. Frowd, Bruce, Ness, et al., 2007). The current research manipulates arousal in a more overt weapon-focus paradigm, as described below.

The current study
The current study manipulated two estimator variables that are known to affect eyewitness memory accuracy: weapon presence and exposure duration. Our aim was to determine whether these variables would impact identifiability of facial composites. Additionally, we modelled the forensic situation as far as possible, and so included target faces that were unfamiliar to the person constructing the composite; a long retention interval from target encoding to face construction; and a naming task involving participants who were familiar with the targets, our main DV for assessing the effectiveness of the resulting composites. Whatever may be the mechanism, weapon presence was expected to produce less recognisable composite images by drawing witnesses' attention away from the face. More specifically, this could be due to the weapon becoming a more central detail as per the cue-utilisation hypothesis or due to the unexpected nature of the weapon in an experimental context. In either case, we predicted that the depiction of internal features would suffer most under the weapon condition, reducing recognition of the composite, and that this effect would be magnified under a shorter encoding duration.
We present the research as a series of stages, each with its own method and participant samples, an approach that mirrors established practice for facial-composite construction and assessment (e.g. Frowd, Carson, Ness, McQuiston, et al., 2005). Stage 1 comprised stimulus encoding and facial-composite construction by participants who were unfamiliar with the target faces. Stage 2 comprised a naming task where participants familiar with the targets were presented with the composites constructed in Stage 1 and asked to identify them by name. Stage 3 comprised a perceptual-similarity rating task where veridical images of the target faces were presented alongside the corresponding composites constructed in Stage 1 and involved another sample of participants who were unfamiliar with the targets.

Participants
Forty participants, 36 female, from University of Central Lancashire constructed the composites; their ages ranged from 18 to 62 (M = 26.8, SD = 10.7) years. Participants were recruited on the basis of being unfamiliar with the targets: football players competing at an international level for England or Wales. Staff participated voluntarily in response to an advert, while students received course credit for participation. Participants were randomly allocated in groups of 10 to each level of the two between-subjects factors: Encoding duration and Presence of knife.

Design
Participants each constructed a single composite face, and so the design was between-subjects for Presence of knife (absent vs. present) and Encoding duration (10 s vs. 30 s). In an attempt to maximise the chance of observing a forensically relevant effect, should one exist, we opted for these target exposure durations as they are not only plausible in the real world, but also because they produced the strongest weapon-focus effect in the Fawcett et al. (2013) meta-analysis (based on their 'intermediate' duration category from 10 s to 1 min).

Materials
Target faces were selected to be unfamiliar to participants who constructed the composites but familiar to those who would later name them. This objective was achieved with use of target photographs of 10 different male footballers who play at international level for England or Wales. For face construction, participants were recruited on the basis of not following this sport, whereas football fans were sought for composite naming. A colour photograph for each player was located on the Internet; all were of good quality, with photographic subjects presented clean-shaven, facing the camera and without glasses. Although most eyewitness research investigating the weapon-focus effect has employed simulated crime videos or slide presentations to maximise ecological validity when measuring dependent variables directly relating to memory (i.e. recall and identification/recognition), it is more common in composite construction research to utilise still images at the initial encoding phase as composite construction research often examines system variables (e.g. interviewing, construction methodology). This suggestion is supported by research which indicates that static and moving images of targets at encoding produce overall equivalent results (Frowd et al., 2015). Static images have also been used to investigate system variables in eyewitness research (e.g. Wooten et al., 2020).
Target photographs were each printed in the centre of a size A4 page to dimensions of approximately 8 cm (wide) × 10 cm (high), for participants to view and then to construct in the weapon-absent condition of the experiment. For presentation in the weapon-present condition, these targets were presented on top of a photograph of a clip-point knife pointed outwards (i.e. towards the participant) to simulate a threatening pose (see example, Figure 1). The knife was presented in colour to dimensions of 8 cm (wide) × 5.6 cm (high). This stimulus format was chosen as it mirrors the usual presentation of target images at encoding, as mentioned above, but also because of the unique challenge of obtaining a suitable number of images that depicted a target identity holding a threatening object.
EvoFIT version 1.6 was used to construct the composite faces.

Procedure
Following approval of ethics from the study's institution, participants were tested individually. They were randomly assigned, with equal sampling, to the two between-subjects factors of Encoding duration (10 s vs. 30 s) and Presence of knife (absent vs. present).
Participants were given the general briefing that they might be shown a picture of a threatening weapon and would be involved in a two-part experiment involving a computer-based task. According to assignment, they were shown one of 10 target photographs (without replacement) first to briefly look at, and report whether the face was familiar. On two occasions, the participant reported that the identity was familiar and so another face was selected randomly and shown to these participants to briefly look at (which was then reported to be unfamiliar). Familiarity checks complete, participants then viewed the face alone or paired with the knife, for either 10 or 30 s, according to condition assignment.
Participants returned to the laboratory 20-28 h later. They were informed that they would recall the target face and any accompanying object seen on the previous day, and then construct a composite using EvoFIT. The procedure for carrying out the face recall interview and EvoFIT face construction is fairly involved and is described in detail in Fodarella et al. (2015). In brief, using cognitive interviewing techniques, participants were invited to think back to the time when the target had been seen, visualise the face and recall it in as much detail as possible, in their own time, without guessing and without prompts from the experimenter.
The experimenter started EvoFIT and participants chose a database for creating the face, selecting an appropriate age range to match the previously-seen target. Next, participants were shown arrays of internal features faces (revealing the region around the eyes, brows, nose and mouth) and asked to select items with the best resemblance to the target face, focussing on the region around the eyes. Participants were asked to select for smooth faces, facial texture and overall match. This procedure (for smooth, texture and overall match) was repeated for a second cycle through the system, with selected faces being combined to evolve a face. Participants were then invited to improve the likeness using 'holistic' tools that changed age, weight, attractiveness and seven other overall properties of the face. During this stage, participants were asked to focus on the face as a whole rather than on the upper half. Next, participants were asked to select the best matching external features (hair, forehead, ears and neck), with these outer features shown on the constructed internal features. Finally, participants were given the opportunity to alter the size and position of individual features using a 'shape' tool and enhance the face again, if necessary, using holistic tools (followed by any further changes made using the shape tool). When participants reported that the best likeness had been achieved, the face was saved to disk as the composite.
Face recall and face construction were self-paced procedures. Ten composites were produced in each of the four individual conditions of the experiment, a total of 40 composites. The procedure took around 45 min per participant, including debriefing.
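Random assignment with equal sampling to a 2 × 2 between-subjects design can be sketched as follows. This is an illustration only: the function name, the participant identifiers and the seed are hypothetical, and the source does not report how the randomisation was actually carried out.

```python
import random
from collections import Counter
from itertools import product


def allocate(participant_ids, durations=(10, 30),
             knife=("absent", "present"), seed=None):
    """Randomly assign participants to design cells with equal cell sizes."""
    cells = list(product(durations, knife))  # the four (duration, knife) cells
    if len(participant_ids) % len(cells) != 0:
        raise ValueError("sample size must be a multiple of the number of cells")
    per_cell = len(participant_ids) // len(cells)
    # One slot per participant, then shuffle so that assignment is random.
    slots = [cell for cell in cells for _ in range(per_cell)]
    rng = random.Random(seed)
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))


# Hypothetical identifiers 1..40; the seed is arbitrary.
allocation = allocate(list(range(1, 41)), seed=42)
counts = Counter(allocation.values())
```

With 40 participants this gives 10 people in each of the four cells, which matches the cell size reported for face construction.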

Participants
Participants were recruited on the basis of being familiar with footballers who play at international level for England or Wales. They comprised students at the University of Central Lancashire, for course credit, and volunteer visitors at a local library in Cumbria, UK. There were 40 participants in total, 2 female, and their age ranged from 22 to 69 (M = 31.2, SD = 9.9) years. An additional six people were recruited to replace participants who did not meet the a priori rule of correctly naming at least 80% of the target pictures, as described below, to give this sample. Ten participants were allocated to each level of the two between-subject factors: Encoding duration and Presence of knife. This sample size (along with a similar sample size at face construction) is known to be able to detect at least a medium effect size (Frowd, 2021), one that can reveal a forensically-useful difference, should one exist.

Materials
Materials were the 40 greyscale facial composites created in Stage 1 and the associated 10 target colour photographs, printed on separate pieces of A4 paper to dimensions of approximately 8 cm (wide) × 10 cm (high). Example composites are presented in Figure 2.

Design
The design was between-subjects, with each participant asked to name a set of composites from one of the four individual conditions in the experiment (comprised of two factors: Encoding duration and Presence of knife). After naming composites, participants were asked to name the target photographs to check that they were suitably familiar with the relevant identities. We applied an a priori rule: to be able to correctly name the majority of composite images, participants should correctly name at least 80% of the target photographs. If participants correctly named seven or fewer target pictures, as occurred on six occasions in total, the subsequent participant was presented with the same set of composites to name (i.e. was assigned to the same experimental condition as the person who did not meet the a priori rule).
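The a priori inclusion rule can be expressed as a simple check. The sketch below is illustrative: the function name and the 0/1 input format are assumptions, and integer arithmetic is used so that exactly 8 out of 10 counts as meeting the 80% threshold.

```python
def meets_familiarity_rule(target_scores, threshold_percent=80):
    """Return True if enough target photographs were named correctly.

    target_scores: one 0/1 value per target photograph (1 = correctly named).
    """
    n_correct = sum(target_scores)
    # Compare as integers: n_correct / n_targets >= threshold_percent / 100.
    return n_correct * 100 >= threshold_percent * len(target_scores)


# Hypothetical participants: eight and seven of the 10 targets named.
included = meets_familiarity_rule([1] * 8 + [0] * 2)
replaced = not meets_familiarity_rule([1] * 7 + [0] * 3)
```

A participant who fails the check would be replaced by the next recruit, who names the same set of composites.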

Procedure
Participants were tested individually and the task was self-paced. Participants were told that composites would depict footballers who play at international level for England and Wales. Participants were asked to try to name each composite, if possible, or give a 'don't know' response. They were randomly allocated with equal sampling to the four individual conditions comprising the two factors of the experiment. The 10 composites were presented sequentially for participants to name, followed by the 10 target photographs, and participants were also asked to name these faces. Each person received a different random order of presentation of composites and target pictures. The procedure took around 10 min to complete, including debriefing.

Results
Responses to facial composites and target pictures were scored for accuracy.¹ Responses were coded as correct and assigned a value of 1 when participants gave the correct identity, and were coded as incorrect and assigned a value of 0 when a wrong name or a 'don't know' response was given. As participants were included who correctly named at least 80% of the target pictures, overall familiarity with the identities was very high (M = 95.5%, SD = 7.5%). For the few cases (N = 18) where a target picture was not correctly named (by 12 participants, 2-4 times by group), the associated composite also could not have been named correctly, and so responses to these composite items were removed prior to analysis.
Correct responses were much lower overall for facial composites than for target pictures (M = 59.3%, SD = 15.3%); this is the usual situation since composites are error-prone stimuli and therefore tend to not be named accurately. In more detail, 39 of the 40 composites were correctly named by at least one person; naming of these items (identities) spanned the entire naming scale, with six composites (all in 30 s encoding) named at 100%. As can be seen in Table 1, mean correct naming varied little by group except for encoding at 10 s in the presence of a knife, where, for these composites, correct naming was markedly lower (see also Table 1, Note).
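The scoring and exclusion steps above can be sketched in a few lines. The names used here are placeholders rather than the real target identities, and matching names by case-insensitive string comparison is an assumption made for illustration; the source does not describe how responses were matched.

```python
def score(response, correct_name):
    """1 for the correct identity; 0 for a wrong name or a 'don't know' (None)."""
    if response is None:
        return 0
    return int(response.strip().lower() == correct_name.strip().lower())


def score_participant(composite_responses, target_responses, identities):
    """Score composite naming, dropping items whose target was not named."""
    kept = []
    for identity in identities:
        if score(target_responses.get(identity), identity) == 0:
            continue  # target not recognised, so the composite item is removed
        kept.append((identity, score(composite_responses.get(identity), identity)))
    return kept


# Hypothetical responses from one participant.
identities = ["Player A", "Player B", "Player C"]
composites = {"Player A": "player a", "Player B": "Player C", "Player C": None}
targets = {"Player A": "Player A", "Player B": "Player B", "Player C": None}
scored = score_participant(composites, targets, identities)
```

In this example the third identity is dropped because its target photograph was not named, leaving one correct and one incorrect composite response.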
Individual responses to composite items from participants were analysed using the regression technique, Generalized Linear Mixed Models (GLMM) in SPSS. This approach models IVs (predictors or fixed effects) in the context of random effects; the random effects are (i) participants in the naming stage of the experiment and (ii) composite items (stimuli). The experiment involved two fixed effects, Encoding (coded as 10 = 10 s; 30 = 30 s duration) and Weapon (coded as 0 = absent; 1 = present), both specified to have a descending sorting order (see Table 2, Note). The DV was individual participant responses, coded as above, with the model set to accommodate nominal responses using a binomial logistic link function.
[Table 1: table body not recovered.]
Note. [...] responses (numerator) and total (correct and incorrect) responses (denominator). Data are presented for composites for which participants correctly named the relevant target photographs (N = 382 out of 400), leading to an overall mean of 61.8% correct. In the GLMM analysis of the significant interaction, there were two significant pairwise comparisons: (a) p < .005; the mean was greater (for 10 s encoding, knife absent) in six out of the 10 items, with two reversed and two ties. (b) p < .001; the mean was greater (for 30 s encoding, knife present) in eight out of the 10 items, two reversed.
Table 2. Model parameters for the interaction effect of weapon presence and encoding duration on correct composite naming. [Table body not recovered.]
† For ease of interpretation, this effect size and its accompanying CIs are presented as values greater than unity (Osborne, 2017), calculated as Exp(-B); based on Cohen's (1988) estimates, 1.5 can be considered as a 'small' effect size, 2.5 as 'medium' and 4.5 as 'large' (Sporer & Martschuk, 2014).
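The conversion described in the table note, from a negative slope coefficient to an odds ratio above unity, can be sketched as below. The coefficient and confidence limits in the example are hypothetical and are not values from Table 2; the function names are illustrative, the confidence interval is assumed to be supplied on the B scale, and treating each threshold as the lower bound of its category is an assumption about how the benchmarks are applied.

```python
import math

# Benchmarks quoted in the table note (Sporer & Martschuk, 2014).
THRESHOLDS = ((4.5, "large"), (2.5, "medium"), (1.5, "small"))


def effect_size_above_unity(b):
    """Exp(B) for positive B, Exp(-B) for negative B, so the result is >= 1."""
    return math.exp(abs(b))


def ci_above_unity(b, ci_low, ci_high):
    """Convert a CI on B to the same direction as effect_size_above_unity."""
    if b < 0:
        # Negating B swaps the limits: [low, high] becomes [-high, -low].
        return math.exp(-ci_high), math.exp(-ci_low)
    return math.exp(ci_low), math.exp(ci_high)


def describe(odds_ratio):
    """Label an odds ratio, treating each benchmark as a lower bound."""
    for cutoff, label in THRESHOLDS:
        if odds_ratio >= cutoff:
            return label
    return "below small"


# Hypothetical coefficient B = -1.0 with 95% CI [-1.5, -0.5] on the B scale.
size = effect_size_above_unity(-1.0)
low, high = ci_above_unity(-1.0, -1.5, -0.5)
label = describe(size)
```

Presenting the effect in this direction changes only its orientation: an odds ratio of 0.37 and its reciprocal of about 2.7 describe the same difference between conditions.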
Estimation of parameters for GLMM involves linearization-based methods, also known as the pseudo likelihood approach, in SPSS (Garson, 2019;IBM, 2020), a standard iterativefitting method.As the current sample has balanced data and is sufficiently large (by design), the residual method (e.g.cf.Satterthwaite approx.)was selected as degrees of freedom for computing tests of significance.Default settings for convergence criteria were used: parameter convergence with an absolute difference of 1E-6 and a maximum of 100 iterations for the algorithm's inner loop.For both fixed and random effects' models that were conducted, Beta (slope) coefficients (B), standard errors of B [SE(B)], effect sizes [Exp(B)] and confidence intervals (CI, all reported at 95%) were checked to be within sensible limits, neither too low nor too high, that might otherwise indicate an issue with the fit of the model.
The analysis initially assessed the composition of random effects.This assessment followed the 'gold' standard statistical procedure of Barr et al. (2013): for best generalisation, random effects should comprise the maximum number of terms, as justified by the structure of the data.For the current Independent Samples design, random effects involved random intercepts only (n.b., random slopes were not considered as none of the fixed effects were within-subjects).GLMM were duly conducted for each fixed effects' model, see below, each of which contained the best combination of random intercepts for participants and / or items, with each set of random intercepts included when there was sufficient variability in the response data to compute a non-zero estimate for this variable.For the final, full-factorial model, the variability remaining was not sufficient to compute random intercepts for participants, and so random effects included only random intercepts for items (see Table 1, Note).
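For readers who prefer a formula, the structure described above corresponds to a logistic mixed model with crossed random intercepts. The notation below is a sketch rather than the authors' own specification: the symbols are introduced here for illustration, and the predictors are written as 0/1 indicators whose reference levels are an assumption.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \beta_1 E_{ij} + \beta_2 W_{ij} + \beta_3 E_{ij} W_{ij} + u_i + v_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad v_j \sim \mathcal{N}(0, \sigma_v^2)
```

Here y_ij is the naming response (1 = correct, 0 = incorrect) of participant i to item j, E and W are indicators for Encoding duration and Presence of knife, u_i is the random intercept for participants and v_j the random intercept for items. In the final, full-factorial model reported in the text, the participant term u_i could not be estimated and was omitted, leaving v_j as the only random effect.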
The analysis also considered the most appropriate method to compute parameter estimates.There are two methods available in SPSS, a Model-based method and a Robust method that is sometimes preferable (e.g. when data are noisy).Both methods gave the same pattern of significant and non-significant differences, but Model-based was selected for presenting the results in the final model since the resulting standard errors for coefficients [SE(B)] of the interaction term were substantially lower (cf.Robust), thus providing a better fit of the data (also, Model-based is more often available (cf.Robust) in statistical software, and so this choice facilitates replication of results).
A hypothesis-testing (confirmatory) approach was conducted that initially comprised three models, each specified with different fixed effects (predictors) along with appropriate random effects (as described above).One model contained encoding only and a second contained weapon only.A third model contained the interaction between these two fixed effects; as it is standard practice to include individual predictors in a model that contains their interaction (e.g.Field, 2018), this third model was full factorial.Based on these results, a final model was then selected as (i) the full-factorial model (if the interaction was significant); otherwise (ii) a model containing both predictors (if both predictors were significant); or otherwise (iii) a model with a single significant predictor.
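The three-step selection rule can be written out as a short function. This is a sketch of the decision logic only, not of the model fitting: the function name and return labels are illustrative, the alpha of .1 is the value stated in the Results, and the final branch (no significant predictor) is an added assumption because the text does not say what would be done in that case.

```python
def select_final_model(p_encoding, p_weapon, p_interaction, alpha=0.1):
    """Apply the stated rule for choosing the final fixed-effects model."""
    if p_interaction < alpha:
        return "full factorial"  # (i) significant interaction
    significant = [name for name, p in (("encoding", p_encoding),
                                        ("weapon", p_weapon)) if p < alpha]
    if len(significant) == 2:
        return "encoding + weapon"  # (ii) both predictors significant
    if len(significant) == 1:
        return significant[0] + " only"  # (iii) single significant predictor
    return "no significant predictor"  # assumption: not covered in the text


# p-values reported in the Results for the three initial models.
final = select_final_model(p_encoding=0.024, p_weapon=0.075, p_interaction=0.010)
```

With the reported p-values the rule selects the full-factorial model, in line with the model chosen in the analysis.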
Based on the customary alpha of .1 for regression analyses, the first model was significant for Encoding [F(1, 380) = 5.13, p = .024],with better naming for the longer (cf.shorter) encoding; and the second for Weapon [F(1, 380) = 3.20, p = .075],with better naming when the knife was absent (cf.present).As the interaction between encoding and weapon was also significant in the third model [F(1, 378) = 6.80, p = .010],this GLMM was selected as the final model.It emerged without issues: (i) there were only a few (N = 7 or 1.8% of total) cases (four in 10 s encoding, no weapon; and three in 30 s encoding, weapon present) where the standardised (Pearson) residuals (SR) fell (just) outside 2 SD (SR < −2.7), and (ii) variances by group were within a reasonable spread both for residual errors (range of σ 2 = 0.61-1.06)and for predicted probability values (range of σ 2 = 0.06-0.10).
To explore the significant interaction, fixed coefficients were examined in the third (final) model (Table 2). This analysis indicated that presence (cf. absence) of a knife led to less-effective composites at 10 s encoding duration (p = .002), but a weapon-focus effect did not extend to the longer (30 s) encoding time (p = .53). It also revealed that 10 s encoding led to less-effective composites (cf. 30 s) when the weapon was present at encoding (p < .001), but there was no significant difference for encoding duration when the weapon was absent (p = .80).
We also assessed erroneous (mistaken) responses given by participants to composites, an assessment which emulates police investigations where names ('tips') turn out to be incorrect. Naming responses were re-scored (0 = no name given, 1 = mistaken name) and cases analysed (i) for which the target identity was known (as above) and (ii) for composites that had not been correctly named. GLMMs conducted as above, including both random intercepts, were not significant for Presence of weapon [p = .84, Exp(B) = 1.12], Encoding duration [p = .85, Exp(B) = 1.05] or their interaction in a full model [p = .89, Exp(B) = 1.23]. Therefore, there was no evidence that mistaken names were affected by the two predictors in the experiment.
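The re-scoring for mistaken names can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`target_known`, `target_name`, `name_given`) are hypothetical, and criterion (ii) is read here as dropping individual responses that were correct, which is one possible reading of the text.

```python
from typing import Dict, List, Optional


def rescore_mistaken_names(responses: List[Dict[str, Optional[object]]]) -> List[int]:
    """Code retained responses as 0 (no name given) or 1 (mistaken name).

    Each response is a dict with hypothetical keys:
      target_known: bool, whether the participant knew the target identity
      target_name:  str, the correct name for the composite's target
      name_given:   str or None, the name offered for the composite
    """
    scores: List[int] = []
    for response in responses:
        # (i) Keep only cases where the target identity was known.
        if not response["target_known"]:
            continue
        given = response["name_given"]
        # (ii) Drop correct namings (assumed reading of the criterion).
        if given is not None and given == response["target_name"]:
            continue
        # 0 = no name given, 1 = mistaken name.
        scores.append(0 if given is None else 1)
    return scores


# Hypothetical example data (placeholder names, not from the experiment).
example = [
    {"target_known": True, "target_name": "Player A", "name_given": None},
    {"target_known": True, "target_name": "Player A", "name_given": "Player B"},
    {"target_known": True, "target_name": "Player A", "name_given": "Player A"},
    {"target_known": False, "target_name": "Player C", "name_given": "Player D"},
]
mistaken_scores = rescore_mistaken_names(example)
```

The resulting binary scores would then serve as the response variable in a GLMM specified in the same way as for correct naming.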

Participants
The composites were rated for likeness by 20 volunteers, 13 female, with an age range from 18 to 74 (M = 44.6, SD = 16.3) years. Participants were sampled widely from people living in the Cumbria area, UK, and were recruited by word of mouth. They were recruited on the basis of not following football at an international level in the UK. This is because participants tend to rate composites harshly if they are familiar with the target identities (Frowd, 2021), reducing experimental power.

Materials
Materials were the composites constructed in Stage 1 and the 10 target photographs, all printed in the same way as for Stage 2.

Design and procedure
Participants rated the likeness of all 40 composites and so the design was within-subjects for the two factors of the experiment. Participants were tested individually and the task was self-paced. They were asked to give a rating of likeness (1 = very poor likeness … 7 = very good likeness) for each composite in the presence of its relevant target photograph. All 40 composites were presented sequentially, and participants provided a rating as requested. The target photographs were then presented sequentially and participants were asked to attempt to name the relevant footballers. To avoid familiarity effects, we planned to exclude any participants who recognised three or more of the targets; however, no one met this criterion (and so all recruited participants were included in the sample). Each person received a different random order of presentation for composite and target pairs. The task was completed in about 20 min, including debriefing.

Results
The analyses followed the same coding scheme for predictors and the same procedure as for composite naming, except that a multinomial probability distribution and a cumulative logit link function were used to model the ordinal-type participant data (composite likeness ratings). Participant ratings are summarised in Table 3 and indicate minimal differences by group except that mean rating was somewhat higher at 30 s (cf. 10 s) encoding duration.
GLMM conducted on individual participants' likeness ratings proceeded in the same way as for composite naming by including random intercepts for both participants and items. Estimates of variance for both of these random intercepts could be computed (cf. items only for naming), the result of which was a single combined model [F(2, 792) = 4.78, p = .009] that was significant for (i) Encoding duration [SE(B) = 0.13, p = .010], with higher likeness ratings for the longer (cf. shorter) encoding [Exp(B) = 1.39]; and (ii) Presence of weapon [SE(B) = 0.13, p = .083], with higher likeness ratings when a weapon was absent (cf. present) [Exp(B) = 1.25]; the full model was not significant for the interaction between encoding and weapon (p = .11).
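For readers less familiar with ordinal GLMMs, the combined model for likeness ratings can be written out as a sketch. The notation is an assumption introduced here for illustration: Y is the rating given by participant p to item i, E and W are indicator variables for the longer encoding and for weapon absence, and the sign convention is chosen so that Exp(B) > 1 corresponds to higher ratings; the convention used internally by the software may differ.

```latex
\[
\operatorname{logit} P(Y_{pi} \le k)
  = \theta_k - \left( \beta_E E_i + \beta_W W_i + u_p + v_i \right),
  \qquad k = 1, \dots, 6,
\]
\[
u_p \sim \mathcal{N}(0, \sigma_u^2), \qquad
v_i \sim \mathcal{N}(0, \sigma_v^2), \qquad
\operatorname{Exp}(B) = e^{\beta}.
\]
```

Here the thresholds θ_k separate the seven rating categories, and u_p and v_i are the random intercepts for participants and items. Under this reading, the reported Exp(B) = 1.39 for encoding duration corresponds to B = ln 1.39 ≈ 0.33 and Exp(B) = 1.25 for weapon presence to B = ln 1.25 ≈ 0.22; the full-factorial model adds an interaction term between E and W to the linear predictor.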
Random slopes were then added to the random-effects model by including the same predictors that were specified as fixed effects. These models did not emerge significant (p > .29) for either fixed effect on its own or for the interaction in the full-factorial model. This result suggests that accounting for differences between each participant and between each item (identity), by including both random intercepts, produced a significant model of the data, but that also including the within-subjects random effects (differences in the differences between means), by including both random slopes, did not provide a reliable model that was able to generalise fully to participants and items.
In Table 4, results are presented for the combined model that includes random intercepts for participants, and both random slopes; random intercepts for items were not included as no variance remained for items following inclusion of the other random effects. It is apparent that non-significant fixed effects emerged for likeness ratings as effect sizes [Exp(B)] were small for a study intended to be able to detect a larger, medium effect size. This outcome would appear to be due to the variable nature of random slopes, here leading to a three-fold increase in standard errors and a consequential reduction in statistical power of the tests.
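The link between larger standard errors and lost power can be illustrated with a simple Wald-type calculation. This is a rough sketch, not a reproduction of the reported tests: it uses a normal approximation rather than the reference distribution used by the GLMM, it holds the coefficient fixed at the value from the random-intercepts model for Encoding duration [Exp(B) = 1.39, SE(B) = 0.13], and it assumes the three-fold increase applies directly to that standard error.

```python
import math
from typing import Tuple


def wald_test(exp_b: float, se_b: float) -> Tuple[float, float]:
    """Return (z, two-sided p) for a coefficient given Exp(B) and SE(B).

    Uses a standard normal approximation, which is an assumption made
    here for illustration only.
    """
    b = math.log(exp_b)   # coefficient on the logit scale
    z = b / se_b          # Wald statistic
    # Two-sided p-value for a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Random-intercepts model: reported Exp(B) and SE(B) for encoding duration.
z_intercepts, p_intercepts = wald_test(1.39, 0.13)

# Same coefficient with a three-fold larger standard error (assumed).
z_slopes, p_slopes = wald_test(1.39, 3 * 0.13)

print(f"intercepts only: z = {z_intercepts:.2f}, p = {p_intercepts:.3f}")
print(f"with slopes:     z = {z_slopes:.2f}, p = {p_slopes:.3f}")
```

Because the test statistic is the coefficient divided by its standard error, tripling the standard error divides the statistic by three, which is enough to move an effect of this size from significant to clearly non-significant.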

Discussion
Understanding the role played by weapons on eyewitness memory is important for the apprehension and reliable conviction of criminals. This experiment examined the effects of weapon presence and facial encoding duration on the effectiveness of forensic composites made with EvoFIT, a holistic software system based on an evolutionary algorithm.
Correct naming reduced significantly for composites constructed of target faces paired with a knife at 10 s encoding (cf. 10 s no knife). This means that a weapon-focus effect was evident, at least for the shorter encoding duration. This provides a novel extension of the weapon-focus effect from eyewitness recall and identification to downstream aspects of the search for criminal suspects. The effect of weapon presence and encoding duration on the overall appearance of the constructed composites, as assessed by ratings of likeness, was small and not reliable in the random-effects model that was most generalisable, that is, when including the best combination of random intercepts and random slopes.
To our knowledge, an encoding duration as short as 10 s has not featured in composite research that closely models the forensic situation and involves composite naming as a DV. With EvoFIT, as with the other holistic systems (e.g. EFIT-V and ID), constructors repeatedly select from face arrays to evolve a composite. This process is holistic in nature, with selection of whole faces or whole-face regions, and has been proposed (e.g. Frowd et al., 2004) to parallel natural holistic processing of upright, intact faces (e.g. Richler & Gauthier, 2014). We propose that such a brief encounter is likely to result in less detailed scrutiny of a target and encoding that is more global in nature, as evidenced here by only a small and non-significant (cf. significant in a simpler model) reduction in ratings of likeness (from 30 to 10 s). This proposal is supported by research which reveals that methods designed to encourage global face processing, such as asking constructors to make personality judgments about a face after having described it, improve face construction for feature and holistic composite systems (e.g. Frowd, Bruce, Smith, et al., 2008; Frowd, Nelson, et al., 2012; Frowd et al., 2013, 2015; Skelton et al., 2019). Similarly, global procedures applied to finished composites also improve naming rates (e.g. Frowd, Skelton, Atherton, Pitchford, Hepton, et al., 2012; Frowd, Jones et al., 2014). Indeed, such procedures can be combined to give excellent results: in Frowd et al. (2013), with an encoding duration of 30 s and 24 h retention interval, mean naming for EvoFIT emerged at an astonishing 74% correct.
Although our experiment was not designed to directly test the theoretical explanations for the weapon-focus effect, our results do permit reasonable speculation. Presence of the knife at encoding reliably inhibited correct naming at 10 s encoding (cf. no weapon). It is worth mentioning that, in spite of a warning that a weapon may be seen, the experimenter observed that participants were noticeably shocked by the knife, and so it is possible that the decrement in correct naming (MD = 18.9%, 10 s encoding) is in part related to the impact of stress and anxiety, as previously reported by Hancock et al. (2011). Therefore, our results fit more closely with the arousal hypothesis of the WFE because any unusualness or unexpectedness should have been abated by the (warning) instruction. This proposal is further supported by similar findings in eyewitness identification (e.g. Loftus & Burns, 1982; Maass & Köhnken, 1989). For composites, elevated stress at encoding also presumably interferes with identifiability (Davies, 2009; Hancock et al., 2011), and in real-world cases, composite constructors may be witnesses averse to the process due to reliving the trauma and feeling victimised by the individual whose face they are being asked to construct (Tredoux et al., 2021). However, note that some real-world victims experience extreme stress at encoding and yet produce a highly accurate likeness, such as the EvoFIT composite constructed of rapist Asim Javed as part of Operation Hatton (Frowd, 2017, 2021). We also note that, using a design similar to the one presented here, previous attempts at inducing a WFE with composites have been unsuccessful (Frowd, 2014), an outcome likely to be the result of the 30 s encoding duration that has been used (same as the null result found here). Finally, although the accuracy of external and internal facial features varies from composite to composite under the various experimental conditions, the lack of a reliable effect on perceptual similarity (likeness ratings) between weapon and no weapon conditions implicates no specific effect on either of these facial regions. Instead, it is likely that more holistic or structural codes required for familiar-face recognition (Bruce & Young, 1986; Young & Bruce, 2011) were disrupted by the presence of a weapon. Further research could usefully establish the general effect of stress and anxiety on facial-composite production through measured physiology and follow-up surveys, as well as including the presence of a novel/unusual object.
The implication of the research is that a weapon is unlikely to have a measurable effect on the correct naming of composites when encoding duration is at least 30 s. For a much shorter duration, 10 s here, a negative effect of weapons would be expected, with a medium effect size [Exp(B) = 2.9]. The nature of an intermediate duration is unknown, but presumably somewhere over this range, the impact of weapons becomes negligible (and could be the focus of future research). This suggestion should be taken in the context of a real crime, for which one would expect greater attention to a weapon compared to a person seeing a weapon in a photograph (and so shorter delays may be more impactful to composite accuracy). Carlson, Young et al. (2016) previously found an effect of shorter exposure duration (3 s vs. 10 s) on eyewitness recall, but not identification or confidence-accuracy calibration, in a weapon-focus paradigm. Given that recall is important for composite construction (e.g. Frowd, Bruce, Ness, et al., 2007), we consider the encoding duration effect a replication and extension of this finding. More generally, given a similar procedure to construct faces for other holistic systems, an effect of weapons would also be expected. A similar argument could be made for traditional feature systems (e.g. E-FIT, FACES, Identikit 2000, PRO-fit). However, these types are believed to be based more on recall than recognition (e.g. Frowd et al., 2005) and so, as weapons and objects influence recall to a greater extent than recognition (e.g. Loftus et al., 1987; Steblay, 1992), the effect could be stronger. We note though that, as feature-based composite accuracy is usually low following a long retention interval (e.g. Frowd, Carson, Ness, McQuiston, et al., 2005; Frowd et al., 2010, 2015), this proposal may be difficult to verify experimentally given that correct naming is likely to be suppressed further toward floor-level performance.
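As a rough check on the 'medium' label attached to Exp(B) = 2.9, an odds ratio can be converted to an approximate standardised mean difference using the logistic conversion often applied in meta-analysis. This conversion is an assumption added here for illustration and is not part of the reported analysis.

```latex
\[
d \approx \frac{\sqrt{3}}{\pi}\,\ln(\mathrm{OR})
  \approx 0.551 \times \ln(2.9)
  \approx 0.551 \times 1.065
  \approx 0.59
\]
```

A value of about 0.6 lies close to the conventional benchmark for a medium effect (d = 0.5), which is consistent with the description given above.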
Forensic composite images are commissioned to help produce tips from police officers and members of the public so that an investigation may generate or eliminate suspects as well as corroborate victim and witness testimony. In some countries (e.g. Australia, UK, US, South Africa), composites may also be used as evidence in court. The experiment reported here modelled both aspects of composites as an investigative tool, by incorporating a naming task (e.g. when police and members of the public familiar with a fugitive may recognise that person) and a likeness rating task (e.g. when law enforcement officers may compare composites to suspects they investigate). Since the results reported here reflect what would appear to be the first formal assessment examining the combined contributions of weapon presence and encoding duration on composite naming accuracy, we draw some tentative practical conclusions.
First, experts making recommendations based on our results should consider them in a way similar to how all estimator variable research is considered: namely, the aggregate reduction in composite accuracy due to the presence of a weapon may produce false tips and therefore jeopardise innocent suspects and in turn lead to a failure to apprehend the actual guilty suspects who can go on to reoffend. That is not to say that police should not commission composite images for very briefly occurring crimes involving weapons, but that care should be taken when following up leads in such cases. It is worth mentioning, though, that mistaken names did not vary reliably in the experiment, and so false leads would appear not to be influenced (at least with a medium effect) by the presence of a weapon or encoding duration. For the forensically-important measure of correct naming, though, composites constructed from 10 s encoding in the presence of a weapon still had very good utility for correctly identifying the target (about 50% correct naming). Also, although correct naming dropped by about 20% from 30 s to 10 s encoding duration in the presence of a weapon, naming of composites in the weapon-present condition was still nearly 60% correct overall, similar to mean naming of 56% reported in a recent meta-analysis of the system (Frowd et al., 2015). It has long been known that convictions should not be based on eyewitness evidence alone (e.g. Devlin, 1976), and so other evidence is important to support a reliable conviction later in an investigation (e.g. Greene & Loftus, 1984; Osborne & Davies, 2013), irrespective of the method used to generate a suspect in the first place.
Second, further research is needed to explore the various effects of weapons, unusual objects, arousal, and myriad other estimator variables on composite accuracy, as well as means of mitigating these effects at the interviewing and face construction stages. For example, given our current evidence (via likeness ratings) that a weapon's adverse effects on naming have a holistic mechanism, a 'holistic' tool designed to counteract this effect could conceivably be developed, as has been done for other global aspects of the face in holistic systems with a view to facilitating naming (Frowd et al., 2010). Also, composite research must overcome sampling and design limitations consequent to the necessities of setting up these sorts of experiments, as previously recommended by Wells et al. (2005), to overcome failures to find small but potentially consequential effects. While the current design had good experimental power to be able to detect at least a medium effect size, to be of practical significance, researchers might like to consider avoiding the use of celebrity or otherwise well-known faces as targets for constructors, and then experimentally induce familiarity among participants tasked with naming composites, or (to achieve natural familiarity) consider a cross-site design (e.g. Frowd, Bruce, McIntyre, et al., 2007). These measures would allow researchers to use larger samples of constructors in accordance with recommendations for heavily powered experimental designs as well as allow the production of specially tailored stimulus materials including more naturalistic features like those seen in real crimes.

Figure 1. Example of the type of stimuli used in the weapon-present conditions in the experiment; actual materials cannot be presented here due to copyright. The footballer (Adam Lallana) and knife were obtained from Wikimedia Commons (2021). In the weapon-absent conditions of the experiment, the same facial photograph was shown (to different participants) without the weapon being presented.

Figure 2. Composites of footballer Adam Lallana. Each was constructed by a different person after (a) 10 s encoding duration without knife present, (b) 10 s with knife, (c) 30 s without knife and (d) 30 s with knife.

Table 1. Correct naming of composites by duration of encoding and presence of knife. Figures are expressed in percentage and calculated from participant responses in parentheses: summed correct

Table 3. Likeness of composites by encoding duration and presence of knife. Rating scale (1 = very poor likeness … 7 = very good likeness). Values are expressed using the mean (which gave a clearer pattern of results cf. median) and, in parentheses, by-participant SE of the mean.

Table 4. Model parameters for the effects of weapon presence and encoding duration on composite likeness ratings.
a See