‘I do not know’: an examination of reactions to virtual agents that fail to answer the user’s questions

ABSTRACT Virtual agents (VAs) used by retailers for online contacts with customers are becoming increasingly common. So far, however, many of them display relatively poor performance in conversations with users, and this is expected to continue for some time. The present study examines one aspect of conversations between VAs and humans, namely what happens when a VA openly discloses its knowledge gaps versus when it attempts to conceal them in a setting in which it cannot answer user questions. A between-subjects experiment with a manipulated VA, and with perceived service quality as the main dependent variable, shows that a display of a high level of ability to answer user questions boosts perceived service quality. The study also offers explanations of this outcome in terms of mediating variables (perceived VA competence, openness to disclose own knowledge limits, usefulness, and learning-related benefits).


Introduction
As retailing is becoming increasingly online-based, many retailers have introduced virtual agents (VAs) with which the customer can interact in online settings (Gnewuch, Morana, and Maedche 2017; Suich Bass 2018). VAs, sometimes also referred to as chatbots, virtual employees, virtual shopping agents, virtual customer service agents, and conversational agents (Crolic et al. 2022; Gnewuch, Morana, and Maedche 2017; Köhler et al. 2011), are computer programs that can provide cost-efficient and consistent customer service in several ways, such as dealing with complaints and helping customers find suitable products within a retailer's assortment (Akhtar, Neidhardt, and Werthner 2019; Tan and Liew 2020). They can also capture market intelligence and engage in cross-selling opportunities (Köhler et al. 2011). Moreover, they are not subject to the same type of stress-related reactions that are common among human employees, who often have to pay a high toll for constant exposure to customers (cf. Bromuri et al. 2020).
Typically, when a VA is used as a representative of a company, it is screen-based (i.e. it is non-embodied) and is accessible online on a firm's website or in a mobile telephone app, and it can communicate with the customer in natural language.
Many VAs resemble humans also in other ways; they may have a face, a name, a gender, and a voice (Crolic et al. 2022). So far, however, many VA applications have been disappointing when it comes to providing convincing conversations (Chaves and Gerosa 2021; Gnewuch, Morana, and Maedche 2017). The present study examines one particular VA aspect that has hitherto not been subject to empirical studies, namely explicit verbal responses to users' questions when the VA does not know the answer. The main rationale behind this focus is that VAs will continue to perform at suboptimal levels for some time yet. It may be noted that even ChatGPT, a chatbot that has stunned many humans with its abilities (Taecharungroj 2023), frequently fails to answer questions correctly. Yet, it makes attempts to conceal this by producing 'hallucinations'; it provides nonsensical information that looks plausible (Taecharungroj 2023). In any event, one may foresee many situations in which VAs are not able to answer questions, but they need to respond anyway, in one way or another. Such responses can vary in the extent to which a VA displays an ability to answer users' questions, and this particular aspect is assessed in the present study.
In a human-to-human context, one would expect that a person's low ability to answer a question in a particular subject domain is an indication of the person's lack of knowledge about the domain. In the light of the rapid progress of artificial intelligence, and its documented ability to outperform humans in settings such as chess and medical diagnosis, one may also expect that a VA that displays a lack of knowledge would disconfirm expectations about its (allegedly) superior information processing abilities. Indeed, people in general believe that AI-based recommenders are more competent than human recommenders, at least for utilitarian-focused recommendations (Longoni and Cian 2022). If such beliefs are not confirmed, however, the expected result is attenuated overall evaluations of the VA. This outcome thus implies that VAs should avoid displaying that they cannot answer questions.
Alternative outcomes, however, do exist. Again in a human-to-human setting, a display of a low ability to answer a question can indicate a willingness to openly disclose that one has knowledge gaps and thus personal limitations. This is a facet of humility (Exline and Geyer 2004; Owens, Johnson, and Mitchell 2013), a positively valenced person characteristic in many contexts (Exline and Geyer 2004; Wright et al. 2017), which can boost the overall evaluation of the speaker. Given that we humans react to (humanlike) non-human agents in ways that resemble how we react to real humans (Epley 2018; Reeves and Nass 1996), it is therefore possible that a VA that displays a low ability to answer questions would be rewarded with positive evaluations in a social perception situation. This, then, implies that a VA should not conceal a low ability to answer questions.
Against the backdrop of these rival outcomes, and given that various VAs are increasingly replacing human service agents in online interactions (Crolic et al. 2022), the purpose of the present study is to empirically examine effects of VAs' ability to answer user questions in an assessment with perceived service quality (PSQ) as the main downstream variable. The present study also examines a set of mechanisms by which VAs' ability to answer questions can influence PSQ. To this end, an experiment was carried out in which a VA was manipulated (low vs. medium vs. high display of an ability to answer user questions). The specific setting for the empirical assessment was a speech-based VA that is able to share knowledge about running shoes. In this setting, then, the main task of the VA is to serve as an advisor (cf. Tan and Liew 2020). Moreover, as a specific means to display different levels of an ability to answer questions, the VA's ability was signaled by its use of the utterance 'I do not know' which, in a human-to-human context, is a common reply to an information question when the speaker is unable to supply the requested information (Tsui 1991).
An examination along those lines contributes in several ways to existing research on VAs: (a) it contributes to the literature on antecedents to perceived VA knowledge, competence and perceived usefulness, (b) it contributes to the literature on learning benefits for the user from using VAs, and (c) it introduces the notion of intellectual humility in the context of VAs. These contributions are discussed further in section 5.1 below.

Conceptual point of departure
A main assumption in the present study is that we humans tend to react to (humanlike) non-humans in ways that are similar to how we react to real humans. This reaction pattern is commonly labelled anthropomorphism, and it can be seen as one of many ways in which people make sense of an unknown stimulus based on a better-known representation of a related stimulus (Epley et al. 2008). Typically, it is similarity (e.g. in terms of morphology or conversation style) between a non-human agent and a human agent that triggers anthropomorphism; exposure to a humanlike non-human agent can activate mental content related to real humans, and in the next step this content is applied to the non-human object (Epley 2018). Another main assumption in the present study, given anthropomorphism, is that theory about human-to-human interactions (as well as empirical findings based on such theory) can be useful for developing predictions about humans' interactions with non-human agents. This second assumption serves as the basis for several of the study's hypotheses.
More specifically, in a human-to-human setting, it is expected that observers in social situations use even minimal cues related to a target to infer various traits of the target (Barasch, Levine, and Schweitzer 2016). Previous research has repeatedly revealed that such inferencing occurs given cues about the target's physical attractiveness, emotional expressions, gender, and age, and in the present study it is assumed that what a target says provides such cues, too. Given anthropomorphism and a conversational setting in which the asker is a human user and the replier is a VA, which is presented as an advisor knowledgeable in a specific domain, it is assumed that perceptions of the ability of a VA to provide answers would serve as a cue for the attribution of several other characteristics to the VA.

The ability to answer user questions and its consequences
It is hypothesized that a VA's ability to answer users' questions influences perceptions of four VA characteristics (competence, openness to disclose knowledge limitations, usefulness, and provision of learning benefits), and each of them is hypothesized to affect perceived service quality (PSQ) in relation to a VA. The selection of these variables was based on an examination of previous research in various areas in which it was possible to identify variables that were likely to be (a) consequences of an ability to answer questions and (b) antecedents to perceived service quality (as an overall evaluation of a service). The selection, then, should be seen as an attempt to integrate parts from existing theories (or existing theoretical arguments) from different fields. An overview of the hypothesized associations is provided in Figure 1.
First, it is expected that the VA's ability to provide answers would be positively associated with perceptions about VA competence. Competence is viewed here as a universal dimension in human-to-human perception contexts; it reflects the extent to which a target person is seen as having traits related to abilities such as intelligence, creativity, and efficacy (Fiske, Cuddy, and Glick 2007). Moreover, attributions of competence are often based on the target's communicative acts (Treem 2012). One particular act indicative of competence is talk time; the more a target talks about a topic, the more the target is seen as knowledgeable (Littlepage et al. 1995). Similarly, cues about a person's dominance/assertiveness are typically diagnostic for perceptions of the person's competence (Cuddy, Glick, and Beninger 2011). Given that a low ability to answer questions is likely to (a) cut talk time in relation to delivering an answer and (b) signal less assertiveness than answering the question (and given also anthropomorphism), the following is hypothesized:

H1:
The ability of a VA to provide answers to user questions is positively associated with perceptions of VA competence.
However, an unrestricted ability to provide answers may backfire when it comes to attributions of other characteristics: it may reduce the perceived humility of the replier. Indeed, in a human-to-human context, one main facet of humility is an ability to acknowledge one's imperfections and gaps in knowledge in such a way that it involves a transparent disclosure of personal limits (Exline and Geyer 2004; Owens, Johnson, and Mitchell 2013; Wright et al. 2017). Similarly, intellectual humility is characterized by accepting one's knowledge gaps rather than denying them or covering them up (Whitcomb et al. 2017); an intellectually humble person should be less likely to pretend to know something and can acknowledge that there are gaps in his/her knowledge (Haggard et al. 2017). With this view, then, explicitly displaying a low ability to answer a question as a response to a question can signal that the replier is open about his/her knowledge gaps. In the present study, and given again anthropomorphism, it is assumed that an unrestricted ability of a VA to provide answers to users' questions can indicate that the VA is not open to disclosing that it does not know everything. Therefore, the following is hypothesized:

H2:
The ability of a VA to provide answers to user questions is negatively associated with perceptions of VA openness to disclose its knowledge limitations.
Moreover, it is expected that the VA's ability to provide answers to user questions would affect two additional outcomes: perceived usefulness of the VA and perceived learning benefits. Usefulness is a frequently employed variable in theories of users' assessments of various technologies. In early studies, usefulness covered the extent to which users believe that a particular technology would enhance their job performance (Davis 1993), but in more recent studies of consumers it typically comprises beliefs about a technology as providing benefits in everyday life situations. In the case of synthetic agents, convenience and time savings are examples of such benefits (e.g. Ghazali et al. 2020; Hu 2021).
Here, in the present study, it is hypothesized that a VA presented as knowledgeable in one particular domain would be perceived as more useful if it can indeed answer questions about matters in the domain:

H3:
The ability of a VA to provide answers to user questions is positively associated with perceptions of VA usefulness.
Learning new things is a specific benefit that a VA can offer for the user. In consumer-related research, consumer learning has been measured as subjective knowledge about a domain (e.g. Li, Daugherty, and Biocca 2003) and in terms of actual knowledge as captured by a quiz (e.g. Poynor and Wood 2010). In the present study, however, the focus is on perceived learning benefits; that is to say, the extent to which a user believes that his/her knowledge can increase as a result of interacting with another agent (Froehle and Roth 2004). In the context of virtual agents, this aspect has been referred to as the extent to which an agent is believed to facilitate learning (Ryu and Baylor 2005). It is expected in the present study that VAs that can provide answers to questions would boost beliefs about learning benefits compared to when answers are not provided. The following, then, is hypothesized:

H4:
The ability of a VA to provide answers to user questions is positively associated with perceptions of learning benefits stemming from using the VA.

Effects on overall evaluations
In the next step in a process in which judgments of a replier are formed, it is assumed that the outcomes covered by H1-H4 would be positively associated with the overall evaluation of the replier. In the present study, the overall evaluation variable is the perceived service quality (PSQ) with respect to the VA. The choice of this particular downstream variable was based on its important role as an overall evaluation variable in traditional service theories (e.g. Cronin and Taylor 1992; Hartline and Jones 1996). More recent applications of PSQ as an evaluation variable comprise technology-based services such as, for example, e-channels (Blut et al. 2015), service robots (Choi et al. 2020; Söderlund 2022), and AI-based services (Prentice, Dominique Lopes, and Wang 2020).
With respect to competence, a positive association between the perceived competence of a person and overall evaluations of the person is typical for human-to-human interaction contexts (Hartley et al. 2016; Wojciszke 2005). In a service encounter setting, it has also been reported that perceived service provider competence is positively associated with the customer's positive emotions, negatively associated with his or her negative emotions (Price, Arnould, and Deibler 1995), and positively associated with intentions to try a service product (Min and Hu 2022). Similarly, customer beliefs in the capabilities of his/her interaction partner (i.e. other-efficacy) in a service context have been shown to be positively associated with perceived value (Wang 2018). Moreover, beliefs about employee competence typically have a positive influence on the overall evaluation of the employee (Söderlund and Berg 2020). In specific knowledge-related settings, such as when the transfer of knowledge is the core of an interaction (e.g. in education programs), lack of perceived competence of an interaction party is typically seen as dissatisfying (Chahal and Devi 2013; Voss, Gruber, and Reppel 2010). For example, and in terms of the ability to provide answers to questions, a professor who immediately answers a student's question is seen as satisfying by students (Voss 2009); conversely, a professor who refuses to answer student questions, or cannot answer them, is seen as dissatisfying (Voss, Gruber, and Reppel 2010). With respect to studies of reactions to VAs, Verhagen et al. (2014) report that VA expertise positively influenced social presence and personalization, and both these outcomes were positively associated with service encounter satisfaction. Similarly, VA competence was positively associated with the overall evaluation of the VA in Söderlund et al. (2021). In task-oriented conversations with VAs, it has also been shown that the majority of users leave the conversation if the VA cannot give the desired answer right away (Akhtar, Neidhardt, and Werthner 2019), and leaving a conversation can be seen as an indication of dissatisfaction. Given this, then, it is expected in the present study that a VA's perceived competence would be positively associated with the overall evaluation of the VA with respect to perceived service quality:

H5a:
Perceptions of VA competence are positively associated with PSQ.
Openness to disclosing one's limitations is one of several indicators of a person's humility. Several authors have also stressed that humility is a virtuous, positively charged characteristic (Exline and Geyer 2004; Hagá and Olson 2017; Nielsen, Marrone, and Slay 2010; Owens, Johnson, and Mitchell 2013). It has also been argued that disclosure of personal limitations has the potential to foster high-quality interpersonal relations (Owens, Johnson, and Mitchell 2013) and that humble people are likely to be seen as likeable (Exline and Geyer 2004). In addition, according to Skipper (2021), we are likely to trust those who admit what they do not know; one main reason is that admitting this indicates a general aversion to making false assertions. Similarly, to be truthful is a common conversational norm, and admitting one's knowledge gaps can signal truthfulness (Setlur and Tory 2022). In empirical terms, it has been shown that intellectual humility is negatively associated with neuroticism, dogmatism (Haggard et al. 2017), manipulativeness and social dominance (Wright et al. 2017), and positively associated with prosocial values such as empathy, altruism and benevolence (Krumrei-Mancuso 2017). This, then, suggests that a person who scores high on intellectual humility is likely to be liked, particularly when the person is providing service to others. Relatively few empirical studies of humility, however, have been made in a social perception context (i.e. examinations of participants' views of a target person other than themselves). One exception is Huynh and Dicke-Bohmann (2020), who found that patients' perceptions of their clinicians' humility were positively associated with both patient satisfaction and the extent to which patients felt trust in relation to their clinician. Similar results are reported by Hagá and Olson (2017) in an everyday context and with respect to liking of the other person as a result of the person displaying intellectual humility. Taken together, then, and given anthropomorphism, it is expected that a VA's perceived openness to disclosing knowledge gaps would be positively associated with perceived service quality:

H5b:
Perceptions of VA openness to disclosing its own knowledge limitations are positively associated with PSQ.
Usefulness comprises beliefs about general benefits that a technology can offer, for example, convenience and time savings (Ghazali et al. 2020). Such benefits can be seen as the desirable consequences of using a product (Gutman 1982). They can also be seen as representing the main reason for consumers' buying and usage behaviors (Botschen, Thelen, and Pieters 1999; Gutman 1982). Given that a positive perception in a pre-purchase phase of what quality level a product provides can determine buying and using the product, it is assumed that perceived usefulness of a VA would have a positive impact on PSQ. In empirical terms, a positive association between perceived usefulness and overall evaluations is a typical finding when human participants evaluate technologies, and this association has also materialized in several studies of chatbots (Rapp, Curti, and Boldi 2021). The same is expected here:

H5c:
Perceptions of VA usefulness are positively associated with PSQ.
Finally, it is assumed that there would be a positive association between learning benefits provided by a service (i.e. beliefs about enhanced knowledge in a domain related to the service) and the overall evaluation of the service. One main reason is that enhanced knowledge can increase mastery (Alba and Williams 2013), which is typically a positively charged state of mind (White 1959). It is also likely that improved knowledge can result in a more stimulating and longer lasting consumption experience (Alba and Williams 2013). In addition, consumers derive greater enjoyment from an activity as their proficiency with it increases; feelings of insight provide particularly high increases in liking a product (Lakshmanan and Krishnan 2011). In empirical terms, Froehle and Roth (2004) have reported a positive association between the extent to which an interaction in a commercial setting is seen as providing increased knowledge and overall evaluations of the interaction. Therefore, it is expected that a VA that is perceived as providing learning benefits for a user would be evaluated positively in terms of perceived service quality:

H5d:
Perceptions of learning benefits related to using a VA are positively associated with PSQ.

Experimental design and participants
The hypotheses H1-H5 were tested with data from a between-subjects experiment in which a VA's ability to answer user questions was manipulated (low, medium, high). The specific setting selected for the experiment was a VA programmed to be knowledgeable about running shoes, and it was designed to mirror what can be expected in the not-so-distant future when it comes to websites of retailers in the field of sporting goods. To be able to keep constant other aspects than a VA's ability to answer questions in a conversation with a user, a script was made for a human-to-VA interaction. The script comprised a conversation with turn-taking in terms of an exchange of questions and answers between the two parties. In this conversation, the human asked the VA eight questions related to running shoes. Three versions were made of the script in order to manipulate the level of the VA's ability to answer questions. As a main means for the manipulation, it was assumed that the utterance 'I do not know' is a prototypical marker of an inability to supply information when it occurs in response turns (Tsui 1991).
In the low ability version, and for four of the human's questions, the VA said 'I do not know' and nothing more before it was the human's turn to speak again. In the medium ability version, and for the same four questions, the VA said 'I do not know' and continued to speak about related issues. When used in this way, it was assumed that 'I do not know' conceals an inability to answer questions by being used as a prompt to move the conversation forward (cf. Setlur and Tory 2022). In the high ability version, the VA never said 'I do not know' and instead proceeded by speaking about the same related issues as in the medium ability version.
These three script versions were enacted with a human user and a VA developed for the purpose of this study. Their conversation was in voice mode, and the conversations were recorded on video. The resulting videos were used as stimuli for the participants in the study (see the Appendix for links to the videos). The VA was created by using a web-based service that allows users to design customizable speaking avatars with lip movement synchronized with their speech. This VA was given the name Alice, a female gender, the appearance of a human who is likely to be interested in sports, and a voice with a British accent. For the human user in the video (and for the participants), the VA appeared on the right side of a computer screen with the text 'Your Virtual Running Guide' on the left side (see Figure 2).
As mentioned above, the present study, as many other studies of humans' reactions to VAs and other synthetic agents, has been influenced by an anthropomorphism assumption. That is to say, we humans tend to react to (humanlike) non-humans in ways that are similar to how we react to real humans (Epley et al. 2008). Therefore, it was viewed as informative to explicitly assess the extent to which the stimulus VA in the present study could be seen as humanlike. This was done in two ways.
First, a video with only the VA visible was submitted to FaceReader, software developed to use videos with humans' facial expressions as inputs to assess the intensity of a set of emotions. As a byproduct of this, FaceReader gives an assessment of the characteristics of the person whose emotions are gauged. For the VA in the present study, FaceReader concluded that its gender was female, that its age was in the 10-20 years range, and that its ethnicity was Caucasian. It also concluded that the VA's happiness intensity (M = 0.15) was higher than the intensity for sad (M = 0.05), angry (M = 0.03) and scared (M = 0.03).
Second, a measure of perceived humanness (scored as 1 = low and 10 = high), and used by Söderlund and Oikarinen (2021) in a study of bona fide VAs in the field, was distributed to the participants. The mean perceived humanness in the present sample was 5.23 (there were no significant differences between the three conditions), which is higher than the 4.79 mean in Söderlund and Oikarinen (2021). Taken together, these observations indicate that the VA in the present study can indeed be seen as somewhat humanlike.
The participants were recruited from the Prolific online panel (n = 363; M age = 42.94; 178 women, 182 men and 3 other; all were UK residents) and they were randomly allocated to watching one of the three video versions. Each video was followed by questionnaire items to capture the participants' responses to the VA in the video.

Measures
All measures of the variables in the hypotheses were based on multi-item scales, and they were scored on a 10-point scale. Cronbach's alpha (CA), composite reliability (CR) and average variance extracted (AVE) were used to assess the measurement properties. All item loadings were > .70, and a discriminant validity analysis with the heterotrait-monotrait approach indicated that no association was > .80.
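The three measurement statistics named above can be illustrated with a minimal Python sketch. It assumes the textbook formulas (Cronbach's alpha from item and total-score variances; CR and AVE from standardized item loadings); the exact computations in the software used for the study may differ in detail. The function names, the item scores and the loadings are hypothetical and do not come from the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per scale item."""
    k = len(items)
    # Total score per respondent (sum across the k items).
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def composite_reliability(loadings):
    """CR from standardized loadings, with error variance 1 - loading^2."""
    squared_sum = sum(loadings) ** 2
    error_sum = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_sum)


def average_variance_extracted(loadings):
    """AVE as the mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)


if __name__ == "__main__":
    # Hypothetical responses of five participants to a three-item scale.
    item_scores = [
        [7.0, 4.0, 9.0, 5.0, 8.0],
        [6.0, 5.0, 9.0, 4.0, 8.0],
        [7.0, 3.0, 10.0, 5.0, 7.0],
    ]
    # Hypothetical standardized loadings for the same three items.
    item_loadings = [0.90, 0.85, 0.88]
    print("CA :", round(cronbach_alpha(item_scores), 3))
    print("CR :", round(composite_reliability(item_loadings), 3))
    print("AVE:", round(average_variance_extracted(item_loadings), 3))
```

With these formulas, an AVE above .50 means that the items on average share more than half of their variance with the construct, which is the conventional reading of the AVE values reported for each measure.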
The VA's ability to answer questions was measured with the items 'The virtual agent could answer all the questions from the human user', 'The virtual agent could provide information about everything that the human wanted to know', and 'The virtual agent was able to respond to all of the human's questions' (1 = do not agree at all, 10 = agree completely; CA = .91, CR = .92, AVE = .86). These items were developed for the purpose of conducting the present study.
Competence was measured with the adjective pairs 'Incompetent - Competent', 'Did not know what it was talking about - Did know what it was talking about', 'Low knowledge about running shoes - High knowledge about running shoes', and 'Unprofessional - Professional' (CA = .84, CR = .88, AVE = .67). Similar items have been used by Min and Hu (2022).
Openness to admitting knowledge limitations was measured with 'The virtual agent was open about its own knowledge gaps', 'The virtual agent had the ability to admit what it does not know', 'The virtual agent was willing to disclose its own lack of knowledge', and 'The virtual agent displayed awareness of its own knowledge gaps' (1 = do not agree at all, 10 = agree completely; CA = .95, CR = .96, AVE = .90). The items were developed for the purpose of conducting this study, and they were based on the conceptual characteristics of the humility construct as discussed by, for example, Haggard et al. (2017), Owens et al. (2013) and Whitcomb et al. (2017).
The measure of perceived usefulness comprised the items 'A virtual agent of the type depicted in the video makes the life of humans more convenient when they need information', 'When humans lack information about something, virtual agents of the type depicted in the video make the life of humans less effortful' and 'When there is a need for information, using a virtual agent with capabilities of the type depicted in the video would be time-saving' (1 = do not agree at all, 10 = agree completely; CA = .92, CR = .93, AVE = .87). Similar items were used in Söderlund (2022). Somewhat similar items have been used for the assessment of a digital assistant by Chattaraman et al. (2019).
Learning benefits were measured with the items 'Using a virtual agent of the type depicted in the video would improve my knowledge about running shoes', 'Using a virtual agent of the type depicted in the video would increase my competence in evaluating running shoes', and 'Using a virtual agent of the type depicted in the video would make me learn more about running shoes' (1 = do not agree at all, 10 = agree completely; CA = .94, CR = .95, AVE = .90). Similar items appear in the Ryu and Baylor (2005) scale for assessing perceptions of the extent to which a pedagogical virtual agent facilitates learning.
Perceived service quality (PSQ) was measured with the question 'What is your view of the service that the virtual agent provided?' followed by the items 'Low service quality - High service quality', 'Poor performance - Good performance', and 'It delivered below expectations - It delivered above expectations' (CA = .94). This measure of perceived service quality, as an overall evaluation of a service, was used in a service robot study by Söderlund (2022). Similar measures have been employed in human-to-human settings by Cronin and Taylor (1992) and Hartline and Jones (1996). To examine the validity of this measure (in the present setting, in which the main service of the agent comprised providing information), this item was used: 'Do you think it would be bad or good for the human in the video to believe what the virtual agent had to say about running shoes?' (1 = bad, 10 = good). The correlation between PSQ and this item was positive and significant (r = .63, p < .01), which indicates that the PSQ measure is measuring what it is supposed to measure.
It should be noted that the present study employed a Wizard of Oz approach (cf. Broadbent 2017). This means that the virtual agent was depicted in the videos as having more advanced capabilities than most existing virtual agents have when they are used for customer contacts by retailers. In other words, the stimulus virtual agent was not supposed to mirror any particular virtual agent in the world outside the experiment; it was supposed to mirror what is assumed to have the potential to exist in the not-so-distant future. This aspect of the stimulus VA calls for an assessment of its perceived realism, and it was made by asking the participants to respond to the following item: 'Virtual agents with capabilities of the type displayed in the video ...' followed by the response alternatives '... will never exist' (chosen by 3 participants), '... exist already' (chosen by 235 participants), and '... will exist in the future' (chosen by 125 participants). This indicates that the virtual agent appeared as realistic for the majority of the participants.

Manipulation check
The participants' perceptions of the VA's ability to answer the human user's questions were employed as a manipulation check. These perceptions reached the lowest level in the low ability condition (M = 4.49) compared to the medium ability condition (M = 5.88) and the high ability condition (M = 7.02). The omnibus test in a one-way ANOVA indicated that not all means were equal (F = 36.36, p < .01), and post hoc tests with Scheffé's method showed that all pairwise differences were significant (all p < .01). The manipulation thus worked as intended. It should be noted that the utterance 'I do not know' can have other functions than indicating a replier's ability to answer questions, for example, indicating low commitment to what follows, avoiding impoliteness, and rejecting requests (Baumgarten and House 2010; Beach and Metzger 1997; Tsui 1991; Weatherall 2011). The manipulation check results, however, suggest that the presence or absence of 'I do not know' indeed had an impact on perceptions of the VA's ability to answer.
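The logic of the omnibus test can be sketched as follows. This is a generic one-way ANOVA F computation in plain Python, not the authors' analysis script; the function name and the scores in the usage example are hypothetical. If all 363 participants entered the test, the degrees of freedom would be 2 (three conditions minus one) and 360 (363 minus three).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the condition means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual scores around their own condition mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


if __name__ == "__main__":
    # Hypothetical manipulation-check scores (1-10) for three conditions.
    low = [4.0, 5.0, 3.0, 5.0, 4.0]
    medium = [6.0, 5.0, 7.0, 6.0, 5.0]
    high = [7.0, 8.0, 6.0, 7.0, 8.0]
    print(one_way_anova_f([low, medium, high]))
```

A large F value means that the differences between the condition means are large relative to the spread of scores within each condition, which is what a successful manipulation should produce.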

Testing the hypotheses
A structural equation modeling approach with SmartPLS 3.0 was used to test H1-H5 within the frame of the same analysis. In the proposed model (see Figure 1), the participants' view of the VA's ability to answer the human user's questions was the independent variable. It was modelled as influencing perceptions of the VA's competence (H1), openness to disclose knowledge limitations (H2), usefulness (H3), and learning beliefs (H4). Each of these variables was also modelled as influencing PSQ (H5a-H5d). In addition, the proposed model comprised a direct association between the VA's ability to answer user questions and PSQ (to facilitate a mediation assessment; more about this below). The explained variance with this model (i.e. R²) for PSQ was .68. The path coefficients and their effect sizes are reported in Table 1.
The results in Table 1 indicate that all hypotheses except H2 (about the influence of answering ability on openness to disclose knowledge gaps) were supported. The association between answering ability and openness was in the hypothesized (and negative) direction, but it was not strong enough to receive support (b = −0.08, p = .17). Openness, however, did produce the hypothesized positive association with PSQ (i.e. H5b), which is consonant with the positive association between perceived clinician humility and patient satisfaction reported by Huynh and Dicke-Bohmann (2020).
Taken together, H1-H5 imply that the influence of the VA's ability to answer questions on PSQ is mediated. To assess this aspect formally, a mediation analysis was conducted along the lines suggested by Sarstedt et al. (2020). This means that mediation was assessed within the frame of the proposed model by an examination of the significance of indirect paths while the possible direct effect of the VA's ability to answer questions on PSQ was controlled for (by a direct link in the proposed model between answering ability and PSQ). This analysis indicated a significant indirect influence of the VA's answering ability on PSQ via competence (b = 0.12, p < .01), usefulness (b = 0.13, p < .01), and learning benefits (b = 0.10, p < .01). The indirect influence via openness, however, was not significant (b = −0.009, p = .20). The direct association between answering ability and PSQ was significant, too (b = 0.20, f² = .09, p < .01), which indicates that complementary mediation was at hand (Zhao, Lynch, and Chen 2010).
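To make the mediation logic concrete, the sketch below applies the typology of Zhao, Lynch, and Chen (2010) to the coefficients reported above. It is an illustration rather than the analysis actually run in SmartPLS: the function and variable names are hypothetical, significance is taken from the reported p-values at the .05 level, and the total effect is approximated as the sum of the direct and indirect estimates, a figure that the text itself does not report.

```python
# Coefficients as reported in the text; True/False marks significance at p < .05.
direct = (0.20, True)  # answering ability -> PSQ
indirect = {
    "competence": (0.12, True),
    "usefulness": (0.13, True),
    "learning benefits": (0.10, True),
    "openness": (-0.009, False),
}


def classify_mediation(indirect_b, indirect_sig, direct_b, direct_sig):
    """Label a mediation pattern following Zhao, Lynch, and Chen (2010)."""
    if indirect_sig and direct_sig:
        # Same sign -> complementary; opposite sign -> competitive.
        return "complementary" if indirect_b * direct_b > 0 else "competitive"
    if indirect_sig:
        return "indirect-only"
    if direct_sig:
        return "direct-only"
    return "no-effect"


labels = {
    name: classify_mediation(b, sig, *direct)
    for name, (b, sig) in indirect.items()
}

# Approximate total effect: direct path plus all indirect paths (an assumption).
total_indirect = sum(b for b, _ in indirect.values())
total_effect = direct[0] + total_indirect

for name, label in labels.items():
    print(f"{name}: {label}")
print(f"sum of indirect effects: {total_indirect:.3f}")
print(f"approximate total effect: {total_effect:.3f}")
```

Under these assumptions, the three significant indirect paths are labeled complementary, whereas the openness path leaves only the direct effect.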
The results thus indicate that the VA's ability to answer questions boosted PSQ. This can also be seen in a mean comparison between the three experimental conditions; the low ability condition produced a lower level of PSQ (M = 6.22, SD = 1.97) than the medium ability condition (M = 6.54, SD = 2.20) and the high ability condition (M = 6.90, SD = 1.79). The omnibus test in a one-way ANOVA indicated that the three means were not all equal (F = 3.62, p < .05). Post hoc tests (with Scheffé's method) showed that only the pairwise difference between the low ability condition and the high ability condition was significant (p < .05).
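The size of the difference between the low and high ability conditions can be gauged with a standardized mean difference. The following sketch computes Cohen's d from the reported means and standard deviations; it is not part of the original analysis, the function name is hypothetical, and it assumes equally sized groups because the cell sizes are not stated here.

```python
import math


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with a pooled SD, assuming both groups have the same size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


# PSQ means and SDs as reported for the low and high ability conditions.
d_low_high = cohens_d_equal_n(6.22, 1.97, 6.90, 1.79)
print(f"Cohen's d (high vs. low ability): {d_low_high:.2f}")
```

By Cohen's conventional benchmarks, a value in this range would count as a small-to-medium effect, although the exact figure depends on the equal-group-size assumption.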

Contributions
Thoughts about knowledge limits were shared already by Socrates, in Plato's Charmides, when he suggested that awareness of what we know and do not know should be seen as wisdom. Socrates, however, expressed uncertainty about the extent to which this awareness would make us act well or make us happy. The present study does not answer this, but it does provide evidence about the effects of perceptions of others when those others vary in their ability to answer questions. In the present study, the 'other' was a (humanlike) non-human agent, and thereby the present study contributes to research on reactions to such agents by examining the agent's ability to answer a human user's questions, a hitherto understudied aspect of VA behavior. This means, in relation to existing research, that the present study broadens the nomological net of several variables that were included in the study.
Given that the display of an ability to answer questions can be seen as an indicator of a VA's knowledge, existing research shows that perceptions of the knowledge of non-human agents are likely to be influenced by the social attractiveness of the agent (Chen and Park 2021) and by the participant's gender (Liew, Tan, and Jayothisa 2013); that such perceptions can affect trust (Matsui and Yamada 2019); and that they can mediate the influence of the agent's role as a specialist/generalist on purchase intentions (Tan and Liew 2020). The present study contributes to the nomological net of perceptions of others' knowledge, then, with the finding that a VA admitting that it does not know something results in lower evaluations of the VA compared to when the VA avoids admitting this. Oscar Wilde once stated that ignorance is like a delicate exotic fruit (Smithson 1993), thus implying that a low level of knowing can have a positive charge; the results in the present study, however, indicate the opposite with respect to the situation in which others' level of knowing is perceived to be low.
Moreover, the finding in the present study that a VA's ability to answer questions has a positive influence on its perceived competence contributes to previous research indicating that the competence of a non-human agent is a function of its perceived happiness (Söderlund, Oikarinen, and Tan 2021), agency (Lee et al. 2019), and usage of gestures (Bergmann, Eyssel, and Kopp 2012). It may be noted that, in conceptual terms, a negative association between the ability to answer questions and competence is possible in a social perception setting. That is to say, if a person never fails to answer any questions, it can indicate that this person is unaware of his/her incompetence. And, according to Dunning et al. (2003), those who are most likely to have this unawareness are the low performers (here: those who indeed know little). If a perceiver is thinking along those lines when he or she is faced with an agent that never admits a low level of ability to answer questions, then, one would expect that it would result in a negative evaluation of the agent. Conversely, if I know that I do not know something (and admit this when I am asked about it), I display a high level of meta-ignorance (i.e. I know that I do not know; cf. Smithson 1993), which may represent a kind of advanced knowledge mirroring high as opposed to low competence. Therefore, one would think that my display of a low ability to answer would be rewarded with a high evaluation of me. Such response patterns, however, did not materialize in the present study.
If openness to disclosing one's own personal knowledge limitations is seen as a facet of humility, the present study also contributes to the field of research on intellectual humility. In this growing field, the typical study is based on the participant's self-reported intellectual humility and it assesses how humility is related to other personal characteristics such as dogmatism and narcissism (cf. the review by Porter et al. 2021). In contrast, the present study examined humility-related behavior in a social perception setting. Thus, the participants in the present study were asked about a target agent, not about themselves. So far, social perceptions of humility in a human-to-human context have been subject to only a limited number of studies (such as Hagá and Olson 2017). In the present study, it turned out that the perceived openness to admit one's knowledge limitations was positively associated with PSQ in the case of a VA, which mirrors empirical results in a human-to-human setting (Hagá and Olson 2017). This part of the results, then, seems to indicate that the general response pattern to humanlike non-human agents (i.e. to react to them similarly to how we react to real humans) appears to be at hand also in terms of humility-related attributes of an agent.
Similarly, the present study contributes to research on learning-related variables by showing that perceived learning benefits mediate the impact of a VA's ability to answer questions on perceived service quality. Previous studies within the field of e-learning with the aid of various non-human agents have examined, for example, whether facial expressions, the instruction mode, and the verbal expressiveness of a virtual agent used for pedagogical purposes facilitate learning (Baylor and Kim 2009; Veletsianos 2009), and the present study's focus on the ability of an agent to answer questions expands the nomological net of such research.
It should again be underlined that many existing studies of humans' reactions to non-human agents (including the present study) are based on an anthropomorphism assumption. When such studies comprise agent characteristics and behaviors that are shared with humans (e.g. having a gender, display of happiness, and politeness), they can indeed capitalize on theory and findings from a human-to-human setting. The ability to answer questions in a human-to-human setting, however, has been subject to relatively few empirical studies. Given that a by-product of examining how humans react to (humanlike) non-human agents is that we may learn more about how humans react to other humans (e.g. Broadbent 2017), the present study can also be seen as taking some steps in the direction of a richer understanding of the ability to answer questions in a human-to-human setting. That is to say, if this ability of a humanlike non-human agent enhances PSQ, a human frontline employee's perceived ability to answer questions in a service situation may enhance PSQ, too. If so, for example, humility in terms of acknowledging one's knowledge limits may not be a desirable characteristic of a person who is representing a company in interactions with its customers.

Implications for decision makers
The main finding in the present study was that a VA's display of an ability to provide answers to user questions boosts PSQ. This implies that designers of VAs should deliberately program VAs so that they avoid an explicit display of a low ability to answer users' questions (e.g. by avoiding saying 'I do not know'). As already indicated above, ChatGPT, which is likely to become a point of reference for many VAs used for customer contacts in retailing, frequently displays this type of avoidance when it produces nonsensical information that looks plausible (Taecharungroj 2023).
Although there are not many empirical studies of how we humans react to other humans who claim that they do not know the answer to a question, the implication above seems to be in tune with various social media personalities who give advice to managers and employees about what to do and not do in a work life context. That is to say, many of them strongly recommend that 'I do not know' should be avoided, because it can, it is argued, make one come across as inexperienced and unprofessional. Instead, many alternative options are recommended, such as 'Here is what I can tell you . . .', 'I'll find out', 'I have the same question', and 'My best guess is . . .'. Thus there are several specific options to consider for the programmer who wants to prevent a VA from signaling a low ability to answer questions. This programmer, however, should be mindful of the possibility that a user of a VA may be such a user at least partly because he or she can get answers in corporate lingo-sounding language elsewhere (i.e. from humans engaging in impression management).
Another issue also calls for caution: if a VA is programmed to never display a low level of an ability to answer questions (e.g. by never saying 'I do not know'), it may interfere with the ability of the VA to learn. That is to say, using utterances such as 'I do not know' can signal to the VA itself that there is a need for additional learning, while never using such utterances may distort such signals. In other words, avoiding an explicit display of a low ability to answer questions may mask a lack of knowledge, and to lack relevant knowledge without realizing it can have unwelcome (and even disastrous) consequences (Hansson, Buratti, and Allwood 2017). Using ChatGPT as an example again, it does not always pretend to know when it does not know; it frequently admits its knowledge gaps, too (Taecharungroj 2023). For example, when the author of the present study asked ChatGPT-4 about today's opening hours for Sainsbury's in London, this is what it answered: "I'm sorry, but I don't have access to real-time data, including current store hours for specific locations like Sainsbury's in London."
A related issue is that progress in artificial intelligence can make VAs experts in various domains. If this expertise is humanlike, however, it may produce a failure to recognize what one does not know, because (human) experts are likely to underestimate their own ignorance (Hansson, Buratti, and Allwood 2017). And if avoiding utterances of the type 'I do not know' boosts a VA's view of its own expertise, one may again expect that the motivation to learn new things is mitigated.

Limitations and directions for further research
In the present study, the stimulus VA had a (synthetic) voice as opposed to VAs that communicate with written text. Previous research, however, suggests that spoken words are more effortful to process than written text, particularly when produced by synthetic speech (Lee 2008). If it is more cognitively taxing to receive information by a synthetic voice compared to a written text, the present study may have made individual utterances such as 'I do not know' less salient and thus less causally potent compared to when they are produced in the form of text. Further studies are therefore needed to examine if similar results would be obtained when a VA communicates with written text. In addition, the VA was given a female gender. This may have reduced the negative impact of answering 'I do not know' in relation to a VA with a male gender, because in human-to-human settings people seem to have preferences for intellectually humble women and intellectually arrogant men (Hagá and Olson 2017). Thus, further research is needed to examine the extent to which the results in the present study would replicate with a male-gendered VA.
Moreover, the VA's ability to answer questions was manipulated by using the utterance 'I do not know', which represents a common phrase in human-to-human conversations to express an inability to provide requested information (Tsui 1991). However, a low ability to answer a question can be displayed in many other ways, for example, with utterances such as 'nobody really knows this' and 'this is hard to say, because . . .'. Thus, an agent can reduce the emphasis on 'I' and motivate its inability to answer by referring to gaps in the overall body of knowledge in a domain. Presumably, this can signal competence and may therefore have less negative effects than responses comprising 'I'. Indeed, if a domain comprises material that can be subject to scientific research, one may view serious science as something that acknowledges the limits of existing knowledge. In any event, the possibility of different effects of different ways of communicating one's knowledge limitations depending on the domain needs to be examined in further research. Researchers who study virtual agents' answers to questions should also be mindful of the existence of another type of response, namely an explicit refusal to answer (e.g. 'I do not want to answer this' and 'I am not going to answer this'). This type of non-answer has been examined in politicians' responses to interview questions (Ekström 2009). Its valence appears to be ambiguous; it can signal a transgression of norms as well as skillfulness (Ekström 2009). One may assume that there are situations in which such responses by a virtual agent can be positively charged, such as when the agent is concerned with the asker's well-being and does not want to deliver answers based on too little or contradictory evidence. As an extension, also based on analyses of politicians in interview situations, there are several other ways in which someone can provide a non-answer to a question, such as ignoring the question, questioning the question, and attacking the questioner (Bull and Mayer 1993). Thus, 'I do not know' is one of many non-answers, and further research is needed to assess how the full gamut of non-answers affects evaluations of those who use them.
The present study showed that three variables (competence, usefulness, and learning benefits) mediated the influence of perceived VA ability to answer questions on PSQ. However, the direct effect was significant, too. This suggests that other mediating variables are also likely to exist. One possibility that needs to be examined in further studies is the functions of the utterance 'I do not know'. In the present study, it was assumed that its presence or absence indicates a replier's ability to answer questions, and the manipulation check suggests that this was indeed so. However, given that 'I do not know' can indicate other things too, such as low commitment to what follows and a willingness to avoid impoliteness (Tsui 1991; Weatherall 2011), which appear to have a valenced charge in a social perception setting, more research is needed to capture other possible interpretations of 'I do not know' than an ability to answer questions. Usage of 'I', for example, may signal that an agent has self-identification and self-awareness abilities (i.e. it can distinguish itself from others), which is often seen as a prerequisite for having a mind. And perceptions of a non-human agent as having a mind can boost evaluations of the agent (Söderlund and Oikarinen 2021).
As for moderating variables, the context in which a low ability to answer questions is displayed may affect its impact on other variables. However, this was not assessed in the present study. One such contextual variable is the level of expertise displayed by a VA when it comes to things that it does know (and can answer questions about). More specifically, one would assume that an agent that displays a high overall level of expertise and displays (here and there in a conversation) that there are limits to this expertise would produce less negative responses compared to when such answers are delivered by an agent that displays a low overall level of expertise. That is to say, if admitting that one has knowledge gaps is a facet not only of humility, but also of wisdom in the Socratic sense, it is possible that an agent that admits what it does not know can boost perceived agent competence when the agent has additional means to display that it indeed has high knowledge. It should also be observed that the setting for the VA in the present study was communication about one particular product category (i.e. running shoes), a domain for which there is indeed ambiguity when it comes to empirical evidence (such as how long you should keep a pair of running shoes). This means that a display of a low level of ability to answer questions may be less influential compared to domains in which ambiguity is lower. For example, if the domain is 'the running shoes available from retailer X', one would expect that an inability to answer questions such as 'do you have the shoes XYZ in size 10.5?' would produce stronger negative reactions. Thus further research is needed also to examine if the results in the present study would generalize to domains with other levels of ambiguity.
Finally, the present study can be seen as a typical contemporary experiment in the sense that it comprised an assessment of effects immediately after (one) exposure to stimuli. Thus the present study does not address what happens after repeated exposure to VA behaviors in conversations. One possibility, however, is that prolonged and repeated exposure to non-human (but humanlike) agents displaying one particular behavior could influence humans to behave in the same way (Cappuccio et al. 2021). This is basically the same assumption as when acknowledging that firms' activities, in the aggregate, can produce unintended societal consequences. Examples of such (alleged) consequences, in the domain of firms' advertising activities, are that advertising can reinforce materialism, cynicism, irrationality, selfishness, anxiety, social competitiveness, and sexual preoccupation (Pollay 1986). In the light of such influence, one would be tempted to believe this: if retailers and others instruct their VAs never to display a low level of knowledge, this can result in an attenuated level of intellectual humility in society. Conversely, given that a virtuous non-human agent can contribute to users' moral flourishing by continuously repeating certain behaviors (Cappuccio et al. 2021), explicitly instructing VAs to indeed admit what they do not know may enhance human-to-human intellectual humility. And as noted by Wright et al. (2017), even a little more humility in the world would go a long way to make the world a morally better place.

Biographical note
Magnus Söderlund is Professor of Marketing and Head of the Center for Consumer Marketing at Stockholm School of Economics, Sweden. His main current research interest is how consumers react to various forms of marketing and service activities involving non-human agents such as virtual agents and robots.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributor
Magnus Söderlund is a Professor of Marketing at the Stockholm School of Economics and a Senior Fellow at the Hanken School of Economics. His main research interest is in how consumers react to various forms of marketing and service activities involving non-human agents such as virtual agents and robots.

Figure 1. Overview of the hypothesized associations.

Figure 2. Still image of the VA from the videos.