Improving the Use of Deictic Verbs in Children with Autism Spectrum Disorder

ABSTRACT Background: Children with autism spectrum disorder (ASD) show difficulty in comprehension and production of the deictic verbs “come/go.” Objective: To examine whether introducing conditions related to daily conversations into training would improve the use of deictic verbs. Methods: Six Japanese children with ASD participated. We set up multiple scenes where the questioner presented the sentence using “come/go” with/without deictic gestures, and children with ASD replied with “come/go.” The conditions such as spatial relations between the two parties (face-to-face or side-by-side) and presentations of the gestures (moving one’s arm toward or away from the body or moving one’s upper body forward/backward) were introduced. Results: The appropriate use of deictic verbs during training and in daily life situations among children with ASD increased. Conclusions: Training children with ASD to look in the direction indicated by the questioner and to synchronize their bodies with the questioner’s movements promotes their acquisition of deictic verbs.


Introduction
Difficulties with the comprehension and production of deictic terms have been reported in children with autism spectrum disorder (ASD). [1][2][3] Deictic terms, such as "I/you," "come/go," "here/there," and "this/that," shift depending on the viewpoint of the speaker and on the spatial relation between the speaker and the listener. 4,5 There are also deictic gestures, non-verbal expressions that indicate a direction, place, or object primarily by pointing (e.g., point to their left to indicate "here"). 6 The present study focused on the personal pronouns "I/you" and the deictic verbs "come/go" in the Japanese language, as well as deictic gestures.
For the acquisition of deictic verbs, it is necessary to understand that personal pronouns and deictic verbs are interrelated. Personal pronouns, such as "I/you," require the speaker/listener to continuously re-map their reciprocal relation to their referent, depending on who is saying the pronoun. 7 The speaker then utters either "come" or "go" in response to the selected personal pronoun. As an example of conversation in Japanese, Hanako asks Taro, "Will you come to play today?" Taro first perceives the movement from Hanako's viewpoint, such as "From Hanako's viewpoint, I come to Hanako's house (Taro comes to my house)." Subsequently, Taro shifts his viewpoint as "From my viewpoint, I go to Hanako's house" and replies "I'm going/not going." This process is called deictic shifting. 8 This example reflects a crosslinguistic difference between English and Japanese. In English forms, Hanako asks "Will you come to play today?" and Taro replies using the word "come," such as "Yes, I will come." Hereinafter, "come/go" are used based on the Japanese forms. Moreover, it is observed that typically developing (hereinafter referred to as "TD") children at the stage of acquiring these verbs tend to spontaneously move their arm or hand toward the body when uttering the word "come" and away from the body when uttering the word "go." 9 However, children with ASD use deictic gestures at a significantly lower frequency than TD children, 10 which has been attributed to the difficulty in coding the shifting reference between the speaker and listener. 11
In addressing the difficulty experienced by children with ASD in acquiring deictic terms, studies that analyze deictic relational complexity 12 and teach "I/you," "then/later," and "here/there" relations have been reported internationally. For example, in one study, the researcher placed pictures of pairs of items with "Then-Later" temporal relations, such as "seed-flower," in front of the participant in a randomized order. 13 In the simple Then-Later relation, the correct response to the question "What was Then?" was defined as pointing to a picture of a seed. By contrast, in the single reversal Then-Later relation, the correct response to the question "If Then was Later and Later was Then, what would be Then?" was defined as pointing to a picture of a flower. In the Here-There relation, stimuli such as pencil-zoo were used, and simple/single reversal relations were set as in the Then-Later relation. Corrective feedback for responses was offered and the participants were able to identify single reversal Then-Later and Here-There relations. Similarly, applying deictic relational complexity to "I/you" and "come/go" in the English form, the question "Will you come to play today?" requires a response that reverses only the subject, such as "I will come"; that is, the single reversal relation. In the Japanese forms, however, both the subject and the verb are reversed, such as "I'm going"; that is, the double reversal relation. Thus, the deictic relational complexity of the language and sentence influences the acquisition of deictic verbs. Based on the analysis of the subject-verb relation, the findings of the present study conducted with Japanese-speaking children with ASD may be applied to teach deictic verbs to non-Japanese-speaking children with ASD as well. Such application may have significant research and clinical implications.
Furthermore, in the Japanese language, several studies have been conducted using the framework of conditional discrimination 14 for sentence comprehension in the case of verbs such as "give/receive." For instance, in the experiment conducted by Shimizu and Yamamoto, 15 participants were required to walk up to a teacher giving or receiving an object and construct the sentence by selecting verbs, persons' names, and particles. Consequently, the use of appropriate sentence structure with the verbs of giving and receiving was promoted. This is because setting up real movement as a sample stimulus enhanced the discriminability of the action itself, and therefore, promoted appropriate sentence construction. This research finding has been replicated in other studies. For example, in the study by Asaoka et al., 16 two people, including a child with ASD, sat across from each other. A stuffed animal was placed between them. A trainer then presented letter stimuli such as "Taro gives Hanako the stuffed animal" and "Hanako receives the stuffed animal from Taro." The participant performed one of the actions of "give/receive." The trainer presented corrective feedback for selective responses. Based on the results that the intervention facilitated appropriate responses, it was suggested that children with ASD increase their attention to personal pronouns and verbs by experiencing the roles of both the giver and the receiver and by acting on themselves. Other case studies have also reported the effectiveness of using this framework to target the acquisition of verbs other than giving and receiving, such as "sell/buy," "throw/catch," and "hide/seek." 17,18 However, these studies either did not measure generalization to daily life or repeatedly reported difficulties with generalization. Generally, the connection between words and actions is embedded in everyday life. For example, a mother says, "Hey, Taro! Give the newspaper to daddy." She praises the child's behavior or presents the model while saying "Give the newspaper to daddy." That is, children's actions and their labels (e.g., the act of giving and the word "give") are presented in pairs, and children learn the meaning and usage of words. 19 Although natural contingencies are integrated in their daily lives, children with ASD show difficulty in acquiring deictic terms, which needs to be explored and addressed.
To solve the problem of generalization and to answer this question, the present study analyzed the differences in environmental conditions between training and daily life situations and introduced the conditions related to spatial relations between the two parties (i.e., face-to-face or side-by-side) and deictic gestures as variables in daily life into training situations. The spatial relationship of the two individuals was always kept face-to-face in previous studies, whereas various relations are assumed in daily conversation. Moreover, the presence and frequency of deictic gestures may vary, e.g., "[with one's finger pointed] Let's go there!" and "[in words only] Let's go there!" Therefore, we hypothesized that the following arrangements would generate appropriate responses from children with ASD when questions are posed using "come/go": (1) depending on the types of questions, the questioner and a child with ASD, who is the responder, line up face-to-face or side-by-side, and (2) the questioner provides deictic gestures. This hypothesis is supported by several reports that children with ASD tend to imitate another person's behaviors based on the visual appearance from their perspective 20,21 ; for instance, children will wave a hand with the palm facing inwards when their mother waves their hand. Meyer and Hobson 20 implemented a task in which children with ASD imitated the operation of another person. As an imitation of the operation, the experimenter and participant sat face-to-face. Two boxes were placed in front of the experimenter and the participant. The experimenter placed their own box on top of the participant's box and then placed it back in its original position. Immediately afterward, the participant imitated the experimenter's operation. The results showed that TD children tended to place their own box on top of the experimenter's box, and children with ASD tended to place the experimenter's box on top of their own box.
In summary, TD children tend to focus on both the person and direction of movement, while children with ASD tend to focus only on the direction of movement.
Considering the characteristics of imitation in ASD, we further explored the hypotheses. The sentence "Will you come to play today?" means that the child with ASD as the responder is approaching the questioner. Applying this sentence structure to spatial relations and deictic gestures, the two are face-to-face and the questioner asks, "[while moving an arm toward inside of the body] Will you come to play today?" This may encourage children with ASD to move their arms and/or body forward and to produce verbal responses of "I'm going/not going." Similarly, the sentence "Will you go to the park with me?" implies that the questioner and the child with ASD are moving in the same direction. The introduction of conditions in which the two are side-by-side and the questioner is moving an arm toward the outside of the body may promote the occurrence of appropriate physical and verbal responses.
The purpose of this study is to expand the findings of intervention research on deictic terms by incorporating deictic relationships and the characteristics of imitation in children with ASD. The research questions for this study include the following:
Research Question 1: Does the introduction of variables of spatial relations and deictic gestures improve non-verbal/verbal communication in children with ASD?
Research Question 2: Does the difference in response topography of deictic gestures between the questioner and the responder affect their performance?

Study 1
Masataka 9 analyzed gestures produced at the time of utterance of the words "go/come" from video images to determine whether the movements of children's hands or arms were directed toward the inside/outside of their body. To verify our hypothesis, it was essential to quantify the extent to which children with ASD looked at the questioner's body orientation and gestures, and in which direction they moved. We analyzed the data using an eye tracker and a motion capture system, which measures three-dimensional positional relations from reflected infrared light.

Participants
Three Japanese children with ASD (one boy and two girls) participated in the study. The participants were recruited from a university clinic center based on the following three criteria: (a) the child's chronological age (CA) was between 6 and 10 years based on the results of the study by Masataka 9 ; (b) the child was diagnosed with ASD by at least one doctor using the standard and diagnostic criteria of the DSM-5 22 and had a score of ≥ 9 in early childhood or ≥ 13 in childhood on the Parent-interview ASD Rating Scale-Text Revision (PARS-TR) 23,24 ; and (c) the child was able to perform simple imitations (e.g., putting the hand on the head). The upper part of Table 1 details Moe, Ken, and Yuki's (the name of the participants in Study 1) descriptive information. The study protocol was approved by the research ethics committee of the Faculty of Human Sciences, University of Tsukuba (No. 2019-A134). The parents of each participant signed an informed consent form before starting the study. All participants were compensated for their time.

Setting and Materials
All sessions were conducted at the university clinical center and at each participant's home. The duration was 20-30 min per session, and the frequency was once or twice per week. In the session conducted at the center, one undergraduate or graduate student (hereinafter referred to as the child actor) sat face-to-face or side-by-side with the participant. The distance between the two parties was always 100 cm (see Figure 1). The participant wore a glasses-shaped eye tracker (Tobii Technology Tobii Pro Glasses 2; hereinafter referred to as the eye tracker) and had a reflective marker attached to the center of the chest. A director and a narrator were also present, who were out of the participants' line of sight. A motion capture camera (OptiTrack V 120: DUO), a PC (Dell Precision 5530) to control the eye tracking and motion capture software, and a video camera (Sony HDR-CX485) were placed in the room. One or two blocks were conducted during each session, with each block consisting of 12 trials, including the director's cue, narration, child actor's line, and participant child's lines. We prepared forty sets of narration and question sentences with "come/go." The verbs "come/go" imply a person-to-person movement, which is more direct than verbs that involve the movement of items, such as "give/receive." For this reason, we predicted that the effects of introducing variables of spatial relations and deictic gestures as well as the difference in the response topography of deictic gestures would be maximized. Hence, we focused on a single set of verbs.
Note (Figure 1). The child actor presented the body movements at the same time as they uttered "come/go" in the lines. Bold and fine letters indicate the correspondence between the verbs and movements. The child actor's illustrations in Study 2 were made so that their movements were clear, though they actually moved their body slightly.
The narration sets consisted of ten each of four types of interrogative sentence as "question → response": "come? → come," "come? → go," "go? → go," and "go? → come" (for a sample of each type, see Table 2). For example, in the type of "come? → go," an interrogative sentence using "come" was presented and the participants answered using "go." These sentences were age-appropriate and were used in a setup in which two people were talking in the same place and were answered using "come/go." The appropriateness and validity were checked and revised by a Japanese junior high school teacher. The main characteristics of the Japanese language are that subjects such as "I/you" are omitted in daily conversation, and the word order is opposite to that of English. Additionally, subject, object, and prepositional phrases in Japanese sentences are identified precisely by corresponding case markers or postpositional particles, such as -wa, -no, -de, and -ni attached to nouns (for a detailed explanation of Japanese grammar, see ref. 24).

Experimental Design
A single-case experiment design was adopted with "B" denoting prompt and fading, "A" denoting test and stimulus generalization, and "C" denoting implementation at home setting. Following the initial assessment, a BABAC design was introduced for two of the three participants (Moe and Ken), while a BAC design was introduced for the remaining participant (Yuki).

Assessment.
A director asked the participants to shoot a play titled "Let's shoot a diary of the elementary school students!" An undergraduate or graduate student took on the role of a friend, parent, grandparent, or store clerk. In addition, the director instructed the participants to say a line using "come/go" following a line uttered by the child actor and gave examples such as "I want to go!" and "I think he/she will come." The director then cued the start of the play using the clapperboard. The narrator read the narration such as "The sixth period is over," and the child actor immediately asked, "[The participant's name], will you come to play today?" Based on the participant's response, the child actor accordingly modified their lines. When a correct response occurred, such as "Um, I go," the child actor responded ad-lib, such as "What shall we play?" or "I am looking forward to it!" When an erroneous response occurred, such as "Yes, I come," the child actor responded neutrally, such as "Yeah" and "uh-huh."
Two conditions, one for the positional relations between the participant and the child actor and one for the presence of gestures, were set as follows. Regarding the positional relations, the two parties' body orientation and direction of movement were classified as equal/not equal for each type of stimulus sentence. In the types of "come? → come" and "go? → go," the side-by-side arrangement was classified as equal and the face-to-face arrangement as not equal. For example, a "go? → go" type sentence such as "Will you go to the park with me?" assumes that both parties are facing in the same direction and going to the park together. Hence, their body orientation and direction of movement are equal in the side-by-side arrangement and are not equal in the face-to-face arrangement. Similarly, in the types of "come? → go" and "go? → come," the face-to-face arrangement was classified as equal and the side-by-side arrangement as not equal. For example, a "come? → go" type sentence such as "Will you come to play today?" assumes that one party approaches the other and plays with them. Hence, the two parties' body orientation and direction of movement are equal in the face-to-face arrangement and not equal in the side-by-side arrangement. The mix condition is a mix of equal/not equal in a 1:1 ratio. Regarding the condition of the line with the gesture, the child actor moved their arm toward their chest simultaneously as they uttered "come" and stopped moving it at the end of the line. When the word "go" was uttered, the child actor moved their arm in the opposite direction (see upper row in Figure 1). Specifically, the child actor uttered lines such as "Will you [while pointing forward with their dominant arm] go to the park with me?" In the condition of the line without the gesture, the child actor said the line with both hands on their knees, e.g., "[placing both hands on their knees] Will you go to the park with me?" The mix condition is a mix of with/without the gesture in a 1:1 ratio. Hereinafter, equal/not equal are referred to as =/≠ and with/without the gesture as w/ and w/o, respectively. These two conditions were combined and five conditions were introduced: = x w/, = x w/o, ≠ x w/, ≠ x w/o, and mix x mix.
Note (Table 2). The subject of "come/go" and the verbs "come/go" are shown in italics and bold, respectively. The use of "come/go" in English forms is based on the Japanese forms. The upper part of b is a Romanized version of the Japanese sentences and the lower part represents English words corresponding to Japanese words.
Prompt and fading. Based on the results of the assessment, two types of "come? → go" and "go? → come" were used for Moe and Ken, and three types of "come? → go," "go? → go," and "go? → come" were used for Yuki. Furthermore, = x w/ for Moe and Ken and ≠ x w/o for Yuki were introduced.
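As a reading aid for the =/≠ notation, the classification rule from the assessment can be written as a small lookup. This is only an illustrative sketch: the function name and the string labels are hypothetical and do not come from the study materials.

```python
# Illustrative sketch of the equal (=) / not equal (≠) classification.
# The function name and string labels are assumptions made for this example.

POSITIONS = ("face-to-face", "side-by-side")
VERBS = ("come", "go")


def orientation_condition(question_verb: str, response_verb: str, position: str) -> str:
    """Return "=" when body orientation and direction of movement are equal."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    if question_verb not in VERBS or response_verb not in VERBS:
        raise ValueError("verbs must be 'come' or 'go'")
    # "come? -> come" and "go? -> go": both parties move in the same
    # direction, so side-by-side is the equal arrangement.
    # "come? -> go" and "go? -> come": one party approaches the other,
    # so face-to-face is the equal arrangement.
    same_verb = question_verb == response_verb
    equal_position = "side-by-side" if same_verb else "face-to-face"
    return "=" if position == equal_position else "≠"


if __name__ == "__main__":
    # "Will you come to play today?" answered with "go", seated face-to-face.
    print(orientation_condition("come", "go", "face-to-face"))
```

Under this rule, each sentence type has exactly one equal arrangement, which is why the mix condition can present equal and not equal trials in a 1:1 ratio.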
(a) Initial sound and physical prompt + corrective feedback was applied to Moe and Ken. When the child actor presented the gesture, the director put both hands on the participants' shoulders from behind and guided their body back and forth. For example, the director guided the participant's body forward in the stimulus sentence "Will you come to play today?" Immediately after the child actor had finished saying their lines, the director presented an initial sound (i.e., /k/ or /i/ in Japanese) of the verb "come/go" ("kuru/iku" in Japanese). When the correct response occurred, the child actor responded in an extempore manner, and the director reacted with verbal praise and clapping hands. When the erroneous response occurred, a retrial was conducted. Subsequently, physical prompt + corrective feedback was implemented, in which we provided the prompts and feedback in a similar manner.
(b) Physical prompt and fading + corrective feedback was conducted for Moe and Yuki. The difference from condition (a) is that the physical prompts were progressively removed. In addition, Moe's 12 and 13 blocks were implemented without using words. In these blocks, two parties were facing each other, the child actor moved an arm backward/forward and Moe moved her upper body forward/backward. Alternatively, they were positioned side-by-side, with the child actor moving an arm backward/forward and Moe moving her upper body backward/forward. Thereafter, the narration was omitted, and the child actor only presented the verb "come/go" in 14 blocks. In response to the positional relationship and the verb, Moe moved her body backward/forward and expressed "come/go" with gestures.
(c) Gesture fading + corrective feedback was applied to Ken. The difference from condition (a) is that gestures were progressively removed. If the number of correct responses per block in two consecutive blocks was more than 10 (83.3%), gestures were presented in 75, 50, 25, and 0% (corresponding to 9, 6, 3, and 0 trials) of the 12 trials per subsequent block.
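The gesture fading criterion in (c) can be sketched as a simple schedule. The sketch below rests on assumptions that the text does not state: fading starts from gestures on all 12 trials, "more than 10" is read as strictly greater than 10, and the two-block count restarts after each fading step. All names are hypothetical.

```python
# Hedged sketch of the gesture fading schedule described in (c).
# Assumptions (not confirmed by the text): fading starts at 12 of 12 trials,
# the criterion is strictly more than 10 correct responses, and the
# consecutive-block count restarts after every fading step.

GESTURE_TRIALS = [12, 9, 6, 3, 0]  # 100% (assumed start), 75, 50, 25, and 0% of 12 trials
CRITERION = 10                     # correct responses per block must exceed this


def gesture_schedule(correct_counts):
    """Return the number of gesture trials presented in each block.

    correct_counts[i] is the number of correct responses observed in block i.
    """
    step = 0      # index into GESTURE_TRIALS
    streak = 0    # consecutive blocks meeting the criterion at this step
    schedule = []
    for correct in correct_counts:
        schedule.append(GESTURE_TRIALS[step])
        streak = streak + 1 if correct > CRITERION else 0
        if streak >= 2 and step < len(GESTURE_TRIALS) - 1:
            step += 1
            streak = 0
    return schedule


if __name__ == "__main__":
    print(gesture_schedule([11, 12, 9, 11, 11, 12, 12]))
```

If the criterion were instead read as "10 or more," only the comparison `correct > CRITERION` would change.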
Test and stimulus generalization. We presented the narration and interrogative sentences used/not used during the intervention phase in test/stimulus generalization, respectively. The procedures were similar to those used in the assessment.
Implementation at home setting. The first author provided the parents with examples of the stimulus statements used in this study and asked them to set up opportunities for the participants to be asked questions using "come/go" in the context of their daily life. The recording form had columns for writing the date, situation (e.g., when taking a bath), the position of the parents and child (invisible to each other, face-to-face, or side-by-side), questioner (e.g., mother), the question and type, and the participant's response. When the correct response occurred, the parents naturally continued the conversation in response to the participant's response. When an erroneous response occurred, the parents naturally presented a model of appropriate expression.

Dependent Variables and Data Analysis
Dependent variables were classified into the following six categories. (1) Number of correct responses (times): The correct response was defined as the inclusion of "come" or "go" in the participant child's line, depending on the four types (i.e., example of responses highlighted in bold font in Table 2). The number of correct verbal responses within a block was counted for each of the four types. (2) Average response time: Using voice processing software (Adobe Audition 2020), the Japanese voice onset and offset times were identified based on the waveforms and sounds of the speech. 25 Specifically, we listened to the recorded voice while looking at the waveform and noted the time when the waveform converged at the end of the child actor's line, and the time when the waveform rose at the start of the participant's response. Regarding the identification of the time when the participant uttered the line, we excluded the filler words such as "well," "uh," "er," and "um." Finally, the total time was divided by 12 and multiplied by 100 to calculate the average response time. (3) Rate of upper-body movement depending on verbs (%): Based on the response definition provided by Masataka, 9 we defined the participants' upper-body movements depending on the verb used; that is, backward movement of their upper body in the trial was to be expressed as "come" and forward movement of their upper body in the trial was to be expressed as "go." Similar to the analysis of average response time, we identified the time intervals at which the participant expressed the lines from the time the child actor uttered /k/ or /i/ on each trial. Thereafter, we analyzed participants' upper body movements in the anteroposterior direction using motion analysis software (Acuity SKYCOM). The maximum amplitude in the anteroposterior direction was regarded as the upper-body movement. The total number of occurrences of the body movements corresponding to the verb was divided by 12 and multiplied by 100.
(4) Average amount of upper-body movement (mm): The previous dependent variable defined the upper-body movement as a rate, while this dependent variable defined it in terms of an amount. The data were obtained for each trial using the motion analysis software. The sum was divided by 12 and multiplied by 100 to calculate the average. (5) Average fixation rate to the child actor's face (%): The rate of fixation on the child actor's face was defined as the percentage of time for which the participant gazed at the child actor's face in the time between the start and end of the child actor's line during each trial. To measure fixation duration within this time interval, eye movement analysis software (Tobii technology Tobii Pro Lab) was used, and areas of interest (AOI) were manually defined for the child actor's facial area. The AOI total fixation duration(s), not including the saccade, was automatically calculated in each trial. We divided the AOI total fixation duration by the time interval and multiplied by 100 to obtain the fixation rate to the child actor's face per trial. Based on these values, the average fixation rate to the child actor's face per block was calculated. (6) Average fixation rate to the child actor's gesture (%): The rate of fixation on the child actor's gesture was defined as the percentage of time for which the participant gazed at the child actor's gesture in the time between the child actor's utterance of /k/ or /i/ to the end of saying the line (i.e., the time interval during which the child actor presented the gesture) in each trial. We identified the fixation rate per trial and then calculated the average fixation rate for the child actor's gesture using the same method as that for the face.
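The two percentage measures, (3) and (5)/(6), reduce to short calculations, sketched below. The data layout, the function names, and the sign convention (positive displacement means forward movement) are assumptions for illustration only; the study's actual values came from the motion analysis and eye movement analysis software.

```python
# Illustrative sketch of the percentage measures defined above.
# Data layout, names, and the sign convention are assumptions.

def upper_body_movement_rate(trials):
    """Percentage of trials whose movement direction matches the verb.

    trials: list of (verb, displacement_mm) pairs for one block, where
    displacement_mm is the maximum anteroposterior amplitude and a
    positive value is assumed to mean forward movement.
    "go" corresponds to forward movement, "come" to backward movement.
    """
    matches = 0
    for verb, displacement_mm in trials:
        if verb == "go" and displacement_mm > 0:
            matches += 1
        elif verb == "come" and displacement_mm < 0:
            matches += 1
    # A block in the study has 12 trials; len(trials) keeps the sketch general.
    return 100 * matches / len(trials)


def fixation_rate(aoi_fixation_s, interval_s):
    """Percentage of the interval spent fixating the area of interest."""
    return 100 * aoi_fixation_s / interval_s


def block_average(per_trial_rates):
    """Average of the per-trial fixation rates within one block."""
    return sum(per_trial_rates) / len(per_trial_rates)


if __name__ == "__main__":
    block = [("go", 12.0), ("come", -8.5), ("go", -3.0), ("come", 4.0)]
    print(upper_body_movement_rate(block))
    print(block_average([fixation_rate(1.2, 2.4), fixation_rate(0.6, 2.4)]))
```

For the face, the interval runs from the start to the end of the child actor's line; for the gesture, it runs from the utterance of /k/ or /i/ to the end of the line, so the same function applies with a different interval.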

Inter-rater Reliability and Procedural Integrity
The dependent variables other than the number of correct responses were automatically calculated based on the following four times: the child actor started the line, the child actor uttered /k/ or /i/, the child actor finished the line, and the participant started the line. The researchers first randomly sampled 20% of blocks each from Moe, Ken, and Yuki's assessment phases. They also extracted 36%, 33%, and 40% of blocks from Moe, Ken, and Yuki's prompt and fading phase, respectively, and 25% each from their stimulus generalization phase. Subsequently, inter-rater reliability was established through independent raters, 25 who were either master's or doctoral students in special education. All raters received specific training on data collection procedures prior to measurement. The independent raters also collected the procedural integrity data via video using checklists for each phase. For this purpose, the researchers randomly sampled the same percentage of blocks from each phase as for the measurement of inter-rater reliability. The common checklist for assessment, prompt and fading, and stimulus generalization phases included items assessing whether (1) the two parties' body orientation (i.e., side-by-side or face-to-face), (2) the presence of gestures (i.e., w/ and w/o), and (3) the child actor's response to the participant's line (i.e., correct or error response) were appropriate. Additionally, the checklist for the prompt and fading phase included items assessing whether the director accurately presented (4) prompts and (5) feedback according to the procedures of each condition. Procedural integrity was then calculated by dividing the number of items that had been completed accurately by the total number of items on the checklist and multiplying the quotient by 100. Procedural integrity for this study was assessed as being 100% for all participants across all phases.
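The procedural integrity calculation can be sketched as follows. The checklist keys are hypothetical labels for the five items listed above, not the wording of the actual checklist.

```python
# Minimal sketch of the procedural integrity percentage.
# Checklist keys are hypothetical labels for the items described in the text.

def procedural_integrity(checklist):
    """Return the percentage of checklist items completed accurately.

    checklist: dict mapping an item label to True (accurate) or False.
    """
    if not checklist:
        raise ValueError("checklist must contain at least one item")
    accurate = sum(1 for completed in checklist.values() if completed)
    return 100 * accurate / len(checklist)


if __name__ == "__main__":
    prompt_and_fading_block = {
        "body_orientation": True,
        "gesture_presence": True,
        "child_actor_response": True,
        "prompt": True,
        "feedback": True,
    }
    print(procedural_integrity(prompt_and_fading_block))
```

For the assessment and stimulus generalization phases, only the first three items would be present, so the denominator changes with the phase.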

Figures 2, 3, and 4 show the changes in the dependent variables for Moe, Ken, and Yuki, respectively. In the assessment, Moe and Ken had a low number of correct responses and a low average fixation rate to the child actor's face/gesture in two types of "come? → go" and "go? → come," and their rate of upper-body movement depending on verbs for each condition was generally below the chance level (50%). For Yuki, when either = or w/ was introduced, correct responses and upper-body movement depending on verbs (hereafter referred to as appropriate body movement) occurred frequently. By contrast, the occurrence rate of her appropriate physical and verbal responses remained low in ≠ x w/o and mix x mix. During the intervention phase, the dependent variables of the three participants tended to change positively as the prompts were progressively faded; their average response time decreased within the same condition. However, Ken's rate of upper-body movement depending on the verbs remained at the chance level in the gesture fading + corrective feedback. Moe and Ken's average fixation rate to the child actor's face/gesture increased and Yuki's rate decreased in the intervention compared to the assessment. Furthermore, the gesture of moving the arm or hand toward the inside/outside of the body did not occur in any participant or phase. In the home setting, the correct responses occurred in all trials for Ken and Yuki, and the error responses occurred in the types of "go? → come" for Moe.
The results of Study 1 suggested an improvement in nonverbal/verbal communication during conversations using "go/come"; this relates to Research Question 1, which asked whether the introduction of conditions for spatial relations and deictic gestures would improve the communication in children with ASD. The participants moved their upper body in accordance with the direction of the child actor's gesture; these movements may have prompted the participants to utter "go/come." In other words, the synchronization of body movements between the two parties is a necessary condition for the comprehension and production of deictic verbs. This is supported by previous research on Japanese linguistics that has demonstrated that synchronization of body movements occurs during the expression of deictic verbs. 26 The complexity of the question-response process, within the Japanese cultural context, may promote the production of deictic gestures. In contrast to the results reported by Masataka, 9 in which TD children consistently produced gestures to move their arms or hands, none of the three children with ASD produced the gesture. These results support the findings of Manwaring et al. 10 that deictic gestures occur at a lower frequency in children with ASD. From a macroscopic perspective, it is also inferred that the upper-body movement in the anteroposterior direction has the function of deictic gestures. On the basis that gestures are non-verbal body movements that exhibit images that cannot always be expressed in speech, or that they cooperate with speech to express the person's meaning, 6 upper-body movements can also be regarded as a kind of deictic gesture. By using a motion capture system, we were able to accurately capture subtle (tens of millimeters) responses that are difficult to determine from video images.
Based on this premise, the fact that appropriate body movements did not occur enough in Ken's gesture fading + corrective feedback suggests that he may have learned only different types of responses by training with multiple stimulus sentences.
The clinical and research significance of Study 1 lies in the fact that we set the condition of = x w/ to utilize the spontaneous body movements that occurred during the assessment and gradually faded the prompts to approach the daily setting. However, the findings of Study 1 are limited in the following ways: (a) The intervention procedures were introduced at the same time, immediately after the assessment was completed, for all participants. Additionally, the experimental designs differed among participants, which can weaken any conclusions drawn about the intervention effects. Thus, the findings need to be validated using an experimental design such as a multiple-baseline design across participants; 27 (b) With regard to the synchronization of body movements, the response topographies differed: the child actors moved their arm, whereas the participants moved their upper body. The effects of aligning the response topography of the participant and the questioner should be examined.

Study 2
We replicated Study 1 using a multiple-baseline design across participants and examined the effects of aligning response topography.

Participants
Three Japanese children with ASD (two boys and one girl) participated in the study. The participants were recruited from a child development center based on the same criteria as in Study 1. Two children with high performance at baseline were excluded from this study. The bottom row of Table 1 details the descriptive information of Sora, Hina, and Jun (the participants in Study 2). As with Study 1, Study 2 was conducted with the approval of the ethics committee and parental consent.

Setting and Materials
All sessions were conducted at the child development center, two community centers, and each participant's home. The other settings were the same as in Study 1 and the same stimulus statements were used as in Study 1.

Experimental Design
The experimental design was a multiple-baseline design across participants. 27

Procedures
Baseline. For the positional relationship and the presence of gestures, the mix x w/o condition was introduced to simulate daily life situations. Other procedures were similar to those in Study 1.
Prompt and fading. Differing from Study 1, the child actor moved their upper body slightly (about 5 cm) back and forth (see bottom row in Figure 1). The reason for the change from arm to upper-body movements was that the synchronization of the body movements would be facilitated by aligning the response topography of the child actor and the participant. The other procedures were the same as in Study 1.
Test and stimulus generalization. The procedures were the same as in Study 1.
Introduction of diagonal positional relationships. This condition was introduced only to Hina because of her poor performance in the mix x w/o condition of the first stimulus generalization test (block 14). The child actor and the participant faced each other diagonally (midway between face-to-face and side-by-side) and played their roles. Immediately after that, the two parties sat in the positional relationship of ≠ (face-to-face or side-by-side) and the same narration and child actor's line were presented. That is, the trials in the diagonal and ≠ positional relationships were conducted as one set of two trials. The ratio of trials in the positional relationship of = and diagonal was 1:1 (i.e., 6 trials each). The director provided corrective feedback to the participant regarding the movement of the upper body and the occurrence of verbal expression. Across blocks 15, 16, and 17, the ratio of the diagonal positional relationship was gradually reduced to 100%, 67%, and 0% (corresponding to 6, 4, and 0 trials), and that of ≠ correspondingly increased to 0%, 33%, and 100% (corresponding to 0, 2, and 6 trials), respectively.
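The fading schedule described above can be written out as a short sketch. It is only an illustration: the trial counts per block are taken from the text, whereas the assumption that the diagonal and ≠ trials always sum to 6 per block, and all names in the code, are ours and not part of the original procedure.

```python
# Illustrative sketch of the diagonal-to-≠ fading schedule for Hina.
# Assumption (not stated explicitly in the procedure): diagonal and ≠ trials
# together always total 6 per block.
TOTAL_TRIALS = 6

# Block number -> number of diagonal trials, as reported in the text.
DIAGONAL_TRIALS_BY_BLOCK = {15: 6, 16: 4, 17: 0}


def fading_schedule(diagonal_by_block, total=TOTAL_TRIALS):
    """Return, per block, the trial counts and rounded percentages."""
    schedule = {}
    for block, n_diagonal in diagonal_by_block.items():
        n_not_equal = total - n_diagonal
        schedule[block] = {
            "diagonal_trials": n_diagonal,
            "not_equal_trials": n_not_equal,
            "diagonal_pct": round(100 * n_diagonal / total),
            "not_equal_pct": round(100 * n_not_equal / total),
        }
    return schedule


schedule = fading_schedule(DIAGONAL_TRIALS_BY_BLOCK)
for block, row in schedule.items():
    print(block, row)
```

Under this assumption, the computed percentages match those reported for blocks 15, 16, and 17.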
Implementation at home setting. Sora and Jun's data were excluded because they had limited opportunities to go out due to the COVID-19 pandemic. Hina had the opportunity to go out because of her family situation. We employed the same procedure as in Study 1.

Dependent Variables and Data Analysis
The dependent variables and methods of data analysis were the same as in Study 1.

Inter-rater Reliability and Procedural Integrity
Inter-rater reliability was measured in the same manner as in Study 1. The researchers randomly sampled 50%, 25%, and 20% of blocks from Sora, Hina, and Jun's baseline phase, as well as 29%, 36%, and 33% of blocks from their prompt and fading phase, respectively. They also extracted 20% of blocks from each of the three participants' stimulus generalization phases. The mean rating differences for Sora for
Note. pr. = prompt. FB = feedback. = and ≠ mean that the two parties' body orientation and direction of movement are equal/not equal, respectively. w/ and w/o mean the line with/without gesture, respectively. mix means a mix of = and ≠ or w/ and w/o in a 1:1 ratio.
The procedural integrity data were collected in the same manner as in Study 1. The researchers randomly sampled the same percentage of blocks from each phase as for the inter-rater reliability measurement in Study 2. Procedural integrity was assessed as being 100% for all participants across all phases.

Results and Discussion
Figures 5, 6, 7, and 8 show the changes in correct responses, response time, upper-body movement, and eye movement, respectively, for all the participants. In the baseline, Sora and Jun's number of correct responses and rate of upper-body movement depending on the verbs were around the chance level, and their average fixation rate to the child actor's face was low. Hina made two to four erroneous responses in each block; her rate of upper-body movement depending on verbs was slightly below the chance level, and her average fixation rate to the child actor's face was high. In the prompt and fading phase, the dependent variables tended to change positively compared with the baseline. Additionally, in the stimulus generalization for Sora and Jun, their performances were generally maintained. However, Hina's performance was maintained in the condition of = (blocks 12 and 13) and decreased markedly in the condition of mix (block 14). For example, the child actor and Hina faced each other. Subsequently, the child actor asked her, "Do you go to the hospital to visit Mei today?" to which she replied, "Yeah, I'm coming." Based on these results, a spatial relation in which the two faced diagonally was set up; this improved her performance, which was maintained in the second stimulus generalization. In addition, Sora and Hina did not produce the gesture of moving an arm or hand in any phase, but Jun sometimes pointed backward or forward just before responding with "come/go" in the generalization phase. In the home setting, Hina's correct responses occurred in 22 out of 24 trials across 2 blocks (92%). Two error responses occurred in the "come? → go" type. In response to Research Question 2, which asked whether the difference in response topography of deictic gestures affects the performance of children with ASD, the results of Study 2 confirmed some of the effects.
Specifically, the number of blocks required to improve the use of deictic verbs differed between Studies 1 and 2. For Moe, Ken, and Yuki in Study 1, the total numbers of blocks in which the training was implemented (i.e., in which the prompts and feedback were presented) were 9, 13, and 4, respectively, with an average of 8.7 blocks. In contrast, those for Sora, Hina, and Jun in Study 2 were 7, 10, and 6, respectively, with an average of 7.7 blocks. Thus, the number of blocks required for improvement in Study 2 was one block fewer than in Study 1. Moreover, none of the participants in Study 2 required the training to understand the direction in which the verbs were pointing, as in blocks 12 to 16 for Moe.
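The averages reported above can be checked with a few lines of arithmetic. The sketch below uses only the block counts given in the text; the variable and function names are ours and purely illustrative.

```python
from statistics import mean

# Number of training blocks per participant, as reported in the text.
TRAINING_BLOCKS = {
    "Study 1": {"Moe": 9, "Ken": 13, "Yuki": 4},
    "Study 2": {"Sora": 7, "Hina": 10, "Jun": 6},
}


def mean_blocks(study):
    """Mean number of training blocks in a study, rounded to one decimal."""
    return round(mean(list(TRAINING_BLOCKS[study].values())), 1)


study1_mean = mean_blocks("Study 1")
study2_mean = mean_blocks("Study 2")
# Difference computed from the unrounded means, then rounded for reporting.
difference = round(
    mean(list(TRAINING_BLOCKS["Study 1"].values()))
    - mean(list(TRAINING_BLOCKS["Study 2"].values())),
    1,
)
print(study1_mean, study2_mean, difference)
```

With only three participants per study, this difference of one block is descriptive and should not be read as a statistical comparison.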
These results suggest that aligning response topography made it easier for the participants to understand the direction in which they moved their bodies. This may have facilitated the synchronization of the body movements and led the participants to acquire the verbs in a slightly smaller number of blocks. The findings of Study 2 were further confirmed by using a multiple-baseline design across participants. However, the data collection was insufficient for examining the intervention effects, and it did not fully meet the "What Works Clearinghouse" technical guidelines, 28 which state that "for a phase to qualify as an attempt to demonstrate an effect, the phase must have a minimum of three data points." Future studies should collect data while adhering to these guidelines.
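As a planning aid, the quoted minimum of three data points per phase can be checked before data collection ends. The following is a minimal sketch; the phase counts in the example are hypothetical and are not the data of this study, and the function name is ours.

```python
# Minimum number of data points per phase, from the guideline quoted above.
MIN_POINTS_PER_PHASE = 3


def phases_below_minimum(points_per_phase, minimum=MIN_POINTS_PER_PHASE):
    """Return the names of phases that have fewer data points than required."""
    return [phase for phase, n in points_per_phase.items() if n < minimum]


# Hypothetical example for one participant (NOT data from this study).
example_participant = {
    "baseline": 2,
    "prompt and fading": 7,
    "stimulus generalization": 3,
}

short_phases = phases_below_minimum(example_participant)
print(short_phases)
```

In this hypothetical case, only the baseline phase would need additional data points.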
Incidentally, the reason why Hina performed poorly in mix x w/o of the stimulus generalization is significant. Hina and Jun's rate of upper-body movement depending on verbs during the prompt and fading phases was stable at 100%. Compared to the prompt and fading phase, Hina's average response time in mix x w/o of the stimulus generalization increased by about 1 s and that of Jun increased by about 2 to 3 s. Jun sometimes said, "It's difficult," in mix x w/o of the stimulus generalization. In this condition, the narration and child actor's line are the clues to respond appropriately, and the mixture of spatial relations is an interfering stimulus. The length of the response time may represent an internal process that shifts attention only to the narration and child actor's line. To summarize, it is inferred that whether these two stimuli functioned as discriminative stimuli, that is, whether or not conditional discrimination 14 was established, affected their performance.

General Discussion
In this study, we examined whether the introduction of conditions related to spatial relations and deictic gestures improved the use of "come/go" in children with ASD. We partially demonstrated the effects of having two people lined up according to the type of question, looking in the direction in which the other person was facing and moving, and then synchronizing their own body movements with the other person's movements. The present study suggested that aligning the response topography of deictic gestures promotes synchronization. Hence, the hypothesis was generally substantiated by the partial confirmation of the effectiveness of training procedures based on the characteristics of imitation in children with ASD. 20,21 One important contribution of our study is that we showed, for the first time, that deictic gestures are produced not only as strong responses, such as movement of the arm or hand, but also as weak responses, such as movement of the upper body. We incorporated upper-body movements into training for deictic verbs. These findings extend the research literature using the framework of conditional discrimination 15,16 in that we demonstrated some of the performance improvements of children with ASD in simulated daily life situations (i.e., mix x w/o) and the home setting. Additionally, these findings extend the research literature on TD children 9 in that we quantitatively analyzed eye and body movements using an eye tracker and a motion capture system.
Focusing on the sequence of training, (1) = x w/, (2) mix x w/o (i.e., conditional discrimination with narration and question sentences as discriminative stimuli), and (3) implementation in the home setting may have promoted the acquisition of deictic verbs. The synchronization of body movements between the two parties in (1) = x w/ is considered to have facilitated the understanding of the relationship between personal pronouns and deictic verbs. In other words, the participants may have learned to remap their reciprocal relation to their family members, depending on who is saying the pronoun, 7 or to code the shifting of reference between the two parties. 11 Masataka 9 showed that, in TD children, there is a developmental stage in which gestures are correct but verbal expressions are incorrect, before appropriate verbal expressions become possible. However, it was deduced from this study that the appearance of deictic gestures as weak responses that occur at a constant frequency may lead to the occurrence of appropriate verbal expressions in children with ASD. In addition, the gradual removal of the prompt to move their body backward/forward and the shifting of the spatial relation from = to mix may also have affected the number of correct responses in daily life situations. Incidentally, in the final generalization phase, the rate of upper-body movement depending on verbs in all participants was found to be generally above 60%. It is inferred that correct responses (i.e., verbal expressions) occurred with self-produced body movements functioning as discriminative stimuli.
In view of the history of training to understand the direction in which the verbs were pointed, it is possible that Moe's correct responses did not occur sufficiently in this phase due to the high difficulty of deictic shifting. 8

Implications, Limitations, and Suggestions for Future Research
Findings from this study have important practical implications. In order to improve the use of deictic verbs among children with ASD, special educators should utilize prompt and fading techniques to teach them two types of behaviors: (1) looking in the direction in which the other person is facing and moving, and (2) spontaneously synchronizing their own bodies with the movements of the other person. Moreover, teachers can and should fade the prompts and bring the training closer to daily life situations. For data collection, it is impractical to use engineering devices, and researchers should apply methods that can be implemented by teachers. For example, teachers or researchers could record the direction of gaze (whether or not the child is facing the other person) and the occurrence/nonoccurrence of deictic gestures with a clear direction of movement (whether the child moves their arms or upper body backward or forward) and calculate inter-observer agreement.
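One simple way in which teachers could summarize such records is trial-by-trial percent agreement between two observers. The sketch below assumes this index, which is not necessarily the reliability measure used in Studies 1 and 2; the records in the example are hypothetical, and the function name is ours.

```python
def interobserver_agreement(observer_a, observer_b):
    """Trial-by-trial agreement (%) between two observers.

    Each argument is a sequence with one record per trial, e.g. True/False
    for occurrence/nonoccurrence of a deictic gesture.
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Both observers must score the same number of trials.")
    if len(observer_a) == 0:
        raise ValueError("At least one trial is required.")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)


# Hypothetical records for six trials (NOT data from this study):
# True = the child moved the upper body in the direction matching the verb.
teacher = [True, True, False, True, False, True]
researcher = [True, False, False, True, False, True]

agreement = interobserver_agreement(teacher, researcher)
print(f"{agreement:.1f}% agreement")
```

The same function could be applied separately to gaze records and gesture records, so that agreement is reported for each behavior.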
This study has several limitations. First, this study targeted children with ASD whose mother tongue was Japanese and did not examine the effect of language differences. In particular, the subject is generally not omitted in English, whereas the subject is often omitted in Japanese. In languages such as English, the stated subject automatically serves as a prompt for children with ASD. In contrast, in Japanese, they must infer the implicit subject from the conversational context. Thus, future research should analyze the deictic relational complexity 12 of language and focus on children with ASD whose mother tongue is a language other than Japanese. Second, upper-body movements that depend on the verbs used (i.e., the synchronization of body movements) did not consistently occur in all six participants when the correct response was uttered. Thus, depending on the participants' preexisting skills and baseline results, the intervention procedures introduced should be changed to examine whether deictic verbs are acquired in a smaller number of blocks. Third, we only examined whether generalization could be established for untrained sentences that included the verbs "come/go" within the home setting. Moreover, we did not collect data on the use of "come/go" in daily life situations before the intervention. In future studies, researchers should consider programming to establish generalization across multiple verb pairs and collect data within the home setting before and after the intervention to evaluate the changes. Fourth, it is necessary to investigate the developmental characteristics of weak body movements during the use of deictic verbs in children with ASD and TD children. In this study, we focused on upper-body movements based on assessment or baseline data. Only six children with ASD participated in this study, and we did not collect TD children's data for comparison.
In the process of acquiring deictic verbs, it is necessary to compare the two groups and statistically analyze the occurrence of weak body movements (e.g., upper-body movements) depending on verbs. Fifth, this study was mainly conducted in an experimental setting, and teachers did not provide training in elementary schools. Future research should build on the basic findings by addressing the first to fourth limitations and then build on the applied findings by incorporating them into educational settings.