Development of a Sign Repetition Task for Novice L2 Signers

ABSTRACT ENGLISH: There is a lack of tests available for assessing sign language proficiency among L2 learners. We have therefore developed a sign repetition test, SignRepL2, with a specific focus on the phonological features of signs. This paper describes the two phases of developing this test. In the first phase, content was developed in the form of 50 items with sentence lengths between one and three signs. Then, when a period of teaching revealed a ceiling effect in the first version, a second version was developed with 40 items varying between one and four signs. Test scores revealed increasing proficiency in Swedish Sign Language during education, and that mouth actions have a lower degree of accuracy than manual parameters.


Introduction
Research has yielded only a few tests suitable for assessing sign language proficiency in different groups of signers (e.g., L1 or L2 signers, adults or children). Despite the growth of sign language tests and assessments over the past two decades, few are available for different purposes (educational or research-based), especially for sign language as a second language (Schönström et al., 2022). In addition, considering the number of existing sign languages in the world, tests have been developed for only a few of them. This article describes a recently developed test that measures aspects of sign language proficiency in adult hearing L2 learners of Swedish Sign Language (Svenskt teckenspråk, STS) learning sign language for the first time. The starting point for the test is a sentence repetition framework. Sentence repetition tests have recently been used as a framework for several sign languages (see below). These sentence repetition tests have, however, been developed for and validated with deaf signers. The test described in this study, the Sign Repetition Test for L2 STS signers (SignRepL2), is designed specifically for hearing L2 STS learners, a group rarely served by sign language assessments. There is a clear need for a test that measures the proficiency of those learning a sign language as a second language. Such a test would also prove valuable in multilingual research, where it could contribute to a comprehensive multilingual profile of individual linguistic skills. This article begins with a literature review of relevant earlier sign repetition tests and research on sign L2 acquisition, after which we present the methods and results of the SignRepL2 test, focusing on validation issues related to sign L2 acquisition. One aspect of sign L2 acquisition that has attracted particular interest is phonology, and this has been the point of departure for our own test design. As will be described in the literature section below, sign L2 acquisition research often points to sign pronunciation and sign phonological knowledge as challenging for L2 signers at earlier stages. One issue may be the simultaneity and spatiality present in signed languages, which make each sign more information-packed and "bigger" in pronunciation compared to words in spoken languages. The design of SignRepL2 therefore focuses in particular on sign pronunciation and phonology as part of sign language proficiency, without leaving other structures (morphology and syntax) behind. In our description of SignRepL2 here, we are particularly interested in describing the development of test items and the scoring method. As scoring emphasises phonological accuracy, we also discuss this at the end of the article in light of the cross-modal learning situation facing our target group.

Sign repetition tests
In recent years, several studies have shown that repetition tests are efficient and reliable tools for measuring language proficiency among both first-language (L1) and second-language (L2) learners and users (Gaillard & Tremblay, 2016; Klem et al., 2015). This includes sign languages, as reflected in the recent development of sign language repetition tests (SRTs) for American Sign Language (ASL-SRT) (Hauser et al., 2008), Swedish Sign Language (STS-SRT) (Schönström & Hauser, 2022), German Sign Language (DGS) (see, e.g., Kubus et al., 2015), British Sign Language (BSL) (Cormier et al., 2012), Italian Sign Language (LIS-SRT) (Rinaldi et al., 2018), and Swiss German Sign Language (Haug et al., 2020). These SRTs have primarily been developed and validated through deaf L1 signers. Some, including ASL-SRT and STS-SRT, have been used with adults and children, while others, such as LIS-SRT, have been used with children only. These studies show that SRTs provide an effective model for testing sign language proficiency with psychometrically sound results. In short, the sign language SRTs developed thus far have shown that it is possible to test any aspect of sign language proficiency with relatively short administration and rating time.
One question raised in the literature is whether SRTs measure language proficiency or memory. Studies of both spoken and signed SRTs show that performance is related to long-term language knowledge rather than memory. The individual's language knowledge provides a scaffolding that supports working memory to increase the amount of linguistic information available. Furthermore, studies have shown that SRT performance for spoken and signed languages depends on a wide range of language processing skills (Gaillard & Tremblay, 2016; Haug et al., 2020; Klem et al., 2015; Supalla et al., 2014). In the case of ASL-SRT, Supalla et al. (2014) found that the error patterns of native signers were linked to semantic errors, whereas the error patterns of non-native signers were related to phonological errors. They suggest that this divergence is related to linguistic knowledge; i.e., refined phonological constructions and semantic representations of signs are probably linked to language knowledge rather than memory. Based on these studies, SRTs seem to be valid instruments for measuring language proficiency, even if they have some connection to memory: individuals with robust linguistic knowledge find it easier to remember the test items when performing an SRT than those with less knowledge, which explains the latter's lower scores.
Although previous SRTs (at least STS-SRT) were primarily aimed at deaf adult L1 signers, they have also been used with deaf children and hearing L2 signers. STS-SRT is considered too difficult for beginners and L2 signers, as was apparent in the early days of STS-SRT, when it was tested on hearing L2 signers and hard-of-hearing children, who are not primary users of STS. Given the great variation in sign language proficiency among deaf and hard-of-hearing children due to factors such as access to signing, the use of hearing aids, school placements, parents' language choices and so on, STS-SRT is not suitable for all children (Schönström et al., 2022). Moreover, when learning an L2 sign language in a second modality, a learner unfamiliar with the visual-gestural modality of sign languages struggles both with the language and with the differences in modality between spoken and sign language. The literature distinguishes between M1-L2 and M2-L2 learners (M stands for "modality") (Chen Pichler, 2011). As M1-L2 sign language learners have a sign language as L1 and learn another sign language as L2, they do not need to learn the modality itself. However, as M2-L2 sign language learners have a spoken L1 and then learn a sign language as L2, they must learn to express and use the language in a second modality; i.e., the visual-gestural rather than the auditory-oral.
In Sweden, many hearing students with a spoken L1 take courses to learn STS as an L2, particularly to become STS interpreters. To examine the most effective ways to teach STS (i.e., how to ensure students make good progress and acquire in-depth knowledge), the project Teaching Swedish Sign Language as a Second Language (UTL2) was launched at Stockholm University in 2016 (see Holmström, 2019, 2021). As this project needed a tool to examine, measure and assess students' STS learning during their studies beyond existing assessments of students' production, reception, and interaction used by the teachers, we developed the Sign Repetition Test for L2 signers (SignRepL2) using the STS-SRT test as a model. The main differences between STS-SRT and SignRepL2 are in scoring and sentence length. STS-SRT contains longer sentences, and responses are rated more holistically on a two-point scale (correct/error), meaning that if any part (phonological, morphological, syntactical, etc.) of the sentence item is reproduced incorrectly, zero points are awarded. If the entire sentence item is produced correctly, one point is awarded (Schönström & Hauser, 2022). The sentences in SignRepL2 are shorter and the scoring scale more gradual, with a five-point scale focused on the phonological features of signs (see below for a description).

Previous research into sign L2 acquisition
The development of linguistic SRTs such as SignRepL2 needs to be informed by research on linguistic features and sign phonology acquisition to contribute to construct validity. Relatively little sign language research has focused on L2 acquisition (see Marshall et al., 2021; Schönström, 2021 for summaries). Several studies have looked more specifically at L2 sign phonology (the inner formational aspects of signs) as this can address the modality of sign languages and its impact on phonological formation, which differs from spoken languages in that it is based on manual formations rather than sound production. As knowledge from these studies is important when developing sign language tests like SignRepL2, in this section we summarise earlier research into L2 sign acquisition with the emphasis on sign phonology as part of the description of the test's construct validity.

L2 acquisition of sign phonology
In general, previous studies of L2 learners' phonological acquisition confirm that sign language phonology is challenging due to its formational characteristics, which differ from spoken language phonology (Chen Pichler & Koulidobrova, 2016; Rosen, 2011). In contrast to spoken language phonology, sign phonology is not based on sounds. Instead, sign language linguists have identified hand, movement and location components as some core components, i.e., phonemes, that form and differentiate the signs (for further reading on sign phonology, see Sandler, 2012). These characteristics are also closely linked to motor production and motor skills (Hilger et al., 2015; Mirus et al., 2001; Rosen, 2004). Rosen (2004) conducted a qualitative study of the phonological acquisition of hearing ASL L2 signers, finding that perception and motor dexterity issues characterised errors in production made by these L2 signers. He describes errors related to the mirroring of sign features (i.e., L2 signers' production may be shaped by their perception of signs). Rosen also found that variation in the production of phonological features, such as substitution and displacement, was common among hearing L2 signers. This kind of mirroring, substitution and displacement is essential when outlining instructions for the scoring of SignRepL2. Mirus et al. (2001) found that ASL motor skills were closely related to phonology and variable in terms of the proximalization of movements and sign language experience and skills; for example, hearing adult L2 signers used more proximal joints (torso, shoulder, and elbow) than deaf adult L1 signers, who used more distal joints (forearm and wrist), which was reported as a matter of fluency and smooth production. Hilger et al. (2015) observed variability in production stability between signers with different experiences of ASL. Using motion capture technology to measure the spatial-temporal index, they found that hearing L2 signers had higher movement variability than deaf L1 signers, hence less stable production. They did, however, report that the most skilled L2 signers were approaching the target language level of production stability (Hilger et al., 2015). In developing the SignRepL2 test, it was therefore necessary to include items with different movements, as well as signs that may be performed differently by hearing L2 signers due to the use of proximal joints.
Using SRT data to examine phonological accuracy in hearing beginner BSL L2 signers, Ortega and Morgan (2015a, b) argue that sign production accuracy is linked to the iconicity of sign language; i.e., the relatively high degree of iconicity in sign languages is both an advantage and disadvantage to learners. According to Ortega and Morgan, while the iconic properties of signing help learners understand and memorise the signs, their gestural knowledge impacts the phonological structure of their sign production, resulting in errors or "L2 forms" (Ortega et al., 2019). These findings are important when choosing items for the SignRepL2 test, as well as when outlining instructions for scoring.
Some studies focus on identifying the easiest and most difficult phonological parameters to acquire. Bochner et al. (2011) developed a paired-comparison discrimination test of contrasts in ASL with L2 signers as a target group. Part of the test focused on ASL phonology and its four parameters (location, handshape, orientation, and movement) based on minimal pair contrasts. They found that L2 signers had the most difficulty discriminating between movements but found it easier to discriminate between handshape and orientation and easiest of all to discriminate location. They therefore propose an acquisition hierarchy for phonological features in which movement is the most difficult to acquire, a finding consistent with previous ASL research (Fischer et al., 1999). See also Beal and Faniel (2019).
Interestingly, in BSL, Ortega and Morgan (2015b) found that the location parameter was acquired first, followed by the movement and handshape parameters. For Swiss German Sign Language (DSGS), Ebling et al. (2021) found movement errors to be most common, while handshape was associated with the fewest errors. These contrasting results regarding the acquisition of phonological features may depend on the sign language studied (ASL, BSL, DSGS), the choice of method (sign repetition tasks vs. discrimination tasks), and the target group (naïve/beginner signers). More studies are needed to examine the acquisition of phonological features in more detail. In our SignRepL2 test, we therefore decided to make it possible to analyse phonological features in greater depth on the scoring sheet.
Chen Pichler (2011) focuses on handshape errors in L2 signers based on outcomes from ASL imitation tasks. Contrary to Rosen (2004), who claims that sign phonology is not transferable between languages due to modality differences (i.e., from an L1 spoken language to an L2 sign language), Chen Pichler argues that ASL L2 signers transfer their gesture knowledge into ASL L2 phonology and that the transfer of handshape forms can be explained in terms of markedness; i.e., the handshape errors L2 signers make when using unmarked handshapes are influenced by gestural knowledge.

L2 acquisition of signs
Another focus area for studies is the acquisition of a specific category of signs, often labelled as depicting signs or classifier constructions. Although vocabulary is often described in subcategories within sign linguistics, there are various takes on this issue. On the one hand, lexical signs are equivalent to the notion of "word" in spoken languages (i.e., lexical signs have a clear relationship between form and meaning independent of context) while, on the other, partly lexical signs do not gain full meaning until they are used in a specific context (Johnston & Schembri, 2010). One category of partly lexical signs is depicting signs (also often labelled as "classifiers" in the literature), which are highly contextualised signs describing entities, handling, or the size or shape of objects. This is a highly prominent category typical of sign languages. For example, a depicting sign representing 'sit' may be performed differently depending on the context, such as the location or size of the chair, number of chairs, etc. Previous studies show that depicting signs are particularly challenging for L2 signers to acquire (Boers-Visker, 2020; Ferrara & Nilsson, 2017; Marshall & Morgan, 2015; Schönström, 2021). In developing SignRepL2, it was therefore essential to include both lexical signs and depicting signs in the items.

Aim of the paper
The overall aim of this paper is to give an account of the development of an SRT aimed at L2 learners of STS and to explore whether this test is suitable for testing L2 proficiency in STS. In light of previous research on sign L2 acquisition, the SignRepL2 items and scoring protocol emphasise the phonological features of STS. The intention is to contribute to a better understanding of L2 STS assessment and what we can learn from assessing L2 signing. In the next section, we describe the development of the test.

Item development and selection
Based on knowledge generated by the previous research described above, the development of SignRepL2 started with a collection of STS signs and sentences of various forms and complexities, including lexical signs and depicting signs with varied handshapes, locations, and movements and varying degrees of iconicity. From this collection, one of the team members, an experienced teacher of L2 STS learners (and native in STS), selected a number of common signs and sentences of varying lengths and complexity that are also used in materials in the beginning STS courses. The chosen signs and sentences were then discussed with a team of teachers (all native in STS) to determine which should be included in the test. These discussions resulted in 50 test items: 30 single signs (10 one-handed signs, 10 two-handed signs with one active and one passive hand, and 10 two-handed signs where both hands are active); 10 two-sign sentences; and 10 three-sign sentences. The items ranged in complexity from one-handed signs with more unmarked handshapes and plain movements to two-handed signs with more marked handshapes and complex movements. In addition, the two- and three-sign sentence items consisted of different morphological components (e.g., number and negation incorporations and aspectual markings) and different syntactic structures (e.g., declarative sentences, subordinate clauses, and wh-questions). In sign languages, grammatical information is often expressed through facial expressions, such as raising or lowering eyebrows, and such features are included in the different syntactic structures. Mouth movement is also important in STS (for example, to differentiate between signs performed similarly) and all items therefore included mouth movements. Table 1 provides an example of three one-sign items, one two-sign sentence item, and one three-sign sentence item. For the one-sign items, the focus was on phonological variation. For the two-sign and three-sign items, the focus was also on morphological and syntactic variations. See supplementary material for a complete list of items and descriptions of the linguistic properties of each item.
As previously mentioned, issues related to SRT design include sentence length and the interaction between memory and language skills. The literature suggests drawing the line at around 4-7 words/signs (Gaillard & Tremblay, 2016; Haug et al., 2020). However, it is not always straightforward to equate words with signs when it comes to sign language sentences. The modality of sign languages allows for a higher degree of simultaneity in language production; i.e., what is interpreted as one single sign unit (manually) can consist of multiple simultaneous units. One sign can, for example, be modified (in the movement parameter) to express additional meaning (e.g., iteration in verb signs) and be accompanied by specific mouth actions to express adverbial information. We believe this kind of simultaneity adds complexity for hearing L2 learners of a sign language. So, contrary to previous literature, we used shorter sentences consisting of 1-3 manual signs and later four signs (in version 2).

Test design procedure
After selecting signs and sentences, we video-taped the third author's signing of the 50 selected items. Four examples were also recorded for a practice section to be completed before the test, so the participants could better understand the structure and task before taking the actual test. In addition, we recorded a native Swedish speaker explaining the test instructions in spoken Swedish. Among other things, the speaker instructed the participants to imitate the signing in the film as exactly as possible. The speaker also warned them that the signer might use one or more signs that participants were unfamiliar with (i.e., sign synonyms), but that they should nevertheless imitate the sign(s). The spoken instructions, four examples and 50 test items were edited into a single video, meaning that the test was the same during each session and for all participants. A grey screen appears after each item, for a duration that depends on the item length, during which the participant should reproduce the item (see Figure 1). The test should be conducted on a computer, and the camera should record the participant during the whole test procedure. The test takes approximately 12 minutes, including the information in spoken Swedish and the practice session.
Before the test was distributed to the target group (the hearing L2 learners of STS), we performed a pilot test with 12 hearing, non-signing participants. The pilot test assessed whether the instructions were clear and the technology worked. After the pilot test, we made a few adjustments to the instructions and increased the duration of grey screen time. We made no adjustments to the items as they all appeared to work well.

Scoring
As previously noted, several established SRTs, STS-SRT among them, use a two-point grading scale (Hauser et al., 2008; Schönström & Hauser, 2022). As our test consists of shorter items (i.e., one- to three-sign sentences) than those found in STS-SRT, we adopted a more nuanced grading scale sensitive to many linguistic properties, such as the phonology, morphology and syntax of the two- and three-sign sentences. Inspired by Ortega (cited in Gaillard & Tremblay, 2016), we applied a five-point grading scale (0 to 4); see Table 2.
The maximum possible score for the test was 200 (single signs 30 items = 120 points; two-sign sentences 10 items = 40 points; and three-sign sentences 10 items = 40 points).
We developed a scoring sheet gradually as the first author of the study began analysing the items and noting what was needed to rate them. We thereafter created an Excel spreadsheet on which we entered a score from 0 to 4 for each item and subscores for each participant. We created one tab for each participant (see Figure 2, with an example of item 24 from one participant) and one tab with an overview of all participants and items (see Figure 3). The detailed tabs included rows and columns for the phonological parameters handshape, orientation, movement, and location for each sign in the item (one in single signs, two in two-sign sentences, and three in three-sign sentences). Rows for morphology were optional, as only a few signs had morphological features. Each item also had one row for syntax (only applicable to the two- and three-sign sentences). We also added rows for mouth movements for each sign in the item. It should be noted that mouth movements are not usually treated as a phonological parameter in the literature. However, they are considered to play an important role in the meaning and understanding of signs (see e.g., Johnston et al., 2016). So, we decided to include mouth movements as a single part of the scoring, meaning that if one mouth movement in the item was incorrect, mouth movement was deemed incorrect for the whole item. Each part in the item (e.g., the parameters) was scored 1 point for correct form and 0 points for incorrect form or if the part was missing. The scoring sheet was adjusted several times during the initial analysis, and colours for different boxes were added until the raters found that the sheet covered all details needed for analysis. Figure 2 illustrates the detailed analysis of item 24, a three-sign sentence SCHOOL START AT-EIGHT 'school starts at eight o'clock'. This sentence consists of three signs where the last sign AT-EIGHT includes a morphological feature that is a modifier, i.e., number incorporation meaning 'o'clock'.
The analysis revealed that the first sign, SCHOOL, was not performed correctly because the participant did not perform a correct movement (i.e., one phonological parameter is wrong). Therefore, the whole sign was scored as incorrect (0 points). Note that the mouth is not included in the sign score but rated separately for the whole item (see yellow boxes and arrows). Here, too, the mouth movement as a whole was scored as incorrect (0 points) because one of the three signs had incorrect mouth movement. The second sign, START, was performed with all parameters correct, as was the third sign. The morphological feature (blue boxes and blue arrow) was also performed correctly. After this analysis, the rater counted the 0 and 1 points, resulting in two 0 points and four 1 points (see boxes with black circles). As shown in Table 2, the overall score for this item was 2 (see the box with a red circle), as more than half of the sub-points were awarded, i.e., 4 out of a maximum of 6 sub-points.
These detailed sub-scores of each sign in the items make it possible to analyse which parameters are harder to perform correctly and which signs or features are harder than others.
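The tally for item 24 described above can be sketched as follows. This is a minimal illustration rather than the authors' spreadsheet logic: the function names are hypothetical, the sixth sub-point is assumed to be the syntax row, and which of the three signs carried the incorrect mouth movement is chosen arbitrarily. The conversion from sub-points to the 0-4 item score follows Table 2 and is deliberately not reproduced here.

```python
# Hypothetical sketch of the sub-point tally for one item.
PARAMETERS = ("handshape", "orientation", "movement", "location")


def sign_point(params):
    """A sign earns 1 point only if all four manual parameters are correct."""
    return int(all(params[p] for p in PARAMETERS))


def tally(signs, mouth_per_sign, morphology=None, syntax=None):
    """Return (points awarded, maximum points) for one item."""
    points = [sign_point(s) for s in signs]
    # Mouth movement is one sub-point for the whole item:
    # a single incorrect mouth movement makes it 0.
    points.append(int(all(mouth_per_sign)))
    if morphology is not None:   # optional row, only for some items
        points.append(int(morphology))
    if syntax is not None:       # only for multi-sign items
        points.append(int(syntax))
    return sum(points), len(points)


all_correct = dict.fromkeys(PARAMETERS, True)
school = {**all_correct, "movement": False}   # incorrect movement, as in the text

awarded, maximum = tally(
    signs=[school, all_correct, all_correct],  # SCHOOL, START, AT-EIGHT
    mouth_per_sign=[False, True, True],        # assumption: which sign is arbitrary
    morphology=True,                           # number incorporation correct
    syntax=True,                               # assumption: syntax row is the sixth sub-point
)
print(awarded, maximum)  # 4 6
```

Keeping the per-parameter booleans, rather than only the item score, is what allows the later analysis of which parameters and signs cause most difficulty.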
Figure 3 is a screenshot of the compiled score sheet. Single signs are in light yellow, two-sign sentences in green, and three-sign sentences in blue. The test administrator manually scored the item (highlighted in dark yellow) as described above. All numbers come from the detailed sheets for each participant.
From the compiled score sheet, we could see the total points for each item for each participant and the total points for the whole test for each participant, as well as the signs, parameters and features that caused most difficulty. Each test took approximately 30-45 minutes to analyse using the score sheet. Two deaf, fluent signers who are trained linguists each scored the same participants' tests from the first session in order to ensure that the score sheet worked well and produced a similar result regardless of who was scoring.

University participated in the study. All participants were beginners with no previous knowledge of STS before instruction started, or with knowledge limited to the hand alphabet and some single signs. The students participated voluntarily. In total, 48 students have taken the test one or more times, 44 women and 4 men. The median age at the first test session was 22 years. Some participants only participated in the first session, some students participated in the second and/or third session, etc., while others participated in all three sessions. Students from both cohorts were tested three times during the first semester of the programme: before their first STS instruction, after approximately 100 hours (half a semester) of STS instruction, and after 200 hours (one semester) of STS instruction. Table 3 lists the test sessions and the number of participants.

Test procedure
The students were informed about the project UTL2 by the first author on the first day of the semester and were asked to participate in the study. If they agreed, they were asked to perform the test for the first time the following day. Students were allowed to leave lectures to take the test. Each student, in turn, came to a silent office and met one of the team members, who was present throughout the test procedure. The student was asked to sit in front of a computer and the test leader started the video, beginning with the instructions in spoken Swedish. After listening to the instructions, the student was asked if he or she had any questions. The student then performed the four test examples to ensure that they understood the procedure. When the student was ready, the test leader started the computer's camera for recording. The video with the test was then started and the student looked at each item once and repeated it while the grey screen was displayed. If the student needed more time, the test leader could pause the test until the student was finished. This procedure was the same in all test sessions and for all students.

Score results
Table 4 shows mean scores for each participant and cohort in the three testing sessions (T1, T2 and T3) (individual results are provided in the Appendix). Some values are missing for various reasons such as illness and dropout from the programme. Nevertheless, we included as many values as possible to calculate the statistics.

Test reliability
The reliability of the SignRepL2 test was measured for internal consistency of the test items (N = 50), resulting in Cronbach's alpha values of 0.904, 0.977, and 0.866 for T1 (n = 26), T2 (n = 23) and T3 (n = 23) respectively. This suggests good to excellent internal consistency for the test items. Test inter-rater reliability was measured for two raters' assessments of 13 participants' score results in T1. Due to the continuous nature of the score data variable, we used the Intraclass Correlation Coefficient (ICC) with a two-way mixed-effects model for single-rater measures (see Koo & Li, 2016). ICC reliability values between 0.60 and 0.74 are good and values between 0.75 and 1.0 are excellent (Koo & Li, 2016). The ICC for the SignRepL2 was 0.964 with a 95% confidence interval (CI) between 0.886 and 0.989, demonstrating excellent inter-rater reliability.

Greenhouse-Geisser calculation, as Mauchly's test of sphericity was not met, F(1.281, 14.087) = 54.077, p < .001, η² = 0.831.
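For readers unfamiliar with the internal-consistency measure reported above, Cronbach's alpha can be sketched with the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function below is a minimal illustration using only the Python standard library; the score matrix is invented toy data, not data from the study, and the source does not state which software was used for the reported values.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per participant; each row holds that
    participant's score on every item (here, 0-4 per item).
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented toy data: 5 participants x 4 items, each scored 0-4.
toy = [
    [4, 3, 4, 2],
    [2, 2, 3, 1],
    [3, 3, 3, 2],
    [1, 0, 2, 0],
    [4, 4, 4, 3],
]
print(round(cronbach_alpha(toy), 3))
```

The function divides by the variance of the total scores, so it is undefined when every participant obtains the same total.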

Test development
As shown above, by the third test session we saw a ceiling effect as the mean approached the maximum possible score. The project team therefore began work to improve the test. To this end, as the ceiling effect was most apparent in the single sign items, with most participants scoring almost 100% for single signs in both T2 and T3, we decided to reduce the number of single signs from 30 to 10 (i.e., the same number as the two- and three-sign items).
Of the 30 original single signs, we kept eight: three one-handed; two two-handed with one hand active and one passive; and three two-handed with double articulators. From these, we chose to keep one sign that most students performed correctly in the first test session, one that most students performed incorrectly, and one with an intermediate difficulty level. The two new signs were one with one active and one passive hand and one two-handed sign of a more complex construction with crossed thumbs and simultaneously moving fingers ('BUTTERFLY'). For the two-sign sentences, we increased variability by replacing one item deemed too similar to one of the items in the three-sign sentences (one sign was the same and the sentence structure was similar). The three-sign sentences remained unchanged. To increase complexity, we created 10 four-sign sentences with varied grammatical structures and types of signs (lexical, depicting, pointing). These included signs with different handshapes, locations, and movements (space, size, and shape signs, signs with different aspectual markings, etc.). The new sentences were chosen after discussion with the teacher team in the same way as with the initial items in version 1.
The maximum score for version 2 of the test was 160 (single signs 10 items = 40 points; two-sign sentences 10 items = 40 points; three-sign sentences 10 items = 40 points; and four-sign sentences 10 items = 40 points).
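The maximum scores for the two versions follow directly from the item counts and the 0-4 scale. The short sketch below reproduces that arithmetic; the dictionary and function names are illustrative only.

```python
POINTS_PER_ITEM = 4  # top of the five-point (0 to 4) scale

# Number of items per category, as stated in the text.
VERSION_1 = {
    "single signs": 30,
    "two-sign sentences": 10,
    "three-sign sentences": 10,
}
VERSION_2 = {
    "single signs": 10,
    "two-sign sentences": 10,
    "three-sign sentences": 10,
    "four-sign sentences": 10,
}


def max_score(items_per_category, points_per_item=POINTS_PER_ITEM):
    """Maximum test score: every item rated at the top of the scale."""
    return sum(n * points_per_item for n in items_per_category.values())


print(max_score(VERSION_1))  # 200
print(max_score(VERSION_2))  # 160
```

Because the maxima differ, raw totals from version 1 and version 2 are not directly comparable without rescaling.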
As a result of the revisions, version 2 had more complexity and was therefore more difficult, making it useful for testing higher degrees of proficiency for L2 STS signers. Version 2 was used on the two cohorts of students after 400 hours and 600 hours of classroom STS instruction. Table 5 lists the number of students in each test session.
Test session 5 had the fewest participants.This decline is typical as students tend to drop out of interpreter studies toward the end of the programme.

Score results
Table 6 shows the mean scores for each cohort and in total for test sessions 4 and/or 5 (individual results are provided in the Appendix).

Score comparison over time
To compare test results on repeated measures within the participants, a paired-sample t-test was performed listwise (n = 10). The Shapiro-Wilk test gave no indication that the differences between T4 and T5 deviated from a normal distribution (p = .516). The paired-sample t-test showed a statistically significant change over time between T4 and T5 scores: t(9) = 3.038, p = .014, d = .961 (M = 130.70 at T4 and 145.10 at T5; SD = 9.44 and 10.43, respectively) (Figure 5). Although the sample is small, we interpret this as a promising result for future testing.
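As a consistency check on the reported values, the effect size for a paired design can be recovered from the t statistic and the sample size. The sketch below assumes that the reported d is the standardised mean of the paired differences (often written d_z = t / sqrt(n)); the text does not state which variant was used, and the variable names are illustrative.

```python
import math

# Values reported in the text for the paired-sample t-test (T4 vs. T5).
t_value = 3.038
n = 10
mean_t4 = 130.70
mean_t5 = 145.10

# Assumption: d is Cohen's d for paired differences, d_z = t / sqrt(n).
d_z = t_value / math.sqrt(n)

# Mean gain from T4 to T5, and the standard deviation of the paired
# differences implied by d_z (mean gain divided by d_z).
mean_gain = mean_t5 - mean_t4
sd_of_differences = mean_gain / d_z

print(round(d_z, 3))
print(round(mean_gain, 2))
print(round(sd_of_differences, 2))
```

Under this assumption the computed d_z rounds to the reported value of .961, which suggests that the reported statistics are internally consistent.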

Test validity
In creating our test items (i.e., sentences), we have considered the modality difference as a crucial aspect of the L2 learning challenge. Furthermore, in SignRepL2, scoring in part emphasises sign phonology and reveals phonological errors, making the focus on the number of units inapplicable to SignRepL2. In sign language, many features co-occur: hands, arms, body, facial expressions, spatial locations, etc. These modality-specific features compensate for the shorter sentences in SignRepL2 compared to the number of units (words) in SRTs for spoken languages. While mainly informed by research on sign L2 acquisition, the validation process behind the test content and development phase also takes into account the third author's extensive teaching experience in sign L2 classrooms. Generally speaking, however, it is a challenge to reach appropriate content validity within sign language assessment due to limited research and knowledge in the area (see also Chapelle et al., 2022). In this context, the items were carefully developed by a team with extensive STS linguistic knowledge and demonstrated good-to-excellent internal consistency and inter-rater reliability, implying that the test scores are consistent, a conclusion supported by a sub-analysis of the phonological accuracy of the participants' responses in SignRepL2 (see below). We also added non-manual mouth action to the analysis, as we found this to be an essential feature of the signed modality and part of the simultaneity of STS.
As illustrated in Figure 6, the analysis of data from T1, T2, and T3 (version 1) reveals that mouth actions had a lower degree of accuracy than manual parameters. In T1, mouth actions were almost non-existent; however, in T2 and T3, mouth actions developed more than the other parameters. This finding indicates that (hearing) sign L2 learners focus to a high degree on manual sign production at the expense of mouth actions (cf. Rinaldi et al., 2018). In T1, before the participants learned STS, they usually omitted mouth actions, especially in items with two-sign and three-sign sentences. In T2 and T3, participants sometimes used mouth actions other than the examples provided, indicating that they could recognise the manual sign but did not register the correct mouth action, which is often the only difference between two manually similar signs. This confusion often resulted in the participant using a familiar mouth action rather than the correct mouth action (e.g., "man" rather than "boy" and "cause" rather than "therefore").
Meanwhile, location is the parameter most often produced correctly. This is followed by orientation, handshape, and movement. The difference between these three parameters is minimal and offers the barest indication that movement may be the most difficult manual parameter to recall. This finding is in line with previous research, especially that conducted on ASL (e.g., Bochner et al., 2011; Fischer et al., 1999; see also Ortega & Morgan, 2015a, b), which concluded that the movement parameter is the most difficult to acquire for sign L2 learners. For example, many participants did not register the number of repetitions in signs such as CLEAN, NEVER, OFTEN, and COFFEE. In STS, these signs are repeated twice, but in T1 and T2, the participants often used more continuous repetitions (i.e., three or more repetitions). Another movement error revealed in our test sessions was the direction of movement; for example, for the signs RUN-AWAY and GET-THERE, the participants often performed a side movement rather than a straightforward movement. Other errors were observed for signs with back-and-forth or up-and-down movements where the participants used circular movements (e.g., in HOW and PLAY) and flexing instead of straight movements (e.g., in YOU).
The underlying scores for SignRepL2 demonstrate the suitability of the test items for measuring the participants' proficiency in phonological and mouth features in line with previous research on sign L2 acquisition, supporting construct validity.
Detailed scoring of the phonological parameters also allows us to identify qualitative patterns or recurring errors supporting construct validity. For example, fingerspelling and numbers are often mentioned as being particularly difficult for L2 signers to recognise and perform. When fingerspelling the word TAXI, for example, participants often simply moved their hands in a stream of unidentifiable handshapes. The concept 64 (years) was replaced with 46 or another number. Errors in non-manual grammar markers were also common, especially in T1 and T2, when participants often omitted these, leading to errors in syntax. This observation is in line with the omission of mouth actions; i.e., inexperienced signers often fail to notice non-manual grammar markers on the face as they fixate on the manual signing (cf. Rinaldi et al., 2018). The morphological feature for the sign VERY-BLUE, consisting of a complex emphasising movement, was also very difficult for many participants, even in later sessions.

Discussion
As previous studies have shown about SRTs in general (e.g., Gaillard & Tremblay, 2016; Klem et al., 2015), SignRepL2 also appears to be a useful tool for measuring some aspects of STS proficiency among L2 learners, alongside other tests measuring, for example, communicative competence. The SignRepL2 test provides holistic scores that make it possible to compare different learners with each other at different stages in STS education, as well as the individual's development from one test session to the next, and thus the students' progression over time. One common problem with sign language test development (and sign language research in general) is obtaining sufficient sample sizes for statistical power. Overall, a sample size of 30 should be appropriate for reliability analysis (e.g., Cronbach's alpha), but according to Bujang et al. (2018), this also depends on the number of test items. Our sample size was at best n = 26, which is close to 30, while the number of items was 50. Also, the small sample of n = 10 for the paired-sample t-test is not ideal, but it is workable depending on the probability thresholds and the desired effect size. For future studies, it would be good to update the analysis with bigger sample sizes to better ensure the test's reliability.
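To make the reliability measure mentioned above concrete, the following is a minimal sketch of how Cronbach's alpha can be computed from a participants-by-items score table. The data are hypothetical (they are not the study's scores), the 0 to 4 item range is an assumption, and the function name is illustrative.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items table of scores."""
    n_items = len(scores[0])
    # Sample variance of each item (column) across participants.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of the participants' total scores.
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 participants x 4 items, each scored 0-4 (assumed range).
example_scores = [
    [4, 3, 4, 3],
    [2, 2, 3, 1],
    [3, 3, 3, 2],
    [1, 0, 2, 1],
    [4, 4, 4, 3],
]

print(round(cronbach_alpha(example_scores), 3))
```

With few participants, the variance estimates that feed into alpha become unstable, which is one reason why larger samples are desirable for reliability analysis.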
As mentioned in the literature review, learners unfamiliar with the visual-gestural modality of sign languages struggle with the language and the modality difference between spoken and sign language. Hearing L2 learners must learn to express and use the language in a second modality. This is considered a more complex process than learning a second language in the same modality (i.e., having a spoken language as L1 and learning another as L2). This being the case, we assume that modality difference also impacts how M2-L2 signers perform in an SRT. Although the recommendation is that SRTs should consist of four or more units (e.g., words for spoken languages) to be reliable, we argue that the modality effect negates this recommendation when applying a test such as SignRepL2.
In addition to the holistic scoring, SignRepL2 also allows for detailed phonological examination through the scoring sheet, which has rows for the accuracy of parameters. This makes it possible to identify recurring errors and where the test takers have difficulties. As Supalla et al. (2014) found, the error patterns of non-native signers were often related to phonological errors, and these can be detected in SignRepL2's scoring system.
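The kind of parameter-level examination described above can be sketched as a simple tally over scoring-sheet rows. The layout of the sheet, the true/false coding, and the data below are all hypothetical and serve only to illustrate the idea; the parameter names follow those discussed in the text.

```python
from collections import Counter

# Parameters discussed in the text; the scoring-sheet layout is assumed.
PARAMETERS = ["handshape", "location", "movement", "orientation", "mouth_action"]

# Hypothetical rows: for each response, whether each parameter was judged
# correct (True) or incorrect (False).
responses = [
    {"handshape": True, "location": True, "movement": False,
     "orientation": True, "mouth_action": False},
    {"handshape": True, "location": True, "movement": True,
     "orientation": True, "mouth_action": False},
    {"handshape": False, "location": True, "movement": False,
     "orientation": True, "mouth_action": True},
]


def error_counts(rows):
    """Count how often each parameter was produced incorrectly."""
    errors = Counter({p: 0 for p in PARAMETERS})
    for row in rows:
        for p in PARAMETERS:
            if not row[p]:
                errors[p] += 1
    return errors


def accuracy(rows):
    """Proportion of correct productions per parameter."""
    errors = error_counts(rows)
    return {p: 1 - errors[p] / len(rows) for p in PARAMETERS}


# List the parameters from most to least error-prone.
for parameter, count in error_counts(responses).most_common():
    print(parameter, count)
```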
SignRepL2 revealed that learners' mouth actions had a significantly lower degree of accuracy than manual parameters. This is in line with the finding in Rinaldi et al. (2018) that non-manual components are "among the most difficult linguistic elements to be acquired and mastered. These difficulties could be linked to a possible cognitive overload, with a heavy burden on memory" (p. 417 f.). Rinaldi and his colleagues suggest that a greater capacity for simultaneous processing is needed to reproduce both manual and non-manual components correctly at the same time. This pattern is repeated by the students in our test sessions, whose mouth actions become more accurate in later stages of their STS education but who still omit mouth actions when reproducing longer and more complex sentences.
In the future, it would be valuable to add more in-depth analysis of the SignRepL2 and individual sub-scores to further examine the reliability and validity of the test. For example, a many-facet Rasch analysis would provide more information about the dimensionality of the scores. Further analysis of the sub-scores (e.g., handshape, movement, etc.) would also provide more information about inter-rater reliability on sub-scores and a better picture of construct validity.
We plan to use SignRepL2 version 2 to test L2 signers from the first session onwards, as in this study it was impossible to test version 2 in earlier sessions with the same participants. Version 1 will no longer be used. We hope that by using version 2 on more L2 signers, we will eventually establish standard criteria that can be used to evaluate the progress of other sign language learners. The evolution towards greater complexity from version 1 to version 2 of SignRepL2 makes the test useful for other purposes and contexts outside education. Although SignRepL2 has primarily been developed for hearing adult L2 signers, our results suggest that the test can be used in other contexts and for other, non-fluent target groups. The test could, for example, be used to assess the STS language knowledge and proficiency of deaf or hard-of-hearing (DHH) children. In Sweden, there is large variation in the linguistic background of DHH children, many of whom, especially those who use cochlear implants or other hearing aids, are not provided with opportunities to acquire STS from early childhood. Some of these children start to learn STS later in compulsory school, particularly in deaf schools or schools with programmes for hard-of-hearing students. The language proficiency of these children was recently studied in another project (DHT, deaf and hard-of-hearing children's bilingualism) managed by this paper's first and second authors. In this project, we used both the old STS-SRT test for native/advanced STS signers and the first version of the SignRepL2 test (together with other tests and assignments to assess written Swedish) to test the children's sign language proficiency. We found the STS-SRT test to be difficult for many children but the SignRepL2 to be more suitable (Schönström et al., 2021). Version 1 of SignRepL2 was used as the revised version 2 was developed after the data collection for DHT had started. The utility of SignRepL2 version 2 for young DHH learners will be evaluated in future projects.
We are also using SignRepL2 with deaf adult migrants learning STS in Sweden. The first and second authors are currently running a four-year project focusing on deaf migrants' multilingualism. In this project, we will use SignRepL2 version 2 to investigate deaf migrants' STS development during instruction. This may also reveal whether previous sign language knowledge contributes to faster and more accurate learning of STS. Using the SignRepL2 test with deaf adults learning STS will also help us generate knowledge about similarities and differences between M1-L2 and M2-L2 learners of STS and the development of STS in L2 students in general.

Conclusion
This article describes the development of a sign repetition test aimed at STS L2 signers. Much of the work has focused on the development of test items to demonstrate construct validity and the scoring method, informed by previous knowledge of sign L2 acquisition. SignRepL2 demonstrates good-to-excellent psychometric values, and the test results obtained by L2 signers mirror the improvement in their signing skills as they gain experience during the course; in other words, the test reveals different proficiency levels in the L2 signers after varied exposure times. This suggests that SignRepL2 has the potential to function as one of several measures of STS proficiency, not only in educational contexts but also for other purposes, such as when recruiting teachers, interpreters or support staff for deaf signing children, or in sign language programmes to assess students' progression. Moreover, the test results provide a means to qualitatively analyse errors and identify individual variations in sign production through the detailed scoring sheet, which opens possibilities for future analysis.

Version 2
Score results for SignRepL2 version 2, test occasions T4 and T5 (maximum score: 160 points). NA: not applicable; T: test. The numbers after UTL2 are participant codes to ensure anonymity.

Participant
-cream costs seven Swedish krona' A three-sign item. A main sentence including a sign with number incorporation.

Figure 4 shows the total score (maximum 200) achieved by the students in three test sessions (T1, T2, and T3) in mean and standard deviation. In order to reduce individual variability on scores, one-way repeated-measures ANOVA was run to compare the repeated scores among the 12 participants over time. The ANOVA analysis shows a statistically significant change in score within the participants (n = 12) over time, with epsilon (ε) = 0.640, using

Table 1. Examples of items from the SignRepL2 test.
A one-sign item. Single one-handed sign, unmarked handshape, plain movement toward the breast, contact, repetition MEETING 'meeting' A one-sign item. Single two-handed sign, both hands

Table 2. A five-point grading scale.

Table 3. Test sessions, hours of instruction, and the number of students (in total, from both cohorts) that took each test.

Table 4. Scores for SignRepL2 (maximum score: 200) for two cohorts of students and in total, in mean (M), standard deviation (SD) and range.
The test participants
Two cohorts of students (one starting in autumn semester 2016 and the other in 2018) in the Bachelor's Programme in Swedish Sign Language and Interpreting at [name]

Table 5. Test sessions and the number of students who took the new test.