Reliability and validity of the script concordance test for postgraduate students of general practice

Abstract Background: The script concordance test (SCT) is a validated method of examining students’ clinical reasoning. Medical students’ professional skills are assessed during their postgraduate years as they study for a specialist qualification in general practice. However, no specific provision is made for assessing their clinical reasoning during their postgraduate study. Objective: The aim was to demonstrate the reliability and validity of the SCT in general practice and to determine whether this tool could be used to assess medical students’ progress in acquiring clinical reasoning. Methods: A 135-question SCT was administered to postgraduate medical students at the beginning of their first year of specialized training in general practice, and then every six months throughout their three-year training, as well as to a reference panel of 20 expert general practitioners. For score calculation, we used the combined scoring method and the calculator made available by the University of Montreal’s School of Medicine in Canada. To assess validity, students’ scores were compared with those of the experts; p <.05 was considered statistically significant. Results: Ninety students completed all six assessments. The experts’ mean score (76.7/100) was significantly higher than the students’ score across all assessments (p <.001), with a Cronbach’s alpha value of over 0.65 for all assessments. Conclusion: The SCT was found to be reliable and capable of discriminating between students and experts, demonstrating that this test is a valid tool for assessing clinical reasoning skills in general practice.


Introduction
The objective of the initial training of medical students is for them to acquire medical skills. To achieve this, they must master three areas: theoretical knowledge, professional skills and clinical reasoning. Professional skills and clinical reasoning are acquired during postgraduate medical training, including practical training [1,2], which relies on setting-specific practice and training. Currently, in France, no specific evaluation is performed to assess students' clinical reasoning, despite the fact that this is a crucial aspect of medical training that enables clinicians to process the information gathered in clinical situations [3].
Formative assessment plays a key role in the acquisition of medical knowledge [4,5]. It supports learning by allowing students to identify their strengths and weaknesses. It therefore seems important to assess the core competencies required for the specialty in question, so that students' progress can be followed during their clinical practice sessions [6,7]. A new aim in Europe is to harmonize practice and learning for medical students [8]. Each country uses different tests to try to assess the clinical skills of its students. The objective structured clinical examination (OSCE) seems to be the closest test to the ideal assessment of clinical skills [9,10]. In France, it has been used formatively in some places, but it is not used for summative (certifying) purposes because it is not reproducible on large samples and depends on the operators. In this context, certain researchers have examined tests assessing the overall medical skills of general practitioners (GPs) [11,12].
The script concordance test (SCT) was developed in Canada around 15 years ago and is used to assess students' clinical reasoning [13]. This tool reflects the extent to which the candidates' judgements map to those of a reference panel for the specialty in question in cases where there is clinical uncertainty [14]. The SCT offers a standardized assessment of the reasoning process applied to ill-defined clinical cases and has proven ability to differentiate between students and experts in relevant disciplines [15][16][17][18]. The SCT is, therefore, used to assess students' ability to reason through complex problems that cannot be solved merely by applying knowledge [19]. Any divergence between a student's response and that of the experts will allow the areas in which the student requires further training to be identified.
A previous study in general medicine compared the SCT with clinical reasoning problems (CRPs) but did not evaluate it as a valid tool for assessing clinical reasoning [20]. A standardized tool that identifies students who are having difficulty acquiring clinical reasoning skills, and that can be used in large samples, could help to improve teaching and to adapt, and hence standardize, practices.
The aim of this study was to demonstrate the reliability and validity of the SCT in the general practice context, and to assess whether this tool could be used to assess medical students' progress in acquiring clinical reasoning.

Methods
This was a longitudinal observational study conducted from November 2010 to November 2013. We first assessed the group of students at the beginning of their postgraduate training in general medicine in November 2010. They then took the same test each semester throughout the three years of their training, before each change in clinical placement. The students were given 20 minutes of training on how to complete the SCT before their first assessment. We acquired data for seven sessions (S0-S6). The first session (S0) took place before the beginning of the students' postgraduate medical training. S1 to S6 took place at the end of each clinical training semester.

The SCT
The SCT presents students with a series of uncertain clinical scenarios. Once the basic scenario has been introduced, three pieces of additional clinical information (items) are given, separately from one another. Students must then make decisions on the diagnosis, investigation and treatment for each of the three pieces of information offered, including answering three questions on a five-point Likert scale (Table 1) [21].
We used the 45-scenario SCT developed by the Department of General Medicine at the University of Liège, Belgium. This SCT was developed as a basis for admitting students to the general medicine course. Each scenario contains three questions, giving 135 items spread across 21 diagnosis scripts (63 items), 12 investigation scripts (36 items) and 12 treatment scripts (36 items). The scenarios covered a wide range of the fields involved in general medicine (Table 2). We needed to modify four sentences to clarify them for our students.

Construction of the reference panel
It is recommended that 15-20 experts be assembled as a reference panel to achieve stable scores independent of the composition of the panel [14]. In this study, experts were defined as persons with broad educational or organizational responsibilities in medicine, including, but also extending beyond, the practice level (e.g. at university or national levels). The reference panel was made up of 20 academic experts in general medicine. They took the same SCT as the students, to enable the establishment of benchmark scores. The experts took the test once, one month before the students' first assessment.

Administration of the SCT
Sessions S0, S1, S2, S3 and S6 were held in a classroom under supervision. The question sheets were collected at the end of each test, and there was no time limit for the responses. The students were asked to respond on answer grids commonly used for multiple-choice question (MCQ) responses, such that each potential response on the five-point Likert scale corresponded to a possible response to an MCQ item. This method allowed us to score the responses electronically: the grids were processed by an optical reader, which produced an Excel file. For S4 and S5, we used the online SCT platform hosted on the intranet of the medical faculty's continuing education division. Several studies have demonstrated the validity of online SCTs, and we wished to use an online process to verify its usability [22][23][24].

Scoring
The scoring system is designed to gauge the extent to which the candidate's script matches, or is similar to, that of some experienced doctors on a reference panel [1,19,25]. For each question, the number of points awarded to the examinees for each possible response depended on the number of experts who gave the same response. The global score was obtained by adding the scores for each question and transforming this into a 100-point scale [13]. For score calculation, we used the combined scoring method described by Charlin et al. [26], as well as the calculator made available by the University of Montreal's School of Medicine, Canada (cpass.umontreal.ca), explicitly dedicated to SCT.
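The principle described above (credit proportional to the number of experts who chose the same response, summed and rescaled to 100) can be sketched as follows. This is a minimal illustration of the widely used aggregate scoring approach, not a reproduction of the combined method or of the University of Montreal calculator. The coding of the Likert scale as -2 to +2, the function names and the example data are assumptions made for illustration only.

```python
from collections import Counter

# Assumed coding of the five-point Likert scale (not specified in the text).
LIKERT = (-2, -1, 0, 1, 2)


def item_credits(panel_answers):
    """Return the partial credit for each possible response to one item.

    Credit = (number of experts choosing the response) / (size of the modal
    group), so the most popular expert response earns 1 point.
    """
    counts = Counter(panel_answers)
    modal = max(counts.values())
    return {response: counts.get(response, 0) / modal for response in LIKERT}


def sct_score(student_answers, panel_by_item):
    """Sum the item credits and rescale the total to a 100-point scale.

    A missing answer (None) or an answer no expert chose earns 0 for the item.
    """
    total = 0.0
    for answer, panel in zip(student_answers, panel_by_item):
        total += item_credits(panel).get(answer, 0.0)
    return 100.0 * total / len(panel_by_item)


# Hypothetical two-item test answered by a five-expert panel.
panel = [[1, 1, 1, 0, 2], [-1, -1, 0, 0, -2]]
print(sct_score([1, -2], panel))
```

In this made-up example the student matches the modal expert response on the first item (1 point) and a minority response on the second (0.5 point).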

Statistical analysis
To describe the student sample, we used the mean, the standard deviation (SD) and the standard error of measurement (SEM), from which the 95% confidence interval (95%CI) was calculated. The reliability of the test was estimated by Cronbach's alpha coefficient for each session. To assess validity, the students' scores were compared with those of the experts using either a t-test or a paired Wilcoxon test, depending on variable normality, and p <.05 was considered statistically significant. STATA 12 software was employed for the statistical analysis.
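For readers unfamiliar with these quantities, the sketch below shows the classical formulas for Cronbach's alpha and for the SEM derived from it (SEM = SD × √(1 − alpha)), with a 95% interval of ±1.96 × SEM around an individual score. The analysis itself was run in STATA; this standard-library Python version, its function names and its toy data are illustrative assumptions, and the exact computation used by the authors may differ.

```python
from math import sqrt
from statistics import pvariance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a matrix with one row per student, one column per item."""
    k = len(score_matrix[0])
    item_variances = [pvariance([row[i] for row in score_matrix]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in score_matrix])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standard_error_of_measurement(sd, alpha):
    """Classical SEM: the part of the score SD attributable to measurement error."""
    return sd * sqrt(1 - alpha)


def individual_ci95_half_width(sd, alpha):
    """Half-width of the 95% interval around an individual score."""
    return 1.96 * standard_error_of_measurement(sd, alpha)


# Toy data (hypothetical): four students, three perfectly consistent items.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(cronbach_alpha(consistent))
```

Alpha approaches 1 when items rank students consistently and falls towards 0 when item scores are unrelated to one another.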
To evaluate the students' progress in clinical reasoning, we determined the difference in point scores between two successive sessions for each student. We then averaged this point difference for each session, which enabled us to compare the averages every six months.
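The progress calculation described above can be sketched as follows: for each student, take the difference between consecutive sessions, then average those differences across students for each pair of sessions. The function name and the scores in the example are hypothetical and are not study data.

```python
from statistics import mean


def mean_gain_per_session(scores_by_student):
    """Average point difference between successive sessions.

    scores_by_student: one list per student, ordered by session (S0, S1, ...).
    Returns one mean difference per pair of successive sessions.
    """
    n_sessions = len(scores_by_student[0])
    return [
        mean(student[t] - student[t - 1] for student in scores_by_student)
        for t in range(1, n_sessions)
    ]


# Hypothetical scores for two students over three sessions.
example = [[70.0, 71.0, 73.0], [68.0, 70.0, 70.0]]
print(mean_gain_per_session(example))
```

Because the differences are computed within each student before averaging, this approach only uses students who completed both sessions of a pair.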

Results
Ninety students from the total cohort of 135 completed all sessions (67%). The average expert score was 76.7%. The students' mean scores ranged between 68.9 and 73.1 and were lower than the experts' mean score in every session; the difference was statistically significant for each session (p <.05) (Table 3). Moreover, all Cronbach's alpha reliability coefficient values were higher than 0.6 (0.65-0.83). The individual 95% confidence interval (95%CI individual) was around 5.5 points, meaning that each individual score had a narrow zone of uncertainty (Figure 1). The test was feasible both on paper and on the internet SCT platform. However, we noted an average decrease in the students' scores when they took the SCT on the internet platform. Students' clinical reasoning progressed over the course of their three years of training, particularly during the first 18 months (Table 4).

Main findings
This study used the SCT to evaluate the clinical reasoning of postgraduate general practice students. We confirmed the reliability and the validity of the SCT, individually and for the sample, to assess clinical reasoning in general practice, and showed that students' clinical reasoning progressed mainly during the first 18 months of clinical training.

Strengths and limitations
The SCT has been demonstrated to be well accepted by students, whatever their level. This is probably because it is similar to clinical procedures in real life [27]. In our study, the students, all from the University of Toulouse, took the SCT for the first time and, apart from a brief introduction to the method, had no prior experience of the test. Nevertheless, they appeared to be well prepared for this type of test, and there were few missing answers, even though the test results had no impact on their overall education.
Our study involved a significant number of participants: 90 students every semester, at one university. To the best of our knowledge, only one other study has included a larger number of students (n = 202), bringing together postgraduate students in surgery from nine universities [28]. The participation rate in our study was high, but since it was conducted at a single institution, the generalizability of our results may be limited. However, our good sample size and narrow confidence interval (CI) support extending our results to the general population of postgraduate students.
The SCT we used was reliable: it comprised 135 items, against the minimum of 60 recommended in the literature, and 20 experts in the reference panel, against the suggested minimum of 15 [21,29,30]. In addition, the reliability measure, Cronbach's alpha coefficient, was greater than 0.6 for all sessions [26]. We also noted a significant difference between the students and the experts, confirming that the SCT is capable of distinguishing between them. This has also been reported in a number of comparative studies [15][16][17][18]. We could not compare the SCT to another test because there is no gold standard to evaluate clinical reasoning. The literature describes the OSCE, but it could not be applied to large cohorts such as ours [31]. Thus, we have a reliable and valid tool to assess the development of clinical reasoning in our postgraduate general practice students. To increase the efficiency of the test, we must now broaden the themes of the scenarios, as the test currently represents only part of the discipline, and remove non-discriminating scenarios.

Implications
Interestingly, the results differed by approximately three points from a total of 100 when students were assessed using a paper-based test in the classroom and when they took the online SCT, although progression was still evident, and their scores returned to a constant average by the end of the assessments. We cannot explain this phenomenon since other studies have shown the online SCT to be reliable [22][23][24]. It is possible that the students did not concentrate on the test when taking it at home as well as they might in a classroom setting, and that this influenced the results. We chose to use the online platform for logistical reasons, thinking that it would be easier to assess large numbers of students in this way and because some were working in remote areas. Moreover, we hoped to use the SCT as a self-assessment tool in the future. However, ultimately it proved simpler and more effective to use a paper-based test and score it with an optical reader than to assess students with the online SCT test. This practical dimension needs to be considered when integrating SCTs into a general medicine training programme.
If we examine the development of clinical reasoning during postgraduate general practice training, we note that the largest increase occurs during the first 18 months (1.5 versus 0.5 points). This is because during their postgraduate training, students are implementing the scripts they have built during their undergraduate years, including during their practical training. This progression in clinical reasoning seems logical and has been shown in different populations by comparing the scores between undergraduate students, postgraduate students and experts but never in the same sample over such an extended study period. One previous study reassessed postgraduate general practice students after three months but did not show any change in their scores [20]. Hence, developing clinical reasoning is a process that takes time. This also raises the question of memorizing and learning through repetition, since our test was repeated seven times. However, the stability of the results, with their slow linear progression, as well as the fact that there were no 'right' answers leads us to conclude that a learning effect was unlikely.
With additional analysis, the SCT is a tool that could be integrated into the curriculum to identify students in difficulty and those who progress slowly, and to enable the development of appropriate techniques to help them. To achieve this, we must plan a training programme that uses this technique and offers different methods of learning clinical reasoning [32,33]. Our study will allow us to conduct a second analysis to assess the impact of various clinical courses, as students change course every six months during their clinical training. This will enable us to better tailor courses for students in difficulty. In the future, we could improve our test by removing or replacing the least discriminating items.
At the end of a student's training, their teachers must certify whether they are a competent professional [12]. To do this, it is necessary to assess their theoretical and practical knowledge and skills. Only the SCT provides an overarching assessment of these skills and allows a comprehensive evaluation of the different facets of professional training [2]. If evaluation by SCT becomes one of the key tools in medical education, it could enable the standardization of the teaching and evaluation of students in general practice on a large scale, as is desirable for the future course of our speciality [9].

Conclusion
The SCT developed for this study was found to be reliable and capable of discriminating between postgraduate students and experts in general practice. These results demonstrate that the SCT is a useful tool for assessing clinical reasoning in postgraduate students of general practice. With additional analysis, we could propose this tool for monitoring progress in the development of clinical reasoning.