Variations in measurement of interprofessional core competencies: a systematic review of self-report instruments in undergraduate health professions education

ABSTRACT Educating health care professionals for working in interprofessional teams is a key preparation for roles in modern healthcare. Interprofessional teams require members who are competent in their roles. Self-assessment instruments measuring interprofessional competence (IPC) are widely used in educational preparation, but their ability to accurately and reliably measure competence is unknown. We conducted a systematic review to identify variations in the characteristics and use of self-report instruments measuring IPC. Following a systematic search of electronic databases and after applying eligibility criteria, 38 articles were included that describe 8 IPC self-report instruments. A large variation was found in the extent of coverage of IPC core competencies as articulated by the Interprofessional Education Collaborative. Each instrument’s strength of evidence, psychometric performance and uses varied. Rather than measuring competency as “behaviours”, they measured indirect proxies for competence, such as attitudes towards core interprofessional competencies. Educators and researchers should identify the most appropriate and highest-performing IPC instruments according to the context in which they will be used. Systematic review registration: Open Science Framework (https://archive.org/details/osf-registrations-vrfjn-v1).


Introduction
Delivering modern high-quality healthcare needs workers with expertise, competence and the ability to work collaboratively as part of an interprofessional team. Preparing professionals for interprofessional collaborative work requires reliable and feasible assessment methods.

Assessment and outcomes
Interprofessional education (IPE) does not uniformly improve interprofessional competence (IPC), which refers to specific skills for specific tasks (Marion-Martins & Pinho, 2020; O'Keefe et al., 2017; Spaulding et al., 2021). Because "in-role" competence is a prerequisite for successful IPE, understanding why IPE varies in its effectiveness requires reliable and accurate measures of IPC.
Assessment is used variously, from grading students to contributing to course development and research, but IPC assessment tools often lack conceptual clarity (J. E. Thistlethwaite et al., 2014). Outcomes often reflect professional domains (e.g. medicine and nursing) and proximal outcomes, such as attitudes measured before or after a training session, rather than distal outcomes, such as demonstrable competencies at the end of a programme (Guitar & Connelly, 2021; O'Keefe et al., 2017; Wooding et al., 2020). IPE outcomes can be thought of hierarchically: at the base, interprofessional reactions, upwards through attitudes, knowledge, skills and behavioural change and, finally, patient benefits (Hammick et al., 2007). Using theory alongside outcomes can help provide clarity by specifying the professional practices included in IPC and describing them and their relationships to each other (Reeves et al., 2011; J. E. Thistlethwaite et al., 2014).
To move beyond profession-specific education and clarify IPC, a consensus framework was developed by the Interprofessional Education Collaborative (IPEC, 2016). The IPC framework has four core competencies: values and ethics, roles and responsibilities, interprofessional communication, and teams and teamwork.

Self-assessment as a response to the challenge of observing IPC
Ideally, competence should be assessed by observing professionals working in interprofessional teams, capturing their "real-world" performance in the context of complex interactions with other professionals and patients. Observing IPC systematically in authentic settings is, however, challenging (J. Thistlethwaite et al., 2016). Whilst tools such as observation rubrics for assessment can encourage fidelity and inter- and intra-observer reliability, they are no panacea (Curran et al., 2011). IPC changes over time, so assessment techniques must be able to efficiently capture this dynamic development (Anderson et al., 2016; Rogers et al., 2017).
Self-report instruments are a popular, pragmatic means of assessing IPC (Blue et al., 2015; Phillips et al., 2017). They are convenient to distribute, straightforward to analyse and report and allow comparison of findings between settings and over time. However, the validity of any comparisons and conclusions depends on the quality of the instrument. An instrument's quality can be reviewed, in part, from its systematic properties, such as various forms of internal validity and reliability (Havyer et al., 2016; Oates & Davidson, 2015; Thannhauser et al., 2010), and the empirical and theoretical adequacy of any development or validation processes (for example, the degree of non-responder bias in instrument development).
In this review, we focus on self-report instruments, as they are frequently used to measure the outcome of IPE. By increasing awareness of the benefits and limitations of specific instruments, an evaluation of existing instruments can enable an informed choice. The current literature lacks an overview of which components of the spectrum of conceptualisations of IPC are addressed by validated instruments. Moreover, judgement of the quality of an instrument depends on the context in which it is being applied. For example, when an IPC instrument is used for formal academic assessment, the criteria for judging it "high-quality" may emphasise good reliability and strong internal validity, whereas if the same instrument is used as a prompt for informal educational development or a training intervention, criteria such as ease of administration, speed and ease of completion and the strength of the underpinning behavioural or educational theory will carry more weight. Therefore, the contexts in which instruments are used are connected to the assessment of instrument quality.
There is a need to identify variations in the characteristics of self-report instruments and the ways they are used. To achieve the aim of exploring these variations, our review's primary objectives are to: (1) describe the components of IPC assessed in self-report instruments, (2) assess the quality of IPC self-report instruments, (3) describe the theoretical foundation for IPC instruments and (4) describe the educational contexts.
To achieve these objectives, we performed a review intended to support educators and to inform assessment and research using self-report instruments.

Method
We conducted a systematic review based on a published protocol (Allvin et al., 2020).

Eligibility criteria
Studies were eligible if they (a) used a quantitative or mixed-methods design, (b) assessed undergraduate students from two or more health professions, (c) related to educational interventions assessing one or more aspects of IPC, (d) used a self-report instrument regarding IPC and (e) evaluated the instrument's psychometric properties (e.g. validity and reliability).
Studies were excluded if they (a) failed to treat results from students and practitioners/faculty in the same sample separately, (b) used a self-report instrument unrelated to IPC or where the IPC focus was secondary to other aspects (where the major part of the instrument consisted of items not related to IPC, e.g. simulation as a learning modality or teamwork not relating specifically to interprofessionality), (c) limited the evaluation of psychometric properties to only reporting internal reliability (e.g. Cronbach's alpha), (d) reported only course satisfaction or (e) were not empirical research.

Information sources
We searched the PubMed (NLM), CINAHL (EBSCO), ERIC (EBSCO), Scopus (Elsevier) and Web of Science (Clarivate) databases following pilot searches in PubMed and Scopus. Covidence software was used to facilitate screening papers for eligibility, to record the quality assessments and to extract the data.

Search strategy
The search strategy was developed by the research team and an expert librarian. Details of the search strategies used in each database are included in Supplementary File, Table 1. The search was restricted to articles published in English between January 2010 and May 2023. The reference lists of included studies were also screened for additional studies. The searches were performed in April 2019, with an update in February 2023 (all databases except Scopus) and May 2023 (Scopus).

Study selection
Titles and abstracts were independently screened for eligibility by RA and SE. Potentially relevant articles were examined in full-text form by RA, and the screened articles were independently reviewed for eligibility by RA and SE. Inclusion decisions were made in consensus between RA and SE. CT was available as an arbiter in case of disagreement; there was none.

Data extraction and synthesis
A data extraction form based on the Medical Education Research Study Quality Instrument (MERSQI) (Reed et al., 2007) and the Best Evidence in Medical Education (BEME) guidelines (Gordon et al., 2019) was used. Two researchers (RA and SE) first extracted the data independently and then verified the data collaboratively, with CT acting as an arbiter in case of disagreement and in case of conflict of interest (one case). Quality was assessed using a coding rubric developed by Artino et al. (2018).
Author statements about the strengths and/or weaknesses of the instrument were also extracted (Guitar & Connelly, 2021). The extracted data were analysed and compiled into tables. The instrument items were analysed in relation to the four IPC core competencies of IPEC (2016) by RA and SE, followed by discussions in the local IPE community during an academic seminar attended by IPE educators and researchers, after which the analysis was further refined. No ethical approval was required, and no sensitive data were handled; therefore, no ethical vetting was performed.

Table 1 (excerpt). RIPLS (Spada et al., 2022): 14 items; agreement response scale; adapted from McFadyen et al. (2005), excluding items 10, 11, 12, 18 and 19. SEIEL (Mann et al., 2012): 16 items; unlabelled Likert-type, 10-point response scale.

Results
Of the 7671 identified articles, 38 met the inclusion criteria and were included for analysis (see Figure 1).

Characteristics of the included studies
The study characteristics are presented in Supplementary File, Table 2. Thirty-one studies used a post-intervention only design and seven used pre-post designs. Most studies included medical students (n = 32), followed by nursing (n = 31), pharmacy (n = 18) and physical therapy (n = 15) students. Medicine and nursing were the two professions that most frequently shared IPE (28 studies, 74%). These two professions, in turn, most frequently shared IPE with pharmacy (14 studies, 37%), physiotherapy (11 studies, 29%) and dentistry (9 studies, 24%) students. Thirty-one studies included between 2 and 7 professions, and 4 studies involved more than 10 professions. Most studies were conducted in the United States (n = 9). IPE was mainly delivered in activities within courses or programmes with undescribed content and structures (n = 11). IPE was also delivered in simulated settings (n = 8), classrooms (n = 6) and clinical practice (n = 2). Eight studies were unconnected to a specific activity, and eight studies provided no information regarding the educational context. The Readiness for Interprofessional Learning Scale (RIPLS) was the most commonly used instrument (23 studies). The timing of IPC assessment relative to the learning activity varied: alongside the intervention, without specifying before or after (Cloutier et al., 2015; Ganotice & Chan, 2018; Tyastuti et al., 2014; Zaher et al., 2022); before the intervention (Hasnain et al., 2017; Kerry et al., 2018; Kottorp et al., 2019; Lie et al., 2013); directly after the intervention (Al-Shaikh et al., 2018; Edelbring et al., 2018; Milutinović et al., 2018; Shimizu et al., 2022; Willman et al., 2020); and both before and after the intervention (Peeters et al., 2016; Yu et al., 2018). Two studies reported administering an online survey directly after the activity completion, without specifying the response period (Lunde et al., 2021; Schmitz et al., 2017). Twenty-one studies did not report the time from intervention to assessment.
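The percentages for shared IPE can be checked against the 38 included studies. The following is a minimal sketch that simply recomputes them from the counts reported above; the dictionary labels and the function name are illustrative and do not come from the review.

```python
# Recompute the reported shares of studies from the counts given in the text.
TOTAL_STUDIES = 38  # number of included studies reported in the review


def percent_of_total(count: int, total: int = TOTAL_STUDIES) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


# Counts as reported in the Results; the labels are illustrative only.
shared_ipe_counts = {
    "medicine and nursing": 28,
    "with pharmacy": 14,
    "with physiotherapy": 11,
    "with dentistry": 9,
}

percentages = {label: percent_of_total(n) for label, n in shared_ipe_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The rounded values agree with the figures reported in the text (74%, 37%, 29% and 24%).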

Included instruments
The 38 studies included in this review used a total of 8 self-report instruments (Table 1). Most studies (n = 34) adopted or refined a previously developed assessment tool. Four studies described instruments developed specifically for the study in question (Fike et al., 2013; Hasnain et al., 2017; Hojat et al., 2015; Mann et al., 2012). The number of items in the instruments ranged from 9 to 38. Most instruments used Likert-type rating scales with agreement response options.

Interprofessional core competencies in the instruments
As the IPEC competency tool and the IPECC-SET 38 (and its shorter versions, the IPECC-SET 27 and IPECC-SET 9) were all based on the IPEC framework description of IPC competencies, these self-assessment instruments matched at least some IPEC core competencies (see Table 2). However, some IPEC competencies were un- or under-represented in some instruments. The IPECC-SET 9 (Axelsson et al., 2022; Kottorp et al., 2019) and the JeffSATIC (Hojat et al., 2015) omitted measuring the values and ethics core competence. The IPEC competency tool (Lockeman et al., 2016), the JeffSATIC (Hojat et al., 2015), SPICE (Fike et al., 2013), SPICE-R (Peeters et al., 2016), SPICE-2 (Zorek et al., 2016) and SPICE-R3 (Axelsson et al., 2022) did not measure interprofessional communication; the three RIPLSs with 16 items (Cloutier et al., 2015; Tyastuti et al., 2014; Yu et al., 2018) and the IPEC competency tool (Lockeman et al., 2016) omitted measuring the roles and responsibilities competence. Many instruments (the IPECC-SET and the IPEC competency tool excepted) contained one or more additional items not related to any IPEC core competence (see Table 3).
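The omissions described above can be summarised as a small lookup structure. The sketch below encodes only the omissions stated in this paragraph and assumes, for illustration, that any competency not named as omitted is covered to at least some extent; Table 2 remains the authoritative item-level mapping. The names `STATED_OMISSIONS`, `covered` and `instruments_omitting` are hypothetical.

```python
# The four IPEC (2016) core competencies named in the review.
IPEC_CORE = frozenset({
    "values and ethics",
    "roles and responsibilities",
    "interprofessional communication",
    "teams and teamwork",
})

# Omissions exactly as stated in the text above (not an item-level mapping).
STATED_OMISSIONS = {
    "IPECC-SET 9": {"values and ethics"},
    "JeffSATIC": {"values and ethics", "interprofessional communication"},
    "IPEC competency tool": {
        "interprofessional communication",
        "roles and responsibilities",
    },
    "SPICE": {"interprofessional communication"},
    "SPICE-R": {"interprofessional communication"},
    "SPICE-2": {"interprofessional communication"},
    "SPICE-R3": {"interprofessional communication"},
    "RIPLS (16 items)": {"roles and responsibilities"},
}


def covered(instrument: str) -> set:
    """Competencies not stated as omitted (assumed to be covered to some extent)."""
    return set(IPEC_CORE - STATED_OMISSIONS[instrument])


def instruments_omitting(competency: str) -> list:
    """Alphabetical list of instruments stated to omit the given competency."""
    return sorted(
        name for name, omitted in STATED_OMISSIONS.items() if competency in omitted
    )


print(instruments_omitting("interprofessional communication"))
print(sorted(covered("IPEC competency tool")))
```

A structure of this kind makes it easy to ask which instruments leave a given competency unmeasured before selecting one for a particular educational context.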

Conceptual frameworks in IPC self-report instruments
The underlying theory or explicit conceptual frameworks for self-assessment instruments were under-reported. The rationales reflected a pragmatic need for IPC in healthcare and, thus, a need for IPC in professional education. Shimizu et al. (2022) and Sollami et al. (2018) drew on social identity theory to portray IPE, and attitudes to IPE, as a function of the relationship between groups of students. Mann et al. (2012) used socio-cognitive theory and self-efficacy constructs to evaluate student learning through interaction with their environments and the people and activities within these environments. Hasnain et al. (2017) and Kottorp et al. (2019) also suggested self-efficacy as a relevant approach to developing IPE. Zaher et al. (2022) framed their study using relational coordination theory and situated learning theory.

Discussion
Our review identified eight self-assessment instruments that measure IPC in healthcare undergraduates and vary in two distinct and important ways. First, psychometrically: measures vary from reliable to unreliable and range from strong internal validity to questionable internal validity. Second, this performance comes from a narrow range of interprofessional learning evaluation contexts, mainly involving medicine and nursing students. All the included studies lacked a strong explicit theoretical base. Whilst the IPEC core competencies were all represented to varying extents, some received less attention than others, notably interprofessional communication. The study authors did not explain why.
IPC developmental educational interventions are an example of complex interventions (Skivington et al., 2021). This complexity makes IPC and its associated assessment intellectually challenging, as it requires specifying relationships between concepts, direction and causality. However, assessing groups of students simultaneously is also challenging practically: it needs to be feasible and efficient. Ease of administration and presentation of results means that self-report measures will likely remain. Methodologically, though, a further challenge remains: reported intention in learners does not necessarily translate to observable behaviours; what people say they will do or feel may not equate to what they actually do or feel in a situation or interprofessional context (McConnell et al., 2012; Rogers et al., 2017). Our review suggests that IPC instrument developers and evaluators omit this important implementation consideration, arguably overestimating instruments' utility in non-classroom (virtual, simulated or actual) contexts. An educator or researcher seeking an instrument for IPC assessment needs the highest quality, most trustworthy instrument for the intended use and context. This necessitates assessing the instrument content, quality of evidence (psychometric properties) and quality of application (user feasibility). Therefore, we have summarised the measurement characteristics of each self-assessment instrument to facilitate decision-making by educators and researchers seeking to choose an appropriate instrument (see Table 3). Any choice between instruments will involve trade-offs between these three important aspects of quality.

Table note: RIPLS 14 items (Sollami et al., 2018): item wording not available.
If the educator is principally concerned with the psychometric properties (e.g. structural validity and reliability) of the instrument, then the IPECC-SET or SPICE are among the optimal instruments. However, the IPC challenge for the educator is to use the instrument in context. The context for the educator may differ from the context in which the scale was developed, for example, team surgery roles for a successful error-free operation versus team roles in a successful transfer into an emergency department from an ambulance. Therefore, the implementation potential is, in part, a function of the generic nature of the competences assessed and the volume of repeated and reproduced evaluations in which the instrument has been applied. The contextual importance of scale development may help explain why one self-report instrument may demonstrate strong psychometric quality in one study but perform less well in another. Ideally, a judgement of instrument quality will come from multiple validations in varying contexts. Thus, we need more systematic replication of instruments for varying IPC challenges, a challenge faced by practice developers using other quality improvement methods (Ivers et al., 2014).
IPC instruments rarely demonstrate best practice in instrument development (Oates & Davidson, 2015). Badly formulated item wording is common, and 95% of the survey instruments used in health professional education have been found to contain badly designed and laid out items (Artino et al., 2018). Most self-assessment instruments in our review contained items formulated as statements and assessed with Likert-type rating scales (Al-Shaikh et al., 2018; Cloutier et al., 2015; Edelbring et al., 2018; Ergönul et al., 2018; Fike et al., 2013; Ganotice & Chan, 2018; Ganotice et al., 2022; Hojat et al., 2015; Kerry et al., 2018; Keshtkaran et al., 2014; King et al., 2012; Li et al., 2018; Lie et al., 2013; Lockeman et al., 2021; Luderer et al., 2017; Mahler et al., 2016; Milutinović et al., 2018; Onan et al., 2017; Peeters et al., 2016; Piogé et al., 2022; Pudritz et al., 2020; Shimizu et al., 2022; Spada et al., 2022; Torsvik et al., 2021; Tyastuti et al., 2014; Villagrán et al., 2022; Violato & King, 2019; Williams et al., 2012; Yu et al., 2018; Zaher et al., 2022; Zorek et al., 2016), which is a format that can lead to a form of bias (acquiescence) where any assertion made in a question is endorsed, regardless of content (Krosnick, 1999). Better practice is to formulate items as questions emphasising the underlying construct (Artino et al., 2011) (for example, "How confident are you that you can do well in this course?" instead of "I am confident I can do well in this course"). The negatively worded items used in versions of the RIPLS (Ganotice & Chan, 2018; King et al., 2012; Li et al., 2018; Torsvik et al., 2021; Tyastuti et al., 2014; Yu et al., 2018) can be difficult to comprehend and answer accurately (Artino et al., 2014). Furthermore, the negatively worded items need to be reverse scored in a sum score analysis. The RIPLS versions were mainly adapted from Parsell and Bligh (1999). While some authors have recommended scale revision (Tyastuti et al., 2014) to generate new items (Milutinović et al., 2018), no authors have discussed the need to update the wording of items or the response options, despite the instrument being developed prior to 1999. Using or adapting an existing instrument can be advantageous, for example, for encouraging cross-study comparisons. This argument may explain the popularity of the RIPLS. However, the differences in the structural validity of the RIPLS between contexts (Yu et al., 2018) show little support for this strategy. A better strategy is to establish validity by focusing on item quality in relation to intended use. Furthermore, calls have been made to move forward from assessing attitudes towards learning IPC to assessing these competencies per se (Torsvik et al., 2021).
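The reverse scoring of negatively worded items mentioned above can be illustrated with a minimal sketch. It assumes a 1 to 5 agreement scale and uses hypothetical item identifiers; it does not reproduce the item numbering or scoring rules of any RIPLS version.

```python
def reverse_score(value: int, low: int = 1, high: int = 5) -> int:
    """Mirror a response on a low..high scale (e.g. 1 <-> 5, 2 <-> 4)."""
    if not low <= value <= high:
        raise ValueError(f"response {value} is outside the {low}-{high} scale")
    return low + high - value


def sum_score(responses: dict, negatively_worded: set,
              low: int = 1, high: int = 5) -> int:
    """Sum item responses, reverse scoring the negatively worded items first."""
    total = 0
    for item, value in responses.items():
        if item in negatively_worded:
            total += reverse_score(value, low, high)
        else:
            if not low <= value <= high:
                raise ValueError(f"response {value} is outside the {low}-{high} scale")
            total += value
    return total


# Hypothetical respondent: item_3 is negatively worded, so its 2 counts as 4.
responses = {"item_1": 4, "item_2": 5, "item_3": 2}
print(sum_score(responses, negatively_worded={"item_3"}))
```

Forgetting this step lets agreement with a negatively worded item raise the total score, which distorts sum scores and internal consistency estimates.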
In this review, the psychometric evidence varied. Content validity was generally appropriate, evaluated using experts and students and/or based on the literature and previously developed instruments. However, the absence of some IPC aspects (e.g. values and ethics, interprofessional communication and roles and responsibilities) meant that conclusions about overall IPC are, or should be, limited.
Many studies performed factor analysis to assess structural validity. Confirmatory factor analysis (CFA) provides information on how well an existing scale partitioning fits in a certain context, thus contributing useful psychometric data and making it possible to generalise across contexts. We also found uses of exploratory factor analysis (EFA). While adding to judgements of psychometric quality, conclusions drawn from EFA may deviate from the original, theory-driven scale partitioning. One study used EFA to restructure IPEC-based scales, leaving only two (teams and teamwork and values and ethics) of the four core competencies for future recommended use, which is a questionable result of EFA practice (Lockeman et al., 2016).
Generally, high total levels of internal reliability (Cronbach's alpha) were reported. Cronbach's alpha, however, is affected by the length of the scale, so reported alpha values may not reflect the internal consistency of items or unidimensionality of the scale, but may derive from a large number of test items (Streiner, 2003). The subscale (factor) roles and responsibilities in different versions of the RIPLS showed very low values, which could be due to a low number of items, poor interrelatedness between items or a heterogeneous construct. Criticism has been directed at the use of Cronbach's alpha as a measure of dimensionality, and it has frequently been reported without adequate understanding and interpretation (Kalkbrenner, 2023; Tavakol & Dennick, 2011). The omega measure is proposed by Kerry et al. (2018) as a more accurate estimate of reliability because it is, for example, less sensitive to scale length. However, this estimate is not widely accessible in statistical packages and performs best only under certain data conditions (Kalkbrenner, 2023). Thus, more empirical and theoretical work is required to establish reliable common practices in establishing unidimensional measurement scales for IPC.
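The dependence of Cronbach's alpha on scale length can be made concrete with a minimal sketch. It uses the standard variance-based formula for alpha and the standardised alpha for k items with mean inter-item correlation r, k·r / (1 + (k − 1)·r). The mean correlation of 0.2 and the small data set are hypothetical illustrations, not values from the included studies.

```python
from statistics import variance


def cronbach_alpha(rows: list) -> float:
    """Cronbach's alpha from raw scores (one row per respondent, one column per item)."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardised alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Same modest inter-item correlation (hypothetical r = 0.2), different lengths:
for k in (5, 10, 20, 38):
    print(f"{k:>2} items: alpha = {standardized_alpha(k, 0.2):.2f}")

# Hypothetical responses from four respondents to three perfectly consistent items.
rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(f"alpha from raw scores: {cronbach_alpha(rows):.2f}")
```

With the inter-item correlation held constant, alpha rises purely because items are added, which is why a high alpha for a long scale says little about unidimensionality on its own.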
Half of the studies in our review were intended for and subsequently conducted among English-speaking students.The performance of IPC self-assessment instruments in non-English-speaking students and/or using translated instruments remains uncertain.A rigorous translation process should consider both the instrument's theoretical origin and the target context.
The quality of a self-assessment instrument can also be judged functionally, i.e. in context and in relation to its fitness for purpose, structurally (the quality of underpinning empirical and theoretical evidence) and in relation to the process (implementability) (Donabedian, 1988). As with other educational research (Niemen et al., 2022), our review revealed a narrow theoretical base, with only six studies drawing on explicit theory: social cognitive theory (Bandura, 2001), intergroup contact theory (Allport, 1954), social identity theory (Tajfel & Turner, 1979), relational coordination theory (Gittell, 2011) and situated learning theory (Lave & Wenger, 1991). These theories also focus on learning and attitudes, but IPC relies on team members' performance (i.e. their behaviours). This gap in the relationship between exposure to educational interventions, assessment and eventual competence is an important omission in the evidence base.
Assessment is related to the intentions of the educational strategies being used. Interprofessional learning educational approaches vary greatly between universities, such as using problem-based learning, e-learning or simulation-based strategies (Aldriwesh et al., 2022). The sparse description of teaching and learning approaches in the identified reports meant that we were unable to assess these contextual factors. A related aspect, however, is the time it takes to develop IPC in a manner that is reflected in a self-report instrument. IPC was assessed directly after the educational intervention in some studies, which should be considered when interpreting these results. Interprofessional learning is a dynamic process, and developing IPC takes time (Rogers et al., 2017). The effect of time from educational exposure and the advantages and disadvantages of more dynamic research designs (for example, interrupted time series designs) remain an important uncertainty. Future research should focus on systematic replication and adaptation of formal and explicit theory-based IPC self-assessment instruments.

Limitations
Publications were limited to the English language; therefore, we may have missed non-English IPC instruments and evaluations. Information in the included studies was minimal, limiting completeness of data extraction. Time and resource constraints meant that we did not contact individual authors of the studies to provide more details. Some of the potentially relevant full-text articles were initially screened by one researcher, with the attendant risk of erroneously excluding studies due to screening fatigue. After the first screening, the articles were independently reviewed by two researchers, and inclusion decisions were made by team consensus.
There are minor changes from the protocol (Allvin et al., 2020) to the performed study: (a) the formulation of the objectives was clarified; (b) the Scopus database, which was omitted from the protocol, was added in the manuscript; (c) the exclusion criteria were elaborated; (d) the protocol mentioned the overall outcome as the effectiveness of instruments used to assess interprofessional competence, and to address this outcome in practice we directed our focus towards the characteristics of the instruments and how they relate to the IPEC core competencies as a means of gauging their effectiveness in measuring IPC; and (e) the protocol stated the quality assessment forms MERSQI, BEME, NOS-E and COSMIN as a basis for quality assessment, whereas in the study we chose to merge MERSQI, BEME and a rubric developed for item quality assessment by Artino et al. (2018).

Conclusion
IPC core competency domains are reflected to varying extents in different self-assessment instruments. Educators and researchers need to identify the most appropriate instruments for use in different contexts by considering both the quality of evidence and the quality of application. Interprofessional competence is a function of the work or educational context, so selecting an IPC assessment instrument should happen as part of the conception and design of the IPE intervention being considered. This review contributes an increased awareness of IPC aspects in self-report instruments and provides educators and researchers with a summary of the available IPC self-assessment instruments to guide instrument selection.

Table 2. IPC self-assessment instrument item numbers related to IPEC core competencies.

Table 3. Summary of measurement characteristics for each self-assessment instrument.