Peer reviewer training to build capacity in engineering education research

ABSTRACT The Engineering Education Research (EER) Peer Review Training (PERT) project aimed to develop EER scholars’ peer review skills through mentored experiences reviewing journal manuscripts. Concurrently, the project explored how EER scholars develop capabilities for evaluating and conducting EER scholarship through peer reviewing. PERT used a mentoring structure in which two researchers with little reviewing experience were paired with an experienced mentor to complete three manuscript reviews collaboratively. Using a variety of techniques including think aloud protocols, structured peer reviews, and exit surveys, the PERT research team addressed the following research questions: (1) To what extent are the ways in which reviewers evaluate manuscripts influenced by reviewers’ varied levels of expertise? and (2) To what extent does participation in a mentored peer reviewer programme influence reviewers’ EER manuscript evaluations? Data were collected from three cohorts of the mentored review programme over 18 months. Findings indicate that experience influenced reviewers’ evaluation of EER manuscripts at the start of the programme, and that participation can improve reviewers’ understanding of EER disciplinary conventions and their connection to the EER community. Deeper understanding of the epistemological basis for manuscript reviews may reveal ways to strengthen professional preparation in engineering education as well as other disciplines.


Introduction
Questions about how we know what we know and how knowledge is related to action have been posed for centuries, beginning with philosophers such as Plato (Epistemology-The history of epistemology | Britannica n.d.). Historically, researchers conceived of knowledge from a positivist perspective, with knowledge thought of as fixed and activated as needed to inform or guide problem-solving. Since the early 21st century, this view has been increasingly challenged, with theorists and practitioners both arguing that disciplinary knowledge is transactional, socially constructed, and essentially functional, continually adapting and updating through experience. These views have infiltrated professional education, challenging conventional practices in higher education about how to prepare students to be teachers, architects, medical doctors, and engineers (Coles 2002; Schön 2017).
Yet research on the epistemology of researchers is limited. How knowledge develops for engineering education research (EER) professionals is particularly interesting because, like many of the social sciences, EER is an interdisciplinary field shaped by the norms of the disciplinary origins of its members (Beddoes, Xia, and Cutler 2022). Some EER professionals were prepared in engineering education programmes. Others were trained as engineers with no previous expertise in education research, but their professional practice and intellectual interests motivated them to explore the teaching and learning of engineering. Still others migrated into EER from social science disciplines, having no previous training in engineering (Benson et al. 2010).
Regardless of their backgrounds, the paths of all professionals who study EER converge in manuscript review. Peer review of scholarship is critical to the advancement of knowledge in a scholarly discipline such as EER. Academia relies heavily on peer review, with nearly every facet of academic work evaluated, at least in part, by the peer review process. Indeed, publishing manuscripts, promotion and hiring, grant funding, awards, and in some cases, teaching evaluations rely on peer review (Hojat, Gonnella, and Caelleigh 2003).
For manuscript review in EER, peer reviewers apply their various perspectives and professional knowledge in assessing the quality and potential of a study to advance academic discourse and the practice of engineering education. Manuscript review is a discussion (sometimes a negotiation) between professional peers in the roles of reviewers, editors, and authors about effective and robust research practices (Lee 2012). At the same time, manuscript review has a weighty, gatekeeping function (Hojat, Gonnella, and Caelleigh 2003). The decisions made about manuscripts can have lasting effects on individuals, journals, and the profession itself. It is generally recognised that peer review strongly influences these decisions, but the basis on which peer reviewers evaluate manuscripts is little known or understood (Tennant and Ross-Hellauer 2020). Peer review has wide-ranging implications for research and academic communities. In research, peer review determines what is shared with the larger community and even what projects are conducted in the first place through distribution of grant funding (Langfeldt 2001). In academic communities, peer review inevitably wields power over who holds academic positions and whose voices are heard (Newton 2010; Lipworth and Kerridge 2011), and thus determines the level of inclusivity of a field as new scholars, ideas, and methods emerge. Collectively, peer review shapes academic disciplines and defines community values (Tennant and Ross-Hellauer 2020). Despite the enormity of these implications, scholars receive little or no training in effective and constructive peer review.
This study aims to explore the ways peer reviewers evaluate the quality and overall value of EER manuscripts, particularly with respect to their background and level of reviewing expertise. We anticipate that by examining the peer review process within the context of a mentored peer reviewer programme, we can advance knowledge about how mentors and mentees build shared understanding of the review process and perceptions of quality in EER research. This in turn will advance our ability to bring new scholars into the EER field and expand capacity as we develop vibrant, reflective networks of EER scholars.

Literature review
"Peer review clearly constitutes a social epistemic feature of the production and dissemination of scientific knowledge. It relies on members of knowledge communities to serve as gatekeepers in the funding and propagation of research. It calls on shared norms cultivated by the community. And it relies on institutions such as journal editorial boards, conference organizers, and grant agencies to articulate and enforce such norms." (Lee 2012, p. 868)
Not surprisingly, perhaps, researchers who have studied peer review typically focus on issues of reliability or convergence in the assessment of reviewers' ratings. The premise underlying these studies is that a manuscript has an inherent quality that can be assessed against the standards and conventions of an intellectual discipline, as long as reviewers are not corrupted by biases and/or inattentiveness during the review process (Merton 1979; Tyler 2006). Several studies have explored bias in the peer review process (Ceci and Peters 1982; Cole and Cole 1981). Typical findings include low correlations between reviewers, bias in single-anonymous reviews (in which reviewers know the author's identity) that favour eminent researchers, and biases that favour prestigious institutions.
There is scant research on the bases by which reviewers formulate their recommendations. In a study of what reviewers focused on for 153 manuscripts submitted to American Psychological Association journals, researchers examined the proportion of reviewer comments related to the conceptualisation of the study, design, method, analysis, interpretations and conclusions, and presentation (quality of expression) (Fiske and Fogg 1990). Two-thirds of comments overall were related to the planning and execution of the study, and one-third related to the presentation. Conceptualisation (20%), analyses and results (22%), and interpretations/conclusions (16%) were also frequent weaknesses commented on by reviewers. The researchers found minimal consensus on publication recommendations across reviewers, although they found very few disagreements across reviewers about specific issues in the paper. Variability in the reviewers' recommendations may have resulted from individuals weighting specific strengths and weaknesses differently, as other researchers have found (Newton 2010). A mixed-methods study of perceptions of reviewers and editors of the review process in EER specifically provided similar insights into what reviewers tended to focus on: relevance of the topic, data collection or analysis, and theoretical frameworks (Beddoes, Xia, and Cutler 2022). Although these authors draw distinctions between EER and other disciplines based on their analysis of reviews of articles that were rejected or accepted for publication in an EER journal, in reality reviewers weigh similarly narrow sets of factors in other fields and reviews are similarly seen as potentially biased (Newton 2010; Fiske and Fogg 1990). Reviewers' comments to editors to justify or explain their recommendations vary widely and typically do not include the tacit criteria reviewers use to evaluate a manuscript; some journals do not even require justification of recommendations by reviewers (Tennant and Ross-Hellauer 2020). In many reviews, only the most prominent features of a manuscript (negative and positive) are likely to be mentioned in reviewer comments. This raises the question: what influences how reviewers weigh the various factors that result in their recommendation to editors on whether to publish a manuscript?
In deciding which of these factors bear the most and least weight in review criteria, reviewers will likely rely on previously formed schemata to guide their decision-making process (Newton 2010). Schemata (plural of schema) are general representations of knowledge which are typically abstract and used to fit into a given context (Anderson, Spiro, and Anderson 1978). All schemata comprise variables (tangible objects or actions) that help to build a foundation for this larger, abstract conceptualisation based on connections between variables (Anderson, Spiro, and Anderson 1978; Rumelhart and Ortony 1977). However, when an event is encountered that does not fit a previously built schema, the schema must either be tuned to account for this dissonance, or completely rebuilt into an entirely new schema (Rumelhart 1980). In the case of a schema for peer review, variables can consist of manuscript elements such as formatting, theoretical backing, and writing clarity. The assessments of these variables are formed based on personal, prior experiences and ultimately lead to inferences about an outcome (Rumelhart 1980), such as a recommendation to an editor.
Because schemata are based on individual experiences (including those encountered in professional situations), individuals with similar backgrounds are likely to have similar schemata. For example, in a study exploring how schemata develop in teams of individuals from different backgrounds with diverse schemata, as some individuals adjusted, their schemata were co-oriented (Rentsch et al. 1998). This allowed teams to better communicate and reach consensus decisions. When schemata were not co-oriented, discord occurred and teams were unable to communicate effectively, leading to task failure. Teams that developed similar schemata were ultimately more likely to accurately identify a problem and deeply explain the logic behind their thoughts or conclusions to build on the team's similarly formed schemata.
Working to 'tune' a schema can be a slow and arduous process, but guidance from a mentor with well-developed schemata eases this burden (Rumelhart and Ortony 1977). Much like team settings, apprenticeships allow for the co-orientation of schemata; however, apprenticeships use a scaffolded, stepwise approach under the direction of a coach. Austin (2009) detailed these steps through a theoretical model of apprenticeship for doctoral students in a seminar. Five specific steps were outlined in order from lowest to highest amounts of scaffolding: 1) Modelling: mentors model expectations for a working procedure with detail; 2) Coaching: students engage in the task with coaches providing formative feedback as needed; 3) Scaffolding: difficulty of the task increases with less direction from the coach; 4) Articulation and Reflection: students ask questions and articulate the underlying process (schema) they have learned; 5) Promote Transfer of Learning: the coach encourages the student to think about and apply the built schema elsewhere.
Much like cohesive teams with similarly built schemata, peer reviewers who have similar levels of peer review experience likely evaluate manuscript elements similarly. However, discrepancies are likely to arise when reviews are conducted by early-career faculty or graduate students who may have little or no EER or peer reviewing experience and poorly formed schemata.
In this study, we explore the relationship, if any, between peer reviewing experience and the way in which reviewers evaluate manuscripts. Aspects of manuscript evaluation include the tacit criteria for determining quality or value of EER manuscripts and the weighting of various elements of a manuscript. We examined these relationships within the context of a mentoring programme, guided by the following research questions: (1) To what extent are the ways in which reviewers evaluate manuscripts influenced by reviewers' varied levels of expertise? (2) To what extent does participation in a mentored peer reviewer programme influence reviewers' EER manuscript evaluations?

Peer reviewer training (PERT) program
The PERT programme was developed to provide peer review training to emerging EER scholars from different disciplinary backgrounds, framing the peer review process around mentoring and building up the EER community (Benson et al. 2021; Jensen et al. 2021; Watts et al. 2022; Jensen et al. 2022). The goal of PERT is for participants to build capacity in EER through a peer reviewer training programme that grows their professional network and fosters schema development for reviewing EER manuscripts. The PERT programme paired less experienced mentees with more experienced mentors in triads (one mentor and two mentees). After virtual training and orientation sessions together, in which mentors and mentees could network with each other and the programme team (coordinators, researchers, and evaluators), triads collaboratively wrote reviews of three manuscripts submitted to an EER journal (Figure 1). Research and evaluation data sources included five Structured Peer Review (SPR) forms, Think Aloud Protocols (TAPs), and exit surveys (Figure 1). Cohorts of up to twelve triads participated in the six-month programme. We report here the findings from three cohorts of the PERT programme conducted from January 2021 through May 2022. The first two cohorts completed all research and training activities in the mentored manuscript review programme; the third cohort completed only the pre-programme activities and went on to complete a mentored proposal review programme.

Participant recruitment and selection
Mentees were selected through a competitive, online application process that collected demographic information, professional background (Ph.D. discipline and year of degree), current position, relevant EER experience (e.g. publications, presentations, and reviewing history), confidence reviewing EER manuscripts, and the number of EER colleagues with whom they regularly interact. Participants were chosen based on a baseline level of experience (some EER training or education and at least in their last year of graduate study) and their desire to help advance EER through peer review. Special consideration was given to individuals deemed 'lone wolves' who were not well-connected to an EER network (Riley et al. 2017), diverse participants who may not have been previously connected to the EER community, and postdoctoral researchers. This process resulted in an overall acceptance rate of 38%. Mentors were invited to participate based on their experience in EER, recommendations from journal editors and colleagues, and their desire to help advance EER through peer review. Invited mentors were senior researchers and faculty members who had reviewed multiple journal papers or were previous members of an editorial board of an education research journal. In total, mentors and mentees represented 23 universities in six countries. Across all cohorts, mentees' professional levels included

Data collection
Think Aloud Protocols (TAPs) are designed to explore individuals' thoughts and reasoning processes as they work through problems or engage in self-regulated learning activities (Greene, Robertson, and Costa 2011). After orientation and prior to beginning manuscript reviews, all mentees and mentors were invited to complete TAPs; interviews were conducted with twelve mentees and five mentors from cohorts 1-3. During these individual virtual interviews, participants verbalised their review of a brief (~1500 word) pre-published manuscript (previously submitted to an EER journal with all identifying information redacted). Two such manuscripts were used for data collection by the research team; the manuscripts used for all of the TAP interviews are referred to as Manuscript A or B throughout this paper.
Researchers conducting the TAPs asked additional probing questions at the end of each manuscript section to ensure that participants verbalised all thoughts. Sessions were recorded and transcribed with all identifying information redacted.
Structured Peer Review (SPR) forms were designed to evaluate the criteria on which reviewers based their evaluations of manuscripts. The SPR is an online questionnaire that prompts participants to describe five notable strengths and weaknesses of a manuscript, recommend a decision to the editor (accept, minor revision, major revision, or reject), and provide a 200-word justification of their recommendation (Figure 2). Participants were instructed to complete their SPRs individually and then use them as a starting point for discussions within their triads (Benson et al. 2021).
Mentees and mentors were asked to complete an SPR prior to the triad's first meeting (Pre-SPR; Manuscript A for cohort 1 and Manuscript B for cohort 2), for each manuscript they reviewed as a triad (SPRs 1, 2, and 3), and after their final triad review was submitted (Post-SPR; Manuscript B for cohort 1 and Manuscript A for cohort 2). Manuscripts A and B were both ~1500-word manuscripts that had been submitted to a special edition of a peer-reviewed EER journal and were used with permission from the authors for our research purposes. For both the Pre- and the Post-SPR articles, the associate editor recommended 'major revision' after receiving recommendations of 'major revision' and 'reject' from the actual manuscript reviewers. In cohorts 1 and 2, 62 out of the 63 PERT participants consented to participate in the research study and submit SPRs. Only results from the Pre- and Post-SPRs are discussed in this paper.
After the conclusion of the programme (i.e. after their triad completed three manuscript reviews), exit surveys were distributed to cohort 1 and 2 participants. The exit survey included closed- and open-ended questions about programme expectations, impact on professional development and community, and recommendations for improvement of the programme (Benson et al. 2021). Survey response rates for mentees and mentors were 88% and 75%, respectively.
The data collected from TAPs and SPRs were used to establish baseline similarities and differences between mentors and mentees at the start of the programme in how they conducted reviews (TAPs) and the content of these reviews (SPRs). These data were used to address our first research question, which explored the extent to which reviewers' varied levels of expertise influence the ways they evaluate manuscripts. The SPRs and exit surveys collected after participants completed the PERT programme were used to answer our second research question, which explored the influence of participation in a mentored peer reviewer programme on reviewers' EER manuscript evaluations. The SPRs were primarily used to document changes in the content of manuscript reviews after participating in the programme; exit surveys provided insight into why or how these changes may have occurred.

Data analysis
TAP transcripts were analysed using open coding methods by two members of the research team. They identified coding events (meaningful passages to which codes should be assigned) within transcripts collaboratively for two transcripts, then independently identified coding events for the remaining transcripts. Through open coding, one researcher initially developed a set of 27 potential codes, and tested and refined the codes with the second researcher, resulting in 25 codes (Appendix A). They established consistency of coding through inter-rater reliability (IRR) by dividing the total number of agreed-upon codes by the total number of codes assigned within four transcripts. IRR for the two coders was calculated to be 73%. Although no standards exist for IRR for qualitative data, a reliability rating of 70% for open coding of phenomenological data can be considered an acceptable cutoff point (Marques and McCall 2005; Miles, Huberman, and Saldaña 2014). The researchers then independently and iteratively coded the remaining transcripts (n = 14), developing axial codes and categories. Coded sections of transcripts were extracted and analysed for relevant themes using thematic analysis (Braun and Clarke 2006).
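The percent-agreement form of IRR described above (agreed-upon codes divided by total codes assigned) can be sketched as follows. This is a minimal illustration under stated assumptions, not the research team's analysis script: the function name `percent_agreement`, the data layout (one code per coding event per coder, in matching order), and the example labels are all hypothetical.

```python
def percent_agreement(coder_a, coder_b):
    """Share of coding events to which both coders assigned the same code.

    Assumes both lists cover the same coding events in the same order.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same coding events")
    if not coder_a:
        raise ValueError("At least one coding event is required")
    agreed = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreed / len(coder_a)


# Hypothetical labels for five coding events (not data from the study).
coder_1 = ["formatting", "methods", "relevance", "methods", "clarity"]
coder_2 = ["formatting", "methods", "relevance", "results", "clarity"]

irr = percent_agreement(coder_1, coder_2)
print(f"IRR = {irr:.0%}")  # the coders agree on 4 of the 5 events
```

Simple percent agreement does not correct for agreement expected by chance; a chance-corrected statistic would be a different technique from the one the text describes.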
SPR codes were developed after collection of Pre-SPRs from cohort 1's open-ended survey responses pertaining to strengths, weaknesses, and recommendations for Manuscript A. Using thematic analysis (Braun and Clarke 2006), the same two members of the research team used open coding to identify responses that described similar features within the manuscript. These codes were reviewed and revised iteratively and then further refined through a process similar to that described for TAPs data analysis. To ensure that the SPR codes were comprehensive enough to capture strengths and weaknesses across a broad range of manuscripts, this process was repeated for cohort 2 using Manuscript B. All iterations required respondents to provide open-ended comments to justify their recommendations to the editor. In this subsequent analysis, only two new codes emerged.
Coding resulted in identifying six themes: Context, Methods, Results, Discussion, Mechanics and Structure, and EER Relevance. Within each theme, codes were organised as strengths (positive attributes) and weaknesses (negative attributes) (Figure 3). Once codes were finalised, they were inserted into the SPR form as checkbox lists that future respondents could select from within strengths and weaknesses (Appendix B). After each participant completed all three triad manuscript reviews, they were sent the Post-SPR manuscript to review. Participants identified strengths and weaknesses from the checkbox lists, then wrote 200-word, open-ended justifications to the editor.
Segments of the recommendation justifications were coded independently by two researchers using the SPR codes. IRR was calculated as the number of segments that reflected agreement between the two raters divided by total segments. Although some 200-word responses included the same code multiple times, any one code was only counted once per response. After IRR was determined to be greater than 70% for SPR analyses, analyses were conducted on segments that both coders identified as coding events. For analysis of these data, we report results for codes used by at least 50% of mentors or mentees, which we define as 'convergence', in response to the three SPR questions (strengths, weaknesses, and justification of recommendation to the editor). To account for potential differences in codes simply due to variability in manuscript content and quality, analysis results were compared across manuscripts. As illustrated in Figure 1, the manuscript used as the Pre-SPR for cohort 1 (Manuscript A) was used as the manuscript for the Post-SPR for cohort 2 and vice versa.
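The 'convergence' rule described above (a code used by at least 50% of a group, counted at most once per response) can be sketched as follows. This is an illustrative sketch only: the function name `converged_codes`, the list-of-lists data layout, and the example responses are assumptions, and the code labels are placeholders in the style of the SPR codes rather than study data.

```python
from collections import Counter


def converged_codes(responses, threshold=0.5):
    """Return the codes that appear in at least `threshold` of the responses.

    Each response is an iterable of code labels. A code repeated within
    one response is counted only once for that response.
    """
    if not responses:
        return set()
    counts = Counter()
    for response in responses:
        counts.update(set(response))  # de-duplicate within a response
    total = len(responses)
    return {code for code, n in counts.items() if n / total >= threshold}


# Hypothetical responses from four reviewers (placeholder labels).
group_responses = [
    ["E-3P", "C-2N", "C-2N"],  # repeated code counts once
    ["E-3P", "X-1N"],
    ["C-4N"],
    ["E-3P", "C-2N"],
]

print(sorted(converged_codes(group_responses)))
```

Running the same function separately on mentor and mentee responses and intersecting the two resulting sets would give the codes on which both groups converge.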
Exit survey data, which informed programme evaluation, were analysed by averaging closed-ended responses and thematically grouping open-ended responses into categories such as 'expanded EER network' and 'increased reviewing confidence'. These results provided insight into perceptions of reviewing skill development and EER community connections as an outcome of participating in the PERT programme.

Think aloud protocols (TAPs) revealed schema differences between mentors and mentees prior to participating in the mentored review program
Analysis of TAPs allowed us to identify similarities and differences in schema development between mentees and the more experienced mentors. All similarities, differences, and supporting quotes based on TAPs are presented in Figure 4. The primary similarity in schema between mentees and mentors is that they focused on formatting and grammar of the manuscript. Both began reviewing the manuscript with expectations of how it should be formatted, including what information should be in each section.
Beyond formatting, mentees had few expectations about manuscript quality or purpose at the start of their review. As mentees moved through the paper, their schema was fluid; they began to build their expectations of what contributed to the quality of the paper and its relevance to EER. As shown in Figure 4 under both 'builds schema while reading' and 'questions authors', mentees would often ask themselves questions about an author's intention or how a statement fit into the overall argument at the beginning of the manuscript. However, by the end of the article, mentees would forge ahead in developing their own interpretations of unclear components. Through building this schema, mentees maintained a holistic view of the paper, working to assess the entire manuscript's quality rather than specific sections. Mentees were more likely than mentors to want to read the entire manuscript before making judgements on specifics of the manuscript. Once the schema for manuscript quality and purpose was developed, if the manuscript deviated from their expectations, mentees tended to be more reactive to the manuscript and would question the author's intentions. When mentees experienced deviations from their constructed schema for quality, they often partly discredited the validity of the manuscript. A clear example of a mentee's reaction to a schema deviation is shown in Figure 4 under 'questions authors'. Throughout their reviews, mentees often referenced their lack of reviewing experience and would question if their judgements were 'right'.
Like mentees, mentors approached the manuscript with expectations of formatting and grammar. Unlike mentees, mentors approached their review with clear expectations of the manuscript's research quality and relevance to EER. Mentors would often review each section individually, comparing the manuscript to their pre-formed schema. When the manuscript deviated from these expectations, mentors would ask clarifying questions about the manuscript to understand the authors' intentions. While mentees also often questioned the authors' intentions, their questions were more rhetorical (i.e. 'Why did they do this?'), whereas mentors would ask and then provide possible explanations pulled from prior experience. One specific example of an interpretive question asked by a mentor is listed under 'makes interpretations' in Figure 4. Mentors would also often include guiding comments and suggestions to support the authors in revising their manuscripts in ways that aligned with the mentors' expectations of an EER manuscript.
Mentors primarily made comments within their areas of expertise and focused on the logical or research elements of the manuscript rather than the manuscript as a whole. When they made these comments, mentors were very confident and would often reference their prior experience as a reviewer.

Structured peer reviews (SPRs) provide evidence of schema development in mentees after completion of the mentored review program
The SPRs reinforced our findings from the TAPs that mentors came into the programme with more of a shared schema than the mentees. For both cohorts, at the start of the programme mentors were more likely than mentees to identify the same criteria when reviewing the same manuscript (Figure 5), based on their Pre-SPRs. While mentees only aligned on one code (E-3P) at least 50% of the time in their Pre-SPRs for both cohorts 1 and 2, mentors aligned on six (cohort 1 mentors) to seven (cohort 2 mentors) different codes.
This indicates that mentors came into the programme with more of a shared schema in terms of the criteria that they apply when conducting a peer review.
In contrast, upon completion of the programme, the Post-SPRs show that mentees identified shared criteria and were more aligned with mentors. In their responses to the Post-SPR for both manuscripts, mentees aligned at least 50% of the time on five codes for both cohorts 1 and 2. Similarly, mentors aligned at least 50% of the time on four codes for both cohorts 1 and 2 in their responses to the Post-SPR. Mentees aligned with mentors on three of these four codes (E-3P, C-2N, and C-4N). This alignment in codes after participation in the programme was consistent for the two different manuscripts used for training and evaluation purposes in this study. This provides evidence that through participation in a mentored review programme, mentees were able to enhance their schema development and become more closely aligned with mentors.
Codes relating to Context (C) showed the most convergence across mentors and mentees, and across different manuscripts. Codes C-2N (Theoretical framework is either not provided or uncompelling) and C-4N (Research questions are not stated clearly) were identified as major weaknesses of the manuscripts by mentors and mentees when reviewing both manuscripts A and B in their Post-SPRs. Code E-3P (Relevant to EER and/or Timely (e.g. COVID)) was identified as a major strength by mentors and mentees of both manuscripts in their Pre-SPRs and Post-SPRs. This could indicate that these criteria are some of the most important to reviewers in EER.

Exit surveys highlight the building of community of practice and increased confidence in peer review and research
In the exit surveys, mentees were asked to rate their perceived connection to the EER community before ('PRIOR') and after ('AFTER') the programme, using a Venn diagram format (McDonald et al. 2019). The averages of the responses are shown in Figure 6. There was a clear positive shift in mentees' connection to the EER community through participation in the mentored reviewer programme. When asked to explain the extent of this shift, mentees who reported a closer connection with the EER community mentioned an increased level of confidence and belonging as a result of the programme. One mentee reported: I saw the care for researchers and community that is embodied in the PERT program, and that made me feel much more safe to be part of the community. It has also been great to have so many opportunities to interact.
Similar sentiments were also expressed by the mentors, for example: My research and work looks at STEM from an interdisciplinary perspective. This has lead me to engage with EER community in a variety of different ways. By being intentional with my involvement pushed me further into engineering education than before.
The exit survey also provided evidence that the mentoring increased participants' confidence in executing various facets of peer review (Table 1). Participants rated the peer review programme as having increased their reviewing and research skills and confidence moderately and to a great extent.
In response to the exit survey question about connections between peer review skills and research skills such as identifying EER topics to research, framing research questions, designing studies, and preparing manuscripts, both mentors and mentees overwhelmingly agreed that there was a strong connection between peer review and research skills, and that the mentored reviewer programme helped improve those skills. Typical responses include: Yes, there is a good connection. The mentoring process has made me think about the main components of an article, also the different types of research that exist. I can look at my own paper to ensure that these components are included (Mentee).
Yes, my research skills have improved significantly due to my participation in the program.One of the main benefits has been understanding alignment in study design.As a reviewer, I always look for congruence between the problem/focus, theory, methodology, presentation of findings, and discussion of conclusions/implications.This perspective has made me more intentional in how I design and describe my own research.Having a keen eye for research quality as a mentor and being able to articulate its importance to the mentees has helped develop my ability to do the same as an author (Mentor).

Discussion
This study sought to understand the relationship between reviewing expertise and manuscript evaluation and the influence of peer review mentoring on this relationship. Our research was guided by the following research questions: (1) To what extent are the ways in which reviewers evaluate manuscripts influenced by reviewers' varied levels of expertise? (2) To what extent does participation in a mentored peer reviewer programme influence reviewers' EER manuscript evaluations?

Reviewing expertise, mentoring, and schema development
Mentors and mentees entered the PERT programme with varied levels of experience in peer review, resulting in clear differences in the schemata they drew from as they evaluated manuscripts, as illustrated in both the TAP and SPR results. Previous literature has identified that not only lack of experience but also differences in disciplinary expertise can lead to divergence in the definition of research or writing quality in manuscripts (Tennant and Ross-Hellauer 2020; Brezis and Birukou 2020). These differences are often compounded by a poor understanding of what defines 'quality' research in an emerging, interdisciplinary field such as EER (Tennant and Ross-Hellauer 2020). For these reasons, divergences such as those seen in our Pre-SPR and TAP data are not unexpected. However, by the end of the programme, mentors and mentees were more aligned in the criteria that they identified as important in their evaluations, a strong indication that their schemata had also become more aligned. Notably, codes relating to Context showed the most convergence between mentors and mentees for both manuscripts. This could indicate that, for EER researchers, criteria related to problem framing are the primary consideration in manuscript evaluation. The peer review study conducted by Fiske and Fogg reported a similar finding (Fiske and Fogg 1990).
The convergence of quality evaluations for manuscripts could possibly be explained by the increase in EER community integration among mentees. Although schemata are developed through individual experiences and understanding (such as those developed in a previous discipline), integration within a team or community can lead to convergences in group schema (Rentsch et al. 1998). While previous research has identified how similar team experiences can eventually lead to deeper understanding of others' schemata and, subsequently, co-orientation of schemata, such results have not been reported for peer review training (Rentsch et al. 1998).
In the PERT programme, co-orientation of schemata related to manuscript review also appears to have developed as less experienced mentees converged with more experienced mentors in their SPRs. Beginning with the mentor, triad members alternated leading reviews of three manuscripts so each gained experience with the full process of writing, refining and submitting reviews. This experience likely contributed to the alignment of schemata, due in part to the opportunity for mentees to be trained in peer review by their mentors. Future studies should further investigate factors contributing to this convergence and influencing co-orientation between mentees and mentors, specifically for reviews of manuscripts that vary more in quality and content.

Mentored peer review has implications for the field of EER
The arrangement of mentors and mentees into triads was designed based on cognitive apprenticeship, in which learners acquire knowledge through carefully sequenced authentic learning activities that allow them to develop expertise within a community of practice (Maher et al. 2013; Holum, Allan, and Brown 1991). Communities of practice often begin with exploring connectedness between a group and negotiations of what the practice may be (coalescing stage), eventually leading into active practice and adaptations to divergent schema (active stage) (Wenger 2008). Exit survey results showed that after participating in the PERT programme, mentees felt more connected to the EER community than when they started. These findings, along with the co-orientation of schemata, suggest that mentees involved in the programme shift from involvement in the coalescing stage of the EER community to the active stage. When members of the community make this type of shift, they become active practitioners (Wenger 2008). This active status has implications for the EER community, potentially allowing a wider diversity of young practitioners to engage in EER research and leading to a more inclusive, innovative community overall. The social construction of knowledge and shared ideas about what aspects of a manuscript to focus on during peer review can also lead to a shift in existing normativities for reviewing EER scholarship (Beddoes, Xia, and Cutler 2022). Future research should continue to explore changes to the larger EER community, including alterations to how quality research is defined and integrated into the field.
Table 1. Mentees were asked to rate the PERT programme's effect on the following facets related to manuscript review according to 1=Not at all, 2=Minimally, 3=Moderately, and 4=To great extent.

Broad impacts
Based on positive outcomes for the PERT participants, the triad mentoring structure (one mentor with two mentees who rotate through reviewer responsibilities) could be replicated with other journals in EER and beyond. Any replication would need to account for each journal's unique characteristics: because aims and scope differ, reviewers and editorial boards draw on different schemata when conducting reviews. The SPR codes we provide in Appendix B and other training materials on our website (EER peer reviewer training program, n.d.) offer guidelines for assessing the quality of EER manuscripts that could be useful both to reviewers in their evaluations and to authors in developing and revising manuscripts. Other interdisciplinary fields may also benefit from the community integration that peer review training can foster, as our community alignment results suggest. Through co-orientation of schemata and stronger connections with the scientific community, novice researchers and reviewers can become better situated within their field while bringing in their own experiences and innovations.

Limitations and future research
A limitation of this study is that it included peer review of manuscripts from only one journal; the participant selection process and the manuscript assignment and review process, which were key aspects of the PERT participants' experience, would be different for other journals. Although the application process for mentees was open to the entire engineering education community, mentors for the first cohort of the programme were invited by the project team based on their experience as reviewers.
Having mentors from within our own networks could have biased our findings. This was mitigated in subsequent cohorts by inviting mentees to be mentors based on recommendations from their own mentors. Another limitation of this study is that the manuscripts used for the Pre- and Post-SPRs were each only ~1500 words.
While the use of these abbreviated samples enabled the research team to collect a large number of reviews of the same manuscript, it raises the possibility that the content of these manuscripts does not reflect that of full research articles, which are typically ~8,000 to 10,000 words. Data collection also includes the three SPRs of full-length EER journal manuscripts for each triad, and future analyses of these SPRs will help address the limitations of using shorter manuscripts for training purposes. Additionally, we conducted TAPs with participants only before they took part in the programme. In subsequent cohorts we will also conduct Post-TAPs to more fully investigate shifts in the manuscript evaluations of mentees resulting from their participation. Future research will involve participants who have participated in multiple cohorts of the PERT programme. We will continue to collect and analyse data from subsequent cohorts, which will provide a more robust sample size for our analyses.

Conclusion
This paper explored the aspects of EER manuscripts that peer reviewers notice and comment on in their reviews and recommendations to editors within the context of a mentored peer reviewer programme.
Our data are unique in including reviewers' recommendations as well as justifications for those recommendations, and the strengths and weaknesses of manuscripts they identified. These preliminary findings suggest that the ways in which reviewers evaluate manuscripts are influenced by their level of expertise. We also provide evidence that peer review professional development in the form of mentored training can influence not only reviewers' EER manuscript evaluations but also how reviewers understand EER research quality. Evidence of the effects of mentored reviewing can build capacity in engineering education research by recognising that this type of training is a form of professional development for novice peer reviewers such as senior graduate students, postdocs or those making the transition into EER from other fields. This evidence also demonstrates that, because assessing research quality is informed by one's professional knowledge and experience, reviewers can learn from each other through the varied aspects of manuscripts that they each focus on. We are in the early stages of our study, yet we find implications from the data in terms of expanding expertise and building community. Most researchers receive little or no training in peer review. However, as increasing numbers of EER scholars are involved in peer review of journal and conference manuscripts, it is essential to consider the extent to which understanding of quality in EER research is shared. Notably, there was greater convergence between mentors and mentees in how they evaluated EER manuscripts by the end of their participation. This suggests that there are epistemological foundations upon which EER professionals evaluate manuscripts and that these conventions can become shared through peer mentoring. In a field as new and interdisciplinary as EER, discussions about the criteria by which we evaluate manuscripts can promote enhanced understanding of the research questions we pose and the methods we use to explore them. Deeper understanding of the epistemological basis for peer reviews of manuscripts can continue to reveal ways to strengthen professional preparation in EER as well as in other disciplines.
graduate students, postdoctoral researchers, and early-career faculty, averaging around five years of experience in research outside of their Ph.D. Mentors averaged over five years of experience beyond their Ph.D. Participants' disciplinary backgrounds were in social sciences, engineering, science, technology and engineering education. Triads (one mentor and two mentees) were formed based on participants' time zones and areas of expertise.

Figure 1. Activities completed as part of the peer reviewer training program. Each triad completed three journal manuscript reviews, going through the review cycle collaboratively for each manuscript. Each participant was asked to individually complete a Structured Peer Review (SPR) at the beginning of the program (Pre-SPR), for each of the three manuscripts they reviewed as a triad (SPR-1, SPR-2 and SPR-3), and at the end of the program (Post-SPR). Participants also completed Think-Aloud Protocols (TAPs) at the beginning of the program.

Figure 2. The open-ended Structured Peer Review (SPR) form distributed to participants prior to their first triad meeting was used to determine what criteria participants used to evaluate manuscripts when conducting their reviews and making a recommendation to the editor.

Figure 3. The six themes used for characterizing responses on the Pre- and Post-Structured Peer Review (SPR) forms: Context, Methods, Results, Discussion, Mechanics and Structure, and EER Relevance. Each of these themes had multiple codes organized as strengths (positive attributes) and weaknesses (negative attributes). For example, Context had four strengths (P for positives) and five weaknesses (N for negatives).

Figure 4. Summary of mentor and mentee similarities and differences in manuscript review based on analysis of Think-Aloud Protocols (TAPs).

Figure 5. Comparison of aspects of a manuscript that at least 50% of mentors and 50% of mentees commented on in their reviews of Manuscripts A and B before (Pre-) and after (Post-) participating in the PERT program, based on Structured Peer Review (SPR) data.

Figure 6. Response options to the following questions on the exit survey: 'Which of the images below best characterises your connection to the EER community PRIOR to participating in the PERT Program?' and 'Which of the images below best characterizes your connection to the EER community AFTER participating in the PERT Program?'.