Identifying a research agenda for postgraduate taught education in the UK: lessons from a machine learning facilitated systematic scoping review

ABSTRACT This research aimed to describe and evaluate research on the Postgraduate Taught (PGT) sector in the UK from January 2008 to October 2019. The focus on PGT allowed a detailed analysis of an often overlooked part of the HE sector. Methodologically, the research is original in its use of an innovative machine learning approach to a systematic scoping review. The review scrutinised subject areas, topics studied and methodological approaches taken. Initial searches found 9,814 potentially relevant studies which were reduced to 693 for analysis. The machine learning approach was successful in reducing time without compromising accuracy. We conclude that this methodological approach is appropriate for similar reviews within education. Findings show a dominance of research into professional education programmes; a majority of research with PGT as the context rather than focus; a small number of comparative and large-scale studies; and substantial research categorised as ‘scholarship of teaching’. While further research is required to ascertain if the findings are transferable to other national contexts, this study provides a reproducible methodology and identifies areas for future research to examine.


Introduction
In 2019, the UK Council for Graduate Education (UKCGE) Postgraduate Taught (PGT) working group decided that 10 years on from the Universities UK report highlighting a gap in knowledge about the PGT sector (Boorman & Ramsden, 2009), and from the end of the main funding from the Teaching and Learning Research Programme (TLRP), it was time to take stock of what we know about UK PGT education. The aim of this study was to describe and evaluate the research carried out over this 10-year period. This paper reports findings of the resulting systematised scoping review which aimed to develop insights into the overall

CONTACT Gale Macleod gale.macleod@ed.ac.uk, Institute of Education, Community and Society at the Moray House School of Education and Sport, University of Edinburgh, Holyrood Road, Edinburgh EH8 8AQ, UK. Supplemental data for this article can be accessed online at https://doi.org/10.1080/03054985.2023.2203376.
undergraduate (UG) fees and UK domiciled numbers. As a result of the lack of PGT fee regulation, universities can charge whatever the market will bear. Hillman (2020, p. 5) reported that 'each international student in the UK pays an average of £5,100 more than it costs to educate them'. While the vast majority of the fee goes to supplement universities' research budgets, some of it is redirected to subsidise UK and EU course fees (House, 2020).
The contribution of PGT provision to the UK university sector was estimated to be £1.5bn in 2008/09 (Smith et al., 2010); by 2017/18 this had risen to £3.5bn, with 59% attributable to non-EU domiciled students.¹ The increasing financial value of PGT students to HE may go some way to explaining the rapid increase in research in this area (Pereda et al., 2007).

Research, practice and policy
Increased attention on the PGT sector is welcome because of limitations in the transferability of findings from research on UG and postgraduate research (PGR) to PGT. However, the relationship between research, policy and practice is complex in all sectors of education (Diery et al., 2020), and PGT is no exception. Education policy and practice demand high-quality research (Bridges et al., 2009), and over the last 20 years, arguments about the place and the possibility of 'what works' research in education have continued apace (e.g. Biesta, 2007; Guldberg, 2017). As government demands for 'gold standard' evidence (Oancea & Pring, 2008) come up against the reality of education systems which are, in Biesta's words, 'open, semiotic and recursive' (Biesta, 2020, p. 39), the relationship between HE research, policy and practice continues to be complex. Calls for a 'positivist' evidence base, unencumbered by ideology, came at a time of 'crisis' in education research in the mid-1990s, when it was judged to be 'small scale, irrelevant, inaccessible and of low quality' (Pollard, 2010, p. 27). In response, the TLRP was devised, with the aims of improving the quality of research on education at all stages and better informing policy and practice (Pollard, 2007). While the early years of TLRP focused on schools, as the programme progressed there was a 'plethora of research' in HE which led to the development of 10 'evidence-informed principles for effective pedagogies in higher education' (David et al., 2009, p. 6). While widely accepted, Adolphus (2010) described them as being general and endorsing contemporary 'fashionable' approaches.
Demand for evidence to act as the basis for policy led to the use of systematic reviews in an attempt to identify 'what works', but also led to the exclusion of anything which did not fit a narrow model of 'evidence'. Towards the end of its life, the TLRP invited philosophical consideration of what kinds of research should inform policy (Bridges et al., 2009). This identified multiple, subtle ways in which research of all kinds can inform policy at different stages and in various ways. Even the seemingly least likely methodologies, personal stories and narrative, were shown to have informed policy (Griffiths & Macleod, 2008). However, despite the best efforts of the philosophers, policymakers' preference for large-scale quantitative studies remains (Smith et al., 2017).
Another source of pressure was the Dearing Report's (1997) call for HE in the UK to be characterised by the linking of research, scholarship and education (O'Connor, 2010). One outcome was the establishment of the Institute for Learning and Teaching in Higher Education (ILTHE), which had a remit to commission research into HE learning and teaching (David et al., 2009). In 2004, the Higher Education Academy (later Advance HE) was formed from the merger of ILTHE with the Learning and Teaching Support Network and the National Coordination Team for the Teaching Quality Enhancement Fund (David et al., 2009). These developments can be seen to embody the shift to enhance, professionalise and accredit HE teaching in a context where research had historically been viewed as the only important success measure.
This brief reflection suggests that issues remain with the research that informs HE professional development and impacts HE policy. This may be particularly the case for PGT, where the rapid growth and diversity of the sector and its students make it difficult to maintain an awareness of evidence which could inform policy and practice. The aim of this systematic scoping review was to generate an overall view of the current state of research evidence regarding the PGT sector in the UK, and to evaluate the methodological basis of the evidence, areas of focus and gaps in existing knowledge.

Methodology
Our approach combined the features of a scoping review, which is described as a 'preliminary assessment of potential size and scope of available research literature' and which 'aims to identify the nature and extent of research evidence' (Grant & Booth, 2009, p. 95), with those of a systematic review, in terms of the systematic nature of the search. Our study was unlike a systematic review in that it did not include quality assessment inclusion criteria, and was not focused on evidence relating to a particular intervention or area (Grant & Booth, 2009). The Supplementary Materials file contains details of the original protocol, amendments to the protocol, and search histories for each source. The checklist provided by McGowan et al. (2016) was used by Author 2 (an academic librarian) to guide the development of the search strategy. A set of 28 'indicator' papers was used to test the sensitivity of the searches. As a result, the selection of databases was revised (IBSS was used in place of ASSIA; additional databases were added). Searches for 'postgraduate' or 'masters' without a 'taught' or similar qualifier tended to retrieve huge numbers. As several of the indicator papers did not articulate 'taught' in title/abstract, they needed to be found in other ways (e.g. subject headings added by database indexers). In addition, some changes were made to proximity searches. The geographical part of the search was modified to add 'British' OR 'UK'. The thorough development and testing of the search strategy, together with the increased number of databases, gives confidence in the comprehensiveness and minimisation of bias in the data retrieval approach. Literature databases BEI, ERIC, Web of Science, IBSS, ProQuest Education, Taylor and Francis, and Science Direct were searched on 20th October 2019 using thesaurus terms (where available) and free-text searches. The searches were adapted for each database as required; an example search history (BEI via EBSCO) is included in the Supplementary Materials.

The Research Questions guiding the study were:

(1) What are the areas of focus of research into PGT in the UK since 2008?
(2) What are the methodological features of research into the PGT sector in the UK?
(3) What is the nature of any gaps in robust evidence?
The final inclusion criteria adopted were:
• published from January 2008 onwards;
• reporting data from the UK (may be alongside reports of data from other national contexts);
• reporting primary data in which data for PGT students/sector are discrete and identifiable, or secondary analysis of such data;
• includes description of the sampling strategy, data collection procedures and the type of data-analysis considered;
• no inclusion/exclusion criteria based on quality.
The use of ML to help reduce the volume of human effort in traditional systematic reviews is becoming more common in medical research (Beller et al., 2018; Currie et al., 2019; Noel-Storr et al., 2020). However, there has not, as yet, been significant uptake of this approach in the social sciences. ML is defined as 'algorithms which "learn" to perform a specific task through statistical modelling of (typically large amounts of) data' (Marshall & Wallace, 2019, p. 2). Without ML, a human (or preferably two humans, each scoring each manuscript independently) would mark studies as 'include' or 'exclude' on the basis of their reading of the title and abstract (Bannach-Brown et al., 2019; Marshall & Wallace, 2019). The ML implementation used here, hosted on the CAMARADES Systematic Review Facility (SyRF) (Bahor et al., 2021) with API linkage to the EPPI-Centre, London (Shemilt et al., 2016), provides a score from 0 to 1, along which a threshold can be set to achieve the desired sensitivity. Using the approach of Bannach-Brown et al. (2019), we identified publications in the training corpus with the greatest mismatch between human and machine decisions, and revisited the human categorisation to identify instances of human error. This led, iteratively, to improved performance of the ML algorithm. Because we were concerned not to exclude relevant studies, we set the target sensitivity at 0.95 (similar to that achieved by human screeners) while optimising specificity, accepting that a lower specificity would lead to a number of studies being falsely identified as relevant, which we would have the opportunity to detect as such at the full-text screening stage.
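The threshold-setting step described above can be sketched as follows. This is an illustrative stand-alone implementation, not the SyRF/EPPI-Centre code; the function name and data layout are our own assumptions.

```python
# Illustrative sketch: given machine scores in [0, 1] and human screening
# decisions for a validation set, find the highest score threshold that meets
# a target sensitivity, and report the specificity achieved at that threshold.

def threshold_for_sensitivity(scores, labels, target_sensitivity=0.95):
    """labels: human decisions (1 = include, 0 = exclude)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Scan thresholds from high to low; sensitivity only grows as the
    # threshold falls, so the first hit is the highest workable threshold.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        if sensitivity >= target_sensitivity:
            return t, sensitivity, specificity
    return None
```

Lowering the threshold below the returned value would admit more irrelevant records without being needed to reach the target sensitivity, which is why specificity is reported at the highest threshold that satisfies the target.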
A total of 10,668 studies were returned. After removal of duplicates, titles and abstracts for 9,418 studies were uploaded to SyRF. Of these, 8,540 were journal articles, 215 were book sections, 112 were books, 40 were reports, and 13 were theses. A further 491 were tagged as 'generic', although the majority of these were also journal articles. The remaining seven were conference proceedings (2), cases (2) or had no annotation (3). We manually screened 2,418 citations in duplicate and allocated 80% of these, at random, to a training set. We used these to train a machine learning algorithm hosted by our collaborators at the EPPI-Centre, University College London, which uses a tri-gram 'bag-of-words' model for feature selection and implements a linear support vector machine (SVM) with stochastic gradient descent (SGD), as described in Approach 1 in Bannach-Brown et al. (2019). This algorithm associates the training set screening decisions with features it identifies in the relevant title and abstract text, and uses these features to predict the inclusion or exclusion status of new unseen studies. In the remaining 20% validation set, the algorithm performed with sensitivity of 0.95 and specificity of 0.72. We dual-screened a further 1,000 citations, giving 3,418 in total, and repeated this process. Specificity increased to 0.74 at our required sensitivity of 0.95. Next, we identified studies with the greatest mismatch between human (include = 1, exclude = 0) and machine (a number between 0 and 1) decisions, and had humans re-screen those publications to identify potential human error. After five iterations of this process involving 862 citations (25% of the total), we identified 89 which had been omitted in error and three which had been included in error. Retraining the ML algorithm on this improved training set gave specificity of 0.77 at our target sensitivity of 0.95.
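A minimal sketch of the classifier family named above (word n-gram bag-of-words features up to tri-grams, feeding a linear SVM trained by stochastic gradient descent on the hinge loss) is given below. It is not the EPPI-Centre implementation; the tokenisation, hyperparameters and function names are illustrative assumptions only.

```python
import re
from collections import defaultdict

def ngram_features(text, n_max=3):
    """Bag of word n-grams (n = 1..3) extracted from a title/abstract string."""
    tokens = re.findall(r"[a-z]+", text.lower())
    feats = defaultdict(float)
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1.0
    return feats

def train_sgd_svm(docs, labels, epochs=20, lr=0.1, lam=1e-4):
    """Linear SVM via SGD; labels are +1 (include) / -1 (exclude)."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            margin = y * (sum(w[f] * v for f, v in x.items()) + b)
            for f in x:                # L2 regularisation (shrink active weights)
                w[f] *= (1 - lr * lam)
            if margin < 1:             # hinge-loss subgradient update
                for f, v in x.items():
                    w[f] += lr * y * v
                b += lr * y
    return w, b

def score(w, b, x):
    """Raw decision value; a calibrated 0-1 score would be derived from this."""
    return sum(w[f] * v for f, v in x.items()) + b
```

In the real pipeline the raw decision values are mapped to the 0-1 scores described earlier; the key property is that screening decisions become reproducible functions of title and abstract text.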
We would therefore expect that applying the trained algorithm to a corpus of 10,000 studies with a true inclusion prevalence of 20% would identify 1,900 of 2,000 relevant studies and wrongly include 1,840 of 8,000 irrelevant studies, giving a saving in human screening effort of over 60% at a cost of missing 5% of relevant studies. This compares with the human performance, where our error correction strategy showed that at least 3% of records (89/3,418) had been falsely excluded by two human reviewers.
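The projection above is simple arithmetic, which can be checked directly:

```python
# Worked check of the projected screening workload at sensitivity 0.95 and
# specificity 0.77, for a 10,000-record corpus with 20% inclusion prevalence.
corpus, prevalence = 10_000, 0.20
sensitivity, specificity = 0.95, 0.77

relevant = round(corpus * prevalence)                    # 2,000 truly relevant
irrelevant = corpus - relevant                           # 8,000 truly irrelevant
true_positives = round(relevant * sensitivity)           # 1,900 correctly flagged
false_positives = round(irrelevant * (1 - specificity))  # 1,840 wrongly flagged
flagged = true_positives + false_positives               # 3,740 need human review
saving = 1 - flagged / corpus                            # 0.626, i.e. over 60%
missed = relevant - true_positives                       # 100 (5% of relevant)
```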
Of the 3,179 included studies, there were 914 records for which pdfs were not readily available. Of these, 445 studies which did not meet the inclusion criteria on more detailed review, or were duplicated within the cohort, were excluded at this stage. Of the 469 remaining, pdfs for 463 were found through various internet searches, contacting the author, and library searches of the three home institutions of the authors. No further record could be found for three of the studies.
The remaining 2,731 studies were then full-text screened by Authors 1-4 within SyRF. A total of 693 studies were identified for inclusion, as shown in Figure 1. Each included study was annotated within SyRF in relation to key variables derived from the protocol (e.g. topic, discipline and research design). There were multiple codes for each variable, with a notes function to provide further details. While some codes were pre-determined, additional codes were generated by reviewing the notes section for frequently identified features; e.g. 'Nursing' was originally included as a 'Subject allied to medicine' (SAM), but was then coded separately when more than 10 studies in this subject area were found.
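The recoding step described above (promoting a frequently noted feature, such as 'Nursing', to its own code) can be sketched as follows; the record structure and field names are hypothetical:

```python
from collections import Counter

def promote_frequent_notes(records, threshold=10):
    """Promote a free-text note to its own subject code once it appears in
    more than `threshold` studies. records: dicts with 'subject' and 'note'."""
    counts = Counter(r["note"] for r in records if r.get("note"))
    promoted = {note for note, n in counts.items() if n > threshold}
    for r in records:
        if r.get("note") in promoted:
            r["subject"] = r["note"]  # recode under the more specific subject
    return promoted
```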
The annotations were exported from SyRF to Excel for further analysis. Finally, an abbreviated form of the annotations was exported to SPSS and re-coded to allow for simple descriptive statistical analysis.

Results
Table 1 shows the distribution of studies across subject areas. The large majority of the 693 studies were research into professional education. Over one third of all research into PGT over the 10-year period was within the subject area of education. All but six of these studies related to programmes for the professional education of teachers. SAM includes Pharmacy, Dentistry, Physiotherapy and Occupational Therapy. Only areas where 10 or more studies were identified are reported separately. The 693 included studies were annotated to show whether study or provision at PGT was the focus of the research or simply the context in which the research took place. Studies (n = 48) which reported PGT data alongside data relating to UG or PGR, and where level of study was part of the analysis, were included as 'focus', giving a total set of 250 'focus' texts, with 443 'context' studies where PGT level was not a feature of the analysis or discussion.

The distribution of topics across the total (693), context (443) and focus (250) sets is shown in Table 2. Here, we report topics on which there were 10 or more studies. Topics which did not meet the 10 studies threshold include academic community (6) and academic misconduct (4).
Learning and teaching accounts for the largest proportion of studies in both the context and focus sets. This category includes a wide range of topics, for example: problem-based learning (Seymour, 2013), critical thinking (Bramhall et al., 2012), and mentoring (Eliahoo, 2016). Annotation within SyRF allowed us to record studies where there was a clear secondary topic, e.g. Quan et al. (2016) was recorded as 'transition' with the secondary topic 'international student experience'. Indeed, 'international student experience' was the largest secondary topic and was recorded against 14 studies (these were evenly spread across seven different primary topics). The second largest secondary topic group was technology/e-learning; of the eight studies with this sub-topic, the main topic in six was assessment, suggesting particular interest in this area.
Analysing topics covered by subject area gives rise to many cells with small counts (<5), particularly when looking only at the smaller focus set, because of the multiple categories in both variables. Therefore, in Table 3, topics by subject area are provided for the whole set (693), with observed and expected values (in italics) presented and standardised residuals (in bold). Cells marked '-' indicate an observed count and an expected count of less than five.
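For readers wishing to reproduce Table 3, the expected counts and standardised residuals follow the usual contingency-table definitions; the helper names below are ours, not the authors':

```python
import math

def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence of topic and subject area."""
    return row_total * col_total / grand_total

def standardised_residual(observed, expected):
    """(O - E) / sqrt(E); values beyond roughly +/-2 flag notable cells."""
    return (observed - expected) / math.sqrt(expected)
```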
Turning to the methodological approach used, we extracted data on both research design and sample size. Table 4 shows the numbers of studies adopting each research design in the whole, context and focus sets. For the most part, the proportion of studies using each design is fairly consistent across all sets. Where there is wider variation, this tends to be among the less commonly used designs, e.g. narrative and action research are more common in the whole set, with policy or documentary analysis, systematic review and secondary analysis more common in the focus set.
To explore design by subject area, we once again conducted a cross tabulation to view observed count and expected count. As with topics by subject areas above, the number of categories and small observed counts in some cells make more detailed reporting problematic. Table 5 shows the results for the five most common designs and the five subject areas with the highest number of studies. The final annotation to report relates to the sample size in all the included studies. Table 6 shows the percentage of studies with each sample size within each subject area.

Discussion
Our research generated a database of 693 citations which met the inclusion criteria, divided into context (443) and focus (250) sets. Despite a growing interest in PGT, our analysis showed that studies are far more likely to have PGT as context rather than as a main focus. Given the size of the sector and its importance to UK HE, 250 studies over 10 years may be somewhat on the low side. The review also highlighted the methodological features of research into PGT and gaps in the knowledge base. There is a disproportionate amount of research in some areas and virtually none in others; professional or practice programmes dominate. Research in Education makes up more than a third of the database, reflecting its position as one of the largest areas of PGT provision, along with Business and SAM, which also feature prominently (HESA, 2021). While a small number of single-subject area studies were found, there is little within 'traditional' academic disciplines, with none in Anthropology, Classics, Chemistry, Divinity, History, Languages, Law, Philosophy or Sociology.
One explanation for the volume of research in Education may be its link to the professional community. Such professions commonly involve a career-long commitment to evidence-based practice and reflexivity. Our findings support the idea that as professionals move from practice to HE teaching, core values of research-informed and reflective practice persist, spurring research into professional education. Further evidence emerges in that over half of the action research studies were in Education. In contrast, those in 'traditional' academic disciplines are more likely to carry out research in that discipline than in learning and teaching. As a result, we still know very little about pedagogy and provision in research or specialised/advanced programmes.
Topical differences emerged between the focus and context sets. Unsurprisingly, the focus set has a higher proportion (almost one fifth) than the context set (just over one twentieth) of studies addressing curriculum design. A more unexpected feature is that few studies focused on wider issues such as widening participation, transitions, employability, policy or marketisation, with the large majority examining issues at programme, module, or course level. Another unexpected finding was the limited focus on 'technology/e-learning' in the focus set compared to the context set (3.2% compared to 16%). This comparative lack of interest at M-level is difficult to explain given the growth in online PGT programmes and modules (Cejnar et al., 2020), suggesting that more research may be needed in this area.
As would be expected with a topic which transcends subject boundaries, most research into 'international student experience' was interdisciplinary and comprised the largest proportion of studies across multiple subjects. This contrasts with 'learning and teaching', which is much more likely to be explored within one subject, raising the question of which aspects of PGT study are context-dependent and which can be generalised. It is not clear whether this indicates topics that are more appropriately studied across subject boundaries and those that are not, or a divide between practitioners researching their own practice and those for whom HE is a field of study, not just where they work. Whatever the explanation, studying PGT learning and teaching might benefit from more multi-subject research, a point we return to below regarding methodological gaps. By far the most common methodology was 'mixed methods', accounting for nearly a third of the total set. This finding needs to be treated cautiously, as some reserve the term for when qualitative and quantitative data are integrated (Bryman, 2006). However, here we use it more loosely to mean simply that more than one method was used. Of the 205 mixed methods studies, 48 were evaluations of a learning experience, many of which combined interviews or focus groups with a questionnaire (e.g. Elander et al., 2010). This may be driven by the increase in the use of mixed methods approaches generally (McKim, 2017), although it is likely that the high number of studies in our total set using mixed methods also reflects the desire to use multiple data sources to enrich small-scale research.
As might be expected, larger-scale research was seen in studies which adopted surveys gathering mainly quantitative data, as well as those categorised as 'observational', 'secondary analysis' and 'experiment', and in particular disciplines. Whereas research in some areas of professional education is small scale (for example, two thirds of the research in Education has a sample below 30), in Business, Medicine and Engineering over 25% of studies have a sample size above 100. While larger studies were seen across the range of topics, they were more common than an even distribution would predict in the small number of studies focused on widening participation and employability, although overall numbers are very low so findings should be treated with caution.
There seem to be two different explanations for these findings. First, authors intending to inform policy are likely to be mindful of policymakers' penchant for research considered 'gold standard' (Oancea & Pring, 2008). This does not negate the ways in which other methodological traditions may continue to influence policy (e.g. identification of problems that require a policy response, providing alternative perspectives, highlighting the experiences of marginalised groups), but these occurrences are less predictable and therefore difficult to plan for (Griffiths & Macleod, 2008). Second, the patterns can likely be accounted for by paradigmatic differences between disciplines influencing what research is considered 'worthwhile'.
While two studies across more than one subject made international comparisons (Cheng et al., 2018; Morgan, 2014), no studies compared experiences across programmes or subject areas. Rather, the 'more than one subject' approach is used to research general issues, such as 'international student experience'. This seems to be a missed opportunity, given the potential for comparison to strengthen and refine theory through enabling researchers to investigate whether their 'conclusion reached held under contrasting but conceptually related circumstances' (Bechhofer & Paterson, 2000, p. 9). Indeed, across the retrieved studies, we have identified a significant amount of research which could be categorised as 'scholarship of teaching': people researching their own practice or contexts. Concerns with this approach include possible bias and coercion where the researcher is known to the participants, along with the risk of atypical participant groups associated with non-probability sampling (Etikan et al., 2016). Of course, there are many benefits of research in particular contexts, which can be particularly useful for informing practice as it has relevance to a particular situation (Macleod, 2014). Indeed, Flyvbjerg (2001) argues that only research of this kind is relevant in the social sciences, reasoning that human behaviour is context-dependent and therefore there can be no context-free general theory. Instead, Flyvbjerg (2001) argues for a 'phronetic social science' where researchers produce contextualised knowledge drawing on localised understanding and subjective relationships. Nonetheless, however valuable individually, a collection of studies of this kind does not add up to a systematic programme of research. Thus, the diversity of the sector presents the opportunity for theory development and refinement through comparison across disciplines, types of programme, modes of delivery and institutions - an opportunity which is currently underexploited.

Conclusion
This systematic scoping review used an ML enhanced approach to identify 693 sources which reported empirical research on PGT provision and policy within the UK. In addition to the analysis reported here, this study has produced a database which can be interrogated to answer questions different from those addressed in this article, e.g. to explore the theoretical lenses commonly used in research in this area. Limitations of the study include the exclusion of grey literature and the reliance solely on database searches. This is in line with the study being a scoping review, although it was an amendment to our study protocol. The large volume of grey literature relating to PGT will be reviewed in a later, separate study. While the study inclusion criteria limited our search to research in the UK, publication of our search strategy allows for replication with a wider data set. The error analysis suggests that up to five percent of potentially relevant records may have been missed. Given the aim of this review to identify broad patterns of activity, we judge that up to 5% omission is unlikely to affect the robustness of our findings.
While ML is increasingly used in biomedicine, its use here is novel. The use of SyRF to screen titles and abstracts for inclusion undoubtedly saved considerable time without compromising accuracy. There are some particular challenges associated with ML in the social sciences and with the particular inclusion criteria we set. Some of these stem from the ambiguity of language; for example, the machine selected to 'include' publications with the words 'mastery' or 'mastered' in them. In part, this may stem from linguistic and epistemological differences between the disciplines in which ML has traditionally been applied, such as biomedicine, and a field such as education. Specifically, the paradigmatic profile of educational research may span from (post-)positivist psychological approaches to feminist poststructuralist ones. Accordingly, different educational researchers adopt and deploy situated lexica that challenge a 'global' terminology for particular phenomena. Thus, adoption of ML cannot be done uncritically or (currently) without additional judicious review. Combining ML with text mining approaches using Regular Expressions might address these issues.
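One way such a Regular Expression filter might look is sketched below; the patterns are illustrative guesses at degree-related collocations, not a tested screening rule:

```python
import re

# Collocations suggesting a taught-degree context (illustrative, not exhaustive).
DEGREE = re.compile(
    r"\b(master'?s?\s+(degree|programme|program|course|level|student)s?"
    r"|msc\b|taught\s+postgraduate)",
    re.IGNORECASE,
)
# Senses of 'master' of the kind that misled the classifier.
FALSE_FRIEND = re.compile(r"\b(mastery|mastered|mastering)\b", re.IGNORECASE)

def screen_hint(text):
    """Crude pre-screening hint; a human reviewer still makes the decision."""
    if DEGREE.search(text):
        return "include?"
    if FALSE_FRIEND.search(text):
        return "exclude?"
    return "unclear"
```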
Automating the identification of certain inclusion and exclusion criteria also proved challenging. While humans were able to exclude quickly on the basis of an abstract when research was carried out at a university outside the UK, this was something that the machine was not able to detect, and country contexts are not available as filters in journal indexes.

Notes on contributors

The Edinburgh Medical School and the NHS in Lothian. Marshall has particular expertise in systematic and related review methods.
Rosa Marvell is a lecturer in Sociology at the University of Portsmouth. Her research focuses on social class and gender inequalities within education and work, including a particular focus on postgraduate education. Her PhD explored how class inequalities shaped access to taught Masters degrees at English universities, interrogating the trajectories of first-generation students. Previously she was a postdoctoral researcher at Oxford Brookes University and a Research Fellow at the Institute for Employment Studies, specialising in dynamics of inclusion and (in)equality in post-compulsory education and the labour market.

Jing Liao is a researcher and machine learning expert working at the Space Science Center at the University of New Hampshire. Her research covers different disciplines, with main focuses on ionospheric ion circulation and its impacts on the Earth's magnetosphere, and on machine learning applications in the sciences. Jing previously worked at the University of Edinburgh with Professor Malcolm Macleod on natural language processing applications for pre-clinical systematic reviews and helped build the SyRF platform.

Gerri Matthews-Smith is Associate Research Professor at Edinburgh Napier University. Her interdisciplinary research and practice span three areas: human and organisational development, management and wellbeing. She works with several voluntary and public sector organisations devoted to military transition, education and wellbeing, including the national Council of Military Education Committees (COMEC). She is founder and director of the Centre for Military Research, Education and Public Engagement and the university's research lead for military research.

Malcolm Macleod is Professor of Neurology and Translational Neuroscience, and Academic Lead for Research Improvement and Research Integrity, at the University of Edinburgh. His work considers the most efficient and systematic evaluation of research evidence to inform action, whether that be further research, clinical treatments or policy. He led the early development of the CAMARADES collaboration and the development of the SyRF platform, and is active in the International Collaboration for the Automation of Systematic Reviews (ICASR). His work includes the development and evaluation of publication guidelines for stroke research, in vivo research and life sciences research in general. He is a clinical neurologist with NHS Forth Valley and serves on the UK Commission for Human Medicines.

Table 1. Number of publications in each subject area and as a % of each set.

Table 2. Topics studied as a percentage of the 'total', 'context' and 'focus' sets.

Table 3. Topics by subject area, whole set, observed count vs. expected count, with standardised residuals.

Table 4. Research design adopted in studies, whole, context and focus sets.

Table 5. Top five research designs by top five subject areas, expected counts and standardised residuals.

Table 6. Sample size by subject area.