Design, delivery and evaluation of a bioinformatics education workshop for 13-16-year-olds

ABSTRACT Bioinformatics is the use of computers in biology, particularly to analyse DNA and protein sequences and associated data. Bioinformatics has become crucial to most areas of life sciences research. However, bioinformatics education has not kept up with the pace of these advances. To help address this problem, we have designed an open-access bioinformatics workshop for secondary school biology pupils. The workshop is linked to the curriculum in Scotland, addressing learning objectives for Scottish Qualifications Authority Higher Biology and Human Biology. Furthermore, it aims to inspire pupils more generally and includes critical evaluation of evidence as a more generic skill. We delivered this workshop to biology pupils of seven schools in Scotland and conducted evaluations of pupil and teacher feedback. Quantitative pupil and teacher feedback suggests the workshop is useful and enjoyable, with no statistically significant difference between pupils identifying as female and pupils identifying as male. Qualitative responses suggest the workshop gives an increased knowledge of the field of bioinformatics and its importance in everyday life, and that pupils enjoy working in groups. Teachers also highlight the importance of hands-on experience in the classroom. We conclude the workshop is successful in its aims and is suitable for wider deployment.


Introduction
Over the last decades, a surge of technological innovation has changed the practice of life sciences research. In particular, the exceptional pace of recent advances in technology for DNA and genome sequencing has created a demand for computationally able researchers to analyse the large volumes of data produced (Attwood et al. 2019). Bioinformatics - the application of computation to such data - is now an integral part of modern biology. It has revolutionised how research is conducted across several traditionally disparate fields (e.g. Gladyshev, Meselson, and Arkhipova 2008; Naccache et al. 2014; Ripp et al. 2014; One Thousand Plant Transcriptomes Initiative 2019). The importance of bioinformatics is also recognised by UK government and industry reports (e.g. Life Sciences UK Industrial Strategy Report 2017).
Despite the clear and increasingly important role of bioinformatics in life sciences research, bioinformatics education at secondary school level - and, indeed, at undergraduate level (Williams et al. 2019) - has not kept up with advances in the field. This is of growing concern to both the scientific community and the educational community (Machluf and Yarden 2013). In Scotland, unusually in a UK and international context, bioinformatics is covered in the Scottish Qualifications Authority (SQA) Higher Biology and Higher Human Biology curricula, though only briefly (Table 1).
Preliminary research showed that senior secondary school pupils were capable of working through university-level bioinformatics exercises (Barker et al. 2015), using open educational resources developed for final-year undergraduates (Barker et al. 2013). This adds support to suggestions that early exposure to bioinformatics is useful for school pupils (Machluf and Yarden 2013; Form and Lewitter 2011). Following this preliminary research, a workshop for Higher Biology, Higher Human Biology and Advanced Higher Biology classes was developed and deployed across Scotland (to be reported separately). However, by the time pupils are studying Higher or Advanced Higher subjects - typically in the last one or two years of secondary school - they have already specialised in a very limited range of subjects. We hypothesised that a bioinformatics workshop targeting an even earlier stage in education would be valuable. This has the potential to reach a larger number of pupils at a stage where they have not yet specialised for the final years of school.
We designed a new workshop, intended to bring practical experience in bioinformatics to an even younger audience. This workshop targets junior stage Biology pupils in Scotland. Target pupils include those in years S3 and S4, and others taking SQA National 4 or National 5 Biology (awards at Scottish Credit and Qualifications Framework levels 4 and 5 respectively). Such pupils are typically 13-16 years old. This workshop aids the learning of key topics in the National 4 and 5 level Biology curricula - DNA, chromosomes and genetics (see Table 1 for curricula descriptions) - but also goes beyond this, teaching pupils how to use a key computational tool to conduct simple biological analyses. Additionally, pupils - and teachers - have the opportunity to work with large genomic datasets that are widely used in life sciences research and to consider the strength of evidence for their bioinformatics conclusions. Pupils are introduced to DNA barcoding and its applications to highlight the relevance of their schoolwork to real-life issues, such as biodiversity and food fraud. Our workshop uses free, publicly available Web-based tools and is designed to be conducted on desktop computers, tablets and even smartphones.
Many schools in Scotland now provide pupils with tablets or laptops to use during classes (dependent on Local Authority), which helps to overcome any issues associated with computer room availability and bookings. The activities in our workshop have simple objectives to allow students to focus on learning how to use a bioinformatics tool and relate it to topics in their biology curriculum and everyday life. A key aim is to show students that computational and statistical skills are a core part of life sciences and that research-grade analyses can be performed from any device with a Web browser. Furthermore, by targeting biology classes, we reach a mostly new and predominantly female audience for scientific computing: in Scotland, approximately two-thirds of pupils in biology classes are female (63% and 67% for National 4 and National 5, respectively, in 2018) (SQA 2018).
Here, we present the design, delivery and evaluation of our new bioinformatics workshop for 13-16-year-old (SQA National 4 and 5 levels) Biology pupils: Bioinformatics: Food Detective. Analyses of pupil and teacher feedback show that preliminary trials of our workshop were a success. Overall, pupils found the workshop enjoyable and useful, with no significant difference in response between pupils reporting as female and those reporting as male. Pupils particularly liked using the DNA database and working in groups.

Table 1. Key topics in the SQA National 4 and National 5 Biology Curricula related to bioinformatics and genomics (SQA 2013, 2019).

National 4 Units
• Unit 1 Cell Biology: DNA, genes and chromosomes; DNA and its structure; The genetic code; Genes and proteins; Chromosomes and inheritance; Genes and inheritance; Genetic testing and counselling; Gene therapy

Methods
This workshop aims to introduce pupils (and teachers) to the field of bioinformatics and highlight some of its applications in scientific research. The activities in this workshop involve the use of authentic genomic data, the National Center for Biotechnology Information (NCBI) database (NCBI Resource Coordinators 2016) and the online Basic Local Alignment Search Tool, BLAST (Altschul et al. 1990), which may be used to compare a user-provided sequence with the entire publicly available DNA sequence database. The workshop was designed to create an environment of active learning (Felder and Brent 2016) in which pupils participate throughout and interact with their peers and staff. The workshop was adapted during the study through action research, improving it in light of feedback on early versions.

Bioinformatics: food detective workshop
The Bioinformatics: Food Detective workshop reinforces core elements of the SQA Biology syllabus (DNA, RNA and the genetic code; see Table 1 for full details) but also brings in concepts beyond this to highlight the applied significance of this knowledge. The workshop is based on 16S rRNA barcode sequences obtained from a hand-made pork sausage purchased from a butcher shop (Cambridge, UK). All laboratory work and preliminary data analysis were carried out by Cambridge Pathology Laboratory as part of the first author's work with The Naked Scientists (see podcast at https://www.thenakedscientists.com/podcasts/naked-scientists-podcast/dna-decodedpast-present-and-sausage). DNA was extracted using the Qiagen DNeasy Blood & Tissue Kit according to the manufacturer's instructions. 16S barcode sequences were generated using Illumina MiSeq next-generation sequencing technology. Quality control was conducted using FastQC (Andrews 2010) and sequences were aligned to the SILVA ribosomal RNA database (Quast et al. 2012).
In this workshop, pupils are provided with an unlabelled, nonrandom sample of these barcode sequences that represents the diversity of species found in the sausage (available at https://4273pi.org/schools). Pupils are then asked to hypothesise what species they expect to find in these samples based on the type of sausage. Since the sausage is labelled 100% pork, it is fair to hypothesise that only pig (Sus scrofa) DNA will be identified. The pupils then work in pairs using the NCBI database and the online BLAST search tool to identify which species the barcodes match in the database. The workshop is divided into two tasks, both of which involve using the BLAST tool. In task one, pupils are provided with a set of barcode sequences that have reliable matches in the database (expectation or e-value, E < 0.00001) to the following species: pig (Sus scrofa), cattle (Bos taurus), sheep (Ovis aries), chicken (Gallus gallus) and human (Homo sapiens). Pupils make a note of the species names and corresponding e-values. We then ask pupils, in their worksheets, to think about whether or not the results were as they expected. This generates a discussion that relates to their initial hypotheses. Pupils are also asked to hypothesise how 'unexpected' results may have occurred, e.g. why would human DNA be found in a pork sausage sample? We discuss the results with pupils and highlight the fact that the vast majority of the DNA sequence reads in the data set map to pig (Sus scrofa), while the other species in the results table account for the remaining reads.
In a second task, pupils are provided with a set of barcode sequences that have unreliable matches in the database (E > 0.1): marbled headstander fish (Abramites hypselonotus), bacteria (Methanothermobacter sp.), copepod (Neocalanus cristatus), snouted tree frog (Scinax sp.) and common grapevine (Vitis vinifera), though the list varies over time as the public DNA database is updated. Pupils again take note of the species names and their corresponding e-values. We explain the relevance of the e-value and how it is an indicator of the reliability of a BLAST search result. Pupils then examine the e-values of their BLAST data to identify which of the two sets of results is more reliable. Here we focus on highlighting the importance of statistical analyses to determine the reliability of results produced in an experiment. However, we do not explain the algorithm used to calculate the e-value as this is a complex statistical analysis and beyond the scope of a workshop of this level. We simply wish to highlight that, as with research conducted in a wet-lab environment, the results produced in computational research should also be scrutinised for reliability. We aim to provide pupils with the knowledge and understanding to be able to distinguish between an 'unexpected' but reliable result (e.g. Homo sapiens DNA present by contamination, low e-value) and an unexpected and unreliable result (e.g. all results in Task two, high e-values).
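The distinction between the two tasks can be illustrated with a minimal sketch that labels BLAST hits using the two e-value cut-offs quoted above (E < 0.00001 for Task one, E > 0.1 for Task two). The function name, the 'borderline' label and the e-values in the example are hypothetical and chosen purely for illustration; they are not results from the workshop data.

```python
# Cut-offs taken from the workshop description above.
RELIABLE_MAX = 1e-5    # Task one: E < 0.00001
UNRELIABLE_MIN = 0.1   # Task two: E > 0.1


def classify_hit(e_value: float) -> str:
    """Label a BLAST hit by its e-value using the two workshop cut-offs."""
    if e_value < RELIABLE_MAX:
        return "reliable"
    if e_value > UNRELIABLE_MIN:
        return "unreliable"
    # Values between the cut-offs are not covered by either workshop task;
    # the "borderline" label is an assumption made for this sketch.
    return "borderline"


# Hypothetical e-values for illustration only (not taken from the study).
hits = {
    "Sus scrofa": 2e-60,
    "Homo sapiens": 5e-40,
    "Vitis vinifera": 3.2,
}

for species, e_value in hits.items():
    print(f"{species}: E = {e_value:g} -> {classify_hit(e_value)}")
```

In this sketch, an unexpected species such as Homo sapiens would still be labelled reliable when its e-value is low, which mirrors the distinction pupils are asked to draw between unexpected and unreliable results.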
This entire workshop can be completed on any device with internet connectivity, e.g. desktop computers, laptops, tablets (including iPads) and smartphones. We encourage pupils to work in pairs to promote discussion throughout the workshop. To effectively reach large numbers of pupils without the audience bias and exclusion that may occur through self-selection for extracurricular activities, we conduct whole-class visits during the school day. The duration of our workshops is a standard 'double period' of biology (approximately 1 hr 40 mins), which means that our visits fit within the existing school timetable. This reduces scheduling complications for teachers and avoids negatively impacting other subjects.

Workshop delivery
For the current paper, nine workshops were carried out across seven Scottish state schools: Aith Junior High School (1 workshop), Whalsay School (1 workshop), Mid Yell Junior High School and Baltasound Junior High School (1 joint workshop held at Mid Yell), Sandwick Junior High School (1 workshop) and Brae High School (1 workshop), Shetland Islands; and Forfar Academy, Angus (4 workshops). We reached 152 Biology pupils and 10 teachers (Table 2). Of the pupils, 94 (62%) identified as female, 45 (30%) identified as male and 13 (8%) did not identify as either male or female. All of the workshops were led by authors of the paper, supported by other staff members and volunteers, and at least one teacher was present. The structure of the workshop was consistent in all cases: a short introductory presentation was delivered to pupils, during which they were encouraged to ask and answer questions; pupils then worked through the activities in pairs, pausing after each task to discuss findings.

Evaluation of workshop
To evaluate the workshop, we provided pupils and teachers with feedback forms. Each pupil feedback form contained questions on a 1-3 Likert scale for usefulness and enjoyment of the activity and a free-text area in which pupils could highlight two things they enjoyed about the workshop and one thing that could be improved ('two stars and a wish'). We chose the 'two stars and a wish' design because it is commonly used to evaluate lessons in both primary and secondary schools, so students are familiar with this system. We also recorded pupil gender (pupils defined this themselves in an open text area of the feedback form). This is of particular interest due to well-documented gender biases in computational studies in Scottish schools (SQA 2018) and more generally in the sector (WISE 2019). We analysed qualitative feedback from pupils using Thematic Content Analysis (as described in Burnard et al. 2008). All free text feedback was transcribed into an Excel spreadsheet with one row per pupil across three columns ('two stars and a wish'). The first stage of analysis was open coding, which involved using keywords and/or phrases to summarise the content of each statement. Then, overlapping and similar categories were grouped under theme headings (e.g. 'computer use' and 'website use' categorised together under the theme 'Use of technology'). Each of the 'stars' and 'wishes' was assigned to a theme. All stages of this analysis were subject to peer review, i.e. the first and last authors conducted analyses independently, then compared and discussed results before agreeing on final themes. Nine 'stars' were excluded from analyses either because they were left blank or because we were unable to clarify what had been written. Forty-six 'wishes' were not included in analyses because they were left blank or pupils had stated that they could not think of any ways in which the workshop could be improved.
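The grouping step described above, in which open codes are collected under theme headings and each 'star' or 'wish' is assigned to a theme, can be sketched as follows. This is an illustration only and is not the procedure used in the study, which was carried out manually in a spreadsheet. The mapping of 'computer use' and 'website use' to 'Use of technology' comes from the example in the text; the remaining open code, the coded statements and the 'Uncoded' fallback are hypothetical.

```python
from collections import Counter

# Mapping from open codes to themes. The first two entries follow the example
# in the text; the third is a hypothetical open code added for illustration.
CODE_TO_THEME = {
    "computer use": "Use of technology",
    "website use": "Use of technology",
    "working in pairs": "Group work",  # hypothetical open code
}

# Hypothetical coded statements: (kind, open code), kind is "star" or "wish".
coded_statements = [
    ("star", "computer use"),
    ("star", "working in pairs"),
    ("wish", "website use"),
    ("star", "website use"),
]


def tally_themes(statements):
    """Count stars and wishes per theme; unknown codes are flagged as 'Uncoded'."""
    counts = Counter()
    for kind, code in statements:
        theme = CODE_TO_THEME.get(code, "Uncoded")
        counts[(theme, kind)] += 1
    return counts


theme_counts = tally_themes(coded_statements)
for (theme, kind), n in sorted(theme_counts.items()):
    print(f"{theme} ({kind}): {n}")
```

Keeping 'stars' and 'wishes' as separate counts per theme preserves the distinction between positive comments and suggested improvements within a theme.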

Quantitative feedback
Overall, the majority of pupils report that the activities in the workshop are both useful and enjoyable. We asked pupils to respond to the following statements on a Likert scale of (1) Disagree; (2) Neither agree nor disagree; (3) Agree: (i) I enjoyed the activity; (ii) I found the activity useful. Analysis of these data shows that 72% of pupils agree (3) that the activities are useful and 79% of pupils agree (3) that the activities are enjoyable. Only 3% and 4% of pupils disagree (1) that the activities are useful and enjoyable, respectively. The remainder of pupils neither agree nor disagree (2) with the statements (25% and 16% for usefulness and enjoyment, respectively) (Figure 1(a)). We analysed all responses from pupils identifying as male or female (n = 139) to investigate a potential link between gender and pupil response to the activities. For this analysis, we focussed on feedback falling into the Disagree (1) and Agree (3) Likert scale categories. We find no significant link between reported gender of the pupil (male or female) and whether or not they find the workshop useful or enjoyable (Fisher's exact test, p-values = 1) (Figure 1(b)).
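For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of gender (female, male) against response (Agree, Disagree), written with the Python standard library only. The counts in the example are hypothetical, since per-gender counts are not listed in the text, and the function is an illustrative implementation rather than the software used for the analysis reported here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more probable than the observed
    # one; the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts for illustration only (not the study data):
# rows = female, male; columns = Agree (3), Disagree (1).
p = fisher_exact_two_sided(70, 2, 33, 1)
print(f"p = {p:.3f}")
```

When the proportion of pupils agreeing is almost identical in the two groups, as in this hypothetical table, the observed table is the most probable one under the null hypothesis and the p-value is at or near 1.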
Although quantitative feedback on the day has its limits (it would be unusual for a class to not enjoy a change of pace and instructor), these results are encouraging and suggest that we have created a workshop that is not only informative but one that pupils enjoy. This is of relevance as there is growing evidence highlighting the importance of enjoyment in the learning process, reviewed in Lumby (2011).
Crucially, we found no gender-specific differences in pupil response to either of these questions. In Scotland, computer science classes (at SQA National level) are only ~15% female, whereas biology classes are ~68% female (SQA 2018). Our project specifically targets biology classes for these computational activities as this allows us to reach an audience less likely to be involved in computational education at school. This gender bias in computational science is not only evident at school level; it extends across the UK and into Higher Education. In 2018/19, only 19% of those studying computer science at university level in the UK were female (HESA 2020). Interestingly, these patterns are not global. Evidence from countries in which female and male participation in computational sciences is approximately equal, including India and Malaysia, suggests that introducing computing at an early stage of education is one step in solving this complex issue (Lagesen 2008; Thakkar et al. 2018).
We acknowledge that the statement, 'I found the activity useful', is open to pupil interpretation. However, it is plausible to assume an activity regarded as 'useful' is a more positive experience than an activity that is not.

Qualitative feedback
In addition to a quantitative analysis of the workshop, we assessed 152 qualitative feedback responses from pupils. We asked pupils to provide 'two stars and a wish' for the workshop, i.e. two things they liked about the workshop and one thing they would like to see improved or included in future. We conducted a Thematic Content Analysis across all free text data collected, i.e. from both 'stars' and 'wishes', and identified seven major themes: (i) Group work, (ii) Use of technology, (iii) Content of workshop, (iv) Delivery of workshop, (v) Learning and understanding, (vi) Overall quality of experience, (vii) Careers. Each of the themes is fully reported in Table 3. We discuss in more detail the themes of particular interest: (i) Group work, (ii) Use of technology, (iii) Content of workshop.

Group work
Our results show that pupils enjoy the opportunity to work in pairs and groups. In fact, all responses within the group work category are 'stars' except for one 'wish' to be 'allowed to work in bigger groups'. Pupils particularly liked that they could choose their own partner/group: 'It was good because we got to choose our groups'.
We initially designed the workshop this way to promote discussion between pupils and also for convenience - working in pairs requires less equipment as pupils share a device. However, we believe that the ability to work in pairs and discuss work with classmates may play a key role in the success of this workshop. Scientific research is collaborative in nature and research evidence shows that working in pairs and/or groups (collaborative learning) in the classroom is beneficial for learning (Johnson and Johnson 1986).

Figure 1(a) shows the percentage of total pupils (n = 152) scoring usefulness and enjoyment of the activity from 1 to 3 on the Likert scale. Most pupils agree that the activity is both useful (72%) and enjoyable (79%). Figure 1(b) shows categories 1 (Disagree) and 3 (Agree) of the pupil feedback categorised by gender (pupils identifying as either male or female, n = 139). There is no statistically significant difference in response between males and females for either enjoyment or usefulness of the activity (Fisher's exact test, p-values = 1).

Table 3. Thematic content analysis of pupil feedback (n = 152) based on 'two stars and a wish' free text evaluation. Comments made in the 'star' sections of the feedback form are in plain text. Comments made in the 'wish' section are in italics.

Theme: Group work
Main findings: Pupils reported that they enjoyed working in pairs and groups with friends.
Quotes: 'I enjoyed working in pairs' 'It was good because we got to choose our groups'

Theme: Use of technology
Main findings: Pupils enjoy working on iPads/computers. Pupils also found the NCBI webpages interesting. Technological problems such as slow Internet connection and website server failure can be an issue, negatively impacting pupil experience.
Quotes: 'I liked looking through the websites' 'I enjoyed working on the database' 'It was good fun. I liked getting to go on the computers.' 'used ipads for the activity' 'I think the BLAST website was cool' 'make sure the website works' 'website functional'

Theme: Content of workshop
Main findings: Pupils liked that the content of the workshop was new to them. They also found the inclusion of 'everyday' uses of bioinformatics interesting. Pupils also report that the content was at a level they could understand. The main 'wish' from pupils was that there were more activities and homework that they can do at home.

Use of technology
Pupils report that they enjoy using computers and iPads to complete the activities in this workshop:

'It was good fun. I liked getting to go on the computers.' 'I think the BLAST website was cool' 'used iPads for the activity'

Pupils today have grown up with computers, the internet and technology and therefore most have a level of technological competence (Cummings and Temple 2010). Feedback from pupils suggests that they have no serious difficulties accessing and navigating the webpages and tools used in the activities. However, our workshop is dependent upon a reliable Internet connection and the servers of the websites we use. Very occasionally, servers may fail and Internet connection may be poor, and this negatively impacts pupil experience:

'make sure the website works'

We aim to minimise the likelihood of these issues occurring, for example by having Internet-independent access to sequences on a USB drive. Furthermore, although we have designed this workshop for use with the NCBI website, BLAST search tools and genome databases are available on other websites such as the European Bioinformatics Institute (https://www.ebi.ac.uk).

Content of workshop
Pupils like the fact that the content of our workshop is new to them and report that it is interesting to see examples of the uses of bioinformatics:

'I got to do something new and interesting with someone who was new' 'It was interesting to learn about bioinformatics and how it works and is used in everyday'
Pupils also felt that the workshop was pitched at a level they could understand, although a small number reported that the activities were complicated. Interestingly, most of the pupil 'wishes' in this theme were for more activities, including a worksheet to take home for more practice:

'Maybe do more activities' 'Possibly hand out another sheet to take home to practice even more questions.'

Overall, feedback shows that pupils found the content interesting and at an appropriate level. Therefore, we feel that we successfully achieved the aim of introducing pupils to bioinformatics and explaining how it relates to their biology course and beyond, using food fraud as an example.

Teacher feedback
The number of teachers reached via this workshop (n = 10) is much smaller than the number of pupils reached. All teachers agree that they found the workshop both enjoyable and useful (Likert score 3). Thematic content analysis of teacher 'stars' and 'wishes' identified three key themes, some of which overlap with themes identified in pupil feedback: (i) Content of workshop, (ii) Delivery of workshop and (iii) Pupil engagement (full details in Table 4). Overall, teachers report that the workshop engages pupils and provides useful practical, 'hands-on' experience of bioinformatics. Teachers commented that the content of the workshop was challenging but pitched at the appropriate level for pupils. A particular positive reported in feedback was that the workshop provided 'abstract' biological concepts, such as DNA sequencing, with real-life context and relevance.

Conclusion
Advances in DNA and genomic sequencing technology over the last decades have resulted in a major shift in life sciences research, with computation now an integral part of modern biology. This has generated demand for computationally literate researchers, particularly those with bioinformatics skills. Therefore, bioinformatics education is increasingly important for life sciences students. In this study, we have shown that our curriculum-linked bioinformatics workshop is a useful resource for secondary school biology pupils and teachers. Teachers reported that the workshop makes complex biological concepts, such as DNA and genome sequencing, less abstract. Pupils found the workshop enjoyable and reported an increased understanding of not only bioinformatics but core elements of the SQA National Biology curriculum such as genetics. Crucially, we find that there are no differences in pupil experience based on gender. Computing classes at school level in Scotland are often male-biased. So, by targeting biology classes, which are often female-biased, we reach an audience less likely to be involved in computing activities. We believe that incorporation of practical bioinformatics in the biology curriculum at secondary school level is not only useful for pupils hoping to pursue a career in life sciences but can also provide a tool to help address the gender disparities in computer science in general.
Our workshop is available as an Open Educational Resource in an editable format (https://4273pi.org). A more detailed version exists as a GOBLET Practical Guide (Bain et al. 2020).