Coding Code: Qualitative Methods for Investigating Data Science Skills

Abstract Despite the elevated importance of Data Science in Statistics, there exists limited research investigating how students learn the computing concepts and skills necessary for carrying out data science tasks. Computer Science educators have investigated how students debug their own code and how students reason through foreign code. While these studies illuminate different aspects of students’ programming behavior or conceptual understanding, a method has yet to be employed that can shed light on students’ learning processes. This type of inquiry necessitates qualitative methods, which allow for a holistic description of the skills a student uses throughout the computing code they produce, the organization of these descriptions into themes, and a comparison of the emergent themes across students or across time. In this article we share how to conceptualize and carry out the qualitative coding process with students’ computing code. Drawing on the Block Model to frame our analysis, we explore two types of research questions which could be posed about students’ learning. Supplementary materials for this article are available online.


Introduction
The year 2010 marked a turning point for research in Statistics Education. That year, the discipline saw the publication of the first discussion of the reflections a researcher must consider when designing a qualitative study (Groth 2010). Where previously there were few studies, we now see a breadth of methods of investigation, from case studies (e.g., Findley 2022) to grounded theory (e.g., Justice et al. 2020) to phenomenology (e.g., Theobold and Hancock 2019) to archaeology (e.g., Weiland 2019). In recent years, with the increased importance of computational skills, Data Science Education and Statistics Education have become inextricably intertwined. Even though qualitative methods of investigation have taken off in Statistics Education, few have been employed in the context of Data Science Education research.
The intent of this article is to provide a framework for qualitatively analyzing students' computing code so others can draw on this work to research the teaching and learning of data science in new ways.We motivate the need for this framework in two ways: (a) reviewing the current state of research on student learning in Data Science Education and (b) outlining how Computer Science (CS) Education has used computing code to investigate student learning.Drawing on the need to attend to code students produce in the process of their learning, we introduce the Block Model (Schulte 2008) as a framework to support the qualitative analysis of the computing code created by students.Next, we walk the reader through three phases in qualitative data analysis that must be considered when analyzing students' code: units of analysis, creating qualitative codes, and discovering emergent themes.Using these phases, we then demonstrate how a qualitative investigation of students' code could look when addressing two types of research questions posed about student learning.Finally, we conclude with a discussion of how these tools help to shape the future of research in Data Science Education.
In this article, the phrase "data science concepts and skills" is used to encompass the knowledge and skills necessary for one to engage in the entire data analysis process, focusing specifically on computing knowledge and skills necessary for this work cycle.Moreover, we consider Data Science to be a discipline in its own right, separate from, but related to, the disciplines of Statistics and Computer Science.These definitions are consistent with those laid out in the National Academy of Sciences' report on "Envisioning the Data Science Discipline: The Undergraduate Perspective" (National Academies of Sciences, Engineering, and Medicine 2018).

Investigations into Student Learning
In this section, we outline the current state of research in Data Science Education and what the discipline can learn from research on student learning from Computer Science (CS) Education. Using this research as a backdrop, we introduce a framework which supports the analysis of the code students produce in their process of learning.
Parallels can be seen between the early stages of Statistics Education and Data Science Education, as a great deal of research has focused on what to teach in data science courses, but little has focused on how students learn data science concepts. Although investigations in Data Science Education have grown over the last 10 years, these explorations have focused on reports detailing: (a) concepts or competencies that ought to be included in data science programs (e.g., Danyluk et al. 2019), (b) perspectives on when to teach data science (e.g., Çetinkaya-Rundel and Ellison 2020), (c) how to teach data science concepts (e.g., Loy, Kuiper, and Chihara 2019), (d) methods for integrating data science into the classroom (e.g., Broatch, Dietrich, and Goelman 2019), and (e) assorted topics to be considered in data science courses (e.g., Beckman et al. 2021). During this time, the field has also seen the creation of numerous guidelines for data science programs (e.g., National Academies of Sciences, Engineering, and Medicine 2018), each specifying what topics or competencies should be included in undergraduate and graduate programs in data science. In addition, the field has witnessed a dramatic shift in the interfaces for programming in R (R Core Team 2020), with the development of the tidyverse package ecosystem (Wickham et al. 2019) and the RStudio integrated development environment (RStudio Team 2020).
While these reports are useful, it is important to understand how these recommendations translate into student learning. Teaching data science effectively is more than identifying end goals and developing a novel curriculum. Effective teaching demands an understanding of the perspective of the learner and how they make sense of information and write computing code in the process of learning.
The discipline of Data Science Education shares many similarities with that of CS Education; namely, the important role students' code plays in their learning. Thus, the discipline of Data Science Education stands to learn from how CS Education researchers investigate student learning in the context of the code students produce.

Investigations into Student Learning in Computer Science Education
CS Education researchers have carved out two distinct areas of research attending to the code students produce. The first area considers the code students produce, but carries out a static analysis of whether the code executes with no investigation of student learning. Since the 1999 invention of BlueJ, a free Java development environment for beginners, investigations into novice programming behavior have steadily increased. BlueJ has allowed researchers to describe the most common errors students make and the amount of time students spend between compilations (Jadud 2005), to develop tools that describe students' compilation behavior (Jadud 2006), and to develop categories of compilation errors (McCall and Kolling 2014). While this research specifically attends to the code each student produces, it does not focus on student learning or understanding in the context of their code. Alternatively, the second area of research investigates novice programmers' understanding of introductory programming concepts, but through "foreign" code, that is, code which the students themselves did not produce. Researchers in this area probe students' understanding by providing students with code and asking them to think aloud while examining specific aspects of the code. Although these studies do investigate student understanding in the context of written code, the programs were never written by the students themselves. These types of studies take a critical step by paying direct attention to students as learners. The next step is for us, as a research community, to attend to the code students produce in the process of their learning. The current lack of attention to the code produced by students throughout their learning leaves educators without a clear understanding of student misconceptions and growth points. In our review of the literature, we were only able to find one study that specifically attends to both the code produced by a student and their learning process (Lewis 2012). Lewis' microgenetic analysis of a student's debugging behavior pairs their actions and their code side-by-side, painting a clear picture of both the bug in their code and the direction of their attention. These analyses of students' code should not be few and far between. Students' code poses a unique avenue for qualitative research in the teaching and learning of computing.

Using Students' Code as Qualitative Data
Students are creative thinkers, and their code provides a window into their learning process. A qualitative analysis of a student's code can provide insight into their creative process and collective learning processes rather than focusing solely on whether the code is executable. Moreover, qualitative methods allow for the comparison across students' code to identify commonalities that may exist and potential growth points.
For qualitative investigations of students' computing code, we propose researchers consider the Block Model (Schulte 2008) as an analytical framework.The Block Model is an educational framework that supports the analysis of the different aspects of a computer program.Table 1 summarizes the framework of the Block Model, where computing code is analyzed from two perspectives: the level of the program and the dimension of the program.The level of the program corresponds to the rows of the matrix, zooming in and out from expressions (atoms), to blocks, to relationships between blocks, and finally the macrostructure of the entire program.The dimension of the program, situated in the columns of the matrix, steps through deeper levels of program abstraction, from the text, to the program execution, to the overarching purpose of the program.We believe the Block Model is a powerful analytical framework for analyzing students' code, as each cell "highlights one aspect of the understanding process" (Schulte 2008, p. 150).Furthermore, Schulte suggests the cells should be thought of as movable, so not every cell needs to be taken into account and the cells can be arranged in different orders.
We view each row as a possible selection for the unit of analysis and each column as a different analytical lens. To decide among the 12 possible combinations, a researcher must consider the context of inquiry. This context dictates the scale of the code that deserves attention. An investigation focusing on the broader purpose or structure of a program requires a researcher to zoom out and consider a program's macrostructure, whereas studying individual pieces or segments of a program requires a researcher to zoom in on the atoms or the blocks.
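The 12 combinations mentioned above can be enumerated with a short R sketch. The object names (`bm_levels`, `bm_dimensions`, `block_model`, `focus`) are illustrative assumptions, not part of the Block Model itself, and the chosen focus is only one hypothetical selection.

```r
# Levels (rows) and dimensions (columns) of the Block Model, as described in the text.
bm_levels <- c("atoms", "blocks", "relations", "macrostructure")
bm_dimensions <- c("text surface", "program execution", "function/purpose")

# Every pairing of a level with a dimension is one candidate analytic focus.
block_model <- expand.grid(
  level = bm_levels,
  dimension = bm_dimensions,
  stringsAsFactors = FALSE
)

# 4 levels x 3 dimensions
nrow(block_model)

# Hypothetical choice: study what individual statements do when executed.
focus <- subset(block_model, level == "atoms" & dimension == "program execution")
focus
```

Laying the cells out this way is only a bookkeeping aid; the choice of cell still follows from the research question and context of inquiry.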
The intention of this article is two-fold: (a) to outline methods for qualitatively analyzing students' code, and (b) to sketch how these data open the doors to research vital to the teaching and learning of data science.In the sections that follow, we first set the stage for conducting qualitative research, outlining the three phases foundational to every qualitative study.Then, drawing on these foundations, we describe how the Block Model could be used to address different types of questions about the code students produce.

Designing a Qualitative Study
Similar to a quantitative researcher determining what statistical method to use, qualitative researchers have a diversity of philosophical stances from which to choose (e.g., interpretive, critical, postmodern), where the choice of a qualitative stance informs the design of a study (e.g., phenomenology, case study, grounded theory).Furthermore, qualitative research also possesses a variety of data sources (e.g., interviews, observations, artifacts).Although these options present a large variety of study designs, there are some constants that hold across all forms of qualitative research.
First, in every qualitative study, "the researcher is the primary instrument for data collection and analysis" (Merriam and Tisdell 2016, p. 5).Due to the nature of how qualitative researchers are involved in the research process, qualitative methods value researcher reflexivity.This reflexivity requires that researchers "position themselves" in the context of the study, by conveying their background and discussing "how it informs their interpretation of the information in a study" (Creswell and Poth 2018).Second, the analysis of the data collected for qualitative studies seeks to find emerging themes or categories, whose meanings compose the findings of a qualitative study.Finally, the "product of a qualitative study is richly descriptive" (Merriam and Tisdell 2016, p. 5, emphasis in original).
In this article, we propose researchers consider students' computing code as an artifact of their learning, prime for investigation with qualitative methods.We have the reader follow along as we walk through three phases researchers must explore when performing a qualitative data analysis of students' code, displayed in Figure 1.To start, we describe important considerations researchers face during their qualitative data analysis process: selecting units of analysis, then creating qualitative codes, and finally classifying these qualitative codes into themes.Following this methodological introduction, we walk the reader through how these considerations play out in qualitative analyses of students' code, addressing two possible research questions.We close with additional examples of how this framework could be used for other investigations of student learning.

Determining the Amount of Data to Collect
When designing a qualitative study, researchers are first faced with deciding what type(s) of data to collect. For this research, we have situated ourselves in the context of studying the computing code students produce, akin to studying static documents. Having decided on the data source, we must next consider the amount of code that should be collected. The "amount" of code included in a study can be thought of in two ways: the number of samples of a student's code to collect (within-case sampling) and the number of students a study should include (cross-case sampling). For both cases, the amount should be driven by the research question(s). If the study focuses on mapping a student's learning over time, then the study should include samples of a student's code at multiple time points. Miles, Huberman, and Saldaña (2020) provide an excellent summary of what researchers should consider when determining the number of within-case samples to include in a study.
Alternatively, a researcher may be interested in making comparisons across students at a specific time point, a study which requires us to consider how many cases (students) to include in the analysis. There is no "correct" answer to the question of "How many cases should I include in my analysis?", as this is a conceptual question addressing the desire for confidence in our analytic generalizations. Multiple-case sampling can add confidence to the findings of a single case by grounding them in how, where, and why they occur. Multiple-case sampling sometimes seeks out special cases to illustrate a specific phenomenon, such as a particular type of reasoning or a student's background. By including these varied perspectives, multiple-case sampling can strengthen the "precision, validity, stability, and trustworthiness of the findings" (Miles, Huberman, and Saldaña 2020, p. 29).
An additional consideration a qualitative researcher must make when analyzing students' computing code is which cases to compare.A cross-case comparison juxtaposes the themes emerging across students rather than the themes within an individual student or a "within-case analysis" (Miles, Huberman, and Saldaña 2020, p. 95).These two methodologies may arrive upon different themes, as themes found across students may not have appeared within each student.By isolating our attention to a single student's computing code, we can discover themes unique to that student, understand the interconnected nature of these themes, and potentially map how these themes change over time.
If we are then interested in exploring how these individual themes differ across students, a "cross-case" analysis is appropriate.A cross-case analysis can deepen the understanding of the themes within individual students by examining similarities and differences across students.The transferability of findings of a within-case analysis to additional cases can occur if similar themes are found when additional cases are considered.Furthermore, the inclusion of additional cases allows a researcher to investigate conditions associated with the appearance/absence of each theme.
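To make the distinction between within-case and cross-case sampling concrete, the following R sketch organizes a hypothetical inventory of collected scripts. The students, weeks, and file names are invented for illustration and do not describe the data analyzed in this article.

```r
# Hypothetical inventory of collected scripts: one row per student per time point.
samples <- data.frame(
  student = c("A", "A", "A", "B", "B", "B"),
  week    = c(3, 8, 15, 3, 8, 15),
  file    = c("A_wk03.R", "A_wk08.R", "A_wk15.R",
              "B_wk03.R", "B_wk08.R", "B_wk15.R"),
  stringsAsFactors = FALSE
)

# Within-case sampling: all of one student's scripts, ordered in time.
within_case <- samples[samples$student == "A", ]
within_case <- within_case[order(within_case$week), ]

# Cross-case sampling: every student's script at a single time point.
cross_case <- samples[samples$week == 15, ]

within_case
cross_case
```

Which slice is appropriate, and how many rows each should contain, remains a conceptual decision driven by the research question rather than something the data structure can settle.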

Selecting a Unit of Analysis
The process of data analysis begins by "identifying segments in your dataset that are responsive to your research questions" (Merriam and Tisdell 2016, p. 203). These segments form the units of analysis, which can be as small as a single word or as large as an entire report. Collectively, themes identified in these units will answer a study's research question(s). Lincoln and Guba (1985) suggest that a unit of analysis ought to meet two criteria. First, the unit should be heuristic; that is, it should reveal information pertinent to the study and move the reader to think beyond the singular piece of information. Second, the unit should be "the smallest piece of information about something that can stand by itself" (Lincoln and Guba 1985, p. 345). A unit must be "interpretable in the absence of any additional information" (p. 345), requiring only that the reader have a broad understanding of the study context.
Once the unit of analysis has been determined, the researcher can process the data, identifying segments that aid in addressing the question(s) at hand.As recommended by Miles et al., thinking of "whom and what you will not be studying" is a way to "firm up the boundaries" of what is being defined as a unit of analysis (Miles, Huberman, and Saldaña 2020, p. 26).Once these segments have been identified, the researcher then begins the process of synthesizing the information.

Creating Qualitative Codes
The process of coding, where a researcher makes notes next to bits of data that are potentially relevant to addressing the research question, can be thought of as "having a conversation with the data" (Merriam and Tisdell 2016, p. 204).Codes act as labels, assigning "symbolic meaning" to the information compiled during a study (Miles, Huberman, and Saldaña 2020, pp. 62-64).It may be tempting to view coding as the preparatory work necessary for higher level thinking, but we suggest readers think of coding as a deep reflection about, and a deep interpretation of the data's meanings.The code for a unit of data is determined through careful reading and reflection, providing the researcher with an intimate familiarity with every datum in the corpus.From now on, we will use the term "code" to denote a qualitative code and "computing code" or "statement of code" to denote the computer (R) code generated by a student.
The initial codes assigned to the units of analysis can be thought of as the "first cycle codes." There are over 25 different methods for creating first cycle codes, each with a particular focus and purpose. In our analysis of students' computing code in Section 4, we discuss two methods of coding that we believe are relevant to researchers: descriptive coding and process coding. Creswell and Poth (2018) recommend that qualitative researchers, especially beginners, pay attention to the number of codes included in their database. These authors advocate starting with a short list of codes and only expanding the list if necessary, a process called "lean coding." A shorter code list makes the subsequent process, discovering emergent themes, far easier, as that process relies on collapsing codes into categories with overarching similarities. Additionally, during the process of identifying codes, it is recommended that researchers "highlight noteworthy quotes" as they code (Creswell and Poth 2018, p. 194). These noteworthy quotes can inform the development of themes and make it easier to represent the key idea(s) of a theme.
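One way to keep a first cycle code list lean is to store each coded unit as a row and tally the distinct codes in use. The sketch below assumes a simple data frame layout; the statements and code labels are hypothetical and are not drawn from the students' projects.

```r
# Hypothetical first cycle coding: one row per unit of analysis (statement).
coded <- data.frame(
  student   = c("A", "A", "B", "B"),
  statement = c("dat2 <- dat[dat$year == 2010, ]",
                "plot(dat2$x, dat2$y)",
                "sub <- subset(dat, year == 2010)",
                "str(sub)"),
  code      = c("filter observations",
                "create scatterplot",
                "filter observations",
                "inspect object"),
  stringsAsFactors = FALSE
)

# Lean coding: tally how often each code is used and how many codes exist.
codebook <- as.data.frame(table(code = coded$code), responseName = "n_units")
codebook
length(unique(coded$code))
```

A growing `codebook` is a prompt to ask whether a new code is genuinely needed or whether an existing one already covers the unit.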

Uncovering Emergent Themes
In the second cycle of coding, often called "pattern coding" (Miles, Huberman, and Saldaña 2020, p. 79), the qualitative researcher groups the codes made in the first phase into a smaller number of categories or themes. Themes or categories, terms we will use interchangeably, in qualitative research "are broad units of information that consist of several codes aggregated to form a common idea" (Creswell and Poth 2018, p. 194). These categories can be thought of as a kind of meta-code. For quantitative researchers, this process can be thought of as an analogue to cluster- or factor-oriented approaches in statistical analysis.
Categories should "cover" or span multiple codes that were previously identified.These categories "capture some recurring pattern that cuts across your data" (Merriam and Tisdell 2016, p. 207).Merriam and Tisdell (2016) suggest this process of discovering themes from codes feels somewhat like constantly transitioning one's perspective of a forest, from looking at the "trees" (codes) to the "forest" (themes) and back to the trees.This process breaks data down into bits of information and then maps these "bits to categories or classes which bring these bits together again, if in a novel way" (Dey 1993, p. 44).During this process, the discrimination between the criteria for each category becomes more clear, allowing for some categories to be subdivided and others subsumed into broader categories.
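The movement between "trees" and "forest" can be mimicked with a lookup that collapses first cycle codes into themes. This is a minimal sketch under the assumption that each code maps to exactly one theme; the codes and the mapping are hypothetical, and in practice the mapping is the product of interpretation rather than a fixed table.

```r
# Hypothetical first cycle codes attached to five units of analysis.
first_cycle <- data.frame(
  unit = 1:5,
  code = c("filter observations", "select variables", "create scatterplot",
           "mutate variable", "add legend"),
  stringsAsFactors = FALSE
)

# Second cycle (pattern coding): a lookup collapsing codes into themes.
theme_lookup <- c(
  "filter observations" = "data wrangling",
  "select variables"    = "data wrangling",
  "mutate variable"     = "data wrangling",
  "create scatterplot"  = "data visualization",
  "add legend"          = "data visualization"
)

first_cycle$theme <- unname(theme_lookup[first_cycle$code])

# "Forest" view: number of units per theme.
table(first_cycle$theme)

# Back to the "trees": the codes gathered under each theme.
split(first_cycle$code, first_cycle$theme)
```

Revising `theme_lookup` and re-running the tallies is one way to see how subdividing or merging categories changes the overall picture.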
Researchers must decide how many themes are appropriate when addressing the research question. A large number of themes reflects "an analysis too lodged in concrete description" (Merriam and Tisdell 2016, p. 214). This principle is analogous to that of parsimony in quantitative methods, that is, selecting the simplest possible model that adequately represents the data to avoid overfitting. Furthermore, a smaller number of themes requires a "greater level of abstraction" which should leave a researcher with "a greater ease with which to communicate their findings to others" (Merriam and Tisdell 2016, p. 214).
As the qualitative researcher begins to discern themes from the codes, the names for these themes may either be readily apparent, or may require a bit more reflection regarding the focus of the study.As the categories or themes are responsive to the study's research questions, "the names of these categories will be congruent with the orientation of the study" (Merriam and Tisdell 2016, p. 211).The names of the categories can come from three different sources: (a) the researcher's words, (b) the participants' words, or (c) external sources, such as the literature informing the study.

Reporting Results
The themes which emerge from the data form the backbone of the findings of a qualitative study. When reporting their results, a qualitative researcher weaves their themes into a narrative which is responsive to their research questions, illuminating nuances in their findings. Writing the results of a qualitative study requires researchers to consider the focus of their report: for whom it is being written, what the purpose of the study was, and what level of abstraction was obtained during the data analysis. There is no one-size-fits-all method for summarizing the findings of a qualitative study, but multiple qualitative researchers offer "guidelines" on the overall narrative structure of the report (Creswell and Poth 2018; Merriam and Tisdell 2016).
When writing the findings of a study, a researcher is expected to convince the readers of the trustworthiness of their findings. This trust comes from multiple avenues. First, qualitative research demands researcher "reflexivity," a process by which the researcher discloses their experiences with the phenomenon being explored and discloses how these experiences shaped their interpretation of the data. These details "allow the reader to better understand how the researcher might have arrived at a particular interpretation of the data" (Merriam and Tisdell 2016, p. 249). A researcher must also consider issues of confirmability, reliability, credibility, and transferability. These avenues address questions of researcher bias, whether an analysis is stable across researchers, whether the findings of the study paint an authentic portrait of the data, and whether the conclusions of a study can be transferred to other contexts. These are profound questions for which a variety of qualitative researchers have provided direction (see, e.g., Merriam and Tisdell 2016; Creswell and Poth 2018; Miles, Huberman, and Saldaña 2020).
Finally, when studying teaching and learning, it is necessary to make explicit what is meant by "learning." While a student's computing code provides insight into their learning process, it is not equivalent to explicitly measuring their understanding.Moreover, a student's computing code is context-dependent.Code produced in a classroom where templates are provided communicates different aspects of a student's learning than in a classroom where students are expected to generate their own computing code.Furthermore, there is an important distinction between a student's ability to generate source code and a student's understanding of the underlying concepts (Lister et al. 2004).If, in fact, a researcher is interested in explicitly measuring a student's understanding, it would be necessary to supplement the student's computing code with additional data sources (e.g., interviews, think-aloud tasks, screen recordings).This is not to say that the code generated by students provides no insight into their understanding, rather a caution that inferences about learning or understanding cannot be generated from computing code alone.

Qualitative Investigations into Students' Code
In this section, we explore each of the three phases of qualitative research (defining the unit of analysis, creating qualitative codes, and uncovering emergent themes) in the context of two research questions, presenting two possible perspectives for how students' computing code can be analyzed. The data used to address these questions come from a broader empirical investigation of the data science skills necessary for environmental science research (Theobold 2020). For this analysis, we focus on the code produced by two students, "Student A" and "Student B," for their independent research project at the end of a graduate-level applied statistics (GLAS) course, a course targeted to graduate students in the environmental sciences. The independent research project was intended to provide students with the opportunity to apply concepts learned in class in the context of their own research. The two requirements for the project were that (a) students use an analysis strategy learned in the course, and (b) a visualization be made to accompany any analysis and resulting discussion. As computing was not a learning goal of the course, there were no specific skills students were expected to exhibit, aside from creating a visualization, in their projects. Moreover, as each student's research is unique, the variability of research projects was substantial.
In our demonstrations, we focus on the following research questions:
RQ1 What types of data science skills do students employ when analyzing data for an end-of-semester research project?
RQ2 What are similarities and differences in students' constructions of multivariate visualizations?
The following sections provide examples associated with each phase of the qualitative data analysis process when addressing each question. To address these questions, we draw on the R code produced by two graduate students during their independent research projects. In these discussions, we will use the typewriter font when we are referring to objects and functions in R. These examples focus on the techniques rather than the results. The entirety of the computing code produced by Student A and Student B for their independent research project is included in the Supplementary Materials. A full analysis of each student's code is publicly available through a GitHub repository (footnote 1) and interactive website (footnote 2) (Theobold 2022).
The data from which the students' code was derived, however, are not available due to anonymity concerns.

What Types of Data Science Skills Do Students Employ When Analyzing Data?
This first research question seeks to describe the types of data science skills used by Student A and Student B in their end-of-semester research project for their GLAS course. As outlined in Section 3.1, it is important to note that this analysis uses multiple-case sampling to compile a set of data science skills used across multiple students, rather than isolating skills each student used or making comparisons between the skills used by each student.

Unit of Analysis
We chose the atom level of the Block Model as an atom satisfies the two criteria outlined by Lincoln and Guba (1985): it reveals the data science skills used by each student and is the smallest piece of information that can stand by itself. However, an atom constitutes any language element of a program, and can thus have a variety of grain sizes, from characters to words to statements. While some data science skills can be surmised from characters (e.g., $) or words (e.g., subset()), some skills may require a larger grain size to ascertain. As such, we defined an atom as a syntactic statement of computing code. Once a unit of analysis is selected, the next step is to determine the analytical lens that should be used when inspecting each unit. The Block Model offers three analytical lenses (dimensions): text surface, program execution, and function/purpose. The program execution dimension explicitly analyzes computing code to summarize its action or operation, which aligns with the question this analysis seeks to address.
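If an atom is taken to be a syntactic statement, R's own parser offers one way to segment a script into units of analysis. The sketch below is an assumption about how this could be done, not a description of the procedure used in this study; the script text and the `units` data frame are hypothetical.

```r
# Hypothetical student script, supplied as text (nothing here is evaluated).
script <- '
dat <- read.csv("field_data.csv")
# keep one year
dat2 <- dat[dat$year == 2010, ]
plot(dat2$depth,
     dat2$temp)
'

# parse() returns one expression per top-level syntactic statement.
exprs <- parse(text = script, keep.source = FALSE)

# Deparse each expression back to text: one unit of analysis per row.
units <- data.frame(
  unit      = seq_along(exprs),
  statement = vapply(exprs,
                     function(e) paste(deparse(e), collapse = " "),
                     character(1)),
  stringsAsFactors = FALSE
)
units
```

Note that a statement spread over several lines becomes a single unit, and that code comments are dropped by the parser; a study that treats comments as data would need to retain them separately.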

Creating Qualitative Codes
Now that we have an analytical lens, we begin the process of creating qualitative codes. Recall, the process of qualitative coding requires a researcher to make notes next to each unit that are relevant to addressing the research question. Although there exist numerous methods for creating qualitative codes, we believe descriptive codes are well suited to atom-level analyses. As the name suggests, a descriptive code "summarizes the basic topic of a unit of data with a short word or phrase" (Miles, Huberman, and Saldaña 2020, p. 87). The intention of this atom-level analysis was to understand the computing code students produced, and descriptive codes allow us to do exactly that. In this setting, the "topic" of an atom is the operation the statement performs (e.g., object creation), whereas the content would contain information regarding the context relevant to the statement (e.g., variable names). Table 2 displays an example of how descriptive codes could be created for statements of computing code used by Student A and Student B to filter data in their research project. The descriptive codes produced for each student's code are nearly identical. In fact, the two differences between these descriptions are (a) the tool used to perform the filtering, and (b) the type of object which is being filtered.
Table 2 (only one descriptive code survives in this copy): Filters a dataframe using subset(), based on equality relation with a variable selected from filtered dataframe
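To illustrate how two filtering statements can perform the same operation with different tools, the sketch below uses an invented data frame and invented statements; it does not reproduce the students' code or the contents of Table 2, and it illustrates only a difference in tool, not in the type of object being filtered. The wording of the two descriptive codes is likewise hypothetical.

```r
# Hypothetical data; the students' own data are not available.
dat <- data.frame(
  site  = c("a", "a", "b", "c"),
  depth = c(1.2, 3.4, 2.2, 5.1),
  stringsAsFactors = FALSE
)

# Bracket-based filtering: a logical vector built with $ indexes rows via [].
filtered_bracket <- dat[dat$site == "a", ]

# Function-based filtering: subset() with an equality condition using $.
filtered_subset <- subset(dat, dat$site == "a")

# Descriptive codes summarize the operation (topic) of each statement.
descriptive_codes <- data.frame(
  statement = c('dat[dat$site == "a", ]', 'subset(dat, dat$site == "a")'),
  code = c("Filters a dataframe using [], based on an equality relation with a variable selected using $",
           "Filters a dataframe using subset(), based on an equality relation with a variable selected using $"),
  stringsAsFactors = FALSE
)
descriptive_codes
```

Both statements keep the same rows, so a check of execution alone would not distinguish them; the descriptive codes record how each result was reached.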

Uncovering Emergent Themes
Two themes were expected to emerge from the data due to the nature of the project requirements as stipulated by the professor. Recall, students were expected to (a) use a data analysis strategy learned in the course, and (b) create a visualization to accompany any analysis and resulting discussion. Thus, themes of "data model" and "data visualization" were expected to be seen in students' computing code. Additionally, from the first author's personal experiences, as both an educator and a researcher, they expected that students' analyses, which used data from their own research, would also require some aspect of "data wrangling" during their project. While examining the statements of code assigned to the data wrangling theme, the first author noticed that students used some techniques that called upon specific attributes of data structures (e.g., dataframe, vector, matrix). As these data structure attributes persisted across many other themes, it was decided these skills warranted their own "data structures" theme rather than being relegated to a subtheme of data wrangling. A theme of "R environment" was similarly discovered by inspecting the statements associated with the themes of data model and data visualization. The most obvious statement that evoked this theme was Student A's use of with() to temporarily attach a dataframe while plotting. There were, however, other statements that also fit into this theme, such as function arguments being bypassed, sourcing in an external R script, loading in datasets, and loading in packages. The theme of "efficiency" was found in a similar vein, by recognizing code within the themes of data wrangling and data visualization which did or did not adhere to the "don't repeat yourself" principle (Wilson et al. 2014).
Through the examination of statements unassigned to a specific category, the theme of "workflow" surfaced. Contained within this theme were statements whose purpose was to facilitate a student's workflow, such as code comments or statements which inspect some characteristic of an object (e.g., structure of a dataframe, names of a dataframe, summary of a linear model). Thus, at the close of this analysis, seven themes had emerged from the data: data model, data visualization, data wrangling, data structures, R environment, efficiency, and workflow. For brevity, we will focus on unpacking the data wrangling and data structures themes (footnote 3).

Data Wrangling and Data Structures
The theme of data wrangling contained statements of code whose purpose was to prepare a dataset for analysis and/or visualization. The skills associated with this data wrangling theme were: selecting variables, filtering observations, and mutating variables. Keeping this in mind, let's revisit the qualitative codes initially introduced in Table 2. Under the theme of data wrangling, both statements of code select variables, as well as filter observations. However, digging into how these tasks were carried out, we see specific attributes of vectors ([]) and dataframes ($) being used. Both statements use attributes of a dataframe when selecting variables ($); however, only Student A explicitly uses an attribute of a vector when filtering observations ([]). Table 3 highlights the specific components of each statement classified under the theme of data structures. Statements of code evoking the theme of data wrangling did not always employ attributes of a data structure. For example, Line 1 of Student A's code in Table 4 carries out the process of filtering observations and Line 2 carries out the process of mutating a variable. However, neither statement makes an explicit call to an attribute of a dataframe.
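To make the distinction concrete, the following minimal sketch contrasts three ways of filtering and selecting in base R. The dataframe `growth` and its values are hypothetical stand-ins, not the students' data; only the use of `$`, `[]`, and subset() mirrors the discussion above.

```r
# Hypothetical data standing in for a student's dataset (values are made up)
growth <- data.frame(
  Age    = c(1, 1, 2, 2, 3),
  Weight = c(10.2, 11.5, 20.1, 19.4, 30.3)
)

# (1) Selects with $ and filters with []: data wrangling that also
#     calls on attributes of a dataframe and of a vector
weights_bracket <- growth$Weight[growth$Age == 1]

# (2) Filters with subset() but still selects the variable with $:
#     a dataframe attribute is used, a vector attribute is not
rows_dollar <- subset(growth, growth$Age == 1)

# (3) Filters and selects entirely through subset() arguments:
#     data wrangling with no explicit data structure attribute
rows_plain <- subset(growth, Age == 1, select = Weight)
```

Under the coding scheme described above, all three statements would fall under data wrangling, but only the `$` and `[]` components of the first two would also be tagged as data structures.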
Similarly, statements of code employing attributes of data structures were not solely for the purpose of data wrangling, as demonstrated in Table 5. In Line 1 of Student A's code, she uses c() to create a vector for the purpose of displaying a legend on a plot. Statements of code such as Line 1 of Student B's code were classified under the theme of data structures, as they create an atomic vector, the basic data structure in R. In the R code written by Students A and B, these vectors were then used in a similar vein to what is seen on Line 2 of Student B's code, where the values stored inside these vectors are called upon as function inputs. Line 2 of Student B's code provides an interesting insight into other methods that were classified under the theme of data structures. In this statement of code, Student B uses three distinct methods for creating a vector (c(), seq(), rep()); this resulting vector is then used to create a matrix, another fundamental data structure in R.

Table 5:
Student A
Line 1  legend(15, 600, legend = c("1998-2003", "2006-2017"), col = c("black", "red"), lty = 1:1, cex = 0.8)
Student B
Line 1  numRows <- 1000
Line 2  pMat <- matrix(data = c(seq(kTotMin, kTotMax, length.out = numRows), rep(log(sigmaEst), times = numRows)), nrow = numRows)

Still, there was a substantial overlap in the statements of code classified under each of these themes. Particularly, every method these students used to select variables employed some attribute of a dataframe ($, []). This is not to say this is the only method one can use to select variables. For example, one could use the subset() or select() functions, which do not explicitly call on attributes of a dataframe.

These considerations as to whether an entire unit of analysis can be classified under two different themes are a critical component of deciding if a set of themes is complete. Merriam and Tisdell (2016) state that categories constructed during this process should be exhaustive, mutually exclusive, sensitizing, and conceptually congruent. First, the qualitative researcher should be able to sort the entire corpus of data into the chosen categories. Second, a particular unit should fit into only one category. This is not to say that one statement of computing code could not have certain aspects in one category and the rest in another. As we saw in Table 3, the $ and [] components of RPMA2GrowthSub$Weight[RPMA2GrowthSub$Age == 1] belong to the theme of data structures, whereas the remaining code was classified as data wrangling. Third, the name of the theme should be sensitive to what is in the data, such that an outsider could read the names of the themes and gain some insight into their nature. Finally, categories should all be at the same level of abstraction. For example, a theme dedicated to one specific function (e.g., lm()) would not be at the same level of abstraction as the themes created in this analysis, which include numerous functions to describe their nature or purpose.
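The first two of these criteria can be checked mechanically once the coded units are stored as data. The sketch below is one possible way to do so and is not part of the analysis reported here; the dataframe `coded`, its unit labels, and its theme assignments are hypothetical, while the seven theme names come from the analysis above.

```r
# Hypothetical coded corpus: one row per coded unit (here, a component of a statement)
coded <- data.frame(
  unit  = c("u1", "u2", "u3", "u4", "u5"),
  theme = c("data wrangling", "data structures", "data visualization",
            "workflow", "data model"),
  stringsAsFactors = FALSE
)

# The seven themes that emerged from the analysis
themes <- c("data model", "data visualization", "data wrangling",
            "data structures", "R environment", "efficiency", "workflow")

# Exhaustive: every unit has been sorted into one of the chosen themes
is_exhaustive <- all(!is.na(coded$theme) & coded$theme %in% themes)

# Mutually exclusive: no unit is assigned to more than one theme
themes_per_unit <- tapply(coded$theme, coded$unit,
                          function(x) length(unique(x)))
is_exclusive <- all(themes_per_unit == 1)
```

The remaining two criteria, sensitizing names and a common level of abstraction, are judgments about meaning and cannot be reduced to a check of this kind.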

Reporting Results
For this research question, we investigated the data science skills used by environmental science graduate students in their end-of-semester research projects. In addition to describing themes which emerged from the data, a qualitative researcher also interprets the findings of the study in the broader context of the data. Reflecting on the context of these data (the data science skills necessary for analyzing data in a research project), it should be unsurprising that we see an alignment between the themes of data science skills which emerged from Student A and Student B's code and the stages of the data science cycle (Wickham and Grolemund 2017). The themes of data wrangling, data visualization, and data model overlap directly with the "explore" stage of this cycle, while workflow, R environment, efficiency, and data structures address the nature of data science skills that may be necessary throughout the entire cycle. Some aspects of these themes saw substantial differences between Student A and Student B, whereas others saw a large overlap.

What are Similarities and Differences in Students' Constructions of Multivariate Visualizations?
Having investigated the data science skills employed by students in their research projects, we transition now to our second research question. In this research question, we are interested in comparing how students construct multivariate visualizations. Similar to the previous section, the data for our analysis will be drawn from the R code produced by Student A and Student B in their research project at the end of their GLAS course. However, unlike the previous analysis, this research question allows us to explore two possible analytical methods: within-case and cross-case analyses. A within-case analysis allows us to explore multivariate visualizations within one student, whereas a cross-case analysis explores multivariate visualizations across students.

Unit of Analysis
Although it is possible to create a multivariate visualization in one syntactic line, students' constructions may use multiple lines of code to create a visualization. As such, we have chosen the block level of the Block Model for this analysis. As shown in Table 1, an underlying process is the focal point of this type of investigation, and as such, a block should reflect the nature of the process itself. The size of the block depends on the question of interest. If a researcher has difficulty deciding how to define the region of interest (ROI) for a block, Spohrer, Soloway, and Pope (1985) recommend looking at the overarching goal of the program (e.g., a student's analysis). Once the goal has been defined, the researcher can then "look at the program to find lines of code that are connected" in how they achieve a specific goal (p. 166).
For their project, both students created visualizations with different colors for different groups. As such, we defined a "block" to be any instance where a student plotted the relationship between two variables, coloring certain aspects of the plot (e.g., points, lines) by group affiliation. Once the region of interest has been defined, the next step is to isolate every block in the student's computing code that meets these criteria. Table 6 displays one such block found in Student A's research project, consisting of six lines of code.
Using the blocks identified in each student's computing code, there are two ways the analysis could unfold. If we were interested in comparing the process of creating multivariate visualizations between students (a cross-case analysis), the previously identified block-level analysis would be the most appropriate, comparing blocks between students. Alternatively, if the focus of the research was on how the process each student used to create scatterplots with different colors for different groups was similar or different across blocks within their computing code (a within-case analysis), we could consider the "relationships" level of the Block Model. In the sections that follow, we provide guidance on how each of these two types of analyses might unfold.

Table 6. Example of code generated by Student A which creates a multivariate visualization, modeling the relationship between two variables and coloring points based on group affiliation.
Line 1  ##Exponential function
Line 2  expAnterior <- lm(PADataNoOutlier$Lipid ~ log(PADataNoOutlier$PSUA))
Line 3  summary(expAnterior)
Line 4  expAnterior
Line 5  with(PADataNoOutlier, plot(Lipid ~ log(PSUA), las = 1, col = ifelse(PADataNoOutlier$'Fork Length' < 260, "red", "black")))
Line 6  abline(expAnterior)
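Because Student A's dataset is not available, a block of this shape can be reproduced with made-up data. The sketch below assumes a hypothetical dataframe `fish` with columns `PSUA`, `Lipid`, and `ForkLength`; the 260 cutoff and the two colors follow Table 6, and everything else is illustrative. The plot is drawn to a null PDF device so the code runs without opening a window.

```r
# Hypothetical data; column names loosely echo Table 6, values are made up
fish <- data.frame(
  PSUA       = c(1.2, 2.5, 3.1, 4.8, 6.0, 7.4),
  Lipid      = c(2.1, 3.0, 3.4, 4.1, 4.3, 4.9),
  ForkLength = c(240, 255, 262, 270, 250, 281)
)

# One color per observation, chosen by a conditional on the grouping variable
point_col <- ifelse(fish$ForkLength < 260, "red", "black")

# Model that supplies the trend line
fit <- lm(Lipid ~ log(PSUA), data = fish)

pdf(NULL)  # null device: nothing is written to disk
plot(Lipid ~ log(PSUA), data = fish, las = 1, col = point_col)
abline(fit)  # add the fitted line to the scatterplot
invisible(dev.off())
```

Here the grouping is handled entirely inside the `col` argument, which is why a block of this kind needs only a single plotting statement.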
Having defined blocks as the level of analysis, the next step is to determine what dimension is the most responsive to the research question. As this question seeks to understand how students construct their multivariate visualizations, once again, the program execution dimension seems the most appropriate. Moreover, focusing on the operations of a block provides an opportunity to discuss another type of qualitative coding: process coding.

Creating Qualitative Codes
A "process code" or "action code" uses gerunds ("-ing" words) to connote action in the data (Saldaña 2013). Process coding is especially salient when analyzing blocks of computing code, as blocks are composed of sequences of statements. The ordering of these statements speaks to the decisions each student made when carrying out their process, as the structure can be "strategic, routine, random, novel, automatic, and/or thoughtful" (Corbin and Strauss 2008, p. 247). Additionally, processes can be intertwined with time, such that actions can emerge, change, or occur in particular sequences. Process coding can also be used to analyze how a student's computing process changes and/or evolves over time.
Table 7 displays another block found in Student A's research project with annotations exploring how Student A enacted the process of creating a multivariate visualization: plotting a scatterplot of one group, creating a line for that group, then adding colored points for the second group, including a colored line for the second group, and finally, inserting a legend detailing which group each color corresponds with.
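The sequence of actions described above can be sketched as runnable code, with each process code written as a comment above the statement it annotates. The two dataframes `early` and `late`, their values, and the legend coordinates are hypothetical; only the ordering of steps and the legend labels (which appear in Table 5) follow the source.

```r
# Hypothetical summaries for two groups (values are made up)
early <- data.frame(Age = 1:4, Weight = c(120, 260, 410, 520))
late  <- data.frame(Age = 1:4, Weight = c(100, 230, 380, 560))

pdf(NULL)  # null device: nothing is written to disk
# Plotting points for the first group
plot(Weight ~ Age, data = early, las = 1,
     ylim = range(early$Weight, late$Weight))
# Adding a line for the first group
lines(early$Age, early$Weight)
# Adding colored points for the second group
points(late$Age, late$Weight, col = "red")
# Adding a colored line for the second group
lines(late$Age, late$Weight, col = "red")
# Inserting a legend at explicit x and y coordinates
legend(x = 1, y = 560, legend = c("1998-2003", "2006-2017"),
       col = c("black", "red"), lty = 1, cex = 0.8)
invisible(dev.off())

# The same process codes stored as an ordered vector for later comparison
process_codes <- c("plotting points for the first group",
                   "adding a line for the first group",
                   "adding colored points for the second group",
                   "adding a colored line for the second group",
                   "inserting a legend")
```

Storing the process codes as an ordered vector keeps the sequence itself available as data, which is what a within-case or cross-case comparison would operate on.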

Uncovering Emergent Themes-Within-Case Analysis
We have now seen two blocks of Student A's computing code which create multivariate visualizations. In Table 6 (Line 5), Student A uses the ifelse() function to change the color of the points within the plot() function. Table 7 presents a second "block" of Student A's code which has a slightly different structure than what was seen in Table 6 but carries out the same process of plotting the relationship between two variables with different colored points for two groups. In Table 7, we see that Student A begins by creating a scatterplot between two variables for one subset of the data; she then adds line segments between the points, next includes points for a different subset of data, coloring the points red, and finally adds line segments between the red points. Unlike the previous plot, in this block Student A finalizes the plot by including a legend explaining which group each color is associated with. In Table 6, however, Student A is able to accomplish the process of creating different colors for the points in one statement of code, using ifelse() inside the col argument. Thus, it may appear that Student A is more efficient in Table 6, but there is an important difference in this situation: the points being plotted are contained in not one but two datasets. Additionally,

Uncovering Emergent Themes-Cross-Case Comparison
In Table 8, we explore an alternative qualitative analysis approach: a cross-case comparison of the process used by Student A and Student B. We see that, for both students, this process consists of five statements of code, beginning with an initial plot and ending with a legend.
We see many similarities in the process carried out by Student A and Student B. Both students add points to their plots, modify their axis labels, rotate their axis tick mark labels (las), and include a legend in their plot. There are, however, notable differences within these similarities. Student B uses the built-in type argument to generate a line plot, rather than pairing the plot() and lines() functions. Whereas Student A specifies axis labels within the plot() function, Student B uses the title() function to declare more specialized axis labels. Finally, these students differed in their placement of the legend, with Student A specifying x and y coordinates and Student B using the "bottomright" string specification. Notably, for nearly every plot these students created, they employed the same plotting customizations, consistently changing both the axis titles and the orientation of axis labels, a skill they learned in their GLAS course.
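These differences can be illustrated side by side. The sketch below is not either student's code: the data, the axis labels, the function names `draw_paired_calls()` and `draw_builtin_options()`, and the choice of `type = "b"` are all assumptions made for illustration. Only the contrasts themselves (plot() plus lines() versus the type argument, labels inside plot() versus title(), coordinates versus "bottomright") come from the comparison above.

```r
# Hypothetical data (values are made up)
x <- 1:5
y <- c(2, 4, 5, 7, 11)

# Resembles Student A: labels inside plot(), a separate lines() call,
# and a legend placed at explicit x and y coordinates
draw_paired_calls <- function(x, y) {
  plot(x, y, las = 1, xlab = "Age (years)", ylab = "Weight (g)")
  lines(x, y)
  legend(x = min(x), y = max(y), legend = "Group 1", lty = 1)
}

# Resembles Student B: points and lines from the type argument,
# labels added with title(), and a keyword legend position
draw_builtin_options <- function(x, y) {
  plot(x, y, type = "b", las = 1, xlab = "", ylab = "")
  title(xlab = "Age (years)", ylab = "Weight (g)")
  legend("bottomright", legend = "Group 1", lty = 1)
}

pdf(NULL)  # null device: nothing is written to disk
legend_a <- draw_paired_calls(x, y)
legend_b <- draw_builtin_options(x, y)
invisible(dev.off())
```

Both functions produce a labeled plot with points, lines, and a legend; the differences lie in which function arguments each approach relies on, which is the kind of evidence the R environment theme draws on.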

Reporting Results
Within-Case Analysis. Although short in nature, these two blocks of code paint a picture of Student A's understanding and her experiences gaining computing skills. The first code block (Table 6) only slightly strayed from the scatterplot "template" provided by the GLAS instructor. In fact, the only difference is the specification of a col argument. When Student A was asked how she learned to produce different colors inside the plot() function, she stated that she had used Google to find something that worked. All of the other components, however, Student A was able to replicate using the examples she had seen in GLAS (e.g., using with(), specifying las = 1). As stated previously, the plotting scenario presented in Table 7 differs from Table 6 in a substantial way: there are two datasets being plotted. Just as she was never shown how to plot two colors with the ifelse() function (Table 6), Student A was never presented with plotting two datasets on the same plot in her GLAS course. When questioned about her process of calculating summary statistics (using ddply()) and plotting these summary statistics, Student A stated she had relied on code given to her by another graduate student. This perspective leaves us with a greater understanding of why Student A used two different methods when carrying out similar plotting processes: she was piecing together skills from external resources. These piece-by-piece solutions never allowed Student A to "abstract what she learned from each task to broader classes of tasks" (Nolan and Temple Lang 2010, p. 100), leaving her unable to see how she could merge two datasets and use the same coloring process she had used before.
Cross-Case Analysis. While small, the differences in these students' computing code illustrate profound imbalances in the R environment theme seen in Section 4.1.3. Student B's ability to use the built-in type argument and use a string specification for a legend's position came from her understanding of functions in R. Specifically, Student B was aware of and able to access a function's documentation to use all the options a function could provide. On the other hand, Student A was unaware of these built-in arguments, as well as other functions' built-in options (e.g., lm()).
Although Student A and Student B showed substantial differences in their knowledge and abilities when working in R, the visualizations they each created used nearly identical skills. Both students created predominantly bivariate visualizations using scatterplots, adding colors for groups to their visualizations either by using a conditional statement or specifying the color of additional points and lines. When adding these colors to their plot, both students created legends, differing only in their placement. Furthermore, both students added additional lines to their plots, both using abline(). Throughout the R scripts, Student B would add vertical lines to her plots, corresponding to quantiles of a vector, and Student A would add linear trend lines, corresponding to the variables included in each scatterplot.

Avenues for Qualitative Research of Students' Code
In this article, we outlined how qualitative analyses situated within the Block Model framework can be used to investigate two types of research questions. However, you may be wondering how this framework might be used to address additional research questions, and the types of data and participants these questions would require. In the supplementary materials, we provide examples of four additional research questions regarding data science education, outlining the unit of analysis, data collected, cases considered, and qualitative coding method we believe would be appropriate to address each question. In this section, we outline two different avenues in data science education research which are ripe for investigation and are well suited for using these types of qualitative methods.

Learning Trajectory for Data Science Concepts
First and foremost, there is a great need for research investigating how students learn data science concepts and skills, whether independently or alongside statistical concepts. A recent study by Fergusson and Pfannkuch (2021) demonstrates how this type of phenomenon can be investigated, exploring how teachers connect the processes of GUI-driven statistical tools to code-driven tools. After thoroughly discussing the computational process underpinning the GUI-based randomization test, teachers were able to "immediately make links between each line of [R] code and what they knew about the randomization test" (p. 13). Although this study did not investigate teachers' ability to independently write code, it opens the door to future research on how statistics and data science concepts can be learned side-by-side.
This type of learning, often referred to as "learning trajectories" or "learning progressions," is "a set of behaviors (including both landmarks and obstacles) that are likely to emerge as students progress from naïve preconceptions toward more sophisticated understandings of a target concept" (Confrey 2006, as cited in Arnold et al. 2018, p. 298). Learning trajectory research has gained prominence in statistics education research over the last 20 years (Arnold et al. 2018), but has yet to be seen for research in data science education.
Yet, data science educators produce hypothetical learning trajectories which reflect their predictions for "how students' thinking and understanding will evolve in the context of the learning activities" (Simon, Geldreich, and Hubwieser 2019, p. 136). These hypothetical learning trajectories "capture the result of a process in which a teacher posts a conjecture regarding their students' current understanding of a targeted concept and then develops learning activities they believe will support them in constructing more sophisticated ways of reasoning toward a particular learning goal" (Lobato and Walters 2017, p. 83); however, these hypothetical learning trajectories may not accurately reflect students' reasoning processes and the connections students make as they learn data science concepts.
With a myriad of data science programs around the country, there likely are multitudes of hypotheses surrounding how students learn data science concepts and skills. Until we formally evaluate these hypotheses we cannot in good faith state that our curricula effectively build student understanding. We maintain that the Block Model provides a robust framework for qualitatively analyzing the computing code students produce in the process of learning (Izu et al. 2019), especially with the creation of tools for storing student-generated code (Kross and McGowan 2020). Particularly, when paired with student think-aloud interviews (Reinhart et al. 2022), where students explain their thinking with respect to the computing code they produced, these methods can provide important insight into the perspective of the learner and how they make sense of information and create computing code in the process of learning.
A common educational approach for studying learning trajectories is design-based implementation research (DBIR) methodology (Confrey and Lachance 2000; Cobb et al. 2003; Gravemeijer and Cobb 2006; Prediger, Gravemeijer, and Confrey 2015). DBIR uses deliberately designed activities to investigate the development of students' learning, with the aim of developing or testing theory. Learning to explicitly outline how one believes students learn and designing tasks which investigate these hypotheses is no small feat! Rather than outlining and evaluating students' learning across an entire curriculum, investigations into the teaching and learning of data science concepts and skills can, and should, start small, focusing on connections between a few concepts. Then, as one becomes more familiar with the DBIR process, scaling up to consider multiple concepts should feel less daunting. For example, a researcher could start with a targeted activity paired with think-aloud interviews to probe how student conceptions of dataframes influence their understanding of code used to filter data. Alternatively, research could examine how these conceptions of dataframes inform students' understanding of code written to pivot data. Examining these small relationships will give way to larger investigations, such as exploring how the use of named arguments informs students' conception of user-defined functions.

The Role of Programming Environment
For nearly 25 years statistics educators have navigated the pedagogical decision between using "tools for learning" and "tools for doing" data analyses (Biehler 1997). However, during this time we have seen the growth of tools which are not overly complex and extend past an introductory statistics class (McNamara 2015). The mosaic package (Pruim, Kaplan, and Horton 2017) is built on the foundational idea that a tool which minimizes the cognitive load of students helps to foster their creativity. Unique in its creation, this concept of "less volume more creativity" (Pruim, Kaplan, and Horton 2017, p. 77) seems to have proliferated throughout statistics education (Lovett and Greenhouse 2000; Guzman et al. 2019; Burr et al. 2021; Fergusson and Pfannkuch 2021; Gehrke et al. 2021; McNamara et al. 2021; Çetinkaya-Rundel et al. 2022). Although cognitive load has become a principal consideration for teaching programming alongside statistics, few studies directly investigate how syntax impacts students' learning.
Computer science educators have found that the intuitiveness of programming language syntaxes differs substantially for undergraduate students, as does their ability to write accurate code (Stefik and Siebert 2013). Rafalski et al. (2019) extended these same ideas to compare students' ability to write accurate code across three different R syntaxes: the tidyverse, base R, and the tilde style. While the authors did not find evidence of a difference in the number of errors or the time to completion, they did find a relationship between syntax and task, suggesting certain tasks are better aligned with specific syntaxes. Myint et al. (2020) reiterate the possibility of an incongruence between tasks and syntax, finding that students were more comfortable creating a complex plot using ggplot2 rather than base R.
These studies spotlight the need for research exploring how different programming environments facilitate or impede the learning of data science concepts. Research comparing different syntaxes has typically focused on the result of a student's code (e.g., plot, accuracy) rather than directly inspecting the code itself. In contrast, we advocate that these investigations pay direct attention to students' computing code, acknowledging the rich data it provides for understanding a student's learning process.
A key philosophy of the tidyverse syntax is its "human centered" design (Wickham et al. 2019), where function names are verbs which describe the actions each function performs. However, there are no empirical studies which investigate how these names impact learners' mental models of what the function accomplishes. By pairing a qualitative analysis of students' code with think-aloud interviews, researchers can uncover how the language (text surface) of a function relates to a student's mental model of the function's action, or a student's ability to acquire new data science skills.
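To give a sense of how the same task reads under different syntaxes, the sketch below computes group means on a hypothetical dataframe `fish`. It is an illustration only and does not reproduce the tasks used in the studies cited above. The formula example uses base R's aggregate() as a stand-in for the tilde style, which is an assumption on our part, and the tidyverse version appears only as a comment because it requires the dplyr package.

```r
# Hypothetical data (values are made up)
fish <- data.frame(
  group = c("a", "a", "b", "b"),
  lipid = c(2.0, 3.0, 4.0, 6.0)
)

# Base R ("dollar sign") style: index a vector, then summarize it
mean_a_base <- mean(fish$lipid[fish$group == "a"])

# Formula ("tilde") style: state the relationship, pass the data separately
means_formula <- aggregate(lipid ~ group, data = fish, FUN = mean)

# tidyverse style (comment only, since it needs the dplyr package):
# dplyr::summarise(dplyr::group_by(fish, group), lipid = mean(lipid))
```

A qualitative analysis of students' code would attend to which of these forms a student reaches for and how they describe what each piece does, rather than only to whether the resulting means are correct.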

Conclusion
The field of data science education is emerging as its own discipline of research, primed to investigate the teaching and learning of data science concepts. While the field has seen reports summarizing the concepts or competencies that ought to be included in data science programs or how to infuse data science into the statistics curriculum, as well as strong opinions on which R syntax should be taught, we have yet to see empirical research directly examining how students learn data science. Without these investigations, how can we "distinguish merely interesting learning from effective learning" (Wiggins and McTighe 2005)?
Data science education faces a multitude of open questions surrounding the teaching and learning of data science, and we posit the horizon of research in data science education critically inspects student learning from the perspective of the learner. We hope this future research pays specific attention to students' computing code as a relic of their learning, with more thoughtful investigations than whether their code contains errors. Furthermore, we believe qualitative research will play a dominant role in the future of data science education research, and hope the methodology outlined in this article inspires and emboldens researchers to continue this important work.

Figure 1. Critical components of the qualitative data analysis process.
program (in the context at hand)

Relationships
  Text surface: Relations & references between blocks (e.g., method calls, object creation, accessing data...)
  Program execution: Sequence of method calls, object sequence diagrams
  Function: Understanding how subgoals are related to goals, how function is achieved by subfunctions

Blocks
  Text surface: Regions of interest (ROI) that syntactically or semantically build a unit
  Program execution: Operations/actions of a block, a method, or a ROI (chunk from a set of statements)
  Function: Understanding the function of a block, seen as a subgoal

Atoms
  Text surface: Language elements
  Program execution: Operation/action of a statement
  Function: Function of a statement: its purpose can only be understood in a context

NOTE: The rows of the matrix are associated with the "level" of a program and the columns are associated with the "dimension" of the program.

Table 2. Descriptive coding of two statements of R code produced by Student A and Student B.

Table 3. Highlighting sections of code classified as data structures within statements of code in the theme of data wrangling, as indicated by pink text.

Table 5. Examples of statements of code classified solely under the theme of "data structures."

Table 7. Example of process coding for Student A's statements for creating a scatterplot with different colors for different groups.

Table 8. Example of processes for Student A and Student B, where each block of code creates a scatterplot with different colors for different groups.