Factors related to teaching quality: a validated questionnaire to assess teaching in Spanish higher education

Abstract Quality in higher education requires the evaluation of the teaching-learning and assessment methodologies used by teachers, the adaptation of students, and the resources used. To respond to this need, this study analyzes the validity and reliability of the Factors Related to Teaching Quality (FRTQ) questionnaire, developed for this research. The sample was obtained among Spanish undergraduate students (n = 291). Validation was performed through a content and construct analysis based on a review of the literature on the dimensions of teaching, learning and assessment, together with a measure of sampling adequacy and a factor extraction. Reliability was determined by Cronbach's alpha. The validity and reliability of the questionnaire were confirmed, and the factor analysis yielded a six-factor structure: functions of assessment; attention to diversity; clarity and control of the educational process; learning resources; teaching resources; and assessment resources. The conclusions indicate the importance of approaching some aspects of the educational process from the dimensions of teaching and learning simultaneously, due to their transversal nature, as well as the relevance of having a useful instrument to evaluate processes of change and improvement in teaching in Spanish higher education.


Introduction
Assessing the quality of university teaching performance, with the aim of advancing academic excellence, is of interest for educational research because of the important impact that improvements in universities have on the development of societies (Gil Álvarez et al., 2017). As teaching is a complex and contested activity (Maussa Diaz & Hernández Romero, 2018), valid and reliable instruments are needed to assess performance in order to identify strengths and weaknesses (Cipagauta-Moyano, 2019) and training needs (de Dios Alija et al., 2017), or to assess whether the training received leads to changes in teaching practice that are consistent with that training (Gómez & Valdés, 2019).
The evaluation of teachers' performance needs to be addressed from the three dimensions of the educational process: teaching, assessment and learning. The theoretical background of this study is therefore the analysis of these three dimensions in higher education. This theoretical analysis underpins the elaboration of the items that make up the Factors Related to Teaching Quality (FRTQ) questionnaire (Figure 1), which is subsequently validated by checking its content validity, internal structure and internal consistency.
In contrast to the traditional transmissive model, active methodologies in these three dimensions of education are emerging as key elements in the achievement of quality education in higher education (Borralho et al., 2015; Bozzi et al., 2021; Fernández-Sánchez et al., 2020; Jiménez Hernández et al., 2020). Boosting the quality of education also requires adapting to the rapid emergence of the digital context in higher education. The incorporation of new technological resources into the educational system in recent years (Ramos-Pardo et al., 2020), particularly in the European and Spanish context, requires an emphasis on digital literacy and the development of digital competence in the use of 2.0 resources, not only in the teaching-learning process but also in the evaluation process (Guillén-Gámez & Mayorga-Fernández, 2020). Nowadays, the introduction of Information and Communication Technology (ICT) opens up new fields and opportunities to improve academic results and the quality of the educational process, extending educational processes both temporally and geographically (Paredes-Labra et al., 2019), especially since the integration of technology has been accelerated by the COVID-19 pandemic (Czerniewicz et al., 2021).

Teaching dimension
With regard to the teaching dimension at university and the performance of university teaching staff in the Spanish context, it should be noted that both have been altered in recent decades by reforms promoted by national and supranational bodies, which have implemented the European Higher Education Area (EHEA) since the beginning of the century. In the opinion of Tena Piazuelo (2020), these reforms have generated debate on whether they have favoured improvement in the quality of education and teaching, a question that must be evaluated with appropriate instruments. For Álvarez-Arregui (2019), the reforms have affected the quality of education and teaching in Spain, because the changes have prioritized results in terms of effectiveness over the aspiration to shape teaching practice toward a function that articulates and controls an educational process centered on the development of autonomous student learning and academic guidance. This may entail the risk that students identify the university as a service provided in exchange for money rather than as an institution that demands their involvement and effort.

Assessment dimension
Regarding the dimension of student assessment, the decisions that teachers must make are based on the duality between summative and formative assessment, which can be addressed from an integrative or a dichotomous perspective (Broadbent et al., 2018). These decisions carry an important ethical and social burden, as assessment is sometimes "understood as synonymous with standardized, external, summative and output-focused measurement" (Moreno Olivos, 2014, p. 16), with consequences that condition earlier decisions about the teaching-learning process, as well as the decisions to be made in response to needs detected throughout the educational process.
Currently, research on formative assessment, which is more oriented toward improving learning, is gaining ground over more classical assessment focused on standardized tests and isolated from teaching and learning processes (Villarroel et al., 2018). Professionals and researchers are concerned about the availability of instruments to understand and evaluate the decisions made by university teaching staff with regard to student assessment, as well as their relationship with educational quality and academic performance. This becomes even more relevant in university teaching for the training of early childhood and primary education teachers (Molina-Soria et al., 2020) because of the impact it may have on the rest of the educational stages.
Mellado-Moreno et al., Cogent Education (2022), 9: 2090189. https://doi.org/10.1080/2331186X.2022.2090189

Learning dimension
It is assumed that methodologies which favor active learning require specific material conditions, e.g., an adequate number of students per classroom, as well as strategies that deliberately extend disciplined autonomy in learners or incorporate error as an opportunity to learn and to improve the learner's memory (Metcalfe, 2017), together with the development of the ability to make complex judgments about the area of knowledge under study (Deeley & Bovill, 2017). Traditional teaching strategies, in which students play a passive role, appear to be less successful (Deslauriers et al., 2019; Gil-Galván et al., 2020), even though they may generate a sense of greater confidence about how much students have learned.
This different view of the educational process in higher education implies greater interaction between teachers and students, and it is embodied in teaching methods that guide and adapt to students, such as teacher-student feedback, understood as a cyclical and dialogical process (Ajjawi & Boud, 2017), which arouses interest among students. However, its scarce presence in the classroom generates a certain dissatisfaction among students (Mulliner & Tucker, 2017). On the other hand, students who start from a good disposition and prior knowledge deepen the skills necessary for feedback to be effective and to lead to an improvement in the educational process and in their academic results (Carless & Boud, 2018).
This introductory approach to the dimensions of teaching, assessment and learning is the starting point for the design of an instrument to assess Spanish teachers' performance in each of these dimensions. The literature review revealed a gap: no questionnaire with strong psychometric properties is available in Spanish for the evaluation of teaching quality in the European Higher Education Area (EHEA) system.
For this reason, the aim of this work is to propose an instrument for the evaluation of university teaching and to calibrate it in terms of validity and reliability, through content validation by expert judgment, construct validation and reliability analysis. A further purpose of this work is to identify the factors underlying the set of variables, in order to determine whether the items grouped by factor are logically connected and whether a satisfactory total variance explained by these factors is obtained.
The remainder of this article is structured as follows: the method section describes in detail the design process, the sample obtained and the calibration of the instrument through validity and reliability analysis; the discussion presents the reduced version of the questionnaire obtained as a result of this calibration and examines the factors into which the items were grouped; finally, the conclusions drawn from this work and possible lines for its continuation are presented.

Method
Bearing in mind that the aim of this work is to propose a tool for the evaluation of university teaching, and to calibrate it in terms of validity and reliability, the research was carried out from a quantitative perspective. Thus, a questionnaire was chosen as one of the most commonly used evaluation instruments to assess teachers (Ďurišová et al., 2015; Goos & Salomons, 2017; Montoya Vargas et al., 2014), and its application is generally reliable, provided that the sample size is adequate.
The validation process guided the definition of the research questions and hypotheses (Table 1): the types of evidence required were first identified from the empirical concerns, and the study questions and hypotheses were then established from them.
The method can be summarized in the following stages:
• Design of the questionnaire, based on existing literature and aiming to give greater importance to student assessment and the use of ICT resources.
• Sample, obtained from Spanish undergraduate students.
• Calibration of the instrument, through content and construct validation, followed by reliability analysis and factor extraction.
• Analysis and discussion of the results.

Assessment questionnaire
In the design of the Factors Related to Teaching Quality (FRTQ) questionnaire, we considered aspects related to teaching planning, clarity of the educational process, attention to diversity, the model of student assessment, and the use and typology of materials and resources (Gil Álvarez et al., 2017; Gómez & Valdés, 2019; Goos & Salomons, 2017; Rosas Villena, 2016), grouped into three dimensions: teaching, assessment and learning.
In contrast to the approach followed in most research in this area, the questionnaire does not consider the performance of individual teachers but that of the teaching staff as a whole. Nor does it seek to measure student satisfaction, which has a very low correlation with student learning and academic performance (Uttl et al., 2017); instead, it addresses students' perception of the educational process as a whole. Furthermore, in line with the theoretical basis, in which special relevance is given to student assessment, we decided to include elements that are less common in previous works, such as the use of ICT resources.
When determining the size of the measuring instrument, i.e., the total number of items, it was decided that it should not exceed 50 questions, to avoid discouraging participants and obtaining an insufficiently representative sample, and to guarantee the reliability of the work. Each sub-dimension contains 2 to 5 items, and responses are Likert-type with a response interval of 1 to 4 (Vílchez-González et al., 2015), where 1 means the person strongly disagrees with the statement and 4 means they strongly agree with it.
The questions were divided into three blocks, corresponding to the teaching-learning-assessment dimensions, which in turn are divided into sub-dimensions and, finally, into items:
• Teaching dimension, made up of 8 items related to planning, grouped under the sub-dimension "initial planning"; to teaching methods, grouped under the sub-dimension "methodology"; and, finally, to teaching-learning activities, grouped under the sub-dimension "teaching resources and activities".
• Assessment dimension, made up of 14 items, grouped around the "assessment criteria" established by the teaching staff, the "assessment techniques" used and the "functions of assessment" in the different subjects.
• Learning dimension, made up of 8 items, mostly concentrated in the sub-dimension "learning techniques", except for those related to student autonomy, which are included in the sub-dimension "active and autonomous learning".
Table 2 shows the content of the final questionnaire, which consists of 30 items organized into 3 dimensions and 8 sub-dimensions. The questionnaire was deployed in digital format and data collection was carried out using the Google Forms tool.

Table 1. Research questions and hypotheses
Evidence based on content validity:
R1. Are the items of the FRTQ questionnaire relevant and adequate in terms of the undergraduate teaching evaluation construct?
R2. Are the items of the FRTQ questionnaire clear, correctly ordered and understandable?
Evidence based on internal structure:
H1. The data confirm the construct validity of the FRTQ questionnaire with a coefficient in the test of sampling adequacy (KMO) higher than 0.8.
H2. Factor extraction identifies factors consistent with the model that explain more than 50% of the variance.
Evidence based on internal consistency (reliability):
H3. The data confirm the reliability of the FRTQ questionnaire with a Cronbach's alpha coefficient above 0.7.
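The dimensional structure described above can be sketched as a small data map. This is an illustrative sketch only, not part of the published instrument: the text does not break down item counts per sub-dimension, so only the dimension totals (8 + 14 + 8 = 30 items) and the eight sub-dimension labels are encoded.

```python
# Illustrative sketch of the FRTQ structure: 3 dimensions, 8 sub-dimensions,
# 30 items in total. Per-sub-dimension item counts are not given in the text,
# so only dimension-level totals appear here.
frtq_structure = {
    "teaching": {
        "items": 8,
        "sub_dimensions": ["initial planning", "methodology",
                           "teaching resources and activities"],
    },
    "assessment": {
        "items": 14,
        "sub_dimensions": ["assessment criteria", "assessment techniques",
                           "functions of assessment"],
    },
    "learning": {
        "items": 8,
        "sub_dimensions": ["learning techniques",
                           "active and autonomous learning"],
    },
}

total_items = sum(d["items"] for d in frtq_structure.values())
total_subdims = sum(len(d["sub_dimensions"]) for d in frtq_structure.values())
print(f"{total_items} items across {total_subdims} sub-dimensions")
# prints: 30 items across 8 sub-dimensions
```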

Sample
For the selection of participants, non-probabilistic sampling was carried out among undergraduates of the degrees in Early Childhood Education Teaching and Primary Education Teaching in Spain. The choice of these degrees was motivated by the influence that early childhood and primary education have on the rest of the educational stages. The participants who completed the questionnaire therefore have the profile of future teachers and some prior theoretical knowledge of the teaching-learning-evaluation process. This allows them to provide well-founded answers, making the results more solid, and supports the calibration of an instrument with which an educational process is evaluated.
The target population was a total of 427 Spanish university students, of whom 291 students between 20 and 21 years of age participated, distributed unequally by gender between 219 (75.2%) women and 72 (24.8%) men. The participation rate was therefore 68.2%.
All the students were Spanish, so the instrument does not reflect cultural or ethnic differences, and none of the participants had a disability that would have required adapting the questionnaire. The researchers checked that all participants had their own cell phones, so that everyone had the same opportunity to fill in the questionnaire voluntarily.
Upon accessing the link provided, students were informed of the objectives of the research work for which their participation was requested and, if they participated, that their data would be treated anonymously and used exclusively for this research. In addition, they were informed that participation implied the student's consent to use the data for the indicated purposes.
Most of them completed it during school hours, either at the beginning or at the end of a lesson, taking as much time as they needed. Students who could not attend class had ten days to complete the questionnaire outside the classroom, accessing it through the link provided in the Moodle section of the teachers who collaborated in the research.

Data analysis
The methodology proposed by Lacave Rodero et al. (2016) was used to validate and analyze the reliability of the questionnaire, which sets out the stages in the process of calibrating a teacher questionnaire. For the calibration and validation analysis, we used the IBM SPSS Statistics v. 26 statistical package.
Before starting the data collection, and in order to ensure that the content of the questionnaire was representative of the object of study, the content of the questionnaire was validated by expert judgment. The group of experts was composed of four PhDs in Education, two from Spain and two from Portugal, with extensive experience in university teaching and knowledge of assessment instruments. They were chosen from two different countries in order to gather different points of view in the validation.

Results
The initial version of the questionnaire provided to the experts is not the one shown in Table 2, although it was very similar, consisting of the same items (R1), with only the order or wording of some of them varying (R2). The aspects related to the final order and wording were precisely the result of the expert judgment. The suggested improvements were approved by all participants at this first stage, and hence the version of the questionnaire shown in Table 2 was considered final.
The construct validity of the designed instrument was confirmed by calculating the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, which gave a value of 0.838, supported by the significance level of Bartlett's test of sphericity, where a value of 0.000 was obtained. In addition, a factor extraction was carried out using maximum likelihood estimation, which allowed the identification of latent factors in groups of variables; 8 factors with an eigenvalue greater than 1 were obtained, explaining only 58.66% of the variance.
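As an illustration of the statistics reported here, the following is a minimal first-principles sketch in Python of the KMO measure and Bartlett's test of sphericity. The study itself used SPSS, so this is not the authors' computation, and the Likert-style data below are synthetic, generated purely to exercise the code.

```python
import numpy as np
from scipy.stats import chi2

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(data, rowvar=False)
    inv_R = np.linalg.inv(R)
    # Partial (anti-image) correlations from the inverse correlation matrix.
    d = np.sqrt(np.diag(inv_R))
    P = -inv_R / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)          # off-diagonal entries only
    r2, p2 = np.sum(R[off] ** 2), np.sum(P[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)

# Synthetic correlated Likert-style responses (1-4); the sample size matches
# the study's n = 291 but the data are invented for demonstration.
rng = np.random.default_rng(0)
latent = rng.normal(size=(291, 1))
likert = np.clip(np.round(latent + rng.normal(scale=0.8, size=(291, 6)) + 2.5), 1, 4)

stat, p_value = bartlett_sphericity(likert)
print(round(kmo(likert), 3), round(stat, 1), p_value)
```

With strongly intercorrelated items like these, the KMO value is high and Bartlett's p-value is effectively zero, mirroring the pattern reported for the FRTQ data.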
Because the subject area of the questionnaire is evaluation in education and the aim is to draw conclusions about social phenomena based on empirical data (Gutiérrez, 2009), a reliability analysis was performed to check that the responses to the questionnaire items are consistent. Cronbach's alpha was calculated, taking a value above 0.7 as the minimum reliability criterion (Esposito et al., 2015). In this case, a value of 0.849 was obtained, and the alpha values obtained when deleting each individual item were all above 0.800. Table 3 shows the results obtained; it can be observed that the deletion of items 2.9, 2.10, 3.1, 3.2, 3.3, 3.4 and 3.8 would increase the reliability of the questionnaire. Furthermore, the saturation (factor loading) of these items was under 0.2, which justified their elimination.
Cronbach's alpha was recalculated, and its value rose to 0.889 (H3). Re-running the test for each of the remaining items, we concluded that it was not appropriate to remove any further items.
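The reliability procedure described above (Cronbach's alpha plus the alpha-if-item-deleted check used to decide which items to drop) can be sketched as follows. The function is a plain-numpy illustration, not the SPSS output reported in the study, and the data are synthetic: five coherent items plus one pure-noise item, mimicking the situation in which deleting an item raises alpha.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def alpha_if_deleted(items):
    """Alpha recomputed with each item removed in turn (SPSS-style column)."""
    k = np.asarray(items).shape[1]
    return [cronbach_alpha(np.delete(items, j, axis=1)) for j in range(k)]

# Synthetic Likert (1-4) responses: five coherent items plus one noise item.
rng = np.random.default_rng(1)
latent = rng.normal(size=(291, 1))
coherent = np.clip(np.round(latent + rng.normal(scale=0.7, size=(291, 5)) + 2.5), 1, 4)
noisy = rng.integers(1, 5, size=(291, 1)).astype(float)
scores = np.hstack([coherent, noisy])

alpha = cronbach_alpha(scores)
print(round(alpha, 3))
# Deleting the noise item (last entry below) yields a higher alpha than the
# full scale, which is the criterion used above to drop low-loading items.
print([round(a, 3) for a in alpha_if_deleted(scores)])
```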
In terms of validity, we first considered that, for the final questionnaire of 23 items, the sample size (n = 291) was optimal, given that the communality of most of the items was close to 0.70 (Lloret-Segura et al., 2014).
After the reduction in the number of items from the initial version, validity was re-analyzed, obtaining a value of 0.884 for the KMO statistic (H1) and a significance level of 0.000 for Bartlett's test of sphericity. Next, the factors were extracted, again using the maximum likelihood method; the number of factors with an eigenvalue greater than 1 fell to 6, which in this case explained 57.28% of the variance (H2). In the unrotated factor matrix, almost all items were grouped under a single factor, so we decided to make the matrix more interpretable by applying the Varimax rotation method with Kaiser normalization. Table 4 shows the results obtained, where the items are distributed more consistently with the proposed model.
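The extraction and rotation steps reported here can be illustrated with a simplified sketch: it retains factors with eigenvalues above 1 (Kaiser criterion), reports the variance they explain, and applies a standard varimax rotation. Note the simplifications: it uses eigendecomposition of the correlation matrix rather than the maximum likelihood extraction (and the Kaiser normalization) used in the study, and the two-factor data are synthetic, so this is a didactic approximation only.

```python
import numpy as np

def kaiser_retained(data):
    """Eigenvalues of the correlation matrix above 1 (Kaiser criterion)
    and the share of total variance they explain."""
    R = np.corrcoef(data, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
    retained = eigvals[eigvals > 1]
    return retained, retained.sum() / eigvals.sum()

def varimax(loadings, max_iter=100, tol=1e-8):
    """Varimax rotation of a (p x k) loading matrix (Kaiser's algorithm)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    obj = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # SVD step on the gradient of the varimax criterion.
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < obj * (1 + tol):
            break
        obj = s.sum()
    return L @ R

# Synthetic two-factor data: items 0-2 load on one factor, items 3-5 on another.
rng = np.random.default_rng(2)
f = rng.normal(size=(291, 2))
data = np.hstack([f[:, :1] + rng.normal(scale=0.6, size=(291, 3)),
                  f[:, 1:] + rng.normal(scale=0.6, size=(291, 3))])

retained, explained = kaiser_retained(data)
print(len(retained), round(explained, 2))   # factors kept, variance share

# Unrotated loadings from the top eigenvectors, then varimax-rotated.
R = np.corrcoef(data, rowvar=False)
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1][:len(retained)]
loadings = vecs[:, order] * np.sqrt(vals[order])
rotated = varimax(loadings)
```

Because varimax is an orthogonal rotation, it redistributes loadings across factors to make them more interpretable while leaving each item's communality (the row sums of squared loadings) unchanged, which is why rotation can clarify the factor structure without altering the variance explained.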
The results showed strong reliability, good content validity and acceptable factor identification statistics. The main result of the calibration of the questionnaire is its reduction in the number of items and its reorganization into 6 factors, as shown in Table 5.
The grouping of the items according to the 6 factors obtained is also consistent. The reliability analysis by factor (Table 6) indicates that the strongest results are found in the factor referring to the functions of assessment, while the last three factors, referring to teaching, learning and assessment resources, show weaker reliability, although with adequate inter-item correlations.

Discussion
Through the items included in the first factor ("functions of assessment") it is possible to identify how student assessment is carried out and to what extent it is used to improve student learning, the feedback received and the relationship of the assessment process to the final grades achieved. This section is the core of the questionnaire, which was based on the assumption that methodological decisions in evaluation condition the whole educational process (Borralho et al., 2015; Moreno Olivos, 2014). Considering the study by Tait and Kulasegaram (2022), the variables associated with this factor are coherent with the characteristics that define so-called programmatic evaluation, increasingly present in North American higher education, which aims to favor the transition from summative evaluation to formative evaluation, or assessment for learning (Kulasegaram & Rangachari, 2018). The factor labelled "attention to diversity" groups together items from the initial dimensions of teaching and assessment, ranging from the methodology used in the classroom, which makes it easier for students to know what they need to study, to the feeling that the assessment is objective and adapted to their needs. In line with the model of attention to diversity in universities of Ramos Santana et al. (2021), these variables are related to actions aimed at guaranteeing equal opportunities to achieve academic success among an increasingly diverse student body, a consequence of the expansion of higher education in Spain. The development of critical thinking is also included in this factor, because working on this aspect through reading and producing academic texts makes it possible to correct inequalities of origin, especially among first-generation university students (Núñez Cortés & Errázuriz Cruz, 2020).
The factor associated with "clarity and control of the educational process" is built around the precepts that students know what is expected of them, that they identify what they are learning as relevant to their future profession and as enhancing their professional identity (Barba-Martín et al., 2020), and that the process is controlled both by the teaching staff, through constant assessment of how students progress, and by the students themselves, who develop their learning autonomously and receive feedback through continuous assessment, which helps them to correct any mistakes they make. For Hills et al. (2018), clarity plays a fundamental role in the student-tutor relationship, as it determines the quality of communication between them and must be addressed to ensure that students do not perceive the learning and evaluation process as incoherent and meaningless.
Finally, the factors regarding teaching, learning and assessment resources have been identified separately, although they refer to the same issue: the correct adaptation of teaching-learning-assessment tools and activities through a variety of resources, including ICT resources, which allows the different tasks and activities to be addressed in a meaningful way (Cedeño Mendoza et al., 2020). The variety of resources is one of the great challenges facing universities, given its potential to improve student training and the academic production of higher education institutions; however, it is an element that exceeds the purely educational sphere and must also be considered in terms of efficiency and the economic capacity of universities (Santos Tavares et al., 2021).

Conclusions
This paper has described the design process of the FRTQ questionnaire to evaluate university teaching in Spain, initially consisting of 30 items organized into 8 sub-dimensions. The calibration process led to a reduction in the number of items (from 30 to 23) and in the number of sub-dimensions (from 8 to 6), with the final grouping varying from the initial version in some respects. The new indicators obtained are related to the quality of teaching in higher education, focusing solely on pedagogical aspects, and represent the functions of assessment, attention to diversity, clarity and control of the educational process, and learning, teaching and assessment resources. The study thus provides a questionnaire in Spanish, adapted to the current reality of the European Higher Education Area (EHEA) and with strong psychometric properties.
On the basis of theoretical reflection, most models for evaluating educational processes identify the variables of analysis according to whether they belong to the dimension of teaching or that of learning. However, the results of the FRTQ questionnaire indicate that in some respects it is more relevant to analyze the educational process through sub-dimensions that contain variables belonging to more than one dimension. This is the case for the sub-dimensions identified as attention to diversity and clarity and control of the educational process. The analysis of these sub-dimensions transcends the gap between teaching and learning and requires a cross-cutting approach that identifies how a decision or strategy can affect both sides of the process. In the case of evaluation, a similar phenomenon takes place, which had already been identified in the theoretical framework of the research: the FRTQ questionnaire groups in the same sub-dimension the different functions that evaluation can fulfil, also in a cross-cutting manner. Thus, the strategies chosen for student assessment have the capacity to influence the teaching and learning process itself, eventually allowing students to learn better through assessment. The only exception found to this model, in which the teaching and learning dimensions are interrelated, is educational resources. In this case, it is still relevant to differentiate between materials for teaching, learning and assessment, so their analysis should be considered independently.
Among the limitations of the study, the model is based on a questionnaire that collects data on the perception of students receiving educational training. This means that the results depend on the accuracy of students' perceptions and on their sincerity when answering the questions, elements that may be affected by multiple factors beyond the control of the research. It is therefore necessary to extend this research beyond Spain and with new data from classroom observations, in order to contrast the perceptions of students and researchers about the same educational process and to compare the results obtained in different countries. Likewise, it would be ideal to conduct experimental studies based on the model proposed in this work, in order to associate good teaching practices with each of the factors identified and to determine whether their application leads to better academic performance. This would make it possible to evaluate the importance of each factor and how the factors interact with each other, which may be useful for planning training programs to improve the quality of teaching in higher education.