An initial investigation of the effect of quality factors on Agile test case quality through experts’ review

Abstract A test case is a cornerstone of the testing process; hence, it is essential to ensure the quality of test cases. However, test case design in the Agile testing process has limitations that affect quality standards, contributing to the failure of many software projects. Previous studies provide only limited guidance for assessing test case quality. Evaluating the quality of test cases helps software testing practitioners understand critical issues in designing test cases from various perspectives. Therefore, this paper presents factors and criteria of test case quality verified by software testing experts in an Agile environment. The proposed factors and criteria were verified through an expert review technique: an online survey of domain and knowledge experts. The domain experts comprised 23 industry practitioners, including developers, testers, and systems analysts, while the knowledge experts were 13 academics in software engineering and testing. The results showed that the quality of test cases is paramount in Agile projects; the experts accepted seven factors and 32 criteria with a few modifications and suggestions. The findings may assist researchers in refining the proposed factors and criteria for validation purposes, and may serve software testing practitioners as a guideline for constructing, designing, and assessing compelling test cases. Most significantly, the verified factors and criteria may contribute to the body of knowledge in the software testing domain and industry.


Introduction
Software testing is a quality assurance activity and an important component of any project (Nawaz & Malik, 2008). In an Agile context, the testing process carries even more weight in demonstrating product quality (Harichandan et al., 2014). Brian Marick characterizes Agile testing as "a style of testing, one with lessened reliance on documentation, increased acceptance of change, and the notion that a project is an ongoing conversation about quality" (Leffingwell, 2010). However, the continuous changes embraced by Agile methods demand considerable effort in testing activities (Beck, 2003; Gay et al., 2016; Humble & Farley, 2010). The efficiency of testing activities depends mainly on test case quality (TCQ), which directly determines the quality of testing (Kumar & Ranjan, 2017; Lai, 2017). A good-quality test case has a high chance of exposing defects with minimum effort, produces more accurate results, increases system performance at a lower cost, and has a high probability of discovering unknown defects (i.e., the higher the quality of a test case, the greater its capacity to reveal failures; Gómez et al., 2016; Kamde et al., 2006; Mohi-aldeen et al., 2014; Pressman, 2010; Yadav & Yadav, 2015). However, several issues, such as those detailed in Barraood et al. (2021c), can reduce the quality of test cases: unspecified test targets or expected results (Jovanovikj et al., 2018); incomplete, incorrect, and inconsistent software requirements (Kārkliņa & Pirta, 2018); and weak traceability between test cases and related requirements (Fischbach et al., 2020). Therefore, it is essential to evaluate and measure the quality of test cases to understand how much testing is needed and where future test efforts should be applied (Ahmed et al., 2016).
Researchers in Agile methods have mainly focused on the quality of the code, with only a few studies paying attention to the quality of test cases. Hence, this paper provides quality factors and criteria that may assist the Agile team in designing quality test cases, leading to a high-quality testing process and, in turn, a good-quality product. The factors and criteria were extracted from existing empirical studies on TCQ. This study conducted an exploratory survey to investigate the identified test case factors and criteria in an Agile environment. An online survey of academic and industry experts was conducted to verify the factors and criteria. Thirty-six experts, 13 academics and 23 practitioners in Agile organizations, participated in the online survey and gave their opinions. The study produced 32 criteria of TCQ grouped into seven factors: documentation quality, management quality, maintainability, reusability, requirement quality, performance efficiency, and test case effectiveness.
The rest of the paper is organized as follows. Section 2 introduces the related work. Section 3 presents the methodology, Section 4 summarizes the results and discussions, and Section 5 concludes the paper.

Related work
Significant efforts have been deployed by researchers towards building quality test cases, as can be seen in Adlemo et al. (2018), Athanasiou et al. (2014), Bowes et al. (2017), Daka and Fraser (2014), Jovanovikj et al. (2018), Juhnke et al. (2018, 2021), Kamde et al. (2006), Kaner (2003), Kochhar et al. (2019), Lai (2017), and Tran et al. (2019). Kamde et al. (2006) designed a standard checklist to determine the risk areas and improve the test cases. The study provides a set of quality factors of test cases, including correctness, accuracy, economy, reliability and repeatability, traceability, and measurability. Another study used three attributes of TCQ, code coverage, mutation score, and the total number of failing assertions, as metrics for measuring quality. Athanasiou et al. (2014) built a model that combines metrics to define a test code quality measure. This model combines eight metrics across three quality factors, completeness, effectiveness, and maintainability, for assessing test code quality. However, the metrics of the aforementioned studies are most applicable to automated test cases, and the model can only be applied to tests implemented in a specific programming language (Horváth et al., 2015). Bowes et al. (2017) tried to answer the question "How good are my tests?" by presenting a list of 15 testing principles that capture the essence of testing goals and can be quantified as indicators of test quality. The authors highlight the importance of tests being correct, maintainable, and simple. They argue that test cases should be simple and meaningful to improve readability, especially in Agile. However, this study focused on test smells in unit testing, and some of its criteria were not implemented while others lacked metrics. Lai (2017) proposed a test case quality measurement (TCQM) model for improving TCQ, which ensures the efficiency of continuous testing activities.
Four quality factors were identified for TCQ: documentation, maintainability, manageability, and reusability, comprising 13 criteria in total. Although these studies (Kamde et al., 2006; Lai, 2017) proposed factors of TCQ, the factor definitions are not grounded in practitioners' goals and opinions.
Recent studies that tried to identify the factors of TCQ from the practitioner perspective were conducted by Adlemo et al. (2018), Jovanovikj et al. (2018), Kochhar et al. (2019), and Tran et al. (2019). Juhnke et al. (2018) conducted a study related to the quality model of test case specifications, identifying seven quality indicators of informal test case specifications in the automotive domain from a case study of 816 test case specifications specified by an OEM and suppliers. A test case specification contains a set of test cases necessary to adequately test a particular test object according to defined test objectives. In a follow-up study, Juhnke et al. (2021) identified nine groups of test case specification challenges. The study lists comprehensibility, unambiguity, completeness, uniformity, atomicity, and suitability for the respective test platform as the factors of test case specification quality. Their model focuses on automotive-specific test case specifications; thus, it may not be suitable for other domains. Jovanovikj et al. (2018) proposed an approach called TCQP, a quality plan for evaluating the usability of test cases with appropriate tool support. They proposed four quality factors (usability, maintainability, reusability, and test effectivity) with four quality criteria (understandability, analysability, comprehensibility, and fault-revealing capability). They defined metrics using the GQM approach. However, other quality factors and criteria for test cases are conspicuously missing from their study. Kochhar et al. (2019) provided 29 attributes of sound test cases from practitioners' reviews, grouped into six factors: content, size, complexity, coverage, maintainability, and bug detection.
Similarly, Tran et al. (2019) conducted a study on TCQ from the perspectives of professional developers and testers. The study identified understandability, simplicity, step cohesion, completeness, homogeneity, issue-identifying capability, repeatability, traceability, effectiveness, efficiency, and flexibility as quality factors for manually written test cases. Tran et al. (2021) improved this work by developing a quality model for test artefacts composed of 30 attributes categorized into nine groups, with 20 attributes drawn from the ISO/IEC 25010 standard. The model also includes measurements for the attributes. Its identifiable drawbacks are as follows: 12 of the 18 measures lack descriptions, 19 attributes have no accompanying information, the 20 attributes taken from the ISO standard need further investigation, and the model itself needs to be validated. Adlemo et al. (2018) used some criteria from Kaner (2003) and Atmadja and Shuai (2011) for identifying and ranking TCQ factors, and identified and ranked 15 criteria for quality test cases based on testing experts in Sweden. These criteria include repeatable, accurate, correct, powerful, maintainable, complete, traceable, consistent, reusable, simple, efficient, clear, independent, covering, and compact. However, these studies are limited in providing guidelines for assessing the quality of test cases.
Additionally, they were mostly conducted on the basis of traditional, non-Agile approaches. Among the very few studies related to test cases in Agile software development is Ahmad et al. (2019), which reported that requirements should be unambiguous to ensure good test case quality. In another study, Shrivastava and Jain (2010) presented automated test case for unit testing (ATCUT) design metrics explicitly focusing on the testability of test cases. The study most closely related to our approach was conducted by Lai (2017); however, that study did not rely on practitioners' goals to define the factors and criteria. In addition, its factors and criteria are minimal, and some critical criteria are missing from the model.
A comprehensive literature review was conducted to identify the TCQ factors and related criteria. Fourteen journal articles, drawn from an SLR (Barraood et al., 2018, 2021b) and other sources, were reviewed, and 22 established software testing websites were investigated (for more details, refer to Barraood et al. (2021a)). The literature and website review led to identifying seven TCQ factors, documentation quality, management quality, maintainability, reusability, requirement quality, performance efficiency, and test case effectiveness, together with 32 related criteria. Table 1 illustrates the identified criteria grouped into the seven factors, along with their sources.
Seven factors and 32 criteria were identified from the literature review for the quality of the test cases model in Agile. However, these factors and criteria need to be verified by the domain and knowledge experts. The validation of the proposed model will be performed after obtaining feedback from the experts, and the proposed factors will be amended accordingly. Hence, the proposed factors and related criteria are retained only if accepted through the experts' feedback. The verification procedures are discussed in the next section.

Research methodology
The exploratory study was part of the research work as a pre-evaluation stage. In this study, an online survey was used to obtain opinions and pre-verification from the experts on the identified quality factors and criteria in the Agile environment. The study adopted judgment sampling, a kind of purposive sampling, to select the experts who responded to the instruments. Figure 1 illustrates the steps of the research methodology in this paper. As mentioned earlier, the quality factors and criteria were identified from literature reviews (Barraood et al., 2018, 2021b) and established software testing websites (Barraood et al., 2021a). The respondents were asked to rate the proposed criteria on the 5-point Likert scale provided in the instrument.

Subjects:
The two categories of subjects for this exploratory study are experts from industry and academia. The industry experts were obtained through contacts on social media (Facebook, LinkedIn, YouTube, and WhatsApp) and by contacting friends working in the software industry in different countries, such as Malaysia, Yemen, India, and Jordan, to reach as many respondents in the Agile area as possible. These experts were selected on the criteria that they i) are Agile testing practitioners and ii) have more than 3 years of experience in Agile testing and quality assurance, as adapted from Mohamed (2015) and Tran et al. (2019). Academic respondents were contacted through email based on the following criteria, as suggested by Hallowell and Gambatese (2009), Rogers and Lopez (2002), Mohamed (2015), and Rajaram et al. (2021): i) currently lecturing in the field of the study, ii) holding an advanced degree such as a PhD in software engineering or software testing, iii) being a faculty member at an accredited university, iv) having authored books or academic materials related to software testing, and v) having at least 5 years of experience in software testing. The selected respondents from the academic category are experts in software engineering from six Malaysian universities (UUM, UKM, USM, UM, UTM, and UPM) and three Nigerian universities (ABU, BUK, and SSU). They were introduced by senior faculty advisors and friends based on their excellent reputation in the software testing domain.
The experts were asked to indicate their willingness to participate in the survey after being informed about the purpose of the study. The verification forms were then sent to the experts who accepted the invitation emails, and they were asked to share them with colleagues and friends in the related field. A thank-you notification was sent to the experts who declined, with apologies, to participate.
The instrument was validated through face validity by three experts in software engineering and testing to ensure that the items were comprehensive and appropriate to the targeted construct and assessment objectives. The experts also checked that the instrument's structure was user-friendly, with clear instructions, reasonable length, and well-understandable language. This number of experts is appropriate based on the assertion by Lynn (1986) and Waltz et al. (2010) that three to five experts can adequately face-validate an instrument. The instrument was refined based on the experts' suggestions before the actual administration to the target experts.

Data Collection:
The instruments were sent to many experts, but only 41 were returned. Data collection took 2 months, in contrast to the planned 1-month schedule for the respondents to answer the instruments; it ran between 3 May and 2 July 2020. Several reminders were sent to respondents who had failed to return the instruments by the given deadline, up to the last day. Five instruments with incomplete answers were rejected because the respondents' companies did not follow the Agile methodology, so they could not complete the instrument. Therefore, 36 valid instruments remained (13 academicians and 23 practitioners), all complete with no missing data. The professional experience of the respondents varies from 1 year to more than 10 years: the majority (52.7%) have between 1 and 5 years of experience, and 33.3% have more than 5 and less than 11 years. Table 2 shows the overview of respondents. For the model validation stage, the data set will be based on testing reports that follow the IEEE Standard for Software Test Documentation (IEEE 829, 2008), followed by interview sessions with software testers regarding the process flows of designing and constructing test cases. The validation process will be discussed in the next journal article.

Data analysis:
The collected data were analyzed using descriptive statistics to describe the respondents' opinions about the factors and criteria of test cases in Agile software development. Their ratings are converted to a 5-point Likert score, mapping strongly agree, agree, neutral, disagree, and strongly disagree to 5, 4, 3, 2, and 1, respectively. Frequency, mean, and cross-tabulation analyses were performed in SPSS version 26. Descriptive analysis was used to compute the mean of each criterion and factor; the mean value indicates which criteria and factors the respondents considered more suitable for measuring good TCQ. To support or reject the proposed criteria and factors, the respondents' comments and suggestions were also sorted and summarized by the authors.
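The scoring and frequency analysis described above can be sketched in a few lines of code. This is a minimal illustration only: the response labels mirror the Likert mapping in the text, while the sample data and the `describe` helper are hypothetical, not the study's actual data or SPSS procedure.

```python
# Sketch of the descriptive analysis: map Likert labels to scores,
# then compute frequencies and the mean. Sample data is hypothetical.
from collections import Counter

LIKERT = {"strongly agree": 5, "agree": 4, "neutral": 3,
          "disagree": 2, "strongly disagree": 1}

def describe(responses):
    """Convert Likert labels to scores; return frequency counts and the mean."""
    scores = [LIKERT[r] for r in responses]
    freq = Counter(scores)
    mean = sum(scores) / len(scores)
    return freq, round(mean, 2)

# Hypothetical ratings for one criterion from four respondents
freq, mean = describe(["agree", "strongly agree", "agree", "neutral"])
print(freq, mean)  # mean is 4.0
```

In the study, the same per-criterion means were then compared against the scale anchors (4.0 = agree) to decide whether a criterion was supported.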

Results and discussion
In this section, the feedback of the experts on the importance of test case quality in Agile Software Development (ASD) and the proposed TCQ factors and criteria are presented.
(A) Importance of test case quality: The respondents were asked for their opinion on the importance of good-quality test cases in ASD through six questions. The results showed that all respondents agreed that test case quality in ASD is critical to 1) improve software quality, 2) meet customers' needs, 3) improve the adequacy of testing, 4) reduce defects and bugs, 5) decrease delivery time, and 6) reduce development cost, as shown in Table 3. The benefit of TCQ for improving software quality has the highest mean (4.69), followed by meeting customers' needs (4.50).
(1) Software Quality Improvement: Software product quality depends on testing quality (Gupta et al., 2018; Kayes et al., 2016). The testing process carries particular weight in demonstrating product quality in an Agile environment (Harichandan et al., 2014). Testing can detect that the software has faults and estimate its likely overall quality, and software quality improves when these faults are corrected (Ahmed et al., 2016). The test case is a significant asset in software testing activities (Lai, 2017). A good-quality test case has a high chance of revealing defects (Gómez et al., 2016; Kamde et al., 2006; Mohi-aldeen et al., 2014; Pressman, 2010; Yadav & Yadav, 2015). Therefore, this study investigated the influence of test case quality on software quality improvement. The result showed that most respondents emphasized that good-quality test cases can improve the quality of software products, as shown in Table 3.
(2) Decreasing Delivery Time: Agile Software Development has many advantages such as frequent delivery (Matharu et al., 2015;Penmetsa, 2017). ASD divides the entire project into smaller pieces or iterations, each of which is handled separately to control time and item change risks (Lai, 2019;Rajasekhar & Shafi, 2014). Testing activities are achieved in each iteration (Crispin & Gregory, 2009;Javed et al., 2019). Every iteration has to be fully integrated and carefully tested as a final production release (Penmetsa, 2017). Delivery time is a critical factor for agile product success (Sanaa et al., 2016). Since the iterations make the software development effective and efficient to meet the customer's requirements and contribute to the project's success, it also makes the development process a little more complicated and time-consuming (Javed et al., 2019). This complication is because each iteration in ASD contains many activities such as requirement analysis, design, implementation, testing, and deployment (Javed et al., 2019). Short iterations in ASD imply that there must be an efficient testing process to avoid too much time being spent in the iteration (Olausson et al., 2013). Thus, the efficient testing process needs effective test cases. This study confirmed that good-quality test cases could reduce delivery time, as rated by respondents at 3.97, which is close to 4 (agree scale).
(3) Decreasing Development Cost: The continuous requirement changes in ASD may affect a large number of test cases (Beer & Felderer, 2018) and increase the number of test cases to be executed, which can increase execution cost and the duration of software projects (Do, 2016; Hynninen et al., 2018). Good-enough testing of software should sufficiently assess quality at a reasonable cost (Goeschl et al., 2010). Good-quality test cases are essential for assuring software quality (Tran et al., 2019; Tudjarova et al., 2017) because higher-quality test cases have a greater chance of revealing defects, increase system performance, and reduce the cost of testing and maintenance (Gómez et al., 2016; Kamde et al., 2006; Mohi-aldeen et al., 2014; Pressman, 2010; Yadav & Yadav, 2015). Thus, they can reduce the cost of development, as indicated by the experts, the majority of whom agreed (mean 3.81). This condition explains why the test case quality issue has attracted researchers, as demand for global quality software with rapid delivery has become more competitive for organizations given the volatile nature of software testing.
(4) Meet Customer Needs: Companies worldwide adopt Agile methodologies to cope with increased software complexity and evolving user demands (Matharu et al., 2015; Penmetsa, 2017). Customer satisfaction is one of the crucial advantages of ASD (Matharu et al., 2015; Penmetsa, 2017). When an iteration is developed and tested, the customer reviews the system and provides feedback in the form of stories; when the required functionalities have been delivered, customers stop writing stories and development ends (Olausson et al., 2013). The continuous changes in customer requirements increase the importance of testing practices in ASD methods (Penmetsa, 2017). The testing tasks in ASD should be appropriately prepared to cater for continuous changes in the requirements (Yu, 2018). In order to nimbly test a software system during ASD, it is crucial to identify what to test (e.g., requirements) and how to test it (i.e., test cases; Bucaioni et al., 2018). The test cases in ASD must be developed as the requirements evolve (Lewis, 2009). Therefore, most experts strongly agree (4.5) that good-quality test cases can meet customer requirements.
(5) Improving Adequacy of Testing: In designing test cases, it is important to ensure that testing achieves a certain level of thoroughness (Romli et al., 2020). This thoroughness determines how adequate the testing is (Hayhurst, 2001). Adequacy of testing can be measured by code coverage, where there is a relationship between coverage criteria and fault detection (Ahmed et al., 2016). Hence, it can distinguish between good and bad test cases and determine whether the testing is sufficient (Zhu, 1995).
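To make the coverage-based adequacy idea above concrete, the sketch below measures branch coverage of a toy function as a rough adequacy score. Everything here (the `classify` function, its branch labels, and the `adequacy` helper) is a hypothetical illustration of the general technique, not a metric from this paper or from any cited study.

```python
# Hedged sketch: branch coverage of a toy function as a test adequacy score.
# The function, branch labels, and helper are illustrative assumptions.

def classify(x, covered):
    """Toy function under test; records which branch each input exercises."""
    if x < 0:
        covered.add("negative")
        return "negative"
    elif x == 0:
        covered.add("zero")
        return "zero"
    covered.add("positive")
    return "positive"

BRANCHES = {"negative", "zero", "positive"}

def adequacy(test_inputs, all_branches=BRANCHES):
    """Fraction of branches exercised by a set of test inputs."""
    covered = set()
    for x in test_inputs:
        classify(x, covered)
    return len(covered) / len(all_branches)

# Two test inputs cover only 2 of the 3 branches
print(adequacy([-1, 5]))     # 0.666...
# Adding a zero-valued input makes the suite adequate under this criterion
print(adequacy([-1, 0, 5]))  # 1.0
```

A suite that leaves a branch uncovered (here, the zero case) signals exactly the kind of inadequate testing the coverage criterion is meant to expose.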

Factors and criteria of TCQ
The respondents were asked to examine and rate the proposed factors and criteria of TCQ in ASD. The experts provided feedback on the proposed 32 TCQ criteria, grouped into seven factors: documentation quality, management quality, maintainability, reusability, requirement quality, performance efficiency, and effectiveness. Table 4 lists the criteria, their descriptions, and their mean ratings. The mean of each criterion was computed as

(1) Criteria_mean = (Σ p_k) / n, k = 1, 2, ..., 36

where n is the number of experts and p_k is the rating on the Likert scale (1 to 5) given by expert k. For example, Correctness = 154/36 = 4.28, where 154 is the total rating given by the 36 experts.

Excerpt from Table 4:
- Traceable: the ability to trace a requirement from its conception through its specification to its subsequent design, implementation, and test (4.19)
- Performance efficiency (factor, 4.09)
- Bug detection: a test case should possess the ability to discover bugs easily (4.42)
- Resource utilization: the amounts and types of resources required to perform a test case are sufficient (3.92)
- Time behaviour: the time taken by a test case to respond to a user task or a system task

The mean of each factor is the average of the means of its related criteria, for example:

(2) Documentation_quality = MEAN(Uniform_format, Correctness, Completeness, Consistency, Readability, Understandability, Independent, Specific)

As shown in Table 4, the mean of all TCQ criteria is above 3.0, ranging from 3.42 (simplicity, under maintainability) to 4.67 (accuracy, under effectiveness). This means that the respondents accepted all the proposed criteria. Twenty-one of the 32 criteria received a Likert score of 4.0 (agree) or above. The mean value indicates which criteria the respondents considered more suitable for measuring good TCQ. The highest factor score is 4.47 for effectiveness, while the lowest is 4.03 for requirement quality. Nonetheless, the respondents agreed with all the factors, each scoring above 4 (agree). In addition, most respondents commented that these factors and criteria are reasonable, satisfactory, and sufficient. The following subsections describe the experts' feedback on each factor and criterion of TCQ.
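Equations (1) and (2) can be reproduced with a few lines of code. This is a minimal sketch with names of our own choosing; the ratings are hypothetical except for the Correctness total of 154 over 36 experts, which the paper reports.

```python
# Minimal sketch of equations (1) and (2). Function names are our own;
# ratings other than the reported Correctness total (154/36) are hypothetical.

def criteria_mean(ratings):
    """Equation (1): sum of the Likert ratings p_k divided by n experts."""
    return sum(ratings) / len(ratings)

def factor_mean(criteria_means):
    """Equation (2): average of a factor's related criteria means."""
    return sum(criteria_means) / len(criteria_means)

# Correctness: the paper reports a total rating of 154 from 36 experts
print(round(154 / 36, 2))  # 4.28

# Hypothetical criteria means averaged into a factor mean
print(round(factor_mean([4.00, 4.28, 4.08, 4.25]), 2))  # 4.15
```

The factor-level scores reported in Table 4 follow from exactly this two-step aggregation: per-criterion means first, then their unweighted average per factor.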
(1) Documentation Quality: Documentation of a test case is considered essential for the generation, change, revision, integration, and reuse of test cases (Lai, 2017). According to Kochhar et al. (2019), well-commented, well-named, and well-designed test cases may serve as good reference documentation. Aamir and Khan (2017) asserted that test case documentation should be performed to develop a quality product. In addition, most developers in the survey by Li et al. (2016) emphasized the importance of test case documentation for improving product quality. The mean of documentation quality is 4.1, calculated from its criteria scores; this value indicates that the respondents agree that test cases in ASD should be well documented. The eight proposed criteria for this factor are uniform format, correctness, completeness, consistency, readability, understandability, independent, and specific. The findings showed that all of these criteria are essential for measuring test case documentation quality, as shown in Table 4.
(a) Uniform Format: Uniform test case items can assist in generating practical test cases that reach the test objective, and they allow the test case documentation to be inspected quickly (Lai, 2017). Most respondents agreed (mean 4.00) that this feature is one of the criteria for good-quality test cases.
(b) Correctness: Correctness of a test case is the degree to which a test is free from faults in its specification, design, and applied algorithms, as well as in returning the test results (Zander-Nowicka et al., 2008; Zeiss et al., 2007). A test is correct when it always returns correct test verdicts and has reachable end states (Zander-Nowicka et al., 2008; Zeiss et al., 2007). The test case should provide correct results with the needed degree of precision, and any ambiguity should be removed from test case execution to attain high quality (Adlemo et al., 2018; Bowes et al., 2017; Kamde et al., 2006). Correctness is the highest-rated criterion of test case documentation quality in the experts' opinion, with a mean score of 4.28.
(c) Completeness: A test case should contain all relevant information for its execution (Tran et al., 2019). Test cases can be incomplete and fail to cover some functional specifications; moreover, test-design logic can be incomplete, with some necessary test conditions missing from test case specifications (Chernak, 2001). Therefore, test cases that cover all system functionalities under test are good-quality test cases (Boghdady et al., 2011). The respondents (experts) gave this criterion a high score (4.08), underlining the importance of complete test case documentation.
(d) Consistency: A consistent test case always adheres to the same rules. Using the same pattern in organizing the test cases makes testing easier (Adlemo et al., 2018; Tran et al., 2019). If the test cases are consistent, they are much easier to combine, and the cost of switching between test cases (context switching) becomes much lower (Adlemo et al., 2018). The test case should be consistent to ensure high quality (Lai, 2017).
(e) Readability: According to Beniwal (2015), most defects are found through reading, so test case readability deserves particular attention. Test case readability optimizes test case understandability (Daka & Fraser, 2014; Setiani et al., 2020), so that any tester can understand a test case by reading it once (Adlemo et al., 2018; Bowes et al., 2017; Kaner, 2003). Readable test cases benefit developers when performing maintenance tasks on the source code (Grano et al., 2018). Readability is very important and highly rated by the respondents, scoring the second-highest (4.25) among the test case documentation criteria.
(f) Understandability: According to Tran et al. (2019), the information in a test case (e.g., name, objective, precondition, steps, terms) should be easily understood by both testers and developers. Many researchers claim that a test case should be understandable to be of high quality (Bowes et al., 2017; Daka & Fraser, 2014; Jovanovikj et al., 2018; Kochhar et al., 2019; Li et al., 2016; Setiani et al., 2020; Shrivastava & Jain, 2010; Tran et al., 2019). Developers take a long time to understand test cases, especially automatically generated tests compared with manually written ones (Shamshiri et al., 2018). As shown in Table 4, the score for understandability is high (4.22), reflecting the experts' insistence that test cases must be understandable.
(g) Independent: A test case must be self-contained and executable regardless of which tester performs it. It must be possible to run a test case individually, and the success or failure of an earlier test case must not affect its outcome (Kochhar et al., 2019; Limaye, 2009). This criterion allows new test cases to be added without considering any effects they might have on existing ones (Adlemo et al., 2018; Bowes et al., 2017). According to Adlemo et al. (2018), independent test cases are feasible for a tester to execute and can more easily be used as building blocks for test flows. In addition, the stronger the independence of a test case, the better its reusability (Juan et al., 2010). Despite the benefits of test case independence, it did not receive a particularly high score from the respondents (3.94); however, this is still close to agreement that the feature is one of the criteria of test case documentation quality.
(h) Specific: A good test case should test one aspect of a requirement and address a unique functionality, thereby not wasting time and resources (Chauhan, 2010; Kochhar et al., 2019). Each test case should test one requirement in a specific way, and if a requirement has many facets, many test cases should be written (Adlemo et al., 2018; Bowes et al., 2017; Kamde et al., 2006; Kaner, 2003; Lai, 2017). The Single-Condition Tests principle (Meszaros, 2007) suggests that test cases should not verify many functionalities at once, to avoid test obfuscation. These properties keep a test case focused, making it easier to understand and thereby easing fault localization and debugging tasks (Grano, 2019). Based on our findings, however, this criterion received the lowest rating among the criteria (3.89). Since the figure is close to the agree scale, the criterion is still accepted as one of the quality criteria of test cases.
One of the respondents commented on this factor (documentation quality) as follows: "I agree with all these points but put in mind if these implemented completely, delivery will be late, but it will be solid and reduce bugs". This comment underscores the importance of quality content for quality test cases, although implementing it fully can hinder the fast delivery of the product.
(2) Management Quality: Test case management is concerned with the control processes of the test cases (Kao et al., 2016). Owing to the requirement changes in ASD, a test case, like other development documents, must be flexibly changed or adjusted and easily retrieved and recovered (Lai, 2017). The features of a test case should be manageable, creating room for quick recovery of previous versions, easy retrieval of test cases at different levels, and version control, which is a detailed record of the tester, date of revision, reasons, and other related activity (Lai, 2017). The respondents (experts) agreed that the management quality factor contributes to good test case quality in ASD, with a mean score of 4.06, which corresponds to the "agree" scale. Table 3 also shows that the respondents accept the criteria of this factor. (a) Retrieval Quality: Test cases can be retrieved quickly by their identifications at different levels (Lai, 2017). To confirm the significance of this feature for test case quality, the respondents were asked to rate it; their rating of 3.97 makes it acceptable as one of the test case quality criteria.
(b) Recoverability: Recoverability is the degree to which the previous state of a test case can be re-established quickly in the event of an interruption or a failure (Lai, 2017). The respondents gave this criterion the lowest rating (3.92) among the management quality criteria, but it is still acceptable as one of the test case quality criteria.
(c) Version Control Mechanism: A good test case contains a detailed record of the revision date, reasons, tester, and related activity (Lai, 2017). Any changed test case has its complete documents in the version control system (Lai, 2017). Version control systems store and reconstruct past versions of program source code (Ball et al., 1997). "Version control mechanism" received the highest agreement from respondents, with a mean of 4.31.
(d) Prioritization and Organization: Test cases should be arranged in order of criticality. According to Afzal (2007), not all test cases are equally important; therefore, the test cases need to be prioritized. Test case prioritization is needed because full regression testing requires a lot of time and system resources, so there is a need to identify which test cases should be run first (Berberyan & Ali, 2019). The respondents gave this criterion a high score of 4.06, which is above the "agree" scale. In addition, one of the respondents commented on this factor (management quality): "Sometimes when we don't have much time, so we are covering importing test cases", meaning that they test the most critical test cases first when time is limited. This comment supports the "prioritization and organization" criterion.
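As a minimal sketch of prioritization, assuming hypothetical test-case records with a `criticality` field, the suite can be ordered so that the most critical cases run first when time is short:

```python
# Hypothetical test-case records; higher criticality means run earlier.
test_cases = [
    {"id": "TC-03", "name": "export report", "criticality": 2},
    {"id": "TC-01", "name": "user login",    "criticality": 5},
    {"id": "TC-02", "name": "checkout",      "criticality": 4},
]

def prioritize(cases):
    # Sort descending by criticality, so a regression run cut short
    # by a deadline still covers the most important checks.
    return sorted(cases, key=lambda tc: tc["criticality"], reverse=True)

ordered = prioritize(test_cases)
# The most critical case ("TC-01", user login) now comes first.
```

Real test management tools attach richer risk metadata, but the ordering idea is the same as in this sketch.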
(3) Maintainability: Zeiss et al. (2007) define test case maintainability as the degree to which the specified test cases can be modified with ease when the software changes. Test cases should be written so that they are easy to maintain; for any change in requirements, the tester should be able to manage the test cases effortlessly (Adlemo et al., 2018; Bowes et al., 2017; Lai, 2017). According to the findings of this study, the academic and industry experts confirmed that maintainability influences test case quality, so this factor was deemed important for measuring test case quality in ASD. The average criteria score for maintainability is 4.04, which means the experts endorse test case maintainability. Most of the criteria (i.e., changeability, update regularly, traceability, peer review) had rating scores above 4, making them acceptable for measuring test case maintainability. Some respondents commented on test case maintainability, such as "Add the negative impact of writing unnecessary or duplicate test cases" and "Whenever requirements or code change, test cases should be fixed". These two comments stress, respectively, excluding unnecessary test cases and updating the test cases whenever changes occur. (a) Simplicity: A test case must be simple to understand so that the tester does not have to research or invent what is to be done next (Limaye, 2009). Li et al. (2016) claim that developers need less help on simple test cases than on complex ones. Simplicity reduces the chance of making mistakes and eases a test case's maintainability (Adlemo et al., 2018; Bowes et al., 2017; Lai, 2017). Simplicity was the only marginally accepted criterion of test case maintainability, with 11 out of 36 respondents disagreeing with it. Although its average rating of 3.42 is the lowest, this criterion is still acceptable.
(b) Changeability: The test case steps should be easily modifiable as requirements change. A test case is changeable if its structure and style allow any change to be made easily, completely, and consistently (Juan et al., 2010). Since Agile welcomes change, this criterion is critical to test case quality; accordingly, the experts gave it a high score (4.28).
(c) Traceability: Traceability is the ability to trace requirements in a specification to their origin, from higher- to lower-level requirements, through a set of documented links (Ali, 2006). To nimbly test a software system during its Agile development, it is crucial to identify what to test (e.g., requirements, code, and development artefacts) and how to test it (i.e., test cases). Traceability of test cases helps to easily change the impacted test cases if any of the requirements change (Beer & Felderer, 2018). Traceability links should be maintained between test cases, code, and requirements (Kochhar et al., 2019). The traceability rating is 4.19, implying that this criterion is accepted among the test case maintainability criteria.
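One lightweight way to realise such links, sketched here with hypothetical requirement and test-case identifiers, is a traceability map that answers the question "which test cases are impacted if this requirement changes?":

```python
# Hypothetical map from requirement IDs to the test cases covering them.
trace = {
    "REQ-101": ["TC-01", "TC-04"],
    "REQ-102": ["TC-02"],
    "REQ-103": ["TC-03", "TC-04"],
}

def impacted_tests(changed_reqs):
    # Collect every test case touched by any changed requirement.
    hit = set()
    for req in changed_reqs:
        hit.update(trace.get(req, []))
    return sorted(hit)

# If REQ-101 and REQ-103 change, TC-01, TC-03 and TC-04 must be revisited.
```

In practice such links are usually stored in a requirements management or test management tool rather than in code, but the lookup they enable is the one shown here.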
(d) Peer Review: A test case should be checked and corrected by a test case reviewer. Whenever requirements are altered, test cases need to be updated, and testers should plan and update test cases for sprint stories (Collins et al., 2012). Therefore, test cases need to be regularly reviewed and revised, and new tests need to be written to exercise different parts of the software and potentially find more defects (Quadri & Farooq, 2010). Reviewing test cases and regular communication between developers and testers were also highly recommended by practitioners (Tran et al., 2019). In our findings, too, the respondents recommended it by giving it a high score (4.08).
(e) Update Regularly: A test case must be updated when the underlying code changes (Adlemo et al., 2018). According to Li et al. (2016), up-to-date documentation helps developers identify test cases that relate to new or modified functionality of the system. Given the importance of updating test cases, the respondents provided a high score of 4.25.
(4) Reusability: A reusable test case is one that can be used at various application levels (Juan et al., 2010). Test cases at basic testing levels can be combined into higher-level test cases (Lai, 2017). Hence, test cases should be reusable to improve the quality and efficiency of Agile testing, where time is a scarce resource that makes reusable test cases very attractive (Adlemo et al., 2018; Lai, 2017). The respondents confirmed, through the average Likert scores for the proposed criteria, that the reusability of a test case should be one of the TCQ factors. They agreed that test cases should have characteristics such as automaticity, extensibility, simplicity, repeatability, and combinability to be reusable. (a) Automaticity: The automaticity of a test case is its ability to be run frequently by recording test data and test results in the format of testing tools (Lai, 2017). Automated test cases can help deliver more effectively and within shorter timescales (Rajasekhar & Shafi, 2014). Therefore, it would be better to automate the test cases (Aamir & Khan, 2017). Our respondents rated this criterion 4.08, which is above the "agree" scale.
(b) Extensibility: Each test case should allow easy adjustment and modification to meet extended requirements and increase the efficiency of maintenance (Lai, 2017, 2019). This criterion received a rating of 4.14, which shows its importance to reusability. (c) Combinability: Test cases at higher levels of testing need to combine test cases at base levels to increase testing efficiency (Lai, 2017). However, this criterion is not highly favoured by the experts based on their rating (3.92), which is close to the "agree" scale but still acceptable.
(d) Simplicity: The needed extensions and application rules should be easily adaptable to the current testing procedures. The more items a test case tests, the more complex and thus the less reusable it is (Rava & Daengdej, 2014). Therefore, a test case must be simple to be reusable. The experts confirmed this with a rating of 4.08, which is above the "agree" scale.
(e) Repeatable: A test case should produce the same result if run many times with the same inputs (Adlemo et al., 2018; Kamde et al., 2006; Kochhar et al., 2019; Sundmark et al., 2005; Tran et al., 2019). If a test case is repeatable, it will be fast and efficient to execute (Adlemo et al., 2018; Eldh et al., 2011; Kamde et al., 2006). One of the respondents commented on this factor: "Repeat test leads to higher cost in development. Therefore, good and specific test cases can reduce the time and effort". This comment is critical of repetition but reinforces the "specific" criterion. Nevertheless, the mean of repeatability is high (4.11), indicating broad agreement among the respondents. Repeatable test cases save time and effort because the same test case does not have to be rewritten whenever it is needed again.
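Repeatability often hinges on controlling nondeterminism. A minimal sketch, assuming a hypothetical `sample_order` helper as the unit under test, fixes the random seed so that the same inputs always yield the same result:

```python
import random

def sample_order(items, seed=42):
    # A fixed seed makes the shuffle deterministic, so the function
    # produces the same outcome on every run with the same inputs.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled

def test_sample_order_is_repeatable():
    first = sample_order(["a", "b", "c"])
    second = sample_order(["a", "b", "c"])
    assert first == second  # same inputs, same result, every run
```

The same idea applies to other sources of nondeterminism (time, network, shared state): pinning them down is what turns a flaky test into a repeatable one.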
(5) Requirement Quality: A requirement is a statement describing the proposed system that all stakeholders agree must be made true for the customers' problems to be genuinely solved (Hussain et al., 2016). It is essential to create test cases based upon the requirements (Craig & Jaskiel, 2002). The requirements should be well known before designing test cases (Beer et al., 2017) because the aim of testing is requirements verification (ensuring that the requirement is correct) and validation (confirming that the solution meets the requirements), by defining test cases and designing a rationale (Allala et al., 2019; Limaye, 2009; Schedlbauer, 2012). The respondents were asked to rate the impact of requirement quality on the quality of test cases. According to Mannion and Keepence (1995), the criteria to measure requirement quality are that requirements should be specific, measurable, attainable, realisable, and traceable (SMART). The other criteria were accepted as well, but their means fall below 4, in the range of 3.75 to 3.94 (see Table 4), which is close to 4. Respondents sometimes struggle to analyse customer requirements, as one of them commented: "We really facing issues with requirements we receive from a business analyst. As we have a lot of discussions and questions because requirements are not ready for implementation. Also, the lack of understanding of the existing implementation which results difficulties in the transition to new requirements or modifications." (a) Specific: The requirement should say what is required, which includes being clear, complete, and accurate (Mannion & Keepence, 1995; Quadri & Farooq, 2010). The requirement should not contain unnecessary redundancy of information, and extraneous words should be removed. Most respondents agreed or strongly agreed with "specific", whose rating of 4.33 is the highest among the requirement quality criteria.
As a result, the story requirements should be specific to provide good-quality test cases in ASD. Twenty respondents strongly agreed that requirements should be specific.
(b) Measurable: It is possible to verify that the requirement has been met. The level of detail required to describe and set up the corresponding test is a strong indicator of whether the requirement should be broken down into sub-requirements. The requirement should specify a fixed performance against a predefined set of test cases, and it should be specific in order to be measurable (Mannion & Keepence, 1995). The respondents agreed with this criterion, giving it an average Likert score of 3.94, the same value as "realisable".
(c) Attainable: An attainable requirement is one whose implementation can be achieved on time and within budget (Mannion & Keepence, 1995). When asked whether this criterion was important for test case quality, the experts were unsure and gave it the lowest score of 3.75. Though this value is low, it is still acceptable.
(d) Realisable: A requirement is realisable if it can be achieved given what is known about the constraints under which the system and the project must be developed (Mannion & Keepence, 1995).
(e) Traceable: Traceability is the ability to trace a requirement from its conception through its specification to its subsequent design, implementation, and test, that is, from the requirement to test cases across design and implementation artefacts (Bucaioni et al., 2018; Mannion & Keepence, 1995). The traceability between requirements and test cases can provide valuable feedback on how verification and validation have been conducted and their progress status in the sprints (Bucaioni et al., 2018). Based on our survey findings, 16 and 15 respondents strongly agreed and agreed, respectively, that test cases should be traceable. One of the respondents opined that "Traceability of test cases is a very important topic for customer". Reflecting this importance, the respondents gave traceability a high score (4.19), the second-highest among the requirement quality criteria after "specific" (4.33).
(6) Performance Efficiency: Performance efficiency is one of the ISO/IEC 25010 quality characteristics (ISO-IEC 25010:2011), defined as "the capability of the software product to provide appropriate performance, relative to the number of resources used, under stated conditions." Efficiency is frequently used as a quality factor for assessment (Yan et al., 2019). A good-quality test case is more likely to discover severe bugs (Adlemo et al., 2018; Kaner, 2003), which relates to test case efficiency (Limaye, 2009). The respondents were asked to rate the proposed criteria (i.e., bug detection, resource utilization, and time behaviour) for measuring the performance efficiency of test cases in ASD. The results showed that all three criteria are accepted. (a) Bug Detection: A test case should possess the ability to discover bugs easily. Bug detection is one of the main reasons for writing test cases: when practitioners write new functionality or add a piece of code, they need to test whether that code works (Kochhar et al., 2019). A good-quality test case is more likely to discover severe bugs (Adlemo et al., 2018; Kaner, 2003) and should help to identify issues and weaknesses of features and functions (Tran et al., 2019). The results in Table 4 show that the respondents strongly agree on bug detection, with a mean rating of 4.42.
(b) Resource Utilization: The amounts of computing resources and code required by a program to perform its function should be appropriate (ISO-IEC 25010:2011). The respondents were close to agreement on this criterion for good test case quality, rating it 3.92.
(c) Time Behavior: Test cases should use tags or categories, such as slow tests and fast tests, so that a specific set of tests can easily be run at a time (Adlemo et al., 2018). The respondents were asked to rate this criterion as one of the performance efficiency criteria for test case quality. The result was 3.94, which is acceptable.
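Tagging can be sketched without any particular framework (pytest, for instance, offers markers for the same purpose); the test names below are hypothetical. A small registry records each test's tags, and a runner executes only the tests carrying a requested tag:

```python
registry = []

def tag(*labels):
    # Decorator that records a test function together with its tags.
    def wrap(fn):
        registry.append((fn, set(labels)))
        return fn
    return wrap

@tag("fast")
def test_parse_header():
    assert True

@tag("slow", "integration")
def test_full_import():
    assert True

def run(label):
    # Execute only the tests carrying the requested tag; report their names.
    selected = [fn for fn, labels in registry if label in labels]
    for fn in selected:
        fn()
    return [fn.__name__ for fn in selected]

# run("fast") executes only test_parse_header.
```

A quick feedback loop then runs only the "fast" set on every change, leaving the "slow" set for a nightly or pre-merge run.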
(7) Test Case Effectiveness: Test case effectiveness means that the test case covers the expected requirements (Tran et al., 2019). It indicates whether a set of test cases sufficiently and effectively reveals defects (Chernak, 2001). To design a good and practical test case, the test case should cover all features and the expected requirements, but the developer should not create too many test cases (Tran et al., 2019; Yamaura, 1998). The score for test case effectiveness, computed as the average of the coverage and accuracy ratings, is 4.47, the highest value among the proposed factors of test case quality (refer to Table 4). The results showed that the proposed criteria of test case effectiveness (i.e., coverage and accuracy) are very important for high TCQ. (a) Coverage: Coverage should be used to understand which features are missing or covered in test cases as expected in the requirements. Coverage information can help developers and testers find parts of the code that are not covered and might contain bugs (Kochhar et al., 2019). According to Grano (2019), a test case is more effective when it has high statement coverage and does not contain test smells. The results showed that coverage is essential for good TCQ; its average rating of 4.67 is the highest among all the test case quality criteria.
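At the requirement level, coverage can be sketched as a simple set comparison (the identifiers below are hypothetical): which requirements are exercised by at least one test case, and what fraction of the total that represents.

```python
# Hypothetical identifiers: all requirements, and those touched by tests.
requirements = {"REQ-101", "REQ-102", "REQ-103"}
covered_by_tests = {"REQ-101", "REQ-103"}

def coverage_ratio(reqs, covered):
    # Fraction of requirements exercised by at least one test case.
    return len(reqs & covered) / len(reqs)

missing = requirements - covered_by_tests      # requirements with no test
ratio = coverage_ratio(requirements, covered_by_tests)
```

Statement coverage, as discussed by Grano (2019), is measured by tooling over code rather than requirement sets, but it answers the analogous "what is not exercised?" question.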
(b) Accuracy: A test case is accurate when it produces results that correspond to the expected outputs (Adlemo et al., 2018; Kamde et al., 2006). It must describe the desired result correctly to avoid confusion over whether the test case has passed or failed (Limaye, 2009). If the tester does not know the expected output and assumes the result given by the computer is correct, the tester will not be able to detect logic errors (Atmadja & Shuai, 2011). The respondents strongly agreed with this criterion, giving it a high Likert score (4.28).
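Accuracy presupposes an explicitly stated expected output to compare against. A minimal sketch, with a hypothetical `add` function standing in for the unit under test:

```python
def run_test_case(fn, inputs, expected):
    # A pass/fail verdict is only meaningful when the observed result
    # is compared against an explicitly documented expected output.
    actual = fn(*inputs)
    if actual == expected:
        return "PASS"
    return f"FAIL: got {actual!r}, expected {expected!r}"

def add(a, b):
    # Hypothetical unit under test.
    return a + b

verdict = run_test_case(add, (2, 3), 5)  # expected output stated up front
```

Without the `expected` argument, the runner could only echo whatever the computation returned, which is exactly the blind spot Atmadja and Shuai (2011) warn about.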

Conclusion and future work
Software testing is an essential activity in Agile methods, and writing test cases therefore has a significant impact on the testing process. This paper presented an exploratory study investigating test case quality and its importance in ASD from experts' perspectives. A survey was conducted with 36 academic and industry experts from different countries to examine the importance of test case quality in ASD. The survey investigated 32 criteria, grouped into seven factors: documentation quality, management quality, maintainability, reusability, requirement quality, performance efficiency, and effectiveness of TCQ in ASD. The experts insist on the importance of good-quality test cases in ASD to improve software quality, meet customers' needs, improve the adequacy of testing, reduce defects and bugs, decrease delivery time, and reduce development cost. The experts' positive response to the factors and criteria indicates their suitability for measuring the quality of test cases in ASD. This study is part of research on designing a measurement model for the quality of test cases in Agile software testing. To date, no test case study has constructed the same set of quality factors together with the related criteria and metrics. Because the system is tested using the quality assurance technique indicated in the proposed test case quality model, the research could help to produce high-quality system applications using the Agile testing approach. Furthermore, it may contribute to software quality auditing and self-quality assessment in the software industry. Future work will focus on defining metrics and measurements for these verified factors and criteria of test case quality in Agile projects.