Quantum algorithm for solving the test suite minimization problem

Abstract The test-suite minimization problem is an essential problem in software engineering, as solving it helps to improve software quality. This paper proposes a quantum algorithm to solve the test-suite minimization problem with high probability in , where is the number of test cases. The algorithm generates incomplete superpositions to find the best solution, and it handles the case of a non-uniform amplitude distribution for systems with multiple solutions. The proposed algorithm uses amplitude amplification techniques to search for the minimum number of test cases required to test all the requirements. It employs two quantum search algorithms: Younes et al.'s algorithm for quantum searching via entanglement and partial diffusion, which prepares incomplete superpositions representing different search spaces such that the number of test cases is incremented in each search space, and an updated version of Arima's algorithm, which handles the multi-solution case. The updated Arima's algorithm searches for a quantum state that satisfies an oracle representing the instance of the test-suite minimization problem.


PUBLIC INTEREST STATEMENT
Test-suite minimization is an essential software engineering problem of special importance in software testing. Evolutionary algorithms and other algorithms have been proposed to solve this problem. In this paper, a quantum algorithm is proposed to solve the test-suite minimization problem with high probability. Applying quantum algorithms to software engineering problems can give better results than those obtained using classical methods. The proposed algorithm uses amplitude amplification techniques to search for the minimum number of test cases required to test all the requirements. It employs two quantum search algorithms: Younes et al.'s algorithm for quantum searching via entanglement, which prepares an incomplete superposition, and an updated version of Arima's algorithm, which searches the prepared search space for a possible solution to the given problem instance.

Introduction
Software testing is an essential stage in the software development life cycle. Redundant test cases can arise as new versions of the software are developed: test cases written for a new version may cover requirements already tested by test cases written for earlier versions. Test-suite minimization techniques are therefore required to reduce the number of test cases that cover the same requirements. They are also required to use resources efficiently, reducing cost and time requirements, and to detect software bugs (Harman et al., 2012; Rothermel et al., 1998). Many algorithms are used to solve optimization problems, for example, quantum algorithms, quantum-inspired algorithms, and metaheuristics such as genetic algorithms, differential evolution, simulated annealing, ant and bee algorithms, the bat algorithm, and particle swarm optimization, as surveyed by Ahmed (2016) and discussed next in this section.
Reduction can be performed classically using clustering, as by Subashini and Jeyamala in 2014 (Subashini & JeyaMala, 2014) and Wang, Qu, and Lu in 2015 (Wang et al., 2015), so that the program can be tested using any one of the test cases in a cluster. The clusters can be formed by similarity in profiling (Alian et al., 2016) or by a data-mining clustering approach (Wang et al., 2015).
Many quantum algorithms have been proposed to solve minimization problems for various applications. For example, Anindita and Anirban (Banerjee & Pathak, 2009) proposed an algorithm for minimizing quantum circuit cost in 2010, in which the number of gates in various selected quantum circuits is minimized. The algorithm can be applied to other circuits, but to do so, templates have to be developed for the corresponding gate library, or the circuit has to be converted into another gate library for which a template has already been prepared. In 2020, Hussein et al. (Hussein et al., 2020) proposed a quantum-inspired algorithm that performs better than the classical genetic version of the algorithm. Also in 2020, Orsan Ozener and Hasan Sozer (Orsan Ozener & Sozer, 2020) proposed a formulation of the test-suite minimization problem that addresses the limitations of heuristic techniques and integer linear programming formulations focusing on a single criterion or on bi-criteria. In 2020, Yinxing Xue and Yan Li (Xue & Li, 2020) proved that integer linear programming can model multi-criteria test-suite minimization and then proposed a multi-objective integer programming approach to solve it.
The aim of this paper is to propose a quantum algorithm to solve the test-suite minimization problem. The proposed algorithm consists of two stages: the first stage prepares an incomplete superposition of a search space with certain properties using Younes et al.'s algorithm for quantum searching via entanglement. The second stage searches the prepared incomplete superposition for a solution of the test-suite minimization problem using an updated version of Arima's algorithm for searching incomplete superpositions (Arima et al., 2009). This paper is organized as follows: Section 2 reviews the required background and discusses the quantum search algorithms used in the proposed algorithm. Section 3 proposes the quantum algorithm to solve the test-suite minimization problem. Section 4 presents the analysis of the searching phase. Section 5 discusses the proposed algorithm. Section 6 concludes the paper and mentions future work.

Test-Suite minimization problem
The aim of the test-suite minimization problem is to cover a given set of requirements with the smallest number of tests. The problem can be defined as follows (Panda & Mohapatra, 2017).
Given: A test suite $T$ with a set of $n$ test cases $\{t_1, t_2, t_3, \ldots, t_n\}$ and a set $R$ of $m$ test requirements $\{R_1, R_2, R_3, \ldots, R_m\}$ that must be satisfied. Each test case must cover one or more requirements, and a requirement can be covered by one or more test cases.

Required:
Find the minimum subset of T that tests all the requirements.
For example, Table 1 illustrates a given test suite, showing the requirements covered by each test case: the TestNo column gives the test case number, while the $R$ columns mark the requirements satisfied by each test case. Many subsets cover all the requirements, but the target is the one with the minimum number of tests. For example, all the requirements can be covered with the test set $\{t_1, t_2, t_3, t_4\}$ or with $\{t_1, t_2, t_4\}$, but the minimum set that covers all the requirements is $\{t_1, t_2, t_4\}$. Finding this set becomes harder with large data sets.
A test requirement (TR) matrix records which requirements each test case covers, and a solution has to cover all of them. For example, to minimize the number of test cases in the instance shown in Table 1, we start by representing the problem by its TR matrix, given in Equation (1). To solve this instance of the test-suite minimization problem, the minimal number of true assignments satisfying the Boolean formula in Equation (2) must be found. This is a reduction of the test-suite minimization problem to the satisfiability (SAT) problem.
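To make the reduction concrete, the following Python sketch encodes a small TR matrix, evaluates the corresponding conjunctive-normal-form formula (one clause per requirement, which we assume is the shape of Equation 2), and finds the minimum cover by exhaustive search. Table 1's entries are not reproduced in this text, so the matrix below is hypothetical, chosen only so that, as in the example above, both $\{t_1, t_2, t_3, t_4\}$ and $\{t_1, t_2, t_4\}$ are covers and $\{t_1, t_2, t_4\}$ is the minimum. The names `f_tr` and `minimum_cover` are illustrative.

```python
from itertools import product

# Hypothetical TR matrix standing in for Table 1 (its entries are not
# reproduced in this text): TR[i][j] == 1 means test t_{i+1} covers R_{j+1}.
TR = [
    [1, 1, 0, 0],  # t1 covers R1, R2
    [0, 0, 1, 0],  # t2 covers R3
    [0, 1, 0, 0],  # t3 covers R2
    [0, 0, 0, 1],  # t4 covers R4
]
# CNF for this matrix: f_TR = x1 AND (x1 OR x3) AND x2 AND x4


def f_tr(x, tr=TR):
    """True iff every requirement (column) is covered by a selected test."""
    n, m = len(tr), len(tr[0])
    return all(any(x[i] and tr[i][j] for i in range(n)) for j in range(m))


def minimum_cover(tr=TR):
    """Classical exhaustive reference: satisfying assignment with fewest 1s."""
    solutions = [x for x in product((0, 1), repeat=len(tr)) if f_tr(x, tr)]
    return min(solutions, key=sum)


best = minimum_cover()
print(best, [f"t{i + 1}" for i, bit in enumerate(best) if bit])
```

The exhaustive search costs $2^n$ evaluations of $f_{TR}$; it only serves as a classical reference for the quantum search described below.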

Quantum computing
The elementary unit of data in quantum computing is the quantum bit, or qubit. Before measurement, a qubit can be in a superposition, a combination of the states $|0\rangle$ and $|1\rangle$, as shown in Equation 3, where $\alpha$ and $\beta$ are complex numbers representing the probability amplitudes of $|0\rangle$ and $|1\rangle$, respectively. When a measurement is performed, the superposition collapses to one of the states probabilistically. The amplitudes must satisfy the condition in Equation 4:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{3}$$
$$|\alpha|^2 + |\beta|^2 = 1. \tag{4}$$
One quantum property is entanglement, which means that the parts of a quantum system cannot be described independently; instead, the quantum state has to be described for the whole system (Menon & Ritwik, 2014). Another feature of quantum computing is parallelism: a single gate acting on a superposition operates on all $N$ inputs in one step, whereas a classical computer takes $N$ steps for the same input size. This parallelism does not require additional hardware or waiting for other processes to complete; multiple operations are performed at once.
Quantum gates are unitary operators: a gate with $s$ inputs can be represented as a $2^s \times 2^s$ unitary matrix, assuming that
$$|0\rangle = \begin{pmatrix}1\\0\end{pmatrix}, \qquad |1\rangle = \begin{pmatrix}0\\1\end{pmatrix}.$$
Examples of such gates are the Hadamard gate $H$, with the following effect on $|0\rangle$ and $|1\rangle$:
$$H|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \qquad H|1\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}};$$
the $X$ gate, which is equivalent to the classical NOT gate and maps $|0\rangle$ to $|1\rangle$ and $|1\rangle$ to $|0\rangle$:
$$X|0\rangle = |1\rangle, \qquad X|1\rangle = |0\rangle;$$
and the $Z$ gate, which leaves $|0\rangle$ unchanged but converts $|1\rangle$ to $-|1\rangle$:
$$Z|0\rangle = |0\rangle, \qquad Z|1\rangle = -|1\rangle.$$
Quantum circuits are combinations of elementary quantum gates that perform a certain task.
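A minimal pure-Python sketch of the single-qubit gates above, representing a state as a two-component list of real amplitudes (enough for $H$, $X$ and $Z$, whose matrices are real). The helper names `apply` and `is_normalized` are illustrative.

```python
import math

KET0 = [1.0, 0.0]  # |0> = (1, 0)^T
KET1 = [0.0, 1.0]  # |1> = (0, 1)^T

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]  # Hadamard
X = [[0, 1], [1, 0]]   # NOT
Z = [[1, 0], [0, -1]]  # phase flip on |1>


def apply(gate, ket):
    """Multiply a 2x2 gate matrix by a single-qubit state vector."""
    return [gate[r][0] * ket[0] + gate[r][1] * ket[1] for r in range(2)]


def is_normalized(ket, tol=1e-12):
    """Check the condition |alpha|^2 + |beta|^2 = 1 (Equation 4)."""
    return abs(sum(abs(a) ** 2 for a in ket) - 1) < tol


plus = apply(H, KET0)   # (|0> + |1>)/sqrt(2)
minus = apply(H, KET1)  # (|0> - |1>)/sqrt(2)
print(plus, minus, apply(X, KET0), apply(Z, KET1))
```

Applying $H$ twice returns the original state, which is a quick way to check that the matrix is its own inverse.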

Quantum searching algorithms
The aim of this section is to review the three quantum algorithms used in the proposed algorithm: Grover's algorithm for searching an unstructured list via phase shift (Younes, 2008), Younes et al.'s algorithm for searching an unstructured list via entanglement, and Arima's algorithm for searching an incomplete superposition (Arima et al., 2009).
Grover's algorithm is a fast search algorithm. Ventura extended it with the ability to search a subset of the whole superposition, and Arima amended Ventura's algorithm to guarantee high-probability results. The partial diffusion operator, in turn, updates Grover's algorithm to guarantee high-probability results when there are multiple copies of the searched data.

Grover's Algorithm
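Grover's algorithm can search an unstructured database of size $N = 2^n$ in $O(\sqrt{N})$. It requires the following initialization (Younes, 2008): the first $n$ qubits are initialized to $|0\rangle$ and an extra workspace qubit is initialized to $|1\rangle$,
$$|0\rangle^{\otimes n} \otimes |1\rangle.$$
It then applies $H^{\otimes(n+1)}$ to the $n+1$ qubits, so that the first $n$ qubits hold all $N$ states representing the list and the extra qubit is in the state $(|0\rangle - |1\rangle)/\sqrt{2}$:
$$H^{\otimes(n+1)}\left(|0\rangle^{\otimes n} \otimes |1\rangle\right) = \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1}|i\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}.$$
Grover's algorithm then applies the oracle operator $O_G$, which evaluates a Boolean function $f_G : \{0,1\}^n \to \{0,1\}$. The operator $O_G$ gives the amplitudes of the matches a phase shift of $e^{i\pi}$ (Younes, 2008), as shown in Equation 13, and the system can be written as shown in Equation 14:
$$O_G\left(|i\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}\right) = (-1)^{f_G(i)}\,|i\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}, \tag{13}$$
$$\frac{1}{\sqrt{N}}\sum_{i=0}^{N-1}(-1)^{f_G(i)}\,|i\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \tag{14}$$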
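Next, it applies the diffusion operator $D$ to the first $n$ qubits:
$$D = H^{\otimes n}\left(2|0\rangle\langle 0| - I_n\right)H^{\otimes n},$$
where $I_n$ is the identity matrix of size $N \times N$. Consider a general system $|\psi_{G2}\rangle$ of $n$ qubits:
$$|\psi_{G2}\rangle = \sum_{i=0}^{N-1}\alpha_i|i\rangle.$$
Applying $D$ to the general system $|\psi_{G2}\rangle$ has the following effect:
$$D|\psi_{G2}\rangle = \sum_{i=0}^{N-1}\left(2\langle\alpha\rangle - \alpha_i\right)|i\rangle,$$
where $\langle\alpha\rangle = \frac{1}{N}\sum_{i=0}^{N-1}\alpha_i$ is the mean of the states' amplitudes.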
When there is a single match, the oracle operator $O_G$ and the diffusion operator $D$ must be iterated approximately $\frac{\pi}{4}\sqrt{N}$ times. Measurement is then performed on the first $n$ qubits to reveal one of the searched items.
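The sketch below simulates the phase-shift formulation of Grover's algorithm with a plain list of amplitudes. It is a hedged illustration, not the paper's implementation: the workspace qubit is folded into the phase flip it induces, the iteration count is rounded down (an assumption), and the general form $\frac{\pi}{4}\sqrt{N/M}$ is used so that the single-match count $\frac{\pi}{4}\sqrt{N}$ is the special case $M = 1$.

```python
import math


def grover_success(n, marked):
    """State-vector sketch of Grover's algorithm on n qubits.

    The workspace qubit is left implicit: O_G is applied as the phase
    shift (-1)^{f_G(i)} that it induces on the marked items.
    """
    N = 2 ** n
    amps = [1 / math.sqrt(N)] * N  # first n qubits after the Hadamards
    # pi/4 * sqrt(N/M); with one match (M = 1) this is pi*sqrt(N)/4
    iterations = math.floor(math.pi / 4 * math.sqrt(N / len(marked)))
    for _ in range(iterations):
        for i in marked:                     # oracle O_G: phase flip
            amps[i] = -amps[i]
        mean = sum(amps) / N                 # <alpha>
        amps = [2 * mean - a for a in amps]  # diffusion D
    return iterations, sum(amps[i] ** 2 for i in marked)


print(grover_success(4, {13}))
```

For $N = 16$ and one match this performs 3 iterations and reaches a success probability of about 0.96.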

Younes et al. Algorithm
An unstructured database of size $N$ can be searched with a higher probability of success by an algorithm that uses the partial diffusion operator $D_p$ in $O(\sqrt{N/M})$, where $M$ is the number of matches, satisfying $1 \le M \le N$. This algorithm amplifies the amplitudes of the solutions via entanglement of the search space with the extra qubit; this entanglement is useful for preparing an incomplete superposition, which can be done by measuring the extra qubit (Younes et al., 2003).
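Younes et al.'s algorithm requires the following initialization of $n+1$ qubits, all initialized to $|0\rangle$:
$$|0\rangle^{\otimes n} \otimes |0\rangle,$$
and then applies $H$ to the first $n$ qubits:
$$\left(H^{\otimes n} \otimes I_1\right)\left(|0\rangle^{\otimes n} \otimes |0\rangle\right) = \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1}|i\rangle \otimes |0\rangle.$$
The algorithm then applies the oracle operator $O_Y$ to the $n+1$ qubits, where $O_Y$ evaluates the Boolean function $f_Y : \{0,1\}^n \to \{0,1\}$ as shown in the following equation:
$$O_Y\left(|i\rangle \otimes |y\rangle\right) = |i\rangle \otimes |y \oplus f_Y(i)\rangle.$$
Next, it applies the $D_p$ operator, which can be described in the following form:
$$D_p = \left(H^{\otimes n} \otimes I_1\right)\left(2|0\rangle\langle 0| - I_{n+1}\right)\left(H^{\otimes n} \otimes I_1\right),$$
where $|0\rangle$ has size $2N = 2^{n+1}$ and the identity matrix $I_k$ has size $2^k \times 2^k$. Consider a general system $|\psi_{Y2}\rangle$ of $n+1$ qubits:
$$|\psi_{Y2}\rangle = \sum_{k=0}^{2N-1}\delta_k|k\rangle.$$
The state $|\psi_{Y2}\rangle$ can be rewritten as
$$|\psi_{Y2}\rangle = \sum_{i=0}^{N-1}\alpha_i\,|i\rangle \otimes |0\rangle + \sum_{i=0}^{N-1}\beta_i\,|i\rangle \otimes |1\rangle,$$
where $\alpha_i = \delta_k$ for even $k$ and $\beta_i = \delta_k$ for odd $k$.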
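Applying $D_p$ to the general system has the following effect:
$$D_p|\psi_{Y2}\rangle = \sum_{i=0}^{N-1}\left(2\langle\alpha\rangle - \alpha_i\right)|i\rangle \otimes |0\rangle - \sum_{i=0}^{N-1}\beta_i\,|i\rangle \otimes |1\rangle,$$
where $\langle\alpha\rangle = \sum_{i=0}^{N-1}\alpha_i/N$ is the mean of the amplitudes of the subspace entangled with $|0\rangle$ of the extra qubit. The $O_Y$ and $D_p$ operators are iterated $q_y$ times, where
$$q_y = \left\lfloor \frac{\pi}{2\sqrt{2}}\sqrt{\frac{N}{M}} \right\rfloor.$$
Let ${\sum}'$ denote the sum over the $M$ desired matches and ${\sum}''$ the sum over the $N - M$ undesired non-matches. The system after $q_y \ge 1$ iterations can be described as follows:
$$a_q{\sum}'\,|i\rangle \otimes |0\rangle + b_q{\sum}''\,|i\rangle \otimes |0\rangle + c_q{\sum}'\,|i\rangle \otimes |1\rangle,$$
where $a_q$, $b_q$ and $c_q$ are expressed in terms of $y = 1 - M/N$ and $v = 1/\sqrt{N}$. Finally, the first $n$ qubits are measured to obtain one of the searched items.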
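Because all matches (and all non-matches) evolve identically, Younes et al.'s algorithm can be tracked with just three amplitudes: $a$ and $c$ for a match entangled with $|0\rangle$ and $|1\rangle$ of the extra qubit, and $b$ for a non-match. The following sketch obtains them by iterating the stated effects of $O_Y$ and $D_p$, not from a closed-form expression; rounding $q_y$ down is an assumption, and the function names are illustrative.

```python
import math


def younes_amplitudes(N, M, q):
    """Track the three distinct amplitudes of Younes et al.'s algorithm.

    a: matches entangled with |0> on the extra qubit
    b: non-matches entangled with |0>
    c: matches entangled with |1>
    """
    a = b = 1 / math.sqrt(N)  # after H on the first n qubits
    c = 0.0
    for _ in range(q):
        a, c = c, a                                # O_Y flips the extra qubit of matches
        mean = (M * a + (N - M) * b) / N           # <alpha> over the |0> subspace
        a, b, c = 2 * mean - a, 2 * mean - b, -c   # partial diffusion D_p
    return a, b, c


def q_y(N, M):
    return math.floor(math.pi / (2 * math.sqrt(2)) * math.sqrt(N / M))


N, M = 16, 4
q = q_y(N, M)
a, b, c = younes_amplitudes(N, M, q)
print(q, a, b, c, M * c ** 2)  # M*c^2: probability of reading |1> on the extra qubit
```

With $N = 16$ and $M = 4$ this gives $q_y = 2$, and $M c^2 = 0.5625$ is the probability that measuring the extra qubit yields $|1\rangle$.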

Arima's Algorithm
Grover's algorithm is effective when the initial amplitude distribution of the dataset is uniform, that is, when the number of stored data items equals $N$, but it is not always effective in non-uniform cases, where the number of stored data items differs from $N$. Arima's algorithm was proposed to search an incomplete superposition.
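The limitation described above can be checked numerically. The sketch applies standard Grover iterations to a hypothetical incomplete superposition (the four 4-bit states with three 1's, equally weighted, with $|1101\rangle$ as the target) and compares the result with the bound $1 - (N - r)\sigma^2$, where $r$ is the number of marked states and $\sigma^2$ is the variance of the initial unmarked amplitudes. This bound is our reading of Biron et al. (1998), not a formula stated in this paper.

```python
def grover_on_state(amps, marked, iterations):
    """Apply standard Grover iterations (phase flip of the marked items, then
    inversion about the mean over all N amplitudes) to an arbitrary real
    state; return the success probability after each iteration."""
    amps = list(amps)
    N = len(amps)
    history = []
    for _ in range(iterations):
        for i in marked:
            amps[i] = -amps[i]
        mean = sum(amps) / N
        amps = [2 * mean - a for a in amps]
        history.append(sum(amps[i] ** 2 for i in marked))
    return history


def biron_max_success(amps, marked):
    """Bound 1 - (N - r) * var(unmarked initial amplitudes)."""
    unmarked = [a for i, a in enumerate(amps) if i not in marked]
    mean = sum(unmarked) / len(unmarked)
    var = sum((a - mean) ** 2 for a in unmarked) / len(unmarked)
    return 1 - len(unmarked) * var


# Hypothetical incomplete superposition: the weight-3 states of 4 qubits.
stored = [0b0111, 0b1011, 0b1101, 0b1110]
amps = [0.5 if i in stored else 0.0 for i in range(16)]
hist = grover_on_state(amps, {0b1101}, 4)
bound = biron_max_success(amps, {0b1101})
print(hist, bound)
```

Here the bound evaluates to 0.4, and the first iteration already gives $25/64 \approx 0.39$, whereas a uniform superposition of 16 states with one target reaches about 0.96.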
Given an incomplete superposition $|\psi_{in}\rangle$, an oracle $U_{f_h}$ that evaluates to $|1\rangle$ for the states in $|\psi_{in}\rangle$, and an oracle $U_{TR}$, it is required to find any state in $|\psi_{in}\rangle$ that makes $U_{TR}$ evaluate to 1. Arima's algorithm can be summarized as follows (Arima et al., 2009):
• Initial state $|\psi_{in}\rangle$.
• Repeat $q_a$ times, where $q_a = \frac{\pi}{4}\sqrt{N}$: … $|\psi_{in}\rangle = D|\psi'''_{in}\rangle$.
• Observe the system.
The test-suite minimization problem definition has been illustrated. Quantum computing basics have been shown. Quantum searching algorithms that will be used in the proposed algorithm have been explained in detail.

Methodology
The main idea of the proposed algorithm is to search the power set of the set of test cases in rounds, excluding the empty set, since the problem is promised to have a solution. Each round simultaneously searches all the subsets in the power set that have the same cardinality; if no solution is found, the cardinality of the subsets in the search space is incremented in the next round. For example, given the set of test cases $\{t_1, t_2, t_3, t_4\}$, the first round searches a search space of single tests, i.e., $\{\{t_1\}, \{t_2\}, \{t_3\}, \{t_4\}\}$, the second round searches a search space of pairs of tests, i.e., $\{\{t_1, t_2\}, \{t_1, t_3\}, \{t_1, t_4\}, \ldots, \{t_3, t_4\}\}$, and so on.
The proposed algorithm uses Younes et al.'s algorithm to prepare an incomplete superposition in which all the states have the same Hamming distance from the state $|0\rangle^{\otimes n}$. Starting with a Hamming distance of 1, the proposed algorithm uses an updated version of Arima's algorithm to search for a state that satisfies the oracle representing the TR matrix. If a state in the prepared incomplete superposition satisfies the TR oracle, the algorithm terminates and reports the corresponding truth assignment as the minimum solution; otherwise, it prepares the next incomplete superposition with an incremented Hamming distance and restarts.
The proposed algorithm has a maximum of $O(\log_2 N)$ rounds, where the first round searches $\binom{n}{1}$ states, the second round searches $\binom{n}{2}$ states, and so on. In the worst case, after $\log_2 N = n$ rounds, the algorithm will have searched all $N$ states, since $\sum_{i=0}^{n}\binom{n}{i} = N$; the state $|0\rangle^{\otimes n}$ is excluded because it is guaranteed that the $m$ requirements are covered by a non-empty subset of the $n$ test cases.
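The round structure can be illustrated classically. The sketch below (with the illustrative helper name `rounds`) lists the search space of each round for $n = 4$ and checks that the round sizes are the binomial coefficients $\binom{n}{t}$ and that together they cover every non-empty subset exactly once.

```python
from itertools import combinations
from math import comb


def rounds(n):
    """Yield (t, search space) for t = 1..n: all subsets of {t1..tn} of size t."""
    tests = [f"t{i}" for i in range(1, n + 1)]
    for t in range(1, n + 1):
        yield t, list(combinations(tests, t))


n = 4
sizes = {t: len(space) for t, space in rounds(n)}
print(sizes)  # {1: 4, 2: 6, 3: 4, 4: 1}
assert all(sizes[t] == comb(n, t) for t in sizes)
assert sum(sizes.values()) == 2 ** n - 1  # every non-empty subset visited once
```

There are exactly $n = \log_2 N$ rounds, which is the worst case stated above.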

Problem encoding
Consider a test-suite minimization problem such as the one shown in Table 1. Assuming that all the requirements can be satisfied by a subset of the set of test cases, the problem can be represented as a TR matrix; for example, the problem shown in Table 1 can be represented by the TR matrix in Equation 1. The TR matrix is the input to the proposed algorithm.
The idea of the proposed algorithm is to represent the tests to be applied as quantum states. For example, given the set of tests $\{t_1, t_2, t_3, t_4\}$, applying the subset $\{t_1, t_2, t_4\}$ is represented by the quantum state $|1101\rangle$. The proposed algorithm prepares an incomplete superposition of states with the same Hamming distance, as illustrated next; for example, $|0001\rangle$, $|0010\rangle$, $|0100\rangle$ and $|1000\rangle$ all have Hamming distance 1 from the state $|0000\rangle$. The algorithm then checks whether one of these states satisfies the operator representing $f_{TR}$; otherwise, it increments the number of 1's in the prepared states and repeats until a solution with the minimum number of tests is reached.
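A small sketch of this encoding, assuming, as in the $|1101\rangle$ example, that $t_1$ corresponds to the leftmost bit. The helpers `encode`, `decode` and `states_with_weight` are hypothetical names used only for illustration.

```python
def encode(subset, n):
    """Map a set of 1-based test indices to a basis-state label (t1 = leftmost bit)."""
    return "".join("1" if i in subset else "0" for i in range(1, n + 1))


def decode(label):
    """Inverse of encode: the set of tests selected by a basis-state label."""
    return {i for i, bit in enumerate(label, start=1) if bit == "1"}


def states_with_weight(n, t):
    """All n-bit labels at Hamming distance t from |0...0>."""
    return [format(k, f"0{n}b") for k in range(2 ** n) if bin(k).count("1") == t]


print(encode({1, 2, 4}, 4))      # 1101
print(states_with_weight(4, 1))  # ['0001', '0010', '0100', '1000']
```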

Preparation of states with a given Hamming distance
A TR matrix of size $n \times m$ is taken as input, which means that an $(n+1)$-qubit system must be prepared. The quantum circuit shown in Figure 1 illustrates the algorithm steps, which are explained in detail in this section.
The algorithm first initializes the $n+1$ qubits to $|0\rangle$, so that the system takes the following state:
$$|\psi_1\rangle = |0\rangle^{\otimes n} \otimes |0\rangle, \tag{31}$$
and then applies $H$ to the first $n$ qubits to prepare the complete superposition:
$$|\psi_2\rangle = \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1}|i\rangle \otimes |0\rangle. \tag{32}$$
The TR matrix shown in Table 1 is taken as a working example to illustrate the proposed algorithm. In the working example $n = 4$, so the complete superposition stored in $|\psi_2\rangle$ is
$$|\psi_2\rangle = \frac{1}{4}\sum_{i=0}^{15}|i\rangle \otimes |0\rangle.$$
The proposed algorithm prepares a superposition of the $\binom{n}{t}$ states that contain exactly $t$ 1's. For this set of states, define $f_h$ as follows:
$$f_h(x_1, \ldots, x_n) = \begin{cases}1, & \sum_{i=1}^{n}x_i = t,\\ 0, & \text{otherwise.}\end{cases}$$
For the working example, where $n = 4$ and $t = 3$, $f_h$ can be written as follows (Younes & Rowe, 2015):
$$f_h = x_1x_2x_3\bar{x}_4 \vee x_1x_2\bar{x}_3x_4 \vee x_1\bar{x}_2x_3x_4 \vee \bar{x}_1x_2x_3x_4.$$
This can be represented as a Reed–Muller expansion (Younes & Miller, 2004) as follows:
$$f_h = x_1x_2x_3 \oplus x_1x_2x_4 \oplus x_1x_3x_4 \oplus x_2x_3x_4.$$
The proposed algorithm then applies the oracle $U_{f_h}$ for the function $f_h$, as shown in the quantum circuit in Figure 2 (Younes & Miller, 2004). $U_{f_h}$ is defined as follows:
$$U_{f_h}\left(|x\rangle \otimes |y\rangle\right) = |x\rangle \otimes |y \oplus f_h(x)\rangle.$$
In this paper, the quantum circuits for Boolean functions are represented as Reed–Muller expansions for illustration. The proposed algorithm is analyzed in terms of query complexity, where each use of a Boolean-function oracle is counted once, since optimizing the quantum circuits for Boolean functions is a separate research area outside the scope of this paper.
After applying $U_{f_h}$, we apply the partial diffusion operator, as shown next. The current system can be rewritten as a sum over two subspaces, the subspace of the non-matches, denoted by ${\sum}''$, and the subspace of the matches, denoted by ${\sum}'$:
$$\frac{1}{\sqrt{N}}{\sum}''\,|i\rangle \otimes |0\rangle + \frac{1}{\sqrt{N}}{\sum}'\,|i\rangle \otimes |1\rangle.$$
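The Reed–Muller (algebraic normal form) coefficients of $f_h$ can be computed mechanically with the binary Möbius transform: the coefficient of the monomial over a variable set $S$ is the XOR of $f$ over all subsets of $S$. The sketch below is a brute-force illustration for small $n$, with illustrative names; it is not meant to reproduce the circuit-synthesis procedure of Younes and Miller (2004). For $n = 4$ and $t = 3$ it yields the four cubic monomials $x_1x_2x_3$, $x_1x_2x_4$, $x_1x_3x_4$ and $x_2x_3x_4$.

```python
from itertools import product


def f_h(x, t):
    """1 iff the assignment x has exactly t ones (Hamming weight t)."""
    return int(sum(x) == t)


def reed_muller_terms(f, n):
    """Positive-polarity Reed-Muller coefficients via the binary Moebius
    transform: coefficient of x_S = XOR of f(T) over all T that are subsets of S.
    Returns the monomials with coefficient 1 as tuples of 1-based variable indices."""
    terms = []
    for s in product((0, 1), repeat=n):
        coeff = 0
        for x in product((0, 1), repeat=n):
            if all(xi <= si for xi, si in zip(x, s)):  # x is a subset of s
                coeff ^= f(x)
        if coeff:
            terms.append(tuple(i + 1 for i, si in enumerate(s) if si))
    return terms


terms = reed_muller_terms(lambda x: f_h(x, 3), 4)
print(terms)
```

Each returned tuple corresponds to one multi-controlled NOT targeting the extra qubit in a Reed–Muller circuit.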

[Figure: circuit stages labelled "States preparation" and "Searching stage"]
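The proposed algorithm applies $D_pU_{f_h}$ $q$ times, where
$$q = \left\lfloor \frac{\pi}{2\sqrt{2}}\sqrt{\frac{N}{\binom{n}{t}}} \right\rfloor. \tag{40}$$
The system is then updated to
$$a_q{\sum}'\,|i\rangle \otimes |0\rangle + b_q{\sum}''\,|i\rangle \otimes |0\rangle + c_q{\sum}'\,|i\rangle \otimes |1\rangle,$$
with $a_q$, $b_q$ and $c_q$ as in Younes et al.'s algorithm, where ${\sum}'$ now runs over the $\binom{n}{t}$ states with exactly $t$ 1's. For the working example, with $t = 3$ and $n = 4$, $\sqrt{N/\binom{n}{t}} = 2$, so the quantity in Equation 40 equals $\pi/\sqrt{2} \approx 2.22$, giving $q = 2$ iterations.
Next, the proposed algorithm measures the extra qubit. If the outcome is $|1\rangle$, it applies $Z$ and then $H$ to the extra qubit; otherwise, it restarts the preparation stage. The probability of obtaining $|1\rangle$ on the extra qubit is $\binom{n}{t}|c_q|^2$. If the outcome is $|1\rangle$, then, up to a global phase, the superposition can be represented as
$$\frac{1}{\sqrt{\binom{n}{t}}}{\sum}'\,|i\rangle \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \tag{44}$$
For the working example, assuming that $U_{f_h}$ is an oracle that marks all possible states with $t = 3$, applying $D_pU_{f_h}$ $q = 2$ times, followed by a measurement outcome of $|1\rangle$, prepares all the possible candidates with $t = 3$, resulting in the following incomplete superposition:
$$\frac{1}{2}\left(|0111\rangle + |1011\rangle + |1101\rangle + |1110\rangle\right) \otimes \frac{|0\rangle - |1\rangle}{\sqrt{2}}.$$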
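The whole preparation stage can be simulated on a state vector of $2N$ amplitudes. In this sketch the extra qubit is taken as the least significant bit (index $2i + e$ for $|i\rangle \otimes |e\rangle$, matching the even/odd split of $\delta_k$ in Younes et al.'s algorithm); $U_{f_h}$ swaps the extra-qubit amplitudes of each weight-$t$ state, and $D_p$ inverts the $|0\rangle$-subspace about its mean while negating the $|1\rangle$-subspace. The function name and the floor rounding of $q$ are assumptions.

```python
import math


def prepare_weight_t(n, t):
    """Sketch of the preparation stage on n + 1 qubits.

    Index 2*i + e stores the amplitude of |i> (x) |e>, e being the extra qubit.
    Returns q, the probability of reading |1> on the extra qubit, and the
    normalized amplitudes of the first n qubits after that outcome.
    """
    N = 2 ** n
    match = [bin(i).count("1") == t for i in range(N)]  # f_h(i) == 1
    M = sum(match)                                      # C(n, t)
    q = math.floor(math.pi / (2 * math.sqrt(2)) * math.sqrt(N / M))
    psi = [0.0] * (2 * N)
    for i in range(N):
        psi[2 * i] = 1 / math.sqrt(N)  # H on the first n qubits, extra in |0>
    for _ in range(q):
        for i in range(N):  # U_fh: |i>|e> -> |i>|e XOR f_h(i)>
            if match[i]:
                psi[2 * i], psi[2 * i + 1] = psi[2 * i + 1], psi[2 * i]
        mean = sum(psi[2 * i] for i in range(N)) / N
        for i in range(N):  # D_p: invert |0>-part about its mean, negate |1>-part
            psi[2 * i] = 2 * mean - psi[2 * i]
            psi[2 * i + 1] = -psi[2 * i + 1]
    p1 = sum(psi[2 * i + 1] ** 2 for i in range(N))
    post = {format(i, f"0{n}b"): psi[2 * i + 1] / math.sqrt(p1)
            for i in range(N) if abs(psi[2 * i + 1]) > 1e-12}
    return q, p1, post


q, p1, post = prepare_weight_t(4, 3)
print(q, p1, post)
```

For $n = 4$ and $t = 3$ it reports $q = 2$, a probability of 0.5625 of reading $|1\rangle$ on the extra qubit, and, after that outcome, amplitudes of magnitude $1/2$ on $|0111\rangle$, $|1011\rangle$, $|1101\rangle$ and $|1110\rangle$ (their common sign is only a global phase).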

Searching the incomplete superposition
Given an incomplete superposition of states that have a specific number $t$ of 1's, we amended Arima's algorithm (Arima et al., 2009) to search for an assignment that satisfies $U_{TR}$ when there are multiple solutions. For the TR matrix in Equation 1, $f_{TR}$ can be represented as shown in Equation 2 (Younes & Rowe, 2015). This formula can in turn be represented as a Reed–Muller expansion (Younes & Miller, 2004), and the oracle $U_{TR}$ for the function $f_{TR}$ is then used in the algorithm, as shown in the quantum circuit in Figure 3 (Younes & Miller, 2004).
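The oracle $U_{TR}$ acts on basis states as $|x\rangle|y\rangle \to |x\rangle|y \oplus f_{TR}(x)\rangle$. The sketch below (hypothetical TR matrix, since Table 1 is not reproduced here, and illustrative names) checks classically that this map is a permutation of the basis states, hence unitary, that it is its own inverse, and that with the extra qubit in $(|0\rangle - |1\rangle)/\sqrt{2}$ it multiplies $|x\rangle$ by $(-1)^{f_{TR}(x)}$, which is the usual reason for preparing the extra qubit in that state.

```python
from itertools import product

# Hypothetical TR matrix (rows t1..t4, columns R1..R4); Table 1 is not reproduced here.
TR = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def f_tr(x):
    """1 if every requirement column has a 1 in some selected test row, else 0."""
    return int(all(any(xi and row[j] for xi, row in zip(x, TR))
                   for j in range(len(TR[0]))))


def u_tr(x, y):
    """Bit-flip oracle on basis labels: |x>|y> -> |x>|y XOR f_TR(x)>."""
    return x, y ^ f_tr(x)


def phase_sign(x):
    """Sign acquired by |x> when the extra qubit holds (|0> - |1>)/sqrt(2)."""
    _, y0 = u_tr(x, 0)           # where the |0> component is sent
    return 1 if y0 == 0 else -1  # swapped: |1> - |0> = -(|0> - |1>)


basis = [(x, y) for x in product((0, 1), repeat=len(TR)) for y in (0, 1)]
assert sorted(u_tr(x, y) for x, y in basis) == sorted(basis)  # permutation, so unitary
assert all(u_tr(*u_tr(x, y)) == (x, y) for x, y in basis)     # self-inverse
print([x for x, _ in basis[::2] if phase_sign(x) == -1])
```

For this hypothetical matrix only $|1101\rangle$ and $|1111\rangle$ acquire the $-1$ phase.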
The proposed algorithm amends Arima's algorithm to search the prepared incomplete superposition for the states that satisfy $f_{TR}$ when there are multiple solutions. That is, we search for states that are recognized by $f_h$ and also satisfy $f_{TR}$, if such states exist; otherwise, we prepare another incomplete superposition with an incremented Hamming distance and repeat the algorithm. The updated Arima's algorithm is applied $\frac{\pi}{4}\sqrt{N}$ times and reports whether or not the prepared superposition contains a state that satisfies $f_{TR}$.
The methodology of the proposed algorithm has been illustrated. The problem encoding, the preparation phase, and the phase of searching the incomplete superposition have been explained.

Analysis of the searching phase
To analyze the dynamics of the searching phase, $|\psi'''_y\rangle$ can be generalized to the form in Equation 47, in which the amplitudes fall into three groups. In this section, $m$ denotes the number of stored data in the prepared superposition. The amplitudes $\alpha_k(t_x)$, $k = 1, \ldots, h$, belong to the possible solutions, where $h$ is the number of possible solutions; their average at time $t_x$ is
$$\bar{\alpha}(t_x) = \frac{1}{h}\sum_{k=1}^{h}\alpha_k(t_x).$$
The amplitudes $\beta_i(t_x)$, $i = h+1, \ldots, m$, belong to the non-solutions among the stored data, with average
$$\bar{\beta}(t_x) = \frac{1}{m-h}\sum_{i=h+1}^{m}\beta_i(t_x).$$
Thus, $\bar{\alpha}(t_x)$ and $\bar{\beta}(t_x)$ are the average amplitudes of the stored data, while the amplitudes of the other data that were not in the prepared superposition at time $t_x$ are $\gamma_j(t_x)$, $j = m+1, \ldots, N$, with average
$$\bar{\gamma}(t_x) = \frac{1}{N-m}\sum_{j=m+1}^{N}\gamma_j(t_x).$$
The initial distribution at $t_x = 0$ is considered arbitrary. The weighted average over all states is then calculated (Equation 51), and the relation in Equation 55 holds, based on Biron et al. (1998). The analysis of the searching phase has been explained in detail; with it, the amplitude averages can be calculated at any given time $t_x$.
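The averaging in this analysis can be sketched numerically. Assuming the weighted average is the plain mean over all $N$ amplitudes, $(h\bar{\alpha} + (m-h)\bar{\beta} + (N-m)\bar{\gamma})/N$, the sketch checks that inversion about the mean moves each class average $\bar{c}$ to $2\langle\cdot\rangle - \bar{c}$, so the three averages evolve through the weighted average alone. Amplitudes here are for the first $n$ qubits only, and the class ordering and values are hypothetical.

```python
def class_averages(amps, h, m):
    """Averages of the three amplitude classes of the analysis.

    Hypothetical ordering: indices [0, h) are possible solutions, [h, m) the
    other states of the prepared superposition, [m, N) the states outside it.
    """
    N = len(amps)
    alpha = sum(amps[:h]) / h
    beta = sum(amps[h:m]) / (m - h)
    gamma = sum(amps[m:]) / (N - m)
    weighted = (h * alpha + (m - h) * beta + (N - m) * gamma) / N
    return alpha, beta, gamma, weighted


def diffuse(amps):
    """Inversion about the mean, x -> 2<x> - x, for every amplitude."""
    mean = sum(amps) / len(amps)
    return [2 * mean - a for a in amps]


# One solution (already phase-flipped), three other prepared states, twelve unprepared.
amps = [-0.5, 0.5, 0.5, 0.5] + [0.0] * 12
alpha, beta, gamma, w = class_averages(amps, h=1, m=4)
after = diffuse(amps)
print((alpha, beta, gamma, w), class_averages(after, h=1, m=4))
```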

Discussion
Algorithm 1 summarizes the proposed algorithm in pseudocode.

The target of the proposed algorithm is to find the best solution to the test-suite minimization problem. Equation 31 shows the system initialization: the $n+1$ qubits are initialized to $|0\rangle$, and then $H$ is applied to the first $n$ qubits, as shown in Equation 32. These steps prepare the complete superposition. A superposition of states with $t$ 1's is needed; thus, the oracle $U_{f_h}$ and the partial diffusion operator $D_p$ are applied $q$ times, with $q$ given by Equation 40.
This step prepares the incomplete superposition that contains the candidate solutions we are trying to find. The extra qubit, which is entangled with the search space so that the amplitudes of the candidates are amplified, is then measured. If the outcome is $|1\rangle$, $Z$ and then $H$ are applied to the extra qubit to update the system to the state shown in Equation 44. The updated version of Arima's algorithm is then applied to search the prepared incomplete superposition for the best solution. Equation 47 shows the system in its general form after applying the updated Arima's algorithm $q_a$ times. It also shows the average amplitudes of the possible solutions, $\alpha(t_x)$, of the non-solutions, $\beta(t_x)$, and of the other data that were not in the prepared superposition, $\gamma(t_x)$; the weighted averages are shown in Equation 51. The average amplitudes can be calculated at any time $t_x$ by Equation 55. The proposed algorithm has a maximum of $O(\log_2 N)$ rounds.
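A rough tally of iterations per round, assuming floor rounding of Equation 40 and of $\frac{\pi}{4}\sqrt{N}$, one successful preparation per round (restarts after measuring $|0\rangle$ on the extra qubit are ignored), and a search in every round up to $t = n$. This is an illustrative worst-case count, not a complexity result from the paper; `query_counts` is a hypothetical name.

```python
import math


def query_counts(n):
    """(t, preparation iterations q, search iterations q_a) for each round t.

    Assumes floor rounding, no restarts of the preparation stage, and that
    every round up to t = n is executed (the worst case).
    """
    N = 2 ** n
    q_a = math.floor(math.pi * math.sqrt(N) / 4)
    counts = []
    for t in range(1, n + 1):
        q = math.floor(math.pi / (2 * math.sqrt(2)) * math.sqrt(N / math.comb(n, t)))
        counts.append((t, q, q_a))
    return counts


r = query_counts(4)
total = sum(q + q_a for _, q, q_a in r)
print(r, total)
```

For $n = 4$ it gives four rounds ($\log_2 N$) with 2, 1, 2 and 4 preparation iterations and 3 search iterations each, 21 iterations in total.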