An Attack on Hollow CAPTCHA Using Accurate Filling and Nonredundant Merging

ABSTRACT As one of the effective access control mechanisms, CAPTCHA can provide privacy protection and multimedia security for big data. In this paper, an attack on hollow CAPTCHA using accurate filling and nonredundant merging is proposed. First, a thinning operation is introduced to repair character contours precisely. Second, an inner-outer contour filling algorithm is presented to obtain solid characters; it fills only the hollow character components, not the noise blocks. Third, segmenting the solid characters produces mostly individual characters and only a small number of character components. Fourth, a minimum-nearest neighbor merging algorithm is proposed to obtain individual characters without redundancy. Last, a convolutional neural network (CNN) is applied to obtain the final recognition results. Experimental results on real CAPTCHA data sets show that, compared with the existing attack methods on hollow CAPTCHA, the proposed method has a higher success rate and superior efficiency.


INTRODUCTION
With the proliferation of networks, huge amounts of data are generated every day. How to protect the security and privacy of big data has become a key issue in this field. As one of the effective access control mechanisms, CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart [1]) is widely applied on many large websites, including Google and Yandex. CAPTCHA now plays an increasingly important role in privacy protection and multimedia security for big data.
At present, the frequently used CAPTCHAs include text-based, image-based, short-message-service-based, and audio-based CAPTCHAs. Among them, the most widely used is the text-based scheme [2], an image containing several distorted characters. Typical types of text-based CAPTCHA are solid CAPTCHA, hollow CAPTCHA, and 3D CAPTCHA.
To verify the security and reliability of CAPTCHA, corresponding attack technologies have been developed. Research on CAPTCHA attacks always yields a win-win result [2]. If a CAPTCHA is hard to break, its features can serve as an important theoretical reference for designing high-security CAPTCHA systems. If a CAPTCHA is successfully broken, the relevant techniques can be extended to other related fields, such as document recognition [3], speech recognition [4] and network localization [5]. Research on CAPTCHA attack techniques is therefore of great significance.
In this paper, we propose an attack on hollow CAPTCHA using accurate filling and nonredundant merging (AF&NM). The main contributions of this paper are as follows.
(a) A new framework for attacking hollow CAPTCHAs is presented. The framework integrates five steps: preprocessing, filling, segmentation, merging, and recognition. It improves efficiency by classifying the segmented images: if a segmented image is an individual character, it is recognized directly; otherwise, it is first merged and then recognized.
(b) A thinning operation based on iteratively removing pixels is introduced in the preprocessing stage. It repairs the character contour precisely while retaining the original structure.
(c) An inner-outer contour filling algorithm based on 8-neighborhood pixel detection is proposed in the filling stage. It accurately distinguishes character blocks from noise blocks and quickly converts hollow characters into solid characters.
(d) A minimum-nearest neighbor merging algorithm for character components is presented in the merging stage. Using structural features and connection order, the algorithm combines character components into individual characters without redundancy. It improves the attack efficiency by reducing the number of candidate characters.

RELATED WORK
There are two known ways of text-based CAPTCHA attack: whole-based strategy and segmentation-based strategy.
In the whole-based strategy, the character image of a CAPTCHA is recognized without segmentation. Several attack methods of this kind have been proposed. For example, Mori and Malik [6] used the shape context matching technique to break the Gimpy scheme with a success rate of 33%. Bursztein et al. [7] showed that 13 kinds of CAPTCHAs on popular websites were vulnerable to automated attacks with Spatial Displacement of the Neural Network, and they proposed a universal method for global recognition of overlapping and adherent CAPTCHAs.
In the segmentation-based strategy, the boundaries of the characters are determined first, and the CAPTCHA image is then segmented into individual character images, which are finally recognized. The key to this strategy is whether CAPTCHA images can be segmented correctly. Starostenko et al. [8] defined a three-color bar code based on the number of vertical pixels and broke the reCAPTCHA scheme with a success rate of 54.6%. Ahmad et al. [9] segmented characters by thoroughly analyzing distinctive shape patterns, and achieved 46.75% success on the Google scheme and 33% success on the reCAPTCHA scheme. Moy et al. [10] developed distortion estimation techniques to attack EZ-Gimpy of CMU with a success rate of 99% and four-letter Gimpy-r with a success rate of 78%. Nachar et al. [11] used edge corners and a fuzzy logic segmentation/recognition technique to break the CAPTCHAs of eBay, Wikipedia, reCAPTCHA, and Yahoo!, with success rates of 68.2%, 76.7%, 62.5% and 57.3% respectively.
The above attack methods are used on solid CAPTCHAs and not aimed at hollow CAPTCHAs directly. In 2013, Gao et al. [12] first analyzed hollow CAPTCHAs' robustness and proposed a general method to break a variety of hollow CAPTCHAs, with the success rates ranging from 36% to 89% and the average time per challenge ranging from 1.23s to 5.30s. In addition, Gao et al. [13] effectively broke both hollow and solid CAPTCHAs based on 2D Log-Gabor filters and achieved a success rate ranging from 5% to 77% and the average time per challenge ranging from 2.81s to 28.56s.
Despite their advantages, Gao's two methods leave some room for improvement.
(a) Recognition accuracy can be improved. Both methods follow a "jigsaw puzzle" [12] approach: all character blocks that are likely to be grouped together are taken as candidate characters. Because the number of candidate characters is large, the proportion of correct characters is small, which naturally reduces the recognition accuracy.
(b) The attack time cost can be decreased. In [12], the breakpoint location algorithm used in repairing contours needs to traverse all characters, which leads to high time cost and much redundancy. In [13], hollow characters are segmented into a large number of segments, which produces a huge set of possible combinations during the recognition stage and increases the time and difficulty of the attack.

PROPOSED AF&NM METHOD
To solve the above problems, this paper proposes an attack on hollow CAPTCHA using accurate filling and nonredundant merging. By observing and analyzing the features of hollow CAPTCHAs, we propose a principal framework (Figure 1) and several algorithms that improve the accuracy and reduce the complexity, mainly the 8-neighborhood detection outer contour algorithm, the inner-outer contour filling algorithm, and the minimum-nearest neighbor merging algorithm.

Principal Framework and Main Steps
The principal framework of AF&NM method mainly includes five steps: preprocessing, filling, segmentation, merging and recognition.
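(a) Preprocessing: Convert the images into binary ones, then thin and repair the character contours.
(b) Filling: Fill the closed hollow character components to obtain solid characters, leaving noise blocks unfilled.
(c) Segmentation: Segment the solid characters into individual characters or character components according to the labels of the connected components.
(d) Merging: Merge character components into individual characters with the minimum-nearest neighbor merging algorithm.
(e) Recognition: Recognize the individual characters with a CNN.
The principles and the algorithms involved in our method are described in detail in the following subsections.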

Preprocessing
The purpose of preprocessing is to highlight the information related to characters in the given image and to weaken or eliminate interfering information. In our method, the preprocessing includes image binarization, image thinning and repairing contour. We take two CAPTCHA images in Figure 2 as examples to illustrate the whole process.
First, image binarization highlights the contours of the targets of interest and removes noise in the background. Using Otsu's algorithm [14], we obtain binary images, i.e., black-and-white images (see Figure 3).
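As an illustration of the binarization step, the following is a minimal pure-Python sketch of Otsu thresholding. It assumes an 8-bit grayscale image given as a list of rows; the function names `otsu_threshold` and `binarize` are hypothetical, and the convention that bright pixels map to 1 (white) is an assumption of this sketch rather than a detail stated in the paper.

```python
def otsu_threshold(gray):
    """Return the grey level t (0-255) that maximises the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of the grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, the variance is undefined
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray):
    """Map pixels above the Otsu threshold to 1 (white), the rest to 0 (black)."""
    t = otsu_threshold(gray)
    return [[1 if value > t else 0 for value in row] for row in gray]
```

With this convention the dark character contours come out as 0 and the light background as 1, which matches the labeling convention used in the filling stage.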
Image thinning reduces the character contours to their skeleton without changing the adhesion between characters (Figure 4). This step has two main effects. First, the contours in a CAPTCHA image, which have irregular and uneven thickness, are converted into single-pixel-width ones, which reduces redundant points. Second, compared with unprocessed characters, the thinned characters have clearer contours without shadows, so the subsequent processing is simpler.
The method in [15] is fast and effective, so we use it to thin CAPTCHA images. After thinning, we obtain hollow characters with one-pixel-thick contours. We also find that some contours are broken.
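To make the idea of iterative pixel removal concrete, here is a sketch of the well-known Zhang-Suen thinning scheme. This is a swapped-in illustration: the text does not say which algorithm [15] describes, so this should not be read as the authors' exact method. The sketch assumes ink pixels are 1, background pixels are 0, and the image has a one-pixel background margin; the name `zhang_suen_thin` is hypothetical.

```python
def zhang_suen_thin(image):
    """Thin a binary image (1 = ink, 0 = background) to a one-pixel-wide skeleton."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):                 # two sub-iterations per pass
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)              # number of ink neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            # Pixels are removed only after the whole sub-iteration is scanned.
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img
```

Collecting the pixels first and clearing them afterwards keeps each sub-iteration order-independent, which is what preserves the connectivity of the contour.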
To obtain closed hollow characters, we use the endpoint detection algorithm in [16] to mark the endpoints of each line. We then examine the endpoints and draw a one-pixel-thick line to connect each neighboring pair.
After the above steps, we obtain preprocessed CAPTCHAs, which are binary images containing several closed hollow characters with single-pixel-width contours (see Figure 5).
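The contour repair can be sketched as follows. The details of the endpoint detection in [16] are not given in the text, so this sketch makes several assumptions: an endpoint is a skeleton pixel (value 1) with exactly one 8-neighbor, endpoint pairs are connected greedily from the closest pair upward, and only pairs at most `max_gap` pixels apart are joined. All function names and the `max_gap` parameter are hypothetical.

```python
def find_endpoints(skel):
    """Return (y, x) of skeleton pixels (value 1) with exactly one 8-neighbour."""
    h, w = len(skel), len(skel[0])
    endpoints = []
    for y in range(h):
        for x in range(w):
            if skel[y][x] != 1:
                continue
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w:
                        count += skel[y + dy][x + dx]
            if count == 1:
                endpoints.append((y, x))
    return endpoints


def draw_line(skel, p, q):
    """Draw a one-pixel-thick line from p to q (Bresenham's algorithm)."""
    (y0, x0), (y1, x1) = p, q
    dy, dx = abs(y1 - y0), abs(x1 - x0)
    sy = 1 if y0 < y1 else -1
    sx = 1 if x0 < x1 else -1
    err = dx - dy
    while True:
        skel[y0][x0] = 1
        if (y0, x0) == (y1, x1):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy


def repair_contour(skel, max_gap=3):
    """Join endpoint pairs no more than max_gap pixels apart, closest pairs first."""
    out = [row[:] for row in skel]
    ends = find_endpoints(out)
    pairs = sorted(
        (max(abs(p[0] - q[0]), abs(p[1] - q[1])), p, q)
        for i, p in enumerate(ends) for q in ends[i + 1:]
    )
    used = set()
    for gap, p, q in pairs:
        if gap > max_gap:
            break
        if p in used or q in used:
            continue  # each endpoint is connected at most once
        draw_line(out, p, q)
        used.update((p, q))
    return out
```

The greedy pairing works when gaps are short compared with the strokes themselves; a real attack would need a more careful pairing rule for crowded images.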

Filling
For the thinned images, we fill the closed hollow characters obtained in Section 3.2 to turn them into solid ones. The main purpose of filling is to enrich the character information and to prepare for segmentation. The key in this phase is to fill the character components accurately while leaving noncharacter components unfilled. The filling method consists of labeling connected regions, removing noise blocks, and filling hollow characters.

Labeling Connected Regions
For closed hollow characters, we use Haralick's run algorithm [17] to label the connected components with distinct numbers (see Figure 6). This generates a label matrix of the same size as the original image matrix. In this matrix, all black pixels are labeled 0, and each white pixel is labeled with the ordinal of its connected component, such as 1, 2, 3, etc.
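The label matrix described above can be produced by any connected-component labeling routine. The sketch below uses a plain breadth-first flood fill instead of Haralick's run-based algorithm [17], purely for brevity. It assumes white pixels are 1 and black contour pixels are 0, and it assumes 4-connectivity for white regions so that a region cannot leak through a diagonal step of a one-pixel-thick contour; the paper does not state the connectivity. The name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(binary):
    """Label 4-connected white regions 1, 2, 3, ...; black contour pixels stay 0."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label
```

Because the scan starts at the top-left corner, the background receives label 1 whenever that corner pixel is white, which is the convention the contour definitions below rely on.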

Removing Noise Blocks and Filling Hollow Character
In this paper, a noise block is a noncharacter connected component. As Figure 6(a) shows, there are two types of noise blocks. One is a connected component inside a character, such as No. 3, 4, 6 and 11. The other is a connected component between adherent characters, such as No. 8 and 9. They share a common feature: both are enclosed by character components. As can be seen from Figure 6, the contour of a character component is adjacent to the background area, whereas the contour of a noise block, being surrounded by character components, is not. We can therefore distinguish character components from noise blocks by this feature.
Let the contour $L$ be the set of pixels that are labeled 0.
According to whether it is adjacent to the background area, the contour $L$ is divided into two parts: the outer contour and the inner contour. The set of contour pixels adjacent to the background area is called the outer contour $L_{out}$; that is, the neighborhood of each outer-contour pixel contains pixels whose connected-component label is 1.
The rest of the contour is called the inner contour $L_{in}$. Inner-contour pixels are not adjacent to the background area; that is, no pixel in the neighborhood of an inner-contour pixel has the connected-component label 1.
To obtain the connected components enclosed by the inner contour and to find the noise blocks accurately, we propose an 8-neighborhood detection outer contour algorithm (Algorithm 1) that removes the outer contour; the character connected components are then filled. The main steps are as follows.
(a) Copy the label matrix to a new matrix.
(b) Examine each contour pixel in the copy of the label matrix. If there is a background pixel among its eight neighbors, it is an outer-contour pixel, and its label is changed to the background label: $p_{x,y} = 1$. Once all outer-contour pixels have been changed to background, only the inner contour remains.
(c) Label the connected areas enclosed by the inner contour, namely the noise blocks (see Figure 7).
(d) In the original label matrix, change the labels of the noise blocks to 1 and renumber the character connected components, as shown in Figure 8.
(e) Fill the character connected components, so that the hollow characters become solid ones (see Figure 9).
The complete inner-outer contour filling algorithm is summarized in Algorithm 2.
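The filling procedure can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' Algorithm 2. It assumes a label matrix in which contour pixels are 0 and the background is 1, and it identifies a noise block as a region that never touches an outer-contour pixel, which is equivalent to being enclosed by inner contour only. Treating contour pixels as background in the output is a further assumption of this sketch. The name `fill_characters` is hypothetical.

```python
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def fill_characters(labels, background=1):
    """Return (solid, noise): a binary image with filled characters (0 = ink)
    and the set of region labels classified as noise blocks."""
    h, w = len(labels), len(labels[0])

    def around(y, x):
        for dy, dx in NEIGHBOURS_8:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield labels[ny][nx]

    # Outer contour: contour pixels (label 0) with a background 8-neighbour.
    # The test always reads the original labels, so removals cannot cascade.
    outer = {(y, x) for y in range(h) for x in range(w)
             if labels[y][x] == 0 and background in around(y, x)}

    # Character components touch the outer contour; noise blocks never do.
    regions = {v for row in labels for v in row if v not in (0, background)}
    touching = set()
    for y, x in outer:
        for v in around(y, x):
            if v in regions:
                touching.add(v)
    noise = regions - touching

    # Fill only the character components; everything else becomes background.
    solid = [[0 if labels[y][x] in touching else 1 for x in range(w)]
             for y in range(h)]
    return solid, noise
```

Only the pixels of character components are written as ink, so the filled area stays proportional to the stroke area rather than to the whole image.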

Segmentation
For the solid character images obtained in the above subsections, we segment them into individual characters or character components.
Because each connected component corresponds to a character component, our attack uses the different labels of the connected components to segment the characters.
For each character connected component, first create an all-1 matrix of the same size as the original CAPTCHA image matrix. Then obtain the position coordinates of each pixel of the current character component, and set the pixels at the same coordinates in the all-1 matrix to 0 (see Figure 10). If the number of character components obtained by segmenting a CAPTCHA image equals the number of individual characters contained in the CAPTCHA, no merging operation is needed, because the image has been segmented into individual characters (as in Figure 10(a)). Otherwise, a merging operation is needed, because the image may have been segmented into both individual characters and character components (as in Figure 10(b)).
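The segmentation step above maps directly onto a few lines of code. The sketch below assumes a label matrix in which 0 marks contour pixels, 1 marks the background, and every other value marks one character component. Ordering the output from left to right by each component's leftmost column is an assumption made here so that the neighbors used in the merging stage are well defined. The names `segment_components` and `needs_merging` are hypothetical.

```python
def segment_components(labels, background=1):
    """Return one all-1 image per character component, with that component's
    pixels set to 0, ordered by the leftmost column of each component."""
    h, w = len(labels), len(labels[0])
    images, leftmost = {}, {}
    for y in range(h):
        for x in range(w):
            v = labels[y][x]
            if v == 0 or v == background:
                continue
            if v not in images:
                images[v] = [[1] * w for _ in range(h)]  # all-1 matrix
                leftmost[v] = x
            images[v][y][x] = 0
            leftmost[v] = min(leftmost[v], x)
    return [images[v] for v in sorted(images, key=lambda v: leftmost[v])]


def needs_merging(components, s0):
    """Merging is skipped only when the component count equals the character count."""
    return len(components) != s0
```

For a four-character CAPTCHA, `needs_merging(components, 4)` being false means every segmented image can go straight to recognition.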

Merging
For the character components obtained in Section 3.4, we need to merge them into individual characters before recognition. The character components have two features:
(a) Their number is small and variable. A CAPTCHA image is segmented into 5 to 7 character component images.
(b) They are mutually related. Character components are created from an individual character, so they are necessarily linked in structure and order.
Based on these features, we propose a minimum-nearest neighbor merging algorithm:
(a) Because the number of components is small, and to avoid merging errors and confusion, only two components are merged at a time. The number of mergers is therefore

$$k = S_1 - S_0,$$

where $k$ is the number of mergers, $S_1$ is the number of character components produced by segmenting a single image, and $S_0$ is the actual number of characters in a single image.
(b) In general, the character component with the minimum number of pixels is the one most likely to need merging, so it is selected first:

$$C_1 = f(d_1, d_2, \ldots, d_{S_1}),$$

where $d_1, d_2, \ldots, d_{S_1}$ are the elements of the array $d$, which stores the number of pixels of each character component by ordinal, and $f(d_1, d_2, \ldots, d_{S_1})$ returns the ordinal of the component with the minimum number of pixels, which is assigned to $C_1$.
(c) When selecting the other character component to be merged, the necessary connection among character components in structure and order should be fully considered, in order to reduce the blindness of merging.
In terms of structure, the combined width of the two character components must not exceed the maximum width of an individual character.
In terms of order, the character components are segmented from an individual character, so the left and right neighbors of the $C_1$th character component are given priority as the other component to be merged. The ordinal of the other character component is therefore

$$C_2 = \begin{cases}
2, & C_1 = 1, \\
S_1 - 1, & C_1 = S_1, \\
C_1 - 1, & 1 < C_1 < S_1,\ w_r > w_t,\ w_l \le w_t, \\
C_1 + 1, & 1 < C_1 < S_1,\ w_l > w_t,\ w_r \le w_t, \\
\arg\min_{c \in \{C_1 - 1,\, C_1 + 1\}} d_c, & 1 < C_1 < S_1,\ w_l \le w_t,\ w_r \le w_t,
\end{cases}$$

where $C_2$ is the ordinal of the other character component to be merged, $w_l$ is the width of the new character component formed by the component $C_1$ and its left neighbor $C_1 - 1$, $w_r$ is the width of the new character component formed by the component $C_1$ and its right neighbor $C_1 + 1$, and $w_t$ is the maximum width of a character in the CAPTCHA image.
Finally, the proposed minimum-nearest neighbor merging algorithm is summarized in Algorithm 3.

Algorithm 3: Minimum-nearest neighbor merging
Repeat $k = S_1 - S_0$ times:
{ Find the character component with the minimum number of pixels among all character components: $C_1 = f(d_1, d_2, \ldots, d_{S_1})$;
  Find the other character component $C_2$:
  { If $C_1 = 1$, $C_2$ is the right neighbor of $C_1$: $C_2 = 2$;
    If $C_1 = S_1$, $C_2$ is the left neighbor of $C_1$: $C_2 = S_1 - 1$;
    If $1 < C_1 < S_1$, $C_2$ is the neighbor with suitable width and pixel count:
    { If $w_r > w_t$ and $w_l \le w_t$, $C_2$ is the left neighbor: $C_2 = C_1 - 1$;
      If $w_l > w_t$ and $w_r \le w_t$, $C_2$ is the right neighbor: $C_2 = C_1 + 1$;
      If $w_l \le w_t$ and $w_r \le w_t$, $C_2$ is the neighbor with fewer pixels;
    }
  }
  Merge the character components $C_1$ and $C_2$;
  Reorder the character components;
}

To help readers understand Algorithm 3, we describe the merging process in detail using an example. Take the character components in Figure 10(b), whose parameter values are shown in Table 1. In addition, let $t$ be the loop counter, let $n_{min}$ be the ordinal of $d_{min}$, the minimum element of the array $d$, and let $w_t = 30$.
When $t = 0$, the character components are segmented and sorted, and the number of pixels, the left boundary, and the right boundary of each character component are obtained. In this example, the CAPTCHA image is segmented into 7 character components and contains 4 individual characters, so 3 merging operations are needed.
When $t = 1$, the 5th component is the smallest, with $d_{min} = 13$ and $n_{min} = 5$. It lies in the middle, and $w_l = 26$ and $w_r = 26$ are both less than the maximum character width of 30. Comparing the numbers of pixels of the left and right neighbors, we find $d_4 > d_6$. Therefore, the 5th and 6th character components are merged into a new 5th character component, and its pixel count, left and right boundaries, and other related information are updated (see Figure 11). Lastly, the information of the 6th character component is deleted, and all character components are reordered.
When $t = 2$, the 6th component is the smallest, with $d_{min} = 58$ and $n_{min} = 6$. It is the last component, so it is merged with its left neighbor into the new 5th character component (see Figure 12).
When $t = 3$, the 2nd component is the smallest, with $d_{min} = 59$ and $n_{min} = 2$. It lies in the middle, with $w_l = 39$ and $w_r = 21$. Since only $w_r$ is within the maximum width of 30, the 2nd and 3rd character components are merged into a new 2nd character component (see Figure 13).
After 3 loops, the merging is complete: the character components have been merged into individual characters (see Figure 14).
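The merging procedure walked through above can be sketched in Python. The component values in the usage example are hypothetical and are not the values of Table 1; they are chosen only so that the three loops exercise the same three situations (middle component with both neighbors fitting, last component, middle component with only the right neighbor fitting). The width of a merged pair is assumed to be `right - left + 1`, ties in pixel count go to the left neighbor, and the case where both merged widths exceed $w_t$, which the text does not cover, falls back to the narrower merge. All names are hypothetical.

```python
def merged_width(a, b):
    """Width of the bounding span covering components a and b (assumed definition)."""
    return max(a["right"], b["right"]) - min(a["left"], b["left"]) + 1


def merge_components(components, s0, w_t):
    """Merge left-to-right ordered components until only s0 remain."""
    comps = [dict(c) for c in components]
    while len(comps) > max(s0, 1):
        # C1: the component with the fewest pixels.
        c1 = min(range(len(comps)), key=lambda i: comps[i]["pixels"])
        if c1 == 0:
            c2 = 1                              # first component: right neighbour
        elif c1 == len(comps) - 1:
            c2 = c1 - 1                         # last component: left neighbour
        else:
            w_l = merged_width(comps[c1 - 1], comps[c1])
            w_r = merged_width(comps[c1], comps[c1 + 1])
            if w_r > w_t and w_l <= w_t:
                c2 = c1 - 1
            elif w_l > w_t and w_r <= w_t:
                c2 = c1 + 1
            elif w_l <= w_t and w_r <= w_t:
                # Both fit: take the neighbour with fewer pixels (ties go left).
                left_smaller = comps[c1 - 1]["pixels"] <= comps[c1 + 1]["pixels"]
                c2 = c1 - 1 if left_smaller else c1 + 1
            else:
                # Neither fits: not covered by the text; assume the narrower merge.
                c2 = c1 - 1 if w_l <= w_r else c1 + 1
        lo, hi = sorted((c1, c2))
        merged = {
            "pixels": comps[lo]["pixels"] + comps[hi]["pixels"],
            "left": min(comps[lo]["left"], comps[hi]["left"]),
            "right": max(comps[lo]["right"], comps[hi]["right"]),
        }
        comps[lo:hi + 1] = [merged]             # replaces the pair, keeps the order
    return comps


# Hypothetical component data (NOT the paper's Table 1).
example = [
    {"pixels": 300, "left": 0, "right": 24},
    {"pixels": 60, "left": 26, "right": 34},
    {"pixels": 150, "left": 36, "right": 46},
    {"pixels": 280, "left": 50, "right": 70},
    {"pixels": 15, "left": 72, "right": 76},
    {"pixels": 200, "left": 78, "right": 95},
    {"pixels": 55, "left": 96, "right": 100},
]
result = merge_components(example, s0=4, w_t=30)
```

Each loop removes exactly one component, so the number of candidates handed to the recognizer always equals the number of characters.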

Recognition
After segmentation and merging, all CAPTCHA images have been segmented into individual characters, which can be recognized by machine learning methods. Like the method of [12], we use a CNN to recognize these individual characters for high accuracy.
A CNN accepts the individual character images directly as input, without feature extraction, and it is robust to a certain degree against displacement, scaling, and deformation. As a deep learning network, the CNN has been widely used in CAPTCHA recognition [7,12,13], information hiding [18,19,20,21], and other fields.

ANALYSIS OF AF&NM METHOD
Compared with the methods of [12] and [13], the proposed AF&NM method differs in several respects. In the preprocessing stage, a thinning operation is used to reduce redundant breakpoints. In the filling stage, our attack quickly removes noise blocks by means of the inner and outer contours. Next, we obtain candidate characters without redundancy by merging rather than by combination, so our attack has no search stage.

(a) High accuracy in repairing contours
In the method of [12], there are redundant breakpoints, which might produce noise blocks.
In the AF&NM method, the character contour is thinned into single-pixel-width lines. Redundant breakpoints are thus reduced, and the correct breakpoints are located more accurately.

(b) Fast denoising
In [12], all components of nonblack pixels are filled, and the black contour is removed by a dilation operation. If an $m \times m$ structuring element processes an $n \times n$ image, the time complexity is $O(m^2 n^2)$.
In contrast, we first mark the noise blocks and then fill only the character strokes, which avoids useless work. Table 3 shows that the filling area and processing time of the method of [12] are 6481 pixels and 63.85 ms, while those of our method are 777 pixels and 4.02 ms. The filling area is thus about 1/8 of that of the method of [12], and the processing time about 1/16.
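For reference, the two ratios follow directly from the reported figures; the worked arithmetic below uses only the numbers quoted from Table 3.

```latex
\[
\frac{777}{6481} \approx 0.120 \approx \frac{1}{8.3},
\qquad
\frac{4.02\ \text{ms}}{63.85\ \text{ms}} \approx 0.063 \approx \frac{1}{15.9}.
\]
```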

(c) No redundancy in merging stage
A large number of characters to be recognized are produced by the combination method of [12], and the analysis of algorithm is as follows.
Assume that $S_1$ is the number of character components produced by segmenting a single image, $m$ is the number of components combined into one candidate character, $j$ is the ordinal of the starting component of a candidate character, $num(j)$ is the number of candidate characters whose starting component is $j$, and $sum$ is the total number of candidate characters formed by combining components. In addition, the actual number of characters in a single image is $S_0$, and, because of the width limit on a candidate character, the maximum number of components per candidate character is $m' = S_1/(S_0 - 1)$. The combination methods of [12] and [13] can then be analyzed as follows.
When $j = 1$, the first component is the start of the combination. Because of the limit $m'$, the number of candidate characters produced by combining $m$ components is $n_m$ ($m = 1, 2, \ldots, m'$), so the total number of candidate characters that start from the 1st component is the sum of these numbers:

$$num(1) = \sum_{m=1}^{m'} n_m .$$

The same holds up to $j = S_1 - m' + 1$: when the combination starts from the $(S_1 - m' + 1)$th component, the total number of candidate characters is

$$num(j) = \sum_{m=1}^{m'} C_{S_1 - j}^{m-1} .$$

When $j = S_1 - m' + 2$, fewer than $m'$ components remain, so when the combination starts from the $(S_1 - m' + 2)$th component or later, the total number of candidate characters is

$$num(j) = \sum_{m=1}^{S_1 - j + 1} C_{S_1 - j}^{m-1} .$$

Summing over the above two cases gives the number of combined candidate characters in a single image:

$$sum = \sum_{j=1}^{S_1 - m' + 1} \sum_{m=1}^{m'} C_{S_1 - j}^{m-1} + \sum_{j=S_1 - m' + 2}^{S_1} \sum_{m=1}^{S_1 - j + 1} C_{S_1 - j}^{m-1} .$$

Therefore, $sum$ is the number of candidate characters for a single image, and it exceeds the actual number of characters $S_0$.
In the AF&NM method, no redundant characters to be recognized are generated. Let $n_c$ be the number of images, $n_0$ the number of images that need merging, and $m_i$ the number of components in the $i$th such image. The total number of merging operations is then

$$\sum_{i=1}^{n_0} (m_i - S_0).$$

If $n_0 = 0$, all images are segmented directly into individual characters without merging, and the number of merging operations reaches its minimum, 0. If $n_0 = n_c$ and every image is segmented into the maximum number $m_1$ of character components, the number of merging operations reaches its maximum, $(m_1 - S_0) \times n_c$. Moreover, however many merging operations are performed, the number of characters to be recognized is always $S_0 \times n_c$, which equals the actual number of individual characters, so there is no redundancy.
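As a small numeric illustration of this count, the sketch below reads the total number of merging operations as the sum of $m_i - S_0$ over the images that were split into more than $S_0$ components. The function names and the component counts in the usage line are hypothetical and are not measured data.

```python
def total_merges(component_counts, s0):
    """Total merging operations: sum of (m_i - S0) over images needing merging."""
    return sum(m - s0 for m in component_counts if m > s0)


def characters_to_recognize(component_counts, s0):
    """After merging, every image contributes exactly S0 characters."""
    return s0 * len(component_counts)


# Hypothetical batch of four 4-character CAPTCHAs split into 4, 7, 5 and 4 pieces.
counts = [4, 7, 5, 4]
merges = total_merges(counts, s0=4)
to_recognize = characters_to_recognize(counts, s0=4)
```

The number of characters passed to the recognizer depends only on the number of images, not on how badly each image was fragmented.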

EXPERIMENTAL RESULTS
To verify the performance of the proposed AF&NM method, we performed experiments on real CAPTCHA data sets. We tested the success rate and the attack speed of the AF&NM method, and carried out contrast experiments with the methods of [12] and [13].

Experimental Environment and Settings
We implemented the experiments in MATLAB 2015a and ran them on a desktop computer with a 3.10 GHz Intel Core i5-2400 CPU, 4 GB RAM, and Windows 10 Professional x64.
We implemented our attack and tested it on two hollow CAPTCHA schemes: Tencent and BotDetect. Tencent is one of the largest Internet platforms in China. BotDetect was the world's first commercially available CAPTCHA component in 2004. These two representative CAPTCHA schemes, which are widely applicable and professional, are therefore used to test the performance of our attack method.
For each scheme, we collected from the corresponding website 800 random CAPTCHA images as a training set, and another 200 as a test set. The CAPTCHA samples possess the following features (see Figure 15).
• Every CAPTCHA image consists of 4 capital hollow letters with rotation;
• CAPTCHA images have a light background without interference lines or noise blocks.
We ran the AF&NM method and the methods of [12] and [13] on the Tencent scheme and the BotDetect scheme. In [22], Gao et al. pointed out that "We have converted hollow characters into solid characters in advance." In this way, the method of [13] can be made more specific to attacking hollow CAPTCHAs.
We implemented a CNN as our recognition engine. Its structure is adapted from the classical LeNet-5 [23] and has 6 layers in total (see Figure 16).

Experiments of Attack Success Rate
The success rate is an important indicator of the effectiveness of an attack method. In the experiments, we count the success rate for individual characters and the success rate for single images. The test set contains 200 images and 800 characters in total.
For the Tencent scheme, the success rate of each phase of the AF&NM method is shown in Table 4. Taking into account the mistakes accumulated over all stages, the success rates for individual characters of each phase of AF&NM are 99.25%, 84.00%, 98.50% and 97.25%. The methods of [12] and [13] maintain nearly equal success rates, both for individual characters and for images, and both are lower than those of the AF&NM method. By and large, compared with the methods of [12] and [13], the AF&NM method achieves a better success rate on the Tencent scheme.
Similarly, we ran our attack on the BotDetect scheme. In Table 6, the success rates for individual characters of each phase of AF&NM are 80.88%, 66.13%, 80.75% and 89.00%. Accordingly, the success rates for single images of each phase are 62.00%, 36.50%, 71.50% and 75.50%. As shown in Table 6, the AF&NM method achieved an 89.00% success rate for individual characters and a 75.50% success rate for single images. The method of [12] achieved a 67.88% success rate for individual characters and a 64.00% success rate for single images. The method of [13] achieved a 57.75% success rate for individual characters and a 53.00% success rate for single images. The success rate of our attack is again higher than those of the methods of [12] and [13].
From Tables 5 and 7, we can also see that the success rate on the Tencent scheme is higher than that on the BotDetect scheme. Compared with Tencent CAPTCHA images, BotDetect CAPTCHA images have more characters with adhesion and overlap, which affects the success rate: overlapping characters produce a large number of character components, which makes the subsequent combination more difficult.
Bursztein et al. [7] suggested that a CAPTCHA scheme is broken when the attacker achieves an accuracy rate of at least 1%. By this criterion, both hollow schemes are successfully broken by our attack. Thus, the AF&NM method is applicable to hollow CAPTCHAs with adherent characters and has some effect on hollow CAPTCHAs with overlapping characters. The experimental data show that our method is superior to the other two methods in success rate.

Experiments of Attack Speed
The attack speed is an important indicator for measuring an attack method. A shorter attack time indicates that a CAPTCHA scheme is less secure. In addition, faster attacks reduce the time spent in large-scale CAPTCHA attack tests.
For the Tencent scheme, we measured the attack time of each stage and the proportion of the total time that each stage takes.
To evaluate the attack speed of the AF&NM method, contrast experiments were conducted with the methods of [12] and [13]. The time cost of the three methods in attacking Tencent and BotDetect CAPTCHAs is shown in Figures 17 and 18. For the Tencent scheme, the attack times of [12] and [13] are 139.29 ms and 134.63 ms respectively, while ours is 48.25 ms. For the BotDetect scheme, the attack times of [12] and [13] are 1032.93 ms and 979.94 ms respectively, while ours is 177.54 ms.
According to the contrast results, the average attack times of [12] and [13] are very close to each other, and both methods are far slower than our attack. We can also see that the attack on the Tencent scheme is faster than that on the BotDetect scheme, because the latter has more character components. Overall, the AF&NM method is superior to the methods of [12] and [13] in terms of attack speed.
For the Tencent scheme, we compare the attack speed of the method of [12] and the AF&NM method in detail. From Figure 17, the time cost of the filling and segmentation stages of the AF&NM method is 6.56 ms per image, while that of the method in [12] is 63.85 ms, about 10 times as much. First, filling all of the connected components in [12] takes 4 times as long as filling only the character component regions in the AF&NM method. Moreover, the method of [12] performs the dilation algorithm on each stroke to remove the black contour, which is very time-consuming. In addition, the combination time in [12] is nearly 5 times the merging time in the AF&NM method, mainly because combining produces a large number of redundant characters in [12].
As Table 9 shows, the preprocessing stage of the AF&NM method takes a little more time because of the additional thinning operation. However, this extra preparation pays off in the later stages. Over all stages, the proposed method takes approximately 65.35% less time than the method of [12]. In the filling & segmentation stages and the combination/merging stage in particular, the attack time is reduced by approximately 89.73% and 80.88% respectively. This also validates the optimization that thinning brings to the whole attack.
In short, the experimental results show that, compared with the methods of [12] and [13], the proposed AF&NM method increases the attack accuracy for images by approximately 8% to 22.5% and decreases the attack time by approximately 2/3 to 4/5 on average.
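The time savings can be checked against the per-image attack times reported above. The sketch below computes the relative saving as one minus the ratio of the AF&NM time to the baseline time; the helper name `time_saving` and the dictionary layout are hypothetical, while the millisecond values are the ones quoted in the text.

```python
def time_saving(baseline_ms, ours_ms):
    """Fraction of the baseline time that is saved."""
    return 1.0 - ours_ms / baseline_ms


# Attack times in milliseconds, as reported in the text.
reported = {
    "Tencent": {"[12]": 139.29, "[13]": 134.63, "AF&NM": 48.25},
    "BotDetect": {"[12]": 1032.93, "[13]": 979.94, "AF&NM": 177.54},
}

savings = {
    (scheme, ref): time_saving(times[ref], times["AF&NM"])
    for scheme, times in reported.items()
    for ref in ("[12]", "[13]")
}

for (scheme, ref), value in sorted(savings.items()):
    print(f"{scheme} vs {ref}: {value:.1%} less time")

# Filling & segmentation stages on Tencent: 63.85 ms in [12] vs 6.56 ms here.
filling_saving = time_saving(63.85, 6.56)
```

The savings range from roughly two thirds on the Tencent scheme to a little over four fifths on the BotDetect scheme, in line with the summary above.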

CONCLUSIONS
To improve the attack accuracy and reduce the attack time, we presented a simple and novel framework for attacking hollow CAPTCHAs in this paper. Unlike the existing frameworks, our framework classifies the segmentation results into character components and individual characters. In addition, drawing on the state-of-the-art attacks, we proposed an attack on hollow CAPTCHA using accurate filling and nonredundant merging. The experimental results show that, compared with existing typical attack methods on hollow CAPTCHAs, the proposed method has a higher success rate and superior efficiency. In future work, we will study seriously distorted CAPTCHA characters.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.