Design and evaluation of features and classifiers for OLED panel defect recognition in machine vision

ABSTRACT With the rapid growth of organic light-emitting diode (OLED) display devices, the industrial manufacturing of OLED panels is currently an expanding global reality. Regarding quality control, automatic defect detection and classification are undoubtedly indispensable. Although defect detection systems have been widely considered in the literature, classification systems have not received appropriate attention. This study proposes the design of an efficient and high-performance system for defect classification by combining well-known machine-learning algorithms: support vector machine, random forest (RF), and k-nearest neighbours. To begin, possible features are designed and feature selection using principal component analysis and RF is investigated to automatically select the most effective features. Then, a hierarchical structure of classifiers is proposed for efficiently adjusting the rates of true defect and fake defect classification. The proposed system is evaluated over a database of 3502 images captured from real OLED display devices in different illumination conditions. The defects in the database are divided into 10 classes corresponding to the types of true defect and fake defect. The experiments confirm that the proposed system can achieve an accuracy of up to 94.0% for the binary classification of true defect and fake defect and an overall recognition rate of 86.3% for the 10 sub-classes.


Introduction
The organic light-emitting diode (OLED) is emerging as the advanced display technology for televisions, smartphones, projection displays, and micro-displays for several reasons (Uttwani et al., 2012): rapid response times (on the order of microseconds), a wide viewing angle of almost 180°, a wide colour gamut, and low power consumption, an important characteristic for portable devices (Bardsley, 2004; Gu & Forrest, 1998; Kanno, Hamada, & Takahashi, 2004; Shinar & Savvateev, 2004; So, 2009). With rapidly rising demand, OLED panels are now manufactured in large batches by automated processes. Because panel defects are inevitable in such complex manufacturing processes, controlling product quality is essential. Defects can be caused by several factors, including dust and foreign matter in the assembly process, short and open circuits in the scanning and signal electrodes, poor driver integrated circuit contact, substrate ruptures, and scratches (Wang, Gao, Jian, Cen, & Chen, 2012). Defects on an OLED panel can cause not only visual failure but also electrical failure and operational malfunction (Lu & Tsai, 2005). OLED defects are generally minute and blend into the background texture, which makes manual inspection impractical. Figure 1 illustrates the overall structure of an OLED display, which consists of several layers (Shinar & Savvateev, 2004). Defects can appear in the outer cover glass layer or in the internal layers.
With the development of image processing techniques, machine vision is playing an important role in improving flexibility and the level of automation in the OLED panel manufacturing industry. The three principal steps in defect control are defect detection, defect classification, and removal of the defect sources. Beyond detection, defect classification is critical for revealing the source and severity of the defects.
Over the last decades, researchers have addressed this subject in two directions: defect detection (or inspection) and defect classification. Most previous studies have focused on the first direction, detection. Kim, Kwak, Song, Choi, and Park (2004) proposed a multi-level adaptive thresholding method on local image blocks to separate defects from the background; this method was developed specifically to inspect spot-type defects. Kim, Kang, and Jeong (2008) processed the OLED images in the frequency domain, removing the plain background with a high-pass filter and identifying the remaining high-frequency components as defects. Chen and Chou (2008) modelled mura defects as low-contrast luminance variations without a clear contour on a uniformly produced surface and proposed two detection methods based on the discrete cosine transform and the discrete wavelet transform. Lu and Tsai (2005) detected several defect types, such as pinholes, scratches, and particles or fingerprints, using singular value decomposition: treating the background texture of the panel as a periodic, repetitive pattern, they selected appropriate singular values to reconstruct the background and then extracted the defect anomalies by subtraction. Independent component analysis has also been proposed (Lu & Tsai, 2008; Wang et al., 2012), following a methodology similar to that of Lu and Tsai (2005), in which the background texture is reconstructed from the independent components extracted from defect-free images. Gao, LiuChuanxia, and Chen (2012) presented a recursive adaptive Otsu scheme to calculate the threshold for segmenting defects from the background. Further research on detection can be found in publications and patents (Kaltenbach, Guiguizian, & Stephenson, 2005; Lim, Seo, & Jeong, 2005; Perng, Chen, & Lee, 2005; Trujillo et al., 2009).
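Gao et al. build on Otsu's method. As a point of reference, the plain, non-recursive Otsu threshold can be sketched as follows. This is the classic global method, not the recursive adaptive variant of Gao et al., whose details are not given here; the function name and the histogram-list input are illustrative assumptions.

```python
def otsu_threshold(hist):
    """Return the grey level t that maximizes Otsu's between-class variance.

    Pixels with grey level <= t form the lower class, the rest the upper class.
    `hist` is a list of pixel counts per grey level (length L).
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0        # pixel count of the lower class
    sum0 = 0.0    # intensity sum of the lower class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this threshold is not meaningful
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the first maximum
            best_t, best_var = t, var_between
    return best_t
```

For a bimodal histogram with 50 pixels at grey level 10 and 50 at grey level 200, every threshold in the gap gives the same between-class variance, and the function returns the first one, 10.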
Conversely, little research on defect type recognition has been reported in the machine vision literature. Kang, Lee, Song, and Pahk (2009) proposed a framework for classifying four defect types by introducing 24 image features that represent the shape and texture of the defect regions and training a support vector machine (SVM) classifier on them. The paper reported a classification rate of 86.409%. However, the method considers only four defect classes, whereas real-world applications generally involve a greater number of defect types. Similarly, Huang and Lu (2013) extracted shape, histogram, and colour features and proposed a local binary pattern-equality feature computed from three images (the original image, the defect mask, and the circuit zone image), which were input to a linear SVM classifier. Lim et al. (2005) proposed an over-simplified method that defines a feature based on the intensity gradient image and the relative position between the defect area and the pattern.
These previous studies either considered a simplified model of the problem or evaluated recognition performance only with an SVM classifier, which can limit their applicability in real systems. Moreover, fake defects caused by artefacts such as hair, dust, or cleanable pollution on the surface were not investigated. Such defects can be removed easily without remanufacturing the panel, so recognizing them reduces manufacturing cost. Finally, previous studies considered a small number of classes, which makes it difficult to meet real-world application requirements.
To address these deficiencies, this paper proposes a design and evaluation of features and classifiers for OLED panel defect recognition in which both fake and true defects are considered in the classifier design. A true defect is a real defect, such as a substrate rupture or a scratch, whereas a fake defect is an apparent defect produced when hair, dust, or cleanable pollution on the surface is captured in the image. Assuming that the defect masks are given, our study extracts shape and texture features, inheriting some features from previous studies and proposing additional high-level features, including Gabor filter responses and Hu's moments, to better characterize the shape and texture of the defect patterns. To control the balance between the fake and true defect classification rates efficiently, a hierarchical structure of classifiers is proposed. After feature extraction, a random forest (RF) (Breiman, Friedman, Stone, & Olshen, 1984) classifier is proposed and evaluated, because this method has shown excellent results on noisy data in several studies (Schroff, Criminisi, & Zisserman, 2008; Schulter et al., 2013; Shotton et al., 2013). The performance of RF is further compared with that of SVM (Cortes & Vapnik, 1995), the classifier commonly used in previous works, and with k-nearest neighbour (kNN) methods under several scenario configurations. An experimental evaluation is performed on a database of 3502 images with ten defect types, of which three are fake and the remaining seven are true. Finally, the importance of each variable to each layer classifier is evaluated using RF variable importance.
The remainder of this paper is organized as follows. Section II discusses and illustrates the causes and types of defects that can occur in an OLED display panel during the manufacturing process. Section III describes the overall structure of the method, analyses each module, and introduces the features and classifiers in detail. Experimental results are presented in Section IV, followed by our conclusions in Section V.

OLED panel defects
Defects in OLED displays are commonly classified into two groups: glass defects, which lie on the glass surface, and cell defects, which lie in the internal layers. Glass defects cause only visual problems, whereas cell defects are more severe: besides degrading the visual appearance, they can cause electrical failure and operational malfunction (Lu & Tsai, 2005). To capture defects in the internal layers efficiently, the OLED device is illuminated under different lighting conditions and photographed from different camera viewing angles. Figure 2 depicts the six illumination and camera setups used in the proposed system to capture the different types of defects.
This study addresses only cell defects, grouped into 10 defect categories: seven are true defects (crack, flaking, spacer broken, liquid overflow, liquid, surface scratch, and back pit), and the remaining three are fake (cleanable pollution, dust, and artefact). The description of each defect type is presented below, and defect samples with their corresponding defect masks are illustrated in Figure 3. For brevity, the defect types are coded as in Table 1.
In Figure 2, S0 uses a blue LED light source, and the camera is set up to capture the reflected component; under this setup, all types of defects are visible with different amplitudes. S1 uses a white LED source at 45° to the panel surface, with the camera positioned to capture the reflected component; this setup can also capture almost all types of defects. S2 uses a white LED source placed opposite to S0, in the backlight direction, to capture cracks and back pits. The illumination scenarios S3, S4, and

Hierarchical structure of classifiers
To efficiently control the error rate between fake and true defect recognition, a hierarchical structure of classifiers is proposed as in Figure 4.
In this structure, three classifiers are arranged in two layers. The first layer is a binary fake/true (FG) classifier that assigns each input sample to either the fake or the true defect group; the samples in each group are then passed to the fake defect classifier or the true defect classifier, respectively, which assigns the individual defect type. The main advantage of this structure is that the balance between the error rates of fake and true defect classification can be controlled efficiently by tuning the first-layer FG classifier alone. This matters in real applications. In some systems, all true defects must be recognized without error, even if many fake defects are then classified as true, because misclassifying a true defect as fake may cause severe problems. In other systems, the two error rates can be balanced to meet quality and cost targets.
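The routing logic of the two-layer structure can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: it assumes that the first-layer (FG) classifier outputs a score interpreted as the probability that a sample is a true defect, and the class name, the `threshold` parameter, and the three callables are hypothetical. Moving a decision threshold on that score is one simple way to realize the error-rate trade-off described above.

```python
from typing import Callable, Sequence

Features = Sequence[float]


class HierarchicalDefectClassifier:
    """Two-layer fake/true hierarchy with a tunable first-layer threshold.

    Hypothetical interface: `p_true` returns the FG estimate of
    P(true defect | x); `fake_clf` and `true_clf` return sub-class codes.
    """

    def __init__(self,
                 p_true: Callable[[Features], float],
                 fake_clf: Callable[[Features], str],
                 true_clf: Callable[[Features], str],
                 threshold: float = 0.5) -> None:
        self.p_true = p_true
        self.fake_clf = fake_clf
        self.true_clf = true_clf
        # Lowering the threshold sends more samples to the true branch,
        # trading more fake-to-true errors for fewer true-to-fake errors.
        self.threshold = threshold

    def predict(self, x: Features) -> str:
        if self.p_true(x) >= self.threshold:
            return self.true_clf(x)   # second layer: true defect sub-classes
        return self.fake_clf(x)       # second layer: fake defect sub-classes
```

For example, with a threshold of 0.2 instead of 0.5, a sample scored 0.3 by the FG classifier is routed to the true defect branch, while the second-layer classifiers remain unchanged.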

Shape and texture features
The features used in our study are defined from a greyscale image and a binary mask image, which are used to extract texture and shape information, respectively, as summarized in Table 2. The texture features are computed from the region of interest (ROI) defined by the mask and describe the properties of the ROI through (1) the mean pixel intensity T00, (2) the standard deviation of the intensity distribution T01, (3) the mode of the intensity distribution T02, (4) the entropy of intensities T03, which represents the complexity and uncertainty of the image, (5) contrast T04, (6) the skewness of the intensity histogram T05, (7) the kurtosis of the intensity histogram T06, (8) the mean of the gradient T07, and (9-13) five Gabor features T08-T12 obtained with Gabor filters at different angles. From the mask image, the shape features include (14) aspect ratio S00, (15) roundness S01, (16) number of blobs (binary large objects) S02, (17) rectangularity S03, (18) blob size ratio deviation S04, (19) compactness S05, which measures the degree to which a blob is compact, and (20-26) the seven Hu's moments S06-S12 (Hu, 1962). The definitions and descriptions of the features are grouped in Table 2, where $L$ is the number of grey levels, $h_i$ is the histogram count at grey level $i$, $p(h_i) = h_i / \sum_{j=0}^{L-1} h_j$, $R$ is the ROI with size $|R|$, $\sigma_h$ is the standard deviation of the histogram distribution, $\bar{h} = \sum_{j=0}^{L-1} h_j / L$, $\nabla I$ is the edge image of $I$, $H_B$ and $W_B$ are the height and width of the blob, respectively, $P$ is the perimeter of the blob region, $A_B$ is the area of the ROI, and $A_R$ is the area of the bounding box of the blob.
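To make the histogram-based definitions concrete, the following sketch computes the first four texture features (T00-T03) from the grey image restricted to the mask, using $p(h_i) = h_i / \sum_j h_j$ as defined above. It assumes integer grey levels in [0, L-1], a population standard deviation, entropy in bits, and ties in the mode broken toward the lowest grey level; Table 2 may differ in these details, and the function name is hypothetical.

```python
import math


def texture_features(image, mask, levels=256):
    """Sketch of T00-T03 over the ROI given by a binary mask.

    Assumptions: population standard deviation, mode ties broken toward
    the lowest grey level, entropy in bits (log base 2).
    """
    # Collect the grey values of all pixels inside the mask.
    roi = [p for row_i, row_m in zip(image, mask)
           for p, m in zip(row_i, row_m) if m]
    n = len(roi)
    hist = [0] * levels
    for p in roi:
        hist[p] += 1
    mean = sum(roi) / n                                       # T00
    std = math.sqrt(sum((p - mean) ** 2 for p in roi) / n)    # T01
    mode = max(range(levels), key=lambda i: (hist[i], -i))   # T02
    probs = [h / n for h in hist if h > 0]                    # p(h_i) > 0
    entropy = -sum(q * math.log2(q) for q in probs)           # T03
    return {"T00": mean, "T01": std, "T02": mode, "T03": entropy}
```

An ROI containing two pixels at grey level 0 and two at 255 gives a mean and standard deviation of 127.5, a mode of 0 under the tie rule, and an entropy of 1 bit.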
Note that the mask image commonly contains more than one connected blob (as seen in Figure 3); the scalar shape features are therefore averaged over the individual connected blobs in the mask image, weighted by the blob areas:

$$f = \sum_{i=1}^{n} w_i f_i,$$

where $f_i$ is the feature value computed from a single blob, $n$ is the number of blobs, and $w_i$ is the weight of the $i$th blob, defined by

$$w_i = \frac{A_i}{\sum_{j=1}^{n} A_j},$$

where $A_i$ is the area of the $i$th blob. This rule is applied to features S00, S01, S03, S05, and S06-S12, so that larger blobs receive heavier weights. The feature (18), blob size ratio deviation, is proposed to measure the size difference between the blobs in the mask image. It quantifies the deviation of the normalized blob areas $a_i = A_i / \sum_{j=1}^{N} A_j$ from their mean $\bar{a} = \sum_{i=1}^{N} a_i / N$, where $N$ is the number of blobs and $A_i$ is the area of the $i$th blob. This feature therefore measures how equal in size the blobs in the mask image are.
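The area-weighted averaging and the blob size ratio deviation can be sketched as below. Connected blobs are found by breadth-first search; 8-connectivity is an assumption, since the text does not state the connectivity, and S04 is assumed here to be the population standard deviation of the normalized areas. All function names are illustrative.

```python
import math
from collections import deque


def blob_areas(mask):
    """Areas of the connected blobs in a binary mask (8-connectivity assumed)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, area = deque([(r, c)]), 0
                while queue:  # flood-fill one blob
                    y, x = queue.popleft()
                    area += 1
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                areas.append(area)
    return areas


def area_weighted_feature(values, areas):
    """f = sum_i w_i * f_i with w_i = A_i / sum_j A_j."""
    total = sum(areas)
    return sum(a / total * f for f, a in zip(values, areas))


def blob_size_ratio_deviation(areas):
    """S04 sketch: spread of the normalized areas a_i = A_i / sum_j A_j.

    Assumption: population standard deviation about the mean a_bar.
    """
    total = sum(areas)
    ratios = [a / total for a in areas]
    mean = sum(ratios) / len(ratios)
    return math.sqrt(sum((a - mean) ** 2 for a in ratios) / len(ratios))
```

The length of the list returned by `blob_areas` gives the number of blobs (S02). Two blobs of areas 3 and 1 with per-blob feature values 1.0 and 3.0 give a weighted feature of 1.5 and a blob size ratio deviation of 0.25, while equal-sized blobs give a deviation of 0.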
The Gabor features (9-13) are generated by convolving the grey image with Gabor filters at different angles. A Gabor filter is defined through several controlling parameters as

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda} + \psi\right),$$

with $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$, where $\theta$ controls the filter direction. Given a Gabor filter instance, the response to an image is computed as the convolution of the image and the filter:

$$G_F(x, y; \lambda, \theta, \psi, \sigma, \gamma) = I(x, y) * g(x, y; \lambda, \theta, \psi, \sigma, \gamma).$$
By fixing the other parameters ($\lambda = 0.55$, $\psi = \pi/2$, $\sigma = 2$, $\gamma = 1$) and varying the direction $\theta \in \{0, \pi/5, 2\pi/5, 3\pi/5, 4\pi/5\}$, a bank of five Gabor responses $G_F^{(k)}$, $k = 0, \dots, 4$, is generated. From this bank, a pattern map is computed by retrieving, pixel by pixel, the index of the maximum response:

$$M(x, y) = \arg\max_{k \in \{0, \dots, 4\}} G_F^{(k)}(x, y).$$

Finally, the Gabor features (9-13) are defined as the five bins of the pattern map histogram.
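The Gabor pattern-map features can be sketched in pure Python as follows, using the parameter values given in the text ($\lambda = 0.55$, $\psi = \pi/2$, $\sigma = 2$, $\gamma = 1$, $\theta = k\pi/5$ for $k = 0, \dots, 4$). The kernel half-width of 6 pixels (about $3\sigma$), zero padding at the image border, and counting the histogram only over mask pixels are assumptions, and all function names are hypothetical; a production system would use an optimized convolution routine.

```python
import math


def gabor_kernel(theta, lam=0.55, psi=math.pi / 2, sigma=2.0, gamma=1.0, half=6):
    """Real Gabor kernel g(x, y; lambda, theta, psi, sigma, gamma) sampled on a
    (2*half+1) x (2*half+1) grid; k[y + half][x + half] holds g(x, y).
    The half-width of 6 (about 3*sigma) is an assumption."""
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            env = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(env * math.cos(2 * math.pi * xr / lam + psi))
        k.append(row)
    return k


def convolve_same(img, k):
    """2-D convolution I * g with zero padding; output has the size of img."""
    h, w, half = len(img), len(img[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            s = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr, cc = r - dy, c - dx  # flipped offsets => true convolution
                    if 0 <= rr < h and 0 <= cc < w:
                        s += img[rr][cc] * k[dy + half][dx + half]
            out[r][c] = s
    return out


def gabor_pattern_histogram(img, mask, n_dirs=5):
    """Histogram of the argmax-direction pattern map M(x, y) (T08-T12).

    Assumption: only pixels inside the mask are counted."""
    thetas = [i * math.pi / n_dirs for i in range(n_dirs)]
    responses = [convolve_same(img, gabor_kernel(t)) for t in thetas]
    hist = [0] * n_dirs
    for r, row in enumerate(mask):
        for c, m in enumerate(row):
            if m:
                best = max(range(n_dirs), key=lambda i: responses[i][r][c])
                hist[best] += 1
    return hist
```

Because each ROI pixel votes for exactly one direction, the five bins always sum to the number of pixels in the mask.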
To capture the shape of the blobs effectively, Hu's (1962) invariant moments are included in the feature set. Hu's moments comprise six absolute orthogonal invariants and one skew orthogonal invariant derived from algebraic invariants. The moments are independent of size, position, and orientation, and are also invariant under parallel projection. They have proven effective for recognizing image patterns under such transformations and are widely applied. In our study, the seven moments are used as features S06-S12, computed from the central moments $\mu_{pq}$ and the normalized central moments $\eta_{pq}$ defined as in Hu (1962).
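Hu's seven invariants can be computed directly from a binary mask as in the sketch below, which uses the standard formulas of Hu (1962) with each mask pixel treated as a unit mass. The text does not state whether any further transformation (for example, a logarithm) is applied to the moments, so none is applied here; the function name is illustrative.

```python
def hu_moments(mask):
    """Hu's seven invariants phi_1..phi_7 of a binary mask (features S06-S12).

    The pixel at row r, column c is treated as a unit mass at (x=c, y=r).
    """
    pts = [(c, r) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    m00 = len(pts)
    xc = sum(x for x, _ in pts) / m00   # centroid
    yc = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Normalized central moment: mu_pq / mu_00 ** (1 + (p + q) / 2)
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]
```

Shifting a mask or rotating it by 90 degrees leaves all seven values unchanged up to floating-point rounding, which is the invariance property that motivates their use as shape features.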

Machine learning
In this model, each classifier, including the FG classifier, can be implemented with SVM, kNN, or RF. SVM is widely used in pattern classification problems (Mitchell, 1997). SVM is based on structural risk minimization, which minimizes an upper bound on the generalization error, whereas many other machine-learning methods are based on empirical risk minimization (Burges, 1998). It is therefore suitable for applications in which domain knowledge of the input data is lacking. SVM seeks the separation with the maximum margin between classes. The trade-off between training error and model complexity is controlled through the cost parameter: a higher cost value reduces the misclassification rate but increases model complexity, and vice versa (Burges, 1998). The kNN algorithm (Altman, 1992) classifies a sample according to its closest training examples in the feature space. kNN is categorized as a nonparametric, lazy learning method, because the function is approximated locally and all computation is deferred until classification. It is a fundamental and simple method that requires virtually no prior knowledge of the data distribution.
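As an illustration of the kNN rule described above, a minimal majority-vote classifier with Euclidean distance can be written as follows. The distance metric and the tie-breaking rule are assumptions, since the text does not specify them, and the function name is hypothetical. Note that nothing is computed at training time, which is what makes kNN a lazy learner.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance).

    Vote ties go to the label encountered first among the nearest neighbours.
    """
    # Sort by distance only, so labels are never compared with each other.
    neighbours = sorted(
        ((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)),
        key=lambda t: t[0])
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

With three training samples near the origin labelled as one class and three near (5, 5) labelled as another, a query at (0.2, 0.2) receives the first label and a query at (5.5, 5.5) the second.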
Emerging as a powerful classifier for computer vision applications, RF (Breiman et al., 1984) approximates the Bayes decision rule by greedily and implicitly minimizing a margin-based loss function, recursively reducing the uncertainty of the given training samples with independent base trees (Schulter et al., 2013). Compared with other machine-learning algorithms, the RF technique, first proposed by Breiman, is known as an efficient framework with many advantages, such as fast training and prediction, the ability to handle high-dimensional data sets, robustness to noise, and suitability for parallel implementation (Caruana, Karampatziakis, & Yessenalina, 2008).

Feature and variable selection
Feature and variable selection has attracted increasing attention in machine-learning applications, because many features are commonly added to train the predictors. Feature and variable selection offers several potential benefits, such as facilitating data visualization and understanding, reducing storage requirements, decreasing training and prediction time, and improving performance by mitigating the curse of dimensionality (Guyon & Elisseeff, 2003). In this study, feature selection is performed by the widely established principal component analysis (PCA) algorithm (Jolliffe, 2002), and variable selection is performed by RF (Genuer, Poggi, & Tuleau-Malot, 2010).

Data description
A collection of 3502 sample defect images (7004 images including masks) was acquired from actual OLED manufacturing processes for the experiments in our study. As mentioned above, the defects belong to 10 sub-classes: seven types of true defects (crack, flaking, spacer broken, liquid overflow, liquid, surface scratch, and back pit) and three fake defect types (cleanable pollution, dust, and artefact). The number of samples of each defect type is given in Table 3. An automatic defect detection method based on the Fourier transform (Tsai & Hung, 2005) was used to create the masks of the defect regions. To ensure that the evaluation of defect classification was independent of the defect detection process, the masks were examined manually, and incorrect masks were regenerated by manually tuning the algorithm parameters for the individual defect images. The data set was split at a ratio of 5:5 into training and testing sets, giving 1748 images for training and the remaining 1754 images for testing. The ground-truth class labels were provided by the manufacturer for experimental purposes.

Table 3. Data distribution in experiments.
Set            Total   G00  G01  G02  G03  G04  G05  G06  F00  F01  F02
Total samples   3502   132  105   73  371   67  594  160  313  955  732
Training        1748    66   52   36  185   33  297   80  156  477  366
Testing         1754    66   53   37  186   34  297   80  157  478  366
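The per-class counts in Table 3 are consistent with a stratified 5:5 split in which each class is halved and, for odd class sizes, the extra sample goes to the testing set. The short sketch below reproduces the training and testing totals under that assumption; how individual samples were assigned is not described in the text, and the function name is illustrative.

```python
# Per-class totals from Table 3 (G00-G06 true defects, F00-F02 fake defects).
TOTALS = {"G00": 132, "G01": 105, "G02": 73, "G03": 371, "G04": 67,
          "G05": 594, "G06": 160, "F00": 313, "F01": 955, "F02": 732}


def split_counts(totals):
    """Per-class 50/50 split; for odd sizes the extra sample goes to testing."""
    train = {c: n // 2 for c, n in totals.items()}
    test = {c: n - n // 2 for c, n in totals.items()}
    return train, test


train, test = split_counts(TOTALS)
print(sum(train.values()), sum(test.values()))  # prints: 1748 1754
```

Splitting per class rather than over the pooled data keeps each defect type, including the small classes G02 and G04, represented in both sets.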

Accuracy measurements
The classification accuracy of the system was calculated using two measures: the overall classification rate and the average classification rate. The overall classification rate indicates the probability that a new sample is correctly classified:

$$R^{(k)} = \sum_{i=1}^{N} \frac{T_i}{T} \cdot \frac{C_i}{T_i},$$

where $N$ is the number of classes, $k$ is the layer, $T$ is the total number of samples, $T_i$ is the number of samples of class $i$, and $C_i$ is the number of samples of class $i$ correctly classified by the system. The measure sums the individual accuracy rates $C_i/T_i$ of the classes, each weighted by the class proportion $T_i/T$, so a class with more samples contributes more to the overall classification rate. The second measure was designed to show the balance between the fake defect classification rate and the true defect classification rate:

$$R_{avg} = \frac{1}{2}\left(\frac{C_F}{N_F} + \frac{C_G}{N_G}\right),$$

where $C_F$ ($C_G$) is the number of fake (true) samples that are correctly classified and $N_F$ ($N_G$) is the number of fake (true) samples.
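The two measures can be sketched as follows. The overall rate follows the weighted-sum definition of the overall classification rate, which simplifies to the total number of correctly classified samples divided by $T$; the average rate is assumed to be the unweighted mean of the fake and true classification rates. The function names and the counts used below are hypothetical, not results from this study.

```python
def overall_rate(correct, totals):
    """R = sum_i (T_i / T) * (C_i / T_i), which equals sum_i C_i / T."""
    T = sum(totals)
    return sum((t / T) * (c / t) for c, t in zip(correct, totals))


def average_fake_true_rate(c_fake, n_fake, c_true, n_true):
    """Mean of the fake and true classification rates (assumed form)."""
    return 0.5 * (c_fake / n_fake + c_true / n_true)
```

With 90 of 100 samples of a large group and 5 of 10 samples of a small group classified correctly, `overall_rate([90, 5], [100, 10])` is about 0.864, while `average_fake_true_rate(90, 100, 5, 10)` is 0.7. The average rate therefore exposes poor performance on the smaller group that the overall rate hides.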

Classification accuracy
To evaluate the different combinations of classifiers thoroughly, several test scenarios were configured by varying the classifier-controlling parameters and the number of features used. The scenarios are summarized in Table 4. For each test scenario, the best performance was recorded and is presented in Table 5 together with the optimal parameter values: k for kNN(-); maximum tree depth and number of trees, respectively, for RF(-,-); and kernel type and C value, respectively, for SVM(-,-). The detailed trials for each parameter are specified in Table 5. The feature selection results were recorded with the optimal number of features, as indicated in Table 6.
The first scenario was targeted at determining the optimal number of features by fixing the classifiers of the first layer and changing the number of features used by RF and PCA. The best performance was obtained with the use of RF as the feature selector to select the 18 most important variables.
The next three scenarios evaluated the individual classifiers as candidates for the first-layer classifier, tuning their controlling parameters while fixing the feature selection method to RF with 18 features. The best performance of each individual classifier is recorded in Table 6, where RF clearly outperforms kNN and SVM. The last three scenarios evaluated the individual classifiers in the second layer, fixing the first layer to RF (15/200) and the feature selection to RF (18) while tuning the classifier-controlling parameters. Once again, RF outperformed the other two classifiers, as shown in bold in the table. The optimal parameters were determined by a brute-force search over all combinations of manually specified parameter sets, recording the combination with the highest classification rate. The value range for each parameter is specified in Table 5.
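The brute-force parameter search can be sketched as an exhaustive loop over the Cartesian product of candidate values. The function name, the parameter names, and the value lists in the usage note are hypothetical; the actual ranges are those of Table 5, and `evaluate` stands for training a classifier with the given parameters and measuring its classification rate on held-out data.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Evaluate every parameter combination in `grid` and keep the best one.

    `evaluate` maps a dict of parameters to a classification rate;
    `grid` maps each parameter name to a list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # keep the first combination with the top rate
            best_params, best_score = params, score
    return best_params, best_score
```

For instance, `grid_search(evaluate, {"max_depth": [5, 10, 15], "n_trees": [50, 100, 200]})` would try all nine RF configurations and return the one with the highest rate. The cost grows multiplicatively with each added parameter, which is acceptable for the small manual grids described here.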
The experimental results support the observation that RF is robust against noise in multi-class classification problems. By the nature of the problem, defects can appear with arbitrary shapes and different amplitudes, so the features extracted from the mask and texture are considerably noisy.
A direct comparison with other methods in the same category (Huang & Lu, 2013; Kang et al., 2009; Lim et al., 2005) is not possible because they were evaluated on different databases. As in those works, this method is evaluated on a non-public data set, because no public data set is available. Furthermore, the proposed framework considers both fake and true defects, whereas the other works focus only on true defect types. In addition, this method uses RF instead of the SVM used in previous works. As shown in Table 6, for the first-layer classifier of fake and true defects, RF achieves a classification accuracy of (R

Variable selection
This part of the experiment evaluates the effectiveness of feature selection and the contribution, or importance, of each variable to each layer classifier, using the RF variable importance utility. Because many features are defined and evaluated, some of them may be redundant. For feature selection, RF and PCA are evaluated independently. The results are presented in Table 7, using the optimal parameter setup found in Table 6. As shown in Table 7, for PCA the classification accuracy increases with the number of selected features, because keeping fewer principal components shrinks the feature space and thus loses information. For RF, by contrast, the features are not transformed into another feature space, and some features are redundant or may conflict with others; therefore, using the entire feature set does not guarantee the best classification rate.
For variable importance, RF was used to evaluate the importance of each feature to each layer classifier: the higher a feature's variable importance, the more it contributes to prediction accuracy. RF uses out-of-bag samples to measure this individual contribution. The variable importance charts of the features used are presented in Figures 5-7, one for each classifier in the hierarchy, so that feature importance is evaluated separately for each layer classifier. Figure 5 shows that the texture features are important to the FG classifier, whereas the shape features are crucial in training the fake classifier. For the true defect classifier, the texture features and shape features contribute nearly equally to prediction. These results suggest that texture information is valuable for separating fake and true defect samples, whereas shape information is effective for distinguishing the fake sub-classes.
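The idea behind RF variable importance can be illustrated with permutation importance: shuffle one feature and measure how much accuracy drops. The sketch below is a simplified, classifier-agnostic version evaluated on a single held-out set; in Breiman's random forest the permutation is applied to each tree's out-of-bag samples and the drops are averaged over trees. The function name and signature are hypothetical.

```python
import random


def permutation_importance(predict, X, y, feature, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is randomly permuted.

    Simplification: one held-out set and a generic `predict` callable,
    instead of per-tree out-of-bag samples as in Breiman's random forest.
    """
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)                     # break the feature-label link
        permuted = [list(row) for row in X]
        for row, v in zip(permuted, column):
            row[feature] = v
        drops.append(base - accuracy(permuted))
    return sum(drops) / n_repeats
```

A feature that the classifier ignores always gets an importance of exactly zero, because permuting it cannot change any prediction; a feature the classifier relies on gets a positive importance.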

Conclusions
Defect detection and classification are important functions for improving product quality in OLED manufacturing, because defects can cause severe failures. This paper proposed a hierarchical structure for a defect classification system and thoroughly evaluated several features and classifiers. This study treated fake defects as part of the input data; the hierarchical structure therefore allowed the balance between fake and true defect classification rates to be controlled efficiently. This capability is useful in real-world manufacturing, because recognizing fake defects caused by cleanable factors can significantly help the manufacturer reduce repair costs.
The first layer of the structure performs binary fake/true classification, and the second layer further categorizes the detailed defect types. Three widely accepted machine-learning algorithms were compared in each layer of the structure to select the best-performing one. Twenty-six (26) features describing the texture and shape characteristics of the defect region were evaluated, along with RF and PCA as feature selection algorithms. Several experimental scenarios were configured, and the results confirmed promising performance that can satisfy industrial requirements.

Disclosure statement
No potential conflict of interest was reported by the authors.