Dictionary-based interpolation technique for text quality enhancement

This paper presents a new dictionary-based interpolation technique to improve text quality when the resolution of an image is increased, so that the text can be displayed on a high-resolution display device. The proposed algorithm analyzes an image and extracts the shape of the text from the image. Further, it encodes and decodes the pattern of the text, and enhances the legibility of the text using a pre-trained code dictionary. Therefore, the proposed method improves text quality in terms of sharpness. In the experiments, the proposed algorithm outperformed benchmark methods for all test images. Specifically, the proposed method reduced the blur index by up to 0.112.


Introduction
The resolution of displays, such as the liquid crystal display (LCD) and the organic light emitting diode (OLED) display, has been increasing in response to demands for clearer and more vivid images. High resolution is now possible because of improvements in electronic devices and display manufacturing processes. For example, the improvement of television systems has led to an increase in their resolution from high definition (HD) to full high definition (FHD), and recently, to ultrahigh definition (UHD). However, even if a display device has a UHD resolution, increasing the resolution of the image content itself requires the replacement of broadcasting equipment and an increase in the transmission bandwidth. To solve this problem, image up-scaling, a technique that increases the resolution of an image using two-dimensional interpolation, is used. Figure 1 shows an example of two-dimensional interpolation in a case wherein the resolution of the display device is twice that of the image content. In this case, new pixels are generated using an interpolation technique based on either the current pixel or the current and neighboring pixels. The performance of the up-scaling system depends greatly on the interpolation performance.
The basic interpolation techniques are bilinear and bicubic interpolation [1]-[4]. These methods use linear interpolation, and hence, require little computation. However, they do not accurately reproduce interpolated pixels in edge areas, where the luminance changes nonlinearly, because each new pixel is generated by linearly connecting existing pixels. To solve this problem, several interpolation techniques have been proposed. The methods used in [5] and [6] are based on fitting local fifth-order polynomials.
They consider a small number of neighboring pixels, and hence, have low computational complexity. The method in [7] uses the raised cosine pulse for interpolation. It has several advantages, such as fast implementation using the fast Fourier transform (FFT) and simple hardware architecture using a finite impulse response (FIR) filter. The methods in [8] and [9] use polynomials matched by the values of the function at various points and those of the first-order derivatives, thus achieving a higher degree of continuity. Conventional interpolation methods focus on improving the smoothness among neighboring pixels for generating an interpolated pixel. However, if these methods are used to increase the resolution of a text image, a blurring artifact at the boundary of the text occurs. When the width of a text string is thin and the background color is uniform, such as in web pages, the legibility of text can be seriously degraded.
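To illustrate why the linear methods above blur text edges, the following minimal sketch performs 2x bilinear up-scaling on a one-pixel step edge (the function name and the nearest-neighbor coordinate mapping are illustrative choices, not taken from any of the cited methods):

```python
import numpy as np

def bilinear_upscale_2x(img):
    """Upscale a 2-D grayscale image by 2x using bilinear interpolation.

    Each new pixel is a weighted average of its neighbors, which smooths
    (and therefore blurs) sharp text boundaries.
    """
    h, w = img.shape
    out = np.zeros((2 * h, 2 * w), dtype=float)
    for y in range(2 * h):
        for x in range(2 * w):
            # Map the output pixel back to source coordinates.
            sy, sx = y / 2.0, x / 2.0
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            fy, fx = sy - y0, sx - x0
            out[y, x] = ((1 - fy) * (1 - fx) * img[y0, x0]
                         + (1 - fy) * fx * img[y0, x1]
                         + fy * (1 - fx) * img[y1, x0]
                         + fy * fx * img[y1, x1])
    return out

# A sharp black-to-white step edge gains an intermediate gray value (127.5),
# which is exactly the blurring artifact described above.
edge = np.array([[0.0, 255.0]])
print(bilinear_upscale_2x(edge))
```

The intermediate value 127.5 between the original 0 and 255 pixels is what softens a text boundary after up-scaling.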
In this paper, a dictionary-based interpolation technique is proposed to improve the text quality when the width of a text string is thin and the background color is uniform, such as in web pages. The proposed method separates the text from the background of the image. Specifically, it analyzes the shape of the text and changes that shape to enhance the legibility of the text using a pre-trained dictionary. This paper is organized as follows. In section II, the proposed dictionary-based interpolation algorithm is described in detail. Section III presents the experimental results and evaluates the performance of the proposed algorithm in terms of blur assessment for text quality. Finally, section IV concludes the paper.


Proposed dictionary-based interpolation algorithm
Figure 2 shows the entire architecture of the proposed up-scaling system. First, the RGB input image is converted to a YCbCr image. Next, the Y signals are encoded into serial bits by blocks using the majority recognition module and the block encoding module. Based on the pre-processed code, the input code is classified and the most suitable match in the dictionary is selected. Finally, interpolation is performed after the selected bits are decoded.
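The first step of the pipeline, RGB-to-YCbCr conversion, can be sketched for the Y (luma) channel as follows. The paper does not state which conversion standard is used, so the BT.601 luma weights below are an assumption:

```python
def rgb_to_y(r, g, b):
    """Compute the luma (Y) component from RGB.

    BT.601 weights are assumed here; the paper does not specify the standard.
    Only Y is needed by the encoding stage of the proposed system.
    """
    return 0.299 * r + 0.587 * g + 0.114 * b

print(round(rgb_to_y(255, 255, 255)))  # white background -> 255
print(round(rgb_to_y(128, 128, 128)))  # mid gray -> 128
```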

Encoding stage
In this stage, the shape of the text is encoded after the text image is extracted from an input image. Specifically, using the Y signal as an input, a luminance histogram is generated for the pixels in the current and neighboring blocks (as shown in Figure 3), and the highest majority, i.e. the background luminance, which is defined in (1), is extracted:

l_maj = argmax_{l in {0, ..., 2^N - 1}} |{ i : l_i = l }|,    (1)

where l_i denotes the luminance of the i-th pixel and l_maj denotes the luminance of the majority of the pixels, i.e. the background luminance. N stands for the number of bits used to represent the pixel luminance; in this case, eight bits were used. Based on the background luminance, the shape of the text in the current block is encoded as shown in (2):

C = 1 if |l_in - l_maj| > M, and C = 0 otherwise,    (2)

where l_in denotes the luminance of an input pixel and M denotes the acceptable margin within which the luminance may fluctuate due to noise. M was set experimentally to eight. C stands for the one-bit code.
Using (2), the nine-bit code (code_in) is extracted for the 3 × 3 block, as shown in Figure 4(a). For example, in Figure 4, the luminance of the background and the luminance of the text are white and gray, respectively. Hence, l_maj and l_text are 255 and 100, respectively. In the case of a text pixel, the difference between its luminance and the background luminance is higher than M, and the one-bit code is 1. In the case of a background pixel, the difference between its luminance and the background luminance is lower than M, and the one-bit code is 0. For the 3 × 3 block, encoding is performed from the top-left to the bottom-right, and 100 010 001 is extracted, as shown in Figure 4. Subsequently, this code and the luminance of the text enter the decoding stage.
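The encoding stage described above can be sketched as follows. This is a minimal illustration using only the current 3 × 3 block for the histogram (the paper's majority recognition module also considers neighboring blocks); the function names are illustrative:

```python
import numpy as np

M = 8  # acceptable noise margin, set experimentally to eight in the paper

def background_luminance(block):
    """Majority (most frequent) luminance of the block, per Eq. (1).

    A full implementation would histogram the current AND neighboring
    blocks; only the current block is used here for brevity.
    """
    values, counts = np.unique(block, return_counts=True)
    return int(values[np.argmax(counts)])

def encode_block(block, l_maj):
    """One bit per pixel, per Eq. (2): 1 = text, 0 = background.

    Scanning order is top-left to bottom-right, as in the paper.
    """
    return [1 if abs(int(p) - l_maj) > M else 0 for p in block.flatten()]

# Diagonal gray text (100) on a white background (255), as in Figure 4.
block = np.array([[100, 255, 255],
                  [255, 100, 255],
                  [255, 255, 100]])
l_maj = background_luminance(block)   # 255
code_in = encode_block(block, l_maj)  # [1,0,0, 0,1,0, 0,0,1] -> "100 010 001"
print(l_maj, code_in)
```

The extracted code matches the nine-bit code 100 010 001 of the Figure 4 example.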

Decoding stage
In the decoding stage, the input code is analyzed and the legibility of the text is improved by changing its shape. For this, a code dictionary that contains pre-trained codes is used. Specifically, in the code dictionary, an image with a low resolution (LR) is paired with an image with a high resolution (HR), as shown in Figure 4(b). The pairing of the LR and HR images in the dictionary is defined by a user to improve the text legibility. If the code dictionary receives the input code (code_in), as shown in Figure 4, the same code is searched for in the dictionary and the code (code_out) for the HR image is selected.
Next, the interpolation module converts the code for the HR image into the luminance of the output image, as shown in (3):

l_out = l_text if C = 1, and l_out = l_maj otherwise,    (3)

where l_text and l_out denote the luminance of a text pixel and the final luminance of the output image, respectively. For example, in Figure 4, when code_in is 100 010 001, the same code is searched for and code_out, which is 110000 111000 011100 001110 000111 000011, is output. Next, the interpolation module decodes this code into luminance values, and accordingly, the boundary of the diagonal line is improved, as shown in Figure 4(c).
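The dictionary lookup and decoding step can be sketched as follows, using the single LR/HR code pair of the Figure 4 example. A real dictionary would hold many user-defined pairs, and the paper does not describe its training procedure, so this is purely illustrative:

```python
l_text, l_maj = 100, 255  # text and background luminance from the encoding stage

# One dictionary entry pairing the 3x3 LR code with its 6x6 HR code,
# as in Figure 4(b): a diagonal line with smoothed boundaries.
dictionary = {
    (1, 0, 0,
     0, 1, 0,
     0, 0, 1): (1, 1, 0, 0, 0, 0,
                1, 1, 1, 0, 0, 0,
                0, 1, 1, 1, 0, 0,
                0, 0, 1, 1, 1, 0,
                0, 0, 0, 1, 1, 1,
                0, 0, 0, 0, 1, 1),
}

def decode(code_in):
    """Look up the HR code and convert it to luminance, per Eq. (3)."""
    code_out = dictionary[tuple(code_in)]
    # A 1-bit maps to the text luminance, a 0-bit to the background luminance.
    return [l_text if c == 1 else l_maj for c in code_out]

pixels = decode([1, 0, 0, 0, 1, 0, 0, 0, 1])
print(pixels[:6])  # first HR row: [100, 100, 255, 255, 255, 255]
```

The 36 output luminance values form the 6 × 6 up-scaled block with the improved diagonal boundary of Figure 4(c).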

Experimental results
The performance of the proposed method was evaluated subjectively and objectively. First, the text qualities of sample text images constructed using the proposed and benchmark methods were visually compared. Second, the text qualities of the proposed and benchmark methods were assessed in terms of their blur, using a perceptual blur metric [10]. Third, the computation times of the proposed and benchmark methods were calculated. The interpolation algorithms in [5] (method 1), [7] (method 2), [9] (method 3), and [4] (method 4) were used as the benchmark methods. For the test images, sample text images in several languages, i.e. English, Japanese, Chinese, German, and Korean, were used. Each language category consisted of 20 images. Images with an HD resolution (1280 × 720) were used, and their resolutions were converted to WQHD (2560 × 1440). For the subjective evaluation, Figure 5 shows samples of the English and Japanese text images. The benchmark methods (methods 1, 2, 3, and 4) used neighboring pixels to enhance smoothness in the boundaries, and hence, blurred the boundaries of the text. Methods 3 and 4 used a large number of neighboring pixels, and hence, the blur was more marked than in methods 1 and 2, as shown in Figure 5. On the other hand, the proposed dictionary-based interpolation method considered not only the smoothness but also the sharpness of the boundaries. Therefore, the blur was significantly reduced, unlike in the benchmark methods. Additionally, Figure 5 shows that the staircase effect of the text was reduced compared with the benchmark methods. This is because the proposed method, using the pre-trained code dictionary, performed interpolation only when the boundaries had to be improved.
For the objective evaluation, the blur indexes of the proposed and benchmark methods were computed for all the test images. The blur indexes ranged from 0 to 1, where a higher index indicates poorer image quality with respect to blur. Figure 6 shows graphs of the blur indexes of all the test images. The blur index of the proposed dictionary-based interpolation is lower than the blur indexes of the conventional methods for all the test images. In the case of the languages with many strokes, i.e. Chinese, Japanese, and Korean, the blur reduction effect of the proposed method was enhanced. Table 1 lists the average blur indexes of the proposed and benchmark methods. The average blur indexes of methods 1, 2, 3, and 4 were 0.072, 0.069, 0.114, and 0.162, respectively. On the other hand, the blur index of the proposed method was 0.058. Specifically, the blur indexes of the proposed method were lower than those of methods 1, 2, 3, and 4 by up to 0.021, 0.017, 0.067, and 0.112, respectively. This is because the proposed method could maintain the sharpness of the text boundaries by adaptively controlling the interpolation based on the pre-trained code dictionary. Table 2 lists the computation times of the proposed and benchmark methods. To measure the computation times, all the simulations were performed using MATLAB on a PC with an Intel Core i3-2120 processor at 3.30 GHz. The computation times of the proposed method were 14.77, 3.87, and 37.77 ms lower than those of methods 1, 2, and 3, respectively. Even though the computation time of the proposed method was slightly higher than that of method 4, which involves linear interpolation, this drawback was significantly compensated for by the improved image quality of the proposed method.

Conclusion
In this paper, a dictionary-based interpolation technique was proposed for text quality enhancement when low-resolution image content is up-scaled for a high-resolution display device. The proposed algorithm extracts the shape of the text and the background luminance from an image, and then adaptively applies interpolation to improve the text legibility using a pre-trained code dictionary. The proposed method improves the text quality in terms of sharpness, which was experimentally validated using various test images. In these experiments, the proposed method reduced the blur index by up to 0.112 compared with the benchmark methods.