A DCT domain smart vicinity reliant fragile watermarking technique for DIBR 3D-TV

This work presents a vicinity reliant intelligent fragile watermarking scheme for the depth image-based rendering technique used in three-dimensional television. The depth map of a centre image is implicitly inserted in the block-based discrete cosine transform (DCT) of the same image using an aggregate that also takes into account its neighbourhood blocks. The watermark is embedded implicitly by modulating the parity of a Boolean operation on this aggregate. A genetic algorithm is then utilized to select the frequency bands in the DCT domain that are eligible for watermark embedding based on imperceptibility requirements. Experimental results demonstrate the usefulness of the proposed scheme in terms of its resistance against a set of fragile watermarking attacks and its ability to detect and localize tampering attempts.


Introduction
With the advent of the Internet and high-speed networks, multimedia applications have proliferated in the last decade. Among such applications, three-dimensional television (3D-TV) is one of the most rapidly evolving and sought-after technologies, driven by the desire to view prerecorded content as close to reality as possible. Depth image-based rendering (DIBR) is one of the most promising techniques for realizing virtual view synthesis for depth perception [1]. In DIBR, a centre image and its associated depth map are utilized to generate the corresponding left and right images. These are then used to produce an anaglyph image that yields depth perception with suitable viewing aids [2]. Among many challenges, content authentication is one of the primary concerns for DIBR 3D-TV content producers. Among the many solutions, fragile digital watermarking is one of the most promising. In such watermarking, the slightest modification of the cover work destroys the watermark and thereby raises an alarm against any tampering attempt [3].
A fragile watermark, when embedded, is likely to be destroyed by the slightest modification of the watermarked content. This provides a mechanism whereby an extracted watermark can be compared with the inserted bits to reveal a tampering attempt. More advanced fragile watermarking systems are also able to localize the tampered regions of the cover work. Tamper localization not only helps to identify the areas of an image where an attack was attempted but also provides an insight into the attacker's region of interest and intention. In addition, it helps to rectify the changes made in the cover work. More recent watermarking schemes are also able to automatically recover the changed bits [4].
To be adequate, a fragile watermarking technique must localize any tampering and geometric transformations, and must tolerate legitimate manipulations such as image compression. It must not leave security gaps for attacks such as cut and paste [5,6] and birthday attacks [5,7]. Researchers have shown that block-wise dependence is a well-established requirement for countering such attacks. Barreto et al. [5] have indicated that contextually deterministic dependence is susceptible to transplantation attacks [6,8]. To deal with transplantation attacks, Barreto et al. [5] proposed a scheme based on hash function calculation. However, this scheme is relatively time-consuming and the size of the blocks limits the accuracy of tamper localization.
Several schemes have been proposed in the literature, including Wu and Liu's scheme [9], which inserts a binary watermark sequence into DCT coefficients using a lookup table, and Wong's public key scheme [10]. However, these schemes, along with some others [11][12][13], are block-wise independent and are therefore exposed to cover-up, transplantation and vector quantization attacks [6,14].
Recognizing the importance of block-wise dependence, several schemes have been proposed, including those of Edupuganti et al. [15] and Nyeem et al. [16], which are based upon cyclic redundancy checks to authenticate the features of a block stored in pairs of mapping blocks. Li et al. [17] proposed a scheme that uses a binary feature map extracted from the underlying image as a watermark. This scheme is simple to implement and addresses many of the challenges in fragile watermarking. However, it was not developed for DIBR and thus does not take the depth map into consideration for embedding. In addition, appropriate frequency bands are not selected for watermark embedding. Yu et al. [14] utilized the concepts of stereo matching and block classification to propose a fragile watermarking scheme. Their scheme classifies the non-overlapping blocks of the left image as matchable or non-matchable based on stereo matching of the left and right images. They employed a detection technique with an alterable-length watermark to increase the accuracy of tamper localization at the receiver side. Nevertheless, compared with the proposed technique, their technique lags in imperceptibility.
All these schemes were proposed for digital images and do not incorporate ways to be used for 3D-TV. Recently, Ali et al. [18] proposed a transform-based watermarking algorithm for copyright protection of 3D images that combines the discrete wavelet transform and singular value decomposition to utilize their individual properties. Nevertheless, their method is not developed for content authentication and is suitable only for copyright protection applications. Rana et al. [19] developed a Scale Invariant Feature Transform (SIFT) feature-based coefficient selection scheme for robust, view-invariant watermarking of 3D DIBR images. This scheme [19] exploits the shift-invariance and directional property of the DIBR-based view synthesis process to embed the watermark.
In this work, we propose a fragile watermarking scheme for DIBR-based 3D-TV. The proposed scheme takes into consideration the neighbourhood, or vicinity, of the embedding location while calculating a secret sum associated with the implicit embedding of the watermark bit. It enhances the DCT-based watermarking scheme of Li et al. [17] by selecting an appropriate frequency band using the genetic algorithm (GA). In addition, the depth map of the image itself is used as a source for generating block-wise dependence. The watermark is created from the depth map and a pseudorandom sequence generated with a predefined secret key. The same key is used at the receiver side to authenticate the embedded watermark. Either image of the stereoscopic pair can be used for embedding the watermark. For the purpose of this study, the proposed scheme is named the vicinity reliant fragile watermarking (VRFW) technique.
The rest of the paper is organized as follows. Section 2 explains the proposed technique with subsections on important related topics such as general architecture, GA-based training and testing, the watermark embedding and extraction processes. Section 3 presents the experimental results and discussion, whereas the conclusion is presented in Section 4.

General architecture
In this section, the proposed VRFW watermarking scheme is presented. We first describe the general architecture of VRFW and then elaborate on the individual modules in later subsections. The rendering operation for the left-eye and right-eye image synthesis is explained separately in Section 2.4. Figure 1 shows the general architecture, whereby a stereoscopic left or right image is block DCT transformed to produce frequency-domain coefficients. GA is then utilized to select the appropriate frequency band in the block DCT for embedding the watermark, based on the imperceptibility of the watermark. In parallel, the watermark is generated using the depth map, a secret key and a pseudorandom sequence. The frequency-domain coefficients and the watermark bit are then used to implicitly embed the watermark using a secret sum and block-wise neighbourhood dependence. Together, these two provide an implicit mechanism for resistance against a number of attacks to which a fragile watermarking application is susceptible. Based on the imperceptibility achieved by the particular candidate frequency band selected by GA, the algorithm decides whether to continue or terminate its generational evolution. In the next section, the proposed embedding scheme is discussed in detail.

Watermark embedding algorithm
Figure 2 shows the watermark embedding process of the proposed VRFW scheme. First, a stereo left or right image X is 8 × 8 block DCT transformed to generate Y, where r = (p × q)/(8 × 8) is the number of DCT blocks, p and q are the number of rows and columns of the image matrix, respectively, and k = 0, 1, 2, . . . , 63 represents the zigzag coefficient index within each 8 × 8 DCT block. Using Y, a binary image M of the same size as Y is generated.
On the other hand, the depth map DM is converted into a binary image b using a predefined threshold α, 0 ≤ α ≤ 255. All pixel values in the depth map smaller than the threshold are converted to 0, while the larger values are converted to 1. The binary image b is then multiplied by a pseudorandom sequence S, which is generated using a secret key K. We call the result R.
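The depth-map branch of the embedding process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, pixels exactly equal to α are mapped to 0 (the text leaves that case open), S is taken to be a binary sequence of the same size as b, and Python's `random.Random` seeded with K stands in for whatever pseudorandom generator the scheme actually uses.

```python
import random

def binarize_depth(depth_map, alpha):
    """Threshold an 8-bit depth map: values above alpha become 1, all others 0.
    ASSUMPTION: values equal to alpha map to 0."""
    return [[1 if v > alpha else 0 for v in row] for row in depth_map]

def keyed_sequence(key, rows, cols):
    """Binary pseudorandom sequence S, reproducible from the secret key K.
    ASSUMPTION: random.Random is a stand-in generator, not the paper's."""
    rng = random.Random(key)
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]

def make_r(depth_map, alpha, key):
    """R = b * S, computed element by element."""
    b = binarize_depth(depth_map, alpha)
    s = keyed_sequence(key, len(b), len(b[0]))
    return [[bv * sv for bv, sv in zip(b_row, s_row)]
            for b_row, s_row in zip(b, s)]

# Toy 2 x 2 depth map; the receiver rebuilds the same R from the same key.
depth = [[0, 200], [255, 10]]
r = make_r(depth, alpha=128, key=1234)
```

Because the sequence depends only on the key and the image size, a receiver holding K can regenerate R without access to the original image.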
The final watermark message W is then generated by taking the exclusive OR of M and R:
$$W = M \oplus R$$
In our GA simulation, we define the genome as a binary sequence with length equal to the DCT block size, i.e. 64. Initially, the locations of some "ON" bits in the middle-frequency bands are considered watermarkable and the locations of the zero bits are ignored. These watermarkable frequency locations are then optimized in the GA simulation based on the imperceptibility of the watermark, which is taken as the fitness of every genome and is calculated using Equation (9).
Corresponding to each watermarkable coefficient selected by GA, we compute a secret sum $SUM_i(j)$ according to Equation (6), where n = 0, 1, 2, . . . , 63, and m is the dependence neighbourhood including the block itself, as shown in Figure 3. The h in Equation (6) represents the set of indices of the watermarkable coefficients. $SUM_i(j)$ and $Y_i(j)$ are then concatenated in one's complement format, and the resulting bit string is extended by a second concatenation. Finally, the watermarked coefficients are produced by modulating the watermarkable coefficients of each block so that the parity condition of Equation (7) holds, where Alarm(·) is a function that returns 1 if the count of "1" bits is odd and 0 if it is even. An inverse DCT is then performed to generate the watermarked image in the spatial domain.
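The parity mechanism described above can be illustrated with a small Python sketch. It is not the authors' implementation: the exact forms of Equations (6) and (7) are not reproduced here, so the sketch assumes integer (quantized) coefficients, a 16-bit one's complement representation, a single concatenation of the secret sum with the coefficient, and a condition of the form "Alarm of the bit string equals the watermark bit". All function names are hypothetical.

```python
def ones_complement_bits(value, width=16):
    """Bit string of an integer in one's complement (ASSUMED width of 16 bits)."""
    mask = (1 << width) - 1
    if value >= 0:
        return format(value & mask, f"0{width}b")
    # Negative numbers: invert every bit of the magnitude.
    return format(~(-value) & mask, f"0{width}b")

def alarm(bits):
    """1 if the number of '1' bits is odd, 0 if it is even."""
    return bits.count("1") % 2

def xor_bits(m, r):
    """W = M xor R on flat lists of bits."""
    return [mb ^ rb for mb, rb in zip(m, r)]

def modulate(coeff, secret_sum, w_bit, width=16):
    """Return the integer closest to coeff whose parity check equals w_bit.
    ASSUMPTION: the check is alarm(SUM || coefficient) == watermark bit."""
    delta = 0
    while True:
        for candidate in (coeff + delta, coeff - delta):
            bits = (ones_complement_bits(secret_sum, width)
                    + ones_complement_bits(candidate, width))
            if alarm(bits) == w_bit:
                return candidate
        delta += 1

w = xor_bits([1, 0, 1], [1, 1, 0])
unchanged = modulate(5, 10, 0)   # parity already matches, coefficient kept
adjusted = modulate(5, 10, 1)    # nearest value with the required parity
```

The sketch also shows why embedding is often distortion-free: when the parity already matches the watermark bit, the coefficient is left untouched.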
From a given population in the GA simulation, the individuals satisfying the fitness criteria are selected for reproduction. For this purpose, a fitness value for each individual in the population is calculated using Equation (9), where $f_c$ is the fitness value of the current chromosome, and PSNR and SSIM denote the Peak Signal to Noise Ratio and the Structural Similarity Index Measure, respectively. PSNR is divided by 50 to scale its value in accordance with the SSIM value. Both are image quality metrics calculated between the original image and the watermarked image. In any population, the best individuals are selected based on $f_c$ to form the next generation using replication, mutation and crossover, to achieve diversity as well as convergence in the solution space. The algorithm iteratively enhances the fitness, generation by generation, until the stopping criterion is met. The stopping criterion is mainly based on the optimal choice of DCT coefficients for the embedding of the watermark. The best DCT locations are saved as $K_0$. The keys K and $K_0$ are transmitted over a secure network, or hardcoded in the decoder, to be used in watermark extraction.
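A fitness function of this kind might look as follows in Python. The sketch assumes that the scaled PSNR and the SSIM are simply added; the text only states that PSNR is divided by 50, so the additive combination is an assumption, as are the function names. SSIM is taken as an input rather than computed, to keep the example short.

```python
import math

def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(o - m) ** 2
             for o_row, m_row in zip(original, marked)
             for o, m in zip(o_row, m_row)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def fitness(psnr_db, ssim):
    """Fitness of one genome.
    ASSUMPTION: the two terms are added; PSNR/50 brings a ~50 dB PSNR
    onto roughly the same 0..1 scale as SSIM."""
    return psnr_db / 50.0 + ssim

# A genome yielding 50 dB and SSIM 1.0 outranks one yielding 40 dB and 0.9.
better = fitness(50.0, 1.0)
worse = fitness(40.0, 0.9)
```

Under this assumption both quality metrics contribute comparably, so the GA does not favour a genome that improves PSNR at the cost of structural similarity.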

VRFW watermark extraction algorithm
The proposed scheme provides blind watermark extraction, i.e. the original image is not required on the extraction side. First, the watermarked image is 8 × 8 block DCT transformed to generate its coefficients Y. The same procedure as in embedding is adopted to generate the binary sequences M, R and W. The same key $K_0$ is used to identify the watermarked coefficients, and their indices are saved in h. For the corresponding marked coefficients, $SUM_i(j)$ is calculated using Equation (6). $SUM_i(j)$ is then concatenated with $Y_i(j)$ in one's complement format to produce the two concatenated bit strings described in the previous section. The watermarked image is then authenticated using Equation (7). If Equation (7) does not hold, the image has been tampered with, and the block that raises the alarm marks the tampering location.
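The localization step can be sketched as a comparison between the parity bits recomputed at the receiver and the expected watermark bits, block by block. The sketch below assumes that blocks are numbered in raster order and that each block contributes a short list of bits; both the data layout and the function names are hypothetical.

```python
def tampered_blocks(expected_bits, recomputed_bits):
    """Indices of blocks in which at least one parity bit disagrees."""
    return [i for i, (e, r) in enumerate(zip(expected_bits, recomputed_bits))
            if e != r]

def block_origin(block_index, image_width, block_size=8):
    """Top-left pixel (row, col) of a block.
    ASSUMPTION: blocks are numbered in raster order."""
    blocks_per_row = image_width // block_size
    row, col = divmod(block_index, blocks_per_row)
    return row * block_size, col * block_size

expected = [[0, 1], [1, 1], [0, 0]]
recomputed = [[0, 1], [1, 0], [0, 0]]   # block 1 fails the check
flagged = tampered_blocks(expected, recomputed)
origins = [block_origin(i, image_width=16) for i in flagged]
```

Mapping each flagged block back to pixel coordinates is what allows the tampered regions to be marked on the image.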

Rendering operation
The synthesis of the left-eye and/or right-eye images is done through pixel-wise warping of the centre image [20]. For any depth image, let the depth values range from $\zeta_{near}$ to $\zeta_{far}$, where $\zeta_{near}$ is the nearest object plane and $\zeta_{far}$ is the farthest object plane in the 3D scene. For an 8-bit depth image, the intensity values [0, 255] map to depth values in the range [$\zeta_{far}$, $\zeta_{near}$]. The proposed rendering operation warps the pixel location according to the depth value using the following equations:
$$x_L = x_C + \frac{\tau_x}{2}\,\frac{f}{\zeta}$$
and
$$x_R = x_C - \frac{\tau_x}{2}\,\frac{f}{\zeta}$$
where $x_L$ and $x_R$ are the x-coordinates of the corresponding pixels in the left-eye and right-eye images, respectively, $x_C$ is the x-coordinate of the corresponding pixel in the centre image, $\zeta$ is the depth value of the current pixel in the centre image, $\tau_x$ represents the camera distance and f is the focal length of the camera.
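As a numerical illustration of the warping step, the following Python sketch shifts a centre-image pixel horizontally by an amount that grows as the depth decreases. It assumes the commonly used DIBR shift of (τ_x/2)(f/ζ) and a linear mapping from 8-bit intensity to depth; the linear mapping and the function names are assumptions made for this example only.

```python
def intensity_to_depth(v, z_near, z_far):
    """Map an 8-bit intensity to a depth value: 0 -> z_far, 255 -> z_near.
    ASSUMPTION: the mapping is linear."""
    return z_far + (v / 255.0) * (z_near - z_far)

def warp_x(x_c, z, t_x, f):
    """Left-eye and right-eye x-coordinates of a centre pixel at depth z.
    ASSUMPTION: standard DIBR shift of (t_x / 2) * (f / z)."""
    shift = (t_x / 2.0) * (f / z)
    return x_c + shift, x_c - shift

# Nearer pixels (smaller z) receive a larger horizontal disparity.
x_left, x_right = warp_x(x_c=100, z=2.0, t_x=0.5, f=8.0)
near_shift = warp_x(0, intensity_to_depth(255, 1.0, 10.0), 0.5, 8.0)[0]
far_shift = warp_x(0, intensity_to_depth(0, 1.0, 10.0), 0.5, 8.0)[0]
```

In a full renderer the warped coordinates would be rounded to pixel positions and the resulting holes filled, which is outside the scope of this sketch.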

Experimental results and discussion
The proposed VRFW scheme is implemented in Matlab on an Intel Core i7 2.4 GHz processor with 8 GB of onboard RAM. The Middlebury Stereo Vision Lab 2006 image datasets [21][22][23] are used for the evaluation of the proposed technique. Each image, as well as the corresponding depth map used in the experiments, is of size 256 × 256. In our GA simulation, we used the Matlab GA Toolbox with a population size of 100 and a maximum of 20 generations. The fitness function used is presented in Equation (9) and comprises both the PSNR and SSIM performance metrics. Figure 4 shows the strength of the watermark embedded in different 8 × 8 DCT blocks for the Bowling2 image. It can be seen that the proposed VRFW scheme is able to learn different frequency bands and their response to additive watermarking. A high-strength watermark is embedded in areas where it is easily concealed, and a low-strength watermark or none is embedded in areas where it would be more perceptible and increase distortion. At the same time, the proposed scheme respects the imperceptibility constraints of the fragile watermarking domain. Also, most of the watermark is embedded in the middle frequency bands, which is highly desirable in watermarking applications. If the watermark is embedded in higher frequencies, it is susceptible to a low-pass filter attack and can be removed easily. On the other hand, adding a watermark to lower frequency regions increases the noise in smooth areas of the image and is highly perceptible. Table 1 demonstrates the performance of the proposed VRFW scheme in terms of watermark imperceptibility. The PSNR and SSIM values corresponding to different images are shown. The high values of PSNR (around 50 dB) and SSIM (almost 1) indicate that, even after the insertion of the watermark, the proposed technique keeps the watermark imperceptible and limits visual degradation of the images.
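To make the notion of middle frequency bands concrete, the sketch below generates the zigzag scan order of an 8 × 8 DCT block and builds an initial 64-bit genome with "ON" bits in a middle band. The band limits (zigzag indices 10 to 30) are purely illustrative and are not taken from the experiments; the function names are likewise hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd diagonals run top-to-bottom,
    # even diagonals run bottom-to-top.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))

def initial_genome(first=10, last=30, length=64):
    """Binary genome with ON bits for zigzag indices first..last.
    ASSUMPTION: the band limits are illustrative only."""
    return [1 if first <= k <= last else 0 for k in range(length)]

order = zigzag_order()
genome = initial_genome()
# Positions of the coefficients that start out as watermarkable.
watermarkable = [order[k] for k, bit in enumerate(genome) if bit]
```

Low zigzag indices correspond to low frequencies near the top-left corner of the block and high indices to high frequencies near the bottom-right, so a band in the middle of the scan avoids both extremes discussed above.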
Figure 5 shows the performance of VRFW in terms of watermark embedding in the left-eye image, its authentication and tamper localization. The results shown in Figure 5 demonstrate the subjective outcomes of the proposed scheme for the Monopoly, Plastic, Rocks1, Bowling2 and Wood2 images, labelled (I)-(V), of the Middlebury [21] database. Figure 5(a-d) represents the original image, its associated depth map, the watermarked image and the tamper-localized image, respectively. Artefact addition and removal attacks were mounted on the watermarked images to produce the tamper-localized images. It can be seen that there is no visual difference between the original and watermarked images. The black squared regions in the last column mark the areas where the slightest modifications were made in the watermarked image. Tampering attempts are first detected and then localized, which is highly desirable in real-world applications. Table 2 presents the performance comparison of the proposed technique with the technique presented by Yu et al. [14] in terms of watermark imperceptibility. For this purpose, a number of images are selected from the Middlebury Stereo Vision Lab 2006 image datasets [21] and the PSNR values are computed between the original and the watermarked images generated by the two techniques. As the table shows, the proposed technique performs better than Yu et al.'s technique in terms of PSNR values. This is because appropriate frequency bands are selected intelligently in the proposed technique, which utilizes a minimum number of coefficients for watermark embedding. Another reason for this enhanced performance is that the proposed technique finds the most suitable bits for watermark embedding. In addition, it alters only four bits (and most of the time it does not need to alter any) while embedding the watermark, so embedding introduces very little distortion.
This is also visible from the SSIM and PSNR values in Table 1, which indicate that the watermarked images are very close to the originals and that the watermark is imperceptible to the human visual system. Through our experimentation in the laboratory, we found that there is no significant change in the quality of the watermarked 3D view.
One of the major drawbacks of the proposed scheme is its time complexity during the training phase. Since the proposed scheme relies on simulation-based optimization using the GA, the time complexity is high during the training process. However, once a generalized suitable genome is generated, it takes a fraction of a second to watermark an image. This is primarily because the appropriate frequency bands are provided by the best-evolved genome for all the block locations in the DCT coefficients. The only remaining computational cost is that of the remaining steps of the watermarking algorithm.

Conclusion
In this paper, a fragile watermarking technique is proposed for DIBR 3D-TV images with improved visual quality. The proposed VRFW scheme is based on GA and the block DCT transform. The block-wise neighbourhood-reliant embedding is based on a neighbourhood aggregate and parity modulation. This creates strong block-wise dependence to enforce security against different types of attacks. As a result, the watermarked images are not only better in visual quality, but the scheme is also able to detect and localize tampering attempts for 3D-TV media.