Data Hiding in Iris Image for Privacy Protection

ABSTRACT This paper proposes a novel iris image data hiding scheme for privacy protection. Personal privacy data is embedded into the iris image such that its impact on iris recognition is minimised. This is achieved by restricting the embedding modifications to regions that seldom affect iris recognition, with the help of the STC (syndrome trellis coding) framework. We also propose a novel distortion function to measure the impact of the data embedding on iris recognition, in which regions with important iris features are assigned a high embedding cost. Experimental results show that, with the proposed scheme, sufficient data can be embedded into the iris image while high recognition accuracy is maintained.


INTRODUCTION
Traditional authentication methods are mainly based on tokens, e.g. ID cards, or knowledge, e.g. passwords [1]. The former are easily lost or copied, and the latter are at risk of being forgotten or stolen. Because of this, biometric authentication technology has been widely developed recently [2]. According to their origin, bio-characteristics can be divided into physiological bio-characteristics, e.g. fingerprints, irises, faces, DNA, etc., and behavioural bio-characteristics, e.g. signatures, voiceprints, gait, etc. These biometric characteristics are generally unique, immutable over a lifetime, and hard to forget or steal during identity authentication.
Among the various biometric authentication systems, iris authentication [3] has attracted great attention due to its stable recognition performance and high recognition rate. In existing iris authentication systems, personal iris images are collected as biometric templates by an authorising institution and stored in different types of sub-databases together with the corresponding personal privacy data, e.g. name, phone number, address, bank account, etc. Personal privacy data stored separately in a database is at great risk of theft. Therefore, how to protect this personal privacy data while minimising the impact on iris recognition accuracy has become a serious issue.
Data hiding [4,5] is one of the main techniques for privacy protection. The aim of biometric data hiding [6][7][8][9][10][11][12][13] is to embed sufficient personal data into cover biometric templates while maintaining recognition performance. Existing biometric data hiding methods usually perform the data embedding in regions that do not contain key features of the biometrics. For example, Vatsa et al. [9] use phase consistency to detect the location of human face features, and data is embedded into the remaining locations. Whitelam et al. [11] use multi-layer watermarking and steganography to protect multi-modal biometric information. In [12], the iris template data is embedded only in the blue channel. Though these schemes achieve reasonably good performance for biometric image data hiding, how to minimise the impact of the data embedding on biometric recognition remains unanswered.
In this paper, we incorporate the STC [14] framework for iris image data hiding, where the impact of the data embedding on iris recognition is minimised. Concretely, the personal privacy data is embedded in an iris image using STC with a proposed distortion function, which assigns high embedding costs to the regions that contain important iris features for recognition. During the data embedding, the modifications are restricted to the regions with minimal impact on iris recognition. Experimental results show that we are able to embed sufficient personal data into the iris image while maintaining high recognition accuracy.
The rest of this paper is organised as follows. We introduce the related works in Section 2. The proposed method is described in Section 3. Experimental results and analysis are provided in Section 4. Section 5 concludes the paper.

RELATED WORKS
In this section, some related works are introduced, including watermarking-based and steganography-based biometric data protection methods.
Jain et al. [8] proposed two scenarios for biometric data protection. In the first scenario, the fingerprint minutiae data is hidden in a cover image to protect the fingerprint minutiae before the image is transmitted through a public channel. In the other scenario, face information is embedded into a fingerprint image as a watermark and then registered on a smart card. In the authentication phase, the fingerprint image and the reconstructed face image can be extracted for multi-factor authentication. In [9], Vatsa et al. proposed a 3-layer RDWT bio-watermarking algorithm to embed Mel Frequency Cepstral Coefficients (MFCC) into colour face images, which enhances the robustness and security of the system. The algorithm first uses phase consistency to detect the location of human face features. These key feature areas are then skipped by the watermarking algorithm, so that the accuracy of the face recognition system is only slightly affected.
Li et al. [10] proposed a scheme for protecting biometric templates using salient-region-based authentication watermarking. First, a multi-level authentication watermarking scheme is proposed to verify the integrity of biometrics. Second, principal component analysis (PCA) features of the biometric images are used as watermarks to recover tampered images. Based on the detection result of the salient region, the authentication bits and the information bits can be adaptively embedded in the biometric image. Whitelam et al. [11] proposed a framework that uses multi-layer watermarking and steganography to protect multi-modal biometric information. Specifically, a watermarking technique designed for grayscale images is used to embed human face features into the corresponding fingerprint images. The watermarked image is then embedded in an arbitrary cover image that is unrelated to biometrics. If the cover is attacked, it does not reveal the existence of biometric data, which provides the user with an extra layer of security.
Chaudhary et al. [12] proposed an LSB-based steganographic method to protect colour iris templates. To enhance security, the iris template data is embedded only in the blue channel. In the authentication system, the cover image embedded with the iris template can be stored directly, which avoids storing the original private biometric data. Li et al. [7] proposed a privacy-preserving fingerprint authentication system, in which personal privacy data is embedded into binary template fingerprint images. Since no boundary pixels are generated during the embedding process, the distortion caused to the cover can be minimised. The embedding method causes the smallest possible anomaly in the thinned fingerprint template without affecting the recognition performance of the fingerprint authentication system. A template protection and fusion strategy is proposed in [13]. Fingerprint feature vectors and iris features are used as watermarks. The watermark is then embedded in the low-frequency AC coefficients of the smooth 8×8 DCT blocks in the image. A decision-level fusion strategy is used to improve the overall performance of multi-modal systems. Tarif et al. [6] proposed a method for ensuring the secure transmission of biometric data by means of encryption and information hiding in a multi-modal biometric authentication system. The fingerprint and iris biometric templates are embedded into the Slantlet-SVD transform domain of the face image.
In this paper, we propose a data hiding method for the privacy protection of iris images using STC embedding with a proposed distortion function. To the best of our knowledge, no existing work takes advantage of the STC framework for biometric data hiding.

THE PROPOSED METHOD
In this paper, a data hiding method for iris images is proposed. The personal privacy data is embedded in an iris image using STC-based data hiding. To minimise the impact of data embedding on iris recognition, we propose a novel distortion function that assigns a high embedding cost to regions with important iris features. Figure 1 shows the iris authentication system considered in this paper. In the enrolment phase, the personal privacy data M is first encrypted into cipher text M′ with a key K using an existing encryption algorithm, e.g. the data encryption standard (DES) or the international data encryption algorithm (IDEA). Then M′ is embedded into a registered iris image C using the proposed embedding algorithm to obtain the iris image S carrying the secret data. Finally, S is stored in the iris database for authentication. In this phase, only one iris image database is needed, and no additional storage space is required for the privacy data. In the authentication phase, a query iris image is processed to obtain its iris code T_i. Then another iris code T_j is extracted from the iris image in the database and used for verification by matching. The matching passes when the matching score is higher than the threshold τ set by the system. When the matching passes, the cipher text M′ is extracted from the iris image in the database. After decryption with the same key K, the personal privacy data M is obtained and returned to the user. If the matching fails, the privacy data is not extracted.

STC-based Data Embedding and Extraction
In the proposed method, the STC framework is employed for data embedding. Denote the size of a cover image as k×l, the (i, j)th element as c(i, j), and the embedding cost assigned to c(i, j) as ρ(i, j), where i = 1, 2, . . . , k and j = 1, 2, . . . , l. The theoretical minimal steganographic distortion D of a stego image with embedding capacity m (bits) [15] is

$$D = \sum_{i=1}^{k} \sum_{j=1}^{l} \pi(i, j)\, \rho(i, j), \tag{1}$$

where

$$\pi(i, j) = \frac{e^{-\lambda \rho(i, j)}}{1 + e^{-\lambda \rho(i, j)}} \tag{2}$$

are the probabilities of modifying c(i, j). The parameter λ (λ > 0) is chosen so that the information entropy of the modification probabilities equals the capacity m,

$$m = -\sum_{i=1}^{k} \sum_{j=1}^{l} \Big[ \pi(i, j) \log_2 \pi(i, j) + \big(1 - \pi(i, j)\big) \log_2 \big(1 - \pi(i, j)\big) \Big]. \tag{3}$$

STC provides a practical encoding scheme that approaches this theoretical bound. Once the embedding costs are specified, the secret data can be embedded into the cover by STC.
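The role of λ in this bound can be illustrated numerically. The following is a minimal Python sketch, assuming the standard binary-embedding form of the bound in [15] (modification probability e^{-λρ}/(1 + e^{-λρ}) and binary entropy); the function names and the bisection search for λ are illustrative choices, not part of the paper.

```python
import math

def modification_probs(costs, lam):
    """pi = exp(-lam*rho) / (1 + exp(-lam*rho)); infinite cost gives pi = 0."""
    probs = []
    for rho in costs:
        x = lam * rho
        if math.isinf(rho) or x > 700.0:  # guard against overflow in exp()
            probs.append(0.0)
        else:
            probs.append(1.0 / (1.0 + math.exp(x)))
    return probs

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) modification decision."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def total_entropy(costs, lam):
    return sum(binary_entropy(p) for p in modification_probs(costs, lam))

def solve_lambda(costs, m, iters=100):
    """Bisection on lam so that the total entropy equals the payload m (bits).

    Assumes positive finite costs; entropy then decreases as lam grows.
    """
    if total_entropy(costs, 0.0) < m:
        raise ValueError("payload exceeds the capacity of the usable elements")
    lo, hi = 0.0, 1.0
    while total_entropy(costs, hi) > m and hi < 1e12:
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_entropy(costs, mid) > m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def expected_distortion(costs, lam):
    """D = sum of pi * rho over elements that can be modified."""
    probs = modification_probs(costs, lam)
    return sum(p * rho for p, rho in zip(probs, costs) if p > 0.0)

costs = [1.0, 2.0, 4.0, math.inf]   # toy costs; the last element is never modified
lam = solve_lambda(costs, m=1.5)
print(lam, expected_distortion(costs, lam))
```

Elements with infinite cost receive zero modification probability, so they contribute neither to the payload nor to the distortion.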
In the data embedding phase, the elements of the cover image, the distortion function, and the secret data are input into STC (with constraint height h = 10) to generate the stego image. Let the secret data be α = [α_1, α_2, . . . , α_m]^T ∈ {0, 1}^m. Then, α can be embedded into c using Equation (4), where s = [s_1, s_2, . . . , s_{k×l}]^T denotes the elements of the stego image, and C(α) is the coset of α.
In the data extraction phase, the secret data α can be extracted directly from the stego elements s by the matrix computation in Equation (5), where H is a low-density parity-check matrix that is pre-defined according to the embedding speed, the embedding efficiency and the payload.
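The idea of embedding within a coset and extracting by a matrix computation can be illustrated with a toy example. The sketch below is not STC itself: it replaces the trellis (Viterbi) search of [14] with an exhaustive search that is only feasible for a handful of elements, and it assumes that the message is carried by the least significant bits of the pixels and that modification means flipping an LSB. The matrix H, the cover values and the costs are hypothetical.

```python
from itertools import product

def syndrome(H, bits):
    """Compute H * bits (mod 2), with H given as a list of rows."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]

def extract(H, stego):
    """Recover the message as the syndrome of the stego LSBs."""
    return syndrome(H, [s & 1 for s in stego])

def embed(cover, costs, H, alpha):
    """Exhaustive search for the cheapest set of LSB flips whose
    syndrome equals alpha (toy replacement for the STC trellis search)."""
    best_flips, best_cost = None, float("inf")
    for flips in product((0, 1), repeat=len(cover)):
        cost = sum(r for f, r in zip(flips, costs) if f)
        if cost >= best_cost:       # also rejects any flip of an infinite-cost element
            continue
        lsb = [(c & 1) ^ f for c, f in zip(cover, flips)]
        if syndrome(H, lsb) == alpha:
            best_flips, best_cost = flips, cost
    if best_flips is None:
        raise ValueError("message cannot be embedded with finite cost")
    # Flip an LSB by +1 on even values and -1 on odd values (stays in 0..255).
    return [c + (1 if c % 2 == 0 else -1) if f else c
            for c, f in zip(cover, best_flips)]

H = [[1, 0, 1, 1, 0, 1],
     [0, 1, 1, 0, 1, 1]]                          # hypothetical 2x6 parity-check matrix
cover = [12, 200, 37, 90, 255, 64]
costs = [5.0, 1.0, 2.5, 2.0, float("inf"), 4.0]   # element 4 must not change
alpha = [0, 1]
stego = embed(cover, costs, H, alpha)
print(stego, extract(H, stego))
```

The receiver needs only H to recover the message; the costs influence which elements the sender changes but play no part in extraction.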
The distortion function is therefore the most crucial factor in the STC-based data embedding framework. Many distortion functions exist, e.g. HUGO [16], WOW [17], S-UNIWARD [18], HILL [19], etc. These distortion functions aim to embed the secret data into complex areas as much as possible. Applied to iris images, such embedding may destroy iris features, which reduces the recognition performance of the iris authentication system. For this reason, in the next subsection, we design a novel distortion function for iris images.

The Distortion Function
To minimise the impact on the iris recognition caused by embedding, a new distortion function is proposed.
Since the edge area of the iris image contains no key iris features, the accuracy of the iris recognition system is seldom affected by data embedding in the edge area. This motivates us to incorporate edge detection into the design of the distortion function. Any edge detection algorithm can be adopted for this purpose. In our implementation, we adopt the Canny edge detector, which is a classic and fast edge detection algorithm.
The design of the distortion function consists of two main steps. First, the Canny edge detection algorithm [20] is employed to produce an initial embedding cost for each image element. Then the gradient of the image is exploited to adjust the embedding cost.
As shown in Algorithm 1, the bisection method is used to determine the threshold t_h of Canny edge detection adaptively, where q is a factor that adjusts the allowed range of the number of edge pixels and is set as 0.02 in the experiments.
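The bisection search of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `count_edges` is a hypothetical callable that runs an edge detector at threshold t and returns the number of edge pixels (in practice it would wrap a Canny implementation), the count is assumed to be non-increasing in t, and the `max_iter` guard is an addition that is not part of Algorithm 1.

```python
def find_edge_threshold(count_edges, m, q=0.02, max_iter=50):
    """Search for a threshold t_h in [0, 1] such that the number of edge
    pixels n satisfies m <= n <= (1 + q) * m, where m is the payload in bits."""
    t_min, t_max = 0.0, 1.0
    for _ in range(max_iter):
        t_h = (t_min + t_max) / 2.0
        n = count_edges(t_h)
        if m <= n <= (1.0 + q) * m:
            return t_h, n
        if n >= m:
            t_min = t_h   # too many edge pixels: raise the threshold
        else:
            t_max = t_h   # too few edge pixels: lower the threshold
    raise RuntimeError("no threshold found within max_iter iterations")

# Synthetic stand-in for an edge detector: fewer edges as the threshold grows.
def fake_count(t):
    return int(10000 * (1.0 - t))

t_h, n = find_edge_threshold(fake_count, m=3000)
print(t_h, n)
```

The window [m, (1 + q)m] keeps the edge area just large enough to carry the m message bits, so a smaller q confines the modifications more tightly at the cost of more search iterations.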
Algorithm 1 Obtaining threshold for edge detection
    Set t_min = 0, t_max = 1, n = 0;
    while m ≤ n ≤ (1 + q)m is not satisfied,
        t_h = (t_min + t_max)/2;
        Set the threshold of the Canny edge detection algorithm as t_h to update the number of edge pixels n;
        if n ≥ m, t_min = t_h;
        else t_max = t_h;
        end
    end

After the threshold t_h of Canny detection is obtained, the Canny edge detection algorithm is performed on the cover image to obtain the edge positions.
Next, the gradient values of the cover image are calculated, which reflect the magnitude of change in various directions. Denote the gradient of c(i, j) in the horizontal direction as G_x(i, j) and in the vertical direction as G_y(i, j). As shown in Figure 2, the three-neighbour average absolute gradient values G_h(i, j) and G_v(i, j) of c(i, j) in the horizontal and vertical directions are calculated using Equations (6) and (7), respectively. In natural images, the spatial correlation in the horizontal and vertical directions is significantly stronger than that in the diagonal direction. To reduce the computational complexity, we use only the horizontal and vertical directions to calculate the gradient values.
After G_h(i, j) and G_v(i, j), which characterise the complexity around c(i, j) in the horizontal and vertical directions, are obtained, a complexity value ζ(i, j) of c(i, j) is calculated using Equation (8).
A larger ζ(i, j) means a more complex texture, so the corresponding embedding cost should be smaller, since modifications at complex locations affect image quality only weakly. In other words, the embedding cost should be inversely proportional to ζ(i, j). Accordingly, the proposed distortion function assigns ρ(i, j) a finite cost that decreases with ζ(i, j) when c(i, j) ∈ E and c(i, j) ≠ 0, 255, and an infinite cost otherwise, where μ is a positive parameter of the cost that is set as 6 in the experiments and E is the set of edge positions obtained by the Canny edge detection algorithm described above. In STC-based data embedding, the value of a pixel may be changed by +1 or −1. In a grayscale iris image, the pixel values range from 0 to 255. To avoid overflow, all pixels with the value 0 or 255 have to be suspended, i.e. excluded from modification. Since the secret data can be correctly extracted by the matrix computation defined in Equation (5), it is unnecessary to suspend other pixels.
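The structure of this cost assignment can be sketched in Python. Only the case distinction is taken from the text (a finite cost on edge pixels whose value is neither 0 nor 255, an infinite cost everywhere else). The finite-cost expression `1 / (zeta + eps) ** mu` is a hypothetical stand-in chosen merely because it decreases with ζ; it should not be read as the paper's formula, and `eps` is an added guard against division by zero.

```python
import math

def embedding_costs(image, edge_mask, zeta, mu=6.0, eps=1e-6):
    """Assign a cost to every pixel of a grayscale image (lists of rows).

    image     : pixel values in 0..255
    edge_mask : True where the pixel belongs to the edge set E
    zeta      : complexity value of each pixel (larger = more textured)
    """
    costs = []
    for px_row, edge_row, zeta_row in zip(image, edge_mask, zeta):
        row = []
        for px, is_edge, z in zip(px_row, edge_row, zeta_row):
            if is_edge and 0 < px < 255:
                # ASSUMED form: any cost that decreases with zeta fits the text.
                row.append(1.0 / (z + eps) ** mu)
            else:
                # Non-edge pixels and saturated pixels are never modified.
                row.append(math.inf)
        costs.append(row)
    return costs

image = [[0, 100, 255],
         [50, 60, 70]]
edge_mask = [[True, True, True],
             [False, True, True]]
zeta = [[1.0, 2.0, 3.0],
        [1.0, 2.0, 4.0]]
costs = embedding_costs(image, edge_mask, zeta)
print(costs)
```

Pixels with infinite cost act as forbidden positions for the encoder, which is how the modifications are kept out of the non-edge area.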
The proposed distortion function assigns high embedding costs for the regions with important iris features.
Combined with STC, the impact of the embedding on iris recognition is thereby minimised.

EXPERIMENT RESULTS
The CASIA-IrisV2-device1 iris database [21] is used to measure the performance of the iris recognition system. We choose 60 classes with 10 images per class, giving a total of 600 BMP images of size 640×480 for the experiments. Figure 3(a) shows the embedding costs calculated for an iris image, marked in white. The brighter the mark, the smaller the embedding cost and the larger the modification probability, and vice versa. The black area is the non-edge area, in which the embedding cost is set to infinity. Figure 3(b) shows the iris region obtained by iris region detection. It can be seen that, with the proposed method, the modification positions are concentrated in the complex area rather than in the iris feature region. The impact on iris recognition performance is therefore reduced as far as possible.

Recognition Rate
In the authentication phase, the iris recognition system compares the query iris image with the corresponding iris image in the database to obtain the matching score. The larger the matching score, the higher the probability that the two images are from the same individual. We employ receiver operating characteristic (ROC) curves to measure the performance of iris recognition. The ROC curve plots the Successful Match Rate (SMR) against the False Accept Rate (FAR). The FAR is the proportion of matching scores among the false matches that are greater than or equal to the threshold τ set by the system. The value of τ is determined by the value of FAR, where

$$\mathrm{FAR} = \frac{\left|\{\, s_{ij} \mid s_{ij} \geq \tau,\ s_{ij} \in M \,\}\right|}{|M|},$$

in which s_ij denotes the matching score between the first samples of different individuals, M denotes the set of all such scores, and | · | denotes the number of elements satisfying the condition. The SMR is the proportion of scores among the genuine matches that are greater than or equal to τ, where

$$\mathrm{SMR} = \frac{\left|\{\, s_{ij} \mid s_{ij} \geq \tau,\ s_{ij} \in N \,\}\right|}{|N|},$$

in which s_ij denotes the matching score between samples of the same individual and N denotes the set of these scores.
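The two rates defined above can be computed directly from lists of matching scores. The following minimal Python sketch assumes that τ is chosen as the smallest observed impostor score whose FAR does not exceed a target value; this selection rule, the function names and the toy scores are illustrative assumptions.

```python
def rate_at_threshold(scores, tau):
    """Proportion of scores that are greater than or equal to tau."""
    return sum(1 for s in scores if s >= tau) / len(scores)

def threshold_for_far(impostor_scores, target_far):
    """Smallest candidate tau (taken from the impostor scores) whose
    FAR is at most target_far; infinity if no candidate qualifies."""
    for tau in sorted(set(impostor_scores)):
        if rate_at_threshold(impostor_scores, tau) <= target_far:
            return tau
    return float("inf")

# Toy scores: impostor = different individuals, genuine = same individual.
impostor = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
genuine = [85, 95, 99, 60]

tau = threshold_for_far(impostor, target_far=0.2)
far = rate_at_threshold(impostor, tau)
smr = rate_at_threshold(genuine, tau)
print(tau, far, smr)
```

Sweeping `target_far` over a range of values and recording the resulting SMR yields the points of an ROC curve.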
We use a pseudo-random number generator to simulate the personal privacy data for embedding. The iris images without personal data are used as queries to match the corresponding images in the database to obtain the matching scores. The proposed method, the STC-based embedding algorithms HILL [19], S-UNIWARD [18] and HUGO [16], and the non-STC-based embedding algorithms EMD [22] and matrix embedding [23] are used to embed the data, and the resulting ROC curves are shown in Figure 4. For comparison, the original ROC curve without any embedding operation is also plotted. It can be seen that the proposed method incurs only a slight drop in the recognition rate. Compared with the non-STC-based embedding algorithms, the STC-based embedding algorithms achieve a better recognition rate since the image content is taken into account. Furthermore, the proposed method achieves the best recognition rate among all the embedding algorithms, since the proposed distortion function is designed with the recognition rate in mind.
The curve of SMR against the payload is shown in Figure 5 with FAR = 10^{-4}. It can be seen that the proposed method achieves a better recognition rate in all cases. When the payload is large enough, e.g. 0.2 bpp, the personal data may be embedded into the iris feature region, which decreases the iris recognition accuracy. In this case, the advantage of the proposed method is no longer obvious. For this reason, the recognition rates of the different methods are similar when the payload is larger than 0.2 bpp.

CONCLUSION
This paper proposes a data hiding method for iris images.
To minimise the impact on iris recognition, the embedding modifications are restricted to the regions without important iris features with the help of the proposed distortion function and the STC framework. Once the personal privacy data is embedded in the iris images, the privacy of users is protected and no additional storage space is needed. Experimental results show that we are able to embed sufficient personal data into the iris image while maintaining high recognition accuracy. In future work, we will focus on data hiding methods for other biometric images.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the authors.