Evaluation of colour space effect on estimation accuracy of hyperspectral image by dimension extension based on RGB image

Recently, the use of hyperspectral images containing several hundred wavelengths of information has been increasing in various fields. If a hyperspectral image could be estimated from a low-cost RGB image, which has only R, G, and B wavelength information, without using a hyperspectral camera, it would be useful in many of these fields. Herein, we propose a hyperspectral image estimation method based on RGB images, wherein the RGB components and the YUV colour space information calculated from the RGB are applied to a neural network for tuning, and the hyperspectral image is estimated by inputting the output of the tuning neural network to the decoding function of a trained autoencoder. To evaluate the estimation accuracy of hyperspectral images based on differences in the combination of RGB and colour space models, we conducted validity experiments for the estimation of hyperspectral images in three scenarios with different colour spaces: RGB and YUV, RGB and HSV, and RGB only. The results showed that the scenario with the RGB and YUV colour spaces exhibited the highest estimation accuracy among the three scenarios, with an average similarity of 0.913 across all wavelengths; thus, the validity of the proposed method as an estimation method for hyperspectral images was verified.


Introduction
Recently, hyperspectral images have been receiving increased attention as they contain several hundred wavelengths of information. Compared with RGB images, which contain three-wavelength information, hyperspectral images cover a wide range of wavelengths; thus, it is possible to recognize and classify certain phenomena with high accuracy using hyperspectral images. Studies utilizing hyperspectral images instead of RGB images have been conducted in various fields. In [1][2][3], certain diseases or stress were diagnosed noninvasively using hyperspectral images in the medical field. In [4][5][6][7], objects or targets that cannot be recognized in RGB images were recognized and tracked using hyperspectral images in the security control field. In [8][9][10], analysis systems for soil or plants, intended for environmental protection or agricultural applications, were designed using hyperspectral images of land, plants, or crops. It should be noted that hyperspectral cameras are required for capturing hyperspectral images, and they cost approximately 5-10 million yen. Hence, it is difficult to deploy hyperspectral cameras widely in the field. Therefore, if a hyperspectral image could be estimated from a low-cost RGB image, which has only R, G, and B wavelength information, without using a hyperspectral camera, it would be useful in various fields.
To extend the low-dimensional information of images to high-dimensional information, several studies have been conducted on colourization, in which RGB images are estimated by extending the one dimension of a monochrome image to the three dimensions of an RGB image. In [11][12][13][14][15][16][17], high-quality colourization methods were proposed using neural networks. In [18,19], colourization methods using generative adversarial networks were proposed and showed high accuracy. It is extremely difficult to apply the methods in these studies directly to the estimation of hyperspectral images, with wavelength information in several hundred dimensions, from RGB images with three wavelengths. Additionally, the computational cost of training and the number of training datasets would be very large if a colourization method, which performs a dimension expansion from one dimension to three, were adapted to perform a dimension expansion from three dimensions to several hundred, because the rate of dimensional expansion is more than a hundred times greater. To accelerate learning, the compressed values of the high-dimensional data in hyperspectral images can be used as feature values. Because closed-form equations for compressing high-dimensional hyperspectral images are unknown, we utilize machine learning methods (i.e. a neural network and an autoencoder) to generate models that transform RGB images into the desired compressed values. To ensure the effective training of the machine learning models, we add additional information to supplement the RGB components, which are insufficient as input values for estimating the high-dimensional wavelengths of hyperspectral images. As additional information, we focus on two colour spaces with different perspectives and information independent of the RGB space, namely a colour space with luminance and colour difference signals, and a colour space with hue, saturation, and lightness.
YUV colour space information has been utilized in image encoding strategies as an alternative to common image compression schemes [20] and in an effective content-aware chroma reconstruction method for screen content images [21]. HSV colour space information has been utilized for skin colour enhancement to make skin colours displayed on large-screen flat-panel TVs consistent [22] and in an image defogging algorithm with colour correction for video processing [23]. Furthermore, we previously proposed an estimation method for hyperspectral images with 462-dimensional wavelength information using neural networks, starting from the three-dimensional wavelength information of RGB images [24]. In [24], a high-dimensional wavelength image was estimated by applying not only the RGB information but also the HSV colour space information calculated from the RGB to the neural network. In addition, the estimation accuracies under light sources with different characteristics in the image capturing environment were compared.
In this study, we propose a method for the estimation of hyperspectral images based on an RGB image and the YUV colour space information calculated from the RGB, utilizing a neural network and the decoding function of an autoencoder. In the learning phase, images captured by a hyperspectral camera are learned. Next, a hyperspectral image is estimated from an RGB image utilizing the neural network and the autoencoder's decoding function; a hyperspectral camera is not required in the estimation phase. We then evaluate the effect of the colour space on the estimation accuracy of the hyperspectral image due to differences in the combination of RGB and colour space models. Figure 1 shows the architecture of the proposed method for estimating a hyperspectral image from an RGB image. The proposed method mainly consists of a tuning part and a dimension extension part. The tuning and dimension extension parts are implemented utilizing a neural network model and the decoding function of the autoencoder, respectively, and both parts are trained in advance in the learning phase, as shown in Figure 1(a). The autoencoder in the dimension extension part is trained on hyperspectral images; then, the neural network in the tuning part is trained by applying the middle layer of the trained autoencoder and the feature vector containing the RGB components and YUV colour space information.

Proposed method
In the estimation phase shown in Figure 1(b), a hyperspectral image is estimated by applying a feature vector containing the RGB components and YUV information of an RGB image to the trained tuning neural network and the decoder of the autoencoder for dimension extension.

Defined variables
Herein, we define variables related to image information, hyperspectral image, autoencoder and neural network, YUV colour space, and HSV colour space as shown in Table 1.

Dimension extension part based on the autoencoder
The dimension extension part was implemented using the decoding function of the autoencoder, as shown in the dimension extension part of Figure 1(a). To obtain the decoding function, the autoencoder was trained using the N_I hyperspectral images with an image size of N_x × N_y. In the nth hyperspectral image for training, an arbitrary pixel (i, j) has a brightness value I^{t_n}_λ(i, j) with 256 levels for each index λ (= 1, 2, . . . , K) corresponding to a wavelength. In the autoencoder, both the input and output layers have K units corresponding to the brightness values I^{t_n}_1(i, j), I^{t_n}_2(i, j), . . . , I^{t_n}_K(i, j); M (< K) units Q_m (m = 1, 2, . . . , M) were arranged in the middle layer, as shown in Figure 1. Because N_I images with an image size of N_x × N_y were used for training, the autoencoder was trained on N_I × N_x × N_y pixels in total. From the training, we obtained an encoder, which compresses the K-dimensional brightness values I^{t_n}_λ(i, j) of each pixel to the M-dimensional units Q_m, and a decoder, which extends the M-dimensional units Q_m to the K-dimensional brightness values I^{t_n}_λ(i, j). Let F_decode(Q_m) be the decoding function of the decoder, which outputs the K-dimensional brightness values I^{t_n}_λ(i, j) from the M-dimensional units Q_m.
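As a rough illustration of the dimension extension part, the following sketch builds a single-hidden-layer autoencoder with the paper's dimensions (K = 462 wavelengths, M = 6 middle-layer units). The weights are untrained and the sigmoid activation and layer count are assumptions, since the paper specifies only the layer sizes.

```python
import numpy as np

K, M = 462, 6  # wavelengths and middle-layer units (values from the paper)
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical autoencoder weights (untrained; for shape illustration only)
W_enc = rng.normal(0, 0.1, (K, M)); b_enc = np.zeros(M)
W_dec = rng.normal(0, 0.1, (M, K)); b_dec = np.zeros(K)

def encode(spectrum):
    """K-dimensional brightness values of one pixel -> M-dimensional code Q_m."""
    return sigmoid(spectrum @ W_enc + b_enc)

def F_decode(Q):
    """M-dimensional code Q_m -> K-dimensional spectrum (decoder of the autoencoder)."""
    return sigmoid(Q @ W_dec + b_dec)

pixel = rng.integers(0, 256, K) / 255.0  # one pixel's 462 brightness values in [0, 1]
Q = encode(pixel)
reconstructed = F_decode(Q)
print(Q.shape, reconstructed.shape)  # (6,) (462,)
```

After training on the N_I × N_x × N_y pixels, F_decode would reproduce a 462-dimensional spectrum from the 6-dimensional code Q_m.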

Tuning part based on the neural network
To estimate the hyperspectral image I^{t_n}_λ(i, j) from an RGB image, a tuning neural network that creates Q_m from the RGB components was trained, as shown in the tuning part of Figure 1(a). For an arbitrary pixel (i, j) of a training hyperspectral image, the brightness values I^{t_n}_{λ_R}(i, j), I^{t_n}_{λ_G}(i, j), and I^{t_n}_{λ_B}(i, j) represent the brightness of the R, G, and B components, respectively. Hence, I^{t_n}_{λ_R}(i, j), I^{t_n}_{λ_G}(i, j), and I^{t_n}_{λ_B}(i, j) were used to train the tuning neural network as the RGB components. In the tuning neural network, the feature vector Z^{t_n}(i, j) for the input layer consisted of the RGB components and the YUV colour space information, computed from the gamma-corrected RGB values as defined by the International Telecommunication Union:

Y^{t_n}(i, j) = 0.299 R + 0.587 G + 0.114 B,
U^{t_n}(i, j) = 0.492 (B − Y^{t_n}(i, j)),
V^{t_n}(i, j) = 0.877 (R − Y^{t_n}(i, j)),

where R, G, and B denote the gamma-corrected values of I^{t_n}_{λ_R}(i, j), I^{t_n}_{λ_G}(i, j), and I^{t_n}_{λ_B}(i, j). The gamma value (γ = 2.2) for gamma correction was taken from [25].
Therefore, the number of units in the input layer was six, one for each element of the feature vector Z^{t_n}(i, j). N_I × N_x × N_y feature vectors Z^{t_n}(i, j) were obtained from the hyperspectral images for training. The tuning neural network was trained using the set of Z^{t_n}(i, j) and Q_m. From the results of the training, a tuning function F_tune{Z^{t_n}(i, j)}, which outputs Q_m when the feature vector Z^{t_n}(i, j) is applied, was obtained.

Table 1. Defined variables.

K : Number of discretized wavelengths; also the number of units in the input and output layers of the autoencoder
λ : Index number corresponding to each wavelength
λ_R, λ_G, λ_B : Index numbers corresponding to the wavelengths of the R, G, and B components of the RGB images
λ_low, λ_high : First and last wavelength numbers λ for each wavelength band (B1)-(B8) and the full wavelength band defined in Table 2
I^{t_n}_λ(i, j) : Brightness value of an arbitrary pixel (i, j) with wavelength index λ in the nth hyperspectral image for training
I^e_λ(i, j), Î^e_λ(i, j) : Brightness values of an arbitrary pixel (i, j) with wavelength index λ of the reference and estimated hyperspectral images in the estimation phase
Z^{t_n}(i, j) : Feature vector input to the tuning neural network in the learning phase
Z^e(i, j) : Feature vector input to the tuning neural network in the estimation phase
Y^{t_n}(i, j), U^{t_n}(i, j), V^{t_n}(i, j) : Components of Y (luminance signal), U (colour difference signal representing the blue-yellow component), and V (colour difference signal representing the red-green component) in the YUV colour space for the learning phase
Y^e(i, j), U^e(i, j), V^e(i, j) : Components of Y, U, and V in the YUV colour space for the estimation phase
U^{t_n}_h(i, j), U^{t_n}_s(i, j), U^{t_n}_v(i, j) : Components of H (hue), S (saturation), and V (value) in the HSV colour space for the learning phase
U^e_h(i, j), U^e_s(i, j), U^e_v(i, j) : Components of H, S, and V in the HSV colour space for the estimation phase
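The six-element feature vector described above can be sketched as follows. The ITU-R BT.601 luma/chroma coefficients and the placement of the gamma correction before the YUV conversion are assumptions, as the paper cites the ITU definition but the exact equations are not reproduced here.

```python
GAMMA = 2.2  # gamma value from the paper

def rgb_to_feature_vector(r, g, b):
    """Build the 6-element feature vector [R, G, B, Y, U, V] for one pixel.

    Assumes 8-bit inputs, gamma correction applied before the YUV
    conversion, and ITU-R BT.601 luma/chroma coefficients.
    """
    # Normalize to [0, 1] and gamma-correct (assumed placement of the correction)
    rp, gp, bp = ((v / 255.0) ** (1.0 / GAMMA) for v in (r, g, b))
    y = 0.299 * rp + 0.587 * gp + 0.114 * bp  # luminance signal Y
    u = 0.492 * (bp - y)                      # blue-yellow colour difference U
    v = 0.877 * (rp - y)                      # red-green colour difference V
    return [r / 255.0, g / 255.0, b / 255.0, y, u, v]

z = rgb_to_feature_vector(200, 150, 100)
print(len(z))  # 6
```

For a neutral grey pixel the colour difference signals U and V vanish, leaving only the luminance Y, which is the property that makes YUV information complementary to the RGB components.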

Estimation phase
In the estimation phase, the hyperspectral image was estimated pixel by pixel, as shown in Figure 1(b). For an arbitrary pixel (i, j), a pixel Î^e_λ(i, j) comprising K brightness values corresponding to each λ (= 1, 2, . . . , K) was estimated from the brightness values I^e_{λ_R}(i, j), I^e_{λ_G}(i, j), and I^e_{λ_B}(i, j) of the RGB image, from which the feature vector Z^e(i, j) was obtained. Then, the feature vector Z^e(i, j) was input to the tuning function as F_tune{Z^e(i, j)} to obtain Q̂_m (m = 1, 2, 3, . . . , M). Q̂_m was applied to the decoding function as F_decode(Q̂_m); then, the hyperspectral image Î^e_λ(i, j) at pixel (i, j), comprising K brightness values corresponding to each λ (= 1, 2, . . . , K), was estimated. Finally, the entire hyperspectral image was estimated by repeating the above procedure for all pixels (i = 1, 2, . . . , N_x ; j = 1, 2, . . . , N_y).
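A minimal sketch of the per-pixel estimation loop follows, with random untrained weights standing in for the trained tuning network F_tune and decoder F_decode, and a tiny 4 × 4 image standing in for the paper's 1,000 × 1,600 images.

```python
import numpy as np

K, M = 462, 6   # wavelengths and middle-layer units (paper values)
Nx, Ny = 4, 4   # tiny image for illustration (the paper uses 1,000 x 1,600)

rng = np.random.default_rng(1)
# Stand-ins for the trained tuning network and decoder (random, untrained weights)
W_tune = rng.normal(0, 0.1, (6, M))
W_dec = rng.normal(0, 0.1, (M, K))

def F_tune(z):
    """Feature vector Z^e(i, j) of shape (6,) -> code Q of shape (M,)."""
    return 1 / (1 + np.exp(-(z @ W_tune)))

def F_decode(Q):
    """Code Q of shape (M,) -> estimated spectrum of shape (K,)."""
    return 1 / (1 + np.exp(-(Q @ W_dec)))

def estimate_hyperspectral(features):
    """features: (Nx, Ny, 6) array of per-pixel feature vectors.
    Returns the estimated hyperspectral cube of shape (Nx, Ny, K)."""
    cube = np.empty((Nx, Ny, K))
    for i in range(Nx):
        for j in range(Ny):
            cube[i, j] = F_decode(F_tune(features[i, j]))
    return cube

features = rng.random((Nx, Ny, 6))
cube = estimate_hyperspectral(features)
print(cube.shape)  # (4, 4, 462)
```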

Validity experiment
In the validity experiment, we verified how accurately hyperspectral images are estimated by the proposed method. We also compared the estimation accuracies of the hyperspectral image among three scenarios with different combinations of RGB components and two colour space models. Figure 2 shows the entire experimental system used in the validity experiment. The experimental system comprises a hyperspectral camera (RESONON, Pika RC2) to capture hyperspectral images and a PC to control the hyperspectral camera. The hyperspectral images contained 462 (K = 462) wavelengths, ranging from 398.67 nm (λ = 1) to 1016.78 nm (λ = 462) with a wavelength resolution of 1.34 nm. Each image corresponding to a wavelength level of a hyperspectral image comprised 1,600,000 pixels (N_x × N_y = 1,000 × 1,600). The object to be captured by the hyperspectral camera was set at a distance of 0.15 m from the camera lens. To irradiate the object with a light source containing a wide spread of wavelengths, the experiment was conducted under sunlight, as shown in Figure 2. The RGB images were obtained by extracting three components from the 462 wavelengths of the captured hyperspectral images, because capturing the RGB and hyperspectral images with different cameras would change the perspective between them and could prevent an accurate evaluation of the estimation results. The number of pixels in each of the R, G, and B channel images is 1,600,000.

Experimental procedures
As objects to be captured for the images used to verify the proposed method, we used a tomato, banana, and eggplant to cover a wide range of wavelengths. Figure 3 shows the RGB images with the three RGB components (λ_R = 183 (640.01 nm), λ_G = 115 (549.43 nm), λ_B = 48 (460.66 nm)) extracted from the 462 wavelengths of the captured hyperspectral images used for training and testing. In this experiment, three images were used in the learning phase (N_I = 3), and one image was applied as a test image for the estimation phase. The images in Figure 3 are RGB, but the hyperspectral images corresponding to each RGB image were also captured simultaneously.
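Extracting the three RGB components from a captured hyperspectral cube can be sketched as follows, assuming the cube is stored as an (N_x, N_y, 462) array and using the band indices quoted above (1-based, as in the paper).

```python
import numpy as np

# Wavelength indices used in the paper for the R, G, and B channels (1-based)
LAMBDA_R, LAMBDA_G, LAMBDA_B = 183, 115, 48  # 640.01, 549.43, 460.66 nm

def extract_rgb(cube):
    """cube: (Nx, Ny, 462) hyperspectral array; returns an (Nx, Ny, 3) RGB image.
    The paper's indices are 1-based, hence the -1 offsets."""
    return np.stack([cube[:, :, LAMBDA_R - 1],
                     cube[:, :, LAMBDA_G - 1],
                     cube[:, :, LAMBDA_B - 1]], axis=-1)

cube = np.random.default_rng(0).integers(0, 256, (8, 8, 462))
rgb = extract_rgb(cube)
print(rgb.shape)  # (8, 8, 3)
```

Because the RGB image is sliced out of the same cube, every RGB pixel is exactly co-registered with its reference spectrum, which is what makes the per-wavelength evaluation possible.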
The number of units in the middle layer M for the autoencoder was set to six. To compare the estimation accuracies of the hyperspectral image among different combinations of RGB components and two models of colour spaces that are input to the neural network in the tuning part, the following three scenarios with different feature vectors Z t n (i, j), and Z e (i, j) were applied to estimate the hyperspectral image.

1st scenario
In this scenario, a hyperspectral image was estimated via the proposed method utilizing the RGB and YUV colour space. Hence, the feature vectors were used in the learning and estimation phases, as mentioned in the proposed method.

2nd scenario
In this scenario, the RGB and HSV colour spaces were applied to estimate the hyperspectral image. Hence, in the learning phase, the feature vector Z^{t_n}(i, j) consisted of the RGB components and the HSV colour space information, and it was applied to the input layer of the tuning neural network. The HSV colour space information U^{t_n}_h(i, j), U^{t_n}_s(i, j), and U^{t_n}_v(i, j) at pixel (i, j) was calculated from the RGB components using the standard RGB-to-HSV conversion: with max and min denoting the maximum and minimum of the three RGB brightness values, U^{t_n}_v(i, j) = max, U^{t_n}_s(i, j) = (max − min)/max, and U^{t_n}_h(i, j) is determined piecewise according to which of the R, G, and B components is the maximum.
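A sketch of building the 2nd scenario's feature vector, using Python's standard colorsys conversion as a stand-in for the paper's exact HSV equations (note that colorsys returns hue normalized to [0, 1) rather than in degrees):

```python
import colorsys

def rgb_to_hsv_feature(r, g, b):
    """6-element feature vector [R, G, B, H, S, V] for the 2nd scenario.
    Uses the standard RGB->HSV conversion from Python's colorsys module as
    a stand-in for the paper's equations; inputs are 8-bit brightness values."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
    h, s, v = colorsys.rgb_to_hsv(rn, gn, bn)
    return [rn, gn, bn, h, s, v]

z = rgb_to_hsv_feature(255, 0, 0)  # pure red
print(z[3:])  # [0.0, 1.0, 1.0]
```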

3rd scenario
In this scenario, only the RGB components were applied to the input layer of the tuning neural network. Hence, the feature vectors Z^{t_n}(i, j) = [I^{t_n}_{λ_R}(i, j), I^{t_n}_{λ_G}(i, j), I^{t_n}_{λ_B}(i, j)] and Z^e(i, j) = [I^e_{λ_R}(i, j), I^e_{λ_G}(i, j), I^e_{λ_B}(i, j)] were applied in the learning and estimation phases, respectively, to estimate the hyperspectral image Î^e_λ(i, j).

Evaluation method
The test image had an RGB image corresponding to I^e_{λ_R}(i, j), I^e_{λ_G}(i, j), and I^e_{λ_B}(i, j) and a hyperspectral image I^e_λ(i, j), which serves as a reference for the estimated hyperspectral image Î^e_λ(i, j). Therefore, in this experiment, we defined the similarity S_λ as the cosine similarity. Cosine similarity has been widely used in machine learning applications such as natural language processing and image processing. Mathematically, cosine similarity is similar to the Euclidean distance in that both methods aim to quantify the similarity between two vectors. The main difference between the two is that the Euclidean distance considers both the magnitudes and directions of the two vectors, whereas cosine similarity considers only their directions. In the context of this study, the magnitude of a vector corresponds to the brightness of an image. Because we are more interested in the relative brightness between pixels, we adopted cosine similarity as the evaluation index. The similarity S_λ is calculated for every wavelength λ between the estimated hyperspectral image Î^e_λ(i, j) and the reference hyperspectral image I^e_λ(i, j) as

S_λ = Σ_{i,j} Î^e_λ(i, j) I^e_λ(i, j) / ( √(Σ_{i,j} Î^e_λ(i, j)²) √(Σ_{i,j} I^e_λ(i, j)²) ).

Because the similarity S_λ is calculated for each wavelength, we obtained 462 similarities (S_1, S_2, . . . , S_462) in total. The accuracy of the method was evaluated for eight wavelength bands (B1-B8) and the full wavelength band. For each band, we calculated an evaluation index S̄(λ_low, λ_high), defined as the average of the similarities from S_{λ_low} to S_{λ_high}:

S̄(λ_low, λ_high) = (1 / (λ_high − λ_low + 1)) Σ_{λ=λ_low}^{λ_high} S_λ.

Finally, the S̄(λ_low, λ_high) values for each band were calculated for all three scenarios. We compared the estimation accuracies of the hyperspectral image among the three scenarios with different combinations of RGB components and the two colour space models. Figure 4 shows the results for the wavelength 529.51 nm (λ = 100), which corresponds to the green wavelength band (B4).
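The two evaluation quantities above, the per-wavelength cosine similarity and the band average, can be sketched as follows, assuming the images are stored as (N_x, N_y, K) arrays:

```python
import numpy as np

def similarity_per_wavelength(est, ref):
    """est, ref: hyperspectral cubes of shape (Nx, Ny, K).
    Returns S_lambda, the cosine similarity between the estimated and
    reference images, computed independently for each of the K wavelengths."""
    a = est.reshape(-1, est.shape[-1]).astype(float)  # (Nx*Ny, K)
    b = ref.reshape(-1, ref.shape[-1]).astype(float)
    num = (a * b).sum(axis=0)
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return num / den

def band_index(S, lam_low, lam_high):
    """Average similarity over a wavelength band (1-based inclusive indices)."""
    return S[lam_low - 1:lam_high].mean()

rng = np.random.default_rng(0)
ref = rng.random((8, 8, 462))
S = similarity_per_wavelength(ref, ref)  # identical images -> similarity 1
print(round(band_index(S, 1, 462), 3))   # 1.0
```

Because both the numerator and denominator scale linearly with brightness, multiplying an estimated image by a constant leaves S_λ unchanged, which is exactly the insensitivity to absolute brightness that motivated the choice of cosine similarity.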
Figure 4(a) shows the reference image I^e_100(i, j), and Figure 4(b-d) shows the estimated images Î^e_100(i, j) obtained in the 1st, 2nd, and 3rd scenarios, respectively. Comparing the estimated images Î^e_100(i, j) of the three scenarios, the image in the 3rd scenario (Figure 4(d)), in which only the RGB components were applied, was estimated to be darker than those of the other scenarios and the reference image I^e_100(i, j). The image in the 2nd scenario (Figure 4(c)), in which the RGB and HSV colour space information were applied, was estimated to be brighter than those of the other scenarios and the reference image I^e_100(i, j). Among the three scenarios, the 1st scenario (Figure 4(b)), in which the RGB and YUV colour space information were applied, was estimated the most precisely. The similarities S_100 of Figure 4(b-d) with Figure 4(a) were 0.985, 0.717, and 0.777, respectively. Thus, the 1st scenario exhibited the highest accuracy among the three scenarios at the wavelength of 529.51 nm. Figure 5 shows the results for the wavelength 959.54 nm (λ = 420), which corresponds to the infrared band (B8). Figure 5(a) shows the reference image I^e_420(i, j), and Figure 5(b-d) shows the estimated images Î^e_420(i, j) obtained in the 1st, 2nd, and 3rd scenarios, respectively. Comparing the estimated images Î^e_420(i, j) of the three scenarios, the image in the 2nd scenario (Figure 5(c)), in which the RGB and HSV colour space information were applied, was estimated to be brighter than those of the other scenarios and the reference image I^e_420(i, j). The images in the 1st and 3rd scenarios (Figure 5(b, d)), in which the RGB and YUV colour space information and the RGB components alone were applied, respectively, were estimated to be darker than that of the 2nd scenario but brighter than the reference image I^e_420(i, j) shown in Figure 5(a). The similarities S_420 of Figure 5(b-d) with Figure 5(a) were 0.803, 0.810, and 0.796, respectively.
The similarity S is a cosine similarity, which considers the directions of two vectors, corresponding in this study to the relative brightness of the images. This is why the similarity S_420 in Figure 5(c) is higher than that in Figure 5(b), even though the estimated image in Figure 5(c) differs more noticeably from the reference image in Figure 5(a) than that in Figure 5(b). Because the wavelength of 959.54 nm is in the infrared band, the objects in the image I^e_420(i, j) of Figure 5(a) were darker, and the edge between the objects and the background was not clear. In contrast, the objects in the estimated images presented in Figure 5(b-d) were clearer. In addition, the eggplant in the reference image in Figure 5(a) was whitish, but in Figure 5(b-d), the eggplant in each Î^e_420(i, j) was estimated to be black, as in the visible light band images. Hence, comparing the similarities S_420 and S_100 for each scenario, all S_420 values were lower than the S_100 values. Table 3 shows the evaluation indices S̄(λ_low, λ_high) for each wavelength band (B1)-(B8) and for all 462 wavelengths, S̄(1, 462), in the 1st, 2nd, and 3rd scenarios, respectively.

Results of evaluation indices for each scenario
The results for the 1st scenario exhibited the highest accuracy. In the visible light bands, from the (B1) purple band to the (B7) red band, the evaluation indices S̄(λ_low, λ_high) of the 1st scenario were the highest among the three scenarios. The evaluation index S̄(1, 462) for all 462 wavelengths was also the highest, at 0.913, in the 1st scenario. Regarding the (B8) infrared band, S̄(287, 462) for the three scenarios was 0.838, 0.852, and 0.831, respectively, and there were no significant differences.
From Table 3, it was confirmed that applying the RGB components and YUV colour space information to the input layer of the tuning neural network yields a higher estimation accuracy of the hyperspectral image in the visible light bands than applying the RGB components and HSV colour space information, or the RGB components alone.

Discussions
The results of the validity experiments showed that the 1st scenario, in which the RGB and YUV colour space information were applied, had the highest accuracy from a comprehensive perspective: the evaluation indices over the full wavelength band, S̄(1, 462), were 0.913, 0.812, and 0.811 in the 1st, 2nd, and 3rd scenarios, respectively, as listed in Table 3. Among the wavelength bands, there was a large difference in the indices for the visible light bands, that is, from the purple band (B1) to the red band (B7). In the visible light bands, the evaluation indices in the 1st scenario were all above 0.9, whereas all evaluation indices of (B1)-(B7) in the 2nd and 3rd scenarios were less than 0.9. The YUV colour space has been utilized in television broadcasting for high-quality moving images. Because the YUV colour space carries a wide-range luminance signal Y, differences appear between YUV components even for objects that are represented similarly by their RGB components. Thus, the input of the YUV colour space as additional information, which has characteristics different from RGB, played a major role in training the tuning neural network and is considered to be a primary factor in the improved accuracy. In contrast, the HSV colour space information, which is devised based on the intuitive human sense of colour recognition, was not suitable for training and did not improve the accuracy in the 2nd scenario.

Figure 6. RGB image of a colour checker derived from the 462 wavelengths in the captured hyperspectral images.
In the infrared band, there was no significant difference in the evaluation indices among the scenarios, that is, 0.838 in the 1st scenario, 0.852 in the 2nd scenario, and 0.811 in the 3rd scenario. The reason the YUV and HSV colour space information could not improve the similarities in the infrared band is that both colour spaces are essentially designed for visible light wavelengths and not for infrared wavelengths.
In addition, in the scenarios with six input layer units in the tuning neural network (the 1st and 2nd scenarios), the evaluation indices in the infrared band were slightly higher than in the scenario with three input layer units (the 3rd scenario).
In this experiment, the three images of a tomato, banana, and eggplant were used as training data, and the estimation accuracy was verified using images of the same vegetables as testing data. To further verify the method, an image of a different object was input as testing data into the learning model used in the 1st scenario. Specifically, the RGB and YUV colour space information of an image of a colour checker (see Figure 6) was input into the same models used in the 1st scenario (4th scenario). Table 4 lists the evaluation indices S̄(λ_low, λ_high) for each wavelength band from (B1) to (B8) and for all 462 wavelengths, S̄(1, 462), in the 1st and 4th scenarios, respectively.
All similarities in the eight wavelength bands and the full wavelength band in the 4th scenario are lower than those in the 1st scenario, in which the image shown in Figure 3 was input as testing data, as shown in Table 3. Therefore, more accurate estimation results can be obtained by learning from images of objects that are closer in colour and material to the objects used as testing data.

Conclusions
In this study, we proposed a hyperspectral image estimation method based on RGB images. In the proposed method, the RGB components and the YUV colour space information calculated from the RGB were applied to the tuning function provided by the tuning neural network. The hyperspectral image was estimated by inputting the output of the tuning function to the decoding function of the trained autoencoder, which functioned as a dimension extension. To evaluate the estimation accuracy for different combinations of RGB and colour space models, we conducted validity experiments for the estimation of hyperspectral images in three scenarios with different colour spaces: RGB and YUV, RGB and HSV, and RGB only. The results showed that the scenario with the RGB and YUV colour space information had the highest estimation accuracy among the three scenarios, with an average similarity of 0.913 across all wavelengths; thus, the validity of the proposed method as an estimation method for hyperspectral images was verified. By using the YUV colour space information, which corresponds to visible light wavelengths, the similarity in the visible light bands was improved compared with our previous work [24]. In a future study, we will conduct validity experiments to increase the estimation accuracy in the infrared band by utilizing other colour spaces that are suited to the characteristics of images at infrared wavelengths instead of the YUV colour space information. Furthermore, we will validate a set of appropriate parameters for the proposed method. Additionally, to construct a model with greater generalizability, we must examine the choice of learning data for generating such a model.

Notes on contributors
Ryoji Sato received the B.E. degree from Aoyama Gakuin University, in 2021. He is currently pursuing the master's degree from the Graduate School of Science and Engineering, Aoyama Gakuin University. His research interests include image estimation and machine learning.