Effective detection by fusing visible and infrared images of targets for Unmanned Surface Vehicles

ABSTRACT Research on the Unmanned Surface Vehicle (USV) is of great significance to human offshore operations, and target detection is the foundation of USV applications. Ocean waves, fog, and illumination are the main factors that degrade the accuracy of target detection in visible and infrared images. This paper proposes an algorithm for weighted averaging fusion of visible and infrared images. Firstly, visible light and infrared devices collect information about the target's surroundings; feature analysis is performed, and anti-fog and de-noising preprocessing is completed. These operations aim at improving the accuracy of image segmentation. Secondly, features are extracted from the visible and infrared target images separately, and the target in each image is recognized. Finally, the targets detected in the visible light and infrared images are fused by weighted averaging. The fusion uses a matching matrix to represent the similarity of the two images. When the two images are very similar, they are fused by weighting pixels, which effectively improves the accuracy of the detection. Extensive simulations were conducted in MATLAB R2015a on a personal computer, and the results verified the success rate of target detection and recognition.


Introduction
The Unmanned Surface Vehicle (USV) is a kind of ocean surface vehicle. It has the advantages of high speed, small size and autonomy. The USV is primarily used to perform tasks that are dangerous or unsuitable for manned vehicles, for example, reconnaissance, search and rescue, and navigation. To enhance the autonomy of the USV on the ocean surface, marine targets must be detected and identified successfully. Many methods for surface target detection have been implemented using visible light, infrared and radar images. Based on an analysis of image characteristics, visible and infrared target detection and identification approaches are adopted in this paper. Fusing the visible and infrared target images provides the USV with accurate inputs for its subsequent operations.
To ensure the safety of ship navigation, an effective enhancement method for images of sea surface objects allows a computer imaging system to obtain a clearer sea surface image [1]. Because an ocean target in an optical image is easily affected by many factors, such as climate, sunshine, and cloud, it is difficult to extract the features of the visible light image. Therefore, grey statistical features of the target, geometric and edge texture features, visual perception features, fractal models, fuzzy theory, rough set theory, etc. are used, and they are regarded as the preferred means of target extraction in visible light [2][3][4]. To preserve the edge characteristics of the image completely, [5] proposed a steering filtering technique, which is similar to bilateral filtering. The approach maintains the edge characteristics of the image and makes image feature extraction and detection more convenient. Having compared the characteristics proposed by the scholars above, this paper proposes a multi-scale fractal selection approach to complete region detection and recognition.
In infrared target detection and recognition, image preprocessing usually uses low-pass filtering, high-pass filtering, median filtering and spatial low-pass IIR filtering. Traditional infrared target detection methods include statistical threshold segmentation and the analysis of geometric and motion characteristics. The Roberts, Sobel, Canny and Laplace operators are commonly used in target edge detection [6][7]. For detecting moving targets in an image sequence, there are the differential image method, the optical flow method and the statistical model method. To achieve target segmentation, [8] proposed a feature-based fuzzy inference system for infrared image segmentation. It used a unimodal threshold and morphological processing to extract the local spatial features of the target image, and then used the fuzzy reasoning system to complete the target segmentation. [9] introduced an efficient infrared image segmentation method for sea target identification. The method consists of two subroutines: iterative image segmentation and ship target determination. To reduce the influence of noise, [10] proposed a Gaussian median filtering framework in which Moiré noise was removed from X-ray fibre images. [11] proposed an improved adaptive Gaussian filter to reduce the periodic noise in digital images, with the filter parameters determined by a region growing method. For moving target detection, [12] proposed a method based on infrared image sequences in a complex background. In the method, the target was extracted from the diversity characteristics of target and background. In addition, the trajectory of the moving target was extracted by the random projection filter RX. For dim target detection, [13] proposed an approach for detecting infrared moving dim small targets in image space based on significance detection. It used a local adaptive comparison operation to calculate time significance maps and spatial significance maps in the time domain. Building on the advantages of the above methods in infrared target detection and recognition, this paper uses the ideal high-pass filter and the linear Laplace operator to complete the image preprocessing, and uses the inter-frame difference method to complete the moving object detection and image feature extraction [14].
In visible/infrared image fusion, the main methods include weighted averaging, logic filtering, multiresolution pyramids, wavelet transform methods, etc. To fuse infrared and visible images, [15] proposed a fusion algorithm based on target region extraction and compressed sensing. It addressed the problem that the fusion of infrared and visible images is easily disturbed by noise. To improve the visual effect of fusing low-brightness infrared and visible images, [16] used bilateral filters to estimate illumination and proposed a contrast-enhanced image fusion method based on the multi-scale Retinex transform, which effectively improved the sharpness of the fused image. The weighted averaging fusion of visible light and infrared images is widely used in image detection and target recognition, and the weight factor of each source image has an important influence on the fusion result [17]. In addition, [18] proposed fusing standardized information entropy, standard deviation and the difference between the quantitative indicators. Combined with simulation results, an optimal scale factor method for determining the weighted fusion of visible and infrared images was studied, which laid the foundation for evaluating the fusion quality of visible and infrared images. To address the time consumption of image fusion algorithms and the problem of selecting a fusion strategy, [19] proposed a new multi-band fusion method at feature level based on the region covariance matrix. It can fuse multiple features while preserving the differences between targets and reducing the amount of computation [20].
Comparing the advantages and disadvantages of the above visible/infrared image fusion methods shows that, when neither visible light detection nor infrared detection alone gives a clear result, the information from the two can complement each other. USVs conducting reconnaissance or search and rescue then receive efficient and accurate information input. This paper therefore adopts the pixel-level weighted averaging fusion method for visible/infrared image fusion. It effectively combines the complementary grey-scale information of the visible and infrared images, and makes full use of the visible light image to supplement the detailed contour of the foreground target, so that the target detection and recognition capability of the USV is clearly improved.

Problem description and motivation
With the increasing emphasis on ocean development, research on infrared, video and radar technologies for target detection and recognition has become an important scientific topic. The proposed infrared/visible imaging system includes image preprocessing, target detection and target recognition modules. The image preprocessing includes de-noising, clutter suppression and image enhancement. Image transformations in the space or frequency domain enhance the feature information of potential targets, thereby suppressing background clutter and noise and improving the target detection probability. These measures facilitate target detection and identification.
In visible light image processing, the problem to be solved is target detection and recognition in a complex sea environment against a sky background. The natural sea background has a fractal nature, whereas artefacts such as ships have no fractal character. The multi-scale fractal model is composed of image preprocessing, threshold segmentation, multi-scale fractal feature extraction, target determination and a region growing process. Because the fractal parameters of natural objects and of man-made objects vary with scale in different ways, the target can be extracted from the image. The multi-fractal is used to describe the distribution characteristics of the fractal structure under different local conditions. By analysing the multi-scale fractal characteristics of sea clutter, the false alarm probability of target detection is reduced.
An infrared image sequence can be described as follows:

$$f(x, y, k) = f_T(x, y, k) + f_B(x, y, k) + N(x, y, k)$$

where $f(x, y, k)$ represents the grey value of the point $(x, y)$ in frame $k$, $f_T(x, y, k)$ represents the grey value of the target passing through the point, $f_B(x, y, k)$ represents the grey value of the background, and $N(x, y, k)$ is the sum of the noise caused by the system. The background grey image $f_B(x, y, k)$ has a relatively large correlation length and is distributed in the low- and middle-frequency bands of infrared images. Both the noise $N(x, y, k)$ and the target grey value $f_T(x, y, k)$ are distributed in the high-frequency band.
The infrared radiation intensity of the target is greater than that of the generally smooth background clutter. Therefore, the ideal high-pass filter is usually used to preprocess the infrared images instead of low-pass and median filters. After preprocessing, the signal-to-noise ratio (SNR) of the target is enhanced and the background clutter is suppressed. The results of preprocessing lay a foundation for the detection of infrared targets. Target detection has two objectives: (1) to judge whether the target exists, and (2) to extract the location of the target. Infrared target detection typically falls into two categories: target detection based on single infrared images and target detection based on image sequences. The former is also called target extraction and is an image segmentation approach based on geometric features; the latter is called moving target detection, which removes the stationary background from the video, detects the moving target and obtains the motion information. In this paper, target extraction uses the Sobel and Prewitt operators for image segmentation, extraction and recognition against the background. Moving object detection uses the inter-frame differences of the moving target.

Target detection and recognition strategy for visible light images
The flowchart for sea surface visible light target detection and recognition is shown in Figure 1. Firstly, the Retinex image enhancement algorithm and a median filter are used to preprocess the image and eliminate noise interference. Let the input sequence be $\{x_i, i \in I\}$, where $I$ is a subset of the natural numbers, and let the window length be $n$. The filter output is then

$$y_i = \mathrm{Med}\{x_{i-q}, \ldots, x_i, \ldots, x_{i+q}\}$$

where $q = (n - 1)/2$. The maximum inter-class variance method and a genetic algorithm are combined to optimize the segmentation threshold, and an optimal threshold is obtained. By segmenting the image, the number of potential target points is determined. Assume that there are $m$ grey values in an image and that $n_i$ pixels have the grey value $i$. The total number of pixels $N$ and the probability $p_i$ of each grey value are then

$$N = \sum_{i=1}^{m} n_i, \qquad p_i = \frac{n_i}{N}$$

Dividing the $m$ grey values into two groups, $C_0 = [1, \ldots, k]$ and $C_1 = [k + 1, \ldots, m]$, the probabilities of groups $C_0$ and $C_1$ are, respectively,

$$w_0 = \sum_{i=1}^{k} p_i, \qquad w_1 = \sum_{i=k+1}^{m} p_i = 1 - w_0$$

The average grey values of groups $C_0$ and $C_1$ are

$$u_0 = \frac{1}{w_0} \sum_{i=1}^{k} i\,p_i, \qquad u_1 = \frac{1}{w_1} \sum_{i=k+1}^{m} i\,p_i$$

The overall average grey value is

$$u = \frac{1}{w} \sum_{i=1}^{m} i\,p_i$$

where $w = \sum_{i=1}^{m} p_i = 1$. When the threshold is $k$, the mean becomes

$$u(k) = \sum_{i=1}^{k} i\,p_i$$

The mean of a sample is $\mu = w_0 u_0 + w_1 u_1$, and the variance between the two groups is

$$\sigma^2(k) = w_0 (u_0 - \mu)^2 + w_1 (u_1 - \mu)^2$$

Substituting the expression for $\mu$ into the above equation, we have

$$\sigma^2(k) = w_0 w_1 (u_1 - u_0)^2$$

Secondly, wavelet decomposition is used to extract image features. In each image, multi-scale fractal features are extracted with a potential target as the centre. Compared with the background, the multi-scale fractal features of the actual target points are markedly different and can be used to extract the actual target points. Finally, the fractal dimension threshold and the multi-scale fractal feature threshold are used to judge the edge of the target, which ensures stable target detection and effective elimination of wave interference. Since many intermittent sub-images have high multi-scale fractal features, the largest area is selected as the target area. Afterwards, the centre of mass of the area is calculated, and the area is recognized by region growing on the grey image.
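The threshold selection by maximum inter-class variance can be illustrated with a short sketch. It is not the implementation used in the paper: the paper combines the criterion with a genetic algorithm, whereas the sketch below simply tries every threshold $k$, and the function name `otsu_threshold` and the histogram input format are assumptions made for illustration.

```python
def otsu_threshold(hist):
    """Return the grey level k (1-based) maximising w0 * w1 * (u1 - u0)**2.

    hist[i - 1] is the number of pixels with grey value i, for i = 1..m.
    Exhaustive search is used here in place of the genetic algorithm.
    """
    total = sum(hist)                                   # N
    probs = [count / total for count in hist]           # p_i
    mean_all = sum((i + 1) * p for i, p in enumerate(probs))  # overall mean u

    best_k, best_var = 1, -1.0
    w0 = 0.0   # probability of group C0
    uk = 0.0   # cumulative first moment u(k)
    for k in range(1, len(probs)):      # k = 1..m-1, so C1 is never empty
        w0 += probs[k - 1]
        uk += k * probs[k - 1]
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:
            continue                    # one group is empty, variance undefined
        u0 = uk / w0
        u1 = (mean_all - uk) / w1
        var_between = w0 * w1 * (u1 - u0) ** 2
        if var_between > best_var:      # ties keep the smallest k
            best_k, best_var = k, var_between
    return best_k
```

For a histogram with two well-separated groups, for example `otsu_threshold([5, 5, 0, 0, 5, 5])`, the returned threshold falls between the groups. A genetic algorithm would search the same objective without evaluating every $k$.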

Target detection and recognition strategy for infrared images
The flowchart of sea surface infrared target detection and recognition is shown in Figure 2.
(1) Ideal high-pass filter for image preprocessing
A high-pass filter weakens the low-frequency components and sharpens the high-frequency components of the image. This paper uses an ideal high-pass filter, which passes the high-frequency components of the Fourier transform and has a pronounced "ringing" effect. Keeping the high-frequency components relatively unchanged while attenuating the low-frequency components suppresses the background more completely. This enhances the image so that its edges and details are clearer.
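As an illustration of the ideal high-pass filter, the following minimal sketch zeroes every Fourier coefficient whose distance from zero frequency is at most a cutoff and keeps the rest unchanged. The function names, the `cutoff` parameter and the use of a naive discrete Fourier transform are assumptions for illustration; the naive transform is only practical for very small images, and a real system would use an FFT.

```python
import cmath
import math


def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a list-of-lists image."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = sign * 2 * math.pi * (u * x / rows + v * y / cols)
                    acc += img[x][y] * cmath.exp(1j * angle)
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out


def ideal_highpass(img, cutoff):
    """Remove all frequencies within `cutoff` of zero frequency."""
    rows, cols = len(img), len(img[0])
    spectrum = dft2(img)
    for u in range(rows):
        for v in range(cols):
            # Distance to zero frequency, accounting for wrap-around.
            fu = min(u, rows - u)
            fv = min(v, cols - v)
            if math.hypot(fu, fv) <= cutoff:
                spectrum[u][v] = 0j
    restored = dft2(spectrum, inverse=True)
    return [[value.real for value in row] for row in restored]
```

With `cutoff=0` only the mean (zero-frequency) component is removed, so a uniform background maps to zero while a small bright target stands out against it. Larger cutoffs remove more of the slowly varying background.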
Furthermore, the Laplace operator is used to sharpen the image. The Laplace operator is an edge point detection operator that is independent of edge direction; its response to isolated pixels is stronger than its response to edges or lines. In addition, it is a linear second-order differential operator with rotation invariance. It is therefore suitable for reducing the image blur caused by diffuse reflection of light and for enhancing edge distinctness: it weakens or eliminates low-frequency components without affecting the high-frequency components. Filtering out these low-frequency components increases the image contrast and makes the edges clearer, so that the images become suitable for detection and recognition.
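Laplace sharpening can be sketched as subtracting a discrete Laplacian from the image. The 4-neighbour kernel, the replicated border and the function names below are assumptions chosen for illustration; the paper does not state which discretization it uses.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian with replicated borders."""
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return img[r][c]

    return [[px(r - 1, c) + px(r + 1, c) + px(r, c - 1) + px(r, c + 1)
             - 4 * px(r, c)
             for c in range(cols)]
            for r in range(rows)]


def sharpen(img):
    """Sharpen by subtracting the Laplacian: g = f - laplacian(f)."""
    lap = laplacian(img)
    return [[img[r][c] - lap[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]
```

A uniform region is left unchanged, while an isolated bright pixel is amplified more than a line or edge would be, which matches the behaviour described above. The output is not clipped to the 0 to 255 range; a display pipeline would need to do that separately.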
(2) Image segmentation method based on edge detection
The edge of an object appears as a discontinuity in grey level, so edges are detected by examining how the grey value of each pixel changes within a neighbourhood. Because it relies on the behaviour of the first-order or second-order derivative in the neighbourhood of an edge, this method is often referred to as the local edge detection operator method. Commonly used edge detection operators include the gradient, Roberts, Prewitt, Sobel and Laplace operators. In this paper, the grey-level change of each pixel within a region is examined with the Sobel and Prewitt operators. Because both operators combine differentiation with smoothing of the image, they estimate the grey gradient while suppressing noise. At the same time, the edges located by the two operators are accurate and complete, which lays a foundation for image segmentation.
The Sobel operator-based edge detection method convolves the image with each of two direction templates over a neighbourhood in image space. One template detects vertical edges and the other detects horizontal edges. The larger of the two convolution results is then assigned to the corresponding pixel as its new grey value. The Prewitt operator-based approach likewise uses two mask convolutions as an edge detector. Usually the larger value is taken as the output, which makes the algorithm sensitive to the direction of the edge. If the square root of the sum of their squares is used instead, a more consistent response over all directions is obtained, and the result is close to the true gradient. In addition, the operator can be extended to eight directions, giving the edge template operator, whose templates are obtained offline from edge sub-images. The edge templates are applied to the image in turn, and the response of the template most similar to the detected area, i.e. the maximum response, is used as the output of the algorithm. In this way, the edge is detected.
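The two-template edge detection can be sketched as follows. The masks are the standard Sobel and Prewitt masks; the function name `edge_strength`, the `combine` parameter and the choice to leave border pixels at zero are assumptions for illustration. The masks are applied by correlation rather than flipped convolution, which changes only the sign of each response and not its magnitude.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]     # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]     # responds to horizontal edges
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def edge_strength(img, mask_x=SOBEL_X, mask_y=SOBEL_Y, combine="max"):
    """Edge strength per pixel from two 3x3 direction templates.

    combine="max" keeps the larger absolute response;
    combine="magnitude" uses sqrt(gx**2 + gy**2).
    """
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(mask_x[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(mask_y[i][j] * img[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            if combine == "max":
                out[r][c] = max(abs(gx), abs(gy))
            else:
                out[r][c] = math.hypot(gx, gy)
    return out
```

Passing `PREWITT_X` and `PREWITT_Y` as the masks gives the Prewitt variant. An eight-direction version would add six more rotated templates and take the maximum response over all eight.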
(3) Detection of moving targets by inter-frame difference
Inter-frame difference obtains the contour of a moving target by differencing two adjacent frames of the video sequence; it is suitable for multiple moving targets and for a camera in motion. When abnormal moving objects appear in the scene, there is a significant difference between the two frames. The absolute value of the brightness difference between the two images is obtained by subtracting one frame from the other. Comparing this value with a threshold characterizes the motion in the video or image sequence and determines whether there is a moving target in it. Taking the difference of two adjacent frames is equivalent to filtering the image sequence with a high-pass filter in the time domain.
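A minimal sketch of inter-frame differencing is given below. The function names, the `threshold` value and the `min_pixels` parameter are assumptions for illustration; the paper does not state the threshold it uses.

```python
def frame_difference(prev_frame, curr_frame, threshold):
    """Binary motion mask: 1 where |curr - prev| exceeds the threshold."""
    return [[1 if abs(curr - prev) > threshold else 0
             for prev, curr in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]


def has_moving_target(mask, min_pixels=1):
    """Declare a moving target when enough pixels changed between frames."""
    return sum(sum(row) for row in mask) >= min_pixels
```

For example, with `prev = [[10, 10], [10, 10]]`, `curr = [[10, 60], [10, 12]]` and a threshold of 20, only the pixel that changed by 50 grey levels is marked. Raising `min_pixels` is one simple way to ignore isolated noisy pixels.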
(4) Infrared binocular parallax
Assume that two identical lens systems are used, each with focal length $f$; the distance between the two lens centres is the baseline $b$, the focal planes of the two lenses lie in the X-Y plane, and the optical axes of the two lenses are parallel. If a space target located at $W(X, Y, Z)$ is imaged by the two lenses in the focal plane at $a(X_1, Y_1)$ and $b(X_2, Y_2)$, respectively, then the parallax is $d = X_2 - X_1$. The range finding system uses the parallax of the target between the two focal planes.
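The text does not give the range equation itself. Under the stated assumptions (identical lenses, parallel optical axes), the standard similar-triangles relation for such a rig is $Z = f\,b / |d|$, and the sketch below uses it. This relation and the function name `range_from_parallax` are assumptions added for illustration, not a formula taken from the paper.

```python
def range_from_parallax(x1, x2, focal_length, baseline):
    """Estimate target range Z = f * b / |d| with d = x2 - x1.

    All lengths must use the same unit (for example millimetres).
    """
    d = x2 - x1
    if d == 0:
        raise ValueError("zero parallax: target is effectively at infinity")
    return focal_length * baseline / abs(d)
```

With a 50 mm focal length, a 500 mm baseline and a parallax of 0.5 mm, the estimated range is 50 000 mm, i.e. 50 m. Because the range is inversely proportional to the parallax, small errors in $d$ matter most for distant targets.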

Infrared/visible image fusion based on the weighted average method
The infrared/visible image fusion flowchart is shown in Figure 3.
The weighted averaging method operates on the corresponding pixels of multiple original images. Assume that $\lambda$ source images $N_1, N_2, \ldots, N_\lambda$ of size $I \times J$ need to be fused and that the fused image is $F$. The fusion process of the weighted averaging method is then

$$F(i, j) = \sum_{x=1}^{\lambda} \omega_x N_x(i, j), \qquad \sum_{x=1}^{\lambda} \omega_x = 1$$

where $\omega_x$ is the weight of source image $N_x$. Because the noise in the image has a high contrast ratio, the technique is very sensitive to noise, and the synthesized images will contain very strong noise. To overcome this shortcoming, a combination of selection and averaging is adopted. Specifically, a matching matrix is used to represent the similarity of the two images. If the two images are very similar, the pixel weighting value is divided into 256 grades: each pixel of source image $N_x$ corresponds to a level $S_x$, an integer ranging from 0 to 255, and its weight is $S_x / \sum_{x'=1}^{\lambda} S_{x'}$. If there is a significant difference between the two images, the more distinct image is selected: the weight of the pixel with the larger contrast ratio is 1 and all other weights are zero, which suppresses noise.
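The select-or-average rule for two source images can be sketched as follows. The paper does not specify how the matching matrix or the per-pixel levels are computed, so the sketch takes them as inputs; the names `level_vis`, `level_ir`, `match` and `match_threshold`, and the value 0.75, are hypothetical.

```python
def fuse_select_or_average(visible, infrared, level_vis, level_ir,
                           match, match_threshold=0.75):
    """Pixel-level fusion of two equally sized grey images.

    level_vis / level_ir: integer levels in 0..255 per pixel (assumed given).
    match: similarity per pixel in 0..1 (assumed given).
    Similar pixels are averaged with weights level / (sum of levels);
    dissimilar pixels take the value from the image with the larger level.
    """
    rows, cols = len(visible), len(visible[0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            s_vis, s_ir = level_vis[r][c], level_ir[r][c]
            if match[r][c] >= match_threshold:
                total = s_vis + s_ir
                # Equal weights when both levels are zero.
                w_vis = 0.5 if total == 0 else s_vis / total
                fused[r][c] = (w_vis * visible[r][c]
                               + (1.0 - w_vis) * infrared[r][c])
            else:
                # Selection: weight 1 for the larger level, 0 for the other.
                fused[r][c] = float(visible[r][c] if s_vis >= s_ir
                                    else infrared[r][c])
    return fused
```

For a pixel with grey values 100 (visible) and 200 (infrared), levels 192 and 64, and a high match value, the weights are 0.75 and 0.25, giving a fused value of 125. When the match value is low, the pixel from the image with the larger level is copied unchanged.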

Sea surface visible light images
The simulation is conducted in the MATLAB R2015a environment on a personal computer; the pictures used hereafter are taken from the Caltech-101 image library [21].
The Retinex enhancement algorithm and a median filter are used to preprocess the image and eliminate the interference of fog and noise. The original RGB images were processed by Retinex enhancement in each of the R, G and B channels, and the channels were then recombined into new images. The histograms before and after processing show that Retinex image enhancement maintains the local features of the original image to some extent. As shown in Figure 4, the processed image is smoother, its colours are more natural and the defogging effect is more obvious. The median filter is used to de-noise the image and is compared with wavelet transform de-noising applied twice. In this paper, the median filter is adopted for image de-noising. For threshold segmentation, the maximum inter-class variance method and the genetic algorithm are combined. The optimal threshold of the image is 94. Next, the multi-scale fractal feature is used to extract the image features. Each potential target point is taken as the centre, and the actual target points are extracted. Since consecutive sub-images have very high multi-scale fractal features, the larger area is selected as the target region. The target area is obtained by region growing on the grey image and is then identified, as shown in Figures 5-7.
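The median filtering step used for de-noising can be sketched in two dimensions as follows. The square window of side $n$ with half-width $q = (n - 1)/2$ follows the filter definition given earlier; the replicated border handling and the function name `median_filter_2d` are assumptions for illustration.

```python
from statistics import median


def median_filter_2d(img, n=3):
    """Replace each pixel by the median of its n x n neighbourhood."""
    if n % 2 == 0:
        raise ValueError("window length n must be odd")
    q = (n - 1) // 2
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return img[r][c]

    return [[median(px(r + dr, c + dc)
                    for dr in range(-q, q + 1)
                    for dc in range(-q, q + 1))
             for c in range(cols)]
            for r in range(rows)]
```

A single outlier such as one pixel of value 255 in a region of value 10 is removed entirely, because the median ignores extreme values, whereas a mean filter would smear it over its neighbours.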

Sea surface infrared images
The ideal high-pass filter is used to preprocess the images so that the edge features are clear, as shown in Figure 8. The linear Laplace operator is then used to sharpen the images and make them clearer, as shown in Figure 9. Edges are then detected, and the image is segmented with the Sobel operator and Prewitt operator-based approaches [5][6], as shown in Figure 10.

Sea surface visible/infrared image fusion
Firstly, the visible and infrared images (296 × 232) of the same scene are preprocessed (enhancement, de-noising, etc.) and their edge and texture features are extracted, so that the images are clear and easy to fuse. Then, the processed visible and infrared images, which have the same size as the original images, are fused by weighted averaging. The resulting images are clearer than the original images, as shown in Figure 11. Finally, the sharpness of the visible, infrared and fused images is calculated, as shown in Table 1.
[Figure panel titles: (a) Fusion of Otsu method and genetic algorithm; (b) Optimal fitness value evolution curve]
Comparing the sharpness of the visible, infrared and fused images in Table 1 shows that the images obtained by fusing the processed visible image with the infrared image processed by the eight-neighbourhood Sobel or Prewitt operator have high sharpness. In particular, the image fused with the infrared Prewitt eight-neighbourhood result has the highest sharpness, 10.6152. After grey-scale processing of the visible and infrared images, it is found that each has its own disadvantages. In the visible light image the target cannot be highlighted, and in fog or at night the target is not clearly detected. Infrared images can only show whether there is a target in front of the USV, but cannot reveal the specific details of the target. A comparison of the accuracy and efficiency of existing visible/infrared image fusion methods for sea surface target detection shows that most fusion algorithms rely on local features of the targets, edge textures and region extraction to enhance the local appearance of the target. However, they cannot effectively extract and detect targets with incomplete information. Therefore, this paper adopts weighted averaging fusion at pixel level. The method effectively combines the complementary grey-scale information of the visible and infrared images, and makes full use of the visible light image to supplement the contour details of targets in front of the USV. It complements the target information detected by visible light and infrared and improves the detection and recognition ability of the USV.

Conclusions
Based on the fusion of infrared and visible images, the weighted averaging fusion method has been studied. The effectiveness and practicability of the algorithm were evaluated by computer simulation. A comparison of the fused images shows that the intended purpose can be achieved, so that the USV can overcome unfavourable factors on the sea surface such as weather and night. However, some problems in image processing remain difficult to overcome; for example, texture details did not reach the expected level with current technology. Image processing is therefore an important module in the detection sub-system. Improving image processing capability and developing accurate detection equipment are important parts of future research.

Disclosure statement
No potential conflict of interest was reported by the authors.