An improved GMS-PROSAC algorithm for image mismatch elimination

ABSTRACT Image matching usually plays a critical role in visual simultaneous localization and mapping (VSLAM). However, the mismatches and low computational efficiency of current matching algorithms reduce the performance of VSLAM. To overcome these two drawbacks, an improved GMS-PROSAC algorithm for image mismatch elimination is developed in this paper by introducing a new epipolar geometric constraint (EGC) model with a projection error function. This improved algorithm, named GMS-EGCPROSAC, combines the traditional GMS algorithm with an improved PROSAC algorithm. First, the GMS algorithm is employed to obtain a rough matching set, and all matching pairs in this set are sorted according to their similarity degree. The parameters of the EGC model are then estimated from the matching pairs with the highest similarity degree. With these parameters, the improved PROSAC algorithm is carried out to eliminate false matches. Finally, the real-time performance and the effectiveness are adequately verified by a series of contrast experiments. Our approach can not only eliminate mismatches quickly but also retain more high-quality matching pairs.


Introduction
Real-time Visual Simultaneous Localization and Mapping (VSLAM) (Fraundorfer & Scaramuzza, 2011, 2012; Mur-Artal, Montiel, & Tardos, 2015; Mur-Artal & Tardos, 2017) and 3D reconstruction (Bourke, 2012; Izadi et al., 2011; Rebecq, Gallego, Mueggler, & Scaramuzza, 2017) have become research hotspots in recent years. It is well known that image matching plays an important role in this area (Masoud & Hoff, 2016). Essential requirements on an image matching algorithm include real-time performance, accuracy and robustness (Wang, Zhu, Qin, Jiang, & Li, 2018; Zheng, Hong, Zhang, Li, & Li, 2016). Up to now, there is still a wide performance gap between slow (but robust) feature matchers and the much faster (but often unstable) real-time solutions. Generally speaking, an image matching algorithm consists of four main steps. First, we pick some 'interest points' at distinctive locations in an image, such as blobs, corners, and T-junctions. Then, we obtain a descriptor of the neighbourhood of every interest point by utilizing a feature vector (Nepomuceno, Martins, Amaral, & Riveret, 2017; Rajarathinam, Gomm, Yu, & Abdelhadi, 2017). Such a descriptor should be distinctive and, at the same time, robust to noise, detection errors, and geometric and photometric deformations. Next, these descriptor vectors must be matched between different images. The matching is often based on a distance between the vectors, e.g. the Mahalanobis or Euclidean distance (Bian et al., 2017; Jordan & Zell, 2016). Finally, mismatch elimination should be carried out on the matches obtained by the above steps.
In recent years, image matching algorithms based on point features have been widely deployed because of their simple extraction, flexible matching and low time cost. Typical point feature matching algorithms include the Scale Invariant Feature Transform (SIFT) (Lowe, 2004) and Speeded Up Robust Features (SURF) (Bay, Tuytelaars, & Gool, 2006), to name just a few. The SIFT algorithm (Lowe, 2004) adapts well to scale change, rotation and illumination change, but it is computationally heavy, time consuming and not real-time. The SURF feature matching algorithm (Bay et al., 2006) reduces the dimension of the feature descriptor in order to improve the matching efficiency. Unfortunately, the real-time performance of both algorithms is far from the requirement of VSLAM for mobile robots. By adopting convolutional neural networks (Wang, Chen, Qiao, & Snoussi, 2018; Zabidi et al., 2017) to train learned invariant feature transform (LIFT) descriptors, a new algorithm has been developed in Morel and Yu (2009) to improve matching accuracy, but it is still time consuming. Very recently, an oriented matching algorithm (called ORB; Rublee, Rabaud, Konolige, & Bradski, 2011), which combines features from the accelerated segment test (FAST) and the rotated BRIEF feature (Rublee et al., 2011), has demonstrated rapid feature detection and matching while achieving roughly real-time performance for sparse feature point detection. For the matched feature points obtained in VSLAM of mobile robots, we find that there are a large number of false matches, which leads to inaccurate pose estimation and poor robustness.
It is crucial in VSLAM to eliminate false matches while retaining matching quality. Typical approaches include the Fast Library for Approximate Nearest Neighbours (FLANN) algorithm (Muja & Lowe, 2009) and the Random Sample Consensus (RANSAC) algorithm (Fischler & Bolles, 1987). FLANN depends on an artificial threshold value, and RANSAC cannot accommodate the sheer number of false matches in the set of all nearest-neighbour matches. Recently, Bian et al. have proposed a grid-based motion statistics (GMS) algorithm (Bian et al., 2017) to distinguish false and true matches. It should be pointed out that this algorithm only obtains a rough match set by eliminating some of the false matches between two images of the same 3D scene; in other words, some false matches still remain. Since false matches do not satisfy epipolar geometry constraints (EGCs) (Hartley & Zisserman, 2003; Kushnir & Shimshoni, 2014), it is possible to develop a new PROSAC algorithm (Chum & Matas, 2005) by fusing an EGC model to further eliminate false matches. The EGC, which reveals the relationship between two views, is a basic concept in image geometry. Specifically, suppose a point P in three-dimensional space is projected onto two image planes, with projections denoted p and p′ in the left and right images, respectively. The constraint states that p′ must lie on the epipolar line corresponding to p. It can therefore be utilized to obtain more high-quality matches for VSLAM.
Responding to the above discussion, this paper is concerned with the design of a novel mismatch elimination algorithm that unites the GMS and PROSAC algorithms. We endeavour to deal with the following challenges: how to introduce the concept of the epipolar geometry constraint and utilize it to further eliminate mismatches, and how to design and carry out a series of experiments to verify the effectiveness of the proposed algorithm. To this end, we first use the ORB algorithm (Rublee et al., 2011) to describe feature points and obtain a rough matching set with the help of the Hamming distance. Then, a new algorithm based on the epipolar geometric constraint is designed to obtain a large number of high-quality matching pairs. In comparison with the traditional ORB algorithm (Rublee et al., 2011) combined with FLANN (Muja & Lowe, 2009) and RANSAC (Fischler & Bolles, 1987), the proposed algorithm possesses higher real-time performance, higher matching precision and a larger number of high-quality matches. Our main contributions can be highlighted as follows: (1) a new epipolar geometric constraint model, combined with a projection error function, is employed to further eliminate mismatches; (2) an improved GMS-PROSAC algorithm, named the GMS-EGCPROSAC algorithm, is put forward to enhance the real-time performance and the matching precision, and to increase the number of high-quality matches; and (3) in comparison with the GMS, ORB, SIFT and SURF algorithms with a standard ratio test, the effectiveness of the proposed algorithm is adequately demonstrated by a series of experiments.
The remaining sections are organized as follows. In Section 2, we introduce the ORB image matching algorithm, including both the ORB feature detection and the ORB feature descriptor. In Section 3, we briefly summarize the GMS algorithm and then put forward an improved PROSAC algorithm by adding an epipolar geometric constraint; its flow chart and algorithm steps are provided in detail. In Section 4, we verify the validity of our algorithm through experiments. Finally, a conclusion is drawn in Section 5.

ORB image matching algorithm
In this section, we introduce the ORB image matching algorithm, which consists of the oriented FAST detector and the rotated BRIEF descriptor.

Oriented FAST
Oriented FAST is obtained by adding direction information to the well-known original FAST proposed by Rosten and Drummond (2006). It should be pointed out that FAST has neither direction invariance nor scale invariance. Firstly, an image pyramid (Klein & Murray, 2008) is built for oriented FAST in order to achieve scale invariance, and the FAST corner is detected at each level of this pyramid. Furthermore, the direction of a feature point can be obtained by calculating the centre of mass of its image block via the grey-scale centroid method. In what follows, the specific steps of oriented FAST are shown with the help of an example, see Figure 1. In this figure, the right sub-figure shows a discretized Bresenham circle around the centre corner candidate p, where the red squares are the pixels used in the FAST corner detection and the dashed arc passes through 12 contiguous pixels which are brighter or darker than the candidate p.
(1) The detection of FAST corners. We start by detecting FAST points in the image. In the above template, we obtain a circle of 16 pixels around the centre corner candidate p by drawing a Bresenham circle with a radius of three pixels. In order to determine whether p is a feature point or not, the following corner response function is adopted:

N = \sum_{q \in \mathrm{circle}(p)} \mathbf{1}\{|G_q - G_p| > \theta_d\}

where G_q denotes the grey value of each of the 16 pixels on the ring; circle(p) stands for the collection of 16 pixels on the Bresenham ring centred at p; and N is the number of pixels whose greyscale difference with the pixel point p is greater than a given threshold θ_d (taken as 20% of G_p in our paper). When N consecutive pixels satisfy this condition, the candidate p is called a FAST feature point. In particular, N is set as 12 in our paper.
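The segment test above can be sketched as follows. This is a hedged illustration only: the function name `is_fast_corner`, the array-based interface and the scan over a doubled ring are our own choices, not part of the original FAST/ORB implementation.

```python
import numpy as np

def is_fast_corner(img, p, theta_d, n_required=12):
    """Sketch of the FAST segment test: the candidate p is a corner when
    n_required contiguous pixels on the Bresenham ring are all brighter
    (or all darker) than p by more than theta_d (20% of G_p in the paper)."""
    # Bresenham circle of radius 3: the 16 pixel offsets around the candidate.
    ring = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
            (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]
    r, c = p
    g_p = float(img[r, c])
    # Classify each ring pixel: +1 brighter, -1 darker, 0 similar.
    flags = []
    for dr, dc in ring:
        g_q = float(img[r + dr, c + dc])
        if g_q - g_p > theta_d:
            flags.append(1)
        elif g_p - g_q > theta_d:
            flags.append(-1)
        else:
            flags.append(0)
    # Search for n_required contiguous brighter (or darker) pixels,
    # doubling the ring so that circular runs are detected.
    doubled = flags + flags
    for sign in (1, -1):
        run = 0
        for f in doubled:
            run = run + 1 if f == sign else 0
            if run >= n_required:
                return True
    return False
```

A uniform patch yields no corner, while a pixel much darker than its whole ring passes the test.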
(2) Adding a direction to feature points. The intensity centroid (Rosin, 1999) is introduced to determine the direction of the feature points. We assume that the intensity centroid is deviated from the geometric centre of the patch. Under the framework of the grey-scale centroid method, the moment M_ij of a patch B is defined as follows:

M_{ij} = \sum_{m \in B} x^i y^j I(m)

where m = (x, y) stands for a pixel point, that is, the coordinate of a pixel in patch B, and I(m) is its intensity. According to the calculated moments, we can obtain the centroid c of the patch B:

c = \left( \frac{M_{10}}{M_{00}}, \frac{M_{01}}{M_{00}} \right)

Denoting the point p as the origin and connecting p with the obtained centroid c, we can calculate the orientation of the patch B:

\theta = \mathrm{atan2}(M_{01}, M_{10})

where atan2 is the quadrant-aware version of arctan. Each feature point can now be assigned an orientation so that we can describe the information around it via BRIEF in a rotation-invariant manner.
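The moment, centroid and orientation computations above can be sketched as below. This is a minimal illustration; the helper name `patch_orientation` and the centre-relative coordinate convention (so that the centroid offset is measured from the geometric centre) are our assumptions.

```python
import numpy as np

def patch_orientation(patch):
    """Intensity-centroid orientation of a patch B:
    M_ij = sum over pixels of x^i * y^j * I(x, y),
    centroid c = (M_10 / M_00, M_01 / M_00),
    orientation theta = atan2(M_01, M_10)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Shift coordinates so (0, 0) is the geometric centre of the patch.
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    m00 = patch.sum()
    m10 = (xs * patch).sum()
    m01 = (ys * patch).sum()
    cx, cy = m10 / m00, m01 / m00   # centroid c of the patch
    theta = np.arctan2(m01, m10)    # orientation of the patch
    return (cx, cy), theta
```

For a patch whose intensity grows to the right and is symmetric vertically, the centroid sits to the right of centre and the orientation is zero.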

Rotated BRIEF feature descriptor algorithm
Rotated BRIEF is a critical part of the ORB feature matching algorithm, adding rotation invariance to the original BRIEF algorithm (Calonder, Lepetit, Strecha, & Fua, 2010). Let us introduce it starting from the original BRIEF algorithm. First, by performing a set of binary intensity tests on an image patch (Zitnick, 2010), a bit-string description can be obtained.
Define the binary intensity test τ on a smoothed image patch B as:

\tau(B; m_1, m_2) = \begin{cases} 1, & I(m_1) < I(m_2) \\ 0, & \text{otherwise} \end{cases}

where I(m) is the intensity of pixel p(m) (denoted as p for convenience) with coordinate description m = (x, y). Among the various test distributions considered in Zitnick (2010), a Gaussian distribution around the centre of patch B, one of the best performers, is exploited in our paper. In what follows, choosing a vector length n, we carry out the τ test at n sampled locations for each feature point. First, the i-th location is denoted as (x_i, y_i) (i = 1, 2, . . . , n). For the purpose of analysis, these points are described by a 2 × n matrix

S = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{pmatrix} (6)

and the descriptor itself is the bit string f_n(B) = \sum_{1 \le i \le n} 2^{i-1} \tau(B; m_{1,i}, m_{2,i}). Implementing the oriented FAST to obtain the patch orientation θ, we get the rotation matrix

R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

Applying this rotation to the test locations in S yields the steered version

S_\theta = R_\theta S

Therefore, we have the following improved τ test for the BRIEF algorithm with rotation invariance:

g_n(B, \theta) = f_n(B) \mid (x_i, y_i) \in S_\theta

We need to point out that the rotated BRIEF descriptors can not only save storage space but also reduce the matching time. In addition, they are convenient for ORB feature matching with the Hamming distance.
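A minimal sketch of the steered test follows, assuming the test pattern is given as two 2 × n location matrices `Sa` and `Sb` (one per side of each comparison) relative to the feature point; real ORB packs the bits into 256-bit strings and samples a learned pattern, both of which are omitted here.

```python
import numpy as np

def rotated_brief(img, centre, Sa, Sb, theta):
    """Steered BRIEF sketch: rotate the test locations by the patch
    orientation theta (S_theta = R_theta S), then apply the tau test
    tau = 1 if I(m1) < I(m2) else 0 at each rotated location pair."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sa_r, Sb_r = R @ Sa, R @ Sb        # steer the test pattern
    cx, cy = centre
    bits = []
    for (xa, ya), (xb, yb) in zip(Sa_r.T, Sb_r.T):
        ia = img[int(round(cy + ya)), int(round(cx + xa))]
        ib = img[int(round(cy + yb)), int(round(cx + xb))]
        bits.append(1 if ia < ib else 0)  # the tau test
    return np.array(bits, dtype=np.uint8)

def hamming(d1, d2):
    """Hamming distance between two binary descriptors."""
    return int(np.count_nonzero(d1 != d2))
```

Matching then reduces to comparing descriptors with `hamming`, which is the distance used throughout this paper.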

Our approach: GMS-EGCPROSAC
As shown in Figure 2, brute-force matching (Lin et al., 2014) with the Hamming distance produces a large number of mismatches, which is a big obstacle for the subsequent visual localization and mapping work. In order to improve the matching accuracy and obtain as many good matches as possible, we propose a union mismatch elimination algorithm, named GMS-EGCPROSAC. This algorithm is made up of the grid-based motion statistics (GMS) algorithm (Bian et al., 2017) and the improved PROSAC algorithm (Chum & Matas, 2005) with epipolar geometric constraints.
To be specific, a rough matching set is first generated via the GMS algorithm, and then the false matches in this set are further eliminated by a new improved PROSAC algorithm, which embeds an epipolar geometric constraint model with a projection error function.

GMS algorithm
The GMS algorithm is essentially a matching statistical constraint model. For a pair of images taken from different views of the same 3D scene, a feature correspondence implies that a pixel (i.e. feature point) in one image is identified as the same point in the other image. If a target in the scene moves, its neighbouring pixels and features move together. Motion smoothness guarantees that a small neighbourhood around a true match views the same 3D location. Likewise, the neighbourhood around a false match views geometrically different 3D locations.
Consider an image pair {B_a, B_b} with M and N feature points, respectively, and assume M ≥ N. The set of all matching pairs can be obtained according to the Hamming distance from B_a to B_b. For convenience, it is denoted as χ = {χ_1, χ_2, . . . , χ_N} with χ_i = {p_i, p'_i}, where p_i and p'_i come from images B_a and B_b, respectively. The goal of the GMS algorithm is to divide χ into two sets, the true and false matching pairs, by utilizing the local support of each match, and then to remove the false matches to obtain a new matching set χ̃. For this purpose, denote S_i as the matching statistical constraint score:

S_i = |\{\chi_j \mid \chi_j \in R(\chi_i)\}| - 1

where |·| stands for the count operation on matching pairs, R(χ_i) stands for the neighbourhood around the pair χ_i, and S_i is the number of other matching pairs in the neighbourhood R(χ_i), the match itself having been removed by subtracting 1. We can judge whether a matching pair is correct by comparing S_i with a threshold β obtained by experiments.
In order to enhance the robustness of this calculation model, an N × N grid neighbourhood (see Figure 3 with N = 9 cells for an example) is selected to calculate the score of each small neighbour area. In this case, the sum of scores is calculated via (10):

S_{ab} = \sum_{k=1}^{K} S_i^k \qquad (10)

where K is the number of cell pairs in the grid neighbourhood and S_i^k is the score contributed by the k-th cell pair. Additionally, selecting an N × N neighbourhood in the experiments as a score template, we obtain the threshold β = α√δ, where α is a suitable parameter (α = 6 for Figure 3) and δ is the average number of feature points per grid cell in Figure 3. Executing the same operation for all matching pairs in χ and keeping only the pairs with S_{ab} ≥ β produces the new set χ̃.
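The grid score and threshold can be sketched as follows. The function name and the flat list of per-cell-pair match counts are illustrative assumptions; a full GMS implementation would also assign features to cells and iterate over shifted grids.

```python
import math

def gms_score_and_threshold(neigh_match_counts, mean_feats_per_cell, alpha=6.0):
    """Sketch of the grid score in (10) and the threshold beta = alpha * sqrt(delta).

    neigh_match_counts: matches falling in each of the K cell pairs of the
    grid neighbourhood around the match under test (K = 9 for a 3 x 3 layout).
    The match itself is excluded by subtracting 1, as in the score definition.
    mean_feats_per_cell: delta, the average number of features per grid cell.
    """
    s = sum(neigh_match_counts) - 1            # S_ab, excluding the match itself
    beta = alpha * math.sqrt(mean_feats_per_cell)
    return s, beta, s >= beta                  # keep the match if S_ab >= beta
```

A match whose neighbourhood carries many agreeing matches clears the threshold; an isolated match does not.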

Improved PROSAC algorithm
In light of the traditional RANSAC algorithm (Fischler & Bolles, 1987), the main idea of the PROSAC algorithm (Chum & Matas, 2005) is to (1) sort the feature matching pairs according to their similarity degree and produce a suitable set of interior points; (2) obtain the corresponding model parameters with the help of part of the interior points; and (3) apply the model to all matching pairs. Obviously, a correct set of interior points plays an important role in VSLAM, and mismatched pairs inevitably deteriorate the matching performance. In what follows, we provide an improved PROSAC algorithm (Chum & Matas, 2005) by fusing a novel epipolar geometric constraint (EGC) model to achieve more precise elimination of mismatches.
To be specific, in the first step, we apply the GMS algorithm to obtain a rough matching point set, and then calculate and sort the similarity degree of each matching pair in this set. In the second step, we use the N matching pairs with the highest similarity degree to calculate the epipolar geometric constraint model, and then employ the obtained model to evaluate the remaining matching pairs in the set. There are two tasks: (1) eliminating the outer points that do not satisfy the geometric constraint model and (2) obtaining a large number of good matching pairs by updating the model iteratively.
The EGC model is essentially the geometric projection relation between two views, i.e. the geometry of the intersection of the image planes with the pencil of planes whose axis is the baseline (the line joining the camera centres). Suppose that a point P in three-dimensional space is projected onto two image planes: p_1 in the left image and p_2 in the right image. The epipolar geometry constraint states that the projection p_2 must lie on the epipolar line corresponding to p_1. Exploiting this constraint allows us to reduce the number of candidate match pairs, increase the matching accuracy, speed up the deletion of mismatches, and strengthen the real-time performance of the algorithm. The mathematical equation of the epipolar geometric constraint can be expressed as:

p_2^T K^{-T} E K^{-1} p_1 = 0

where E is the essential matrix and K is the camera intrinsic matrix obtained by calibration. Letting x_1 = [u_1, v_1, 1]^T and x_2 = [u_2, v_2, 1]^T be the normalized coordinates of points p_1 and p_2, respectively, and noting that the equation is equivalent under different scales, one has

x_2^T E x_1 = 0 \qquad (12)

In addition, stretching E into a vector e leads to

e = (e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8, e_9)^T

Then, with the help of this vector, the epipolar geometry constraint (12) can be written in a linear form with respect to e:

(u_2 u_1, \; u_2 v_1, \; u_2, \; v_2 u_1, \; v_2 v_1, \; v_2, \; u_1, \; v_1, \; 1) \cdot e = 0

Obviously, we can employ eight matching pairs to calculate the essential matrix E by stacking eight such linear equations into a homogeneous system A e = 0 and solving for e up to scale. With the calculated matrix E, we can evaluate the other matching pairs in the matching set χ̃ to obtain high-quality matching pairs. Specifically, by resorting to the essential matrix E, an arbitrary feature point x_i in the previous frame can be mapped to its corresponding epipolar line l = E x_i = (l_1, l_2, l_3)^T in the next frame. Define the projection error function

d_s = \frac{|x_i'^T E x_i|}{\sqrt{l_1^2 + l_2^2}} \qquad (15)

i.e. the distance from the matched point x'_i to the epipolar line. If d_s is less than a threshold d_T predetermined by experiments (d_T = 5 in our paper), the pair is regarded as an expected high-quality matching pair; all such pairs constitute the support match set T. The specific steps of the improved PROSAC algorithm are as follows.
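The linear eight-point estimate and the projection error d_s can be sketched as below. This is a simplified illustration under stated assumptions: the data normalization and rank-2 projection of E that production implementations apply are omitted, and the function names are our own.

```python
import numpy as np

def essential_from_eight(x1s, x2s):
    """Linear estimate of E from the constraint x2^T E x1 = 0.

    x1s, x2s: n x 2 arrays of normalized coordinates (n >= 8). Each pair
    contributes one row of the homogeneous system A e = 0; e is the right
    singular vector of A with the smallest singular value, and E is
    recovered up to scale."""
    rows = []
    for (u1, v1), (u2, v2) in zip(x1s, x2s):
        rows.append([u2*u1, u2*v1, u2, v2*u1, v2*v1, v2, u1, v1, 1.0])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)

def projection_error(E, x1, x2):
    """The error d_s of (15): distance from x2 to the epipolar line
    l = E @ [u1, v1, 1]^T induced by its partner x1."""
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    l1, l2, l3 = E @ p1
    return abs(p2 @ np.array([l1, l2, l3])) / np.hypot(l1, l2)
```

For exact correspondences generated by a known camera motion, the recovered E (up to scale) drives the projection error of every pair to numerical zero.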
(1) Obtain a matching point set χ̃ by the GMS algorithm, and sort the matching pairs according to their similarity distance. (2) Given the threshold d_T (d_T = 5 in our paper), select the 8 matching pairs with the highest similarity degree in the matching set χ̃, and calculate the essential matrix E to obtain the epipolar geometric constraint model Q.
(3) According to the obtained essential matrix E, calculate the projection error function via (15) for all pairs in the matching set χ̃. When the error satisfies d_s < d_T, the pair is a high-quality matching pair; all such pairs form the support match set T_k. (4) If the cardinality of the current support match set T_k is greater than the previous one, recompute the essential matrix E as in step (2), and update the support match set T_k and the iteration number k. (5) If k is greater than k_max, exit and return the best inlier set T*. Specifically, k_max can be calculated by

k_{max} = \frac{\log(1 - t)}{\log(1 - w^D)} \qquad (16)

where t is a confidence coefficient (usually taken as 0.95), D = 8 is the minimum number of samples, and w is the ratio of inliers to the size of χ̃ in the initial state; it can be defined as w = D/V, where V is the number of pairs in the matching set χ̃.
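The iteration bound (16) can be computed as below. The function name is an illustrative assumption, and note that in practice w is usually re-estimated as the support set grows rather than fixed at the initial value D/V.

```python
import math

def prosac_max_iterations(t=0.95, D=8, V=100):
    """Iteration bound k_max of (16): the number of D-point samples needed
    so that, with confidence t, at least one sample is outlier free when
    the inlier ratio is w = D / V."""
    w = D / V
    return math.ceil(math.log(1.0 - t) / math.log(1.0 - w ** D))
```

A lower inlier ratio (larger V for the same D) drives the bound up sharply, since all D sampled pairs must be inliers at once.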
By applying the geometric constraint model, we can eliminate the points that do not satisfy the epipolar geometric constraint and obtain better matching point pairs. Based on the GMS algorithm and the improved PROSAC algorithm, we provide the following GMS-EGCPROSAC mismatch elimination algorithm.

GMS-EGCPROSAC mismatch elimination algorithm
In the above subsection, we developed an improved PROSAC algorithm, named the EGCPROSAC algorithm, which eliminates mismatches in a GMS matching set by embedding a novel epipolar geometric constraint model into the traditional PROSAC algorithm. Since its implementation is founded on the GMS matching set, the whole union algorithm is called the GMS-EGCPROSAC mismatch elimination algorithm; its implementation framework is shown in Figure 4 and its flow is provided in 'Algorithm 1: GMS-EGCPROSAC Algorithm'.

Experiments
In this section, we evaluate the proposed approach on two popular data sets: Cabinet (Strecha, Hansen, Gool, Fua, & Thoennessen, 2008) and TUM (Sturm, Engelhard, Endres, Burgard, & Cremers, 2012). The pictures in both data sets are collected by a visual sensor along a real trajectory. TUM includes six video sequences characterized by wide baselines, low texture and blur. Cabinet, a subset of TUM, permits a separate analysis of low-texture scenes. In our experiments, the computer has an Intel Core i7-5557U and 16 GiB RAM, and the analysis is carried out in comparison with the GMS algorithm (Bian et al., 2017) and other state-of-the-art image matching algorithms such as the traditional ORB (Rublee et al., 2011), SIFT (Lowe, 2004) and SURF (Bay et al., 2006). Figure 5 shows pictures from 'rgbd_dataset_freiburg2_flowerbouquet_brownbackground' in TUM (Sturm et al., 2012). There is a certain angle between the two images and the picture size is 640 × 480 in our experiment. The matching effect is plotted via the colour lines in Figure 5 and the match results of the two algorithms are shown in Table 1. Specially, the initial matching pairs are obtained directly via the ORB algorithm (Rublee et al., 2011) and mismatched pairs are then preliminarily removed via FLANN for GMS and RANSAC for our scheme. In addition, the GMS algorithm (Bian et al., 2017) and our improved algorithm are each executed to catch many more suitable matching pairs. We can find from Table 1 that the proposed algorithm can effectively increase the matching accuracy at the expense of a slightly longer time compared with the GMS algorithm. Our algorithm eliminates the mismatches which do not conform to the epipolar geometric constraint model, and thereby the matching precision and the quality of matching are improved.

Comparison of our approach with traditional image matching algorithms
In the following experiments, the effectiveness of our algorithm is shown in terms of the number of high-quality matching pairs, the matching accuracy and the running time, in comparison with traditional algorithms. While guaranteeing the same matching accuracy, our method obtains more high-quality matching pairs.

Matching quantity and matching precision
Five groups of images from the TUM and Cabinet data sets are arranged according to their matching complexity. First, the 500 initial matching pairs produced directly by the ORB algorithm are plotted in Figure 6(a). Obviously, there exist a lot of false matching pairs. The results after eliminating false matches are depicted in Figures 6(b) and 6(c), respectively, by adopting the traditional ORB with FLANN-RANSAC and our algorithm, where the horizontal colour lines stand for high-quality matching pairs. The number of good matching pairs obtained and the matching accuracy are listed in Table 2, where the matching accuracy is the ratio of the number of good matches to the number of rough matches. We can observe from the fourth, eighth and tenth columns that the number of high-quality matching pairs obtained by our algorithm is larger than that obtained via the traditional ORB, and all matching pairs of the latter are contained in our results. In addition, the number of high-quality matching pairs obtained is 1 to 2 times larger than that of the traditional algorithms, and thereby the accuracy of VSLAM can be effectively improved. The accuracy of our algorithm is also slightly higher than that of the traditional algorithm.

Matching time
An image matching algorithm requires not only matching accuracy but also real-time matching. In this experiment, the running times of our algorithm and several classical matching algorithms are recorded in Table 3. From this table, we can easily find that our algorithm runs about 1.5 times as fast as the traditional ORB (Rublee et al., 2011), 30 times faster than SURF (Bay et al., 2006), and even 40 times faster than SIFT (Lowe, 2004). This comparison verifies that the real-time performance of our developed algorithm is greatly improved for image matching, which is beneficial to future VSLAM work. In summary, the above experimental results show that the real-time performance of the improved algorithm is clearly better than that of the traditional ORB algorithm, and even an order of magnitude better than the classical SIFT and SURF image matching algorithms. The matching accuracy and the number of high-quality matching pairs obtained are also better than those of the ORB algorithm. Therefore, the effectiveness of the algorithm developed in our paper is adequately verified.

Conclusions
In this paper, we have developed an improved algorithm for image mismatch elimination by introducing a new epipolar geometric constraint (EGC) model. This algorithm absorbs the advantages of both the traditional GMS algorithm and the PROSAC algorithm. By resorting to the calculated epipolar geometric constraint model, our algorithm eliminates false matches while guaranteeing the matching accuracy and the number of good matching pairs. In addition, the real-time performance and effectiveness have been adequately checked by comparison with other algorithms in different scenarios. Future topics include estimating the camera pose in visual odometry in light of the algorithm developed in this paper.