Toward the use of smartphones for mobile mapping

Abstract This paper considers the use of a low-cost mobile device to develop a mobile mapping system (MMS) that exploits only the sensors embedded in the device. The goal is to make this MMS usable and reliable even in difficult environments (e.g. emergency conditions, when even the WiFi connection might not work). To this aim, a navigation system able to deal with the unavailability of GNSS (e.g. indoors) is proposed first. Positioning is achieved by a pedestrian dead reckoning approach: a specific particle filter has been designed to enable good position estimates with a small number of particles (e.g. 100). This characteristic enables its real-time use on standard mobile devices. Then, 3D reconstruction of the scene can be achieved by processing multiple images acquired with the standard camera embedded in the device. As in most of the vision-based 3D reconstruction systems recently proposed in the literature, this work considers the use of structure from motion to estimate the geometrical structure of the scene. The level of detail of the reconstructed scene is clearly related to the number of images processed by the reconstruction system. However, the execution of a 3D reconstruction algorithm on a mobile device imposes several restrictions due to the limited amount of available energy and computing power. This consideration motivates the search for new methods that obtain similar results at a lower computational cost. This paper proposes a novel method for feature matching which, according to our simulations, increases the number of correctly matched features between two images and can make the matching process more robust.


Introduction
Thanks to the continuous increase of applications using geo-spatial data (Habib et al. 2005; Pirotti et al. 2015; Remondino, Guarnieri, and Vettore 2005), in the last decades several mobile mapping systems (MMSs) have been developed, mostly based on the use of terrestrial or airborne vehicles (Chiang, Noureldin, and El-Sheimy 2003; Kraus and Pfeifer 1998; Pirotti et al. 2014; Remondino et al. 2011; Toth 2001; Toth and Grejner-Brzezinska 1997), equipped with remote sensing instruments such as laser scanners and cameras.
MMSs have become quite popular even among the general public due to the success of web tools which allow street view navigation. Despite the popularity of such applications, it is worth noting that the acquired georeferenced spatial data can be used in a much wider range of applications, both in real time (e.g. location-based services) and in post-processing, e.g. by a Geographic Information System (GIS) (El-Sheimy and Schwarz 1998; Hadeel, Jabbar, and Chen 2011; Piragnolo et al. 2015; Tao 2013), and for recognition purposes (Facco, Masiero, and Beghi 2013; Facco et al. 2011; Jaakkola et al. 2010; Pfeifer, Gorte, and Winterhalder 2004).
The diffusion of MMSs has been limited until now by their quite high cost, mostly due to the need for expensive sensors (e.g. terrestrial laser scanners) and vehicles (e.g. cars). However, motivated by the worldwide diffusion of mobile devices (e.g. smartphones, tablets) equipped with both positioning (e.g. GNSS, inertial sensors) and remote sensing instruments (e.g. camera), several efforts have recently been made to develop MMSs based on smartphones.
The development of an MMS based on a smartphone offers two great advantages: a much lower cost compared with other MMSs, and the much wider diffusion of these devices, which represents a potentially very large customer base. However, several challenging issues are related to the realization of such a system. First, the limited amount of available energy imposes stringent requirements on the system power consumption (and hence a restriction on the available computational resources) and/or on its battery autonomy. Furthermore, the current generation of smartphones typically embeds several MEMS sensors (Schiavone, Desmulliez, and Walton 2014), which, however, typically provide quite noisy measurements (e.g. positioning obtained by means of inertial sensors only is unreliable). For instance, although several recent works in the literature have tried to tackle the positioning problem with MEMS sensor measurements when the GNSS signal is not available (or not reliable, as in certain city centers, or indoors), it is still a challenging problem (Chen, Meng, et al. 2015; Chen, Zou, et al. 2015; Bahl and Padmanabhan 2000; Huang and Gao 2013; Lukianto and Sternberg 2011; Saeedi, Moussa, and El-Sheimy 2014).
The aim of this paper is twofold. First, it aims at providing a new solution for the indoor positioning problem when the GNSS signal is not reliable (Piras, Marucco, and Charqane 2010). More specifically, this paper improves the positioning approach proposed in . Here the positioning problem is tackled by means of a pedestrian dead reckoning approach, following the study of Widyawan et al. (2012), and, similarly to the previous study, a computationally efficient particle filter is used (it requires a quite low number of particles, e.g. ~100). The proposed improvements aim at making it more effective in a wide range of conditions of interest (e.g. in emergency conditions, during firefighter intervention): in particular, a new movement mode detection method and an altitude estimation based on the use of the barometer are proposed.
Second, the standard camera embedded in the smartphone is used as a remote sensing device providing images to be processed in order to obtain a 3D reconstruction of the scene (or of the object of interest). Thus, the goal is to allow 3D reconstruction based on the solution of the Structure from Motion (SfM) problem directly on the mobile device. The overall reconstruction algorithm can be summarized with the following steps: feature extraction and matching, reconstruction of the geometry of the scene, and computation of a dense 3D point cloud. Since the execution of such computations on the device can drastically reduce the battery life, reducing the power consumption needed for computing the 3D reconstruction is of fundamental importance. Several approaches have been recently considered in order to efficiently compute the solution of the SfM problem given a set of images (Agarwal et al. 2010; Brand 2002; Byröd and Åström 2010).
Independently of the specific algorithm adopted, it is clear that the accuracy and the computational complexity of the 3D reconstruction are closely related to the quality of the candidate matching features. In order to improve the results of the feature matching step, this paper considers the use of an alternative feature description, similar to the affine scale-invariant feature transform (affine SIFT, or ASIFT) (Morel and Yu 2011), that takes advantage of the information provided by the navigation system in order to improve the feature matching ability of the system, while simultaneously reducing the computational burden required by the ASIFT approach. According to the results shown in Section 5, the proposed method increases the number of correctly matched features with respect to the standard SIFT (i.e. with respect to the state of the art).

System description
Most of the navigation systems on the market exploit the GPS/GNSS signal. However, since this solution is not reliable in indoor environments, an alternative navigation procedure is considered in this paper. The proposed solution (which integrates information provided by a three-axis accelerometer, a three-axis magnetometer, and a barometer) is specifically designed to be usable in indoor environments (i.e. when GNSS is not reliable); however, it can be used outdoors as well (as a stand-alone navigation system or integrated with GNSS positioning and/or with WiFi information). Although it is not included in the minimum requirements, a three-axis gyroscope can be considered as well, in order to provide more reliable estimates of the change of the device orientation and of the heading direction. The geometrical characteristics of the building are assumed to be pre-loaded on the navigation device before starting the navigation algorithm.
In order to make the use of the system as simple (and comfortable) as possible for the user, the device is assumed to be hand held, and differently from most of the previously proposed systems for pedestrian navigation, the use of external sensors (e.g.
Remote sensing ability is achieved by using the standard camera sensor embedded in the smartphone. In this work, an SfM approach is used in order to provide a 3D reconstruction of the environment. The accuracy and robustness of the obtained results are usually related to the ability of the 3D reconstruction algorithm to provide a (possibly) large number of good matches between features in different images: this typically eases both the estimation of the scene structure and the dense reconstruction. Motivated by these considerations, in Section 4 a novel method is proposed in order to improve SIFT (scale-invariant feature transform) feature matching and, consequently, the overall 3D reconstruction.
The proposed method is based on a rationale similar to that of the ASIFT method (Morel and Yu 2009, 2011); however, it improves the computational efficiency of the ASIFT method by taking into account the information provided by the inertial measurement unit (IMU) and/or by the navigation system. Furthermore, the method requires an approximate knowledge of the values of the intrinsic camera calibration parameters (this information is usually available from the operating system of the device).
The system has been developed in the Android environment, and the results provided in this paper have been obtained with a Huawei Sonic U8650 (Figure 1) and an LG Google Nexus 5. Notice that, although the system has so far been developed only for Android, this is not a limitation of the proposed approach, which can be considered for other operating systems (and devices) as well.

Navigation
Since in terrestrial applications the height with respect to the floor is often of minor interest, in this section the navigation problem is separated into the estimation of the movements on a planar map and of those along the vertical direction (Section 3.1). The positioning system described in the rest of this section is an evolution of that presented in . The most significant differences are related to the estimation of the movements along the vertical direction (Section 3.1), and to the detection of different movement modes (Section 3.2).
The rationale of the positioning system is that of using a dead reckoning-like approach (Foxlin 2005; Ruiz et al. 2012): human steps are detected by means of a proper analysis of the accelerometer measurements (Jahn et al. 2010); then, the combined use of magnetometer and accelerometer (and gyroscope, if available) measurements makes it possible to estimate the movement direction with respect to the North (Bonnet et al. 2009).
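To illustrate what step detection from accelerometer data can look like, the following Python sketch counts upward threshold crossings of the acceleration magnitude. It is a generic, simplified detector and not the method of Jahn et al. (2010); the function name, the threshold (in m/s^2) and the minimum interval between steps are illustrative assumptions.

```python
import math

def detect_steps(times, accel, threshold=10.8, min_interval=0.3):
    """Return the timestamps at which a step is detected.

    times: sample timestamps in seconds.
    accel: (ax, ay, az) samples in m/s^2, one per timestamp.
    threshold and min_interval are hypothetical tuning values.
    """
    step_times = []
    above = False  # True while the magnitude stays above the threshold
    for t, (ax, ay, az) in zip(times, accel):
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold and not above:
            # Upward crossing: accept it only if far enough from the last step.
            if not step_times or t - step_times[-1] >= min_interval:
                step_times.append(t)
        above = magnitude > threshold
    return step_times
```

In practice the threshold would have to be tuned on recorded data, and a low-pass filter would usually precede the detection.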
Let $(u_t, v_t, w_t)$ be the 3D position of the device (e.g. smartphone), expressed with respect to the North, East, and vertical directions (i.e. global reference system), before the $t$-th step; then

$$\begin{bmatrix} u_{t+1} \\ v_{t+1} \end{bmatrix} = \begin{bmatrix} u_t \\ v_t \end{bmatrix} + s_t \begin{bmatrix} \cos\alpha_t \\ \sin\alpha_t \end{bmatrix} \qquad (1)$$

where $s_t$ is the length of the $t$-th step, and $\alpha_t$ is the corresponding heading direction. Notice that an estimate of the initial 3D position $(u_0, v_0, w_0)$ is assumed to be available a priori.
The step length $s_t$ is estimated by properly combining (in a linear estimation fashion) the current values of the following variables: the acceleration peak difference, the average of the acceleration absolute values in the time interval related to the considered step, the time duration of the step, and their inverse values. The weighting parameters in the considered estimator are computed on a learning data-set. More details on the considered variables can be found in Jahn et al. (2010). Alternatively, $s_t$ can be fixed to a constant value (an approximation of the mean step length): the tracking algorithm proposed in  and summarized in this section is designed to compensate (relatively small) step length errors.
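As a minimal sketch of the dead reckoning update, the following Python snippet advances the planar position by one step at a time. It assumes that the heading is expressed in radians, measured from North and positive toward East, so that the North coordinate grows with the cosine of the heading and the East coordinate with its sine. The function names and the 0.7 m step length in the usage note are illustrative assumptions, not values from this work.

```python
import math

def dead_reckoning_step(u, v, step_length, heading):
    """Advance the planar position (u: North, v: East) by one step.

    heading is in radians from North, positive toward East (assumed convention).
    """
    return (u + step_length * math.cos(heading),
            v + step_length * math.sin(heading))

def dead_reckoning_track(u0, v0, steps):
    """Accumulate a trajectory from a list of (step_length, heading) pairs."""
    trajectory = [(u0, v0)]
    u, v = u0, v0
    for step_length, heading in steps:
        u, v = dead_reckoning_step(u, v, step_length, heading)
        trajectory.append((u, v))
    return trajectory
```

For example, `dead_reckoning_track(0.0, 0.0, [(0.7, 0.0), (0.7, math.pi / 2)])` moves one step toward North and then one step toward East. Without any correction, heading and step length errors accumulate along the trajectory, which is why a filter is applied on top of this update.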
The mobile device is supposed to be carried in the user's hand and the heading direction is assumed to be approximately fixed with respect to the local coordinate system $(u_s, v_s, w_s)$, i.e. the user does not drastically change the device orientation during the navigation. The tracking system is designed to estimate and correct device attitude changes, with absolute value lower than 36 degrees, with respect to the conventional orientation. Allowing free changes of the device orientation can be achieved by generalizing the initial heading estimation procedure proposed by , or as done by Deng et al. (2015).
Let $y_t$ be the vector of measurements corresponding to the $t$-th step and $Y_t$ be the collection of measurements $y_\tau$ from $\tau = 0$ to $t-1$. Furthermore, let $q_t = [u_t \; v_t]^T$; then the probability distribution of the estimated position $q_t$ after the $(t-1)$-th step is expressed as follows:

$$p(q_t \mid Y_t) = \sum_{i=1}^{n} w_{i,t}\, \delta(q_t - q_{i,t}) \qquad (2)$$

where $q_{i,t}$ and $w_{i,t} = 1/n$ are the position and weight of the $i$-th particle at $t$, while $\delta(\cdot)$ is the Dirac delta function.
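A particle representation of this kind can be sketched in a few lines of Python. The snippet stores n equally weighted planar positions and computes a point estimate as their weighted mean, which is the mean of a weighted sum of Dirac deltas. The function names, the optional Gaussian spread around the initial position, and the use of the mean as point estimate are illustrative assumptions.

```python
import random

def init_particles(q0, n=100, sigma=0.0, seed=0):
    """Create n particles around the known initial position q0 = (u0, v0).

    sigma is a hypothetical initial spread in metres; all weights start at 1/n.
    """
    rng = random.Random(seed)
    particles = [(rng.gauss(q0[0], sigma), rng.gauss(q0[1], sigma))
                 for _ in range(n)]
    weights = [1.0 / n] * n
    return particles, weights

def position_estimate(particles, weights):
    """Weighted mean of the particle positions."""
    u = sum(w * p[0] for p, w in zip(particles, weights))
    v = sum(w * p[1] for p, w in zip(particles, weights))
    return (u, v)
```

Since each particle is just a pair of coordinates and a weight, the memory and time cost of the representation grows linearly with n.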
Then, at the user's next step the above probability distribution is updated as follows:
• For each particle $i$:
  - draw a sample $q_{i,t+1}$ from the proposal distribution

    $$q_{i,t+1} = q_{i,t} + s_{i,t} \begin{bmatrix} \cos\alpha_{i,t} \\ \sin\alpha_{i,t} \end{bmatrix} \qquad (3)$$

    where the heading direction $\alpha_{i,t}$ and the step length $s_{i,t}$ are sampled from Gaussian distributions centered in $\alpha_t$ and $k s_t + b_t$, respectively. $k$ and $b_t$ are scalar variables that aim at reducing the effect of measurement errors, as shown in .
  - if $q_{i,t+1}$ violates the geometrical constraints of the building, then the last part of the particle trajectory is rotated by a random angle $\alpha_{b,t}$ ($|\alpha_{b,t}| \le \pi/5$). Since most of the time the violation is due to small deviations of the heading direction from its true value (e.g. measurement errors due to small calibration errors), small rotations of the last part of the trajectory can often tackle this issue.
  - if $q_{i,t+1}$ still violates the geometrical constraints of the building, then set $w_{i,t+1} = 0$, otherwise set $w_{i,t+1} = 1/n$. When the WiFi connection is available, the weights can be computed taking into account also the WiFi radio signal strength, as shown in Widyawan et al. (2012).
• Scale the particle weights in order to normalize their sum to 1.
• Resample $n$ particles from the following distribution and set $w_{i,t+1} = 1/n$:

$$p(q_{t+1} \mid Y_{t+1}) = \sum_{i=1}^{n} w_{i,t+1}\, \delta(q_{t+1} - q_{i,t+1}) \qquad (4)$$

Notice that the computational complexity of this particle filter is linear with respect to the number of particles $n$, and, more interestingly, it achieves good positioning performance while using a small number of particles (e.g. $n \approx 100$).
The reader can refer to ) for more details on the original version of the particle filter summarized above. The estimates obtained by this filter are integrated with those of the altitude described in Section 3.1 in order to obtain 3D estimates of the device position. 3D orientations can be obtained as well, by processing the IMU measurements with a Kalman filter.

Estimating variations of altitude
The variations of the altitude of the device are estimated in this subsection by measuring the variations of atmospheric pressure by means of a barometer embedded in the mobile device. Since atmospheric pressure changes with time and space, a fast calibration procedure is assumed to be performed at the beginning of the navigation. Furthermore, the working conditions are assumed to be invariant during the navigation, e.g. constant atmospheric pressure (and temperature): for instance, in ideal conditions (null measurement error) the same pressure value can be measured in the same spatial position at the beginning and at the end of the navigation. This assumption is clearly an approximation of reality; however, it is reasonably good for the typical extent (in the spatial dimension and in the time duration) of pedestrian navigation.
According to the above assumptions (i.e. considering the atmosphere as an ideal gas at constant temperature), from Boyle's and Stevin's laws the pressure $p_w$ at an altitude $w$ can be expressed as follows:

$$p_w = p_0\, e^{-(w - w_0)/a} \qquad (5)$$

where $p_0$ is the pressure at altitude $w_0$ (notice that in certain cases, for instance when changes of temperature occur in the considered environment, a linear model between $p_w$ and $w$ can be considered as well). At the beginning of the navigation procedure, $p_0$ is estimated as (the average value of) the pressure measurement(s) at the known altitude $w_0$.
If certain environmental variables are known (e.g. temperature, georeferenced position, and corresponding value of the gravitational acceleration), then the value of the parameter $a$ can be analytically computed. However, in order to reduce the measurement errors due to sensor calibration, the following simple procedure is adopted: since $w_0$ is often expressed with respect to the ground and the initial position is assumed to be known, $a$ can be computed as the best fitting value in Equation (5) by measuring $p_w$ with $w$ corresponding to the ground altitude.
The goal of the above procedure for the estimation of $p_0$ and $a$ is to be as simple (and fast) as possible for the user. However, it is worth noting that a more robust estimation of such parameters should be adopted if possible (i.e. varying the altitude during the calibration procedure over the whole range of values expected during the navigation).
Once the parameters $p_0$ and $a$ have been computed, the variation of altitude with respect to $w_0$ can be estimated as follows:

$$w - w_0 = -a \ln\left(\frac{p_w}{p_0}\right) \qquad (6)$$

When an accurate estimation of $w$ is required, a Kalman filter can be implemented as well to exploit the temporal smoothness of the device movements in order to reduce the influence of measurement errors.

Detection of movement mode
The rationale of this subsection is that of improving the positioning estimation by providing information about the current action of the user. For instance, step lengths are typically different when walking on the stairs with respect to walking along a corridor: hence, with such information the positioning algorithm can easily adapt the step length according to the zone where the user is currently moving. Taking into account the above considerations, in the first part of this subsection a method for detecting several user movement modes is presented. Then, Equation (4) is updated in order to exploit the movement mode information.
To be more specific, the aim of this section is that of presenting a method for detecting five actions typically related to moving inside a building: going up or down the stairs, going up or down with a lift, and walking on a floor.
A support vector machine (SVM) approach has been used in order to detect and correctly classify such actions. Measurements provided by the barometer and by the accelerometer are used as input for the classifying machines: indeed, 5 SVMs are used, where each SVM is used to recognize one specific action.
All the SVMs use the same input data. A linear conversion is used to properly relate the change of pressure measured by the barometer with the change of the device altitude. Then, a Δt interval of measurements is considered, and the following variables are provided to the SVM classifiers as input:
• The mean altitude variation in the Δt interval (the value of this variable is obtained by a linear fit of the data in the considered interval).
• The standard deviation (in the Δt interval) of the absolute value of the measured acceleration vector.
The mean altitude variation in Δt often makes it possible to discriminate each of the cases from a device moving on the same floor. Indeed, a significant (positive/negative) change of its value corresponds to a device movement (also) in the vertical (up/down) direction. Furthermore, its value is usually (approximately) fixed to a constant value when the user is not moving (e.g. as is quite usual in a lift), whereas it can assume quite different values for different humans walking on stairs.
Although this variable can often be successfully used to detect the actions of interest here, its reliability strongly depends on the length of the time interval Δt: computing the mean over a longer time interval significantly reduces the influence of the measurement noise and of the human movements, whereas for short time intervals such factors can lead to wrong classification results.
The standard deviation (in the Δt interval) of the absolute value of the measured acceleration vector has been considered as well in order to improve the classification results. In particular, it can significantly improve the discrimination between going up/down with a lift or on the stairs: indeed, human (and consequently device) movements are usually quite limited while being in a lift, hence (excluding the initial lift acceleration and the final deceleration) the acceleration measured by the sensor is mostly similar to the gravitational acceleration. Instead, while walking, the acceleration measured by the device is subject to significant changes due to the human steps: its mean value might not be so different from the gravitational acceleration, but its standard deviation (because of the acceleration changes due to the steps) is usually much larger than in the lift case.
Thanks to the use of the movement mode detector presented above, and by exploiting the law of total probability, the position estimation Equation (4) can be updated as follows:

$$p(q_{t+1} \mid Y_{t+1}) = \sum_{j} p(q_{t+1} \mid Y_{t+1}, m_{t+1,j})\, p(m_{t+1,j} \mid Y_{t+1}) \qquad (7)$$

where $m_{t+1,j}$ indicates the detection of the $j$-th mode at time $t+1$, whereas $p(m_{t+1,j} \mid Y_{t+1})$ is the probability of mode $j$ given the current measurements. As shown in Section 5, the movement mode detector has a low error probability, hence the detected mode is often the correct one, i.e. $p(m_{t+1,j} \mid Y_{t+1})$ is close to 1 for the correct value of $j$. Furthermore, $p(q_{t+1} \mid Y_{t+1}, m_{t+1,j})$ is formally defined similarly to Equation (4). The main difference with respect to (4) is that in $p(q_{t+1} \mid Y_{t+1}, m_{t+1,j})$ the parameter values can be particularized with respect to the specific case of interest, e.g. the mean step length while walking along a corridor is typically quite different from that while going up/down the stairs.

3D reconstruction
In the proposed system, 3D reconstruction of the scene is obtained by means of an SfM approach (Hartley and Zisserman 2003). To be more specific, the reconstruction procedure can be summarized as follows:
• Computation and matching of feature points in the acquired images.
• Solution of the SfM problem on the matched features (i.e. bundle adjustment (Agarwal et al. 2010), or incremental SVD method (Brand 2002;)).
• Dense point cloud computation (this is based on dense pixel matching in different images (Furukawa and Ponce 2010) and triangulation for computing the corresponding 3D positions (Hartley and Zisserman 2003; Masiero and Cenedese 2012)).
In particular, this section deals with the improvement of the first part of the reconstruction procedure, the feature matching step, while the other steps are performed with standard algorithms.
Feature matching is based on the appearance of the 2D image regions in the neighborhood of the considered feature locations. Since images are taken from different points of view, the same feature can undergo certain appearance changes; the goal of several matching techniques recently proposed in the literature is therefore that of extracting features invariant to such deformations. As is widely known, SIFT descriptors (Lowe 1999) make it possible to obtain reliable matching between features in two images when the images are related by a different scaling, illumination and rotation on the image plane (see Figure 2(b)). However, matching issues can occur in the presence of other changes of the point of view between the two images (e.g. rotations along the other two axes as in Figure 2(c) and (d)). As shown in Morel and Yu (2009, 2011), in this case local changes between features in the two images can usually be well represented by means of affine transformations.
In order to make the SIFT descriptor robust even to this kind of transformation, Morel and Yu (2009, 2011) proposed to simulate N = 32 versions of each of the original images, where each version simulates the appearance of the corresponding image after applying a specific affine transformation. The N affine transformations can be associated with different values of the rotation angles, i.e. they represent a discretization of the set of possible rotations to which the SIFT descriptors are not invariant (combinations of rotations as in Figure 2(c) and (d)). Then, feature points are extracted in all the N versions of the images. Thus, when searching for feature matches between images $I_1$ and $I_2$, all the corresponding SIFT descriptors are matched: all the possible combinations of matchings are checked, i.e. the features extracted in each of the N versions of $I_1$ are matched (when possible, according to the standard SIFT matching procedure (Lowe 1999)) with the features extracted in each version of $I_2$. In accordance with the type of transformations applied to each image, this procedure is named affine SIFT. Applying the feature matching procedure (Lowe 1999) to each pair of transformed images, the computational time for feature matching is proportional to $N^2$ times that of the original SIFT.
Taking into account the above considerations, the goal of this section is that of exploiting the information provided by the navigation system in order to improve the performance of the SIFT-based feature matching, but at a lower computational cost with respect to the affine SIFT.
First, in order to make the use of the system as simple as possible for the user, the camera embedded in the device is assumed to be uncalibrated (see Karel and Pfeifer (2009), Ma et al. (2004), and Remondino and Fraser (2006) for camera calibration and its advantages). However, in most cases it can be approximated by a pinhole camera. Then, notice that although the real value of the interior parameter matrix $K$ is unknown, it can be roughly approximated as follows:

$$K \approx \begin{bmatrix} a & 0 & u_0 \\ 0 & a & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where pixels are assumed to be approximately square, sensor axes are assumed to be orthogonal, and the displacement of the sensor center with respect to the optical axis is approximated with $(u_0, v_0) \approx (c/2, r/2)$, where $r$ and $c$ are the number of image rows and columns, respectively. The parameter $a$ is related to the focal length and to the pixel size. When the characteristics of the device are available, the value of $a$ can be quite easily approximated. When no information on such characteristics is available, the procedure described in the following shall be repeated for different values of $a$ ranging in the interval $\left(\tfrac{1}{3}(r + c),\; 3(r + c)\right)$ (Fusiello and Irsara 2010; Heyden and Pollefeys 2005), where the real value of $a$ is supposed to be the one that yields the largest number of matching features.
Furthermore, let $R_t$ be the matrix provided by the navigation system describing the device orientation during the acquisition of the image at time $t$. Since the approximate orientation matrix $R_t$ is assumed to be available for all the acquired images, for each pair of acquired images $I_1$ and $I_2$ the corresponding orientation matrices and the interior parameter matrix $K$ are used in order to compute the approximate rectification of $I_1$ with respect to $I_2$, i.e. $I_1$ is transformed in order to simulate the view from the same orientation as $I_2$. After such a transformation, two corresponding features in the two images should have approximately the same orientation, hence the SIFT descriptor is modified in order to use absolute orientation angles instead of relative ones, i.e. invariance with respect to (large) rotations is now an undesired condition. Since the navigation system also provides an estimate of the device position at each time interval, for each pair of matched features it is also possible to compute an approximation of the corresponding scales. However, doing this for all the pairs of candidate matched features might be time consuming, hence it might optionally be considered just for an (almost) final list of matched features.
Finally, exploiting the information provided by the navigation system and the approximate interior parameter matrix $K$, an approximate fundamental matrix can be computed, and an approximate epipolar constraint can be imposed in order to reduce the number of candidate feature matches. However, it is worth noting that the fundamental matrix computed in this way is typically a quite rough approximation of the correct one, hence the approximate epipolar constraint should be applied with a threshold significantly larger than 0.
Interestingly, this approach performs the feature matching procedure with a constant computational burden (with respect to N), thus significantly reducing the computational complexity with respect to the affine SIFT. The performance (in terms of correctly matched features) of the proposed feature matching procedure will be shown in the next section.

Results
The positioning approach presented in this work is an evolution of that in . In this section, the functionality of the main changes with respect to ) is tested, i.e. the altitude estimation and the detection of different movement modes (and their use during navigation). Experiments have been conducted on three floors of a university building, and in order to make the results statistically more robust, experimental data have been collected by three volunteers, two men and one woman, with heights from 1.65 to 1.85 m.
First, the estimation of the (variation of) altitude has been tested on 21 check points distributed over two buildings, where the considered altitudes range from the ground to approximately 5 m. The average altitude estimation error is approximately 0.6 m. However, this has been obtained by learning the parameters of the estimation model while varying the device altitude by only 1 m, approximately, whereas the computed estimation model was used mostly outside of the altitude range used in the calibration. Instead, the altitude estimation error obtained by restricting the considered altitudes only to those on the same floor as that of the calibration procedure is 0.2 m. Hence, the simple parameter learning procedure previously proposed can be useful when the goal is that of detecting changes of floor; however, when the required estimation error has to be relatively small, it has to be used only in the range of altitudes considered during the learning of the model parameters (hence, in order to obtain reliable estimates on a larger range of altitudes, it is necessary to increase the range of altitudes also during the learning of the model parameters).
The detection of movement modes has been validated in the same university building previously considered. In order to make the system as simple as possible to use for an unqualified user, the results reported in the rest of this section have been obtained with uncalibrated sensors (however, the use of calibrated sensors significantly improves the results).
Measurements involved the use of two stairs and two lifts, going in both up and down directions. Figure 3 shows the training results for the SVM classifier of the action of going down the stairs (Δt = 3 s).
The mean altitude variation during an interval of length Δt typically makes it possible to properly discriminate each of the cases of interest. On the one hand, going up/down leads to a positive/negative mean altitude variation, while, on the other hand, being in a lift or on the stairs can usually be determined by comparing the absolute value of the mean altitude variation (that is usually (approximately) fixed to a constant value for a lift). The latter case can also be distinguished by considering the standard deviation (during the Δt interval) of the absolute value of the measured acceleration vector: human (and device) movements are usually quite limited while being in a lift, typically leading to a small standard deviation. Instead, during a walk the acceleration measured by the device can change because of the human steps, leading to a much larger value of the standard deviation.
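The two features discussed here can be computed from a window of measurements along the following lines. This Python sketch fits a straight line to the altitude samples of a Δt window and returns its slope, together with the standard deviation of the acceleration magnitude. Using the slope (in m/s) as the "mean altitude variation" is an assumption, and the function names are illustrative; the SVM classifiers themselves are not reproduced.

```python
import math

def altitude_slope(times, altitudes):
    """Least-squares slope (m/s) of altitude versus time over one window."""
    n = len(times)
    t_mean = sum(times) / n
    a_mean = sum(altitudes) / n
    num = sum((t - t_mean) * (a - a_mean) for t, a in zip(times, altitudes))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

def accel_magnitude_std(accel):
    """Standard deviation of the acceleration magnitude over one window."""
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in accel]
    mean = sum(mags) / len(mags)
    return math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))

def window_features(times, altitudes, accel):
    """Feature pair for one window: (altitude slope, acceleration std)."""
    return altitude_slope(times, altitudes), accel_magnitude_std(accel)
```

A window recorded in a moving lift would be expected to give a clearly nonzero slope with a small standard deviation, whereas a window recorded on the stairs would give a nonzero slope with a large standard deviation.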
Most of the time, the two considered variables can be successfully used to detect the four actions of interest here. However, the reliability of the computed classifiers strongly depends on the length of the time interval Δt: the longer the interval, the smaller the influence of measurement noise (and of human movements). Figure 4 compares the classification error obtained with the SVM classifiers for values of Δt from 0.5 to 3 s. The reported results are the mean of 100 independent Monte Carlo simulations. As shown in the figure, the classification error falls below 5% for Δt ≥ 2 s.
Then, the use of the movement mode detector has been tested during navigation: a slightly adapted version of Widyawan's particle filter (Widyawan et al. 2012) was applied in order to track the device while the user was on the stairs. Figure 5(a) and (b) show a sample estimated trajectory (distributed over two floors); 50 trajectories were considered. WiFi was deactivated in order to validate the positioning approach based only on the inertial sensor measurements. The use of the movement mode detector during navigation (i.e. by means of (7)) allowed a reduction of the positioning error of approximately 20% with respect to the version not using the movement mode detector.
Concerning the feature matching approach presented in Section 4, the effect of (approximate) image rectification is shown first in Figure 6. The synthetic image in Figure 6(c) is obtained by rectifying Figure 6(a) so that it is more easily comparable to Figure 6(b). It is clear that the use of uncalibrated sensors leads to lower quality results with respect to those expected in a calibrated case. Nevertheless, the rectification procedure has partially succeeded in producing an image that is more similar to the middle one than the left one is, in particular close to the feature position. Interestingly, by comparing the left border of the synthetic window (Figure 6(c)) with that in Figure 6(a) and (b), it can be noticed that the system obviously cannot properly estimate image parts that are not visible in the original image (e.g. the internal left border of the window is not visible in Figure 6(a) and, consequently, cannot be present in Figure 6(c) either).
Then, the feature matching approach has been tested on a set of images available from the website of Lhuillier and Quan (2005) (Figure 7 shows two of them, with the corresponding SIFT feature points; all the considered images are 640 × 480 pixels). Since orientation information is not available for these images, approximate orientations were computed after matching the features in the images and adding Gaussian random noise with a standard deviation of 0.15 rad to the computed orientation angles (100 independent Monte Carlo simulations were considered in order to provide statistically reliable results). Figure 8 compares the number of features correctly matched by the proposed method (red circles) with that obtained by the standard SIFT (blue x-marks) as the rotation angle between the two camera poses varies (feature locations and SIFT descriptors were computed with VLFeat (Vedaldi and Fulkerson 2010) for both methods). The proposed method detected approximately 23% more correct feature matches.

Conclusions
This paper presented recent improvements to an indoor positioning approach and a new strategy to improve feature matching results. The proposed navigation approach has been designed to work even in particularly difficult conditions, e.g. with uncalibrated sensors or when a WiFi connection is not available. The method presented here allows altitude variations to be estimated and a movement mode detector to be exploited in order to improve position estimation (a positioning error reduction of approximately 20% with respect to the version not using the movement mode detector).
Furthermore, the presented feature matching method reduces the computational burden required by ASIFT, while ensuring a significant improvement in the number of correctly matched features with respect to the standard SIFT.

Notes on contributors
Andrea Masiero received the MSc in Computer Engineering and the PhD degree in Automatic Control and Operational Research from the University of Padova (Italy). He currently holds a Post-doc position at CIRGEO. His research interests range from Geomatics to Computer Vision, Smart Camera Networks, and the modeling and control of adaptive optics systems. His research has mainly focused on statistical and mathematical modeling, identification and control of spatio-temporal processes, signal processing, statistical filtering, optimization, and information fusion. He is currently working on low-cost positioning and MMSs.
Francesca Fissore received the MSc in Environmental Physics from the University of Turin. In April 2014, she received her PhD in "Monitoring of Systems and Environmental Risk Management" from the University of Genoa, during which she dealt with low-cost sensors for the detection and monitoring of environmental parameters and developed a real-time web-based monitoring tool able to acquire, manage, smooth, aggregate, historicize, and display environmental parameters. Currently, she holds a Post-doc position at CIRGEO, where she is working on the realization of an MMS using low-cost sensors. The project aims at developing approaches for the statistical processing of mapping data and for the monitoring and collection of information from multiple sensors mounted on mobile devices.
Francesco Pirotti is an assistant professor at the Department of Land, Environment, Agriculture and Forestry of the University of Padova. His research interests are in remote sensing applications for forestry and the environmental sciences, natural hazards, and risk, in particular using laser scanning (LiDAR) data. He uses collaborative web solutions for applying new algorithms to large remote sensing data-sets. He has also investigated full-waveform LiDAR data for forestry applications. He is a member of several national and international scientific committees and research groups related to geomatics.
Alberto Guarnieri received the degree in Electrical Engineering from the University of Padova (Italy) in 1998 and the PhD in Geodetic and Topographic Sciences from the University of Bologna (Italy) in 2004. Since November 2008, he has been an assistant professor at the Department of Land, Environment, Agriculture and Forestry (TESAF) of the University of Padova. His research activity is mainly focused on 3D modeling supported by terrestrial laser scanning and digital photogrammetry for cultural heritage and hydrogeological risk mapping. He is also involved in the development of land-based and UAV-based low-cost MMSs. He currently teaches higher education courses on Global Navigation Satellite Systems (GNSS) and GIS at the School of Agricultural Sciences and Veterinary Medicine of the University of Padova. He is active in the International Society of Photogrammetry and Remote Sensing (ISPRS), serving from 2012 to 2016 as secretary of the Inter-Commission Working Group ICWG I/Va - Mobile Scanning and Imaging Systems for 3D Surveying and Mapping.
Antonio Vettore is a full professor of geomatics at the University of Padova. He is the author of more than 200 publications in the fields of Surveying, Photogrammetry, and Cartography (Geomatics). His research topics are: (1) Mobile Mapping Systems, Inertial Navigation Systems, Road Tracking, Image Processing; (2) Laser Scanner applications and GPS/INS techniques for Surveying and 3D modeling of cultural heritage sites; (3) Processing of LiDAR data from airborne or ground sensors (ALS and TLS), both discrete return and full waveform. He is the head of research units in several national (Italian Ministry of Education, Italian Civil Protection Agency) and international projects. He is actively involved in the International Society of Photogrammetry and Remote Sensing (ISPRS).