Modeling neutral-atmospheric electromagnetic delays in a “big data” world

Abstract If left unmodeled, the delay suffered by electromagnetic waves while crossing the neutral atmosphere negatively affects Global Navigation Satellite System positioning. This delay has been modeled by means of empirical models formulated from climatological information, or using information extracted from numerical weather prediction (NWP) models. This paper explores the potential use of meteorological information of several types that will become available in cyberspace with the increasing number of sensors (e.g. a cell phone, or the thermometer of a nearby smart home). How can we make use of these potentially huge datasets, which may help to provide the best possible representation of the neutral atmosphere at any given time, as readily and as accurately as possible? This situation falls in the realm of Big Data. A few potential scenarios are discussed in this paper: a sequential improvement of the Marini mapping function coefficients, a self-feeding NWP, and near real-time empirical model updates. The pros and cons of each approach are discussed in comparison with what is done today. Experiments indicate that they have potential for a positive contribution.


Introduction
The neutral-atmospheric delay is suffered by electromagnetic waves when they cross the neutral atmosphere (more popularly but imprecisely called the troposphere). The modeling of this delay is of major interest in positioning, navigation, and timing of any type. If left untreated, it can cause significant error. The usual way to deal with the neutral-atmospheric delay is by modeling it. Models for this purpose can be constructed based on climatological information (empirical models), or based on information derived from numerical weather prediction (NWP) models.
The initial approach for modeling the neutral-atmospheric delay was based on the use of climatological models derived from sparse meteorological data. An improvement in the climatological models took place as denser meteorological networks started to be developed and more meteorological observing techniques became available. With such models, the neutral-atmospheric delay can be predicted from a set of equations at any given time, but the prediction reflects a historical climatology rather than the actual state of the atmosphere. In this category, we can include the early models and mapping functions due to Hopfield (1969) and Saastamoinen (1972), as well as the more recent empirical models such as UNB3m (Leandro, Langley, and Santos 2008).
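As an illustration of this kind of closed-form climatological computation, the zenith hydrostatic delay of Saastamoinen, with the refinement of Davis et al. (1985), can be evaluated from surface pressure alone. The following minimal Python sketch uses the well-known 0.0022768 m/hPa constant and the gravity correction for latitude and height; the function name is our own, chosen for illustration:

```python
import math

def zenith_hydrostatic_delay(pressure_hpa, lat_deg, height_m):
    """Saastamoinen zenith hydrostatic delay in meters, Davis et al. (1985) form.

    pressure_hpa : surface pressure at the site (hPa)
    lat_deg      : geodetic latitude (degrees)
    height_m     : site height (m)
    """
    # Correction accounting for the mean gravity at the site:
    # f = 1 - 0.00266 cos(2*lat) - 0.00028 * H, with H in km
    f = 1.0 - 0.00266 * math.cos(2.0 * math.radians(lat_deg)) \
            - 0.00028e-3 * height_m
    return 0.0022768 * pressure_hpa / f
```

For standard sea-level pressure (1013.25 hPa) at mid-latitudes this yields a delay of roughly 2.3 m, which is the familiar order of magnitude of the hydrostatic delay in the zenith direction.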
More recently, investigation began to focus on extracting information from NWP models through ray tracing (e.g. Nievinski 2009). Even though computationally more expensive, this approach has been shown to be more advantageous than modeling based on climatology because it provides a representation of the neutral atmosphere at the epoch for which the NWP is available. This has led to the development of the Vienna Mapping Functions (VMF) (Boehm and Schuh 2003), which are based on several NWP models from the European Centre for Medium-Range Weather Forecasts, the National Oceanic and Atmospheric Administration, and the Canadian Meteorological Centre. VMF services have been developed that make available gridded neutral-atmospheric delays and other ancillary parameters, which can be used in applications ranging from general positioning up to scientific applications such as those in support of the Global Geodetic Observing System (GGOS), a component of the Global Earth Observation System of Systems (GEOSS). The use of NWP models changed the paradigm from the use of simple model equations to the derivation of the delay directly from a huge amount of gridded data.
There are a few limitations in both approaches. Models based on climatology rely on sparse data covering a certain period, usually in the past, whereas NWP models, although based on more up-to-date data, are provided only at intervals of several hours.
An alternative way can be foreseen as a consequence of the ever-increasing number of sensors (meteorological or not, satellite-based or ground-based), the wider spread of their geographical distribution, and the growing generation of a continuous flow of data, notably in real time. For example, one can ask whether individual cell phones will become mobile meteorological stations, or whether the observations from their embedded Global Navigation Satellite System (GNSS) receivers (increasingly accurate) can even be used. It is reasonable to add this speculation to the list of future possibilities.
That being a possibility, can we think ahead and start inquiring how to make use of these huge datasets, floating around in cyberspace, with the potential to provide the best possible representation of the neutral atmosphere at any given time, as readily and as accurately as possible? This futuristic scenario fits well within the emerging field of Big Data (Mayer-Schönberger and Cukier 2013).
This paper explores and discusses scenarios with tremendous potential to open new trends in modeling the neutral-atmospheric delay. They include: (1) a sequential improvement of the Marini mapping function coefficients (e.g. within a VMF), (2) a self-feeding NWP, and (3) near real-time empirical model updates. The discussion and simulations that will be shown cover the whole planet, as an indication of their potential use under GGOS, but they can easily be tied to specific locations.

Big data
It is important to admit that, at one point, Big Data sounded to us synonymous with "signals of opportunity." That concept refers to the use of any kind of available signal for the purpose of navigation. A signal of opportunity is most often available by a physical means, such as a radio wave. Even though Big Data carries the same idea of an "opportunity to use available data," it entails a more complex meaning, because the data are not necessarily what we would consider measured data in earth sciences and engineering, but rather information that is available "out there" waiting to be used.
In the concept of Big Data, the data are a "thing," meaning they can be anything. Nonetheless, the data must possess certain attributes. First, they have to reside in cyberspace rather than in physical space (even though they might have been generated in the physical world). Second, they require a unique identity so that they can be distinguished from other data; for example, they can be associated with a time and a location, which distinguishes them from similar data collected at the same location but at a different time. Third, there has to be a capability to communicate. Fourth, they require "senses," in analogy with human senses. For both of these, we can think of sensors connected to the Internet, or of information that is created and resides entirely in cyberspace. Finally, they require the capability to be accessed and/or controlled from anywhere.
A problem connected to Big Data is the complexity of the retrieval and processing algorithms. We do not deal with this problem in this paper, as these mechanisms are in constant evolution. We would like, though, to emphasize two aspects of utmost importance: quality control and data format.
Quality control seems self-evident, but the fact is that in a Big Data scenario the data are pulverized, coming from sensors of every kind. To be made useful in the context of an application, a quality-control mechanism must be present. The tests described in this article assume that the data have gone through such a process and are ready to be used.
Another very important issue is data format. A common data format is necessary to make the data quantifiable. Again, in the pulverized world of Big Data, that is not guaranteed, so a certain type of translation will be necessary. In our tests, we assume that this has already taken place.
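As a minimal sketch of what such quality control and format translation could look like (the record layout, thresholds, and names below are our own illustrative assumptions, not a proposed standard), crowd-sourced pressure readings might be translated into a common record carrying the identity, time, and location attributes discussed above, then range-checked and screened against a local median before use:

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class PressureRecord:
    sensor_id: str       # unique identity
    epoch: float         # time tag (e.g. Unix seconds)
    lat: float           # location
    lon: float
    pressure_hpa: float  # the measurement itself

def quality_control(records, max_dev_hpa=5.0):
    """Keep readings that are physically plausible and close to the local median."""
    # Range check: reject values outside any realistic surface pressure
    plausible = [r for r in records if 300.0 < r.pressure_hpa < 1100.0]
    if not plausible:
        return []
    # Consistency check: reject outliers relative to the local median
    med = median(r.pressure_hpa for r in plausible)
    return [r for r in plausible if abs(r.pressure_hpa - med) <= max_dev_hpa]
```

The median-based screen is deliberately simple; any robust estimator would serve the same purpose of preventing a single faulty sensor from contaminating the delay modeling downstream.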
Big Data deals with "what," not with "why," and this distinction is very important. The huge amount of data that constitutes Big Data deals primarily with events and patterns. The treatment of these data allows the establishment of trends within certain probabilities. The conclusions are based on the data, as if the data were speaking for themselves, but the reason for those trends is not part of the data. Therefore, Big Data is not a brain but rather a muscle that reacts to impulses.

Experiments
We discuss three scenarios that we consider to have potential for use in the modeling of the neutral-atmospheric delay.

"Improving" Marini mapping functions
The Marini mapping function (Marini 1972) is a continued fraction form that relates the mapping function (mf) to the elevation angle (e) along the direction to the satellite, as shown in Equation (1), given here in the normalized form used by the VMF:

mf(e) = \frac{1 + \dfrac{a}{1 + \dfrac{b}{1 + c}}}{\sin e + \dfrac{a}{\sin e + \dfrac{b}{\sin e + c}}}   (1)

The coefficients b and c are treated as constants, whereas the coefficient a is estimated in a least-squares adjustment (Boehm and Schuh 2003) by inverting the fraction form. The mapping function is used to relate a delay along the zenith, d^z, with the delay along the path (slant delay), d_t. In a more complete formulation, we have

d_t = d^z_h \times mf_h + d^z_w \times mf_w   (2)

where the total slant delay d_t is a function of the zenith hydrostatic (dry) delay d^z_h projected to the line of sight using a hydrostatic mapping function mf_h, and of the zenith non-hydrostatic (wet) delay d^z_w projected using a wet mapping function mf_w. In Equation (2), the total slant delay can be separated into a dry component (d^z_h × mf_h) and a wet component (d^z_w × mf_w), referred to as the slant dry delay and the slant wet delay, respectively. Therefore, there are two equations similar to Equation (1), one for the dry part and another for the wet part, with corresponding a, b, and c coefficients for each. In the sequel, we will refer to the zenith total delay (ZTD) as the sum of d^z_h and d^z_w.

Now, let us consider that we have a VMF grid containing values of the coefficient a for a particular epoch, and that the Big Data algorithm has detected nearby "stations" (e.g. a cell phone, or the thermometer of a nearby smart home) and, with its analytics, judged their data to be of adequate quality. What can we do with them? In essence, we can solve for a correction δa by inverting the Marini continued fraction (Equation (1)). Figure 1 shows a simulation of how much improvement this approach would provide around the world. The simulation uses actual data from a GNSS station as the "nearby station" and then estimates δa. Figure 1 shows the behavior of the dry coefficient a.
Figure 2 shows the similar behavior of the wet coefficient a. The grid locations used in the simulation are in Antarctica, Greenland, Canada, northern Europe, central Europe, the Amazon, Tanzania, and India, covering the whole year of 2013. We can see the improvement in the warmer locations and larger variability for mid- and high latitudes. A similar pattern exists for the wet coefficient a, with the difference that its values are much smaller and there is much more variation along the year.
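The inversion for δa described above can be sketched numerically as follows. This is an illustrative single-observation Newton iteration rather than the full least-squares adjustment of Boehm and Schuh (2003), and the b and c values used in the accompanying example are typical orders of magnitude, not actual VMF grid values:

```python
import math

def marini_mf(a, b, c, elev_rad):
    """Normalized Marini continued fraction (the form used by VMF-type models)."""
    num = 1.0 + a / (1.0 + b / (1.0 + c))
    den = math.sin(elev_rad) + a / (math.sin(elev_rad) + b / (math.sin(elev_rad) + c))
    return num / den

def solve_delta_a(mf_obs, a0, b, c, elev_rad, iterations=20):
    """Solve for the correction da such that marini_mf(a0 + da, ...) = mf_obs.

    Newton iteration with a numerical derivative; with many observations the
    same principle extends to a least-squares adjustment.
    """
    a = a0
    h = 1e-8  # step for the central-difference derivative
    for _ in range(iterations):
        f = marini_mf(a, b, c, elev_rad) - mf_obs
        df = (marini_mf(a + h, b, c, elev_rad)
              - marini_mf(a - h, b, c, elev_rad)) / (2.0 * h)
        a -= f / df
    return a - a0
```

Given a mapping-function value inferred from a nearby station at a known elevation angle, the routine returns the correction δa to be applied to the gridded a priori coefficient.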

"Self-feeding" NWP
We have not performed any test with this approach. We only explain the idea, as portrayed in Figure 3. The usual way to generate a NWP is by feeding global

Near-real time empirical model updates
The UNB3m is a climatology-based model which uses the Saastamoinen formulas as modified by Davis et al. (1985) to calculate the zenith dry delay and the zenith wet delay from a look up table containing values for temperature, pressure, and relative humidity. It uses the mapping meteorological data into an assimilation process that represents as well as possible the physics of the neutral atmosphere. The result of this assimilation is a set of prediction values covering a grid. The idea is that meteorological data can be retrieved from cyberspace and automatically feed the assimilation process.

Notes on contributors
Marcelo C Santos is a professor of Geodesy in the Department of Geodesy and Geomatics Engineering, University of New Brunswick (UNB), Canada. He is the president of International Association of Geodesy Commission 4 "Positioning and Applications. " His research interest relates to geodetic applications of GNSS and to modeling of Earth's gravity field.
Thalia Nikolaidou is a doctoral student in the Department of Geodesy and Geomatics Engineering, University of New Brunswick (UNB), Canada. Her research interests lie with modeling the neutral atmosphere and GNSS positioning.

Marcelo C. Santos
http://orcid.org/0000-0001-6354-4601 function developed by Niell (1996) Figure 4 shows the zenith total delay (ZTD) for the whole year of 2013 evaluated for station UNBJ, located on the UNB Campus in Fredericton, NB, Canada. Because UNB3m is based on climatology, it represents the annual variation but not the short-term variations. The short-term variations are visible in the solution computed from GNSS data-sets (indicated by IGS ZTD in Figure  4) as they contain the actual variations. The modified UNB3m, using the "big data information", is capable of representing the short-term variations and follows very closely the IGS solution. Figure 5 shows the differences between IGS ZTD and the modified UNB3m ZTD. Figure  6 shows the zenith wet delay evaluated with UNB3m and the modified UNB3m. The IGS solution is not represented because it does not provide the wet delay. We also experimented for the predicted ZTD. For this purpose, we trained the neural network with data of 30, 180, and 250 days, and applied the neural network for a period of 40 days. The predicted ZTD was compared with actual ZTD. Figure 7 shows the differences between them. It can be seen that the differences are at the millimeter level for the cases when the neural network was trained with the data of 180 and 250 days.

Conclusions
There will be more and more data "out there" just waiting to be used. This "extra" (local) tropospheric information can potentially be used in three separate ways: to improve the Marini coefficient a, to feed NWPs, and to provide near real-time empirical model updates. We tested two of the three potential ways. The results show that it is possible to improve the Marini coefficient using the extra information from cyberspace. The experiments with the UNB3m model show that the climatology-based model can be significantly improved if actual meteorological information is readily available. In real life, the use of "meteorological big data" will require validation algorithms for retrieval and quality control, as well as algorithms to make data formats consistent.

Funding
This work is partly funded by the Natural Sciences and Engineering Research Council of Canada.