Collecting volunteered geographic information from the Global Navigation Satellite System (GNSS): experiences from the CAMALIOT project

ABSTRACT Raw observations (carrier-phase and code observations) from the Global Navigation Satellite System (GNSS) can now be accessed from Android mobile phones (Version 7.0 onwards). This paves the way for GNSS data to be utilized for low-cost precise positioning or in ionospheric or tropospheric applications. This paper presents results from data collection campaigns using the CAMALIOT mobile app. In the first campaign, 116.3 billion measurements from 11,828 mobile devices were collected from all continents. Although participation decreased during the second campaign, data are still being collected globally. In this contribution, we demonstrate the potential of volunteered geographic information (VGI) from mobile phones to fill data gaps in geodetic station networks that collect GNSS data, e.g. in Brazil, but also how the data can provide a denser set of observations than current networks in countries across Europe. We also show that mobile phones capable of dual-frequency reception, which is an emerging technology that can provide a richer source of GNSS data, are contributing in a substantial way. Finally, we present the results from a survey of participants to indicate that participation is diverse in terms of backgrounds and geography, where the dominant motivation for participation is to contribute to scientific research.


Introduction
Citizen science, crowdsourcing and volunteered geographic information (VGI) are all terms that relate to the production of knowledge, often in the form of data collection (See et al. 2016). Citizen science has emerged largely from the fields of biodiversity and conservation (Bonney et al. 2009), where species observations are one area in which citizen science has made, and continues to make, substantial contributions (Chandler et al. 2017; Sullivan et al. 2009). Crowdsourcing originates more from a business domain, defined as the outsourcing of tasks to the crowd that would otherwise be impossible with only the current resources of an organization (Howe 2006), e.g. a range of different tasks that have been allocated via systems such as Amazon's Mechanical Turk (Bergvall-Kåreborn and Howcroft 2014) and can earn volunteers micropayments. VGI, a term coined by Goodchild (2007), comes from a geographical domain and focuses on the locational aspect of the data, where OpenStreetMap (OSM) is the most successful example of VGI to date (Jokar Arsanjani et al. 2015).
Although these terms (and many other similar ones, cf. See et al. 2016; Eitzel et al. 2017) have different nuances that partly reflect their domain origins, the elements that unite them are the volunteers who take part, and increasingly the technology that enables participation, in particular mobile devices. Smartphones have become one of the key tools for data collection by volunteers, which takes advantage of the GNSS (Global Navigation Satellite System) receiver for pinpointing location in real-time, based on the navigation solution utilizing code measurements (L-band pseudorange measurements) to at least four visible GNSS satellites, as well as other features of the phone such as the camera (for taking geo-tagged and date/time stamped photographs), the gyroscope for determining if the phone is moving or tilted, and the compass for direction, among others. More recently, some models of Android-based smartphones have chipsets with dual-frequency GNSS receivers (Dabove and Pietra 2019) while Google has made it possible to access the raw GNSS data in Android smartphones from Version 7.0 of the Android operating system onwards (EGSA et al. 2017). Together these two developments have opened up the possibilities for employing such IoT (Internet of Things) data in many new applications. In this context, the raw GNSS data should be understood as epoch-wise carrier-phase and code (pseudorange) observations carried out between a GNSS receiver (smartphone) and a single GNSS satellite. Therefore, at each epoch, the receiver collects multiple observations based on the visible set of GNSS satellites, and such observations are then used in the processing of the location. The number of observations collected per epoch may reach twenty or more, but it varies as it depends upon the performance of the GNSS chipset and the in-built antenna, the number of supported GNSS constellations, and the measurement environment in which the receiver is present, e.g. urban canyons or rural areas. The clear benefit of leveraging raw GNSS data is the possibility of using carrier-phase observations, which can improve the positional accuracy of smartphones from several meters to decimeters (Psychas et al. 2019; Critchley-Marrows et al. 2020; Darugna et al. 2021; Li et al. 2022), support spoofing and jamming detection (Miralles et al. 2018), and serve seismological applications related to earthquake and tsunami detection (Fortunato, Ravanelli, and Mazzoni 2019). They can also be used for applications such as more precise augmented reality (Fu, Khider, and van Diggelen 2020) or other types of scientific applications that go beyond navigation.
GNSS is considered an important tool in the field of atmospheric research thanks to its high accuracy and all-weather capability. The GNSS signal is delayed by water vapor as it passes through the atmosphere, which can provide information related to the current weather conditions (Guerova et al. 2016). GNSS can also deliver a very precise estimation of the total electron content (TEC), where slant TEC is the linear integral of the electron density along any satellite-receiver ray path (Davies and Hartmann 1997). GNSS parameter estimation and dedicated processing (Takahashi et al. 2016; Zhang et al. 2018; Fermi, Realini, and Venuti 2019) allow changes in ionospheric and tropospheric states to be quantified with a high temporal resolution at a global scale. The capability of modern smartphones to precisely measure parameters of the atmosphere has, for example, been demonstrated by Stauffer et al. (2023). These subjects have been investigated within the framework of the CAMALIOT project (ESA NAVISP-EL1-038.2). It consisted of a set of activities related to the development of cloud-native software dedicated to processing of diverse GNSS datasets (high quality observations collected with the use of static geodetic receivers, and smartphone observations), the application of Machine Learning (ML) for spatial interpolation and forecasting of troposphere- and ionosphere-related parameters as derived with the use of GNSS, and acquisition of GNSS observations from the modern generation of smartphones with the use of a dedicated Android application. ML or deep learning (DL) in this context can be used to interpolate GNSS-derived time series (acquired at specific locations) in space and time. Incorporating information from different and rather complex domains (e.g. solar and weather data) can be beneficial, but also challenging as a direct physical relation is usually difficult to formulate mathematically. However, ML-enabled processing allows large amounts of data of various origins and types to be combined, including relevant parameters or models that are acquired with the use of various instruments and methods (Bao Zhang and Yao 2021). In addition, a subset of ML architectures related to recurrent neural networks and their variations, or other state-of-the-art approaches including transformers, tend to be especially powerful in terms of temporal extrapolation of target parameters. Given the above, ML tends to be an appropriate solution in terms of data fusion, classification or forecasting tasks. In relation to GNSS and the atmosphere, among others, recent examples include the utilization of DL for spatio-temporal interpolation of tropospheric parameters (Lu et al. 2023; Shehaj et al. 2023) and the development of models for TEC forecasting (Cesaroni et al. 2020; Natras, Soja, and Schmidt 2022).
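To make the slant TEC definition given above concrete, it can be written as a line integral, together with the standard first-order relation that links it to dual-frequency code observations. This is a textbook sketch rather than the CAMALIOT processing scheme: the symbols $N_e$ (electron density), $I_f$ (ionospheric code delay at frequency $f$), $P_1$, $P_2$ (code observations in meters) and $f_1 > f_2$ (carrier frequencies in Hz) are introduced here for illustration, and differential code biases, multipath and noise are ignored.

```latex
\mathrm{STEC} = \int_{\text{receiver}}^{\text{satellite}} N_e(s)\,\mathrm{d}s ,
\qquad
I_f \approx \frac{40.3}{f^{2}}\,\mathrm{STEC} ,
\qquad
\mathrm{STEC} \approx \frac{f_1^{2} f_2^{2}}{40.3\,\left(f_1^{2} - f_2^{2}\right)}\,\left(P_2 - P_1\right)
```

Here STEC is in electrons per square meter and the constant 40.3 carries units of m^3/s^2. The last relation follows from differencing the delays on the two frequencies, which is why dual-frequency smartphones are of particular interest for ionospheric applications.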
Although there is a global network of geodetic stations that receive GNSS data on a continuous basis, there are large gaps in these networks, e.g. in Africa, South America and Asia. Hence, in this paper, we address one of the key research questions behind the CAMALIOT project: can we motivate volunteers to help collect VGI in the form of raw smartphone-based GNSS data for scientific applications? These data could help to fill the gaps in areas where there is a lack of stations, or the data could provide a denser set of measurements.
The starting point for this investigation was the availability of raw GNSS data, which can be accessed directly using an Application Programming Interface (API) provided by Google (2022). Note that such an API is not available for iPhones. Alternatively, it is possible to download an existing mobile application such as Geo++ RINEX Logger or Google's GNSS Logger, which allows users to download the raw data as CSV files or in a format specific to the field of geodesy and land surveying, i.e. Receiver Independent Exchange Format (RINEX; IGS and RTCM-SC104 2013). Receiver-specific RINEX files form the main input to various GNSS analysis software packages that perform calculations in post processing and utilize both carrier-phase and code observations. Such software packages (e.g. Bernese GNSS Software (Dach et al. 2015) or GipsyX/RTGx (Bertiger et al. 2020)) employ advanced processing schemes with the consideration of long observation periods and global GNSS networks to determine various target parameters (e.g. station coordinates, tropospheric delays, Earth rotation parameters, satellite orbits) with utmost accuracy. GNSS observations collected with the use of a global network of high performance receivers and antennas are utilized on an operational basis for realization of the international terrestrial reference frame (Altamimi et al. 2016) or investigation of global-scale geodynamical phenomena (e.g. Bevis et al. 1994; Hammond, Blewitt, and Kreemer 2016; Takahashi et al. 2016; Beutler et al. 2020), both of high importance for society and crucial for revealing the signals of a changing climate. When converted to files in the RINEX format, smartphone-based GNSS data can also be processed with such state-of-the-art GNSS software packages, which is the reason why commonly used GNSS logger apps support such a feature. However, there are two main issues with using Google's GNSS Logger app: (i) it does not support data download in the latest version of RINEX, and more importantly, (ii) there is no way to access the data from other users; e.g. even if the data were collected from Android phones by a company like Google, it would not be possible to share the data due to laws on data protection and data privacy. In addition, some stability issues were identified with prolonged use of the GNSS Logger app, so it did not represent a viable solution for the CAMALIOT project. There are other similar mobile apps available, but they are more geared towards viewing the satellite data (e.g. GPS Satellites Viewer and GNSS Compare), displaying positional accuracy (e.g. GPSTest) or are of a commercial nature (e.g. GNSS Surveyor). Hence, it was decided to develop a dedicated CAMALIOT Android application as a VGI tool to support meeting the goals of the project. This paper provides an overview of the application developed and a description of the campaigns that were launched to support data collection by volunteers.
2. Overview of the CAMALIOT mobile app and data collection campaigns

The CAMALIOT mobile app
At the beginning, we identified two main target audiences: (i) the general public, who might be interested in helping science, and (ii) the GNSS community, which might be interested in using the data for practical or research purposes. Hence the design was intentionally simple to encourage participation, including some gamification elements, while also providing data download and conversion (RINEX ver. 3.03) capabilities targeted at the GNSS community.
In building the app, we used a layered approach by first creating a very stable base application, similar to Google's GNSS Logger app that also collects raw GNSS data. However, in contrast to GNSS Logger, which often crashed on our test devices after running it for some hours, we designed the CAMALIOT app to be as stable as possible. In addition, the base app was able to upload the data to the CAMALIOT server (ingestion micro-service comprising the CAMALIOT software that was running on Kubernetes), and export the data collected by users in RINEX-3 file format, which, as previously mentioned, would make the app very useful for researchers in the GNSS community as such smartphone-based single- or dual-frequency GNSS observations can be used directly in sophisticated analysis packages. The base app was written in Java, using the Google API to directly access the raw GNSS data from Android mobile phones as well as open-source code by Rokubun for conversion of the data to RINEX files (Rokubun 2020).
On top of this first base app, additional pages, written in Unity, were added. The example pages of the user interface (UI) of the CAMALIOT Android application are shown in Figure 1. The first page consists of simple instructions and an explanation of why users should participate in the project, which is shown when the user starts the app. The second page is the main screen (shown in Figure 1a), from which users can start the process of data collection. On top of this page, statistics and a leaderboard were added (Figure 1b); such a gamification element was intended to create competition, which could appeal to the motivations of some participants (Thiel and Fröhlich 2017). The leaderboard also acts as a form of feedback so that users could see their collective contribution in relation to other users. In addition, logos of participating partners were displayed (Figure 1c) as well as an FAQ (Frequently Asked Questions) targeted at those users interested in learning more about the project, the GNSS, and the scientific use cases in the project.
Once users download the app from the Google Play Store, they create an account and supply an alias for the leaderboard (Figure 1b), which is displayed along with how many measurements they have collected in relation to others. Users are also asked to agree to a set of Terms and Conditions with a link to the Privacy Policy for the data, which explains how any personal data are stored and used within the project. Both documents are also available from the CAMALIOT website (https://www.camaliot.org). The user database is housed at a separate location to the CAMALIOT infrastructure (cloud-native micro-service-based software running on Kubernetes); in this way we separated the GNSS data collected from any user details, ensuring that users could not be identified, thereby complying with the EU's General Data Protection Regulation (GDPR). With a similar regulation in mind, the CAMALIOT software has been deployed and tested on the resources provided by EXOSCALE (https://www.exoscale.com), where all the infrastructure is located in Switzerland. Once users agreed to the Terms and Conditions and Privacy Policy, they could begin collecting GNSS data. In addition to the username, an encrypted password, an email address and the total number of measurements collected by the participant, the GNSS-relevant data collected by the mobile phone are listed in Table 1.

Table 1. GNSS-relevant data collected by the mobile phone.
Field: Description
[…]: The received GNSS satellite time, at the measurement time, in nanoseconds
ReceivedSvTimeUncertaintyNanos: The error estimate (1-sigma) for the received GNSS time, in nanoseconds
Cn0DbHz: The carrier-to-noise density in dB-Hz
PseudorangeRateMetersPerSecond: The pseudorange rate at the timestamp in m/s
PseudorangeRateUncertaintyMetersPerSecond: The pseudorange's rate uncertainty (1-sigma) in m/s
AccumulatedDeltaRangeState: This indicates the state of the AccumulatedDeltaRangeMeters measurement
AccumulatedDeltaRangeMeters: The accumulated delta range since the last channel reset, in meters
AccumulatedDeltaRangeUncertaintyMeters: The accumulated delta range's uncertainty (1-sigma) in meters
CarrierFrequencyHz: The carrier frequency of the tracked signal
CarrierCycles: The number of full carrier cycles between the satellite and the receiver
CarrierPhase: The RF phase detected by the receiver
CarrierPhaseUncertainty: The carrier-phase's uncertainty (1-sigma)
MultipathIndicator: A value indicating the 'multipath' state of the event
SnrInDb: The (post-correlation & integration) signal-to-noise ratio (SNR) in dB
ConstellationType: The constellation type
AgcDb: The Automatic Gain Control level in dB

Data collection occurs via the main screen of the app (Figure 1a), which shows an example of ongoing data collection after a user has pressed the START LOGGING button. The information on the screen indicates how much data have been collected, but also from which satellites and how many satellites per constellation (i.e. GPS, Galileo, etc.) are available. If the phone has dual-frequency capabilities, this is also shown on the screen. For example, in Figure 1a, data are being collected using GPS satellites in L1 and L5 bands, whereas for Galileo satellites, measurements in the corresponding E1 and E5 bands were recorded.
The app's main data collection screen also has a real-time Measurement Quality indicator, which in the measurement UI is represented as a color-changing circle. This indicator reflects how well the phone is situated for data collection. In the case of the CAMALIOT project, the most beneficial measurements would be from a phone that is stationary and has an unobstructed view of the sky, i.e. located outside. Conversely, if the phone is moving or has a completely obstructed view, e.g. inside a building, the measurement quality indicator will change color from yellow (decreasing quality) to red (poor quality), depending upon the quantity of recorded phase and code observations. The approach for an epoch-wise classification of the quality of observations was derived empirically, taking into account all frequency bands (either one or two), several factors, and two threshold values. In general, the application checks whether the phone is moving or not based on the input from the motion sensors (accelerometer and gyroscope) and speed recordings acquired via Android's LocationListener interface (values below 0.1 m/s are ignored), while utilizing the number of carrier-phase and code observations collected (per epoch) to assess the measurement epoch. The classification algorithm is described below:
• gray color: no GNSS observations at all
• red color:
  ○ in general, if the number of code observations (from all bands) is < 7
  ○ a case where carrier-phase observations were recorded, but the number of carrier-phase observations (from all bands) is < 7
• orange color:
  ○ in general, if the number of code observations (from all bands) is < 15
  ○ a case where carrier-phase observations were recorded, but the number of carrier-phase observations (from all bands) is < 15
• yellow color:
  ○ in general, the number of code observations (from all bands) is at least 15, but the smartphone is moving
  ○ a case where carrier-phase observations were recorded, and the number of carrier-phase observations is at least 15, but the smartphone is moving
• green color:
  ○ similar to yellow, but the smartphone is not moving.
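The classification rules above can be sketched in a few lines of Python. This is an illustrative reading of the published rules, not the app's actual implementation (which was written in Java): the function names are hypothetical, and two points the text leaves open are filled with labeled assumptions, namely that the motion-sensor flag and the speed check are combined with a logical OR, and that when carrier-phase observations are present the smaller of the two counts decides the color.

```python
# Thresholds taken from the text; everything else is an assumption for illustration.
SPEED_NOISE_FLOOR_MPS = 0.1   # speeds below this value are ignored
RED_THRESHOLD = 7             # fewer observations than this -> red
ORANGE_THRESHOLD = 15         # fewer observations than this -> orange


def is_moving(speed_mps: float, motion_sensor_flag: bool) -> bool:
    """Assumption: the phone counts as moving if either source says so."""
    return motion_sensor_flag or speed_mps >= SPEED_NOISE_FLOOR_MPS


def classify_epoch(n_code: int, n_phase: int, moving: bool) -> str:
    """Return the indicator color for one epoch.

    n_code / n_phase are the numbers of code and carrier-phase
    observations summed over all frequency bands.
    """
    if n_code == 0 and n_phase == 0:
        return "gray"
    # Assumption: if carrier-phase data were recorded, the smaller count limits quality.
    limiting = min(n_code, n_phase) if n_phase > 0 else n_code
    if limiting < RED_THRESHOLD:
        return "red"
    if limiting < ORANGE_THRESHOLD:
        return "orange"
    return "yellow" if moving else "green"
```

For example, under these assumptions an epoch with 20 code and 18 carrier-phase observations on a stationary phone would be classified as green, while the same epoch on a moving phone would be yellow.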
The motivation behind the utilization of the number of GNSS observations is that they can be a good proxy for the environmental context (indoor, outdoor, obstructed view) in which the smartphone is present, similarly to the observed satellite-specific C/N0 values. Although such an algorithm could be refined, for instance by leveraging additional factors such as the aforementioned C/N0 or orientation of the phone (vertical/horizontal), the real-time Measurement Quality indicator was, nevertheless, helpful in directing users, to some extent, towards collecting a greater fraction of data suitable for the use cases investigated within the framework of the CAMALIOT project.
The INFO link on the right (in Figure 1a) provides more information about what the colors mean. Figure 1a shows two further options: the first is the ability to log data in the background, activated by clicking on the LOG IN BACKGROUND button. This means that the app will run completely in the background and use fewer resources. Finally, the app can run in continuous mode as indicated at the top of Figure 1a. This means that the user can tell the app to upload data quasi-continuously to the server (with an upload frequency chosen by the user), thus avoiding the need to manually upload the data. This latter feature was added based on user feedback. In addition, the user can also choose whether only the WiFi connection should be used for data uploading instead of mobile data. In case the WiFi connection is not available, all of the pending uploads will be resumed once the WiFi is available. Details of how to obtain the app are provided on the camaliot.org website or by searching the Google Play Store for CAMALIOT.

The CAMALIOT data collection campaign
We launched the first campaign on 17 March 2022, which ran until 31 July 2022. The length of the campaign was chosen to provide adequate time for demonstrating GNSS data collection by volunteers as a proof of concept. The campaign was advertised through various media channels, newsletters, magazines and mailing lists. To encourage participation, we offered prizes including a dual-frequency mobile phone, Amazon vouchers and branded goods from various companies. Although we announced that there would be prizes during the launch of the campaign, the details were only released as the campaign progressed. This allowed us to feed information to the volunteers via the website, social media and notifications in the app regarding the prizes. In the end the first prize consisted of a dual-frequency mobile phone, 2nd to 5th prizes were Amazon vouchers ranging from 50 to 200 Euros, and the prizes for the 6th to 20th ranked winners consisted of goodie bags containing swag from various companies. We also provided a clear set of prize rules posted on the website in which we stated that the 20 prizes would be awarded based on a random draw but weighted according to the number of measurements made by each participant. Hence, the more data collected, the more likely a participant would be to win a prize. During the campaign, we introduced a map showing the global distribution of data collected (Figure 2) so that users could see their contributions as part of a larger collective effort, updated in near-real time. We also added a dashboard showing the top 10 contributing countries by number of devices and the number of measurements collected.
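A random draw weighted by the number of measurements, as described in the prize rules above, could be sketched as follows. The paper does not describe how the draw was implemented, so this is only one plausible reading: the function name and the input dictionary are hypothetical, and it is assumed that each participant can win at most one prize within a draw (sampling without replacement).

```python
import random


def weighted_prize_draw(measurements: dict, n_prizes: int, seed=None) -> list:
    """Draw up to n_prizes distinct winners, in prize order.

    measurements maps a participant alias to the number of measurements
    contributed; the chance of being drawn is proportional to that number.
    """
    rng = random.Random(seed)
    # Participants without any measurements cannot win.
    pool = {alias: n for alias, n in measurements.items() if n > 0}
    winners = []
    while pool and len(winners) < n_prizes:
        aliases = list(pool)
        weights = [pool[a] for a in aliases]
        winner = rng.choices(aliases, weights=weights, k=1)[0]
        winners.append(winner)
        del pool[winner]  # assumption: one prize per participant
    return winners
```

With this scheme a participant who contributed twice as many measurements as another is twice as likely to be drawn for the first prize, which matches the stated intent that more data means a higher chance of winning.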
At the end of the first campaign (31 July 2022), the random prize draw was held, and the prize winners and all participants were asked to fill out a questionnaire. We then launched another campaign, called the Autumn campaign, due to the success of the first data collection campaign (see Section 3), which ran from 1 August until 30 November 2022.

Overview of the data collection
Table 2 summarizes the total amount of data collected by absolute number of GNSS measurements (epoch-wise satellite-specific GNSS observations, where carrier-phase and code observations are treated here as a single measurement) and the total number of devices (as a proxy for number of participants). These results indicate that the first campaign had a very good level of participation, with a large number of measurements contributed by the volunteers. Although there were fewer observations collected in the Autumn campaign, it still demonstrated a continued interest in collecting data.
Figure 3 shows the distribution of the data collected across the entire time period by the total number of GNSS measurements collected each day and the number of devices. Right after the launch, there was an abrupt increase in the number of measurements and participants, which coincided with a campaign launch that was marked by a press release, announcements on social media, the release of the video produced for the campaign and a news item that appeared on the web pages of the European Space Agency. After the campaign was advertised in various newspapers (including the Guardian in the UK) and on SciTech Daily (marked by the red dotted line and A in Figure 3), there was a marked increase in the measurements and mobile usage. A second event occurred around 20 May 2022 (marked B in Figure 3), when a major update to the CAMALIOT mobile app was released in which users could now upload data on a continual basis rather than manually uploading data. There were some small peaks just after this event occurred. The event marked as C in Figure 3 denotes the end of the first campaign, which showed a slight increase in measurements as the Autumn campaign began, possibly due to advertising via the website and social media. A similar pattern can be seen again at the end of October (marked by E) with some additional social media activity. There was a decline in participation as the Autumn campaign progressed although numbers stabilized during the month of November. Moreover, a good amount of data is still being collected by volunteers (Table 2). Finally, one can see an example of where the server had issues (marked D on Figure 3), but volunteers alerted us to these issues, which were then promptly fixed.

Geographical distribution of the data collected
One of the key goals of the campaign and the CAMALIOT project was to try to determine whether the data gathered can either fill gaps in the network of geodetic stations (Blewitt, Hammond, and Kreemer 2018) around the world (Figure 4a) or provide a denser contribution in areas where the geodetic network is already good such as North America, Europe, or Australia. Note that some of these stations might have outages or are no longer operational, and the network with GNSS stations of the highest quality, which is run by the International GNSS Service (https://www.igs.org/network), is significantly smaller than that shown in Figure 4a (i.e. having approx. 500 stations). In contrast, Figure 4b shows the spatial distribution of data collected from the campaigns (17 March 2022-30 November 2022). One can see that the main VGI contributions are from Europe and North America, but that there are some notable areas in which the density of contributions is clearly higher, i.e. Brazil, some parts of India and Pakistan.
Figure 5 provides a more in-depth view of the distribution of measurements across South America and demonstrates clear gap filling, with some measurements collected in the Amazon basin. However, the majority is concentrated in more populated areas of Brazil. A more in-depth view is also provided in Figure 6, showing the southern part of Europe.
Although the network of geodetic stations is relatively well distributed across many European countries, the distribution of the VGI demonstrates how denser samples have been collected in many areas, e.g. in urban areas, shown clearly in Spain, while more rural areas still show gaps. However, the density of VGI in Germany is very high in comparison to the geodetic network. Moreover, gaps in the geodetic station network in areas such as the Alps have been partly filled by VGI.

Retention of participation in the Autumn campaign
To examine the issue of retention, we considered participation only from the Autumn campaign since we expected that many volunteers would be those already recruited from the first campaign. To do this, we calculated the number of days from the start of the Autumn campaign (1 August 2022) until 30 November 2022 during which contributions were received from the same device (as a proxy for participants). We then calculated the amount of participation as a percentage of the total time as well as the total number of measurements collected. The results are shown in Figure 7. The first three categories are participation for 1, 2 and 3 days, which are likely participants who tried out the mobile app, but did not continue to make any substantial contributions. In total, these three categories account for 56% of the participants, implying that we still continued to onboard participants during the Autumn campaign. Looking at the contributions of participants who collected data for more than one month (or what we might term as 'active' participants), they collected data in 32 countries around the world. As a percentage of the total number of active contributors, the highest participation was from Germany (25.4%), USA (13.1%), Brazil (9.8%), Spain (6.5%) and the UK (5.2%). Of the remaining countries, most are European, but there are also active contributors in Asia and Australia.
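The retention calculation described above (days with contributions per device, expressed as a share of the campaign length, plus total measurements) can be sketched as follows. The record layout `(device_id, day, n_measurements)` and the function name are assumptions made for illustration; only the campaign dates come from the text.

```python
from collections import defaultdict
from datetime import date

# Autumn campaign period as stated in the text (both days inclusive).
CAMPAIGN_START = date(2022, 8, 1)
CAMPAIGN_END = date(2022, 11, 30)


def retention_by_device(records):
    """Summarize participation per device.

    records: iterable of (device_id, day, n_measurements) tuples, where
    day is a datetime.date. Records outside the campaign are skipped.
    """
    active_days = defaultdict(set)
    totals = defaultdict(int)
    for device_id, day, n_measurements in records:
        if CAMPAIGN_START <= day <= CAMPAIGN_END:
            active_days[device_id].add(day)  # a set counts each day once
            totals[device_id] += n_measurements
    campaign_days = (CAMPAIGN_END - CAMPAIGN_START).days + 1
    return {
        device_id: {
            "active_days": len(days),
            "percent_of_campaign": 100.0 * len(days) / campaign_days,
            "measurements": totals[device_id],
        }
        for device_id, days in active_days.items()
    }
```

Devices with one to three active days would then fall into the first three categories of Figure 7, while devices with more than roughly 30 active days would correspond to the 'active' participants discussed above.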

Types of mobile device used and presence of dual-frequency phones
As part of the data collection, we recorded the type of phone used by each participant and whether the phone was single- or dual-frequency since one of the purposes of the CAMALIOT project was to collect GNSS data from dual-frequency phones, specifically for a scientific use case related to space weather modeling and forecasting (Natras, Soja, and Schmidt 2022). Figure S1 in the Supplementary Material shows the breakdown of the type of phone used as a percentage of all phone types while Figure 8 shows the percentage of dual-frequency phones (i.e. GNSS chipset models/versions that can support dual-frequency observations) compared to single-frequency ones for all phone manufacturers that offer both types of GNSS chipsets. Although Samsung represents the largest number of phones used in the two campaigns (Figure S1), less than 20% have a dual-frequency capability. However, the next two largest manufacturers (Xiaomi and Google) have a larger percentage of phones with dual-frequency capabilities, with Google at roughly half. Both OnePlus and Nubia have mainly dual-frequency phones although the number of Nubia phones used was quite small (around 50). Information about the distribution of phone types by continent is provided in Figure S2 in the Supplementary Material. In total, the percentage of phones with dual-frequency capabilities used to collect data across both campaigns was 27.1%.
Finally, Figure 9 shows the distribution of single- and dual-frequency phones by continent to determine the feasibility of gathering dual-frequency measurements spatially. Overall, the split between single- and dual-frequency phones is relatively even across continents, although single-frequency mobile phones are slightly more prevalent in Asia and South America.

Brief analysis of data quality
In addition to data quantity, data quality plays a crucial role in determining the usefulness of crowdsourced data for scientific applications. In this study, we considered three important indicators (observation continuity, C/N0, and multi-GNSS capability) to analyze the quality of the data collected. The distribution of the data collected along these dimensions is illustrated in Figure 10. To obtain high-precision tropospheric or ionospheric delays from GNSS data, it is necessary to have a sufficient time span of observations to enable solution convergence during parameter estimation. Typically, smartphone GNSS data require a longer duration for convergence due to their higher measurement noise. Our analysis reveals that 51.2% of the data had observation durations longer than 1 h, which offers the potential for achieving converged solutions. Additionally, a noteworthy portion of the data collected during the campaign had considerably longer durations, with approximately 18.4% and 4.9% of the data having durations longer than 6 and 12 h, respectively. The quality of the data, as indicated by C/N0 values, is directly influenced by the observation environment. Higher C/N0 values generally correspond to better data quality, with values larger than a certain threshold implying data collection in open sky. The mean C/N0 of the data collected during the campaign was 27.4 dB-Hz, suggesting that the majority of the data was collected indoors. Only 7.4% of the data potentially originated from outdoor environments when a threshold of 35 dB-Hz was used. If both observation continuity and C/N0 are considered, 2.7% of the data holds potential for contributing to troposphere and ionosphere monitoring in the CAMALIOT project. Interestingly, we found that 94.6% of the data contained not only GPS observations but also observations from at least one other GNSS constellation, such as GLONASS, Galileo, or BeiDou. This indicates that most modern Android smartphones have the capability to track multiple GNSS constellations, thereby enhancing their potential contribution to scientific studies.
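The screening described above, with a time span threshold of 1 h and a C/N0 threshold of 35 dB-Hz, can be illustrated with a minimal Python sketch. The `Session` record, its field names and the per-session counting are assumptions made for illustration only; this is not the CAMALIOT processing pipeline, and the percentages reported in the text may have been computed on a different basis (e.g. by data volume).

```python
from dataclasses import dataclass

MIN_DURATION_H = 1.0   # time span threshold quoted in the text
MIN_CN0_DBHZ = 35.0    # C/N0 threshold quoted in the text


@dataclass(frozen=True)
class Session:
    """Hypothetical summary of one data file (one observation session)."""
    duration_h: float          # length of the session in hours
    mean_cn0_dbhz: float       # mean carrier-to-noise density in dB-Hz
    constellations: frozenset  # e.g. frozenset({"GPS", "Galileo"})


def is_long_enough(s: Session) -> bool:
    return s.duration_h > MIN_DURATION_H


def is_likely_outdoor(s: Session) -> bool:
    return s.mean_cn0_dbhz > MIN_CN0_DBHZ


def is_multi_gnss(s: Session) -> bool:
    # GPS plus at least one other constellation
    return "GPS" in s.constellations and len(s.constellations) >= 2


def quality_shares(sessions: list) -> dict:
    """Percentage of sessions that satisfy each quality indicator."""
    if not sessions:
        raise ValueError("no sessions given")
    n = len(sessions)

    def pct(flags):
        return 100.0 * sum(flags) / n

    return {
        "long_enough": pct(is_long_enough(s) for s in sessions),
        "likely_outdoor": pct(is_likely_outdoor(s) for s in sessions),
        "long_and_outdoor": pct(
            is_long_enough(s) and is_likely_outdoor(s) for s in sessions
        ),
        "multi_gnss": pct(is_multi_gnss(s) for s in sessions),
    }


example_sessions = [
    Session(2.0, 40.0, frozenset({"GPS", "Galileo"})),
    Session(0.5, 40.0, frozenset({"GPS"})),
    Session(3.0, 25.0, frozenset({"GPS", "GLONASS"})),
    Session(0.2, 20.0, frozenset({"GPS", "BeiDou"})),
]
example_shares = quality_shares(example_sessions)
```

Only sessions that pass both thresholds count towards `long_and_outdoor`, which corresponds to the combined criterion used in the text to identify data with potential for troposphere and ionosphere monitoring.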

Participant profiles and motivations
The prize winners were asked to fill in a questionnaire as a condition for receiving their prize (clearly outlined in the rules of the campaign). This yielded 21 responses (due to one prize being unclaimed in the first campaign and the August prize winner having already won a prize in the first campaign). Due to the EU's GDPR, we were not able to email all participants to ask them to fill in the questionnaire more generally. Instead, we advertised the link to the questionnaire on the front page of the camaliot.org website, and notifications were pushed through the app asking participants to fill it in. As a result, we collected an additional 361 responses from non-prize winners, resulting in a total of 382 responses to the questionnaire. Note that the majority of these responses are from the participants in the Autumn campaign. The results are summarized in the sections that follow.

Socio-demographic profiles
The first part of the questionnaire was focused on gathering sociodemographic data about the participants, where we first asked about gender. All prize winners were male, but when all 382 responses were considered, 94.8% were male, 5.0% were female and one person chose non-binary. After gender, we asked participants to indicate their age from a range, which is summarized in Figure 11. The results indicate that participants were from all age ranges, but that just over a quarter were aged between 35 and 44. The next question was focused on education as shown in Figure 12. Although the majority had a university degree (bachelor's or master's), roughly a third of participants had a high school education as the highest level of attainment. Hence, the educational profile is quite diverse, where the campaign reached not only those with a higher education background.
Next, we asked about the type of employment of the participants (Figure 13), which shows that the majority are working in private industry, but that there is a range of diverse employment types including students and retired people. In fact, researchers make up only a small proportion of the respondents, which indicates that the campaign has appealed to many people outside of academia.
Finally, in the category regarding general information about participants, we asked them to tell us in which country they reside. Figure 14 shows this geographical distribution, focusing on only those countries with at least a 1% representation in the sample. The majority of respondents were from European countries with France and Germany as the top countries. However, there were also responses from outside Europe including the USA, Brazil, India, Canada and Australia. Table S1 in the Supplementary Material shows the geographical location of respondents in other countries where data were collected.

How respondents learned about the campaign
In the next section of the questionnaire, we asked participants how they found out about the campaign, shown in Figure 15 as a percentage of total responses. The two main sources are social media and websites, e.g. news pieces or internet searches. The 'Other' category included responses such as radio, weather forecasts, various YouTube channels, podcasts, etc. Newsletters, emails and word of mouth played a much smaller role in reaching the participants.

Motivations for participation
The next part of the questionnaire addressed the motivation behind why participants took part in the campaign. Figure 16 shows those motivations that were selected by the respondents, where more than one motivation could be chosen. The most frequently selected motivation was contributing to science, followed by finding the project interesting, but all motivations were selected by at least some of the respondents. Hence, in addition to contributing to science, gamification and competition did appeal to some of those taking part. There was also an 'Other' category with free text to capture other motivations, but no additional responses were received.
We then asked respondents to choose the dominant motivation from those they had selected, which is summarized in Figure 17 as a percentage of all motivations. Here the desire to contribute to scientific research is clearly in the majority, with the fact that participants found the project interesting also acting as a key driver for participation.

Data usage
Table 3 summarizes the responses from two questions regarding data usage. We wanted to understand if users were downloading the data for their own purposes and whether participants would be interested in accessing all the data collected from the campaign, e.g. because they were GNSS professionals. The results indicated that the majority (84.5%) were not interested in downloading their own data, but a larger percentage (31.2%) were interested in the full data set. Thus, it is clear that the majority of participants were volunteers who were interested more generally in the project, which is supported by both educational backgrounds and the motivations for participation.

Overall user ratings of the campaign
At the end, the questionnaire had two questions related to the overall experience in the CAMALIOT campaign and the quality of the information on the CAMALIOT website. Ratings were made using a star system from 1 (worst) to 5 (best). The average rating for the overall experience in the CAMALIOT campaign was 4.35 while the quality of information on the CAMALIOT website was rated as 4.19 overall. Both indicate general overall satisfaction.

Discussion
Here we have shown that the CAMALIOT data collection campaigns have been successful in terms of the amount of data collected as well as the geographical distribution, where data have been collected across all continents, including even Antarctica where one device has collected measurements. Although this exercise was intended as a proof of concept as part of a scientific research project to examine how a modern generation of smartphones with GNSS capabilities can contribute to scientific applications beyond navigation, it has also demonstrated the willingness of citizens to take part in collecting GNSS data. In many ways this application is similar to the SETI (Search for Extraterrestrial Intelligence) project in which participants were willing to provide their processing power for scientific research (Anderson et al. 2002) and the Weather Signal app, in which users passively collected temperature data from their mobile phones from around the world (Sosko and Dalyot 2017). Based on responses from the questionnaire, the overwhelming motivation for participation was helping science, which we have found in similar citizen science applications involving volunteers, even when small payments were provided (Laso Bayas et al. 2020), and more generally in the literature (West, Dyke, and Pateman 2021). Although gamification and prizes were added to the campaign to appeal to different motivations of the crowd, it appears as if these elements were not considered to be as important as helping science or even selected all that often by questionnaire respondents as one of the underlying drivers of motivation.
As mentioned in the introduction, GNSS data can be used to provide valuable information on the presence of water vapor or on the ionospheric state. Hence, an important outcome of the campaigns has been to demonstrate that the data collected has the potential to fill gaps in the current network of geodetic stations (e.g. in Brazil), but that they can also provide a denser sample in other parts of the world (e.g. in Europe). The large contributions from Brazil were very surprising, but another citizen science project in which 10 million battery temperature measurements were collected in São Paulo, Brazil (Droste et al. 2017), indicates that there is clearly a willingness to take part in such projects. We acknowledge that greater participation in the countries of Africa and Asia is desirable, but has not been fully achieved in this project, which was originally designed as a proof of concept. To increase participation in these regions, one would need to work with partners in these countries, e.g. space agencies, meteorological services and non-governmental organizations, to help promote the app and the concept of crowdsourcing GNSS observations. Translation of the app and website into local languages, information campaigns to raise the awareness of the value of GNSS data beyond positioning or navigation, and other types of incentives such as micropayments could be avenues for exploration to improve the worldwide data collection coverage.
Another geographical pattern in the contributions was the concentration of data collected in populated areas versus more rural ones, which is a bias seen in other crowdsourcing projects like OpenStreetMap (Thebault-Spieker, Hecht, and Terveen 2018). However, in the CAMALIOT project, this urban-rural bias could be an advantage because one of the aims is to improve local weather forecasting of precipitation events, e.g. across a city, and this would require a dense set of observations. In this case, however, one needs to be more specific about the preferred location of the smartphones while taking measurements. GNSS radio signals used for precise positioning remain vulnerable to interference, jamming, and local variations in demanding environments such as dense urban areas, tunnels/bridges or thick vegetation. This leads to unwanted blockages, multipath interference, and reflections of GNSS signals before they reach the receiver antenna (Kubo, Kobayashi, and Furukawa 2020). Due to the low suppression level of such errors in the case of in-built GNSS smartphone antennas (G. Li and Geng 2019), this is an important aspect to consider in the future in relation to the science use cases examined throughout the course of the CAMALIOT project. Therefore, an open field with no obstructions from buildings or other large features would be favorable in this case. In general, one would also need to communicate clearly that the preferred orientation of smartphones is an upward orientation, as this tends to be the most beneficial in terms of the quantity and quality of raw GNSS observations that can be collected from smartphones (Li et al. 2022). The proposed concept would also require relatively continuous observations, at a specific location or area where tropospheric and ionospheric conditions are still identical, if these were to be assimilated into numerical weather prediction models in near real-time, or used in ionosphere-related studies, and therefore have an impact. Although such a data collection campaign can potentially contribute if more users could be convinced to collect GNSS data, there is also an argument that the intervention of big technology companies such as Google is needed, who could provide this information in the same way that IBM's Weather Channel has provided pressure data in the past for weather forecasting purposes (Cliff Mass Weather Blog 2022) as a form of corporate social responsibility.
In terms of contributions to the campaign over time (Figure 3), the pattern follows many other typical VGI and citizen science contribution patterns, i.e. large increases in participation following the launch of a campaign (due to advertising) and then increases linked to any subsequent major advertising events, e.g. being picked up by the Guardian newspaper and SciTech Daily, followed by a decrease in participation as the campaign progresses. Retention of participants is a well-known issue in VGI, citizen science and user-generated content such as Wikipedia (Frensley et al. 2017; He 2012). Referred to as the 90-9-1 rule for online communities, 90% of users never contribute, 9% of users contribute a little bit and almost all of the content is generated by only 1% of participants (Nielsen 2006). In the case of OpenStreetMap, 3.5% of volunteers have generated more than 98% of the content in the past (Neis, Zielstra, and Zipf 2011). To investigate this phenomenon, we examined retention in participation during the four months of the Autumn campaign (1 August 2022-30 November 2022), with the argument that these participants would most likely have been recruited during the first campaign and hence be more likely to continue collecting data. However, we showed that 56% of participants only collected data for one to three days so we clearly continued to onboard new participants that we did not retain. More active participants, i.e. those that contributed for more than one month, represent 15.4% of the participants who contributed 85% of the data. Hence, the ratio of contribution is not as extreme as that of the 90-9-1 rule or what was found in OpenStreetMap, but it still follows this type of typical skewed contribution pattern.
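The skew in contributions can be quantified in a few lines of Python. The sketch below is illustrative only: the function names and the input layout (one measurement count and one active-day count per participant) are assumptions. It ranks participants by volume, whereas the figures quoted above group participants by how long they contributed, so the two measures are related but not identical.

```python
import math


def top_contributor_share(measurement_counts, top_fraction):
    """Fraction of all measurements contributed by the most active
    `top_fraction` of participants (ranked by measurement count)."""
    if not measurement_counts:
        raise ValueError("no participants given")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(measurement_counts)
    if total == 0:
        raise ValueError("no measurements recorded")
    ranked = sorted(measurement_counts, reverse=True)
    # Always include at least one participant in the "top" group
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return sum(ranked[:k]) / total


def short_lived_share(active_days, max_days=3):
    """Fraction of participants who collected data on at most `max_days` days."""
    if not active_days:
        raise ValueError("no participants given")
    return sum(1 for d in active_days if d <= max_days) / len(active_days)


# Toy data: four participants, one of whom dominates the contributions
toy_counts = [85, 5, 5, 5]
toy_days = [40, 1, 2, 10]
toy_top_share = top_contributor_share(toy_counts, 0.25)
toy_short_share = short_lived_share(toy_days)
```

With the toy data, a quarter of the participants account for 85% of the measurements and half of the participants were active for three days or fewer.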
The results from the questionnaire provided insights into the backgrounds of the participants. There is a clear gender bias towards male participation, which is also seen in applications like OpenStreetMap (Gardner et al. 2020). However, the questions on age, education and employment indicate a diversity of backgrounds, which is often not the case with citizen science projects, e.g. a recent survey showed that participation was highest amongst those currently in education while in the CAMALIOT campaigns, a much larger proportion were employed in industry (West, Dyke, and Pateman 2021). This also demonstrates that the target audience reached by the campaigns was citizens interested in the project and much less the GNSS community. Moreover, advertising the campaign in the GIM International online magazine, aimed at professionals, also showed little impact on the numbers of participants when this advertising event occurred.
The distribution of Android mobile phones in 2022 in terms of market share (Curry 2022) largely follows the relative distribution of the types of mobile phones that were used to collect data in the campaigns. Although this is not surprising, the number of devices capable of receiving GNSS signals in dual-frequency mode was higher than expected and was also well distributed globally, especially given that this is an emerging technology. This further demonstrates the potential of collecting raw GNSS data at scale for scientific applications of a global nature. However, it should be noted that even if a phone model is theoretically capable of dual-frequency GNSS signal reception, this is not always the case with all GNSS chipset models and versions as demonstrated by Barbeau (2021).

Conclusions
This paper provided an overview of the CAMALIOT mobile app for the collection of GNSS data along with results from two data collection campaigns, an intrinsic part of the CAMALIOT project (ESA NAVISP-EL1-038.2). We demonstrated the potential of the data in terms of filling spatial gaps and increasing the density of observations relative to permanent geodetic GNSS networks. We also showed that there has been a relatively substantial contribution from mobile phones with dual-frequency capabilities, which are still an emerging technology. We then presented results from a questionnaire showing the diversity of participation and the drivers motivating the data collection. As the CAMALIOT project has now ended (as of 31 October 2022), plans for how to sustain the data collection are still under discussion. However, for the time being, we will continue to encourage data collection via the app and participation in the development of the CAMALIOT community. The science use cases related to CAMALIOT that integrate the GNSS VGI will be presented in future papers.

Figure 1. Screenshots from the CAMALIOT mobile app (from 20 May 2022) showing (a) the data collection page, (b) the leaderboard and (c) the information page with the partners involved in the project.

Figure 2. Screenshot of the global map shown on the CAMALIOT website during the campaign (for data collected up to the middle of June 2022).

Figure 3. Volume of data collected (grey line) and the number of devices (blue line) from 17 March to 30 November 2022. The letters A to E correspond to events, described in more detail in the text.

Figure 4. Spatial distribution of (a) publicly available GNSS stations of geodetic grade in orange and (b) crowdsourced data collected between 17 March 2022 and 30 November 2022 in blue.

Figure 5. An example of gap filling of measurements in Brazil showing (a) publicly available GNSS stations of geodetic grade in orange and (b) data collected via the CAMALIOT app in blue.

Figure 6. An example of densification of measurements in Europe showing (a) publicly available GNSS stations of geodetic grade in orange and (b) data collected via the CAMALIOT app in blue.

Figure 7. Participation by the number of days the participants took part in the autumn campaign as a percentage of the total number of days along with the percentage of measurements collected.

Figure 8. Distribution of mobile phones by single- and dual-frequency GNSS recording capabilities.

Figure 9. Continental distribution of mobile phones by single- and dual-frequency GNSS recording capabilities.

Figure 10. Distribution of time span and C/N0 of collected data. Each dot represents a data file containing GNSS observations from a single session. Black dots indicate files with GPS observations only, while red dots represent files with multi-GNSS observations. The horizontal dashed line denotes the C/N0 threshold of 35 dB-Hz, while the vertical line denotes the time span threshold of 1 h.

Figure 11. The distribution of respondents by age classes as a percentage of total respondents.

Figure 12. The distribution of respondents by type of education as a percentage of total respondents.

Figure 13. The distribution of respondents by the employment type as a percentage of total respondents.

Figure 14. The geographical distribution of respondents ranked by country where the number of responses is greater than 1% of the sample.

Figure 15. The distribution of channels through which participants learned about the campaign as a percentage of the total respondents.

Figure 16. The motivations for participation where respondents could choose more than one motivation from a list.

Figure 17. The dominant motivation for participation in the CAMALIOT campaigns as a percentage of the total responses.

Table 1. The variables collected by the mobile phone with their descriptions.

| Name of Variable Collected | Meaning |
| --- | --- |
| Manufacturer | The manufacturer of the product/hardware |
| Model | The end-user-visible name for the end product |
| ElapsedRealtimeMillis | Elapsed milliseconds since boot |
| Provider | The name of the provider that generated this fix |
| Latitude | The latitude, in degrees |
| Longitude | The longitude, in degrees |
| Altitude | The altitude if available, in meters above the WGS 84 reference ellipsoid |
| Speed | The speed if it is available, in meters/second over ground |
| Accuracy | The estimated horizontal accuracy of this location, radial, in meters |
| (UTC)TimeInMs | The UTC time of this location fix, in milliseconds since epoch (January 1, 1970) |
| TimeNanos | GNSS receiver internal hardware clock value in nanoseconds |
| LeapSecond | The leap second associated with the clock's time |
| TimeUncertaintyNanos | The clock's time uncertainty (1-Sigma) in nanoseconds |
| FullBiasNanos | The difference between TimeNanos and the true GPS time in nanoseconds |
| BiasNanos | The clock's sub-nanosecond bias |
| BiasUncertaintyNanos | The clock's bias uncertainty (1-Sigma) in nanoseconds |
| DriftNanosPerSecond | The clock's drift in nanoseconds per second |
| DriftUncertaintyNanosPerSecond | The clock's drift uncertainty (1-Sigma) in nanoseconds |
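As an illustration of how the clock variables in Table 1 fit together, the Android `GnssClock` documentation gives the receiver's local estimate of GPS time as TimeNanos - (FullBiasNanos + BiasNanos). The Python sketch below applies that relationship and splits the result into a GPS week and seconds of week. The function name and the example values are hypothetical, and the sketch is not taken from the CAMALIOT app or its processing chain.

```python
NANOS_PER_SECOND = 10**9
NANOS_PER_WEEK = 7 * 24 * 3600 * NANOS_PER_SECOND


def gps_week_and_seconds(time_nanos, full_bias_nanos, bias_nanos=0.0):
    """Return (GPS week, seconds of week) from raw GNSS clock fields.

    time_nanos      -- TimeNanos: receiver hardware clock value (integer ns)
    full_bias_nanos -- FullBiasNanos: hardware clock minus true GPS time (integer ns)
    bias_nanos      -- BiasNanos: sub-nanosecond part of the clock bias
    """
    # Integer arithmetic keeps the large nanosecond counts exact
    whole_nanos = time_nanos - full_bias_nanos
    week, nanos_of_week = divmod(whole_nanos, NANOS_PER_WEEK)
    # Apply the sub-nanosecond bias only after the week has been removed
    seconds_of_week = (nanos_of_week - bias_nanos) / NANOS_PER_SECOND
    return week, seconds_of_week


# Hypothetical values: the receiver has been running for 1 s and the true
# GPS time is 5 s into GPS week 2200.
example_time_nanos = 1 * NANOS_PER_SECOND
example_full_bias = example_time_nanos - (2200 * NANOS_PER_WEEK + 5 * NANOS_PER_SECOND)
example_week, example_sow = gps_week_and_seconds(example_time_nanos, example_full_bias)
```

Subtracting the integer fields first and removing whole weeks before converting to floating point avoids losing precision, since nanoseconds since the GPS epoch exceed the range that a 64-bit float can represent exactly.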

Table 2. The total number of measurements (in billions) and total number of devices in the first campaign (17 March to 31 July 2022) and in the subsequent months of the Autumn campaign.

Table 3. Responses to questions regarding interest in using data from the campaign.