New frontiers on open standards for geo-spatial science

Abstract Sharing data has become easier than ever with the advancement of cloud computing and software tools. However, big challenges remain, such as efficiently handling big geospatial data, supporting and sharing crowdsourced/citizen science data, dealing with semantic heterogeneity, and adopting agile processes for the continuous improvement of geospatial technology. This paper discusses the new frontiers regarding these challenges and the related work performed by the Open Geospatial Consortium (OGC), the world's leading organization focused on developing open geospatial standards that "geo-enable" the Web, wireless and location-based services, and mainstream IT.


Challenges in geo-spatial science
Twenty years ago, data were kept locally and shared via FTP, media such as DVDs, and mail attachments. Setting up a web server to publish data required developers with enough skill to set up a machine, configure security, and manage hard disk quotas and other issues. These developers had a limited set of tools to help them achieve tasks, such as image processing or spatial analysis software. Websites were mostly static and there were no mobile applications.
Today, sharing data is easier than it has ever been before. Dedicated projects exist to support uploading and sharing of data in distributed environments (Zaslavsky et al. 2014; Cao et al. 2016). Setting up a web server on a cloud environment can be completed in minutes. Plenty of libraries exist to support science (Steiniger and Hunter 2012; Qin, Zhan, and Zhu 2013; Bunting et al. 2014).
Today's issues include maximising reusability, enabling real collaboration in building tools, creating automated analytical process flows and, most importantly, making the resulting information available to the widest possible community of interest. More than ever, it is important to establish agreements on best practices for technology architectures, interfaces and encodings to support cross-community collaborative science. This is where the Open Geospatial Consortium (OGC) plays an important role.
The OGC, founded in 1994, serves as the global forum for developers and users of geospatial data. As of 2017, OGC has more than 500 members, including vendors, government agencies, universities and research institutes, all of which provide representatives who participate in a consensus process to advance open interface and encoding standards.
OGC collaborates with other standards development organizations, such as the International Organization for Standardization (ISO), the Organization for the Advancement of Structured Information Standards (OASIS), the Internet Engineering Task Force (IETF), the International Hydrographic Organization (IHO), the World Meteorological Organization (WMO), the National Emergency Number Association (NENA), the Open Mobile Alliance (OMA), and many others, to provide cross-cutting solutions.
OGC has learned to be aware of and react to the challenges of the new frontier. The 2010 article "Geospatial cyberinfrastructure: past, present and future" (Yang et al. 2010) discusses the key needs of a cyberinfrastructure for collaborative research, framed around the following challenges:
• Support for different communities that view their own problems through their own unique lenses;
• Analysis of data;
• Semantic heterogeneity;
• Intermediate support services;
• Citizen-based science;
• Advancement of cloud computing;
• Federations of different types of organizations.
Seven years after the paper's publication, the landscape and the needs highlighted by Yang et al. still hold true. This article, inspired by Yang's challenges, highlights the status of standards support in three areas critical to advancing a geospatial cyberinfrastructure for science: big data (including data analysis and cloud computing), citizen science, and semantic heterogeneity. A fourth factor, agility, has been added to this discussion as another key characteristic of the new frontier. Agility covers the processes that support rapid innovation, open collaborative development environments, and validation tools.

Big data
"Big data" is a technology trend that can be explained through four characteristics, the 4 Vs:
• Volume: the size of the data;
• Velocity: the frequency at which new data are created or published;
• Variety: the different kinds of data, including formats, structures and data types;
• Veracity: the trustworthiness, quality and provenance of the data.
Big data followed the cloud computing trend, which allows systems to scale to handle data with the 4 V characteristics. Access to and publishing of data are simplified but, most importantly, analysis and visualization of data are also considered. The variety and velocity characteristics can impede the fast processing needed to extract crucial information or spot patterns, as well as the powerful visualizations needed by decision-makers.
For example, consider wide area motion imagery (WAMI), where a sensor might have several high-resolution cameras taking pictures several times per second. An uncompressed WAMI ortho-rectified image is about 450 MB (150 megapixels at 8 bits per band). Recording at a frequency of 2 frames per second yields about 86,400 frames over 12 h. At a 10:1 compression ratio, a single sensor therefore produces roughly 3.7 TB of data per day (Thakkar 2012). Processing and analysing these amounts of data in real time, or looking across archives of such data, can be a challenge.
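As a sanity check on these figures, the arithmetic can be scripted. The parameters come from the example above; the 3-band, 8-bit-per-band assumption is ours, chosen so that an uncompressed frame works out to about 450 MB:

```python
# Back-of-the-envelope data-volume estimate for the WAMI example above.
# Assumption (ours): 3 bands at 8 bits per band -> ~450 MB per uncompressed frame.
PIXELS_PER_FRAME = 150e6     # 150 megapixels
BYTES_PER_PIXEL = 3          # 3 bands x 8 bits per band
FPS = 2                      # frames per second
HOURS = 12
COMPRESSION = 10             # 10:1 compression ratio

frame_mb = PIXELS_PER_FRAME * BYTES_PER_PIXEL / 1e6   # ~450 MB uncompressed
frames = FPS * 3600 * HOURS                           # 86,400 frames in 12 h
total_tb = frames * (frame_mb / COMPRESSION) / 1e6    # compressed volume in TB

print(f"{frame_mb:.0f} MB/frame, {frames} frames, ~{total_tb:.1f} TB per sensor per day")
```

This lands at roughly 3.9 TB, in the same ballpark as the 3.7 TB figure cited by Thakkar; the small difference comes from rounding.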
Standards provide a common set of interfaces and formats that facilitate the creation of tools for fast ingestion, analysis, and visualization, thus minimizing the time needed for data processing. The server layer presented in Figure 1 gives examples of OGC standards that translate data sources into open interfaces easily consumable by different types of clients. Servers can be of three types: data servers, which serve both image and vector data; catalogue servers, which provide an inventory of sources and help clients discover web services, a particular type of data (e.g. rainfall, Landsat, population) or a specific service interface (e.g. web map service (WMS), web feature service (WFS), web coverage service (WCS), sensor observation service (SOS)); and processing servers, which provide a web processing service (WPS) supporting operations such as semantic translation, geocoding, and the chaining of services, among others.
Organizations are collecting enormous amounts of data, and it is not practical to download all of it from a server to process locally. Data servers, acting as middleware, provide the query capability for clients to access only the data required by the application. This pattern allows the data provider to support queries on big data. Cloud patterns can also be used, such as optimized image tiling to speed up visualization, or storing metadata in NoSQL databases, which support both fast discovery and varied metadata formats (Figure 2).
Another cloud computing pattern deals with moving the software closer to the data. OGC supports this kind of operation with the WPS standard. WPS allows data providers and other organisations to publish ad hoc algorithms that process the data on the server, without the need to download anything to a local machine.

Feature data
Feature data is often referred to as vector data. Data can be modelled as points, lines, polygons or other geometries. A general feature model provides the conceptual framework to model features with both geospatial and non-geospatial properties.
The OGC WFS is a service interface that allows the publishing of vector data. It is based on a general feature model, with data modelled following the OGC geography markup language (GML), an encoding standard used to describe geometries and geographical relationships.
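Because WFS is an HTTP interface, a client request is just a URL with key-value parameters. A minimal sketch following WFS 2.0 KVP conventions, assuming a hypothetical endpoint and an invented feature type name:

```python
from urllib.parse import urlencode

# Hypothetical WFS endpoint and feature type; parameter names follow WFS 2.0 KVP.
endpoint = "https://example.org/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "env:MonitoringStations",            # illustrative feature type
    "bbox": "50.0,7.0,51.0,8.0,urn:ogc:def:crs:EPSG::4326",
    "count": "100",                                   # limit features returned
}
url = endpoint + "?" + urlencode(params)
print(url)
```

The `bbox` and `count` parameters are what let a client pull only the subset it needs instead of downloading a whole dataset.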
GML, also known as ISO 19136 (ISO 2007), is an eXtensible markup language (XML) encoding for sharing geographic information modelled according to the conceptual modelling framework of the ISO 19100 series. It includes the capability to model both the spatial (e.g. lines, curves) and non-spatial properties (e.g. feature names, statistical properties) of geographic features. GML defines an abstract feature model and a set of XML schema specifications. It contains a rich set of primitives, including feature, geometry, coordinate reference system, time, dynamic feature, coverage (including geographic images), unit of measure, and map presentation styling rules. These primitives enable information communities to define profiles of GML that capture the information models required for interoperability in their respective communities. One example is GeoSciML, which describes a logical model and GML/XML encoding rules for the exchange of geological map data, geological time scales, boreholes, and metadata for laboratory analyses. There are currently more than 30 GML profiles across multiple communities (OGC Network 2013).
The model, which at its most basic level is very simple, can represent other types of encodings that have a clear separation of spatial and non-spatial properties; GeoJSON and other encodings follow the same pattern.
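That separation is easy to see in GeoJSON, where a feature carries a geometry object next to a free-form properties object. A minimal sketch (the coordinates and attribute names are invented):

```python
import json

# A minimal GeoJSON Feature: a geometry plus non-spatial properties,
# mirroring the general feature model's separation of concerns.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-77.03, 38.90]},  # lon, lat
    "properties": {"name": "Sample station", "rainfall_mm": 12.5},  # illustrative
}

encoded = json.dumps(feature)       # serialize for transport
decoded = json.loads(encoded)       # round-trip back to a dict
print(decoded["geometry"]["type"])  # prints: Point
```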
Another good example of a GML profile is CityGML, which has allowed cities to publish 3D data using an open format. Kolbe highlights the work performed to model one million buildings and 150,000 streets in 3D for New York City (Kolbe, Burger, and Cantzler 2015). Van der Zee highlights the importance of city models in supporting the Internet of Things and smart cities (Van der Zee and Scholten 2014).

Image data
The OGC WMS interface standard defines a set of interfaces for requesting map images over the Internet. WMS makes it easy for a client to request images on demand, changing parameters such as size and coordinate reference system. A WMS server (i.e. a service that implements the WMS standard) advertises what maps it provides, produces maps, and answers basic queries about the content of a map.
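A GetMap call is likewise a parameterized URL. A sketch assuming a hypothetical endpoint and layer name, with parameters following WMS 1.3.0 KVP conventions:

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint and layer; parameters follow WMS 1.3.0 KVP.
endpoint = "https://example.org/wms"
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "landsat",              # illustrative layer name
    "styles": "",
    "crs": "EPSG:4326",
    "bbox": "40.0,-80.0,45.0,-70.0",  # lat,lon axis order for EPSG:4326 in 1.3.0
    "width": "800",
    "height": "400",
    "format": "image/png",
}
url = endpoint + "?" + urlencode(params)
print(url)
```

Changing only `width`, `height`, `bbox` or `crs` is enough to get a re-sized or re-projected rendering of the same layer, which is what makes WMS convenient for on-demand clients.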
A related standard is the web map tile service (WMTS), which describes a strategy to optimize the distribution and performance of WMS by supporting operations for pre-rendering georeferenced map tiles. An image is broken up into a set of tiles and, depending on what the user requests, only the relevant tiles are sent to the user.
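The arithmetic behind tiling can be illustrated with the common web-mercator XYZ scheme (used here purely as an illustrative assumption; WMTS itself defines its own tile matrix sets):

```python
import math

def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Map a lon/lat to tile indices in the common web-mercator XYZ scheme."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Tile containing central London at zoom 10:
print(lonlat_to_tile(-0.1276, 51.5072, 10))  # -> (511, 340)
```

Because the tile for a given area and zoom level is deterministic, a server can pre-render and cache every tile, and a client needs to fetch only the handful covering its viewport.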
Big data centres are making use of OGC visualization standards to provide access to big image repositories (Percivall and Bermudez 2012). A good example is the NASA global imagery browse services (GIBS) project, which uses WMTS as an interface to give users access to over 400 satellite imagery products collected over the last 30 years (Cechini et al. 2013).

Coverage data
Coverages refer to raster or gridded data: images in which each grid cell has an assigned value. Satellite images and digital elevation models fall under this category. The OGC WCS standard supports the interaction between servers and clients to publish and share coverage data.
The transatlantic EarthServer initiative provides support for 1000 petabytes of data for science analytics (Cechini et al. 2013). It uses WCS and a related standard, the web coverage processing service (WCPS), which provides raster processing capabilities.

Brokers
Data brokers support translating requests and data encodings from one form to another. If open standards are followed, a broker can support a variety of protocols while offering standardised services to the end user. This is beneficial for users who favour one format over another, or whose tools support only one kind of format. The broker can be configured to take ad hoc formats and make the data available in a more standardized way.
The global earth observation system of systems (GEOSS) is an example of a system of systems whose components are connected via policy agreements. The GEOSS common infrastructure (GCI) coordinates access to these systems, interconnecting and harmonizing the different types of data.
Brokering in a system-of-systems architecture that supports open standard interfaces is challenging (Nativi et al. 2015). Nativi provides the technical details of a GEO brokering framework that features the following:
• Load balancing and auto scaling;
• Ranking metrics to improve discovery;
• Caching techniques for data download and preview;
• Re-projection of coordinate reference systems;
• Domain resampling;
• Transformation of format encodings;
• Spatial and temporal subsetting.

Crowdsource and citizen science
Crowdsourced data have become an important source of data of all kinds, including geospatial data. Examples include Twitter and OpenStreetMap. Twitter has become an important mechanism for getting information from the public and for communicating important messages in almost real time. These messages include information about disasters and reports on conflict events.
OpenStreetMap has become one of the most used publicly available sources of information about local areas, including streets and features (buildings, parks, etc.). A vibrant community updates the data source on a continuous basis; in some countries, it supports the overall spatial data infrastructure. Organizations like the World Bank sometimes use OpenStreetMap as the only source of data for a country (Quirós and Mehndiratta 2015). A sub-group of OpenStreetMap, the Humanitarian OpenStreetMap Team, focuses on gathering data in times of disaster.
Another tool is Ushahidi, originally developed by a group of Kenyan bloggers to share information about the 2008 post-election violence. The community behind Ushahidi provides web and mobile tools for citizens to collect and publish information during and after a disaster or civil unrest, using mobile devices, for example.
These three platforms are examples of crowdsourced data that have helped organizations develop maps with geospatial and non-spatial properties. Crowdsourced data can also support science, not only by creating spatial features or updating news about an event, but by serving measurements and observations from different locations. This activity is sometimes referred to as citizen science.
The citizen observatory web (COBWEB) project is a good example of citizen science (Higgins et al. 2016). It developed a crowdsourcing infrastructure platform for environmental monitoring, advancing a solution for integrating sensor data from citizens, the architecture and standards required, and strategies for assuring the quality of measurements.
OGC members in the OGC geo-semantics domain working group have been working jointly with W3C members, under the spatial data on the web working group, to develop best practices for using OGC and W3C standards to publish spatial data on the semantic web.
The data provided by these standards can be made available as linked data using W3C semantic web technologies: uniform resource identifiers (URIs) identify resources on the Web, and a simple graph model, the resource description framework (RDF), connects the data.
In 2010, OGC members began developing a standard for representing and querying geospatial data on the semantic web. GeoSPARQL 1.0, the geographic query language for RDF data, was adopted by the OGC membership in 2012. It defines an ontology for representing features and their geometries (Battle and Kolas 2011). The geometries are encoded in well-known text (WKT), which has been used in OGC standards for more than a decade. The feature model is based on the OGC abstract baseline ISO 19107 (ISO 2003).
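To give a flavour of the language, the sketch below holds a hypothetical GeoSPARQL query (the graph and the search polygon are invented) that selects features whose WKT geometry falls within an area:

```python
# A hedged GeoSPARQL sketch; the dataset is invented and triple stores vary
# in which geof: functions they implement, so treat this as illustrative text.
query = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature WHERE {
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((-77.1 38.8, -76.9 38.8, -76.9 39.0, -77.1 39.0, -77.1 38.8))"^^geo:wktLiteral))
}
"""
print("geof:sfWithin" in query)
```

The `geo:asWKT` property carries the geometry as a WKT literal, and the `geof:` namespace holds the simple-features spatial relation functions such as `sfWithin`.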
Several activities at OGC have helped advance tools and strategies to deal with semantic heterogeneities, which can reduce the time scientists need to clean data. The 2006 OGC geospatial semantic web interoperability experiment (Lieberman 2006) demonstrated how ontologies of airports, airplanes, rules, and features can be linked to extract information about things. This interoperability experiment helped answer such questions as, "Which airports in a specified area of the world are capable of receiving a C-5 cargo plane?" The 2011 OGC ocean science interoperability experiment (Bermudez 2011) provided best practices for encoding semantics using the OGC SOS interface standard so that sensor concepts (sensor, platform, and parameter) can be easily linked to ontologies. The OGC member participants were also involved in the development of the W3C semantic sensor network ontology (Compton et al. 2012).
The cross-community interoperability thread, included in OGC's major testbed activities since Testbed 8, has advanced the use of an RDF knowledge base to disambiguate symbologies and mediate differences between community models, including incorporating data from social media (Hobona and Brackin 2013).

Agility
Agility is the characteristic of moving and changing direction quickly. Moving quickly in developing geospatial technologies to support science is key in today's changing world. Agility allows for innovation and for keeping pace with other related technologies.
The specialization of components is critical in today's cyberinfrastructure. The architecture discussed by Higgins, presented in Figure 3, helps materialize the strategy of developing collaborations in federations or communities of interest when sharing geospatial data. All the components framed in rectangles (sensor service, conflation service, portal website, etc.) represent software tools, and the arrows represent the interfaces (or agreements) that these software components must develop and implement to communicate with each other. The challenge is to design interfaces that are as simple as possible, so that they can be easily implemented and used by citizens, while at the same time supporting the functionality required to advance science. In the case of COBWEB, cooperative design (Co-Design) was used as the approach to actively involve all stakeholders in the process.

Linked data and semantic web
Today, science problems are solved through cross-domain collaborations. Scientific data are served by various organizations and sources, including the previously mentioned citizen science; however, semantic heterogeneity makes it difficult to discover and merge data (Bermudez and Piasecki 2006). For example, air pollution from trash dumpsters can negatively affect health (Sheehan et al. 2010). If scientists were to create a model to better understand the geospatial effects, they would need to develop a city model, get sensor data and determine pollution sources.
Developing a 3D model of the city can be performed via CityGML. The data will include properties such as building heights, used to calculate shadows, air flows and solar potential.
Data come from various sources that can report the same concepts in different ways, making integration difficult. Data can come from an environmental agency that manages air quality sensors; crowdsourced data can capture asthma cases reported via Twitter; another agency might publish the locations of dumpsters spotted by drones. When putting the data together, it is important to use a common vocabulary to refer to the same things. If the city model labels a feature "trash containers" and the organization providing drone data labels the similar feature "dumpsters", it will be difficult to perform an automatic integration of the data. Conflation, including semantic translation, might be required to convert from one concept to another.
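A minimal form of such a semantic translation is a synonym table applied before conflation. The two labels come from the example above; everything else here is illustrative:

```python
# Minimal semantic-translation step: map source-specific labels onto a
# shared vocabulary before conflating the datasets.
SYNONYMS = {
    "dumpsters": "trash containers",      # drone-survey label -> city-model label
    "trash containers": "trash containers",
}

def normalize(record):
    """Return a copy of the record with its label mapped to the shared vocabulary."""
    label = record["label"].lower()
    return {**record, "label": SYNONYMS.get(label, label)}

drone_obs = {"label": "Dumpsters", "lon": -77.02, "lat": 38.91}  # illustrative
print(normalize(drone_obs)["label"])  # prints: trash containers
```

Hand-built tables like this do not scale, which is why the linked data and ontology-based approaches described next matter.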
Linked data (Bizer, Heath, and Berners-Lee 2009) can be used to help organizations expose, share and connect data from different sources on the web by explicitly declaring the meaning of the concepts. This allows a semantic translation (e.g. from "dumpsters" to "trash containers") to happen seamlessly and often without human intervention.
The OGC validator can test servers (e.g. those supporting WMS and WFS), clients (e.g. WMS 1.3) and encodings (e.g. XML, GML, and GeoPackage). It can be exercised online, downloaded and invoked via the command line, or installed on a local web server.
It has successfully been used in projects like eEnvPlus. This project set up the validation tool in its own cloud environment and has helped validate metadata and data following GML profiles (Epsilon-Italia 2017). Some profiles are based on the INSPIRE data specifications (INSPIRE 2014), including data models for cadastre, administrative units, transport networks, hydrography, elevation, geology, soil, orthoimagery, human health and safety, and species distributions, amongst others. The eEnvPlus service also supports validation of other profiles based on the AQD (Air Quality Directive) and the GeoSmartCity (GSC) INSPIRE-extended data models.
Making tools like the OGC validator available as open source allows communities to improve data sharing, create spatial data infrastructures more quickly, help scientists concentrate on the science rather than the format of the data, and guarantee that interfaces and data encodings behave as expected.

Conclusions
Cloud computing and new software tools have made it easier to share data and processes. Important challenges remain, such as handling big geospatial data, integrating crowdsourced data, dealing with semantic heterogeneity and embracing agile processes. This paper has highlighted a few of these new frontiers and the work performed by OGC to address them.
OGC interfaces provide organizations with the capability to store all their data in a cloud environment while allowing users and applications to query, analyse and visualize the information efficiently. Technologies like WMTS, which allows the tiling of images, have enabled agencies to make available vast amounts of satellite data.
Linked data allows data to be exposed with rich semantics. The OGC GeoSPARQL query language supports querying data based on ontologies and makes it easier to reconcile different terminologies representing one concept (e.g. "dumpsters" and "trash containers"). The challenge is constructing the mapping amongst concepts, which can be done manually, inferred by rules, or learned via machine learning techniques.
Agility is embedded in today's software development environments and in organizations that improve on a continuous basis. OGC has embraced agility in three ways: a space to prototype solutions via the OGC Innovation Program, an open collaborative environment for developing standards, and open source, flexible validation tools. The prototyped solutions can advance the sharing of geospatial data or serve as new requirements for improving geospatial standards.

Innovation
OGC provides the venue to prototype standards in an agile way, complementing the work developed by the standards working groups. The OGC interoperability program (OGC-IP) is a global, innovative, collaborative, hands-on engineering and rapid prototyping program for advancing, validating and testing geospatial technology. Interoperability experiments, pilots and testbeds allow the geospatial community to advance solutions in collaboration with worldwide experts.

Open development environment
OGC recently added a process that allows standards to be developed via GitHub, one of the most popular software development platforms worldwide. Development teams can easily set up their own project, track changes, keep an issue tracker to manage bugs and enhancements, and manage releases. One of the key features of this hub is the ability for anybody to comment on or suggest changes to the content, which are easily tracked via pull requests.
This approach to version management also applies to developing documents. OGC has adopted AsciiDoc (AsciiDoctor 2017) as the language for developing standards documents. This markdown-style text is easily tracked in code repositories. A good example is the GeoPackage GitHub repository (OGC 2017), where the draft documents have been publicly available since 2013. The working group has processed more than 100 pull requests, and GeoPackage has become one of the most used OGC standards on mobile devices.

Open source testing facility
The OGC provides a validator that can be used by OGC and non-OGC members as often as they like to test their implementations of OGC standards. This helps implementers speed up their development process. The source of the engine and the tests are available at GitHub (Bermudez 2012).
The validator is based on TEAM Engine, a validation tool that executes tests written in the compliance test language (CTL), a simple test grammar built on eXtensible stylesheet language transformations (XSLT), as well as tests written with the TestNG Java framework. It is typically used to verify specification compliance and is the official test harness of the OGC compliance program, where it is used to certify implementations of OGC standards.
The challenge within OGC and similar organizations is to embrace continuous improvement, engage new developers, and foster agreements within communities to solve complex science problems that require cross-community collaborations.

Notes on contributor
Luis Bermudez is the executive director of the Innovation Program of the Open Geospatial Consortium, the world's leading organization focused on developing open geospatial standards that "geo-enable" the Web, wireless and location-based services, and mainstream IT. He has a PhD and an MS in environmental informatics from Drexel University and an MS in industrial engineering from the Andes University in Bogota, Colombia. He has more than 20 years of experience in the information and technology industry, including the geospatial, earth informatics, sensor web, semantic web, and legal fields. He has co-authored more than 50 publications and is a co-editor of GeoFocus. He is an adjunct professor in the GIS master's program at the University of Maryland. Before OGC, he was the technical manager at the Southeastern Universities Research Association and technical lead of the marine metadata interoperability project at the Monterey Bay Aquarium Research Institute. In both positions, he advanced technologies to support the sharing and improvement of numerical models and the integration of ocean observing systems around the world.