Global Reference Grids for Big Earth Data

The emerging field of Discrete Global Grid Systems (DGGS) provides a way to organise, store and analyse spatio-temporal data at multiple resolutions and scales (from near-global scales down to microns). A DGGS partitions the entire planet into a discrete hierarchy of global tessellations made up of progressively finer-resolution zones (or cells). Data integration, decomposition and aggregation are optimised by assigning a unique spatio-temporal identifier to each zone. These identifiers encode both the zone's location and its resolution. As a result, complex multi-dimensional, multi-resolution spatio-temporal operations are reduced to sets of 1D array and filter operations. DGGS are therefore particularly well suited to efficient multi-source data processing, storage, discovery, transmission, visualisation, computation, analysis and modelling. DGGS are supported by standards from both the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO) Technical Committee 211 (OGC Abstract Specification – Topic 21 [1]; ISO 19170-1 [2]). These published specifications support 2D equal-area DGGS of the Earth's surface. Current work led by both OGC and ISO/TC 211 is drafting standards to specify 3D and equal-volume [3], 4D (spatio-temporal) [4] and axis-aligned [5] DGGS, as well as OGC API DGGS [6], [7] interface encodings for DGGS infrastructures. The continued effort to develop international standards for DGGS will support the implementation of standardised, interoperable Global Reference Grid Infrastructures that enable efficient and scalable integration of Big Earth Data across multiple organisations around the world. Introducing

may be misplaced or possibly even misleading. How do the papers in this Big Earth Data Special Issue help us appreciate the difference?
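To make the idea of identifiers that encode both location and resolution concrete, here is a minimal Python sketch. It assumes a deliberately simple, hypothetical scheme: a quadtree over a plain longitude/latitude rectangle, which is not equal-area and does not follow any OGC or ISO DGGS specification. The names `zone_id`, `parent`, `aggregate` and `within` are illustrative only. Because each extra character refines the zone, moving to a coarser resolution is a string truncation, and aggregation and spatial filtering become 1D group-by and prefix-filter operations on the identifiers.

```python
from collections import defaultdict

# Hypothetical scheme for illustration only: a quadtree over a plain
# longitude/latitude rectangle. It is NOT equal-area and does NOT follow
# any OGC/ISO DGGS specification. Each character of an identifier is a
# digit 0-3 naming the quadrant chosen at that level, so the identifier
# encodes location (its digits) and resolution (its length).


def zone_id(lon: float, lat: float, resolution: int) -> str:
    """Identifier of the zone containing (lon, lat) at the given resolution."""
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(resolution):
        mid_lon, mid_lat = (west + east) / 2, (south + north) / 2
        col = 1 if lon >= mid_lon else 0
        row = 1 if lat >= mid_lat else 0
        digits.append(str(2 * row + col))
        west, east = (mid_lon, east) if col else (west, mid_lon)
        south, north = (mid_lat, north) if row else (south, mid_lat)
    return "".join(digits)


def parent(zid: str, resolution: int) -> str:
    """Ancestor at a coarser resolution: just truncate the identifier."""
    return zid[:resolution]


def aggregate(values: dict[str, float], resolution: int) -> dict[str, float]:
    """Mean value per coarser zone: a 1D group-by on truncated identifiers."""
    groups = defaultdict(list)
    for zid, value in values.items():
        groups[parent(zid, resolution)].append(value)
    return {zid: sum(vs) / len(vs) for zid, vs in groups.items()}


def within(values: dict[str, float], ancestor: str) -> dict[str, float]:
    """Spatial filter expressed as a 1D prefix test on identifiers."""
    return {zid: v for zid, v in values.items() if zid.startswith(ancestor)}


# Usage with made-up sample values (lon, lat, value).
samples = [(10.0, 10.0, 1.0), (100.0, 10.0, 3.0), (-10.0, -10.0, 5.0)]
fine = {zone_id(lon, lat, 2): v for lon, lat, v in samples}
coarse = aggregate(fine, 1)
```

Real DGGS use equal-area cells and more elaborate encodings, but the same principle applies: once data carry zone identifiers, moving between resolutions needs no geometry at all.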
Alexander Kmoch et al. from the Department of Geography, Institute of Ecology and Earth Sciences at the University of Tartu compare the broad functionality and usability of five of the more prominent open-source DGGS libraries, and explore the global variation in geometric distortion and zone area for the DGGSs supported by each library. Results are presented as maps, histograms and tables. While a small portion of the variation is due to coding choices, the dominant source is inherent in the geometry of each DGGS, so the results provide a very useful introduction to, and guide for, the compromises involved in selecting a DGGS geometry. The paper also distinguishes between libraries that focus on providing grids that satisfy particular criteria and those that also generate zone identifiers, supporting efficient processing through spatial functions driven by those identifiers.
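The kind of zone-area variation mapped by Kmoch et al. can be illustrated with a grid whose geometry is easy to reason about. The following Python sketch is not their method and uses none of the libraries they compare; it computes cell areas for a plain latitude/longitude grid on a spherical Earth (an assumed mean radius of 6371 km), using the spherical formula A = R² Δλ (sin φ₂ − sin φ₁). Equal-area DGGS are designed to keep zone area (near-)constant at each resolution, which is exactly the property this simple grid lacks.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def cell_area_km2(lat_south: float, lat_north: float, dlon_deg: float) -> float:
    """Area of a lat/lon cell on a sphere: R^2 * dlon * (sin(phi2) - sin(phi1))."""
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north))
               - math.sin(math.radians(lat_south))))


def area_stats(step_deg: float) -> tuple[float, float, float]:
    """Smallest, largest and total cell area of a global step_deg grid."""
    n_lat, n_lon = round(180 / step_deg), round(360 / step_deg)
    # All cells in one latitude band have the same area, so one per band suffices.
    band_areas = [
        cell_area_km2(-90 + i * step_deg, -90 + (i + 1) * step_deg, step_deg)
        for i in range(n_lat)
    ]
    return min(band_areas), max(band_areas), sum(band_areas) * n_lon


smallest, largest, total = area_stats(1.0)
sphere = 4 * math.pi * EARTH_RADIUS_KM ** 2
print(f"largest/smallest = {largest / smallest:.1f}")
print(f"total/sphere     = {total / sphere:.6f}")
```

For a 1° grid the equatorial cells come out more than 100 times larger than the cells next to the poles, even though the total still matches the sphere's surface area 4πR².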
The next two papers provide examples of machine learning that use DGGS to manage many highly heterogeneous data sources. Bing Han et al. move beyond the 2D surface DGGS examined by Kmoch et al. to a 3D DGGS, using GeoSOT-3D from the College of Engineering, Peking University. They model the site selection of emergency airports in the Yangtze River Delta, an area of high population density that is prone to flooding. Emergency airports there need to be chosen dynamically in response to rapidly evolving situations, and sub-optimal choices can have severe consequences. Andrew Rawson et al. from the University of Southampton use their dggridpy library to deploy Big Data technology across North America to assess maritime risk, drawing on 200,000 shipping incidents from the last few decades together with harsh environmental conditions and vessel movements derived from Automatic Identification System (AIS) data. While both papers focus primarily on their applications, they also describe how DGGS helps their analyses. Bing Han et al. describe how GeoSOT-3D's 3D spatial identifier allows them to use a traditional database structure, with a single column holding the 3D identifier, for their analysis. Andrew Rawson et al. initially treat DGGS as grids, and describe how the equal-area and uniform nearest-neighbour properties benefited their analysis and supported their choice of a DGGS based on the Icosahedral Snyder Equal Area (ISEA) projection with hexagonal zones. For the analyses themselves, table-based data structures indexed by zone identifiers are once again used to feed data through the machine learning pipeline.
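The idea of holding a full 3D location in one ordinary database column can be sketched with Python's built-in `sqlite3`. The GeoSOT-3D encoding itself is not reproduced here; as a stand-in, the sketch assumes a 3D Morton (octree) code that interleaves the bits of quantised longitude, latitude and height indices. `BITS`, `MAX_HEIGHT_M`, the helper names and the sample points are all hypothetical. Because all cells inside one coarser octree cell occupy a contiguous key range, a "same ancestor cell" query becomes a single indexed `BETWEEN` on that one column.

```python
import sqlite3

# Illustrative stand-in only: a 3D Morton (octree) code. This is NOT the
# GeoSOT-3D encoding; BITS and the height range are hypothetical choices.
BITS = 10                 # 2**10 cells along each axis
MAX_HEIGHT_M = 20_000.0   # assumed vertical extent of the grid


def cell_indices(lon: float, lat: float, height_m: float) -> tuple[int, int, int]:
    """Quantise a position into integer (i, j, k) cell indices."""
    n = 2 ** BITS
    i = min(int((lon + 180.0) / 360.0 * n), n - 1)
    j = min(int((lat + 90.0) / 180.0 * n), n - 1)
    k = min(max(int(height_m / MAX_HEIGHT_M * n), 0), n - 1)
    return i, j, k


def morton3d(i: int, j: int, k: int) -> int:
    """Interleave the bits of i, j, k into one integer key."""
    code = 0
    for b in range(BITS):
        code |= ((i >> b) & 1) << (3 * b)
        code |= ((j >> b) & 1) << (3 * b + 1)
        code |= ((k >> b) & 1) << (3 * b + 2)
    return code


def ancestor_range(code: int, levels_up: int) -> tuple[int, int]:
    """Key range of every cell inside the ancestor `levels_up` levels coarser."""
    shift = 3 * levels_up
    lo = (code >> shift) << shift
    return lo, lo + (1 << shift) - 1


# A traditional table: the whole 3D location lives in one indexed column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (zone_id INTEGER, label TEXT)")
con.execute("CREATE INDEX idx_zone ON obs (zone_id)")
points = [(121.47, 31.23, 50.0, "a"), (121.48, 31.24, 80.0, "b"),
          (116.40, 39.90, 40.0, "c")]  # made-up sample positions
for lon, lat, h, label in points:
    con.execute("INSERT INTO obs VALUES (?, ?)",
                (morton3d(*cell_indices(lon, lat, h)), label))


def labels_near(lon: float, lat: float, h: float, levels_up: int) -> list[str]:
    """Labels sharing the query point's ancestor cell: one BETWEEN query."""
    lo, hi = ancestor_range(morton3d(*cell_indices(lon, lat, h)), levels_up)
    rows = con.execute(
        "SELECT label FROM obs WHERE zone_id BETWEEN ? AND ? ORDER BY label",
        (lo, hi))
    return [r[0] for r in rows]
```

With these sample values, `labels_near(121.47, 31.23, 50.0, 3)` returns the two nearby points `a` and `b` but not `c`, while `levels_up=0` matches only the query's own cell.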
The remaining papers address three different challenges associated with DGGS: extreme scaling, pathways to adoption and future research. Xinyu Tang et al. from the College of Land Science and Technology, China Agricultural University, Beijing, address the wider challenge of Big Data remote sensing analytics at peta- and exa-scale. While they do not consider DGGS specifically, the issues they discuss and the solutions they examine do refer to techniques, such as space-filling curves, that are widely used in the spatial identifier logic of DGGS. The paper underscores that moving to peta- and exa-scale requires special consideration whatever technology is used, whether traditional raster or DGGS. Even vector systems struggle at giga-scale, let alone at peta- to exa-scale, so it would be very interesting to see research that explicitly explores the issues Xinyu Tang et al. discuss in the context of DGGS identifiers and diverse vector and raster source data.
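Space-filling curves are one way to turn 2D cell coordinates into the 1D keys on which DGGS identifiers and array storage rely. As a generic illustration (not the specific technique examined by Xinyu Tang et al.), the Python sketch below implements the classic iterative Hilbert-curve index for an n × n grid, where n is assumed to be a power of two; the function names are illustrative. Cells that are consecutive along the Hilbert curve are always edge neighbours, which is why such orderings tend to keep spatially close data close together in storage. The simpler Morton (Z-order) curve is cheaper to compute but makes larger jumps between consecutive cells.

```python
def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    """Flip/transpose a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve filling an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, in curve order
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def curve_order(n: int) -> list[tuple[int, int]]:
    """All cells of the n x n grid, sorted by their Hilbert index."""
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: hilbert_index(n, *c))


print(curve_order(4))
```

Sorting data by such a key lets a range of 1D keys stand in for a compact 2D region, which is the property that makes space-filling curves attractive for very large archives.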
Jeffery Thompson et al. from the Minnesota Supercomputing Institute, University of Minnesota, and the National Snow and Ice Data Center, University of Colorado, tackle the challenge of adoption by adding DGGS capability to an existing package, the Equal-Area Scalable Earth Grid (EASE-Grid 2.0), that is already widely used by the Earth Observation (EO) community. Their proposal underlines how key DGGS characteristics, such as a pre-defined hierarchy of nested grids and structured cell indices, substantially benefit global-scale EO data storage and interoperability. At the same time, it illustrates a dilemma: other assets, such as seamless global coverage or a consistent subdivision pattern across the grid hierarchy, are sacrificed to data legacy and user comfort in pursuit of easier acceptance. This raises the question of whether this initial version of EASE-DGGS is a sufficient endpoint for DGGS adoption or a valuable first step towards the use of DGGS by the Earth Science community. The challenge will be to ensure that it is just a first step towards wider adoption of DGGS, so that in time the community can realise the full DGGS vision described by Goodchild (Goodchild, 2019) and address the shortcomings of the early vector and raster decisions whose legacy continues to hold back the global community (Goodchild, 2018).
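The trade-off between structured nested indices and a consistent subdivision pattern can be sketched in a few lines of Python. The refinement ratios below are hypothetical, chosen only to show a hierarchy whose subdivision changes from level to level; they are not taken from EASE-Grid 2.0 or the EASE-DGGS proposal, and the (level, row, col) indexing is an assumed convention.

```python
# Hypothetical refinement ratios between successive levels, chosen only to
# illustrate a hierarchy whose subdivision pattern changes with level. They
# are NOT taken from EASE-Grid 2.0 or the EASE-DGGS proposal.
RATIOS = [4, 3, 3, 10]  # level 0->1, 1->2, 2->3, 3->4


def parent_cell(level: int, row: int, col: int) -> tuple[int, int, int]:
    """(level, row, col) of the cell one level coarser that contains this one."""
    if level == 0:
        raise ValueError("level-0 cells have no parent")
    r = RATIOS[level - 1]
    return level - 1, row // r, col // r


def ancestors(level: int, row: int, col: int) -> list[tuple[int, int, int]]:
    """The cell itself followed by every ancestor up to level 0."""
    chain = [(level, row, col)]
    while chain[-1][0] > 0:
        chain.append(parent_cell(*chain[-1]))
    return chain


def children_per_parent(level: int) -> int:
    """Number of level+1 cells nested inside one cell at `level`."""
    return RATIOS[level] ** 2


print(ancestors(4, 1234, 567))
print([children_per_parent(lvl) for lvl in range(len(RATIOS))])
```

Finding an ancestor remains simple integer division, but because the ratio changes between levels the number of children per parent (16, 9, 9 and 100 here) is not uniform, so identifiers cannot follow a single fixed digit pattern in the way they can in a DGGS with a constant refinement ratio.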
Majid Hojati et al. from the Department of Geography and Environmental Studies, Wilfrid Laurier University, Waterloo, and the University of Saskatchewan pick up on recent developments in OGC's DGGS standards and explore their implications compared with traditional raster- and vector-based GIS. The authors take readers on deep dives into particular issues, discussing the surface-level pros and cons and also exposing deeper practicalities that will need to be addressed through future research, richer implementations and evolving best practices.
So what do these papers tell us about the relative importance of grids and identifiers? Alexander Kmoch et al. demonstrate the trade-offs between different grids that we need to be aware of when selecting a DGGS for an application. Jeffery Thompson et al. show that a DGGS-compliant spatial index can be developed for the Earth Observation community's EASE-Grid 2.0 system of grids, thereby enabling DGGS capabilities for that community. Bing Han et al. and Andrew Rawson et al. show the power of leveraging DGGS spatial identifiers for complex analysis of mixed raster and vector data sources. So, the choice of grid is an important consideration in DGGS selection, but it is the identifiers that drive analyses.
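The point that identifiers rather than grid geometry drive analyses becomes clear when heterogeneous sources meet. In the minimal Python sketch below, a raster-like source and a vector-like source are each reduced to records keyed by zone identifier, after which integration is an ordinary dictionary join. The zone function (`zone_of`, a 1° latitude/longitude grid) and the sample data are hypothetical stand-ins. They are not a DGGS, and they are not data or code from any of the papers in this issue.

```python
from collections import defaultdict


# Hypothetical zone function, for illustration only: a regular 1-degree
# lon/lat grid with "row_col" identifiers. A real DGGS would supply its own
# (e.g. equal-area, hexagonal) zones and identifiers instead.
def zone_of(lon: float, lat: float, step: float = 1.0) -> str:
    row = int((lat + 90.0) // step)
    col = int((lon + 180.0) // step)
    return f"{row}_{col}"


# Made-up sample data.
# Raster-like source: pixel centres with a value (e.g. water temperature).
pixels = [(-70.25, 41.75, 14.2), (-70.75, 41.25, 13.8), (-60.5, 44.5, 9.1)]
# Vector-like source: point events (e.g. incident reports) with a category.
events = [(-70.6, 41.4, "collision"), (-60.2, 44.9, "grounding"),
          (-50.0, 45.0, "fire")]


def mean_by_zone(samples):
    """Aggregate raster samples to one mean value per zone identifier."""
    totals = defaultdict(lambda: [0.0, 0])
    for lon, lat, value in samples:
        acc = totals[zone_of(lon, lat)]
        acc[0] += value
        acc[1] += 1
    return {zone: s / n for zone, (s, n) in totals.items()}


def join_on_zone(samples, points):
    """Attach the zone's mean raster value (or None) to each vector event."""
    raster_by_zone = mean_by_zone(samples)
    joined = []
    for lon, lat, kind in points:
        zone = zone_of(lon, lat)
        joined.append((zone, kind, raster_by_zone.get(zone)))
    return joined


for record in join_on_zone(pixels, events):
    print(record)
```

Swapping in a genuine DGGS would change only `zone_of`; the aggregation and join logic would stay the same.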
The challenge of extracting wisdom and insight from Big Earth Data is significant, and it is increasingly recognised by those who are beginning to drown in a sea of data that is rising at an exponential rate. Our ability to increase compute capacity to keep pace with this growth is now limited by the laws of physics: we have reached the limits of Moore's Law, as evidenced most obviously by the switch from single-processor to parallel processing in the early 2000s (Hilbert, 2016; Karl, 2018; Rydning, Reinsel, & Gantz, 2018).
The papers in this issue clearly demonstrate that there is no single DGGS which is ideal for every use case nor are the advantages of using DGGS technologies realisable at no cost; however, it is also apparent that established techniques for global geospatial data storage and analysis have reached their limits and are now acting as an impediment rather than a facilitator for realising the combined power of a "Digital Earth". To solve this problem, we need to rethink the way we are storing, managing, sharing and working with our data, and DGGS offers a very promising opportunity.
The perceived lack of established and available DGGS solutions, which has been a barrier to implementation, is rapidly becoming an issue of the past. The papers in this Special Issue provide a solid argument for the application and adoption of DGGS to the Big Earth Data paradigm and some of the challenges that we as a community must resolve to truly realise the concept of the "Digital Earth".
We would like to thank all contributing authors, including those whose papers were not selected for this Special Issue. We would also like to express our heartfelt gratitude to the anonymous reviewers. Last but not least, special thanks go to the Executive Editor-in-Chief, Dr Changlin Wang, and the Assistant Editor, Dr Linlin Guan, for their great assistance.

Notes
Matthew B.J. Purss is a geophysicist, data scientist, thought leader and entrepreneur with over 24 years' experience in the exploration, research and government sectors. He is a world leader in the standardisation, development and implementation of DGGS technologies through his roles as Founding Co-Chair of the OGC DGGS Standards and Domain Working Groups, Co-Chair of the OGC-ISO/TC211 Joint Advisory Group and Co-Chair of the OGC Points of Interest Standards Working Group. He is the Co-Founder and Chief Executive Officer of Pangaea Innovations Pty. Ltd., a spatial data technology startup company developing commercial applications of 3D & 4D DGGS technologies.
Zoheir Sabeur investigates natural phenomena, behaviour and processes using experimental sensor observations for science and knowledge discovery. He uses OGC DGGS standards on spatial and temporal Earth data for enabling the scalability of his multi-modal data fusion and deep learning distributed methods. These aim at capturing advanced situational awareness in the context of understanding potential critical events and forecasting. Zoheir has been working in this domain for more than 25 years and validating his research and development on environmental, health and security driven intelligent agents. He is currently Professor of Data Science and Artificial Intelligence at Bournemouth University, Department of Computing and Informatics, United Kingdom.
Peter Strobl has worked for more than 30 years across a broad range of Earth Observation topics, and has recognised spatial representation and architecture as key to the interoperability of diverse geospatial data. He works as a Senior Scientist in the European Commission's Joint Research Centre, advising the Copernicus Earth Observation programme on questions of references and standards and on the development of future sensors and data technologies. He is a member or chair of various working groups and expert panels within OGC, CEOS, ESA and NASA/USGS.
Tengteng Qu is currently a Research Assistant Professor in the Department of Aeronautics and Astronautics, College of Engineering, Peking University, China. Her research interests focus on geospatial big data analysis and global subdivision grids. Since 2019, she has served as a Standard Expert of Geographic Information Science in the OGC DGGS SWG and ISO/TC 211. In particular, she is now in charge of drafting "OGC 20-049r3, Topic 21 – Discrete Global Grid Systems – Part 4 Axis Aligned DGGS Reference Systems".