Innovative approaches to the Sustainable Development Goals using Big Earth Data

ABSTRACT
A persistent challenge for the Sustainable Development Goals (SDGs) has been a lack of data for indicators to assess progress towards each goal and varying capacities among nations to conduct these assessments. Rapid developments in big data, however, are facilitating a global approach to the SDGs. Tools and data products are emerging that can be extended to and leveraged by nations that do not yet have the capacity to measure SDG indicators. Big Earth Data, a special class of big data, integrates multi-source data within a geographic context, utilizing the principles and methodologies of the established literature on big data science, applied specifically to Earth system science. This paper discusses the research challenges related to Big Earth Data and the concerted efforts and investments required to make and measure progress towards the SDGs. As an example, the Big Earth Data Science Engineering Program (CASEarth) of the Chinese Academy of Sciences is presented along with other case studies on Big Earth Data in support of the SDGs. Lastly, the paper proposes future priorities for developments in Big Earth Data, such as human resource capacity, digital infrastructure, interoperability, and environmental considerations.


Introduction
The United Nations Sustainable Development Summit in 2015 proposed 17 Sustainable Development Goals (SDGs) as a part of the 2030 Agenda for Sustainable Development, a global commitment to address the economic, social, and environmental challenges collectively shared by all (United Nations, 2015). These goals represent a comprehensive approach and respect the direction of national development and international cooperation. Since the beginning, rapid adoption of these goals has been constrained by several challenges that still persist in some manner or form even after five years. The global indicator framework for SDGs was adopted in 2017 by the United Nations as a preliminary system for Member States to adopt on a voluntary basis, and to monitor progress in SDG implementation (United Nations, 2017). The framework is subject to regular refinement and updating, though challenges have limited the potential to understand the state of

Big Earth Data
Most modern digital infrastructure is heavily reliant on digital data, which is being generated in large volumes and by a large variety of sources (Guo, Wang, Chen, & Liang, 2014). With the current pace of technological development, computational power, and the volume of data being generated, a digital copy of Earth can be created to provide valuable insight into Earth system processes (Guo, Wang, & Liang, 2016). Additionally, integrating data on Earth observation with other forms of big data within a geographic context holds potential to link anthropogenic activity with its impact on Earth processes. However, data integration presents a key challenge in generating new information and adding value to this volume of multi-source data.
Big data science normally deals with four aspects of data: volume, referring to the quantity of data; velocity, referring to the speed of data generation and its processing; variety, referring to the types of data; and veracity, referring to the availability and accountability of data (Marx, 2013). Commonly, the term big data refers to data that is largely human-centric, generated from the growing use of modern devices that allow both passive and active collection of highly disaggregated data. In simple words, a large portion of big data is created by human activity, utilized for services to humans and is now increasingly being used to study and understand human behavior (Guo, 2019b).
However, unlike the big data that is largely a byproduct of services to the population distributed through technology and software applications, Earth systems data requires concerted efforts, specialized technology, and, like other specialized fields, trained human resources. This collective of data, both obtained from Earth observation systems and other means, with geographic context, can be termed "Big Earth Data". It encompasses the size, heterogeneity, complexity, multi-dimensionality (source, scale, time), and non-stationary and unstructured nature of big data, but at the same time distinguishes itself from the conventional use of the term big data (Guo et al., 2020). Big Earth Data is necessarily a special class of big data related to the processes of the Earth system and utilizes the principles and methodologies of established literature on big data science.
This concept of Big Earth Data provides the capability to virtually analyze a large amount of scientific, social, and cultural information so that people can understand the Earth system, human activities, and their interconnections. This can enhance functions within organizations, businesses, and governments, making them more sustainable. However, the multidisciplinary and interdisciplinary integration of information needed for Big Earth Data requires multi-stakeholder cooperation and collaboration between people from different backgrounds, expertise, and organizations. In the context of sustainable development, the key challenges that must be addressed to facilitate a functional Big Earth Data system include, first, improved communication methods to organize priorities and policy goals for a diverse range of stakeholders, ensuring feasible, sustainable solutions and consensus. Second, data organization and management technologies must be more efficient to improve data transmission channels and optimize storage schemes. Third, data inconsistencies must be addressed through unified benchmarks and improved efficiency of data access and retrieval, which in turn will facilitate access to intelligent data processing and analysis methods and tools. Lastly, data visualization must be provided to facilitate insights into complex and integrated multi-source data to generate valuable, actionable information. Beyond these key issues, Big Earth Data faces new challenges in terms of data transmission, storage, processing, analysis, management, and sharing.
With efforts to make more data accessible and create cloud services to process, analyze, and convert it to information, the Big Earth Data approach has become a relevant and important utility in future sustainable development policies and practices. Big Earth Data will also provide a valuable foundation for studies on resource management, service management, disaster preparedness, risk reduction and mitigation, climate change impact, and other complex management, productivity, and developmental decision-making. Therefore, there is a need to create methods and models for objective and effective monitoring and evaluation based on compatible, quantifiable data, towards solutions to global climate change and other international efforts such as SDGs (Guo, 2020a).
At present, the United Nations as well as governments and international organizations all over the world are conducting research on constructing SDG indexes for monitoring and evaluation. However, data constraints are presenting difficulties in the implementation process. The SDGs, especially the goals related to Earth's surface, environment, and resources, are characterized by a large scale and periodic changes. With a multi-scale perspective, primarily at global, regional, national, and local scales, Big Earth Data can serve the SDGs to support their implementation (Figure 1). This is done in three broad aspects: 1) fill in missing data and develop technologies to generate data for SDG evaluation; 2) develop Big Earth Data methods and models to evaluate SDGs; and 3) provide decision support to SDGs by monitoring and identifying progress (Guo, Chen, Sun, Liu, & Liang, 2021).

Big Earth Data Science Engineering Program
CASEarth was officially approved by CAS on January 1, 2018, with an implementation period of five years. CASEarth looks at Big Earth Data as a United Nations Technology Facilitation Mechanism to encourage innovation in formulating viable solutions through data-driven science. The overall objective of CASEarth is to build a state-of-the-art Big Earth Data infrastructure, develop an innovative Big Earth Data platform, and construct a decision support system. To achieve these objectives, CASEarth has prioritized research on the integration of multi-disciplinary and multi-source data, big data management, and knowledge mining, which all support scientific discovery and decision-making. CASEarth will also oversee development and integration of key technologies such as big data and cloud services, as well as the Digital Earth Science Platform (Guo, 2017).
In the past two years, CASEarth has successfully developed the fundamental structure of the CASEarth Prototype System with an advanced Digital Earth Science Platform and Big Earth Data Cloud Service Platform. This foundation platform enables integrated data management, computation, analysis, and services. It also consists of a Big Earth Data demonstration platform including a modern visual display system that integrates a variety of interactive visualization equipment such as large LED screens, a contact-free radar display, 270-degree curved screen projection, large-scale spherical screen projection, interactive "magic mirror" display, and interactive augmented reality.
The Big Earth Data Cloud Service Platform is dedicated to Earth sciences, offering comprehensive computing and analysis capabilities, high-performance computing (HPC) services, scientific data publishing and sharing, application environment customization, and online data analysis and mining services for researchers. The cloud HPC platform has a peak performance of 1 petaflop, nearly 10,000 virtual machines for data processing, 50 PB of storage, and petabyte-level data management capabilities. With the CSTCloud Passport, users can apply for, obtain, and use services for supercomputing, cloud computing, and data storage, and can also customize the data processing environment as needed. The ability to directly and rapidly process and publish research data on the platform will support the development of data-centered research. The hardware systems and core software of the platform have been deployed and as of December 2020 there are more than 1,000 cloud hosts and almost 8 PB of research data. It has supported the construction of multiple applications, including the CMIP6 large-scale numerical simulation experiment, Big Data Library, ground monitoring data integration and sharing, EarthDataMiner mining and analysis cloud service, BioONE Integrated Big BioData Infrastructure, Catalogue of Life in China, MapBio biological map of China, Automated Fruit Fly Identification System, and Big Earth Data Integrated Analysis and Visualization Platform, among others. The big data resource system provides global users with systematic, diversified, dynamic, continuous, and standardized Big Earth Data with a global unique identifier. With the establishment of a data sharing system that integrates data, computing, and services, a new model of Earth science data sharing will be formed. The independently developed Big Earth Data system software stack provides a solution for spatiotemporal data management and analysis.
Complicated and inter-linked global challenges, as well as national developmental challenges and other administrative activities at various scales, require reliable decision support systems that depend on precise, timely scientific data. In particular, to understand and mitigate shared risks and challenges with interconnections beyond administrative boundaries, data sharing and knowledge transfer are critical. However, scientific data sharing is limited by inadequate sharing policies, scattered resources, and repeated layouts, leading to insufficient results both in China and around the world. With capabilities to analyze large quantities of data now available, the capacity for data sharing within disciplines like Earth sciences needs to be improved and impediments to massive data applications need to be removed. Through CASEarth, data references of major national projects are, for the first time in China, being published in accordance with the Chinese national standard "Information Technology Scientific Data Reference" (GB/T 35294-2017), which allows orderly association of data and entire life cycle management. Users can access all metadata gathered and released by CASEarth, and share the data according to sharing permissions. As of December 2020, more than 50,000 users from multiple fields such as Earth sciences, information sciences, and life sciences in 115 countries have accessed the system, and the total online visits have exceeded 29 million, with 400,000 data downloads.
By the end of 2020, the total amount of data shared on this platform was approximately 8 PB, including 2.8 PB of Earth observation data, 4.6 PB of biological and ecological data, 0.4 PB of atmosphere and ocean data, and 0.2 PB of basic geographical information and ground observation data. It hosts 490,000 items in the stratigraphic and paleontological databases, 3.6 million items in the Catalogue of Life in China, 420,000 items in the Microbes Database of China, and 1 billion pieces of Omics data. Users can now retrieve 40% of the data online through a variety of data discovery modes, such as project classification, keyword search, tag filtering, and relevant data recommendation. The data can be downloaded directly from the Internet or it can also be accessed through an application programming interface. Interactive online data services support online data viewing, preview, and search, and also provide personalized services for data surveys, collection, recommendation, download, and evaluation. With improving infrastructure, more data will be released online in the future, and about 3 PB of data will be updated each year.
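As a quick consistency check on the figures above, the four reported categories can be summed and expressed as shares of the total. This is only an illustrative sketch: the category keys and the helper `share_by_category` are hypothetical names introduced here, and the volumes are the rounded values quoted in the text.

```python
# Shared data volumes in petabytes (PB), as quoted in the text for the end of 2020.
# The dictionary keys are illustrative labels, not CASEarth identifiers.
shared_pb = {
    "earth_observation": 2.8,
    "biological_ecological": 4.6,
    "atmosphere_ocean": 0.4,
    "basic_geographic_ground_observation": 0.2,
}


def share_by_category(volumes):
    """Return each category's fraction of the total volume."""
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    return {name: value / total for name, value in volumes.items()}


total_pb = sum(shared_pb.values())
shares = share_by_category(shared_pb)

for name, fraction in shares.items():
    print(f"{name}: {shared_pb[name]} PB ({fraction:.1%} of {total_pb:.1f} PB)")
```

The categories add up to the approximately 8 PB stated in the text, with biological and ecological data making up a little over half of the total.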
The opening and sharing of CASEarth data will help solve a series of major worldwide scientific problems in the Earth sciences. Meanwhile, based on the Big Earth Data Cloud Service Platform, CASEarth also comprehensively integrates the databases, models, and decision-making methods of CAS in the fields of resources, environment, ecology, and biology. CASEarth is building a sustainable development indicator system and decision support platform, monitoring and evaluating the sustainability of resources, the environment, ecology, and other aspects. The program is incorporating Big Earth Data into the sustainable development evaluation system of the United Nations and China, and actively facilitating China's support for the 2030 Agenda for Sustainable Development.

Efforts towards utilizing Big Earth Data for SDGs
CASEarth is working to formalize the process of information-based decision support in China and around the world, and several of its research activities have been published. A series of reports has been compiled detailing over 100 case studies, 38 of which were submitted by China to the 74th and 75th United Nations General Assemblies in 2019 and 2020, respectively. The series, titled Big Earth Data in Support of the Sustainable Development Goals, is centered on developing data products, methodologies, and models, and provides decision-making support by monitoring and evaluating CASEarth's targeted set of six SDGs.
For SDG 2, case studies employed multisource data including data from remote sensing, information extraction models, statistical models, and ecological models. Results from studies on SDG 2.3.1 and SDG 2.4.1 showed that global crop production per labor unit has increased by 10% during the last decade, while another study estimating environmental impacts on food production in China demonstrated that China's cropping systems are becoming more sustainable (Zuo et al., 2018). The prevalence of stunting among children under five years of age in China declined significantly between 2002 and 2017, fulfilling the 5.9% target prescribed in SDG 2.2. Furthermore, another study found that China will be able to meet the forecasted consumption for 2030 through increased crop yield and also reduce nitrogen and phosphorus fertilizers for China's three major staple food crops, without affecting their production (Wang et al., 2020a).
To assess the effectiveness of policy, it is necessary to link its impacts with grounded realities. One of the case studies therefore adopted a Big Earth Data methodology for SDG 6.6.1 and reported a steady increase in China's mangrove area since 2000, which could be linked to the implementation of ecological protection policies such as "returning ponds to forests" and "returning ponds to wetlands". Another study using Big Earth Data registered a net growth of 22.11% in the area of mangrove forests in China, marking success in the restoration efforts (Jia, Wang, Zhang, Mao, & Wang, 2018). In two other studies, the area of Ramsar sites was also reported to have increased, and reduction in the invasive Spartina alterniflora was observed, suggesting it is under effective control (Mao et al., 2019). Another study reported that water transparency in Chinese lakes is good and improving (Liu et al., 2020a). These experiences in policy and practice can provide reference for the protection of water-related ecosystems in the Belt and Road region.
Several case studies on SDG 11 monitored public transport (SDG 11.2.1), urbanization (SDG 11.3.1), cultural and natural heritage (SDG 11.4.1), PM2.5 (SDG 11.6.2), and public spaces (SDG 11.7.1). An outcome of one of the projects is a global 10-meter spatial resolution impervious surface product developed by fusing optical and synthetic aperture radar data (Sun, Xu, Du, Wang, & Lu, 2019). This product has helped to resolve a data deficiency in long-term urban expansion time-series datasets for 1,500 cities with populations greater than 300,000 in the Belt and Road region, produced from 1990 to 2015 at five-year intervals (Li, Cai, & Du, 2021). Case studies related to SDG 11.3.1 and 11.7.1 used Big Earth Data techniques and found that the expansion of urban built-up areas began to slow down after 2015 while open public space as a proportion of built-up areas in 342 prefectural cities increased by 1.5% on average (Wang, Huang, Feng, Zhao, & Gu, 2020c; Wang et al., 2020b). Public access to transportation was observed to have increased in approximately 80% of cities to varying degrees.
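SDG 11.3.1 is defined as the ratio of land consumption rate to population growth rate. The paper does not state which formulation the case studies used, so the following is only a minimal sketch assuming the commonly cited log-ratio form, in which both rates are annualized natural-log growth rates. The function names and the input figures are hypothetical and chosen purely for illustration.

```python
import math


def annual_log_growth_rate(start, end, years):
    """Annualized growth rate ln(end / start) / years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must all be positive")
    return math.log(end / start) / years


def lcrpgr(built_start, built_end, pop_start, pop_end, years):
    """Ratio of land consumption rate to population growth rate.

    Assumes the log-ratio formulation for both rates; other formulations
    of the land consumption rate exist and would give different values.
    """
    land_rate = annual_log_growth_rate(built_start, built_end, years)
    pop_rate = annual_log_growth_rate(pop_start, pop_end, years)
    if pop_rate == 0:
        raise ZeroDivisionError("population did not change; ratio is undefined")
    return land_rate / pop_rate


# Hypothetical city: built-up area grows from 100 to 121 km^2 while the
# population grows from 1.0 to 1.1 million over a five-year interval.
ratio = lcrpgr(100.0, 121.0, 1_000_000, 1_100_000, 5)
print(f"LCRPGR = {ratio:.2f}")
```

A ratio above 1 means built-up land is expanding faster than the population, which is the pattern that urban-expansion time series such as the five-year-interval product described above are meant to reveal.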
Big Earth Data can also facilitate innovative methodologies. For example, for SDG 13, a method for detecting abnormal variations in global greenhouse emissions based on time-series data fitting was developed, and it produced continuous spatiotemporal data products for CO2 monitoring with the world's atmospheric satellites. Similarly, another method was developed to produce spatiotemporal monitoring products for glaciers and to forecast Arctic sea ice changes (Huang, Li, Zhou, & Zhang, 2021). Big Earth Data can also contribute to climate change research through time-series analysis for Antarctic ice sheet snowmelt and mass balance (Liang et al., 2021a, 2021b). These products provide strong information support to all countries to develop mitigation strategies and endeavor to reach the world's emission-cutting and temperature-control targets. Other studies on SDG 13 used Big Earth Data approaches to study the frequency and intensity of extreme high-temperature events and heat waves that were observed to have markedly increased since 1990, warranting closer attention. A study on climate change found a risk of yield reduction for wheat and maize resulting from a high-probability forward shift in the anthesis and maturation of China's main crops through the 2030s, requiring adjustments and improvements in such areas as crop management and breeding of improved varieties (Liu, Zhang, & Qin, 2020b).
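The paper does not describe how the time-series fitting for abnormal variations was carried out. The sketch below shows one simple way such detection could work in principle: fit a linear trend by ordinary least squares and flag observations whose residual exceeds a multiple of the residual standard deviation. The linear model, the threshold, the function names, and the synthetic series are all assumptions made for illustration and are not the method used in the cited work.

```python
import statistics


def fit_linear_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_anomalies(values, threshold=2.0, min_spread=1e-9):
    """Return indices whose residual from the fitted trend exceeds threshold * std."""
    slope, intercept = fit_linear_trend(values)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(values)]
    spread = statistics.pstdev(residuals)
    if spread < min_spread:
        # Series is (numerically) a perfect line, so nothing stands out.
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Synthetic concentration-like series: a steady upward trend with one injected spike.
series = [400.0 + 0.2 * i for i in range(20)]
series[12] += 5.0

anomalies = flag_anomalies(series)
print("anomalous time steps:", anomalies)
```

A real product would need a model that also captures seasonality and spatial structure; the point here is only the fit-then-threshold pattern that "time-series data fitting" suggests.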
Big Earth Data can also be used for ecosystem management, for example for marine pollution (SDG 14.1) and marine ecosystem health management (SDG 14.2). One study employed Big Earth Data to study the estuaries and bays along China's coast at multiple scales to monitor water quality and ecological effects using the eutrophication assessment model. The results suggest that for more than a decade from 2007 to 2019, the overall health of the Jiaozhou Bay ecosystem was stable with some upticks. Another study estimated the average abundance of floating debris in China's coastal waters in 2018 using Big Earth Data and reported a decrease of approximately 25% compared to the 2010-2014 average (Zhou et al., 2016). Similarly, Big Earth Data case studies on the ecosystems of Sishili Bay and Daya Bay helped to establish that, on the whole, both ecosystems were in good health. A study on the area of raft culture established that its area in China's seas was observed to be growing on the whole, but the area of raft culture within the boundaries of the coastal ecological conservation red line remained more or less stable.
Big Earth Data case studies on SDG 15.1.1 and 15.1.2 reported significant improvement in vegetation cover on China's Loess Plateau in the past decade. China's land degradation neutrality (LDN) showed a positive trend marked by a 60.30% growth in net area of restored land from 2015 to 2018, which represents approximately one fifth of the net area of land restored globally, making the country the number one contributor to global LDN. A case study also used Big Earth Data to develop an assessment methodology system for global land degradation and indicated that 32 countries had more land degradation area than land restoration area, meaning the realization of SDG 15.3.1 by 2030 is facing serious challenges. A separate study used Big Earth Data to evaluate the red list index, and found that the red list index of higher plants and terrestrial mammals in China was on the rise from 2004 to 2016, while the red list index of birds was on the decline. Other studies were also conducted on forest area as a proportion of total land area (SDG 15.1.1), proportion of important sites for terrestrial and freshwater biodiversity that are covered by protected areas (SDG 15.1.2), proportion of land that is degraded over total land area (SDG 15.3.1), mountain green cover index (SDG 15.4.2), and red list index (SDG 15.5.1) (Guo, 2020b).
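To make the red list index (SDG 15.5.1) concrete, the sketch below assumes the standard IUCN formulation, in which each assessed species contributes a category weight from 0 (Least Concern) to 5 (Extinct) and the index is one minus the summed weights divided by the maximum possible sum. The paper does not state how the cited study computed the index, and the species lists here are invented for illustration.

```python
# Category weights under the assumed standard formulation:
# Least Concern = 0 ... Critically Endangered = 4, Extinct / Extinct in the Wild = 5.
CATEGORY_WEIGHTS = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4, "EW": 5, "EX": 5}
MAX_WEIGHT = 5


def red_list_index(categories):
    """Red List Index in [0, 1]; 1 means all species are Least Concern, 0 means all extinct.

    Data Deficient ("DD") species are excluded from the calculation.
    """
    assessed = [c for c in categories if c != "DD"]
    if not assessed:
        raise ValueError("no assessed species")
    total_weight = sum(CATEGORY_WEIGHTS[c] for c in assessed)
    return 1 - total_weight / (MAX_WEIGHT * len(assessed))


# Hypothetical assessments of the same five species in two different years.
earlier = ["LC", "NT", "VU", "EN", "DD", "CR"]
later = ["LC", "LC", "VU", "VU", "DD", "CR"]

print(f"earlier RLI = {red_list_index(earlier):.2f}")
print(f"later   RLI = {red_list_index(later):.2f}")
```

A rising value between assessment years corresponds to the improving trend reported for higher plants and terrestrial mammals, and a falling value to the decline reported for birds.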
The outcomes of these reports are a consequence of China's efforts to promote scientific and technological innovation to achieve the SDGs. It can be seen that Big Earth Data, as a new approach to scientific discovery and an emerging frontier technology, can bring great value and potential for evaluating and monitoring SDGs worldwide.
Based on research on SDGs by CASEarth, Chinese President Xi Jinping, during his address at the 75th United Nations General Assembly General Debate, announced that China will set up an International Research Center of Big Data for Sustainable Development Goals. The center will make full use of Big Earth Data infrastructure for SDGs, providing data-driven decision support, data sharing, and scientific knowledge. The center has five key missions, including to develop SDG data infrastructure and information products, to provide new knowledge for SDG monitoring and evaluations, to develop and launch a series of SDG satellites, to establish a think tank for science, technology, and innovation to promote SDGs, and to provide capacity development for SDGs in developing countries.
The center is preparing for the launch of the first SDG satellite, "SDGSAT-1", which will provide necessary information about national urban growth, monitor the quality of coastal and offshore environments, and give insights into the status, patterns, and regional gaps in socioeconomic development at a very fine scale. The datasets from the satellite will also be made available to the world through the center. SDGSAT-1, to be launched into orbit in September 2021, will carry multiple sensors on board including a thermal infrared imager, the world's highest-resolution noctilucent low-light imager, and a multispectral imager.

Conclusion and the way forward
The year 2020 marks the start of the Decade of Action for achieving SDGs (United Nations, 2020). Challenges identified during the first five years include shortfalls in the implementation of the 2030 Agenda for Sustainable Development, data deficits, lack of research on the indicator system, and imbalanced development. The COVID-19 pandemic, since the beginning of 2020, has introduced new challenges to implementing the 2030 Agenda and has particularly highlighted the vulnerability of public health systems around the world. The economic challenges as a result of this pandemic have also slowed progress towards the implementation of the SDGs.
The spirit of collaboration witnessed during the pandemic is encouraging and should be replicated by the science, technology, and innovation community to explore innovative solutions to challenges towards sustainable development and accelerate the implementation of the 2030 Agenda (Silvestre & Ţîrcă, 2019). Gaps in data and methods are a major obstacle to monitoring and mapping progress in many countries, so removing them takes precedence in both utility and urgency. The absence of SDG-related data is also limiting the adoption of more sophisticated technologies in implementation projects (Kroll, Warchold, & Pradhan, 2019).
The science, technology, and innovation community should also collaborate to improve the SDG indicator system and its evaluation methodology. Several case studies presented internationally have demonstrated that Big Earth Data methods provide an important and compatible alternative to traditional statistical methods, one that can readily complement and enhance existing approaches to quantify and monitor progress and assess change. The potential of Big Earth Data in implementing SDGs globally is yet to be fully understood by policymakers, scientists, and practitioners in different disciplines. Big Earth Data approaches are therefore now an important project that demands the immediate attention of the global scientific and technological community and requires scientific and technical cooperation (Guo, 2020b).
The efforts by CASEarth to utilize Big Earth Data methods for implementing SDGs must continue to develop dynamic and objective macro-scale monitoring systems to provide an abundant, large-scale, frequent stream of information for policymakers to support the implementation of the SDGs. However, there are certain priorities that need to be addressed to ensure a systematic and holistic approach towards this transformation (Guo, 2020c).
(1) There is an urgent need to update data sharing services both nationally and internationally, which will require research on a range of technologies for, inter alia, real-time access to multi-source data. These services should enable open data sharing and on-demand aggregation of multisource data. In addition to open access to data, there is also a need to develop accessible Big Earth Data-enabled SDG assessments and measurements delivered as complete lines of data products to be shared with the United Nations system, including agencies and Member States, as a substantive solution to overcome the data and information deficit for science-enabled policy development towards the implementation of the SDGs (Yang et al., 2019).
(2) The three dimensions of the SDGs, economic, social, and environmental, are all quantified through techniques and methods that have largely developed independently and do not naturally integrate. This interoperability challenge is further magnified by increasing diversity of data sources for each dimension, as is inherent to the concept of Big Earth Data. Therefore, there is a need for extensive research, at both national and international levels, on unified standards, formats, and units or standard conversion algorithms, improving data interoperability between the three disciplines.
(3) There is a need to improve infrastructure and human resource capacity to ensure global digital interconnectivity between various entities essential for decision support and online solutions, services, and information (Guo, 2018). This is also essential to ensure no one is left behind. Proactive sharing of science, technology, and innovation deliverables through international cooperation mechanisms will enable many countries to incorporate information and communications technology and adopt scientific approaches to sustainable development to the greatest extent possible (Sarvajayakesavalu, 2015).
(4) To take advantage of physical infrastructures, there is also a need to develop human resources in every country. International and national training programs and development of online platforms providing access to high-performance computing, big data analysis, and artificial intelligence will facilitate rapid development of trained human resources for national SDG research and Big Earth Data-enabled monitoring and evaluation of SDG progress.
(5) The environmental dimension of the SDGs needs special attention, as the environmental implications of the decisions towards development manifest long into the future and have lasting impacts (UN Environment, 2019). Given the challenges in resource management and environmental constraints on our planet, it is urgent to effectively monitor, observe, and understand the changing climate, environmental degradation, and use patterns of environmental resources in the Earth system. The socio-environmental interactions and interplay between their relationships must be investigated and measured to ensure lasting sustainable development (Griggs et al., 2013).
Fang Chen is a Professor at the Aerospace Information Research Institute, Chinese Academy of Sciences. He conducts interdisciplinary work combining remote sensing, ecology, and other fields of study to assess spatial patterns of disaster risk. His current work also focuses on adapting Big Earth Data technologies to meet the United Nations Sustainable Development Goals (SDGs) assessment needs (mainly for SDG 11 and SDG 13).