A new big data approach based on geoecological information-modeling system

Abstract In this paper, the geoecological information-modeling system (GIMS) is described as a possible improvement of the Big Data approach. The main GIMS function is the use of algorithms and models that capture the fundamental processes controlling the evolution of the climate–nature–society system (CNSS). The GIMS structure includes 24 blocks that realize a series of models and algorithms for global big data processing and analysis. The CNSS global model is the basic block of the GIMS. The operational tools of GIMS are demonstrated by examining several scenarios associated with the reconstruction of forest areas. It is shown that significant impacts on forests can lead to large-scale global climate variations.


Introduction
The development of present-day civilization raises the problems of assessing and forecasting expected climate changes and the related variations in the habitat of humans and animals. In the first place, the onset and expansion of dangerous natural processes that lead to loss of life and economic damage is one of the main problems of environmental monitoring data processing. The complexity of this problem is related to the heterogeneity and diversity of the information available from various sources such as Earth monitoring systems and existing databases (Sudmanns, Tiede, Lang, & Baraldi, 2017). Indeed, human civilization must solve the more important problem of sustainable development of nature and society. What is needed is a reliable and efficient information technology that covers the global and regional relationships within nature and society while considering possible constraints and multi-fold structures. According to Varotsos (2007, 2008) the fundamental problem lies in the conception of globalization and its understanding.
This paper provides a description of the approach and methodology to be used to solve the problem of sustainable development of the climate-nature-society system (CNSS) taking into account both natural and demographic processes.
Among the existing tools for environmental data visualization, the geographic information system (GIS) is the approach most in demand for processing and representing environmental monitoring data. The basic shortcoming of GIS is that it does not focus on multi-pronged prognosis of the monitored objects. An important improvement of GIS technology was made by Kondratyev, Krapivin, Savinykh, and Varotsos (2004), who presented the geoecological information-modeling system (GIMS) as a combination of GIS and modeling technology. Key aspects of GIMS have been discussed in many publications (Cracknell, Krapivin, & Varotsos, 2009a, 2009b; Krapivin & Shutko, 2012; Krapivin, Shutko, Chukhlantsev, Golovachev, & Phillips, 2006; Krapivin, Varotsos, & Soldatov, 2015).

Big Data Approach and global sustainable development problems
With the development of society, the CNSS sustainable development problem becomes increasingly critical and covers practically the entire globe. Even satellite environmental monitoring does not provide data that allow the NSS characteristics, and in particular the prognosis of CNSS evolution, to be assessed with high reliability. Big Data tools help solve limited economic problems but encounter difficulties when environmental considerations and nature protection problems are taken into account. Therefore, extending the Big Data tools with functions for processing and analyzing very large volumes of environmental information, delivered from different sources irregularly in time and fragmentarily in space, is a real problem. This problem is mainly addressed with global models oriented toward restricted aspects of the CNSS, including climate and biospheric models that in practice consume historical structured data of restricted volume (Krapivin, 2009; Krapivin & Kelley, 2009). Solving the sustainable development problem of the global CNSS requires the collection of data at an unprecedented scale. Decisions based on existing global models of different environments, including the biosphere, geosphere, and atmosphere, cannot answer the main question: what is the optimal structure of global monitoring data that reflects different aspects of individual CNSS items and helps to overcome the irremovable uncertainties in many areas (Held & McGrew, 2007a, 2007b)? As demonstrated by Varotsos (2007, 2008), the development of biogeochemical, biocenotic, hydrophysical, climatic, and socioeconomic processes taking place in the NSS inevitably requires a balanced criterion for information selection that takes into account the hierarchy of causative effects in the CNSS and is coordinated with the spatial digitization.
Existing environmental monitoring systems such as the Earth Observing System (EOS) and the Global Ocean Observing System (GOOS) provide long-term global observations of the land surface, biosphere, atmosphere, solid Earth, and oceans and enable an improved understanding of the Earth as an integrated system. These systems have allowed for the synthesis of the Earth Observing System Data and Information System (EOSDIS), which provides capabilities for managing data of different types from various sources, including satellites, aircraft, and field measurements. The EOSDIS database is growing, with an ingestion of approximately 8.5 terabytes daily. The EOSDIS archive volume has grown from about 0.2 PB in 2000 to 14 PB in 2015. These data flows can be complemented with global and regional socioeconomic information from the Socioeconomic Data and Applications Center (SEDAC), the catalog of which summarizes data available in 52 countries. The combined processing of EOSDIS and SEDAC data can play a key role in solving the NSS sustainable development problem using the GIMS approach (Krapivin & Shutko, 2012).
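To give a feeling for the data volumes quoted above, the following short sketch derives the growth rate implied by the two archive figures and the yearly volume implied by the daily ingestion figure. It assumes a constant compound growth rate between 2000 and 2015 and 1 PB = 1000 TB; both are simplifying assumptions made here, not statements from the EOSDIS documentation. The result is a growth of roughly one third per year.

```python
import math

# Figures quoted in the text above.
ARCHIVE_2000_PB = 0.2    # archive volume in 2000, petabytes
ARCHIVE_2015_PB = 14.0   # archive volume in 2015, petabytes
DAILY_INGEST_TB = 8.5    # approximate daily ingestion, terabytes


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` after `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(rate: float) -> float:
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)


growth = compound_annual_growth(ARCHIVE_2000_PB, ARCHIVE_2015_PB, 2015 - 2000)
yearly_ingest_pb = DAILY_INGEST_TB * 365 / 1000.0  # assumes 1 PB = 1000 TB

print(f"implied growth rate: {growth:.1%} per year")
print(f"implied doubling time: {doubling_time(growth):.1f} years")
print(f"ingest at 8.5 TB/day: {yearly_ingest_pb:.2f} PB per year")
```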

The GIMS as Big Data Approach improvement
The generalized concept of GIMS is shown in Figures 1 and 2. The key item of the GIMS is a global NSS model (Krapivin & Varotsos, 2007). The basic GIMS principles are:
• Integration, unification, and coordination of big data fluxes delivered by the existing monitoring resources on a unified organizational and scientific-methodological basis;
• Coordination and compatibility of big data fluxes using a unified coordinate-time system and a common system for classification, coding, format, and data structure; and
• Independence of big data fluxes from ecosystem and state boundaries.
Construction of the GIMS is connected with consideration of the components of the biosphere, climate, and social medium characterized by the given level of spatial hierarchy. Realization of the GIMS function is provided by its subsystems listed in Table 1. The basic block of the GIMS is the CNSS global simulation model (CNSSGSM), the structure of which is shown in Figure 3. The CNSSGSM assimilates big data fluxes, considering the complex of biospheric, climatic, and socioeconomic processes and taking into account their space-time hierarchy. According to the procedure represented in Figure 2, the CNSSGSM structure is oriented toward an adaptive functioning mode that introduces the global model into a system of geoinformation monitoring (Krapivin & Kelley, 2009; Krapivin & Shutko, 2012). The approach to the CNSSGSM synthesis is based on two mathematical methods:
• Balance equations are used when knowledge of the fluxes of matter and information between CNSS components is exhaustive.
• An evolutionary algorithm is applied when building an adequate balance model is, in principle, impossible because information is incomplete and knowledge of environmental and socioeconomic laws is insufficient.
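The balance-equation method mentioned above can be illustrated with a minimal sketch. It is not the CNSSGSM: the three reservoirs, the linear donor-controlled fluxes, and all numerical values are hypothetical and chosen only to show the form dX_i/dt = (inflows) - (outflows) and the conservation of total mass that such a closed scheme guarantees.

```python
from typing import Dict, Tuple

Reservoirs = Dict[str, float]
# (donor, receiver) -> rate constant in 1/year; all values are hypothetical.
Fluxes = Dict[Tuple[str, str], float]


def step(state: Reservoirs, rates: Fluxes, dt: float) -> Reservoirs:
    """One explicit Euler step of dX_i/dt = sum(inflows) - sum(outflows)."""
    change = {name: 0.0 for name in state}
    for (donor, receiver), k in rates.items():
        flux = k * state[donor] * dt  # donor-controlled linear flux
        change[donor] -= flux
        change[receiver] += flux
    return {name: state[name] + change[name] for name in state}


def simulate(state: Reservoirs, rates: Fluxes, years: float, dt: float = 0.1) -> Reservoirs:
    """Integrate the balance equations over `years` with time step `dt`."""
    for _ in range(int(round(years / dt))):
        state = step(state, rates, dt)
    return state


# Hypothetical reservoir contents in arbitrary mass units.
initial = {"atmosphere": 100.0, "land_biota": 80.0, "ocean": 500.0}
rates = {
    ("atmosphere", "land_biota"): 0.10,
    ("land_biota", "atmosphere"): 0.12,
    ("atmosphere", "ocean"): 0.08,
    ("ocean", "atmosphere"): 0.015,
}

final = simulate(initial, rates, years=100)
print(final)
print("total mass before and after:", sum(initial.values()), sum(final.values()))
```

Because every flux is subtracted from one reservoir and added to another, the total is preserved up to rounding, which is a convenient sanity check for any balance model of this kind.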
The GIMS-based method makes it possible to create a global monitoring system in which the CNSSGSM allows the entire system to be categorized as a class of subsystems with variable structures and makes the system adaptable to changes in natural and socioeconomic processes. Figure 4 demonstrates the general structure of GIMS as the aggregate of approaches, instruments, and methods for processing structured and unstructured data characterized by big volumes and significant variety. GIMS allows the combined use of different approaches to the processing of big data fluxes, which primarily support a new decision-making process that optimizes these fluxes through effective monitoring alternatives and data processing tools (Krapivin, Varotsos, & Soldatov, 2017a, 2017b).

| The GIMS function | The description of the function |
| --- | --- |
| Planning and analysis of big data clouds | The analysis of the structure of the environmental data acquisition system using satellite data, flying laboratories, and mobile and stationary ground observation means as well as providing socioeconomic information (Krapivin & Shutko, 2012) |
| Synchronous analysis of big data fluxes | Using space-time interpolation and extrapolation methods, retrieval of data and their reduction to the common time scale is performed. Global model parameters are determined. The thematic classification of big data is carried out and space-time combination is performed on measurements obtained from various types of sources (Kondratyev et al., 2002) |
| Evaluation of the state of the atmosphere | The gas and aerosol composition of the near-earth atmospheric layer are provided and forecasting maps of their distribution are created (Kondratyev, Ivlev, Krapivin, & Varotsos, 2006) |
| Evaluation of the state of the soil-plant covers | Determining the structural topology of land cover, revealing soil-plant formations in accordance with spatial resolution (Kondratyev et al., 2002; Krapivin & Shutko, 2012) |
| Evaluation of the state of the water medium | Simulation model is used for the hydrological processes taking into account seasonal changes of surface and river runoff, the influence of snow cover and permafrost, and the regime of precipitation and evaporation (Krapivin & Varotsos, 2007; Krapivin et al., 2015) |
| Modeling of the global biogeochemical cycles | Mathematical models of global cycles of greenhouse gases are used taking into account the roles of soil-plant formations, World Ocean, and geosphere as well as anthropogenic processes (Kondratyev, Krapivin, & Phillips, 2003; Krapivin & Varotsos, 2016; Varotsos, Krapivin, & Soldatov, 2014) |
| Modeling of the photosynthesis | In the oceans and vegetation layers, the photosynthetic processes are described by proper mathematical models (Krapivin & Kelley, 2009; Sellers et al., 1986) |
| Modeling of demographic processes | Population dynamics is described by two models with consideration of the role of environmental and social factors. Models differ in mathematical approaches (Kondratyev et al., 2002; Krapivin & Varotsos, 2007) |
| Climate change modeling | Climate change processes are described by simple functional models reflecting the roles of greenhouse gases and pollution of the atmosphere (Krapivin & Varotsos, 2008; Mintzer, 1987) |
| Identification of causes of ecological and sanitary disturbances in the environment | Investigation and identification of dangerous environmental processes is realized, including the detection and prognosis of tropical cyclones, floods, and excessive atmosphere pollution (Krapivin & Shutko, 2012; Nitu et al., 2013) |
| Intelligent support | Software-mathematical algorithms are developed to provide the user with intelligent support in performing complex analysis of simulation experiment results (Nitu, Krapivin, & Pruteanu, 2004) |
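The reduction of irregularly sampled data to a common time scale, listed above among the GIMS functions, can be sketched as follows. This is only an illustration using plain linear interpolation; the source names, sample values, and the helper functions `interpolate` and `to_common_grid` are hypothetical, and the interpolation and extrapolation methods actually used in GIMS are not specified in this section.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time, value), sorted by strictly increasing time


def interpolate(samples: Sequence[Sample], t: float) -> Optional[float]:
    """Linear interpolation at time t; None outside the observed interval."""
    if not samples:
        return None
    times = [s[0] for s in samples]
    if t < times[0] or t > times[-1]:
        return None  # no extrapolation in this sketch
    i = bisect_right(times, t)
    if i == len(times):  # t equals the last sample time
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def to_common_grid(sources: Dict[str, Sequence[Sample]],
                   grid: Sequence[float]) -> List[dict]:
    """Resample every source onto the same time grid, one row per grid time."""
    rows = []
    for t in grid:
        row = {"t": t}
        for name, samples in sources.items():
            row[name] = interpolate(samples, t)
        rows.append(row)
    return rows


# Hypothetical observations arriving at different, irregular times.
sources = {
    "satellite": [(0.0, 10.0), (3.0, 16.0), (7.0, 12.0)],
    "ground": [(1.0, 9.0), (2.0, 11.0), (6.0, 15.0)],
}
table = to_common_grid(sources, grid=[0.0, 2.0, 4.0, 6.0])
for row in table:
    print(row)
```

Grid times that fall outside a source's observation interval are returned as `None` rather than guessed, so that gaps in coverage stay visible to later processing steps.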

Examples of global big data processing
The interaction of society and nature has reached a global, planetary scale at which anthropogenic impacts on natural subsystems and processes produce dangerous changes in the habitat of both animals and people. There is only a single approach to the search for CNSS sustainable development: evaluating the consequences of anthropogenic scenarios by means of simulation experiments. The GIMS approach can help in these experiments by allowing the environmental consequences of anthropogenic scenarios to be estimated at both global and regional scales. These impacts cover practically all environments and include pollution of the atmosphere and hydrosphere through the release of toxic compounds, the destruction of habitats through agriculture and urban sprawl, agricultural expansion into forested areas, etc. In other words, human society, aiming at living comfort, depletes resources, destroys the plant kingdom and animal world, and pollutes the environment. To demonstrate the GIMS functions, several scenarios of forest area reconstruction are considered. In the main, the GIMS can realize different scenarios that are consistent with our understanding of possible effects on the environment. The land cover is characterized by the heterogeneity of biomes and other environmental objects. At present the main areas of the land are woodland (41.2%) and cropland (24.7%). Land use is a basic problem whose solution involves the management and modification of the natural environment, which mainly relates to the reduction of forest areas (Kargel, Leonard, Bishop, Kaab, & Raup, 2014; Lambin & Geist, 2006; Ramachandran & Garrity, 2012).
Source: Kondratyev, Krapivin, and Phillips (2002) and Krapivin et al. (2015)
Unfortunately, the forest areas are subject to the realistic implementation of anthropogenic scenarios associated with the withdrawal of their biomass.
It is known that boreal and tropical forests undergo the greatest anthropogenic impacts. Wildfire, whether caused by lightning strikes or by human actions, is the primary determinant of the state of these forests. The GIMS allows for the realization of different scenarios of influence on these forests with a spatial resolution of 1 × 1 (Hengeveld et al., 2015).
Simulation experiments show that total burning of all coniferous forests up to 42°N leads to an increase of atmospheric carbon by 21.7% and an increase of global temperature by 4 °C. During the next year, the oceans absorb 10% of the emitted carbon, and after the next century only 1.3% of the added atmospheric carbon remains in the atmosphere. Over 100 years the atmosphere-ocean system reaches carbon balance, and the carbon content of the deep ocean increases by 9% (Krapivin et al., 2017a). The impact on the coniferous forests is reflected in the dynamics of the humus layer, which loses 4% of its reserves over 30 years. The burned coniferous forests are restored by 68% within 100 years. Other forests absorb 1/30 of the emitted carbon.
Tables 2-4 show the estimates of variations of carbon reserves in the basic biospheric reservoirs when forests of different climatic zones are partially burned. It is seen that large-scale impacts on the land biota are damped within 60-100 years. In this respect, the biosphere is more stable to impacts on tropical forests than on boreal forests. Simulation results show that the forests of the Northern Hemisphere (42°N and higher) play a significant stabilizing role in the global carbon cycle. Within these scenarios it is assumed that during post-fire restoration the forest areas are covered by the same plants. In reality, post-fire restoration leads to a new forest structure and a different species diversity, which result from many environmental processes including global climate change and natural forest succession. Table 5 shows the change of the role of vegetation in atmospheric carbon absorption under the reconstruction of soil-plant formations. Anthropogenic change of vegetation cover can significantly change the balance components of the global carbon cycle. It is clear that such hypothetical vegetation cover transformations need to take into account climatic zones and biological compatibility. The GIMS partly helps to realize such simulation experiments.
The GIMS gives the opportunity to evaluate the mosaic picture of the carbon dioxide sinks in the vegetation biomes in its dynamics. Knowledge of this mosaic makes it possible to assess the role of specific biomes in the regional carbon balance and, on this basis, to estimate the possible consequences of anthropogenic interference in these biomes. The GIMS also makes it possible to estimate the atmospheric CO2 sequestration by vegetation sites as they evolve on different territories. It is assumed that CO2 emissions in 2015 amount to 36.1 GtCO2 globally and 1.7 GtCO2 from the Russian territory, with a subsequent decrease by 10% to 2150. It is also assumed that no deforestation takes place on Russian territory. In the context of this scenario, rates of CO2 assimilation by plants on the territory of Russia will increase from 206.  types at the expense of the tundra-taiga boundary succession processes. Such effects are reduced for the mid-taiga forests and dry steppes up to 205 and 114%, respectively. Transformations of natural land cover are real processes carried out by humans to improve their living habitat and increase food production. These actions lead to the change of many evolutionary processes, including changes in biogeochemical cycles, which directly lead to climate change. Table 5 demonstrates some modeling results when various biomes are transformed. Such hypothetical experiments lead to an understanding of the limits of natural stability and of the possible ranges of anthropogenic interventions in natural environments.
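The emission scenario described above can be written down as a small function. The text gives only the 2015 levels and the 10% decrease to 2150; the linear shape of the decline between those years is an assumption made here for illustration, and the function name `emissions` is hypothetical.

```python
BASE_YEAR = 2015
END_YEAR = 2150
GLOBAL_2015 = 36.1      # GtCO2 in 2015, from the text
RUSSIA_2015 = 1.7       # GtCO2 in 2015, from the text
TOTAL_DECREASE = 0.10   # 10% lower by END_YEAR, from the text


def emissions(year: int, base: float) -> float:
    """Emissions in `year`, ASSUMING a linear decline from the 2015 level."""
    if year <= BASE_YEAR:
        return base
    if year >= END_YEAR:
        return base * (1.0 - TOTAL_DECREASE)
    progress = (year - BASE_YEAR) / (END_YEAR - BASE_YEAR)
    return base * (1.0 - TOTAL_DECREASE * progress)


# Share of the Russian territory in the 2015 global total.
russia_share = RUSSIA_2015 / GLOBAL_2015

for year in (2015, 2050, 2100, 2150):
    print(year,
          round(emissions(year, GLOBAL_2015), 2),
          round(emissions(year, RUSSIA_2015), 3))
print(f"Russian share of 2015 emissions: {russia_share:.1%}")
```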
Simulation experiments show a significant dependence of global climate on the overall health of global forests. For example, a reduction of global forested areas by 19% by 2050 leads to a CO2 concentration increase of 53% by the end of the twenty-first century and, on the contrary, an increase of forested areas by 10% during the same time period gives a CO2 concentration decrease of 12%. Figure 5 represents the climate change under different scenarios of impacts on forests. We see that forests and climate are intrinsically linked through the sequestration of atmospheric carbon as well as through direct and indirect impacts on the global hydrological cycle (Roberts, 2009).
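The two scenarios quoted above can be compared by normalizing the CO2 response to the size of the forest area change. The sketch below uses only the four percentages from the text; the ratio itself is an illustrative measure introduced here, not a quantity reported by the model.

```python
# Percent changes quoted in the text: forest area by 2050, CO2 by 2100.
scenarios = {
    "forest loss": {"area_change_pct": -19.0, "co2_change_pct": 53.0},
    "forest gain": {"area_change_pct": 10.0, "co2_change_pct": -12.0},
}


def response_per_percent(area_change_pct: float, co2_change_pct: float) -> float:
    """Magnitude of CO2 change (in %) per 1% change of forest area."""
    return abs(co2_change_pct) / abs(area_change_pct)


ratios = {name: response_per_percent(**s) for name, s in scenarios.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}% CO2 change per 1% of forest area")
```

In these two scenarios the response per percent of lost forest is more than twice the response per percent of gained forest, which indicates that the simulated dependence is not symmetric.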
The GIMS reflects the interactions of natural and anthropogenic factors that play a significant role in the formation of the greenhouse effect depending on energy use and economic development. Figure 6 compares modeling results for the future global temperature change under the IPCC scenarios. It is evident that the GIMS forecasts smaller deviations in average global temperature than the atmosphere-ocean general circulation model (AOGCM) of the Hadley Centre. For example, the pessimistic A1FI scenario assumes very rapid economic growth and intensive use of fossil resources, giving a global temperature rise by 2100 of 4 and 2.6 °C according to AOGCM and GIMS, respectively. By 2200 these changes are 5.5 and 4 °C, respectively. The conclusion drawn here from Figure 6 is that GIMS gives more precise results because it takes a broader set of components into consideration.
Table 5. The dynamics of the ratio of the integral rates of CO2 assimilation by vegetation covers from the atmosphere in the context of the scenario when a natural biome is globally replaced by another biome.
Figure 5. The dynamics of the CO2 concentration for different scenarios of changing forest areas: 1 by 2050 the area of the forests is increased by 5% and remains without changes; 2 by 2050 the area of the forests is reduced by 5%; 3 by 10%; 4 by 20%; 5 by 30%; 6 by 2050 the forests are eliminated entirely; 7 by 2050 the area of the forests is increased by 50% without change in the future.

Conclusions
Co-evolution of climate, biosphere, geosphere, hydrosphere, and human society depends on how the Earth's system generates and maintains thermodynamic imbalance. Understanding and evaluating processes in the climate-nature-society system requires big data processing algorithms, because the data grow exponentially and traditional data processing tools eventually become obsolete. Most of the existing climate models and global biospheric models do not provide an overall analysis of the processes in the Earth system. The GIMS, as seen in Table 6 and Figure 1, can play the role of a Big Data information-modeling system that can simultaneously analyze heterogeneous data delivered by different monitoring systems with incongruous scales and irremovable uncertainties. Tables 2-5 demonstrate such functions of the GIMS as a new Big Data Approach. The simulation results indicate significant impacts of the saturation of the greenhouse effect due to CO2 growth, which agrees with physical laws and is supported by many modeling results (Miskolczi, 2007). Simulation experiments analogous to those of Table 5 can help to search for an optimal forest management strategy under which the climate change forecast remains acceptable for a long time. Certainly, this paper cannot solve this task. It is necessary to carry out a series of simulation experiments with consideration of reasonable scenarios (Krapivin et al., 2017b).
| Block | Description |
| --- | --- |
| | (Krapivin, 1993; Nitu et al., 2004) |
| PMTM | Photosynthesis model for the tropical and moderate oceanic zones (Krapivin, 1996) |
| PMAA | Photosynthesis model for the arctic and antarctic zones of the World Ocean (Kondratyev, Krapivin, & Phillips, 2003; Krapivin et al., 2017a) |
| ABPM | Arctic Basin pollution model (Kondratyev, Krapivin, & Varotsos, 2003) |
| MATP | Model of long-range atmospheric transport of the pollutants (Kondratyev et al., 2002) |
| FPM | Food production model (Dao et al., 2015; Nitu, Dumitrasku, Krapivin, & Mkrtchyan, 2015) |
| AIFI | Evolutionary algorithm for the indicator calculation of the food industry (Nitu et al., 2004) |
| UEM | An upwelling ecosystem model (Krapivin & Varotsos, 2016) |
| | (Kondratyev et al., 2006) |
| BDP | The big data processing with the use of sequential and cluster analyses (Soldatov, 2010, 2015) |
| GSA | The GIMS structure adaptation to the simulation experiment conditions |
| DFM | Big data cloud formation and management |
| SS | Synthesis of the scenarios for the interaction of population with the environment |
| SEMC | Simulation experiment management and control |

Moreover, the GIMS possesses a data fusion function for data that are delivered from dissimilar sources irregularly in time and fragmentarily in space. This function makes it possible to answer the following questions, which inevitably arise in environmental monitoring management (Nitu, Krapivin, & Soldatov, 2013; Krapivin & Shutko, 2012):
• What tools, remote sensing platforms, and instruments should be used to form the GIMS database?
• What is the cost of information for the GIMS simulation experiment?
• What balance should there be between different data sources?
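The question of the balance between data sources can be illustrated with a standard technique, inverse-variance weighting, in which each source contributes in proportion to its reliability. This is a generic textbook method substituted here for illustration; the section does not state which fusion algorithm GIMS uses, and the measurement values and the `fuse` helper are hypothetical.

```python
from typing import Iterable, Tuple


def fuse(measurements: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (value, variance) pairs by inverse-variance weighting.

    Returns the fused value and its variance, assuming independent,
    unbiased sources with known error variances.
    """
    weighted = [(1.0 / variance, value) for value, variance in measurements]
    total_weight = sum(w for w, _ in weighted)
    fused_value = sum(w * v for w, v in weighted) / total_weight
    return fused_value, 1.0 / total_weight


# Hypothetical estimates of the same quantity from three sources:
# (value, error variance), e.g. satellite, aircraft, ground station.
observations = [(14.2, 0.25), (15.0, 1.0), (13.8, 0.5)]

fused_value, fused_variance = fuse(observations)
print(f"fused value: {fused_value:.3f}, variance: {fused_variance:.3f}")
```

Under these assumptions the fused variance is never larger than that of the most accurate single source, which gives one quantitative way to reason about how much an additional data source is worth.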
In this paper, we have shown a way of addressing these and other questions, which are predominantly linked to CNSS sustainable development, using simple biosphere models and complex simulation models that require big data processing (Bartsev, Degermendzhi, & Erokhin, 2008; Degermendzhi, Bartsev, Gubanov, Erokhin, & Shevirnogov, 2009; Sellers, Mintz, Sud, & Dalcher, 1986). In any case, the search for alternative pathways toward global CNSS sustainable development is realized through simulation experiments in the context of scenarios proposed by experts. The GIMS extends the range of scenarios and optimizes the big data fluxes. This paper describes the main structure of GIMS and gives examples of its use to demonstrate the functional efficiency of big data analysis and processing. The scenarios studied here show the presence of alternatives in environmental anthropogenic strategies.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This study was partly supported by the Russian Fund for Basic Researches [Project No. 16-01-000213-a].