The digital Earth Observation Librarian: a data mining approach for large satellite images archives

Throughout the years, various Earth Observation (EO) satellites have generated huge amounts of data. The extraction of latent information in the data repositories is not a trivial task. New methodologies and tools, capable of handling the size, complexity and variety of data, are required. Data scientists require support for the data manipulation, labeling and information extraction processes. This paper presents our Earth Observation Image Librarian (EOLib), a modular software framework which offers innovative image data mining capabilities for TerraSAR-X and EO image data in general. The main goal of EOLib is to reduce the time needed to bring information to end-users from Payload Ground Segments (PGS). EOLib is composed of several modules which offer functionalities such as data ingestion, feature extraction from SAR (Synthetic Aperture Radar) data, meta-data extraction, semantic definition of the image content through machine learning and data mining methods, advanced querying of the image archives based on content, meta-data and semantic categories, as well as 3-D visualization of the processed images. EOLib is operated by DLR's (German Aerospace Center's) Multi-Mission Payload Ground Segment of its Remote Sensing Data Center at Oberpfaffenhofen, Germany.


Introduction
Earth observation data gathered over the past decades fit well into the big data 3Vs (Volume, Velocity, and Variety) paradigm (Laney, 2001). To bring end-users closer to the knowledge hidden within these huge volumes of data, new tools are required that are capable of finding and retrieving the sought-for information. A multitude of domains, such as agriculture, ecology, forestry, urban planning, security, and many others, can benefit from the hidden potential of EO data. A general framework that could serve all of them has to overcome the differences arising from the variety of sensor data acquired throughout the years. Currently, EO image data are stored in data warehouses and, depending on the source mission or sensor, they can differ in spatial resolution, number of spectral bands, and acquisition method (active or passive).
Manual annotation of the gathered EO data is not feasible because of the overwhelming volumes of data being involved. Computers can ease this process, but several issues must be tackled. The difference between human and machine knowledge, and the way each of them understands data is a potential impediment. Humans rely on visual perception (color, texture, and shape), while computers require appropriate numerical representations in order to find logical correlations in the data. Thus, the question of how to represent the data arises. Another impediment is the overall volume of the data. Even if all information is correctly extracted, finding it in the raw data and within a reasonable amount of time is quite a challenge. Thus, defining an efficient method for storing and indexing imaging data with supplementary information may be the most challenging task as any proposed solution must be highly scalable, since ongoing EO missions provide increasing amounts of data.
We introduce a new paradigm for Earth observation: Data Knowledge Discovery. This paradigm defines the entire chain "data-information-knowledge-value" and relies on its meaningful EO content extraction, i.e. the semantic and knowledge aspects.
We develop user-invariant and domain-compensatory EO methods for the individual users and for subjective domain biases. The derived models generate a shareable knowledge body as a means to enable the communication of fragmented knowledge learned from meta-data, image data, and other data in synergy with the domain expertise of EO users.
This paper describes the full functionality of EOLib, an Earth Observation Image Librarian: a software framework which tries to overcome the 3Vs of big EO data by working directly with the PGS data (Wolfmuller et al., 2009), in an effort to provide information each time data are produced. In Section 2 we present several EO data mining systems which offer solutions for some of the impediments encountered when tackling large volumes of EO data. The main data source for EOLib, the TerraSAR-X test dataset, is described in Section 3, together with all the associated pre-processing functions. Section 4 describes the overall architecture of the system and each of the individual, specialized modules. The algorithms used in some of the key components of the system are presented in Section 5. A step-by-step presentation of the system functionality is given in Section 6, while Section 7 is reserved for conclusions and potential future improvements.

State of the art
Throughout the years, several Image Information Mining (IIM) systems and concepts have been developed. These tried to solve some of the Earth Observation data mining issues through different methods. A solution for data classification and fast queries of compound structures from very high-resolution optical EO data is proposed in (Gueguen, 2015). Here, a bag of features with spectral and shape representation of multi-scale segmented images (tiles) is employed. Each tile is represented by the distribution of visual words it fully contains; MinTree (Salembier, Oliveras, & Garrido, 1998) is used to produce the necessary multi-scale segmentations. The bag of features employs nine spectral descriptors and four shape descriptors. For test purposes, Gueguen (2015) used the UC Merced 21-class dataset (Yang & Newsam, 2010) and two WorldView-2 (8-band) images for feature definition.
Queries can be performed for each tile scale and start with a training phase (one-vs.-rest algorithm). A selected region can be used to narrow down the searched range of tile scales. Another solution is GeoIRIS (Shyu et al., 2007), which includes automatic tile-based and object-based feature extraction. GeoIRIS comprises six main modules: FE (Feature Extraction), IS (Indexing Structures), SF (Semantic Framework), GS (GeoName Server), FR (Fusion and Ranking), and RV (Retrieval and Visualization). The system uses spectral feature extraction algorithms, such as histograms with eight bins (for panchromatic grayscale, RGB, and NIR data), texture extraction algorithms such as Haralick's method (Haralick & Kelly, 1969) computed for panchromatic grayscale, RGB, and NIR data, linear features used to detect man-made elements, and DMP object extraction (Pesaresi & Benediktsson, 2001). For feature definition, the authors used QuickBird data, which combine 0.6 to 1 m panchromatic and 2.4 to 4 m multispectral satellite imagery, resulting in 0.6 to 1 m pansharpened multispectral (PSMS) images. These images are then divided into 256 × 256 m tiles (approx. 256 × 256 pixels). The tiles overlap by 25% in each of the four corners, for a total of 70,824 tiles, which contain 531,208 objects. The feature extraction module extracts the features from each individual tile and from the relations between objects found within a maximum radius. The retrieval of the top 100 ranked objects from the database typically returns results within 200 to 450 ms.
The supported query methods include: query by example (CBIR) (Blanchart, Ferecatu, & Datcu, 2011b), hybrid querying (CBIR limited to a geographical area and a specific semantic meaning), object querying, multi-object spatial relationship queries, and semantic queries. As an alternative, Rathore et al. (2015) proposed a real-time big data analytical architecture comprising three main units: a remote sensing Big Data acquisition unit (RSDU), a data processing unit (DPU), and the data analysis decision unit (DADU). The RSDU component preprocesses the data gathered from multiple sources and includes data ingestion, data cleaning, and redundancy elimination. The DPU has the role of data filtration (identification of useful data) and load balancing of the processing resources. Each processing server contains algorithms which analyze land, sea, and ice data. The system is also capable of performing statistical calculations as well as mathematical and logical tasks. The DADU component contains three sub-modules: an aggregation and compilation server, a result storage server, and the decision making server. The datasets used for testing are made up of ENVISAT ASAR and MERIS images from Vietnam, Poland, Germany, Western Sahara, Mauritania, South Africa, and Spain. The system is written in Java and uses the Beam 5 library (a precursor of SNAP (Sentinel application platform [SNAP], 2019)), Hadoop, and MapReduce.
Another interesting approach is the SemQuery framework (Semantic Clustering and Querying on Heterogeneous Features for Visual Data) (Sheikholeslami, Chang, & Zhang, 2002). It uses a set of main clusters which generate a high-level representation with attached semantic meaning for a given set of images. Each cluster contains several multi-level subclusters, representing low-level image features. For an image to be indexed to a high-level cluster, all image features must be similar to those of the cluster. Then, an index tree is built on semantic labels, sub-labels and extracted features. Its root is made of general categories, while the semantic sub-categories are built on the clusters of a combination of different feature clusters. All queries search in multiple feature clusters, and the retrieved results are ranked and merged using a weighted method (linear merging: the sum of relevance × weight). Relevance is either the similarity measure or the rank of the image in the result list, ordered by decreasing similarity. The dataset used to test SemQuery is made of 29,400 texture and color feature vectors which were grouped into five categories: clouds, flowers, foliage, mountains, and water.
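The weighted linear merging mentioned above can be illustrated with a minimal sketch. It assumes that each feature cluster returns a relevance value per image and that one weight is assigned per feature; the function and variable names are hypothetical and are not taken from the SemQuery implementation.

```python
from collections import defaultdict


def merge_ranked_results(results_per_feature, weights):
    """Merge per-feature result lists by a weighted sum of relevance.

    results_per_feature: dict mapping a feature name to {image_id: relevance}
    weights: dict mapping the same feature names to non-negative weights
    Returns a list of (image_id, score) sorted by decreasing score.
    """
    scores = defaultdict(float)
    for feature, relevances in results_per_feature.items():
        weight = weights[feature]
        for image_id, relevance in relevances.items():
            # Linear merging: accumulate relevance x weight for each image.
            scores[image_id] += weight * relevance
    # Decreasing score; ties are broken by image id to keep the order stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    merged = merge_ranked_results(
        {"color": {"a": 1.0, "b": 0.5}, "texture": {"b": 1.0, "c": 0.2}},
        {"color": 0.5, "texture": 0.5},
    )
    print(merged)
```

An image that is moderately relevant in several feature clusters can therefore outrank an image that is highly relevant in only one of them.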
Wang and Song proposed the SBRSIR (Semantics Based Remote Sensing Image Retrieval) system (Wang & Song, 2013) based on semantic spatial relationship detection. The proposed scheme is composed of a series of steps: image decomposition, segmentation, object-based classification, spatial relationship reasoning, and scene modeling based on semantic spatial relationship detection. Each image is decomposed into blocks using a quin-tree structure. An image segmentation method divides the image blocks into parcels. Each parcel has its VF (visual features: texture and color) extracted and is classified into objects. The area, orientation, and the semantic topology features are calculated and stored for the labeled parcels. Each generated parcel is vectorized and its features are calculated and stored: spectral features (spectral mean and standard deviation), shape features (such as parcel area, perimeter, and length/width ratio), and texture features (GLCM (Popescu, Gavat, & Datcu, 2012) and Gabor coefficients). The test dataset for the SBRSIR method consists of three SPOT-5 images, five GeoEye images, and six ALOS images acquired in 2000 and 2011. These were split into 580 sub-images (1,024 × 1,024 pixels), processed using PCA, and decomposed into quin-tree sub-blocks (2,900 second-level blocks with 512 × 512 pixels). The level-2 sub-blocks were decomposed into 16 × 16 feature blocks and their visual features were calculated and stored. The parcels were classified using an SVM with an RBF kernel (Chang & Lin, 2011) (with cost 25 and gamma 40 as parameters). During testing, one to five semantic categories were selected, and the search time on the entire database ranged from 46 to 66 s.
Other notable systems are KIM (Knowledge driven content-based image Information Mining) (Datcu et al., 2003), which relies on the stochastic modeling of probabilities to provide an application-independent EO data mining system, and GeoIRIS (Shyu et al., 2007), which tries to manage the huge volumes of data through automatic pre-processing and indexing of image data. Both systems incorporate content-based retrieval methods and use features to represent the image content. Data processing requires a high level of expertise from the end-users, which can limit access for some of them. A solution for this limitation is presented in (Rasiwasia, Moreno, & Vasconcelos, 2007), which uses semantics-based queries in order to reduce the level of expertise required from the end-users. The latest, more complex systems, such as (Espinoza-Molina & Datcu, 2013), use multiple sources of information to optimize the search patterns.
Today's EO paradigms and technologies are largely domain-oriented. As an example, ESA's Thematic Exploitation Platforms (TEPs) (Thematic exploitation platform [TEP], 2019) are designed and focused on Coastal Areas, Forest, Urban Areas, and other application domains, integrating a standard processing chain that has low user interaction. Within the currently developed Copernicus system, the Data and Information Access Services (DIAS) (Data and information access services [DIAS], 2019) are a major achievement but still in a "classic" paradigm technology.

Benefits of the EOLib system
In an effort to obtain a complete EO mining system, we have developed and implemented the Earth Observation Image Librarian (EOLib) system. It is a latest-generation active Image Information Mining (IIM) system, which has novel functions for image content exploration and knowledge discovery. EOLib is integrated and operated within the Payload Data Ground Segment (PDGS) of the German TerraSAR-X mission (TerraSAR-X, 2019). It allows a more complete exploration of SAR data by the use of large-scale information mining functions. The system is capable of closing the "data-information-knowledge-value" chain, containing modules which extract information from both new data and existing databases. Its modular architecture enables it to grow as new EO missions produce new data. The EOLib software framework has been tested on a dataset totaling more than 700 GB, and the results are detailed in Section 6.
The system offers users a list of feature extraction methods (Blanchart et al., 2011b). The exploration of the image content (e.g., classification, annotation) requires very few examples thanks to the developed algorithm, which is also capable of extracting a large variety of categories from the images.
Another important point of the system is the unique hierarchical semantic annotation schema (Dumitru, Schwarz, & Datcu, 2016), which was initially developed for high-resolution images (e.g., TerraSAR-X) and has now been upgraded to also cover medium-resolution and low-resolution images.
The output of the system can be a classification map, a list of query results (combining metadata with semantic labels), or some statistics.

Data overview
For our tests, from the TerraSAR-X archives (TerraSAR-X, 2013), we prepared a dataset that mainly covers urban and industrial areas together with their infrastructure from all over the world. The dataset consists of 1,100 full scenes with a distribution presented in Figure 1.
The TerraSAR-X imaging parameters are detailed in Table 1. The scenes were selected based on their availability, their content, the typical diversity of country-specific land covers, and their acquisition parameters. For storage and indexing purposes, each image is divided into patches of different sizes (e.g., 200 × 200 and 160 × 160 pixels). Feature vectors are extracted from each patch and stored in a database. Given the number of images, the dimensions (in pixels) of each individual image, and the number of segmentation levels, the database then contains 9.24 million vectors (for each type of image descriptor).
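As an illustration of how the number of stored feature vectors follows from the image dimensions and the patch sizes, the sketch below tiles an image into non-overlapping square patches and counts one vector per patch and per grid level. Dropping partial patches at the image borders is an assumption made here for simplicity; the function names are hypothetical.

```python
def tile_grid(width, height, patch_size):
    """Return (row, col, x0, y0) for every full patch of an image.

    Partial patches at the right and bottom borders are dropped
    (an assumption of this sketch, not a documented EOLib rule).
    """
    cols = width // patch_size
    rows = height // patch_size
    return [
        (r, c, c * patch_size, r * patch_size)
        for r in range(rows)
        for c in range(cols)
    ]


def count_feature_vectors(image_sizes, patch_sizes):
    """Count one feature vector per patch, per image, per grid level."""
    total = 0
    for width, height in image_sizes:
        for patch_size in patch_sizes:
            total += len(tile_grid(width, height, patch_size))
    return total


if __name__ == "__main__":
    # A toy 400 x 400 pixel image tiled at the two patch sizes named above.
    print(count_feature_vectors([(400, 400)], [200, 160]))
```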

EOLib system architecture
EOLib (our Earth Observation Image Librarian) is an operational Image Information Mining (IIM) system with a modular structure integrated within the Payload Ground Segment (PGS) of the TerraSAR-X mission, at DLR's premises at Oberpfaffenhofen, Germany. It enables content interpretation and helps create a correlation between semantic labels and object categories. EOLib offers multiple functionalities such as: data model generation, visual data mining and knowledge discovery in databases (including semantic annotation), queries, and epitome production. The system is capable of understanding TerraSAR-X data and their meta-data in order to improve the entire processing chain consisting of pre-processing, feature extraction, semantic annotation of the EO data, and the creation of product catalogs. Through EOLib, we attained a more complete and comprehensive exploration of the satellite image data. The system is intended as a long-term exploitation of the PGS component by Data Mining (DM) and Knowledge Discovery in Database (KDD) elements. The architecture allows for continuous evolution, migration and scalability. The system has applications both for SAR and optical image data. It also provides the infrastructure for further development of algorithms, methods, and sub-components for DM and KDD. EOLib provides the following high-level functionality:
• Operates in real time (once the data from the PGS have been processed and ingested into the database)
• Operates on very large data volumes
• Extends the functionality of the PGS with new Knowledge Discovery components
• Integrates novel components into the PGS architecture.
The system currently runs on a single server machine at DLR's premises. The server is composed of 8 CPUs (each with 10 cores and 20 threads), 256 GB of RAM, 22 TB of HDD storage, and runs under Ubuntu Linux Operating System. The EOLib software is based on Java and uses MonetDB (MonetDB, 2019) as its Database Management System. The architecture of the EOLib system together with its main components and their interactions is shown in Figure 2. Rectangles with rounded corners are used to represent the main components (PGS components are presented in blue, while EOLib components are given in orange). Arrows are used to represent interfaces which interconnect the components. The orientation of the arrows presents the data flow direction, while the data passing through the interface are presented as a rectangle above or near the arrow. The components which require the interaction of the user are presented by a user icon on their upper-right corner.
We will further describe the EOLib components given in orange in Figure 2. The processing chain starts with the pre-processing of TerraSAR-X products and ends with the semantic annotation. Figure 3 outlines the main modules of EOLib together with the existing interface with the PGS, the end-user, and the data flows between modules. Components are shown as rectangles, while functional sub-components are represented by folders. Dashed arrows represent used-by relations, while full arrows are used to represent data flows. The main components of EOLib which will be described in the following subsections are: Data Model Generation, Data-Mining Database, Query Engine, Content-based Image Retrieval, Visual Data Mining, Knowledge Discovery in Databases, Epitome Generation, and System Evaluation.
Operation of the EOLib system by the users: The relation between the modules and the steps of their use are presented below, before all the modules are detailed. Because the data have already been extracted from the archive and processed via Data Model Generation (DMG), the system can be operated in three ways:
(1) Visual Data Mining: The users navigate through the data (based on the extracted features) and can understand the content of the data as well as potential cluster groupings.
(2) Image Mining: The users run a machine learning tool via its interactive GUI (Graphical User Interface), based on an active learning module (Blanchart et al., 2011b) which exploits all actionable information. Its functions are: search, browse, and query for image patches (the full image was already tiled into square sub-images with a side length defined beforehand in the DMG). The discovered relevant structures are semantically annotated and stored in the Data-Mining Database. For this, we use the features extracted from the patches.
(3) Data Mining: Via the Query Engine, the users perform SQL searches and queries, and browse or extract the data analytics information. Data Mining uses image features, image semantics, and selected satellite product meta-data.

Data model generation
The Data Model Generation (DMG) module is the first element of the TerraSAR-X product analysis chain. As can be seen in Figure 3, it is part of the Processing System Management (PSM), which is a sub-component of the DIMS system (Wolfmuller et al., 2009). DMG processes the Level-1b (L-1b) TerraSAR-X products. The TerraSAR-X products are composed of a GeoTIFF image and are accompanied by XML meta-data files. The processing steps (shown in Figure 4) of the DMG module are:
• Meta-Data extraction
• Tiling of the image content at multiple resolutions
• Basic feature extraction (BFE)
• High-resolution quick-look (HR-QL) generation
• Ingestion of image time-series (ITS)
• Import of other sources as, for example, GIS information.
Figure 3. EOLib core components, overall functionality, and dependencies.
During the meta-data extraction phase, relevant entries such as location, angles, and mission information are obtained from the XML file which accompanies the image data. A multi-resolution tile pyramid (several grid levels with tiles of different size) and quick-looks are created in order to improve visual data manipulation and to aid the machine learning process. BFE is applied to the previously generated tiles, which are considered as distinct entities, using the relations between pixels within each patch rather than single-pixel information (Popescu et al., 2012). Gabor filters (Manjunath & Ma, 1996; Singh & Datcu, 2013) and Weber Local Descriptors (WLDs) (Cui, Dumitru, & Datcu, 2013) are used as feature extraction methods in the DMG module. In order to aid the data exploration process, a quick-look image (a compressed RGB or gray-scale image, either optical or SAR) is created in this phase and stored in the database. The final product model of the DMG is an XML file which contains the previously extracted meta-data, the grid levels, the tile information, and data vectors representing the extracted image descriptors derived by the Gabor and WLD algorithms. The descriptors are used later in the classification process of the knowledge discovery module. In addition, the XML file is ingested into the Data Mining Database, which is described below.

Data mining database
The products generated by the DMG are mapped into a relational database. The Data-Mining Database (DM-DB) module manages data handling, storage, administration, and some of the image processing for all EOLib components. It is composed of data, a database schema, and stored procedures (which support the data mining operations). The stored information is composed of meta-data, image parameters, tile parameters, quick-looks, feature vectors, and semantic labels. Thus, all information about the TerraSAR-X products is stored in a table-based schema, which implements the proper relationships between the tables and indices for optimized processes. This module is supported by a relational Data Base Management System (DBMS), which represents a core component of the system, interacts with all the user-oriented components, and supports their functionality. This database also stores the semantic annotations made by the end-users and provided through the Knowledge Discovery component.

Query Engine
The Query Engine (QE) consists of a GUI (Graphical User Interface) named Query Builder, and the Query Engine Core, which uses the queries received through the GUI to retrieve data from the DM-DB. The end-user can use the QE to find and retrieve desired scenes with specific content. The image database can be queried using meta-data, semantic definitions, and multi-temporal parameters. In order to generate semantics-based queries, the end-user can select one or several labels from the options available in the semantic catalog. The most complex queries can combine meta-data and semantic definitions in order to produce more specific results. The QE GUI (standalone or web-based) helps the user to access the query services of the image archives. The queries read part of the data model stored within the data mining database (DM-DB), and the output is a list of data items such as image patches or full images.
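A query that combines meta-data with a semantic label can be sketched as follows. The example uses Python's built-in sqlite3 module as a stand-in for MonetDB, and every table name, column name, and value is hypothetical; the actual DM-DB schema is not reproduced here.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE image (
            image_id INTEGER PRIMARY KEY,
            mission TEXT,
            incidence_angle REAL,
            acquired TEXT
        );
        CREATE TABLE tile (
            tile_id INTEGER PRIMARY KEY,
            image_id INTEGER REFERENCES image(image_id),
            grid_level INTEGER,
            label TEXT
        );
        CREATE INDEX idx_tile_label ON tile(label);
        """
    )
    conn.executemany(
        "INSERT INTO image VALUES (?, ?, ?, ?)",
        [
            (1, "TSX", 30.0, "2010-01-01"),
            (2, "TSX", 45.0, "2011-05-05"),
            (3, "TSX", 35.0, "2012-03-10"),
        ],
    )
    conn.executemany(
        "INSERT INTO tile VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, "bridge"),
            (2, 1, 1, "bridge"),
            (3, 2, 1, "bridge"),
            (4, 3, 1, "forest"),
            (5, 3, 1, "bridge"),
        ],
    )
    return conn


def query_scenes(conn, label, min_angle, max_angle):
    """Return (image_id, matching tile count) for scenes that contain tiles
    with the given semantic label and whose incidence angle is in range."""
    return conn.execute(
        """
        SELECT i.image_id, COUNT(*) AS n_tiles
        FROM image AS i
        JOIN tile AS t ON t.image_id = i.image_id
        WHERE t.label = ? AND i.incidence_angle BETWEEN ? AND ?
        GROUP BY i.image_id
        ORDER BY n_tiles DESC, i.image_id
        """,
        (label, min_angle, max_angle),
    ).fetchall()


if __name__ == "__main__":
    print(query_scenes(build_demo_db(), "bridge", 25.0, 40.0))
```

The join restricts the result by a semantic label on the tile side and by an acquisition parameter on the image side, which is the kind of combined query described above.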

Content-based image retrieval
The Content-based Image Retrieval (CBIR) component consists of a GUI which interacts with the user at one end and receives images from the Query Engine Core at the other. The latter reads the user-designed queries and uses them to retrieve the data from the DM-DB. This component is based on the query-by-example (QbE) concept, in which the user provides an external or internal image and retrieves images which have similar content. The semantic labels are determined through one or multiple examples, and the results are returned based on the similarity between the features extracted from the query samples and those of the previously ingested images. The results are sorted in decreasing order of similarity.
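The query-by-example principle can be sketched as a nearest-neighbor ranking over stored feature vectors. The Euclidean distance and the rule of scoring each tile by its closest example are assumptions chosen for illustration; they are not claimed to be the similarity measure used in EOLib.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(examples, archive, top_k=5):
    """Rank archive tiles by their distance to the closest query example.

    examples: list of feature vectors extracted from the query image(s)
    archive: dict mapping tile_id to a previously ingested feature vector
    Returns up to top_k (tile_id, distance) pairs; a smaller distance means
    a higher similarity, so the list is in decreasing order of similarity.
    """
    ranked = []
    for tile_id, features in archive.items():
        distance = min(euclidean(features, example) for example in examples)
        ranked.append((tile_id, distance))
    ranked.sort(key=lambda item: (item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    archive = {"t1": [0.0, 0.0], "t2": [1.0, 1.0], "t3": [5.0, 5.0]}
    print(query_by_example([[0.9, 1.0]], archive, top_k=2))
```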

Visual data mining
The Visual Data Mining (VDM) module provides a graphical and intuitive method to search the highly complex, non-visual data sets stored within the database. The selection of different images in 2-D or 3-D space is achieved through visualization techniques, data reduction methods, and similarity metrics which group the retrieved images into relevance clusters (Singh & Datcu, 2013). VDM provides browsing, querying, and zooming, which enable the end-user to navigate the data mining database. Each 2-D and 3-D projection of the database is obtained through a dimensionality reduction algorithm called t-distributed Stochastic Neighbor Embedding (t-SNE) (van der Maaten & Hinton, 2008). It uncovers heterogeneous hidden structures in the data, exposes natural clusters, and smooths non-linear variations along the feature dimensions.
The VDM module consists of a GUI which allows the user to navigate through the data, and a VDM Core which retrieves the data from the DM-DB and adapts them in order to be displayed through the GUI.

Knowledge discovery in databases
Manual semantic annotations are time-consuming and user-intensive tasks. The process of applying semantic labels to the ingested data can be optimized through supervised or un-supervised machine learning methods.
The Knowledge Discovery in Databases (KDD) module consists of a GUI which receives the end-user input, and a KDD Core which processes the user's input, and applies it to retrieve data from the DM-DB. This module is used to create semantic definitions of the image content. This is achieved through semi-supervised machine learning methods, and uses a relevance-feedback loop in order to include human expertise in the annotation process and the semantic category definition.
The active learning algorithms are iterative sampling schemes where a classifier is tuned at each iteration by providing it with newly labeled samples. In turn, this helps achieve greater accuracy with fewer training labels (Espinoza-Molina, . These concepts for the labeling of image content are implemented in the KDD component (see Figure 5) as follows: The active learning component is based on a Support Vector Machine (SVM) with cascaded learning (Blanchart, Ferecatu, Cui, & Datcu, 2014). The end-user visually analyzes the results and refines the training data by giving more examples and by correctly relabeling the erroneous results. The actions are supported by the Relevance Feedback (RF) interface contained in the GUI, which generates an automatic ranking of the returned images. The active learning loop is stopped when the user considers the results to be satisfactory. The process can end with a definition of a semantic category for the ranked tiles, which is further stored within the data mining database catalog. The two goals of the active learning phase are to learn the targeted image category as accurately and exhaustively as possible, and to minimize the number of iterations in the relevance feedback loop.
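The relevance-feedback loop can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier is used here as a simple stand-in for the cascaded SVM, the user is simulated by an oracle function, and the most ambiguous tile (smallest absolute margin) is queried at each iteration. All names are hypothetical.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fit(pool, labeled):
    """Stand-in classifier: one centroid per class (not an SVM)."""
    pos = centroid([pool[i] for i, y in labeled.items() if y])
    neg = centroid([pool[i] for i, y in labeled.items() if not y])
    return pos, neg


def margin(x, model):
    """Positive when x lies closer to the positive centroid."""
    pos, neg = model
    return distance(x, neg) - distance(x, pos)


def active_learning(pool, seed_labels, oracle, iterations=5):
    """Iteratively query the most ambiguous tile and retrain.

    pool: dict tile_id -> feature vector
    seed_labels: dict tile_id -> bool, with at least one True and one False
    oracle: callable tile_id -> bool, simulating the user's feedback
    Returns the final labels and all tiles ranked by decreasing margin.
    """
    labeled = dict(seed_labels)
    for _ in range(iterations):
        model = fit(pool, labeled)
        unlabeled = [i for i in pool if i not in labeled]
        if not unlabeled:
            break
        # Uncertainty sampling: the tile the classifier is least sure about.
        query = min(unlabeled, key=lambda i: (abs(margin(pool[i], model)), i))
        labeled[query] = oracle(query)
    model = fit(pool, labeled)
    ranking = sorted(pool, key=lambda i: (-margin(pool[i], model), i))
    return labeled, ranking


if __name__ == "__main__":
    pool = {"a": [0.0], "b": [1.0], "c": [4.0], "d": [6.0], "e": [9.0], "f": [10.0]}
    labels, ranking = active_learning(
        pool, {"a": False, "f": True}, lambda i: pool[i][0] >= 5, iterations=2
    )
    print(labels, ranking)
```

In the interactive system the loop ends when the user is satisfied; the fixed iteration count above only replaces that decision for the purpose of the sketch.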

Epitome generation
An epitome is a new product type, for future satellite ground segments, consisting of a semantic label generated through a feature-based description of the image product content. It includes the semantic labels of the image tiles and is produced through image feature extraction and semantic annotation. The generated information can be delivered independently or together with a standard EO product. An epitome represents an enriched and added-value product that can be exploited interactively with appropriate end-user applications. The epitome is generated in the Delivery PGS component and is composed of meta-data, high-resolution quick-looks, basic features, and all the annotations. The epitome generation component reads the information from the Data Mining Database and transfers it to the delivery service (Espinoza Molina & Datcu, 2016).

System evaluation
The System Evaluation (SE) module consists of a GUI which interacts with the end-user, and receives its input from the SE Core, which exploits the user's input to retrieve and process data from the DM-DB. The SE component collects information and statistical data about the activity of the EOLib components. Quality metrics and evaluation procedures are defined according to the evaluated element.
This component requires pre-defined test data sets, and is also in charge of the quality evaluation of individual EOLib components, as well as system validation.

EOLib algorithms
EOLib uses different feature extraction methods according to the input data. The feature extraction algorithms read the original L-1b EO products (e.g., for detected and geo-coded TerraSAR-X products, data are represented by unsigned 16-bit values, while for QuickBird unsigned 11-bit values are used). The algorithms are applied to the patches generated from a full image, according to the analyzed window (and its patch size, as a parameter). The main feature extraction algorithms implemented in EOLib are: Weber Local Descriptor (WLD) (Chen et al., 2010), Non-Linear Spectral Feature Description (Popescu et al., 2012), Gabor Linear Moments (Manjunath & Ma, 1996), Adaptive Weber Local Descriptor (AWLD) (Cui et al., 2013), and Multi-Temporal Similarity Metric Based on Wavelet Modeling (Cui & Datcu, 2012). The currently implemented classifiers are Semi-Automatic Semantic Annotation Based on a Support Vector Machine with Relevance Feedback (SVMRF) (Dumitru, Cui, Schwarz, & Datcu, 2015), and Cascaded Active Learning for Object Retrieval (CALOR) (Blanchart, Ferecatu, & Datcu, 2011a).

Weber Local Descriptor (WLD)
This descriptor consists of two components: differential excitation, and orientation. It is inspired by Weber's Law, which is a psychological law. It states that the change of a stimulus (such as sound, lighting) that will be just noticeable is a constant ratio of the original stimulus. When the change is smaller than this constant ratio of the original stimulus, humans would recognize it as background noise rather than a valid signal. Motivated by this point, for a given pixel, the differential excitation component of the proposed Weber Local Descriptor (WLD) is computed based on the ratio between two terms: the relative intensity brightness differences of a current pixel against its neighbors (e.g., a 3 × 3 square box); and the brightness of the current pixel. With the differential excitation component, we attempt to extract the local salient patterns in the input image. In addition, we also compute the gradient orientation of the current pixel. That is, for each pixel of the input image, it is possible to compute two components of the WLD feature (i.e., differential excitation and gradient orientation). By combining the WLD feature per pixel, one can represent an input image (or a region) with a histogram, which is called WLD histogram hereinafter. In (Chen et al., 2010), the WLD features are computed pixelwise as a dense descriptor.
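A minimal sketch of the two per-pixel components and of the resulting histogram is given below. It assumes a 3 × 3 neighborhood, a small constant added to the center value to avoid division by zero, and uniform quantization into a joint histogram that is flattened to one dimension; the bin counts and the exact neighbor ordering used for the orientation are assumptions and may differ from (Chen et al., 2010).

```python
import math


def wld_components(image, row, col, eps=1e-6):
    """Differential excitation and orientation of one interior pixel."""
    center = image[row][col]
    neighbors = [
        image[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]
    # Sum of neighbor-minus-center differences, relative to the center value.
    diff_sum = sum(n - center for n in neighbors)
    excitation = math.atan(diff_sum / (center + eps))
    # Orientation from the vertical and horizontal neighbor differences.
    vertical = image[row + 1][col] - image[row - 1][col]
    horizontal = image[row][col - 1] - image[row][col + 1]
    orientation = math.atan2(horizontal, vertical)
    return excitation, orientation


def wld_histogram(image, excitation_bins=6, orientation_bins=8):
    """Joint (excitation, orientation) histogram flattened to 1-D."""
    hist = [0] * (excitation_bins * orientation_bins)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            xi, theta = wld_components(image, r, c)
            # xi lies in (-pi/2, pi/2); theta lies in [-pi, pi].
            e_bin = min(
                int((xi + math.pi / 2) / math.pi * excitation_bins),
                excitation_bins - 1,
            )
            o_bin = min(
                int((theta + math.pi) / (2 * math.pi) * orientation_bins),
                orientation_bins - 1,
            )
            hist[e_bin * orientation_bins + o_bin] += 1
    return hist


if __name__ == "__main__":
    flat = [[10] * 5 for _ in range(5)]
    print(wld_histogram(flat))
```

For a perfectly flat patch, both components are zero for every interior pixel, so all counts fall into a single bin.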

Non-linear spectral feature description
A non-linear short-time Fourier transform analysis is based on the principle of stationary short-time signals as proposed in (Popescu et al., 2012). The method extracts six nonlinear features: the first two features are based on statistical properties of the spectrum, and the next four ones are motivated from timbre features used for music genre classification. The computed results are: mean and variance (of the coefficients), spectral centroid in range and azimuth, and spectral flux in range and azimuth.

Gabor linear moments
A Gabor filter is a linear filter used in image processing (Manjunath & Ma, 1996). Frequency and orientation representations of a Gabor filter are similar to those of the human visual system, and were found to be particularly appropriate for texture representation and discrimination. In the spatial domain, a 2-D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave. The Gabor filters are self-similar and can be generated from one mother wavelet by dilation and rotation. This implementation of the Gabor filter (Manjunath & Ma, 1996) convolves an image with a lattice of possibly overlapping banks of Gabor filters at different scales, orientations, and frequencies. The scale is the scale of the Gaussian used to compute the Gabor wavelet. The texture parameters are computed from the first-order statistics of the Gabor filter responses: mean and variance for different scales, orientations, and frequencies.
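As a rough illustration of how such texture parameters can be obtained, the sketch below builds the real part of a Gabor kernel, filters a small image, and collects the mean and variance of the response magnitudes per scale and orientation. The kernel size, the scale values, the wavelength tied to the scale, and all function names are assumptions made for the example and are not taken from the EOLib code.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength):
    """Real part of a 2-D Gabor kernel: Gaussian envelope times a cosine plane wave."""
    half = size // 2  # size is assumed to be odd
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the wave direction theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * x_rot / wavelength))
        kernel.append(row)
    return kernel

def filter_valid(img, kernel):
    """Filter responses at all positions where the kernel fits inside the image.

    This is a correlation; it equals the convolution here because the kernel
    is unchanged when both coordinates are negated.
    """
    k = len(kernel)
    responses = []
    for r in range(len(img) - k + 1):
        for c in range(len(img[0]) - k + 1):
            responses.append(sum(kernel[i][j] * img[r + i][c + j]
                                 for i in range(k) for j in range(k)))
    return responses

def gabor_features(img, sigmas=(1.0, 2.0), n_orient=6, size=7):
    """Mean and variance of the response magnitude per (scale, orientation)."""
    features = []
    for sigma in sigmas:
        for o in range(n_orient):
            theta = o * math.pi / n_orient
            kernel = gabor_kernel(size, sigma, theta, wavelength=4.0 * sigma)
            mags = [abs(v) for v in filter_valid(img, kernel)]
            mean = sum(mags) / len(mags)
            var = sum((v - mean) ** 2 for v in mags) / len(mags)
            features += [mean, var]
    return features
```

The length of the feature vector is twice the number of scale and orientation combinations, since each filter contributes one mean and one variance.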

Adaptive Weber Local Descriptor (A-WLD)
An adaptive WLD is proposed for texture characterization, and contains two components: differential excitation and orientation. Based on these two terms, a joint histogram can be constructed, followed by a conversion to a one-dimensional histogram, which is the WLD descriptor. The advantage of this feature is that it considers not only the local contrast but also the structure information represented by a gradient histogram. Although Weber's law is a theory from visual perception, and there is no visual criterion for SAR images, the principle of feature extraction is applicable to SAR images, too. However, in the case of SAR images, multiplicative speckle noise dramatically decreases the discriminative ability of WLD for image indexing. To combat speckle, one solution is to replace the gradient in WLD by the ratio of mean differences. Based on these adapted components (orientation and excitation), WLD for SAR images can be defined as a joint histogram (Cui et al., 2013). Following the same strategy as WLD, this joint histogram is converted to a one-dimensional histogram. The adapted WLD includes not only local statistics but also local structure information, resulting in an improved performance in SAR image indexing.

Multi-temporal similarity metric based on wavelet modeling
In the context of multi-temporal SAR image change detection for Earth monitoring applications, one critical issue is to generate accurate change maps. A common method to generate change maps is to apply a logarithm to the ratio image. However, due to the speckle effect and because contextual information is not taken into account, the log-ratio alone is usually not effective for accurate change detection applications. In (Cui & Datcu, 2012), an unsupervised change detection method in the wavelet domain based on statistical wavelet sub-band modeling is proposed. The motivation is to capture textures efficiently in the wavelet domain. A wavelet transformation is applied to decompose an image into multiple scales. The probability density function of the coefficient magnitudes of each sub-band, assumed to follow a generalized Gaussian distribution (GGD), is obtained by fast parameter estimation.
A closed-form expression of the Kullback-Leibler divergence between two corresponding sub-bands of the same scale is computed and used to generate the change map. This approach is comprehensively evaluated and compared using different parameter settings, scales, window sizes, and estimators. The proposed SAR change detection in the wavelet domain shows promising results as texture can then be better characterized than in the spatial domain. Through this study, the authors of (Cui & Datcu, 2012) conclude that the accuracy depends heavily on the estimation methods, although the model is also important.
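For reference, the closed-form Kullback-Leibler divergence between two zero-mean generalized Gaussian densities p(x; α, β) = β / (2αΓ(1/β)) · exp(−(|x|/α)^β) is a standard result and can be sketched as follows. Symmetrising the divergence and summing it over sub-bands, as done in `change_score`, is an assumption made for this illustration; the exact combination used in (Cui & Datcu, 2012) is not specified here.

```python
import math

def kl_ggd(alpha1, beta1, alpha2, beta2):
    """KL divergence D(p1 || p2) between two zero-mean generalized Gaussians.

    alpha is the scale parameter and beta the shape parameter of each density.
    """
    log_term = (math.log(beta1) - math.log(beta2)
                + math.log(alpha2) - math.log(alpha1)
                + math.lgamma(1.0 / beta2) - math.lgamma(1.0 / beta1))
    # Gamma((beta2 + 1) / beta1) / Gamma(1 / beta1), computed in the log domain.
    gamma_ratio = math.exp(math.lgamma((beta2 + 1.0) / beta1)
                           - math.lgamma(1.0 / beta1))
    return log_term + (alpha1 / alpha2) ** beta2 * gamma_ratio - 1.0 / beta1

def change_score(subbands_t1, subbands_t2):
    """Symmetrised KL divergence summed over corresponding sub-bands.

    Each argument is a list of (alpha, beta) pairs, one per sub-band
    (hypothetical input layout for this sketch).
    """
    return sum(kl_ggd(a1, b1, a2, b2) + kl_ggd(a2, b2, a1, b1)
               for (a1, b1), (a2, b2) in zip(subbands_t1, subbands_t2))
```

For β = 2 the expression reduces to the familiar divergence between two zero-mean Gaussians, which gives a convenient sanity check for an implementation.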

Semi-automatic semantic annotation based on a support vector machine with relevance feedback (SVMRF)
Support Vector Machine (SVM) methods have been intensively used for classification and learning in Content-based Image Retrieval (CBIR) systems, with very good results in recent years. These methods are kernel-based learning machines that are capable of high-accuracy classification for different types of data, even with a small number of examples (patches) for each category. Of high importance in retrieval systems is the problem of Relevance Feedback (RF), which is a form of query-free retrieval (e.g., based on Euclidean distances (Costache & Datcu, 2007)), and which allows images to be retrieved according to a similarity measure and a given set of sample images. The goal of the system is to retrieve images based on the user's decisions and interests. Thus, in the RF loop, the user labels the images as being either relevant or irrelevant. SVM is a tool that is well suited to implement the RF method (Costache, Maitre, & Datcu, 2006). RF based on an SVM can be seen as a two-step loop process.
The first step is the learning step where, using a small number of examples (patches), the system learns how to separate data which belong to different categories. In the RF concept, there are only two categories: relevant samples, and irrelevant samples. In the second step, the classification of the remaining samples (patches) is made in order to decide which samples are relevant or irrelevant (Costache et al., 2006).
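The two steps of one feedback round can be sketched as below. For brevity, the sketch swaps in a linear SVM trained by batch sub-gradient descent on the hinge loss instead of a kernel SVM; the function names, the hyper-parameters, and the feature vectors are hypothetical and only serve to show the learn-then-classify structure.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit (w, b) by batch sub-gradient descent on the L2-regularised hinge loss."""
    dim = len(samples[0])
    n = len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # this sample violates the margin
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def score(w, b, x):
    """Signed distance-like decision value of feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def feedback_round(labelled, unlabelled):
    """Step 1: learn from (vector, +1/-1) pairs. Step 2: classify the remaining vectors."""
    w, b = train_linear_svm([x for x, _ in labelled], [y for _, y in labelled])
    relevant = [x for x in unlabelled if score(w, b, x) > 0]
    irrelevant = [x for x in unlabelled if score(w, b, x) <= 0]
    return relevant, irrelevant
```

In an interactive session, the user would inspect the returned patches, correct some of the labels, add them to `labelled`, and call `feedback_round` again.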
An SVM-RF tool is used to automatically retrieve the existing categories from a large dataset and to semantically annotate them. Before starting the annotation, an evaluation of the four feature extraction methods is performed. After selecting the ones with the best performance, based on the precision-recall metric, the tool is used to automatically retrieve the categories from the dataset. In order to annotate and find an appropriate semantic meaning for each retrieved category, human interaction is needed using a reference dataset (e.g., from Google Earth). Our dataset covers different areas of the globe (e.g., 39 locations), where 320 categories were identified.

Cascaded Active Learning for Object Retrieval (CALOR)
CALOR (Blanchart et al., 2011a) is based on a hierarchical top-down processing scheme for object retrieval in high-volume high-resolution optical satellite repositories. It learns via a multistage active learning process, with a cascade of classifiers, each working at a certain scale on a patch-based representation of an image. At each stage, we seek to eliminate large parts of images considered as being non-relevant; the purpose is to focus on the finest scales and on areas that are more promising and as spatially limited as possible. The scheme is based on the fact that by reducing the size of the analysis window (i.e., the size of the patch), we are able to better capture the properties of the targeted object. The cascaded hierarchy is introduced to compensate for the extra computational burden induced by diminishing the size of the patches, which causes an exponential growth of the number of patches to be processed. Blanchart (Blanchart, 2011) proposed a cascaded active learning strategy to build a classifier at each level of the hierarchy and we provided a new Multiple Instance Learning Algorithm to automatically propagate the training samples from one level of the hierarchy to the next one.
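The computational argument behind the cascade can be illustrated with a small sketch: only patches kept at one level are split and examined at the next, finer level. The quad-split of patches and the `is_relevant` callback, which stands in for the per-level classifier, are assumptions for this example and not the CALOR implementation.

```python
def split_patch(patch):
    """Split a (row, col, size) patch into its four half-size children."""
    r, c, s = patch
    h = s // 2
    return [(r, c, h), (r, c + h, h), (r + h, c, h), (r + h, c + h, h)]

def cascade(patches, is_relevant, levels):
    """Keep relevant patches at each level and refine only those.

    is_relevant(patch, level) is a hypothetical stand-in for the classifier
    trained at that level. Returns the patches kept at the finest level and,
    per level, the pair (patches examined, patches kept).
    """
    stats = []
    current = patches
    kept = []
    for level in range(levels):
        kept = [p for p in current if is_relevant(p, level)]
        stats.append((len(current), len(kept)))
        if level < levels - 1:
            current = [child for p in kept for child in split_patch(p)]
    return kept, stats

# Toy example: a 320 x 320 image tiled into 160 x 160 patches, with a target
# object assumed to sit at pixel (row 200, column 50).
def contains_target(patch, level):
    r, c, s = patch
    return r <= 200 < r + s and c <= 50 < c + s

tiles = [(0, 0, 160), (0, 160, 160), (160, 0, 160), (160, 160, 160)]
final, stats = cascade(tiles, contains_target, levels=3)
```

In this toy case the cascade examines 4 + 4 + 4 = 12 patches, whereas an exhaustive scan at the finest scale (40 × 40 pixels) would examine 64.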

EOLib functionality
In this section, the full functionality of the EOLib system will be presented together with examples. These are: Query by Meta-Data and Semantics, Content Based Image Retrieval, Visual Data Mining, Knowledge Discovery in Databases, and System Evaluation.

Querying meta-data and semantics
From the various query types that are implemented, we demonstrate three typical queries referring to meta-data and semantic annotation. The first group of examples is intended to find typical TerraSAR-X scenes. Figure 9 shows the product quick-looks matching a criterion (bottom half) together with their meta-data:
• Incidence angle between the lower and upper bound of the specified range (see Figure 6)
• Ascending and descending orbit branch and looking direction (see Figure 7)
• Latitude and longitude between the range values of each continent (see Figure 1)
• Acquisition dates between 2007 and 2015 (see Figure 8).
A second example is to find all patches within our database that contain "Storage tanks" as a semantic label and to display them together with their meta-data. The GUI which offers such information is shown in Figure 9. The meta-data is displayed in the top half of the interface, while the visual representation is given in the bottom half. Selecting an element in either half highlights the corresponding element in the other half. A similar example is to find "Skyscrapers" within the database (Figure 10) and to display the percentage of retrieved "Skyscrapers" for several given North American cities (e.g., Calgary, Ciudad Juarez, North San Diego, Poway, South San Diego, Sun Lakes, Tijuana, Tucson, Santa Clarita, Reno, Vancouver, Washington DC, and Ottawa) (see Figure 11).

Content-based image retrieval
We carried out some experiments using semantically annotated TerraSAR-X data. We searched our database for patches with different content and we compared the retrieved results with the reference data (the ground truth data). Finally, we computed the precision/recall metrics. Precision is defined as the fraction of the retrieved images which are relevant, while recall is defined as the fraction of relevant images which have been retrieved (Han, Kamber, & Pei, 2012). From our TerraSAR-X dataset, we extracted a subset that covers German cities such as: Bonn, Oldenburg, Munich, Kiel, Cologne, Berlin, Lindau, Karlsruhe, Mannheim, Stuttgart, and Garmisch-Partenkirchen. These images were divided into patches, and about 50 semantic categories were identified (e.g., "Agricultural land", "Bridges", "Channels", "Broad-leaf forest", "Mixed forest", "High-density residential areas", "Medium-density residential areas", "Low-density residential areas", "Mountains", "Lakes", "Roads", "Railways", "Rivers", and "Storage tanks"). In Tables 2 and 3, we show the performance metrics for the results obtained from two cities, namely Munich and Berlin, Germany, using Gabor linear moments as a feature extraction algorithm. The average precision is 88% in the case of Munich, and 90% in the case of Berlin, while the recall amounts to 85% in the case of Munich, and 60% in the case of Berlin. The results of the CBIR are summarized in Figure 12 for a number of important categories (e.g., "High-density residential areas", "Channels", "Broad-leaf forest", "Sports terrain", and "Railway tracks") selected from the two cities.
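The precision and recall definitions used in this section translate directly into set operations on patch identifiers. The following is a minimal sketch; the identifiers in the usage lines are made up for illustration and are not taken from the experiments.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieval result against ground-truth labels.

    retrieved: identifiers of the patches returned by the query.
    relevant:  identifiers of the patches that truly belong to the category.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result: 4 patches returned, 6 patches truly relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 3, 4, 5, 6, 7])
# p is 0.75 (3 of 4 returned patches are relevant), r is 0.5 (3 of 6 found)
```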

Visual Data Mining
Visual Data Mining is a tool which allows the user to interactively visualize feature spaces of large repositories of images with a functionality supporting browsing, querying, and zooming (Espinoza-Molina, Datcu, Teleaga, & Balint, 2014). The loading time for more than 120,000 patches with a size of 160 × 160 pixels is less than two minutes, but it should be noted that the loading time increases with the patch size. This tool allows interactive exploration and analysis of very large, high-complexity, and non-visual data sets stored within the database. It provides the end-user with an intuitive tool for Data Mining. Its graphical interface lets the user select different images and/or image content in 2-D or 3-D space by means of visualization techniques, data reduction methods, and similarity metrics that group the images, which enables efficient data manipulation at a larger scale. Samples can be annotated and stored into the database and can be augmented using Google Earth. An experiment was performed using TerraSAR-X scenes from Germany comprising a total of 7,000 patches. For each patch, a Gabor feature vector with 48 dimensions was computed. Dimensionality reduction was then applied to convert each 48-dimensional feature vector into a three-dimensional vector. Most of the patches are grouped into one big cluster.
In Figure 13, we selected a group of patches to be exported to Google Earth. The next step is the mapping of the projected patches into a three-dimensional space, and an analysis of their content and possible outliers. The patches can also be annotated and exported to Google Earth in order to be validated by comparing them with terrain information.

Figure 12. CBIR results for five semantic categories: "High-density residential areas", "Channels", "Broad-leaf forest", "Sports terrain", and "Railway tracks". The left column shows the example, while the right side presents the retrieved patches similar to the provided query example.

Knowledge discovery in databases
The KDD module is used to search for image content and to create semantic annotations of the ingested images. This implementation is based on the Cascaded Active Learning for Object Retrieval (CALOR) algorithm, which contains a Support Vector Machine (SVM) as active learning and relevance feedback method in order to include human expertise in the annotation. The definition of image semantics is achieved by using an interactive loop where human expertise is required to terminate the loop and to define the semantic category.
The idea behind active learning is that a machine learning algorithm can achieve higher accuracy with fewer training labels if it is allowed to choose the data from which it learns. The main functionality of the KDD module is two-fold:
• Exploration of the data set in order to see the evolution of the content of a category along the retrieved patches
• Semantic annotation of the dataset after the user has an idea about the content of the data set, and semantic meaning can be added to the data.
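One common way of letting a learner choose its own training data is uncertainty sampling: the samples whose decision values lie closest to the classifier boundary are presented to the user for labelling. The sketch below shows this strategy only as an illustration of the active learning idea; it is an assumption that it resembles the query strategy of the KDD module, and the scores are invented.

```python
def most_uncertain(scores, k):
    """Indices of the k samples whose decision values are closest to zero."""
    return sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k]

# Hypothetical decision values of five unlabelled patches.
scores = [2.1, -0.1, 0.4, -3.0, 0.05]
to_label = most_uncertain(scores, k=2)  # patches 4 and 1 are the least certain
```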
The experiments were performed on a dataset consisting of 200 images out of the 1,100 TerraSAR-X acquisitions which cover a wide variety of the Earth's surface. These images include categories such as: "Skyscrapers", "Storage tanks", "Ships/Boats", "Harbor infrastructure", "Airports", and "Inhabited built-up areas". Once a user has an idea about the content of the data set, the user can semantically annotate the data with the explored categories. These semantic categories are stored into the DBMS and can be queried later. In Figure 14, we show the steps that need to be followed by the user in order to make the annotation and save the information into the DBMS. In this example, 33 TerraSAR-X images have been used together with three panels.
In Figure 14(b), the GUI shows the results after the training, in which we are looking for patches with similar content. In this case, we are looking for the "Mixed urban areas" category. The retrieved patches in level 1 are marked in blue and projected onto the image. Figure 14(c) shows the retrieved patches in level 2, projected onto the image. Figure 14(d) shows the retrieved patches in level 3, projected onto the image. In Figure 14(e), the user can now semantically annotate the retrieved patches by choosing the correct label from an existing or user-defined list. For each patch, a label is allocated and saved into the database. Figure 14(f) presents the annotated area in reddish together with the label and the number of annotated patches with this label (see the GUI bar on its center-left side). Figure 14(h) shows the results of the "Mixed urban areas" category in blue and another category, namely "Sand", in green. The annotation can continue until the full image is annotated. Other labels to be retrieved can be "Ploughed agricultural land", "Industrial buildings", "Roads", etc., while the sought-for category here is "Mixed urban areas". The GUI is composed of a top-left panel which shows the relevant retrieved patches, and a bottom-left panel which shows the irrelevant retrieved patches. The large right panel shows the image or images that are being processed (the list of images, together with their unique TerraSAR-X ids, appears on the top-center GUI bar) and offers a zoom function. Through this panel, a user can see the distribution of the retrieved patches, and all the training samples. The users can also verify the selected training samples by checking the surrounding context or by displaying it as in the bottom of Figure 14(g). Training samples can be selected from any of the three panels.

System evaluation

Evaluations based on performances
We evaluated the system using our TerraSAR-X data set (using the Gabor linear moments algorithm and the cascaded learning approach). The data were divided into data sub-sets, one for each continent. Here, we chose for demonstration our North American data set that contains 10 cities: Tijuana, Tucson, North and South San Diego, Poway, Sun Lakes, San Francisco, Ciudad Juarez, Calgary, and Ottawa.
We evaluated the performance of the KDD/cascaded learning algorithms using this data sub-set in order to verify the reduction of the computational effort. This approach helped reduce the number of training samples (i.e., patches) from one level to the next and finally, at the lowest level, to obtain an accurate semantic annotation. Based on this method, we can also generate a "bag of objects" for subsequent classification activities. For eight given general categories (Dumitru et al., 2016), Table 4 lists, for each level, the initial number of patches for each category and, after classification, the number of patches containing this category that are passed on to the next level. Based on the cascaded learning algorithm, only the positive patches (i.e., the patches that contain the selected category we are looking for) are kept and used for annotation. Table 5 shows the percentages of the data kept from the entire data set. For example, for the "Water bodies" category, we use only 11.99% of all patches, while the rest of the patches are assigned to other categories. These "positive" patches are split again, classified, and the residues that do not belong to the desired category are removed (we keep 65.57% of all patches). On the last level, we repeated the previous procedure and finally annotated 94.17% of the patches with the category we are looking for. This procedure was repeated for all categories (Dumitru, Schwarz, & Datcu, 2018a). Similar results were obtained for all other data sub-sets.
The performance of the EOLib KDD annotation tool is demonstrated in Figure 15 using five metrics, namely Precision, Recall, Accuracy, F-measure, and Specificity. The definition of each evaluation metric is described in detail in (Manning, Raghavan, & Schutze, 2008) and (Powers, 2011). For TerraSAR-X, we chose images for which the ground truth already existed (Dumitru, Schwarz, & Datcu, 2018b) and then we computed the evaluation metrics. The primitive features used to characterize the image patches (with a size of 160 × 160 pixels) were Gabor filters with 5 scales and 6 orientations. Table 6 summarizes the results obtained for EOLib.

Figure 14: Process of semantic annotation of the category "Mixed urban areas" using the KDD module of EOLib. Figures (e) to (h) are from top to bottom. The positive and negative patches are marked in green and red. The patches retrieved by the KDD are marked in blue, while the patches stored from one level to another are marked in purple. Figure (h) shows the outline of the patches in blue for the "Mixed urban areas" category and in green for "Sand".
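The five evaluation metrics named in this subsection can be computed from the four counts of a binary confusion matrix, following their usual definitions. The sketch below assumes per-category counts of true/false positives and negatives; the numbers in the usage line are invented and do not come from Table 6.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy, F-measure, and specificity from binary counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Specificity: fraction of truly negative patches that were rejected.
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "f_measure": f_measure, "specificity": specificity}

# Hypothetical counts for one category out of 100 patches.
metrics = evaluation_metrics(tp=8, fp=2, tn=85, fn=5)
```

Reporting specificity next to recall is useful when a category covers only a small fraction of the patches, since accuracy alone is then dominated by the negatives.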

Evaluations based on user feedback
In 2016, an EOLib User Workshop was organized by the German Aerospace Center (DLR) in order to conclude the EOLib project, to evaluate the latest version of the EOLib system, and to collect user feedback and future requirements for the validation of the system. This workshop was attended by users from the European Space Agency, Italy; Airbus Defence and Space, Germany; the National Research Council, Italy; DEIMOS Engenharia, Portugal; Thales Alenia Space, France; and the DLR, Germany. It proposed applications based on TerraSAR-X and other satellite data (e.g., Copernicus).
The first part of the User Workshop was a presentation of the EOLib system, where the users could interactively ask questions and make observations and comments. In the second part, the overall functionalities of the system were demonstrated in order to show the users what the system looks like, and the functions of the different components. The third part was a training and hands-on session.
During the hands-on interactive session, the users could familiarize themselves with the system, define some semantic categories, and annotate several images. Here, Knowledge Discovery in Databases and Query by Meta-Data and Semantics were tested.
In the end, a round table discussion was organized where the users expressed their opinions, comments, and suggestions about the evaluated system. The feedback provided during the user workshop indicated that the users were very satisfied with the system and its offered functionality, its response times, and the obtained results. The users also proposed a number of additional applications such as ship detection, coastline detection, rapid mapping, etc. They also addressed a change detection scenario highlighting the locations where changes occurred by using the query component of the system. These aspects were already taken into account in Dumitru et al. (2016, 2018a), Dumitru, Cui, Faur, and Datcu (2015), etc. The users also suggested adding further components: the integration of evaluation metrics, and the generation of automated classification maps and statistical analytics. All these observations were taken into account and integrated into the latest version of the system described in (H2020 Candela project, 2019).

Conclusions
In this paper, we introduced the modular EOLib software framework which is capable of performing data mining and knowledge discovery within the TerraSAR-X Payload Ground Segment of the German Aerospace Center. EOLib can serve as a model for the next generation of Image Information Mining systems. The main goal of EOLib is to create a domain-invariant data mining solution, directly attached to the Payload Ground Segment. It supplies the end-user with content-enriched imagery, in the form of taxonomies and meta-data.
Each module of EOLib can be used as a standalone application. We described the main components of the modular EOLib framework covering functions such as ingestion and feature extraction from EO imagery, meta-data extraction, semantic definition of the image content based on machine learning and data mining methods, advanced queries of the image archives utilizing content, meta-data and semantic categories, as well as 3-D visualizations of our huge and complex image archives.
EOLib is interfaced and operated in the Multi-Mission Payload Ground Segment (PGS) of DLR's Remote Sensing Data Center. The dataset size shows that the presented software framework is capable of handling the 3 Vs of big data.
In general, the performance of the EOLib modules is reliable; however, some water-related categories remain difficult to classify for the KDD. This can be overcome by implementing new feature extraction algorithms and classifiers which better suit this specific class.
The experiments show that for TerraSAR-X images the best feature extraction method is Gabor Linear Moments (e.g., with 5 scales and 6 orientations) for man-made infrastructure categories such as urban and industrial areas and transportation, while for natural categories such as agriculture, forest, and vegetation the Adaptive Weber Local Descriptor is the best performer (e.g., with 8 orientations and 18 excitation levels). A comparison between different feature extraction methods is already described in Dumitru et al. (2018b), Dumitru and Datcu (2013), and Cui, Dumitru, and Datcu (2014). For multispectral images the best feature extraction method is the Weber Local Descriptor (e.g., with 8 orientations and 18 excitation levels). For EOLib, the optimal classifier is based on Cascaded Active Learning (Blanchart et al., 2011a).
An extension of EOLib aims at the development of new algorithms, the validation of the system using Copernicus and selected third-party mission data, and the demonstration of leading-edge concepts and methods for information content exploration and utilization for Earth observation data, mainly for Sentinel-1 and Sentinel-2 but also for other third-party Earth observation image data. This new version is used and validated (Dumitru, Schwarz, Castel, Lorenzo, & Datcu, 2019) in the CANDELA project (H2020 Candela project, 2019).