RockSL: an integrated rock spectral library for better global shared services

ABSTRACT Spectral data of different rocks and minerals usually show different waveforms and absorption characteristics in the visible and infrared wavelengths, which allow identification of mineral species and composition. However, the large volume of rock/mineral spectra of the Earth's surface is scattered across a variety of spectral libraries worldwide, with inconsistent data structures and measurement conditions. To advance data interoperability and usability, we collected data and information from six shared libraries with different formats and measured field specimens in the laboratory to establish an integrated rock spectral library (RockSL). Both the data quality of the spectral curves and the integrity of the descriptive metadata are considered in the integrated RockSL, which is published in an open-source GitHub repository. RockSL contains not only a large spectral dataset of rocks and minerals for data services (i.e. data sharing and retrieval) and geological discrimination, but also characteristic datasets of key parameters/metadata (e.g. particle size, mineral composition and full-band signature) for data mining and knowledge discovery. We hope that more researchers will join to improve the availability and practical value of RockSL for the remote sensing community. This article introduces the database structure and data processing workflow, and demonstrates a matching service and several examples of the characteristic datasets of RockSL.


Introduction
Spectral data is measured using spectral sensors, which record either solar or artificially provided radiation reflected from the surface of materials. Since many materials absorb radiation at specific wavelengths, it is possible to identify material species by the characteristic absorption features, which appear as troughs in a spectral curve (Kruse, 1994). Wavelength ranges most suitable for the discrimination of geological materials include the visible and near-infrared (VNIR, 0.3-1.1 μm), short-wavelength infrared (SWIR, 1.1-2.5 μm) and the mid infrared (MIR, 3-25 μm), while the characteristic fluorescence of hydrocarbons occurs in the ultraviolet (UV) spectral region (van der Meer et al., 2012).
Spectral geology deals with the measurement and analysis of portions of the electromagnetic spectrum to identify spectrally distinct and physically significant features of different rock types, which can aid remote sensing image interpretation and the discrimination of mineral composition.
Geological spectral data obtained from laboratory, field, airborne and orbital sensors, together with related metadata, form a spectral library, which provides compositional standards of importance to geological research programs (Kokaly et al., 2017). The existing spectral libraries worldwide can be divided into universal spectral libraries, which emphasize collecting spectra of various materials (e.g. rock, mineral, vegetation, soil, snow, etc.) to support data matching and land cover discrimination, and specialized spectral libraries, which serve a specific field and pay more attention to the influence of different variables (e.g. particle size, waveband, roughness, observation angle, porosity and chemical composition, etc.) on spectral characteristics (Zhang, Xiao, & Wen et al., 2017; Zhou & Zhou, 2009). In this article, we describe and integrate the representative universal libraries (e.g. USGS, JHU, ASTER, Gospel spectral library, etc.) and the geological spectral libraries covering rock/mineral specimens (e.g. JPL, ASU, Mineral Infrared Spectral Atlas, PDS spectral library, etc.), which are shown in Table 1.
The United States Geological Survey (USGS) spectral library, a widely recognized library, was assembled from spectra measured with laboratory, field and airborne spectrometers, covering various natural and artificial materials. The USGS spectral library provides the spectra and metadata in generic ASCII formats for data dissemination and offers compositional standards of significance for research programs executed by the U.S. Geological Survey (Kokaly et al., 2017; Clark, Swayze, & Gallagher et al., 1993; Clark et al., 2007). The Johns Hopkins University (JHU) spectral library includes bidirectional (biconical) reflectance data of minerals/meteorites and directional hemispherical reflectance of rocks, and pays particular attention to data quality in order to provide standard spectra (Meerdink, Hook, Roberts, & Abbott, 2019). To support research with the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), which provides observations in the visible and near-infrared, the shortwave infrared and the thermal infrared bands, the ASTER spectral library was compiled with 2400 spectra of natural and man-made materials contributed by the Jet Propulsion Laboratory (JPL) spectral library, the JHU library and the USGS spectral library (Baldridge, Hook, Grove, & Rivera, 2009). The Ground object background spectral library (Gospel) is the most comprehensive spectral library in China, covering various materials (i.e. rock/mineral, vegetation, water, ice/snow and artificial targets) and spectral datasets with full-band, multi-scale, multi-angle and time-series characteristics (Su, Li, Wang, & Tang, 2003; Zhong & Xiao et al., 2020).
In contrast to the universal spectral libraries, the specialized libraries focus on validating the effect of physical properties, chemical composition and measurement parameters on the sample spectrum. JPL established three spectral libraries of ground-object reflectance according to three particle sizes (i.e. 125-500 μm, 45-125 μm and less than 45 μm) to reflect the influence of particle size on spectral reflectance (Grove, Hook, & Paylor, 1992; Kahle & Goetz, 1981). The Mineral Infrared Spectral Atlas (MISA) was measured and established by the Chinese Academy of Sciences and contains VIS/SWIR/MIR data (0.3-5 μm) for deep mining of infrared information (Peng & Liu, 1982), while the Arizona State University (ASU) thermal infrared spectral library collected the emissivity of terrestrial materials in the 8-14 μm atmospheric window to explore the relationship between kinetic temperature and spectral emissivity (Christensen et al., 2000). The Planetary Data System (PDS) spectral library is a compilation of laboratory spectra of materials from the Earth, the Moon and some meteorites in the visible, near-infrared and mid-infrared ranges, submitted by various data providers (e.g. RELAB, Janice Bishop, JMUSTARD, RVMORRIS and TLROUSH Spectral Library) to provide a basic data platform for researchers in planetary geology (Pelkey, Mustard, & Murchie et al., 2007).
However, the spectra of rocks/minerals stored in different libraries exhibit inconsistent data structures and sharing formats. The accessible spectral libraries are organized as folder systems (e.g. USGS, JPL) or relational databases (e.g. SPECCHIO) and provide spectral data and associated metadata in ASCII or image format, which makes it difficult for users to compare and analyze the data and results in low data interoperability and poor usability (Xie, Zhou, & Wu, 2020). It is important to note that the spectral libraries established by diverse organizations are usually based on local rather than global spectra. The spectral signatures of minerals obtained from shared spectral libraries, such as USGS, JPL and JHU, were sampled in America, which limits their general applicability to global research such as geological studies (Shanshan, Kefa, Nannan, & Wang, 2014; Stelle, Ariza-López, & Ureña-Cámara, 2018; Vignesh & Kiran, 2020). Besides, the spectral libraries mentioned above organize data in different ways and lack a common metadata model, so little effort has gone into the interoperability and interpretation of ancillary data (Hueni, Nieke, Schopfer, Kneubühler, & Itten, 2009; Stelle et al., 2018). To make mineral/rock spectral data FAIR (Findable, Accessible, Interoperable and Reusable) (Rybkina et al., 2018), we unified the data structure and provided a homogeneous metadata framework to produce a comprehensively integrated spectral library at a global scale.
In general, RockSL achieves semantic unification at a global scale and establishes a standardized metadata space for rock/mineral spectral data. It provides a data repository for global or regional geological mapping and field surveying, and characteristic datasets of key parameters (e.g. particle size, wavelength range, etc.) for spectral analysis and comparison. We hope that RockSL, as an integrated spectral library, can provide unified data access for researchers with better data consistency, avoid unnecessary redundant measurement or cumbersome query work, and enrich the content of global geographic information resource service products.

Data sources
The shared spectral data acquired from universal and specialized libraries and the field sampling data were integrated in a relational database management system (i.e. SQL Server). The Structured Query Language (SQL) allows users to manipulate and query the spectral data in RockSL. The shared data was mainly downloaded from the USGS, JHU, JPL, PDS and ASU spectral libraries and MISA, and existed as ASCII, HDF or image files (Table 1). The constituent data was distributed and stored in Chinese or English, with different formats and related parameters (Xie et al., 2020). Besides, the sampling data measured in the laboratory is also an important portion of RockSL and consists of the spectral signature and related metadata (e.g. physical and chemical attributes of rock/mineral samples, measurement conditions, spatial/temporal information of the sampling process). The specimens, including magmatic, metamorphic and sedimentary rocks and iron and coal minerals, were collected from the central and northern regions of China. The processing workflow of field spectroscopy includes mineral composition analysis, sample preparation, spectral measurement and data storage. The mineral composition and content of the specimens were analyzed through thin section identification with an Axioskop40 microscope. After crushing and grinding, specimens with different particle sizes were measured with an SVC HR-1024i field spectroradiometer (0.35-2.5 μm) and a 102F Fourier transform infrared (FTIR) spectroradiometer (0.4-20 μm) (Song, Liu, Yu, Mao, & Wu, 2017; Wang, Liu, Mao, Wang, & Tian-Zi, 2018). By means of data integration, RockSL provides digitized spectral data in a uniform format, as well as the data in the original format downloaded from each spectral library, for further research and application.

Metadata of spectra collections
The reflection spectrum of a rock/mineral is mainly affected by its chemical composition, mineral purity and crystal structure. In addition to these intrinsic factors, spectral data is affected by many variable factors such as sample granularity, roughness, observation method and sample composition, which are expressed as metadata of the spectral data. Metadata, as a central component in the quality and reliability of spectral data, contains further information on the sampling environment and measurement conditions, which is important to support the explanation of scientific data and ensure long-term data usability and exchange (Michener & Brunt, 2009; Rasaiah, Malthus, Jones, & Bellman, 2012). Little effort has so far been made to provide a standard metadata model that facilitates spectral data interoperability. To improve the universality and accuracy of the metadata, we referred to the main documents from the International Organization for Standardization (ISO) and the Quality Assurance Framework for Earth Observation (QA4EO). We also drew on the work of peers, including the general proposal endorsed by the Committee on Earth Observation Satellites (CEOS) and the protocols for recording metadata in field spectroscopy that were identified in an international experiment (Rasaiah, Jones, Bellman, Malthus, & Hueni, 2015; Stelle et al., 2018). The metadata of a spectral resource can be categorized into four types of variables: quantitative (e.g. sampling position, measurement angle, particle size), categorical/qualitative (e.g. specimen species), alphanumeric string (e.g. specimen description) and pictorial (e.g. the target images) (Hueni et al., 2009). Based on the characteristics of the shared data and sampled data, we established the customized metadata space of RockSL (Table 2), which helps users to retrieve target data quickly and analyze the intrinsic laws of spectral data. Note that each data record was rated according to the integrity of its related metadata.
Data with higher integrity (i.e. higher data level) can provide a more reliable reference for users to accurately discriminate unknown objects and can be applied in deeper analysis to explore the internal relationship between parameters and data.

Data structure
The RockSL data model, which forms the core and basis of the database system, consists of data structure, data manipulation and integrity constraints, which refer to the static characteristics (e.g. data type, content and relationship), the dynamic behavior (e.g. data retrieval, modification) and the constraint conditions of the relational data tables, respectively. We considered the performance of the data structure to reduce data redundancy and designed appropriate constraints to ensure the consistency and correctness of data storage. The relational model, which is the mainstream database structure at present, was applied to RockSL. The data structure of RockSL (shown in Figure 1) contains several relational data tables mainly used to save reference spectral data with related parameters, attribute data, classification codes of rocks and minerals, and the specific information of spectrometers. The relational table of rock/mineral codes, as the main relational table, was designed to store semantic and classification content for the consolidation of data from diverse regions and to facilitate rapid retrieval. The database structure of RockSL is in the third normal form (3NF), which removes data redundancy (Mcfadden & Hoffer, 1988). Besides, the referential integrity between tables was established through primary keys and foreign keys, which guarantee the data consistency of the associated tables. It is important to note that the spectral data is stored in the relational table of data pre-storage before data cleaning, transformation and assessment.

Data processing
The processing workflow of RockSL, including data acquisition, transformation, quality control and data storage, is described in Figure 2. The shared libraries distributed spectral data in varying formats (e.g. ASCII, image file, etc.). The spectral data and related metadata in ASCII format can be extracted and imported directly into RockSL. However, some spectral libraries (e.g. MISA) stored spectral data in the form of curve images instead of directly providing digital spectral curves. The thinning and non-thinning algorithms illustrated in our previous work (Xie et al., 2020) can effectively extract digital curves from image files (Figure 3). These methods of data acquisition make it possible to establish a more digital and consistent dataset. Before the spectral curves and their descriptive parameters are imported into RockSL, the relevant information of the target data in the relational tables (i.e. mineral/rock code, mineral/rock type and instrument information) of RockSL should be completed. Then the target curves with different measurement units (e.g. reflectance, emissivity) and wavelength units (e.g. nanometer, micrometer and wavenumber) were transformed to a unified format based on wavelength unit conversion and Kirchhoff's law of thermal radiation (reflectivity = 1 − emissivity, which holds for opaque samples). The reflectance spectral curve in nanometer units was used as the standard format for RockSL. The spectral data stored in the relational table of data pre-storage was screened through different methods of data quality checking and assessed for metadata integrity against the metadata space (Table 2). After the data cleaning and evaluation explained in Section 4, the target data was imported from the pre-storage table into the final table of spectral data.
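The transformation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RockSL implementation: the function names, the unit labels (`"nm"`, `"um"`, `"cm-1"`) and the choice to sort the output by increasing wavelength are all hypothetical. The two relations it relies on are standard: wavelength in nm equals 10^7 divided by wavenumber in cm⁻¹, and reflectivity = 1 − emissivity.

```python
import numpy as np


def to_nanometers(axis, unit):
    """Convert a spectral axis to nanometers (unit labels are hypothetical)."""
    axis = np.asarray(axis, dtype=float)
    if unit == "nm":
        return axis
    if unit == "um":
        return axis * 1000.0          # 1 micrometer = 1000 nm
    if unit == "cm-1":
        return 1.0e7 / axis           # wavelength [nm] = 1e7 / wavenumber [cm^-1]
    raise ValueError(f"unsupported wavelength unit: {unit}")


def to_reflectance(values, quantity):
    """Convert a measured quantity to reflectance via Kirchhoff's law."""
    values = np.asarray(values, dtype=float)
    if quantity == "reflectance":
        return values
    if quantity == "emissivity":
        return 1.0 - values           # reflectivity = 1 - emissivity
    raise ValueError(f"unsupported quantity: {quantity}")


def standardize(axis, values, unit, quantity):
    """Return (wavelength_nm, reflectance), sorted by increasing wavelength."""
    wl = to_nanometers(axis, unit)
    refl = to_reflectance(values, quantity)
    order = np.argsort(wl)            # wavenumber axes reverse when converted
    return wl[order], refl[order]
```

For instance, an emissivity curve sampled at 1000 and 2000 cm⁻¹ would come back as a reflectance curve at 5000 and 10000 nm.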

Data record and application
All minerals are classified and coded based on their chemical composition and crystal structure (i.e. the crystallochemical principle), which is logical and widely accepted. According to this principle, the minerals/rocks in RockSL are first divided into parent nodes according to their dominant anion or anionic group, which include elements, sulfides, oxides and hydroxides, halides, silicates, other oxygen-containing salt minerals, mixed minerals (e.g. igneous, sedimentary and metamorphic rock) and energy resources (e.g. coal). The parent nodes are subdivided partly based on composition but mainly according to intrinsic structure (e.g. the child nodes of silicates include framework, chain, ring, island and sheet silicates). Following this classification principle, RockSL contains more than 130 rock/mineral groups, about 200 sample types and 3000 spectral data records. The main components of RockSL are silicates (more than 1800 records), other oxygenated salts (nearly 400 records), oxides and hydroxides (more than 200 records) and rock samples (more than 200 records). The large volume of spectral data in RockSL, covering various types with complete ancillary data, provides strong support for spectral matching and easy comparison in the discrimination of unknown rocks/minerals. More types of spectral data will be collected later from other available sources and laboratory measurement to improve the applicability in mineral identification. The characteristic spectral datasets were extracted from the large dataset based on metadata selection. In general, RockSL provides not only a large dataset for the data matching service, but also multiple characteristic spectral datasets for knowledge discovery.

Matching service
Spectral matching compares the spectrum of an unknown sample with reference spectra to identify the sample category, and relies on both mathematical algorithms and a reference spectral library. The algorithms, which consist of preprocessing algorithms and spectral matching methods, are designed to reduce data noise and improve matching efficiency. Researchers have studied various spectral preprocessing methods (e.g. spectral differentiation, continuum removal, etc.) to enhance absorption features and improve the prediction accuracy of spectral matching models (van der Meer, 2004; Whitaker & Pigford, 1960). Since RockSL contains spectra with different measurement information (i.e. spectral range and spectral resolution), the reference spectrum must be automatically resampled according to the target spectral information during matching. The matching algorithms calculate the similarity between the unknown spectrum and the reference spectrum based on complete waveforms or selected spectral characteristics, and include Binary Encoding (BE), Spectral Angle Matching (SAM), Spectral Correlation Fitting (SCF), Spectral Information Divergence (SID), and so on (Goetz, Vane, Solomon, & Rock, 1985; Kruse et al., 1993; Mcglone & Shufelt, 1994; Noronha & Nevatia, 2001). We also consider the proportion of the matching range to the target spectral range as a key indicator of matching reliability. The spectral matching service is embedded in the operating system of RockSL and contains preprocessing methods (i.e. first-order and second-order differentiation, and continuum removal) and matching algorithms (i.e. BE, SAM, SCF, SID and SID_SA). A quartz specimen was measured in the laboratory with a portable FTIR spectroradiometer ranging from 0.2 to 20 µm. The better-quality sampling data in the thermal infrared range (8-14 μm) was selected for comparison with the reference spectra in RockSL.
We applied various preprocessing and matching algorithms to carry out the matching service and listed three reference spectra with high matching similarity (Table 3). The customized algorithm (BC_SA_SID_SCF), which calculates a total score (e.g. the sum of similarity coefficients) from four matching methods (BE, SAM, SID and SCF), was applied in the operating module to discriminate the unknown object (Figure 4). The matching results demonstrate that RockSL can be used to identify unknown minerals accurately and effectively. The target spectrum has high matching similarity with the reference spectra (quartz) from different libraries (i.e. the ASU and PDS spectral libraries) with different measurement parameters, which improves the reliability of the recognition result.
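To make the matching step more concrete, the following Python sketch shows how a reference spectrum might be resampled onto the target wavelengths and then compared with two of the named measures, SAM and SID, in their textbook forms. It is an illustration only, not the RockSL service: the function names are hypothetical, linear interpolation is an assumed resampling choice, the reference wavelengths are assumed to be sorted in increasing order, and the coverage ratio (share of target samples that fall inside the reference range) is just one possible reading of "the proportion of the matching range to the target spectral range".

```python
import numpy as np


def resample_reference(ref_wl, ref_vals, target_wl):
    """Interpolate a reference spectrum onto the target wavelengths.

    Returns a boolean mask of target samples covered by the reference,
    the interpolated reference values at those samples, and the coverage
    ratio. Assumes ref_wl is sorted in increasing order.
    """
    ref_wl = np.asarray(ref_wl, dtype=float)
    ref_vals = np.asarray(ref_vals, dtype=float)
    target_wl = np.asarray(target_wl, dtype=float)
    inside = (target_wl >= ref_wl[0]) & (target_wl <= ref_wl[-1])
    resampled = np.interp(target_wl[inside], ref_wl, ref_vals)
    return inside, resampled, float(inside.mean())


def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra; 0 means identical shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def spectral_information_divergence(a, b, eps=1e-12):
    """Symmetric relative entropy between two positive spectra."""
    p = np.asarray(a, dtype=float)
    q = np.asarray(b, dtype=float)
    p = np.clip(p / p.sum(), eps, None)   # normalize to probability vectors
    q = np.clip(q / q.sum(), eps, None)
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
```

Both measures are insensitive to a uniform scaling of the spectrum, so a reference that differs from the target only in overall brightness scores as a near-perfect match; the coverage ratio can then be reported alongside the score so that matches based on a small overlap are treated with caution.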

Spectral datasets
Spectra of minerals and rocks are affected by chemical composition, observation geometry (e.g. observed angle, distance, etc.), and surface morphology (e.g. particle size, roughness, etc.). The chemical composition of samples not only causes the change of spectral reflection intensity, band position and absorption depth, but also causes the emergence of new characteristic bands due to the generation of new ions in rocks and minerals. In addition, the reflection spectrum is also changed with external environment and surface characteristics, which manifested as the change of reflectance value, absorption width, absorption depth and the shape of spectral curve.
The influencing factors mentioned above were recorded in the metadata space (Table 2), which supports the retrieval of target data and the spectral analysis of characteristics. The shared data and sampling data in RockSL were refined to obtain characteristic data collections by projecting the metadata space onto a subspace (Figure 5). These characteristic datasets of minerals and related metadata can be established using Structured Query Language (SQL), as illustrated in Table 4. Three refined datasets were selected as examples based on key parameters (i.e. particle size, wavelength range, chemical composition) and are elaborated in detail below.
Table 3. Matching results of spectral data of quartz specimen and reference spectra in RockSL.
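As a rough illustration of how such a subset could be pulled out with SQL, the Python sketch below builds a tiny in-memory SQLite database and selects the spectra of one mineral that have a recorded particle size. Everything specific here is an assumption made for the example: the table names (`rock_code`, `spectrum`), the column names, the codes and the sample rows are invented and do not reflect the actual RockSL schema, and SQLite merely stands in for whichever database engine is used.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rock_code (
            code TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE spectrum (
            id INTEGER PRIMARY KEY,
            code TEXT NOT NULL REFERENCES rock_code(code),
            source TEXT NOT NULL,
            particle_size_um REAL          -- NULL when not recorded
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO rock_code VALUES (?, ?)",
        [("OX01", "hematite"), ("SI01", "quartz")],
    )
    conn.executemany(
        "INSERT INTO spectrum (code, source, particle_size_um) VALUES (?, ?, ?)",
        [
            ("OX01", "lab", 1000.0),
            ("OX01", "lab", 30.0),
            ("OX01", "shared", None),
            ("SI01", "shared", 45.0),
        ],
    )
    return conn


def particle_size_subset(conn, mineral_name):
    """Spectra of one mineral with a recorded particle size, finest first."""
    query = """
        SELECT s.id, s.source, s.particle_size_um
        FROM spectrum AS s
        JOIN rock_code AS r ON r.code = s.code
        WHERE r.name = ? AND s.particle_size_um IS NOT NULL
        ORDER BY s.particle_size_um
    """
    return conn.execute(query, (mineral_name,)).fetchall()
```

Filtering on a metadata column in the `WHERE` clause is the SQL counterpart of projecting the metadata space onto a subspace: each characteristic dataset corresponds to a different set of conditions on the metadata.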

Particle size
Particle size is an important variable of surface morphology affecting the spectral reflection characteristics of rocks and minerals, and the influence of particle size on the spectral characteristics depends on the type of specimen. Yang (1987) found that the reflection spectrum characteristics of rocks/minerals are related to surface states and that some bands are more sensitive to the influence of particle size. Okin and Painter (2004) studied the relationship between the reflectance spectra and particle size of montmorillonite and quartz, and found that the reflectance gradually increased as particle size decreased within the range of 50-750 μm. Ma and Sheng et al. (2015) proposed that the sample spectrum is less affected by particle size when the particle size is greater than 1 mm (threshold value). In the VNIR-SWIR region, the reflectivity of most rocks increases as particle size decreases, while the reflectivity of black (dark) rocks decreases as particles become finer. In the thermal region, the reflectance decreases with decreasing particle size, consistent with the observation that the spectral contrast of fundamental molecular vibration bands appears to decrease with decreasing particle size (Salisbury & Eastes, 1985). A smaller particle size reduces the porosity of the sample and increases the number of particles in the field of view, which changes the reflection or absorption values. Since the relationship between particle size and spectral signature is highly dependent on rock mineral composition and spectral region, researchers need comprehensive datasets to explore its universal laws.
In this article, we choose the spectral data of iron mineral samples with different particle sizes in RockSL as a demonstration, which was contributed by the collaborating team from Northeastern University (Wang et al., 2018). The comparison shows that particle size has a marked effect on the reflection spectrum of hematite (Figure 6). The spectral reflectance decreases as particle size increases from 0.03 mm to 1 mm, which shows a significant negative correlation between reflectance and particle size (Wang et al., 2018). Besides, the effect of particle size differs between wave bands, which can be divided into a stable band (i.e. 350-950 nm) and a sensitive band (i.e. 950-1250 nm). When the particle size of the hematite sample is greater than 1 mm, the reflection spectra overlap, which indicates that the effect of particle size is significantly weakened.

Full bands
Different bands have different response mechanisms to the groups and ions of minerals. The visible and near-infrared (0.3-1.1 μm) band is mainly used to detect the electronic processes of some metal ions, while the short-wave infrared (1.1-2.5 μm) band and the mid-thermal infrared (2.5-14 μm) region are mainly used to detect the molecular vibrations of water-containing and hydroxyl-bearing minerals and of hydroxyl-free minerals (e.g. carbonates, silicates), respectively (van der Meer et al., 2012).
Both the universal spectral libraries and the specialized spectral libraries focused on the spectral information of disparate bands and measured the sample spectra with different spectral resolutions. The spectral data downloaded from the USGS spectral library was measured with different instruments and is provided in four bands of 0.2-3.0 μm, 1.5-6.0 μm, 5-25 μm and 25-200 μm, respectively. The data stored in the JPL spectral library ranges from 0.4 to 2.5 μm with a spectral resolution of 1 nm (0.4-0.8 μm) and 4 nm (0.8-2.5 μm). The spectral data stored in the Johns Hopkins University (JHU) spectral library consists of rock spectra recorded from 2.08 to 25 μm and mineral spectra recorded from 0.4 to 14 μm. The PDS Spectral Library is a multisource contribution of several spectral libraries (e.g. CRISM, Janice Bishop, JMUSTARD, RVMORRIS and TLROUSH Spectral Library), in which the spectral data ranges from 0.3 to 26 μm. The Arizona State University (ASU) thermal infrared spectral library acquired the emissivity of terrestrial materials in the 8-14 μm atmospheric window. MISA, which contains 583 images of mineral spectra, ranges from 0.25 to 5.0 μm. The data measured with the FTIR spectroradiometer in the laboratory ranges from 0.4 to 20 μm with a spectral resolution of 6 cm⁻¹ (wavenumber). The shared data and measured data can be integrated to build an approximately full-band dataset for exploring the complete feature combination of rock and mineral spectra. The spectra of calcite extracted from the full-band dataset are shown in Figure 7. Calcite, a carbonate mineral with the chemical composition CaCO₃, has diagnostic absorption features in the VNIR, MIR and FIR regions because of the combination and overtone (harmonic) bands of the vibrations of the C-O bond in the CO₃²⁻ ion (Gupta, 2003; Hunt & Salisbury, 1971).
The calcite spectrum shows prominent absorption features in the wavelength ranges of 2.50-2.55 µm (4000-3922 cm⁻¹) and 2.30-2.35 µm (4348-4255 cm⁻¹) in the VNIR (Clark et al., 1990; Gaffey, 1986), around 13.70-14.04 µm (730-712 cm⁻¹) and 11.19-11.40 µm (894-877 cm⁻¹) in the MIR (Lane & Christensen, 1997), and two strong separate absorptions at around 110 cm⁻¹ and 228 cm⁻¹ in the FIR (Farmer, 1974; Legodi, Waal, & Potgieter, 2001).

Mineral composition
Material composition is an intrinsic factor affecting the rock spectrum. The mineral spectrum depends on three basic characteristics of mineral composition: the chemical elements, the spatial geometry or structure of the atoms, and the strength of the interatomic forces. Different mineral contents and mineral assemblages produce a unique set of spectral features. A spectral dataset on mineral composition and content can be used to explore and verify the relationship between reflectivity, characteristic band intensity and mineral materials.
Figure 6. The reflectance spectra of hematite with different particle sizes. Spectral data from NEU measured in laboratory (Wang et al., 2018).
To examine the impact of mineral content and chemical composition, calcite specimens mixed with kaolinite, montmorillonite and dolomite were selected as an example from the characteristic dataset of mineral composition (Figure 8). The graph shows that mineral composition has a significant influence on the characteristic spectral segments, which is reflected in the prominent absorption feature at around 1.4 µm due to the vibration of OH groups in the silicate minerals (i.e. kaolinite, montmorillonite) relative to dolomite (CaMg(CO₃)₂) (Hunt & Salisbury, 1970). A comparison of the calcite-kaolinite and calcite-montmorillonite curves shows that the calcite-montmorillonite spectrum has a deep and wide absorption band at around 1.9 µm because of the interlayer water. In addition, the calcite-kaolinite sample with higher calcite content has wider and deeper absorption features at 2.34 and 2.5 µm.

Technical validation
The process of quality control was designed to evaluate spectral data in two dimensions: data integrity and validity. Data integrity requires the completeness of the metadata, and data validity measures the justifiability of the spectral signature. The methods of quality control were designed for single and multiple spectral curves and were illustrated briefly in a previous work (Xie et al., 2020). Data of poor quality was eliminated beforehand, and the qualified data was imported into RockSL after data checking. To evaluate data integrity and improve data reusability, each record in RockSL is rated with reference to the metadata model (Table 2). Records lacking common basic parameters were rated as medium, and records lacking both common basic parameters and measurement information were rated as low.
From the perspective of data validity, a semi-automatic validation workflow was designed to evaluate data quality. The physical thresholds of reflectance or emissivity (i.e. 0 and 1.0) and the boxplot algorithm were applied to detect outliers in a single spectral curve automatically. The reliability and availability of the data are greatly reduced when the number of outliers exceeds a certain proportion, which can be evaluated through the boxplot algorithm. The boxplot algorithm displays the distribution of a series of points and finds outliers using statistics (e.g. median and quartiles). It is built on two quartiles of the data, denoted Q2 (the value at the 25th percentile) and Q3 (the value at the 75th percentile). For example, the spectral dataset of hematite downloaded from the RVMORRIS spectral library (Lane, Morris, Mertzman, & Christensen, 2002) was tested with the boxplot algorithm to show the distribution of the spectral curves (Figure 9). A spectrum was deemed unqualified when the proportion of abnormal points exceeded the threshold (set to 0.4). However, because published data is valuable, the abnormal data was evaluated again by visual inspection to avoid incorrect culling. The semi-automatic validation workflow can ensure the quality of the integrated dataset to a certain extent.
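A minimal Python sketch of this single-curve check is given below. It is not the RockSL code and rests on several assumptions: the fences follow the conventional Tukey form (25th percentile minus k times the interquartile range, 75th percentile plus k times the interquartile range) with the customary k = 1.5, which the text does not state; the physical limits 0 and 1.0 and the 0.4 proportion come from the paragraph above; and the function names are hypothetical.

```python
import numpy as np


def outlier_fraction(values, k=1.5, lower=0.0, upper=1.0):
    """Fraction of points flagged by the boxplot fences or physical limits.

    k = 1.5 is the conventional Tukey multiplier (an assumption here).
    """
    values = np.asarray(values, dtype=float)
    q25, q75 = np.percentile(values, [25, 75])
    iqr = q75 - q25
    beyond_fence = (values < q25 - k * iqr) | (values > q75 + k * iqr)
    non_physical = (values < lower) | (values > upper)
    return float(np.mean(beyond_fence | non_physical))


def is_qualified(values, max_fraction=0.4):
    """A curve passes the automatic check if few enough points are flagged."""
    return outlier_fraction(values) <= max_fraction
```

Curves that fail the automatic check would then be passed on for visual inspection rather than being discarded outright, in line with the semi-automatic workflow described above.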
To ensure the stability and accuracy of the spectral sampling data, each target specimen is measured repeatedly with the same spectrometer. Curves with obvious anomalies in multiple groups of similar data can be accurately identified by manual observation, but this is time-consuming. Indicators (Xie et al., 2020) were therefore designed to automatically evaluate the data quality of multiple similar spectral curves observed in our laboratory, including (1) the accuracy of internal conformity, which indicates the deviation between the target spectrum and the average spectrum, and (2) the position offset of the main absorption peak. The spectral sampling data contributed by NEU has been successfully applied to mine area monitoring and coal extraction (Mao et al., 2014). The formula of internal conformity is

$$\varepsilon = \pm\sqrt{\frac{\sum_{j=1}^{m}\left(\sum_{i=1}^{n}\delta_{ij}^{2}\right)}{m \times n}} \qquad (4)$$

where $\delta_{ij}$ = the deviation between the target spectrum and the average spectrum at the corresponding point, as described in (1); $F_{ij}$ = the observed reflectivity at the corresponding point; $m$ = the number of repeated observations; and $n$ = the number of curve points involved in the calculation. Multiple similar spectra (i.e. the spectra of a monzonite specimen measured with the 102F FTIR) and a noise spectrum were tested for the accuracy of internal conformity (Figure 10).
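Equation (4) can be evaluated with a few lines of Python, sketched here under one explicit assumption: the deviation δ_ij is taken as the difference between the j-th repeated observation F_ij and the mean of all m observations at point i, which is how indicator (1) is described but is not spelled out as a formula in the text. The function name and the array layout (one row per repeated observation) are hypothetical, and only the magnitude of ε is returned.

```python
import numpy as np


def internal_conformity(observations):
    """Magnitude of the internal conformity for repeated spectra.

    observations: array of shape (m, n), with m repeated measurements of
    the same specimen sampled at the same n curve points.
    """
    F = np.asarray(observations, dtype=float)
    m, n = F.shape
    mean_spectrum = F.mean(axis=0)        # average spectrum over the m repeats
    delta = F - mean_spectrum             # assumed definition of delta_ij
    return float(np.sqrt(np.sum(delta ** 2) / (m * n)))
```

A small value means the repeated curves agree closely with their average; adding a noisy or anomalous curve to the set raises the value, which is what makes the indicator usable for automatic screening.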

Usage notes
As stated earlier, shared data in diverse formats (i.e. ASCII and image files) from several representative spectral libraries (i.e. USGS, JHU, JPL, ASU, PDS and MISA) and sampling spectra of rocks/minerals were checked and collated to construct a comprehensive spectral library in a more digital and more consistent format. The spectral data is stored in the corresponding relational table in the form of row records. The integrated rock spectral library (RockSL) contains not only the large spectral dataset for unified access, quick retrieval and the matching service, but also the characteristic datasets of key parameters/metadata (e.g. particle size, mineral composition and full-band signature) for data mining and knowledge discovery. An operating software system for RockSL was developed for data management, information retrieval and user applications. It can be used for the import of open-source datasets with related metadata, the vectorization of spectral curve images, the query of attribute data and spectral data, the quantitative analysis of spectral data and the matching of unknown mineral and rock spectra (Figure 4). The spectral data in RockSL is published as a database file (.db), which includes several relational data tables of rock/mineral spectral data and related auxiliary data. The database file can be managed and operated through SQL, which helps users to obtain target data quickly and establish a complete dataset according to their own needs. Users can also develop their own data management and analysis software based on the shared database file. The integrated library product (RockSL), together with the operating software, was released and is shared on GitHub (https://github.com/CSU-PCP-XBS/spectral-dataset-RockSL). GitHub is a hosting platform for open-source and proprietary software projects, which helps users to share data and code.
Publishing RockSL on GitHub facilitates data release and sharing with communities and allows users to follow data updates and give feedback on problems with the data services. We hope that more researchers can join us and contribute customized data to RockSL to improve the availability and practical value of the data for global communities.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was jointly supported by the Key Program of National Nature Science Foundation of China

Notes on contributors
Busheng Xie is a Ph.D. student in Geo-science and Info-Physics school of Central South University. He obtained his bachelor's degree in Geomatics Engineering from Central South University in China. His current research focuses on data organization, analysis and visualization of hyperspectral remote sensing and geo-science.