Comparative study of big data of global adakites and mineralization-related granite in the Geza arc metallogenic belt, northwest Yunnan, Southwest China

ABSTRACT The Geza arc is an important part of the Sanjiang tectono-magmatic belt and is a newly discovered copper polymetallic ore concentration area in northwest Yunnan province, Southwest China. The area comprises numerous metal ore deposits, including one super-large deposit, three large deposits, etc. The formation of these deposits was closely related to intermediate–acidic magmatic intrusions. Based on previous studies, the “big data” analysis technique was used for a comparative study of large geochemical datasets of granite related to ore-formation in the Geza porphyry copper deposit and global adakites. As a result, 1313 element combinations and 127,765 overlap ratios were obtained. The results show that the Geza porphyry has similar geochemical characteristics to global adakites (the ratios of REE and Ga to major elements are in the range of global adakites). However, the Cu, Mo, and Zn contents of the porphyry are significantly higher than those of global adakites, and the porphyry may, therefore, represent an end-member of the global range of adakite composition. In addition, the geochemistry of adakites associated with the porphyry copper deposits overlaps in part with that of global adakites, although most of the data lie outside of the range of global adakites (i.e. low Mn/Cu, Sr/Cu, Na/Cu, and Zr/Cu values, and high Th/Cu, Ba/Cu, Na/Mo, Rb/Mo, Th/Mo, Ta/Mo, Ba/Mo, Mn/Zn, and Ba/Zn values). The samples with characteristics that deviate significantly from the geochemistry of global adakites show more advanced mineralization and alteration, and a stronger relationship with Cu and Mo mineralization. The results of geochemical data mining can be used as a prospecting indicator, and provide a new scientific basis for geological prospecting of the deep levels and periphery of the Geza Cu polymetallic ore belt.


Introduction
The Yidun arc is located in the eastern part of the Qinghai-Tibet Plateau, between the Garzê-Litang Fault Zone and the Dege-Zhongdian Block on the southwestern edge of the Yangtze Block. The arc is the product of westward subduction along the Garzê-Litang zone in the southwest Sanjiang Paleo-Tethys. It is an important arc belt for studying the subsidence of the Garzê-Litang ocean basin and closure of the East Tethys, and an important Cu polymetallic ore belt (comprising non-ferrous metal mineral resources) in western China. The westward subduction of the Garzê-Litang oceanic crust commenced in the earliest Early Triassic, followed by westward subduction beneath the Zhongza-Zhongdian microcontinental block during the latest Late Triassic (Li, 2007; Liu, Li, Zhang, & Yang, 2017; Li, Yu, & Yin, 2013; Li, Zeng, Hou, & White, 2011; Liu, Li, & Yin, 2012; Yang, Hou, Huang, & Qu, 2002; Zeng, Li, Wang, & Li, 2006).
Frequent magmatic activity has occurred in the Geza area of northwest Yunnan. Large amounts of geochemical data have been collected from magmatic rocks in the area during long-term geological exploration. These rocks have I-type granite geochemistry with an adakitic affinity; however, only some are related to copper-gold mineralization. The large amounts of available data have not yet been effectively exploited, which limits our assessment of the resource potential of porphyry Cu deposits in the region and thus the efficiency of geological prospecting. Applications of geological big data include big data-based intelligent metallogenic analysis and the identification of prospecting signs. At present, intelligent geology is still at an early stage, and big data-based intelligent metallogenic analysis and mineral prospecting form an important part of it (Zhang & Zhou, 2017; Zhou, Chen et al., 2018). This paper studies the geochemical data (including elements related and unrelated to mineralization) and discusses the reliability of geochemical indicators of ore deposits, thereby providing a scientific basis for the study and prospecting of these deposits.

Regional geological background

Regional structural evolution
The study area is located on the eastern margin of the Dege-Zhongdian Block in the western Garzê-Litang Fault Zone, and is bounded by the NNW-trending Jinsha River Fault Zone (Figure 1). A carbonate platform developed in the area during the late Paleozoic, in a passive continental margin setting associated with the development of the Garzê-Litang ocean basin. An active continental margin formed along the Garzê-Litang subduction zone during the latest Middle-Late Triassic. The Late Triassic represents the stage of magmatic arc development, which involved deposition of a great thickness of clastic-carbonate-volcanic rocks. These rocks are divided into the Qugasi (T3q), Tumugou (T3t), and Lamaya (T3lm) formations, which comprise mainly sandy slate and limestone, basaltic andesite to andesite, and dacite, respectively. These Late Triassic formations make up the wall rock of the intermediate-acidic mineralized porphyry in the region.
Early tensile fractures in the area trend NW-SE. The distribution of Late Triassic intermediate-acid volcanic rocks (related to the shallow porphyry and porphyrite) is controlled by NW- and NE-trending faults. The Jurassic-Cretaceous was marked by intracontinental convergence and post-orogenic extension. Large-scale collision-type intermediate-acidic magmatic intrusions developed during this time, comprising mainly monzogranite and remelted continental crust granite. Magmatic intrusions during the Late Cretaceous were accompanied by Mo, W, and Cu mineralization events.

Regional geochemistry and geophysical features
The Cu content of the Geza intrusive belt is generally higher than that of the host strata. There is therefore a high background level of Cu in areas where magmatic rocks are present, compared with a low background level in strata without magmatic intrusions. The small porphyry intrusions (units γδπ, ηοπ, λπ, etc. in Figure 1) in the larger intermediate-acid composite body (unit δομ; Figure 1) have high contents of Cu, W, Mo, Au, and Ag, indicating that the area contains Late Triassic-Late Cretaceous porphyries with favorable ore-bearing properties. These geochemical characteristics are beneficial for the formation of porphyry (or skarn) type copper deposits. The Late Cretaceous biotite monzogranite (unit ηγ, ηγ; Figure 1) is characterized by high background contents of W, Mo, and Cu. Monzonitic granite-porphyry has a W content of 27.35 × 10⁻⁶ and a Mo content of 9.4 × 10⁻⁶. Biotite monzogranite has a W content of 8.87 × 10⁻⁶ and a Mo content of 2.66 × 10⁻⁶. These contents are much higher than those of the host strata, indicating that these magmatic intrusions generated suitable conditions for metallogenesis. These rocks may therefore represent the ore-forming source of the Xiuwacu and Relin quartz-vein-type W-Mo polymetallic deposits in the region.
The Geza arc is located near an N-S-trending low-gravity anomaly and a low-amplitude positive magnetic anomaly, consistent with the moderate-intensity local anomalies of the intermediate-acid volcanic alteration zone, which show a beaded distribution along the trend. The Cu polymetallic deposits are distributed along the edges of elliptical positive magnetic anomalies (0-100 nT) with east-west long axes, or in transition zones between positive and negative magnetic anomalies, centered on Hongshan. Interpretation of remote sensing images revealed that a line-and-ring structure is developed in the area, comprising medium- and small-scale ring structures, magmatic rings, and volcanic rings that overlap each other, with an approximately N-S-trending beaded distribution. Comprehensive analysis of regional gravity, aeromagnetic, geological, and remote sensing data indicates the occurrence of numerous buried intermediate-acidic magmatic rocks in the low-gravity belt. The Xiuwacu, Relin, Hongshan, Chundu, and A're rocks are distributed in the area (listed from north to south), and are important in exploration for porphyry Cu deposits and skarn-type Cu polymetallic deposits.

Data sources and data volume
Data for all Cenozoic granites (SiO2 > 56%) were collected from both the GEOROC and PetDB databases, and data duplicated between the two databases were deleted (Supplementary Table 1). Adakites were identified by Sr > 400 × 10⁻⁶, Yb < 2 × 10⁻⁶, and Y < 20 × 10⁻⁶, rather than by the Sr/Y ratio commonly used in the literature, because Sr/Y is a derived indicator whereas element contents are primary indicators. Most samples in the databases have both Y and Yb data, but some have Y and no Yb. Therefore, when selecting samples, Yb was used first as the marker, and Y (Y < 20 × 10⁻⁶) was then used for those samples without Yb, yielding a total of 6954 samples of global adakites.
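
The selection rule described above can be sketched in Python as follows. This is a minimal illustration rather than the program used in the study: the function name `is_adakite` and the dictionary keys are assumptions, contents are taken to be in ppm (10⁻⁶), and the sketch assumes that Yb alone decides the heavy-element criterion whenever Yb is reported, with Y used only as a fallback.

```python
from typing import Mapping, Optional


def is_adakite(sample: Mapping[str, Optional[float]]) -> bool:
    """Apply the Sr/Yb/Y screening rule to one sample (contents in ppm).

    Yb is used as the marker when it is reported; Y is the fallback
    for samples that lack Yb. Key names are illustrative assumptions.
    """
    sr = sample.get("Sr")
    yb = sample.get("Yb")
    y = sample.get("Y")
    # Sr > 400 ppm is required in every case.
    if sr is None or sr <= 400:
        return False
    # Prefer Yb when available.
    if yb is not None:
        return yb < 2
    # Otherwise fall back to Y.
    if y is not None:
        return y < 20
    # Neither Yb nor Y reported: the sample cannot be classified.
    return False
```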

Data filtering principles
① Retain data with SiO2 contents greater than 56% and less than 90%; ② exclude samples with abnormally high major-element contents, such as Fe2O3 > 30% or abnormally high MnO; and ③ remove abnormally high trace-element values (the remaining samples are retained).
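
A minimal sketch of these three filtering rules is given below. The function name `passes_filters` and the parameters `mno_max` and `trace_max` are hypothetical: the text does not state the cut-offs used for MnO or for trace elements, so they are left as inputs to be chosen by the user.

```python
def passes_filters(sample, mno_max=5.0, trace_max=None):
    """Return True if a sample survives the three filtering rules.

    sample: dict of contents (oxides in wt%, trace elements in ppm).
    mno_max and trace_max are hypothetical cut-offs; the source text
    does not give thresholds for MnO or for trace elements.
    """
    sio2 = sample.get("SiO2")
    # Rule 1: keep 56% < SiO2 < 90%.
    if sio2 is None or not (56.0 < sio2 < 90.0):
        return False
    # Rule 2: drop abnormally high major-element contents.
    if (sample.get("Fe2O3") or 0.0) > 30.0:
        return False
    if (sample.get("MnO") or 0.0) > mno_max:
        return False
    # Rule 3: drop abnormally high trace-element values.
    for element, limit in (trace_max or {}).items():
        value = sample.get(element)
        if value is not None and value > limit:
            return False
    return True
```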

Confidence ellipses drawing
Confidence ellipses, also known as error ellipses, are geometric representations that intuitively reflect the difference between the estimate and the expected value of a point or average function. The covariance matrix of the data points is calculated first. The eigenvalues and eigenvectors of this matrix determine the axis lengths (major and minor axes) and the orientation of the confidence ellipse, and the center of the ellipse is given by the mean of the data. The confidence ellipse can then be drawn for a given confidence level.
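
The construction described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the function name `confidence_ellipse` is hypothetical, the data are assumed to be approximately bivariate normal, and the default level of 0.85 is taken from the 85% ellipse mentioned in the Discussion.

```python
import numpy as np


def confidence_ellipse(points, level=0.85):
    """Return (center, (semi_major, semi_minor), angle) of a confidence ellipse.

    points: array of shape (n, 2). Assumes roughly bivariate-normal data,
    so the squared Mahalanobis radius follows a chi-square distribution
    with 2 degrees of freedom.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # The chi-square quantile with 2 d.o.f. has the closed form -2 ln(1 - p).
    scale = -2.0 * np.log(1.0 - level)
    semi_minor, semi_major = np.sqrt(eigvals * scale)
    # Orientation of the major axis, in radians from the x-axis.
    major_vec = eigvecs[:, 1]
    angle = np.arctan2(major_vec[1], major_vec[0])
    return center, (semi_major, semi_minor), angle
```

The returned triple can be passed, for example, to a plotting routine that draws an ellipse from its center, axis lengths, and rotation angle.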

Overlapping ratio calculating
An overlapping ratio represents the degree of overlap between two data regions (here, confidence ellipses) and is calculated by the Monte Carlo method. A sufficient number of data points (x, y) are generated randomly within the coordinate system, each point is classified as inside or outside each ellipse using the confidence ellipse equations, and the overlapping ratio between the ellipses is obtained from the resulting counts. A program written in Python reads the cleaned data, analyzes the data, plots the diagrams, and selects the best discrimination diagrams among all combinations. In constructing a discrimination diagram, element ratios are often used as the horizontal and vertical coordinates in order to enhance the discriminant effect and reduce overlap and multiple solutions, such as Ti/Y-Nb/Y, La/Yb-Sc/Ni, etc. (Zhao, Qian, & Huang, 2007). In addition, the range of values of each element in the overall dataset tends to be much larger than that in the sampled data. To make the data relatively concentrated, the original data are usually log-transformed, and the base-10 logarithm of the element ratios is generally used.
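
The Monte Carlo estimate described above can be sketched as follows. The text does not state how the overlapping ratio is normalized, so this sketch assumes intersection area divided by union area; the function names `inside_ellipse` and `overlap_ratio`, and the representation of an ellipse as a `(center, (semi_major, semi_minor), angle)` tuple, are likewise assumptions made for illustration.

```python
import numpy as np


def inside_ellipse(xy, center, semi_axes, angle):
    """Boolean mask: which points of xy (shape (n, 2)) lie inside the ellipse."""
    a, b = semi_axes
    d = np.asarray(xy, dtype=float) - np.asarray(center, dtype=float)
    cos_t, sin_t = np.cos(angle), np.sin(angle)
    # Rotate the offsets into the ellipse's own axes.
    u = d[:, 0] * cos_t + d[:, 1] * sin_t
    v = -d[:, 0] * sin_t + d[:, 1] * cos_t
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def overlap_ratio(ellipse_1, ellipse_2, n_points=200_000, seed=0):
    """Monte Carlo estimate of intersection area / union area.

    Each ellipse is a (center, (semi_major, semi_minor), angle) tuple.
    The intersection-over-union definition is an assumption.
    """
    rng = np.random.default_rng(seed)
    ellipses = (ellipse_1, ellipse_2)
    # Bounding box that certainly contains both ellipses.
    lo = np.min([np.asarray(c) - max(ax) for c, ax, _ in ellipses], axis=0)
    hi = np.max([np.asarray(c) + max(ax) for c, ax, _ in ellipses], axis=0)
    xy = rng.uniform(lo, hi, size=(n_points, 2))
    in_1 = inside_ellipse(xy, *ellipse_1)
    in_2 = inside_ellipse(xy, *ellipse_2)
    union = np.count_nonzero(in_1 | in_2)
    return np.count_nonzero(in_1 & in_2) / union if union else 0.0
```

Sampling uniformly from a common bounding box keeps the estimator simple; the precision improves roughly with the square root of `n_points`.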

Data mining process
1) Collect geochemical data for all granites in the GEOROC and PetDB databases, and consult the original literature for the study area to obtain rock geochemical data; 2) delete the data duplicated between the two databases and perform data cleaning, keeping the cleaning process as objective and fair as possible; 3) perform secondary data cleaning, deleting abnormally high and negative values to exclude the influence of abnormal values on the statistical results; 4) compute the data of 44 elements in pairs to obtain element combinations and overlapping ratios; 5) plot scatter diagrams and confidence ellipses to find their correlations; 6) extract geochemical information through correlation analysis and obtain the results of the comparative study.
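
The pairwise step of this workflow can be sketched as follows. The exact enumeration scheme used in the study is not described, so this sketch makes its own assumptions: each unordered pair of elements gives one ratio, each unordered pair of ratios gives one candidate diagram, and ratios are log10-transformed with non-positive or missing values skipped. The function names are hypothetical, and the counts produced here are not expected to reproduce the numbers reported in the paper.

```python
import math
from itertools import combinations


def log_ratio(sample, numerator, denominator):
    """Return log10(numerator/denominator), or None if it cannot be computed."""
    num, den = sample.get(numerator), sample.get(denominator)
    if num is None or den is None or num <= 0 or den <= 0:
        return None
    return math.log10(num / den)


def candidate_diagrams(elements):
    """Enumerate candidate diagrams as pairs of element ratios.

    Assumption: A/B and B/A are treated as the same ratio, because on a
    log scale they differ only in sign.
    """
    ratios = list(combinations(elements, 2))
    return list(combinations(ratios, 2))
```

Each candidate diagram would then be scored by computing the two confidence ellipses and their overlapping ratio, with low-overlap diagrams being the most discriminating.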

Results
This study involved comparison of the geochemistry (45 selected major and trace elements) of (1) global adakites and (2) the Geza porphyries. Although the data of the Geza porphyries overlap with those of global adakites, they plot mainly in one corner of the global adakite field (e.g. SiO2/Er–SiO2/Ga, Al2O3/Ho–SiO2/Ga, FeO/Pr–SiO2/Ga, FeO/Dy–SiO2/Ga, Zr/Nd–SiO2/Ga, and Sr/Ga–Al2O3/Er; Figure 3). This indicates that although the correlations between some element ratios of the Geza porphyries and global adakites are generally similar, the Geza porphyries show less variation in their geochemistry. The geochemistry of the Geza porphyries indicates that they may represent an end-member component of global adakites, and this requires further investigation.

Discussion
Here we discuss the significance of the differences in average trace-element contents of the Geza porphyries and global adakites. Figure 7 shows a spider diagram for the Geza adakites normalized to the average global adakite, revealing (1) elements with similar distributions between the Geza porphyries and global adakites; (2) elements with different distributions between the Geza porphyries and global adakites; and (3) elements with anomalous contents (e.g. Cu and Mo) in the Geza porphyries. Cu and Mo are ore-forming elements in the Geza porphyry, and Cu-Mo contents are anomalously high in those Geza porphyries that are most strongly affected by ore-bearing fluids, reflecting ore-related hydrothermal alteration or mineralization. This is the most important prospecting criterion for the Geza porphyry Cu deposit. The elements in the second category are Ba, K, Zn, W, Ga, Sr, and Ti, among which W and Ga exhibit negative anomalies; the remaining elements belong to the first category and occur at concentrations similar to those of the average global adakite. These groupings have a number of implications: the first group contains elements with similar concentrations in the porphyries and global adakites, whereas the third group includes the elements that show the greatest differences between the two types of rock and may therefore be used as a prospecting tool. The main geochemical differences between the Geza porphyries and global adakites are illustrated in Figures 4-6.
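
The normalization behind such a spider diagram amounts to dividing each element's average content in one group by its average in the reference group. The sketch below illustrates this; the function name `normalize_to_reference` and the dictionary layout are assumptions, and no thresholds for assigning elements to the three categories are implied, because the text does not give any.

```python
def normalize_to_reference(sample_means, reference_means):
    """Divide each element's mean content by the reference mean.

    Both arguments map element names to mean contents in the same units.
    Elements missing from the reference, or with a non-positive reference
    mean, are skipped. Values near 1 indicate similar contents; values far
    above or below 1 indicate positive or negative anomalies.
    """
    return {
        element: sample_means[element] / reference_means[element]
        for element in sample_means
        if reference_means.get(element) is not None
        and reference_means[element] > 0
    }
```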
Figure 8 illustrates three important features: (1) the red data points represent the range of values for global adakites (the dotted line represents the 85% confidence ellipse, so data points outside the ellipse may be unreliable); (2) the blue data points represent the Geza porphyries, including altered samples, and many of them fall within the 85% confidence ellipse for the adakite data, indicating that most of the Geza porphyry data are reliable; and (3) there is a partial overlap between the Geza porphyries and global adakites. The yellow arrow indicates the trend of alteration and mineralization of the Geza porphyries. Where ore-bearing fluid activity is low and mineralization is weak, the Geza porphyry samples plot within the range of the adakite samples (i.e. feature 3, as outlined above). With increasing intensity of ore-bearing fluid activity and mineralization, the data points of the Geza porphyries deviate increasingly from those of global adakites. Figure 8 also provides information on three "zones" of geochemical distributions, as follows. In Zone 1, the geochemistry of the Geza porphyries overlaps with the range of global adakites. In Zone 2, data points of adakites associated with Cu polymetallic mineralization in the Geza porphyries overlap with those of global adakites (Figures 4-6; i.e. there is a close relationship between adakite and the formation of porphyry Cu-Mo-Au deposits). Porphyry associated with porphyry copper deposits also plots in this zone. In Zone 3, the greater the degree to which the data deviate from the global adakite range, the stronger the alteration and mineralization; consequently, these geochemical characteristics can be used in prospecting. The granite samples from the Geza area plot in the lower part of the global adakite field in the Al2O3/Er versus Sr/Cu diagram (Figure 8) and in other scatter diagrams, including MnO/Cu–Al2O3/Lu (Figure 4(a-f)).
In contrast, the granite samples plot in the higher parts of the global adakite field in the scatter diagrams of Ta/Ga–MnO/Cu, Ta/Ga–U/Cu, Ta/Ga–Th/Cu, Ta/Ga–SiO2/Cu, Ta/Ga–CaO/Cu, Ta/Ga–Ba/Cu, Ta/Ga–Na2O/Mo, Ta/Ga–Rb/Mo, and Ta/Ga–TiO2/Mo (Figure 4(g-l)). Thus, the three zones identified from adakite geochemistry are not necessarily related to mineralization.

Conclusions
(1) Our analysis of very large geochemical datasets ("big data") of porphyries in the Geza arc shows that the porphyries have a positive correlation with global adakites (Figure 2), indicating that they have adakitic properties. The data are concentrated in a corner of the global adakite field (Figure 3), which illustrates the unique geochemical characteristics of the Geza porphyries. This unique geochemistry may have been derived from the source rocks, the degree of partial melting, magma mixing, and/or contamination, and may also reflect the influence of hydrothermal fluid activity on the ore-bearing rocks. Most of the Geza porphyry data plot outside the range of global adakites (Figures 3-5), mainly because of the anomalous contents and ratios of some elements. This enables the identification of ore-bearing porphyries.
(2) Some of the geochemical data of the Geza porphyries overlap with the data of the adakite field (Zone 1), which reflects porphyries unrelated to copper-molybdenum mineralization. Samples that plot in the area of Zone 2 (far from Zones 1 and 3) have been strongly affected by ore-bearing fluids and are closely related to copper-molybdenum mineralization. Samples in Zone 3 are related to copper-molybdenum mineralization. Therefore, geochemical data mining has identified new geochemical signatures that can be used for geological prospecting in deep and peripheral areas of the Geza copper polymetallic deposit.

Data availability statement
The data referred to in this paper are not publicly available at the current time.

Disclosure statement
No potential conflict of interest was reported by the authors.