Space-associated domain adaptation for three-dimensional mineral prospectivity modeling

ABSTRACT Geographical information systems (GIS) are essential tools for mineral prospectivity modeling (MPM). Three-dimensional (3D) MPM learns the association between geological evidence and mineralization in shallow zones and thereby builds a prospectivity model for deep zones, making it a desirable technique for targeting deep-seated orebodies. However, existing 3D MPM methods directly generalize the model learned in shallow zones to the deep zones without attention to the loss of model transferability caused by the different metallogenic mechanisms of the two zones. In this study, we aim to robustly transfer the prospectivity model learned from shallow zones to deep zones. We cast 3D MPM as a domain adaptation problem, which is an important realm of transfer learning. Because the metallogenic mechanism can be closely associated with spatial locations, we specifically focus on domain adaptation with respect to the spatial locations, which are ignored by conventional domain adaptation methods. To measure the spatial-associated domain discrepancy, we propose a novel spatial-associated maximum mean discrepancy (SAMMD), which compares the joint distributions of features and spatial locations across domains. Based on the SAMMD criterion, a deep neural network, referred to as the spatial-associated domain adaptation network, is devised to learn cross-domain but mineralization-indicative features for building a prospectivity model that is transferable to deep zones. A case study of the world-class Sanshandao gold deposit, in eastern China, was carried out to validate the effectiveness of the proposed method. The results show that, compared with other leading MPM methods and another domain adaptation variant, the proposed method has superior prediction accuracy and targeting efficiency, demonstrating its effectiveness and robustness in targeting deep-seated orebodies in areas with different metallogenic mechanisms and no labeled data.


Introduction
Geographical information systems (GIS) have been widely applied in mineral prospectivity modeling (MPM) (Bonham-Carter 1994; Carranza 2008; Rodriguez-Galiano et al. 2014, 2015). Owing to their powerful ability to acquire, store, process, analyze, and visualize spatial data, GIS workflows are capable of collecting multi-source and multi-scale geoscience data, representing geological objects and evidence, analyzing geoscientific information, modeling the association with mineralization, and predicting mineral exploration targets in the MPM task. With the advances of three-dimensional (3D) GIS, especially 3D modeling and 3D spatial analysis techniques, GIS-based 3D MPM has emerged (Deng et al. 2022; Fallara, Legault, and Olivier 2006; Li et al. 2015, 2019; Mao et al. 2019; Wang et al. 2015, 2011; Xiao et al. 2015; Yuan et al. 2014). GIS-based 3D MPM has recently become a desirable technique for visualization and quantitative targeting of deep-seated mineralization.
GIS-based mineral prospectivity methods can be categorized into two groups: data-driven and knowledge-driven. Knowledge-driven models use subjective evidence based on prior knowledge of processes that may contribute to the formation of mineral deposits under circumstances where few mineral deposits are known to occur (Carranza 2008). On the other hand, data-driven models can be built with less subjective knowledge of geological settings and mineral deposits (Bonham-Carter 1994). Due to the emergence of geoscience data in this big-data era and the demand for reduced dependence on subjective experience, data-driven prospectivity methods have become the advocated paradigm in MPM (Carranza and Laborte 2015a, 2015b; McKay and Harris 2016; Xiong and Zuo 2018; Yousefi and Nykänen 2016).
Conventionally, multivariate analysis has been used to find associations between predictor variables and mineralization. However, due to the complicated mineralization processes, associations between geological evidence and mineralization are very complex. Therefore, machine learning is a promising category of data-driven methods for modeling the complex associations of mineralization. Mainstream machine learning approaches such as support vector machines (Zuo and Carranza 2011), random forests (Carranza and Laborte 2015b; Rodriguez-Galiano et al. 2014, 2015), artificial neural networks (Brown et al. 2000; Chen 2015; Chen and Wu 2017), and ensemble approaches (Wang et al. 2020) are utilized to integrate multi-source geoinformation and build mineral prospectivity models. Recently, deep learning-based methods have been applied to MPM because they can automatically extract high-level representations directly from data (Zuo 2020; Zuo et al. 2021). A series of advances have been made, mainly in 2D MPM (Li, Qian, and Li 2020; Li et al. 2021; Xiong and Zuo 2018, 2020, 2021; Yang et al. 2021; Zhang et al. 2021). Although deep learning was adopted relatively late in 3D MPM, Li et al. (2021) first adopted 3D convolutional neural networks (CNN) to learn from the 3D evidence layers and extract high-level features. Deng et al. (2022) proposed a 2D CNN method that directly learns high-level features from primitive shape descriptors of 3D geological models. These deep learning methods have achieved outstanding performance in predicting exploration targets.
The specific task of 3D MPM is to predict the 3D distribution of mineral prospectivity and identify mineralization targets in deep-seated zones. While machine-learning methods have been shown to be effective in the predictive mapping of mineral prospectivity, modeling of mineral prospectivity in deep-seated zones is rather challenging. Due to the differences in temperature, pressure, and rock permeability at different positions, the characteristics of tectonic stress, fluid transport, and mineral localization mechanisms can be different between shallow and deep zones. With increasing prospective depths, such differences would become more significant, consistent with Tobler's first law of geography: 'near things are more related than distant things' (Tobler 1970, 2004). Additionally, because of the limited exploration depth and the high cost of data acquisition, the geoinformation related to deep-seated zones is generally scarce, uncertain and heterogeneous. Directly generalizing the mineralization information captured from shallow zones limits the effectiveness of mineral prospectivity models in deep zones (Cloetingh and Podladchikov 2000). However, existing 3D MPM methods predict mineral prospectivity in deep zones by directly learning and generalizing mineral associations from shallow zones. The differences in metallogenic mechanism and the information discrepancy between the two zones are not typically emphasized.
From a machine learning perspective, 3D MPM targeting deep-seated economic orebodies can be regarded as a domain adaptation problem, an important subcategory of transfer learning (Pan and Yang 2009; Weiss, Khoshgoftaar, and Wang 2016; Zhuang et al. 2020). Domain adaptation seeks to enhance the performance of a target model with inadequate or no labeled data by transferring the features learned from a source domain with sufficient labeled data. The major goal of domain adaptation is to narrow the gap across domains, thereby guaranteeing model performance. A discrepancy-based mechanism is commonly applied to bridge the gap between source and target domains through a latent feature space (Ding and Fu 2017; Gong, Grauman, and Sha 2013; Long et al. 2014; Pan et al. 2011; Sun, Feng, and Saenko 2015; Wang et al. 2018; Zhang, Li, and Ogunbona 2017; Zhang et al. 2013). Recent advances in deep learning show that deep neural networks enable domain adaptation models to learn more transferable features (Bousmalis et al. 2016; Long et al. 2015, 2016, 2017; Tzeng et al. 2014; Yan et al. 2017; Zhang et al. 2015). In the 3D MPM scenario, the shallow zones generally include adequate labeled data, and thus can be regarded as the source domain in domain adaptation. In contrast, the deep zones, whose metallogenic mechanisms are probably different, have limited or even no labeled data, and thereby correspond to the target domain. Therefore, 3D MPM can be regarded as a typical domain adaptation task. Casting 3D MPM as a domain adaptation problem is expected to bridge the discrepancy in metallogenic mechanism and mineralization information between shallow and deep zones.
Even though domain adaptation is reasonable for 3D MPM scenarios, introducing the domain adaptation paradigm to model mineral prospectivity is non-trivial. The spatial variance of the metallogenic mechanism follows Tobler's first law of geography, wherein the metallogenic mechanism in deep zones is correlated with that in shallow zones (Tobler 1970, 2004), but the mechanisms at nearby locations are more correlated than those at distant ones. This suggests that the domain discrepancy in 3D MPM is essentially associated with spatial locations, making the spatial location a determinant in domain adaptation. Existing domain adaptation methods were originally designed for problems involving, for example, images, videos, or text for applications in computer vision and natural language processing. These methods estimate the domain discrepancy only according to the feature discrepancy. To the best of our knowledge, none of them has considered the domain discrepancy across geographical space or used spatial locations to estimate feature transferability for domain adaptation.
In this study, we introduce a transfer learning paradigm to 3D MPM and present a spatial-associated domain adaptation method for building mineral prospectivity models. Specifically, the spatial-associated maximum mean discrepancy (SAMMD) is derived, which compares the kernel mean embeddings of the joint distributions of features and spatial locations and enables the estimation of the domain discrepancy over geospace. Using SAMMD, a spatial-associated domain adaptation network is developed, which incorporates SAMMD as the domain adaptation module in the top layers of a convolutional neural network (CNN). The network learns mineralization-associated and transferable features across geospace by jointly minimizing the fitting error on the source data (shallow zones) and the SAMMD between the source and target (deep zones) domains. The 3D mineral prospectivity model is then built on this network. A case study investigating the Sanshandao gold deposit in eastern China demonstrates that the proposed method achieves outstanding results regarding prediction accuracy and targeting efficiency.

Problem definitions
For a domain adaptation problem in the 3D MPM scenario, we are given a source domain (i.e. shallow zones) of m labeled data $(X^s, Y^s) := \{(x^s_1, y^s_1), \ldots, (x^s_m, y^s_m)\}$, where $x^s_i$ and $y^s_i$ denote the features and the ore-bearing label in the source domain, respectively, and a target domain (i.e. deep zones) of n unlabeled data $X^t := \{x^t_1, \ldots, x^t_n\}$, where $x^t_i$ denotes the observed features in the target domain. Each source and target domain datum is associated with a spatial location, collected in $Z^s := \{z^s_1, \ldots, z^s_m\}$ and $Z^t := \{z^t_1, \ldots, z^t_n\}$ for the source and target domains, respectively. The source and target domain data are represented by the marginal distributions $P(X^s)$ and $Q(X^t)$, the joint distributions $P(X^s, Y^s)$ and $Q(X^t, Y^t)$, and the conditional distributions $P(Y^s \mid X^s)$ and $Q(Y^t \mid X^t)$, where $Y^t$ represents the predicted ore-bearing label in the target domain. Due to the different metallogenic mechanisms of the source and target domains, we assume that the two domains follow different distributions. The study aims to learn transferable features that reduce the difference between $P(X^s, Z^s)$ and $Q(X^t, Z^t)$, so that the prospectivity model can approach $Q(Y^t \mid X^t)$ for the target domain by using the supervised information from the source domain.

Maximum mean discrepancy
Given the information obtained from shallow zones, we can introduce the domain adaptation method in transfer learning to model mineral prospectivity. According to the domain adaptation method, although the source domain data and the target domain data have distinct distributions, the features of the two domains and the correlation between features and labels are related. Thus, the source domain data can be transferred to the learning task of the target domain (Goodchild 2003; Gretton et al. 2012). To build a transferable model across different domains, a crucial issue is to reduce the cross-domain discrepancy of features. Maximum mean discrepancy (MMD) is currently one of the most widely used non-parametric methods to measure the discrepancy between distributions from source and target domains (Gretton et al. 2012; Gretton, Sriperumbudur et al. 2012; Gretton, Sutherland, and Jitkrittum 2019). Given samples from two distributions P and Q, MMD maximizes the two-sample test statistic that determines whether two arbitrary independent samples are drawn from the same distribution. Thus, MMD minimizes the Type-II error (the probability of mistakenly accepting the null hypothesis P = Q when P ≠ Q) in a two-sample test, i.e. the distribution discrepancy $d_k(P, Q) = 0$ if and only if P = Q.
The MMD is defined by introducing a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ of functions on the feature space $\mathcal{X}$, over which the probability distribution P(x) is defined. $\mathcal{H}$ is endowed with a characteristic kernel $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and is the family of functions satisfying the reproducing kernel property $f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}}$. The kernel is positive definite and can be regarded as the inner product of a nonlinear feature map $\phi(x)$: $\langle \phi(x), \phi(x') \rangle_{\mathcal{H}} = k(x, x')$ (Steinwart and Christmann 2008). The probability distribution embedding of P(x) is obtained by defining the mean embedding $\mu_P \in \mathcal{H}$ for distribution P(x):

$$\mu_P = \mathbb{E}_{x \sim P}[\phi(x)] \tag{1}$$

The probability embedding has the property that $\mathbb{E}_{x \sim P}[f(x)] = \langle f, \mu_P \rangle_{\mathcal{H}}$ holds for any $f \in \mathcal{H}$. MMD is defined as the distance in the RKHS between probability distribution embeddings. In a scenario of domain adaptation, let $x^s \sim P$ and $x^t \sim Q$ be the features that follow different probability distributions in the source and target domains, respectively. The square of MMD is expressed as:

$$d_k^2(P, Q) = \left\| \mathbb{E}_{x^s \sim P}[\phi(x^s)] - \mathbb{E}_{x^t \sim Q}[\phi(x^t)] \right\|_{\mathcal{H}}^2 \tag{2}$$

Suppose m features $\{x^s_i\}_{i=1}^m$ from the source domain are drawn from distribution P(x) and n features $\{x^t_j\}_{j=1}^n$ from the target domain are drawn from distribution Q(x). The empirical MMD is defined as:

$$\hat{d}_k^2(P, Q) = \frac{1}{m^2} \sum_{i=1}^m \sum_{j=1}^m k(x^s_i, x^s_j) + \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n k(x^t_i, x^t_j) - \frac{2}{mn} \sum_{i=1}^m \sum_{j=1}^n k(x^s_i, x^t_j) \tag{3}$$

Here, the kernel $k(x, x')$ is defined as a convex combination of U multivariate isotropic Gaussian kernels $k^f_u(x, x') = \exp(-\|x - x'\|^2 / \gamma_u)$ with different bandwidths $\gamma_u$, formulated as:

$$k(x, x') = \sum_{u=1}^U \beta_u k^f_u(x, x'), \quad \beta_u \ge 0, \quad \sum_{u=1}^U \beta_u = 1 \tag{4}$$

where $\beta_u$ denotes the combination coefficient for $k^f_u$. In practice, $\beta_u$, u = 1, . . ., U, is determined by maximizing the two-sample test statistic and minimizing the Type-II error. Interested readers are referred to Gretton, Sriperumbudur et al. (2012) for technical details of MMD. Equation (3) measures the feature discrepancy between the source and target domains. Thus, to build transferable mineral prospectivity models, we can extract transferable features by minimizing Equation (3).
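As an illustration, the empirical estimate in Equation (3) with the multi-kernel form of Equation (4) can be sketched in a few lines of plain Python. This is a minimal sketch and not the implementation used in this study: the function names, the toy feature values, the bandwidths and the uniform coefficients are all assumptions chosen for readability.

```python
import math

def gaussian_kernel(x, y, gamma):
    """Isotropic Gaussian kernel exp(-||x - y||^2 / gamma)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / gamma)

def multi_kernel(x, y, gammas, betas):
    """Convex combination of Gaussian kernels with bandwidths `gammas`."""
    return sum(b * gaussian_kernel(x, y, g) for g, b in zip(gammas, betas))

def mmd2(source, target, gammas, betas):
    """Biased empirical estimate of the squared MMD between two samples."""
    m, n = len(source), len(target)
    k_ss = sum(multi_kernel(a, b, gammas, betas) for a in source for b in source) / (m * m)
    k_tt = sum(multi_kernel(a, b, gammas, betas) for a in target for b in target) / (n * n)
    k_st = sum(multi_kernel(a, b, gammas, betas) for a in source for b in target) / (m * n)
    return k_ss + k_tt - 2.0 * k_st

# Toy 2-D features (illustrative values only).
source = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]
shifted = [(2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]
gammas = [0.5, 1.0, 2.0]   # assumed bandwidths
betas = [1.0 / 3.0] * 3    # assumed uniform combination coefficients

same = mmd2(source, source, gammas, betas)      # identical samples
diff = mmd2(source, shifted, gammas, betas)     # clearly shifted sample
```

The estimate vanishes when both samples coincide and grows as the target features move away from the source features, which is the quantity a transferable feature extractor is trained to reduce.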

Space-associated maximum mean discrepancy
To consider domain discrepancy associated with spatial locations, the original MMD is generalized by associating features with spatial coordinates. To this end, we assume that the feature x and spatial coordinate z are from spaces $\mathcal{X}$ and $\mathcal{Z}$ and follow the joint distribution P(x, z) over $\mathcal{X} \times \mathcal{Z}$. With this setting, the discrepancy between joint distributions P(x, z) and Q(x, z) is measured instead of that between marginal distributions P(x) and Q(x) in the original MMD. Let $\mathcal{H}_\phi$ and $\mathcal{H}_\psi$ be RKHSs endowed with kernels $k_f(x, x')$ and $k_c(z, z')$ that satisfy the reproducing kernel properties for nonlinear mappings $\phi(x)$ and $\psi(z)$, respectively: $\langle \phi(x), \phi(x') \rangle_{\mathcal{H}_\phi} = k_f(x, x')$ and $\langle \psi(z), \psi(z') \rangle_{\mathcal{H}_\psi} = k_c(z, z')$. The tensor product $\mathcal{H}_\phi \otimes \mathcal{H}_\psi$ is also assumed to be a Hilbert space. The tensor product representation leverages the established theories and algorithms for RKHSs. It facilitates efficient computations and enables us to utilize existing tools and techniques for modeling and learning when combining $\mathcal{X}$ and $\mathcal{Z}$. Analogous to the mean embedding definition in Equation (1), the mean embedding of the joint probability distribution P(x, z) is defined as:

$$\mu_{P(x,z)} = \mathbb{E}_{(x,z) \sim P}[\phi(x) \otimes \psi(z)] \tag{5}$$

Let $(x^s, z^s) \sim P(x, z)$ and $(x^t, z^t) \sim Q(x, z)$ be features and spatial locations from the source and target domains that follow the joint probability distributions P(x, z) and Q(x, z), respectively. Based on the mean embedding of joint distributions in Equation (5), the SAMMD measures the discrepancy of P(x, z) and Q(x, z) as:

$$d_{\mathrm{SAMMD}}(P, Q) = \left\| \mathbb{E}_{(x^s,z^s) \sim P}[\phi(x^s) \otimes \psi(z^s)] - \mathbb{E}_{(x^t,z^t) \sim Q}[\phi(x^t) \otimes \psi(z^t)] \right\|_{\mathcal{H}_\phi \otimes \mathcal{H}_\psi}^2 \tag{6}$$

The idea of using joint probability distributions for domain adaptation was first introduced in Long et al. (2017) and inspired the SAMMD in our method. Given m features and spatial locations $\{x^s_i, z^s_i\}_{i=1}^m$ from the source domain and n features and spatial locations $\{x^t_j, z^t_j\}_{j=1}^n$ from the target domain, an empirical estimate of SAMMD computes the squared distance between the empirical kernel mean embeddings of P(x, z) and Q(x, z), formulated as:

$$\hat{d}_{\mathrm{SAMMD}}(P, Q) = \frac{1}{m^2} \sum_{i=1}^m \sum_{j=1}^m k_f(x^s_i, x^s_j) k_c(z^s_i, z^s_j) + \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n k_f(x^t_i, x^t_j) k_c(z^t_i, z^t_j) - \frac{2}{mn} \sum_{i=1}^m \sum_{j=1}^n k_f(x^s_i, x^t_j) k_c(z^s_i, z^t_j) \tag{7}$$

Equation (7) indicates that, when introducing the spatial coordinate information, SAMMD uses an 'uneven weight' in contrast to the 'even weight' in MMD. Thus, SAMMD assigns higher importance to features with larger values of $k_c$, which captures the spatial association between samples. Here, we wish nearby samples to be more transferable than distant ones. Thus, we focus more on samples that are spatially proximate to each other in SAMMD. To this end, $k_c(z_i, z_j)$ is defined to be a monotonically decreasing function of the spatial distance $\|z_i - z_j\|$ between $z_i$ and $z_j$, which expresses the spatial dependence of the two samples. Additionally, we take into account the spatial anisotropy in $k_c(z_i, z_j)$. Then, $k_c(z_i, z_j)$ is formulated as in Equation (8), where $\Sigma \in \mathbb{R}^{3 \times 3}$ is the positive definite symmetric matrix encoding the spatial anisotropy. Here, the off-diagonal entries of Σ express the correlation between the spatial dimensions (Appendix 2). Since each dimension of spatial coordinates is expected to be positively correlated, we constrain Σ to be positive semi-definite: $\Sigma \succeq 0$. In 3D MPM, the value of Σ is optimized by minimizing the SAMMD to ensure feature transferability across geospaces.
In Appendix 1, it is proven that $k_c(z_i, z_j)$ is equivalent to the variogram of Gaussian models that express spatial variability. Moreover, the singular value decomposition of Σ extracts the major axis directions of spatial variability and the associated variances, which indicates the spatial anisotropy in feature transfer. Therefore, the proposed SAMMD allows interpretability of the model in the spatial dimension.
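The 'uneven weight' of SAMMD can be illustrated with a small plain-Python sketch of the empirical estimate in Equation (7). This is a sketch under stated assumptions rather than the implementation used in this study: the spatial kernel is assumed here to take the anisotropic Gaussian form exp(-(z_i - z_j)^T Σ^{-1} (z_i - z_j)), which may differ from the exact form of Equation (8); a single feature bandwidth is used for brevity; and the function names, the coordinates and the diagonal Σ are hypothetical.

```python
import math

def feature_kernel(x, y, gamma=1.0):
    """Gaussian feature kernel k_f (single bandwidth for brevity)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / gamma)

def spatial_kernel(z1, z2, sigma_inv):
    """Assumed anisotropic form k_c = exp(-(z1 - z2)^T Sigma^{-1} (z1 - z2))."""
    d = [a - b for a, b in zip(z1, z2)]
    quad = sum(d[i] * sigma_inv[i][j] * d[j] for i in range(3) for j in range(3))
    return math.exp(-quad)

def sammd2(xs, zs, xt, zt, sigma_inv, gamma=1.0):
    """Biased empirical SAMMD with the product kernel k_f * k_c."""
    def mean_k(xa, za, xb, zb):
        total = sum(
            feature_kernel(p, q, gamma) * spatial_kernel(u, v, sigma_inv)
            for p, u in zip(xa, za)
            for q, v in zip(xb, zb)
        )
        return total / (len(xa) * len(xb))
    return (mean_k(xs, zs, xs, zs) + mean_k(xt, zt, xt, zt)
            - 2.0 * mean_k(xs, zs, xt, zt))

# Toy 1-D features with 3-D coordinates in metres (illustrative values only).
xs = [(0.0,), (0.2,)]
zs = [(0.0, 0.0, -100.0), (50.0, 0.0, -150.0)]
xt = [(1.0,), (1.2,)]
zt_near = [(0.0, 0.0, -300.0), (50.0, 0.0, -350.0)]
zt_far = [(0.0, 0.0, -2000.0), (50.0, 0.0, -2050.0)]

# Hypothetical Sigma = diag(500^2, 500^2, 500^2); its inverse is diagonal too.
sigma_inv = [[1.0 / 500.0 ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]

near = sammd2(xs, zs, xt, zt_near, sigma_inv)
far = sammd2(xs, zs, xt, zt_far, sigma_inv)
```

In this toy setting the cross-domain term is almost entirely suppressed for the distant target, so only spatially close source and target samples contribute appreciably to the alignment; a non-diagonal Σ would additionally tilt the directions along which samples count as close.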

Spatial-associated domain adaptation network
Recent advances in 3D MPM (Deng et al. 2022;Li et al. 2021) show that deep networks can learn high-level features that are associated with mineralization.To further enhance the domain adaptation capability of deep networks, the proposed SAMMD is combined with deep neural networks.This results in a space-associated domain adaptation network for learning transferable features and adapting to mineral prospectivity in deep zones (Figure 1).
The domain adaptation network developed in this study is an extension of the convolutional neural network (CNN) for 3D MPM proposed in our previous work (Deng et al. 2022). The network takes projected images of 3D geological models as input and learns the mineralization-associated features from the images. The network's backbone is based on the AlexNet architecture (Krizhevsky, Sutskever, and Hinton 2012), which comprises five convolutional layers and three fully-connected (FC) layers. The final fully-connected layer outputs the ore-bearing probability. Originally, the network $F_\Theta$ with network parameters Θ is trained in a data-driven fashion by minimizing the empirical error between the output mineralization probability $\tilde{y}_i = F_\Theta(x_i)$ given the input $x_i$ and the known mineralization information $y_i$:

$$\min_\Theta \frac{1}{m} \sum_{i=1}^m J(F_\Theta(x_i), y_i) \tag{9}$$

where m is the number of samples with known mineralization information, and J measures the empirical error, for which the cross-entropy loss is used in our scenario. The minimization of Equation (9) results in a CNN that fits known mineralization information and specifies 3D MPM in known areas, which are generally in shallow zones. As a result of network training, the learned features, from lower to higher levels of the CNN, transition from generic to specific. In other words, the feature transferability can be substantially decreased in the FC layers on the top of the CNN, as demonstrated by Yosinski et al. (2014). Thus, for predicting mineral prospectivity in deep zones (target domain) that have no or minimal labeled (mineralization) data, the above CNN architecture can suffer from over-fitting to shallow zones and performance degradation in deep zones. Therefore, in addition to training the network according to the labeled samples of the source domain, the SAMMD is exploited as a regularizer to improve feature transferability in the FC layers.
The SAMMD measures both the feature and spatial discrepancy between source and target domains. SAMMD is minimized between the FC layer features learned in the shallow (source domain) and deep (target domain) zones to learn transferable features on the FC layers of the CNN. This enforces the distribution similarity and spatial correlation between the high-level features of the CNN, thus ensuring that the learned features are adaptive to the two domains. SAMMD is measured over the FC layers and combined with the empirical error in the training loss (Equation 9). Then, the network learning problem becomes:

$$\min_\Theta \max_{\beta, \Sigma} \frac{1}{m} \sum_{i=1}^m J(F_\Theta(x^s_i), y^s_i) + \alpha \sum_{l \in L} d_{\mathrm{SAMMD}}(P^l, Q^l; \beta, \Sigma) \tag{10}$$

where L denotes the set of FC layers, $P^l$ and $Q^l$ are the joint distributions of features and spatial positions for the l-th FC layer in the source and target domains, respectively, $d_{\mathrm{SAMMD}}(P^l, Q^l; \beta, \Sigma)$ is the empirical SAMMD estimated given the network parameters Θ and the kernel parameters β and Σ, and α denotes the weighting. Note that in Equation (10), we maximize the SAMMD loss $\sum_{l \in L} d_{\mathrm{SAMMD}}(P^l, Q^l; \beta, \Sigma)$ with respect to β and Σ, which optimizes the parameters β and Σ such that the two-sample test statistic is maximized. Thus, network learning is equivalent to minimizing the upper bound of the spatial-associated feature discrepancy between source and target domains. With such a training loss, the mineralization data of the target domain (target labels) need not be known in Equation (10). Instead, the unlabeled data and spatial coordinates in deep zones are leveraged in SAMMD to guide the network to learn transferable features from the shallow zones. To extract the transferable features across geospace, we achieve domain adaptation by using the SAMMD module, which measures the spatial-associated domain discrepancy between shallow and deep zones according to spatial distance and feature dissimilarity. The transferable features learned by minimizing the SAMMD are finally used to predict the ore-bearing probability for the target voxel.

Model learning
The model learning problem in Equation (10) is a constrained minimax optimization problem with an enormous number of parameters. Directly using network learning algorithms such as stochastic gradient descent to learn the parameters is intractable since the optimization is constrained. Thus, we use an alternating optimization approach that iteratively updates the parameters Θ, β and Σ to optimize the learning problem in Equation (10). In each iteration of the algorithm, every parameter is optimized sequentially by taking the other two as constants.
(1) Optimization of Θ. The network parameters are associated with the two terms in Equation (10). It is straightforward to learn Θ by back-propagation using mini-batch stochastic gradient descent. However, in practice, the pair-wise estimate of the gradient of SAMMD can be time-consuming. Instead, the linear-time unbiased estimate of SAMMD in Equation (11) can be adopted (Gretton, Sriperumbudur et al. 2012). We compute the gradient $\partial h_{\mathrm{SAMMD}} / \partial \Theta^l$ for each tuple of samples for the l-th FC layer. Then, in each iteration, $\Theta^l$ is updated via gradient descent according to the gradient of the loss, as given in Equation (12).
(2) Optimization of β. When fixing Θ and Σ, we follow the idea of Gretton, Sriperumbudur et al. (2012) to find the β that maximizes the SAMMD loss in Equation (7). To maximize the two-sample test statistic, optimization of β amounts to maximizing the SAMMD relative to its variance. Let $h_\beta$ denote the vector whose u-th entry is the component associated with the u-th kernel function $k^f_u$ in the representation of the kernel $k(x, x')$ in Equation (4). The variance of SAMMD can be expanded in terms of the kernel parameters β (Equation 13), which leads to the quadratic programming (QP) problem in Equation (14). The QP problem can be efficiently solved by methods such as the active-set method, which is endowed with polynomial complexity.
(3) Optimization of Σ. When fixing Θ and β, optimization of Σ is equivalent to maximization of the SAMMD. For simplicity, $d_{\mathrm{SAMMD}}(\Sigma)$ can be rewritten as in Equation (15), where sign(i, j) indicates the sign, which satisfies sign(i, j) = 1 if samples i and j are from the same domain and sign(i, j) = −1 otherwise. Simply maximizing $d_{\mathrm{SAMMD}}$ often leads to a trivial solution for Σ. Thus, the values of Σ are penalized to prevent them from becoming too large. Since Σ is a positive semi-definite matrix, the trace of Σ is penalized instead. Therefore, Σ is optimized by solving the problem in Equation (16), where tr(·) denotes the trace and α is the weight. The derivative of the objective function in Equation (16) is given in Equation (17), in which the term $v_{ij}$ depends on Σ. Because this derivative is highly nonlinear in Σ, we linearize it by fixing Σ in $v_{ij}$ of Equation (17), which yields the update for optimizing Σ in Equation (18). Here, $\Sigma^{(t)}$ is the updated Σ in iteration step t. During implementation, the initial value of Σ is set to the identity matrix $I_{3 \times 3}$, and then Σ is updated iteratively by using Equation (18) until convergence.
A complete procedure for our method is summarized in Algorithm 1. The effectiveness of the algorithm is validated by a case study, which is discussed in the next section.
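To make the cost saving of the linear-time estimate concrete, the following plain-Python sketch averages a four-term statistic h over disjoint quad-tuples of consecutive samples, following the general construction of Gretton, Sriperumbudur et al. (2012) applied to the product kernel of features and coordinates. It is a sketch under assumptions, not the implementation used in this study: the function names are hypothetical, a single feature bandwidth is used, and an isotropic spatial kernel with an assumed length scale stands in for the learned matrix Σ.

```python
import math

def product_kernel(a, b, gamma=1.0, scale=500.0):
    """k_f * k_c for samples a = (x, z) and b = (x', z').

    `scale` is a hypothetical isotropic length scale standing in for Sigma.
    """
    (xa, za), (xb, zb) = a, b
    k_f = math.exp(-sum((p - q) ** 2 for p, q in zip(xa, xb)) / gamma)
    k_c = math.exp(-sum((p - q) ** 2 for p, q in zip(za, zb)) / scale ** 2)
    return k_f * k_c

def linear_time_sammd(source, target, kernel=product_kernel):
    """Average h over disjoint quad-tuples; cost grows linearly with sample size."""
    pairs = min(len(source), len(target)) // 2
    total = 0.0
    for i in range(pairs):
        s1, s2 = source[2 * i], source[2 * i + 1]
        t1, t2 = target[2 * i], target[2 * i + 1]
        # h = k(s1, s2) + k(t1, t2) - k(s1, t2) - k(s2, t1)
        total += kernel(s1, s2) + kernel(t1, t2) - kernel(s1, t2) - kernel(s2, t1)
    return total / pairs

# Toy mini-batch: (feature, coordinate) pairs; values are illustrative only.
source = [((0.0,), (0.0, 0.0, -100.0)), ((0.1,), (50.0, 0.0, -150.0)),
          ((0.2,), (100.0, 0.0, -200.0)), ((0.1,), (150.0, 0.0, -250.0))]
target = [((2.0,), (0.0, 0.0, -300.0)), ((2.1,), (50.0, 0.0, -350.0)),
          ((1.9,), (100.0, 0.0, -400.0)), ((2.0,), (150.0, 0.0, -450.0))]

estimate = linear_time_sammd(source, target)
self_estimate = linear_time_sammd(source, source)
```

Each quad-tuple needs only four kernel evaluations, so a mini-batch of m samples costs on the order of m evaluations instead of m squared, at the price of a noisier estimate that can occasionally be negative.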

Study area
A case study in the Sanshandao gold deposit, eastern China (Figure 2), is undertaken to validate the effectiveness of the proposed method for 3D MPM. The Sanshandao gold deposit was chosen for two reasons. Firstly, the deposit is endowed with a 'ladder-like' mineralization pattern with depth (Song et al. 2015). There is a consensus that the metallogenic mechanisms of the shallow and deep zones of the deposit are related but different (Bahiru and Woldai 2016; Eldursi et al. 2009; So et al. 1998; Yang et al. 2006). Secondly, the Sanshandao deposit has been explored and mined for four decades, with recent exploration reaching depths beyond −3,000 m. Thus, a large amount of legacy exploration data is available as training/validation data.
The Sanshandao gold deposit is located in the northwest of the Jiabei Uplift, 25 km north of Laizhou city. The orebodies in the Sanshandao gold deposit are located in the footwall of the Sanshandao fault. There is a consensus that the orebodies are closely related to the geometry of the Sanshandao fault (Song et al. 2021; Yan et al. 2018). The main orebodies extend approximately 1,000 m along the strike of the Sanshandao fault, with a dip angle of approximately 40° and an extended depth of about 1,500 m. In recent years, many deep drill holes in the deposit have revealed alteration and mineralization at −1170 to −3555 m (Figure 3). Including the newly discovered gold resources on the seabed in the north of the mining area, the total resource reserves of the orebodies are as high as 600 t (Song et al. 2015), making it one of the largest gold deposits in China.

Data preparation
The current and legacy exploration datasets for the deposit were retrieved to predict mineral prospectivity for the Sanshandao deposit. The collected data contain 27 cross-sections, 121 drillholes, and 48,119 gold assays (as of June 2019). Here, the interval between the parallel cross-sections is generally 100 m. The geological profiles at each cross-section were generated according to drill holes and controlled source audio-frequency magnetotellurics (CSAMT). The gold grades in drillhole samples were determined by fire assay, followed by flame atomic absorption analysis. The cross-section profiles, drillhole data, and the associated gold assay data were digitized. The digitized dataset forms the GIS database for the Sanshandao gold deposit. Based on the GIS database, the 3D models of the Sanshandao gold deposit were then reconstructed.
Since the geometry of the Sanshandao fault controls the gold mineralization, a 3D geologic model of the Sanshandao fault was constructed to represent its geometry. In the 3D modeling process, 3D contours of the fault surface were delineated according to the cross-section profiles. Then, the geometry between the 3D contours was interpolated using the implicit modeling method (Macêdo, Paulo Gois, and Velho 2011), resulting in the 3D model of the Sanshandao fault (Figure 4(a)). The meshes of the 3D models were generated using the surface generation package of CGAL (Fabri and Pion 2009) with 30 m resolution for triangular faces. The 3D models of the orebodies of the Sanshandao deposit (Figure 4(b)) were generated using a procedure analogous to that of the fault model. The area of the Sanshandao deposit was then partitioned into regular voxels. The size of each voxel is 25 × 25 × 25 m³. To reflect the 3D distribution of mineralization, we interpolated gold grades using the Kriging method according to Au assay samples. The key characteristics of the data used in this study are summarized in Table 1.

Algorithm 1. Framework of domain adaptation with SAMMD.
Input: The feature representations and spatial positions of the source and target domain observation data, $(X^s, Z^s)$ and $(X^t, Z^t)$; the labels of the source domain observation data, $Y^s$; learning rate η; penalty parameter λ.
Output: Network parameters Θ, multi-kernel coefficients β and matrix Σ.
1. Initialize $\Theta^{(0)}$ randomly, $\beta^{(0)} = \frac{1}{U}\mathbf{1}^T$ and $\Sigma^{(0)} = I_{3 \times 3}$, where $\mathbf{1}$ is an all-ones vector and $I_{3 \times 3}$ is an identity matrix;
2. for epoch t = 1 to E do
3.     Update Θ according to the gradient $\partial \mathrm{Loss}(\Theta, \beta, \Sigma) / \partial \Theta^l$ in Equation (12);
4.     Optimize β by solving the QP problem in Equation (14);
5.     while not converged do
6.         Update Σ using Equation (18);
7.     end while
8. end for

Mineral prospectivity modeling
To predict the mineral prospectivity in deep-seated areas, space-associated models were built to transfer the association learned in shallow areas of the Sanshandao deposit. The workflow utilized previously by Deng et al. (2022) was used to generate the multi-channel images input to the deep neural network (Figure 1). The workflow includes the computation of shape descriptors for the 3D fault model (Figure 5) and the projection of the shape descriptors onto multi-channel images (Figure 6) for the voxels. Here, the multi-channel image has a resolution of 224 × 224 and includes 20 channels, i.e. 19 channels projected from the shape descriptors as described above and one extra channel representing the surface distance to the voxel. As justified in Deng et al. (2022), these channels can encode the information of geological control from the geological boundary (i.e. the Sanshandao fault) to the locations of the voxels.
To build the mineral prospectivity model in a domain adaptation fashion, the area of the Sanshandao deposit was partitioned into source and target domains according to the data availability of each area. The 'known' area corresponds to the voxels inside the orebody model together with the remaining voxels that are controlled by drillholes. These known area voxels, 48,119 voxels in the case study, form the source domain examples. According to the cutoff Au grade of 1.0 g/t, every source domain example was labeled as ore-bearing or non-ore-bearing, resulting in 17,822 ore-bearing voxels and 30,297 non-ore-bearing voxels. The known area voxels were partitioned into training and validation sets. Of these, 8,966 known voxels with depths between −1500 and −3000 m were used for validation. In addition, 31,358 unknown voxels in the deep zones were taken as the target domain examples in the domain adaptation.
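The partition described above can be sketched as a small labeling and splitting routine. This is an illustrative sketch only: the dictionary layout of a voxel, the function names, the treatment of a grade exactly at the cutoff as ore-bearing, and the rule that every voxel without a known grade goes to the target domain are assumptions made for the example; only the 1.0 g/t cutoff and the −1500 to −3000 m validation interval come from the text.

```python
CUTOFF_AU = 1.0  # g/t, cutoff grade stated in the text

def label_voxel(au_grade):
    """Return 1 (ore-bearing) if the Au grade reaches the cutoff, else 0.

    Treating a grade equal to the cutoff as ore-bearing is an assumption.
    """
    return 1 if au_grade >= CUTOFF_AU else 0

def partition(voxels, val_top=-1500.0, val_bottom=-3000.0):
    """Split voxels into source-training, source-validation and target sets.

    Each voxel is a dict with 'z' (elevation in metres) and 'au'
    (Au grade in g/t, or None where mineralization is unknown).
    """
    train, val, target = [], [], []
    for v in voxels:
        if v["au"] is None:
            target.append(v)  # unlabeled voxel, used only through SAMMD
            continue
        sample = dict(v, label=label_voxel(v["au"]))
        if val_bottom <= v["z"] <= val_top:
            val.append(sample)
        else:
            train.append(sample)
    return train, val, target

# Toy voxels (illustrative values only).
voxels = [
    {"z": -200.0, "au": 2.3},
    {"z": -800.0, "au": 0.4},
    {"z": -1800.0, "au": 1.0},
    {"z": -2600.0, "au": None},
]
train, val, target = partition(voxels)
```

Holding out the deepest labeled voxels for validation mimics the deployment setting, in which the model is judged on zones deeper than most of its training data.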
The proposed approach was implemented using PyTorch, an open-source Python machine learning library developed by Facebook AI Research (FAIR) (Paszke et al. 2019). In this framework, the space-associated deep adaptation network was trained in an end-to-end manner. RAdam (Rectified Adam) (Liu et al. 2019) was used to minimize the joint loss function in Equation (10); it dynamically rectifies the variance of the adaptive learning rate, thus eliminating the need for the manual tuning involved in the warm-up process during training. The domain adaptation prospectivity network was trained from scratch rather than pre-trained on large-scale datasets. Dropout regularization was applied to both FC6 and FC7 during training.

Performance assessment
The results were compared with several leading MPM methods and another domain adaptation variant to validate the performance of the proposed method. To verify the contribution of domain adaptation, CNNs (Deng et al. 2022), random forests (Carranza and Laborte 2015b; Xiang et al. 2020) and multi-layer perceptrons (Abedi and Norouzi 2012; Ghezelbash, Maghsoudi, and Carranza 2020) were chosen for comparison. Here, the CNN is equivalent to the backbone of the spatial-associated domain adaptation network in Figure 1. The random forest and the multi-layer perceptron are statistical learning-based methods that associate hand-crafted predictor variables with mineralization. For the random forest and the multi-layer perceptron, the predictor variables are the same as those in Mao et al. (2019).
To specifically verify the contribution of SAMMD, another mineral prospectivity model was built based on domain adaptation. This domain adaptation model is a variant of the network in Figure 1, in which the SAMMD module is replaced with the original MMD proposed by Long et al. (2015). The same setting of source and target domains was used to train the MMD model.
The five methods described above were then compared according to their receiver operating characteristic (ROC) curves (Barreno, Cardenas, and Tygar 2007) and success-rate curves (Agterberg and Bonham-Carter 2005). The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at the different decision thresholds used to distinguish prospective areas. The area under the curve (AUC) was employed to assess the prediction accuracy of the compared methods. The AUC is the probability that a random positive example is ranked higher than a random negative example. The success-rate curve plots the proportion of ore-bearing voxels captured within the prospective areas against the proportion of the prospective areas in the whole study area at the different thresholds used to distinguish prospective areas. A higher success rate indicates that, given a certain percentage of prospective areas, more orebodies can be successfully targeted, and thus the estimated model has a higher targeting efficiency. Only the examples in the validation set were used to plot the ROC curves.
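To make the two assessment criteria concrete, the following Python sketch computes the AUC directly from its probabilistic interpretation and evaluates points on a success-rate curve. It is a minimal illustration rather than the evaluation code used in this study; the function names, the treatment of ties as half a win, and the use of `>=` to delineate prospective voxels are assumptions.

```python
def auc_by_ranking(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties are counted as half a win (a common convention, assumed here).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def success_rate_curve(scores, labels, thresholds):
    """Return (area fraction, fraction of ore-bearing voxels captured) per threshold."""
    total = len(scores)
    n_ore = sum(labels)
    if total == 0 or n_ore == 0:
        raise ValueError("need at least one voxel and one ore-bearing voxel")
    curve = []
    for t in thresholds:
        # Labels of the voxels delineated as prospective at this threshold.
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        curve.append((len(flagged) / total, sum(flagged) / n_ore))
    return curve
```

The pairwise AUC is quadratic in the number of voxels, which is acceptable for illustration; a rank-based formulation would be preferable for validation sets of several thousand voxels.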
Several interesting characteristics are observed from both the AUCs and the success rates (Figure 7). Firstly, the deep learning models (CNN, MMD, and SAMMD) achieve a solid improvement over the statistical learning models that use hand-crafted predictor variables. This outcome indicates that the CNN backbone (Figure 1) can learn, at the top of the network, features that are more closely associated with the deep mineralization. Deep learning approaches have the advantage of being able to learn complex and abstract features that may not be easily identified by conventional statistical methods. However, they require large-scale training data. Therefore, the transfer learning scheme is advocated to leverage data from other domains to improve the performance of deep learning models.
The two deep transfer learning models (SAMMD and MMD) also perform better than the vanilla CNN model. This outcome reflects the discrepancy between shallow and deep zones, demonstrating that minimizing the feature discrepancy in the mineral prospectivity model improves the transferability of features to deep mineralization. Domain adaptation approaches allow us to transfer information from related domains, accounting for the domain discrepancy while reducing the amount of data required for training. In 3D MPM, when the domain discrepancy is associated with spatial locations, simply considering the feature discrepancy across domains can be insufficient to effectively quantify the domain discrepancy. Notably, the SAMMD model is substantially superior to the original MMD model. In contrast to the original MMD, which measures feature transferability only in the feature space, the SAMMD further incorporates the proximity in geospace into the feature discrepancy assessment. The results of SAMMD demonstrate that leveraging the characteristics of spatial correlation and spatial variance can correct the measure of feature transferability. This suggests that minimizing SAMMD in spatial-associated domain adaptation networks can promote positive transfer while suppressing excess transfer in the transfer learning. As indicated by the AUCs and success-rate curves, the prospectivity model based on SAMMD can be more sensitive to deep mineralization and exclude more false positives in delineating prospective areas, reflecting that SAMMD is crucial to ensure prediction accuracy and targeting efficiency in 3D MPM. Thus, the spatial-associated domain adaptation enables us to use spatial information to guide the feature transfer and boost the performance of 3D MPM.
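The difference between a feature-only discrepancy and a discrepancy over the joint distribution of features and locations can be illustrated with a small Python sketch. It is not the SAMMD of this study, whose definition is given by the equations of the methodology; it assumes a biased MMD estimate, Gaussian kernels, and a joint kernel formed as the product of a feature kernel and a spatial kernel, all of which are illustrative choices.

```python
import math


def gaussian(u, v, bandwidth):
    """Gaussian kernel between two equal-length vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def mmd2(source, target, kernel):
    """Biased estimate of the squared MMD between two samples."""
    def mean_k(xs, ys):
        return sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)


# Each example is a (feature vector, location vector) pair.
def feature_kernel(a, b, bw=1.0):
    """Compares features only, ignoring where the voxels are located."""
    return gaussian(a[0], b[0], bw)


def joint_kernel(a, b, bw_feature=1.0, bw_space=1.0):
    """Product kernel over features and locations (an assumed form)."""
    return gaussian(a[0], b[0], bw_feature) * gaussian(a[1], b[1], bw_space)


# Toy data: identical features, but the target voxels lie deeper.
shallow = [((0.0,), (0.0, 0.0, 0.0)), ((1.0,), (0.0, 0.0, 0.0))]
deep = [((0.0,), (0.0, 0.0, -3.0)), ((1.0,), (0.0, 0.0, -3.0))]
feature_only = mmd2(shallow, deep, feature_kernel)
joint = mmd2(shallow, deep, joint_kernel)
```

In this toy case the feature-only discrepancy vanishes because the two samples share the same features, whereas the joint discrepancy is positive because the samples occupy different locations. This is the kind of spatially associated discrepancy that a feature-only measure cannot detect.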

Interpretation of SAMMD
Theoretical analysis has shown that the risk of the target domain is bounded by the discrepancy between source and target domains (Ben-David et al. 2010; Mansour, Mohri, and Rostamizadeh 2009). To test the risk of the target domain for the three leading models (CNN, MMD and SAMMD), the domain discrepancy and the empirical risk were evaluated. Since the domain discrepancy of joint distributions over feature space and geospace cannot be measured by generic metrics like the A-distance described by Ben-David et al. (2010), the SAMMD defined in Equation (7) was used as the cross-domain discrepancy metric. Here, the SAMMD loss was computed at the top layers of the network models. The cross-entropy loss on the validation set was employed to measure the empirical risk of the target domain. Figure 8 shows the resulting SAMMD and cross-entropy values. According to the SAMMD, the features learned by CNN have the highest cross-domain discrepancy, followed by those learned by the MMD model, and the features learned by the SAMMD model yield the smallest cross-domain discrepancy. This demonstrates that deep features learned from shallow zones by CNN cannot be simply generalized to the target domain. While the features learned by the MMD model show a smaller cross-domain discrepancy than those of the vanilla CNN, the MMD model, as a variant of CNN considering the domain discrepancy of individual features, cannot substantially reduce the spatial-associated cross-domain discrepancy of features. In comparison, the features learned by the SAMMD model reduce the cross-domain discrepancy by minimizing the joint distribution discrepancy over features and geospace, suggesting the necessity of introducing spatial factors into transfer learning in 3D MPM. In terms of empirical risk, the cross-entropies of CNN and MMD are nearly identical, with the cross-entropy of MMD slightly higher than that of CNN. In contrast, the SAMMD model attains the minimum cross-entropy. The cross-entropy results suggest that simply minimizing the feature discrepancy between shallow and deep zones cannot yield safely transferable features to indicate the mineral prospectivity for the deep zones (target domain) in 3D MPM. Notably, when minimizing the joint discrepancy over features and geospace, the learned features are considerably more transferable to the deep zones and the empirical risk of the target domain is substantially reduced. Therefore, spatial factors are essential to guide feature transfer from shallow to deep zones in 3D MPM.
In the transfer learning process, the spatial kernel k_c defined in Equation (8) is optimized to maximize the SAMMD. In particular, the learned parameter Σ reflects the spatial anisotropy of feature transferability. As shown in Appendix 1, the anisotropy can be interpreted by computing the singular value decomposition (SVD) of Σ. The directional variograms interpreted from Σ show evident spatial anisotropy (Figure 9). The variogram of the major axis has the most extensive range. The major axis roughly coincides with the strike directions of the Sanshandao fault and the orebodies, suggesting that the domain discrepancy varies gently along the strike direction of the Sanshandao fault. Since the Sanshandao fault is the ore-controlling structure, the low domain discrepancy along its strike direction suggests a similar metallogenic mechanism at roughly the same depth along the fault. The direction of the semi-major axis is roughly consistent with the horizontal extending direction of the major orebodies at Sanshandao (Song et al. 2015). The variogram range of the semi-major axis is nearly 2/3 of the range of the major axis, suggesting that the domain discrepancy varies moderately along the horizontal extending direction of the orebodies. The variogram of the minor axis has the smallest range, which is approximately 1/5 of the range of the major axis. This small range highlights the substantial variation of domain discrepancy along the depth direction.

3D predictive mapping and target appraisal
Given the mineral probability predicted by the 3D MPM models, potential areas in the deep zones of the study area can be identified for future exploration. To determine the potential areas, an optimal probability threshold was chosen to separate high-probability areas from the low-probability background. Following Chen and Wu (2017) and Deng et al. (2022), the probability value with the maximum Youden index (MYI) (Ruopp et al. 2008; Youden 1950) was chosen as the threshold. Table 2 lists the MYIs and corresponding thresholds for the SAMMD model and the counterpart MMD model. The SAMMD model exhibits a higher MYI, consistent with the results of the ROC and success-rate curves. The thresholds for SAMMD and MMD are both higher than the prior probability (0.3703) of ore-bearing voxels.
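The threshold selection can be sketched in Python as follows. The Youden index at a threshold is J = TPR − FPR (equivalently, sensitivity + specificity − 1), and the threshold with the maximum J is retained. The function name, the use of observed scores as candidate thresholds, and the `>=` decision rule are assumptions made for illustration.

```python
def max_youden_threshold(scores, labels):
    """Return (threshold, J) maximizing the Youden index J = TPR - FPR.

    Candidate thresholds are the distinct predicted probabilities; a voxel
    is treated as prospective when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # on ties, the lowest such threshold is kept
            best_t, best_j = t, j
    return best_t, best_j
```

Voxels in the target domain whose predicted probability is at or above the returned threshold would then be delineated as high-potential areas.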
Figure 10 shows the potential mineralization areas predicted by the MMD and SAMMD models. Both models identify two potential areas: one is located in the northern segment of the study area, and the other is located in the middle segment. In the northern segment, the potential areas identified by the two methods overlap to a large extent. Both of these potential areas are plate-shaped and are likely the NE extension of the orebodies. In the middle segment, the SAMMD and MMD methods both identify a potential area extending from the main orebodies. However, the SAMMD method predicts higher gold prospectivity over a larger potential area in this segment than the MMD method. Considering that the shallow zone above this potential area hosts a large amount of gold mineralization and that orebodies have been discovered in other deep zones in the study area, we argue that the high-potential area predicted by SAMMD could be the deep extension of the shallow orebodies and has a reasonable potential for future exploration. Overall, two potential areas are delineated by the SAMMD model (Figure 11). Target I extends from the northern segment of the orebodies in the northwest direction and is located at elevations from −1600 to −2200 m. Target II is sited at elevations from −1000 to −2000 m beneath the middle segment of the orebodies and is considered to be their deep extension.

Discussion and conclusions
In this paper, we have proposed a spatial-associated domain adaptation method for 3D MPM. Compared with existing methods for 3D MPM, this technique has several distinct advantages. Firstly, the proposed method considers the differences in metallogenic mechanism between the shallow known zones and the deep unknown zones. This is achieved by introducing a transfer learning paradigm to 3D MPM and casting the 3D MPM as a domain adaptation problem. This approach allows us to avoid the bias inherent in a mineral prospectivity model constructed solely from examples in the shallow known zones. As a result, it provides a more adaptive model that can effectively prospect deep unknown zones. Secondly and more importantly, the proposed method can bridge the domain discrepancy concerning not only the features but also the geospace, which is a major contributing factor to the differences in metallogenic mechanism. This enables the spatial-associated domain discrepancy in 3D MPM to be measured. The resulting workflow avoids the limitation of MPM that relies solely on cross-domain features or predictor variables to predict mineral prospectivity. Finally, the proposed method combines the scheme of spatial-associated domain adaptation with a deep learning framework. This leads to a spatial-associated deep adaptation network that learns the feature representation 'end-to-end'. The learned features are indicative of mineralization and are transferable across the geospace for 3D MPM. As demonstrated in a case study of deep mineral prospectivity modeling, the advantages of the proposed method enable a substantial improvement in the prediction accuracy and targeting efficiency of 3D MPM, providing a novel and effective tool for deep exploration targeting.
While the proposed method addresses the mineral prospectivity modeling problem (i.e. solving the 'where' problem) by assessing the difference in metallogenic mechanism between shallow and deep zones, how to extend it to quantitative mineral resource estimation (i.e. solving the 'how much' problem) by considering the domain discrepancy between shallow and deep zones is a promising avenue for our future work. In particular, we will extend SAMMD and the associated spatial-associated domain adaptation network to the regression task of predicting the ore-bearing ratio in deep zones. We anticipate that this will further improve the applicability and reliability of GIS in mineral exploration targeting.

Figure 1. Pipeline and architecture of the spatial-associated domain adaptation network for 3D MPM. Given the 3D geological model and a target voxel, we generate a multi-channel image for the voxel via descriptor extraction and projection and input it to the deep neural network. The backbone of the network is a convolutional neural network consisting of five convolutional layers and three fully-connected layers. The mineralization-associated features are extracted from the fully-connected layers for domain adaptation. To extract the transferable features across geospace, we achieve domain adaptation by using the SAMMD module, which measures the spatial-associated domain discrepancy between shallow and deep zones according to spatial distance and feature dissimilarity. The transferable features learned by minimizing the SAMMD are finally used to predict the ore-bearing probability for the target voxel.
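where $\mathrm{cov}(h_\beta) \in \mathbb{R}^{U \times U}$ is the covariance matrix of $h_\beta$. Thus, the optimization of $\beta$ reduces to the solution of a quadratic programming (QP) problem, namely $\min_{\beta^{\top} z = 1,\ \beta \geq 0} \beta^{\top} \mathrm{cov}(h_\beta)\, \beta$.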

Figure 2. Geological map of the Sanshandao gold deposit (adapted from Song et al. 2021).

Figure 5. Shape descriptors of Laplace-Beltrami eigenfunctions (a) and surface normals (b) of the Sanshandao fault.
Figure 6. Projection results of the shape descriptors for target voxels: Laplace-Beltrami eigenfunctions (a) and surface normals (b).

Figure 7. ROC curves (a) and success-rate curves (b) for the five evaluated prospectivity models.The SAMMD-based model attains the best performance among the five tested methods.

Figure 8. Cross entropy and SAMMD loss resulting from CNN, MMD and SAMMD models.

Figure 10.The high-potential areas resulting from the SAMMD (a) and the MMD (b) models.

Figure 11.Two mineral exploration targets identified by the SAMMD model.

Table 2. MYIs and the corresponding thresholds for the SAMMD and the MMD-based models.