Analysis of a gradient descent ontology iterative algorithm for the geological setting

Abstract As a computational tool, ontology has wide applications in information retrieval and other disciplines, and ontology concept similarity calculation is an essential problem in these applications. In this paper, we focus on the ontology setting in which the ontology function maps every vertex of a graph to a real value, and the similarity between vertices is measured by the difference of their corresponding scores. We propose a gradient descent ontology algorithm for this setting and determine its learning rates for general convex losses.


Introduction
The term "Ontology" comes from philosophy and used to describe the nature of things. In computer science, ontology is defined as a shared conceptual model, which has been applied in image retrieval, knowledge management, information retrieval search extension, information systems, collaboration, and intelligent information integration. Ontology, as an effective concept semantic model, has been widely employed in many other fields such as education science (Bouzeghoub & Elbyed, 2006), biology medicine (Hu, Dasmahapatra, Lewis, & Shadbolt, 2003), and geography science (Fonseca, Egenhofer, Davis, & Camara, 2001).
Each vertex on an ontology graph represents a concept, and each edge represents a connection between two concepts. Let G be a graph corresponding to an ontology O. The aim of ontology similarity measurement is to learn a similarity function which maps each vertex to a real number; the similarity between two vertices is then measured by the difference of their corresponding scores. Let graphs G_1, G_2, …, G_k correspond to ontologies O_1, O_2, …, O_k, respectively, and let G = G_1 + G_2 + ⋯ + G_k. For every vertex v ∊ V(G_i), where 1 ≤ i ≤ k, the goal of ontology mapping is to find similar vertices from G to G_i. Thus, the core of the ontology mapping problem is again ontology similarity measurement.
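To make the score-based similarity concrete, the following short Python sketch (ours, for illustration only; the vertex names and scores are toy values, not from the paper) measures the closeness of two concepts by the difference of their scores:

    def score_difference(f, v, w):
        # |f(v) - f(w)|: the smaller the difference, the more similar v and w.
        return abs(f[v] - f[w])

    f = {"mineral": 0.91, "rock": 0.88, "fossil": 0.35}   # toy ontology scores
    for v, w in [("mineral", "rock"), ("mineral", "fossil")]:
        print(v, w, score_difference(f, v, w))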
There are several effective learning tricks for ontology similarity measurement. Wang, Gao, Zhang, and Gao (2010) first proposed that ranking learning technology can be employed in ontology similarity calculation. Huang, Xu, Gao, and Jia (2011) presented a fast ontology algorithm in order to reduce the time complexity. Gao and Liang (2011) proposed that the ontology function can be obtained by optimizing an NDCG measure, and applied this idea in physics education; the ontology function has also been obtained using a regression approach. Huang, Xu, Gao, and Gong (2011) presented an ontology algorithm based on half transductive learning. Lan, Xu, and Gao (2012) explored a learning theory approach to ontology similarity computation in the setting where the ontology graph is a tree. Lin, Weige, Xiaozhong, and Wei (2016) proposed a new criterion for the multi-dividing ontology algorithm from the AUC standpoint, which was designed to avoid the choice of a loss function. Gao, Gao, and Liang (2013) presented new algorithms for ontology similarity measurement and ontology mapping in terms of harmonic analysis and diffusion regularization on hypergraphs. Recently, Gao and Shi (2013) proposed new algorithms for ontology similarity measurement whose computational models take the operational cost of real implementations into account.
Several papers have contributed to the theoretical analysis of different ontology settings. Gao and Xu (2013) investigated the uniform stability of the multi-dividing ontology algorithm and gave generalization bounds for stable multi-dividing ontology algorithms. Gao, Gao, and Zhang (2012) researched strong and weak stabilities of the multi-dividing ontology algorithm, and Gao and Xu (2012) studied further characteristics of this algorithm. Gao, Xu, Gan, and Zhou (2014) studied the multi-dividing ontology algorithm from a theoretical point of view; they highlighted that the empirical multi-dividing ontology model can be expressed as a conditional linear statistical model, and achieved an approximation result based on a projection method. Gao, Gao, and Liang (2016) presented characteristics of the best ontology score function among piecewise constant ontology score functions.

Gao, Gao, Zhang, and Liang (2014) investigated upper and lower bounds on the minimax learning rate under low-noise assumptions. Gao, Yan, and Liang (2014) and Yan, Gao, and Li (2013) presented a piecewise constant function approximation approach for AUC-criterion multi-dividing ontology algorithms.
In this paper, we consider the ontology problem using ranking learning technology. We design a gradient descent ontology algorithm and present its error rates for gradient learning with general convex losses. The organization of this paper is as follows: we introduce the basic setting in Section 2; in Section 3, we present several preliminary lemmas; and in Section 4, we prove our main result.

Setting
We use a vector to express all the information for each vertex in the ontology graph, so V takes its values in a high-dimensional feature space. Let V and Y ⊆ ℝ be, respectively, an input and an output space; V is called the instance space, and examples z = (v, y) are drawn randomly and independently according to some (unknown) distribution ρ on Z = V × Y. Given a training set S = {z_i = (v_i, y_i)}_{i=1}^{m} ∊ Z^m, it is usually assumed that the ontology algorithm A is symmetric with respect to S. The goal of an ontology algorithm is to obtain an ontology (score) function f: V → ℝ, which assigns each vertex a score.
For v, v′ ∊ V, the order of v and v′ under the ontology function f should be consistent with their labels. The ontology loss function l satisfies that l(f, (v, y), (v′, y′)) is a non-negative real number and is symmetric with respect to (v, y) and (v′, y′).
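For illustration, one admissible loss of this type is a pairwise hinge loss; the specific choice below is our example in Python, not the loss fixed by the paper:

    def pairwise_hinge_loss(fv, y, fw, yp):
        # Non-negative, and symmetric in the two labeled examples:
        # swapping (fv, y) with (fw, yp) leaves u = (y - yp)*(fv - fw) unchanged.
        u = (y - yp) * (fv - fw)
        return max(0.0, 1.0 - u)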
The quality of an ontology function is measured by the expected l-error
$$R_l(f)=\int_{Z}\int_{Z} l\big(f,(v,y),(v',y')\big)\,d\rho(v,y)\,d\rho(v',y'),$$
but it cannot be estimated directly since the distribution ρ is unknown. We instead measure the ontology algorithm by the empirical l-error
$$R_l(f,S)=\frac{1}{m(m-1)}\sum_{i\neq j} l\big(f,(v_i,y_i),(v_j,y_j)\big).$$
However, the ontology function obtained by minimizing R_l(f, S) alone usually has poor smoothness. A regularization term is often added to overcome this problem, and f is measured by the regularized empirical l-error $R_l(f,S)+\frac{\lambda}{2}\|f\|_K^2$ with regularization parameter λ > 0. Let ϑ = {f : R_l(f) = inf_g R_l(g)} be the collection of optimal ontology functions; note that the optimal ontology function is not unique. A Mercer kernel is a symmetric and positive semi-definite continuous function K: V × V → ℝ. The RKHS H_K associated with the kernel K is defined to be the closure of the linear span of the set of functions {K_v := K(v, ·) : v ∊ V}. The regularized ontology algorithm associated with the RKHS can be implemented by
$$f_{S,\lambda}=\arg\min_{f\in H_K}\Big\{R_l(f,S)+\frac{\lambda}{2}\|f\|_K^2\Big\},\tag{1}$$
and the expected version of Equation (1) is
$$f_{\lambda}=\arg\min_{f\in H_K}\Big\{R_l(f)+\frac{\lambda}{2}\|f\|_K^2\Big\}.\tag{2}$$
The aim of this paper is to present the statistical properties of the gradient descent ontology algorithm, which learns an optimal ontology function in ϑ. By the convexity of l, its left derivative l′_− is well defined and non-decreasing on ℝ.
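Under a representer-type expansion f = Σ_i α_i K(v_i, ·) over the sample (an assumption we make for the sketch), the empirical l-error and the regularized objective of Equation (1) can be evaluated as in the following Python sketch; the function names are ours:

    import numpy as np

    def empirical_l_error(scores, ys, loss):
        # R_l(f, S): average of the pairwise loss over all ordered pairs i != j.
        m = len(ys)
        return sum(loss(scores[i], ys[i], scores[j], ys[j])
                   for i in range(m) for j in range(m) if i != j) / (m * (m - 1))

    def regularized_objective(alpha, K, ys, loss, lam):
        # Objective of Equation (1) for f = sum_i alpha_i * K(v_i, .):
        # f(v_i) = (K alpha)_i on the sample, and ||f||_K^2 = alpha' K alpha.
        scores = K @ alpha
        return empirical_l_error(scores, ys, loss) + 0.5 * lam * alpha @ K @ alpha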
Let t ∊ ℕ and let {η_t} be a sequence of step sizes. The stochastic gradient descent ontology algorithm is defined for the sample S by f_S^1 = 0 and
$$f_S^{t+1}=f_S^{t}-\eta_t\Big\{\frac{1}{m(m-1)}\sum_{i\neq j} l'_{-}\big(f_S^{t},(v_i,y_i),(v_j,y_j)\big)\,\big(K_{v_i}-K_{v_j}\big)+\lambda f_S^{t}\Big\}.\tag{3}$$
Our main task is to analyze the error $R_l(f_S^{T+1})-\inf_{f\in\vartheta}R_l(f)$ under the increment condition
$$\big|l'_{-}\big(f,(v,y),(v',y')\big)\big|\le C_0\big(1+|f(v)-f(v')|^{q-1}\big),\tag{4}$$
where C_0 > 0 and q ≥ 1. Denote the constant $\kappa=\sup_{v\in V}\sqrt{K(v,v)}$. The following are the main conclusions of our paper.
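A minimal Python sketch of iteration (3), in the same coefficient representation, may clarify the update; this is our reconstruction, and the argument convention of the left derivative dloss is an assumption:

    import numpy as np

    # Sketch (ours) of iteration (3). K is the m-by-m kernel matrix on the
    # sample, ys the labels, dloss(u, y, yp) stands for the left derivative
    # l'_-, lam is the regularization parameter, and eta maps t to eta_t.
    def gradient_descent_ontology(K, ys, dloss, lam, steps, eta):
        m = len(ys)
        alpha = np.zeros(m)              # f_S^t = sum_i alpha[i] * K(v_i, .)
        for t in range(1, steps + 1):
            scores = K @ alpha           # f_S^t(v_i) on the sample
            grad = np.zeros(m)           # H_K-gradient coefficients
            for i in range(m):
                for j in range(m):
                    if i != j:
                        g = dloss(scores[i] - scores[j], ys[i], ys[j])
                        grad[i] += g     # coefficient of K_{v_i}
                        grad[j] -= g     # coefficient of -K_{v_j}
            grad /= m * (m - 1)
            alpha -= eta(t) * (grad + lam * alpha)   # f^{t+1} = f^t - eta_t * (...)
        return alpha

    # e.g. eta = lambda t: 1.0 / (lam * (t + 1)), one common (assumed) schedule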

Theorem 1: Suppose the ontology loss function l satisfies (4) and the step sizes {η_t} are suitably chosen. Then, for any 0 < δ < 1, with confidence at least 1 − δ, the error $R_l(f_S^{T+1})-\inf_{f\in\vartheta}R_l(f)$ admits a bound in which C is a constant independent of m. Using Theorem 1, we derive estimates of this error under some approximation conditions.
Under the conditions of Theorem 1, for any 0 < δ < 1, with confidence at least 1 − δ, we obtain the corresponding total-error bound, where C is a constant independent of m.

Preliminary lemmas
The main contribution of this section is to deduce some preliminary lemmas for proving Theorem 1. Our first lemma determines a special property of the regularized ontology function f_λ. Since the proof is similar to that in Ying and Zhou (2006), we leave the details to the reader.
Lemma 1: Let λ > 0 and f ∊ H_K. Then
$$R_l(f)+\frac{\lambda}{2}\|f\|_K^2-\Big(R_l(f_\lambda)+\frac{\lambda}{2}\|f_\lambda\|_K^2\Big)\ge\frac{\lambda}{2}\|f-f_\lambda\|_K^2.$$
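The inequality follows from the λ-strong convexity of the regularized error; a sketch of the standard argument (ours, under the reconstructed form of Lemma 1 above) reads:

    % Sketch (ours) of the strong-convexity argument behind Lemma 1.
    % Write R_l^lambda(f) := R_l(f) + (lambda/2)||f||_K^2; f_lambda minimizes
    % R_l^lambda, so some subgradient g of R_l at f_lambda obeys g + lambda f_lambda = 0.
    \begin{align*}
    R_l^{\lambda}(f)-R_l^{\lambda}(f_\lambda)
     &= R_l(f)-R_l(f_\lambda)
       +\tfrac{\lambda}{2}\big(\|f\|_K^2-\|f_\lambda\|_K^2\big)\\
     &\ge \langle -\lambda f_\lambda,\; f-f_\lambda\rangle_K
       +\tfrac{\lambda}{2}\big(\|f\|_K^2-\|f_\lambda\|_K^2\big)\\
     &= \tfrac{\lambda}{2}\big(\|f\|_K^2-2\langle f_\lambda,f\rangle_K+\|f_\lambda\|_K^2\big)
      = \tfrac{\lambda}{2}\,\|f-f_\lambda\|_K^2 .
    \end{align*}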
By virtue of Lemma 1, we derive the one-step analysis of Lemma 2, inequality (11).
Proof: In view of the convexity of l and the Schwarz inequality, we verify (12) and (13); in terms of Lemma 1, we infer (14). By combining (13) and (14) with (12), we get the desired result. Now, we discuss the sample error iteratively by applying (11); this requires bounding the quantity φ(S, t), and hence a bound on the norm of f_S^t. We say that l′_− is locally Lipschitz at the origin if the local Lipschitz constant M(λ) is finite for any λ > 0.
Lemma 3: Assume that l′_− is locally Lipschitz at the origin and that η_t(4κ²M(λ) + λ) ≤ 1 for each t. Then
$$\|f_S^{t}\|_K\le\frac{2\kappa\,|l'_{-}(0)|}{\lambda}\quad\text{for each } t.$$
Proof: We derive the conclusion by induction.
Obviously, f_S^1 = 0 satisfies the bound. We assume that this bound holds for f_S^t. Writing the update (3) in terms of l′_−(0) and the operators L_ij, each L_ij is a positive linear operator on H_K and its norm is bounded by 4κ²M(λ).
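Assuming update (3) and the operator decomposition just named, the induction step can be completed along the following standard lines (a sketch, ours, not the paper's verbatim proof):

    % Sketch (ours) of the induction step, assuming update (3). By monotonicity
    % of l'_-, write l'_-(u) - l'_-(0) = s(u) u with 0 <= s(u) <= M(lambda); this
    % part assembles into a positive operator L built from the L_ij, with
    % ||L|| <= 4 kappa^2 M(lambda) since ||K_{v_i} - K_{v_j}||_K <= 2 kappa.
    \begin{align*}
    \|f_S^{t+1}\|_K
     &\le \big\|\big(I-\eta_t(L+\lambda I)\big)f_S^t\big\|_K
        + 2\kappa\,\eta_t\,|l'_-(0)| \\
     &\le (1-\eta_t\lambda)\,\|f_S^t\|_K + 2\kappa\,\eta_t\,|l'_-(0)|
        \qquad\text{(using } \eta_t(4\kappa^2 M(\lambda)+\lambda)\le 1)\\
     &\le (1-\eta_t\lambda)\,\frac{2\kappa\,|l'_-(0)|}{\lambda}
        + 2\kappa\,\eta_t\,|l'_-(0)|
      = \frac{2\kappa\,|l'_-(0)|}{\lambda}.
    \end{align*}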