An Ontology-based Intelligent Data Query System in Manufacturing Networks
University of Birmingham

This paper investigates the development of an intelligent data query framework for manufacturing through the use of semantic web technologies. The primary objective of the ontology-based data query was to develop an efficient and scalable data interoperability and retrieval system that finds the most relevant query results with the minimum message cost, the most hits per query and the least response time. The paper explains the idea of an ontology and its application in the manufacturing domain. Simulation software was developed based on a real case study of distributed networks of manufacturing workshops. A semantic query algorithm was developed in which query results are returned by investigating the semantic richness of each workshop. Results were compared with a semantic-free search mechanism using key performance indicators. The results show the validity of the proposed model for efficient data query when compared to random search.


Introduction
The evolution of the web from the prevailing web technologies and web services to semantic web technologies and semantic web services is rapidly advancing. This is a consequence of the huge amount of data available, the increasing concern about failing to use valuable data, and the cost of using wrong data for decision making (Bikakis & Sellis, 2016; Esposito et al., 2015). The distinction between data and information is important. Data can be seen as raw, unstructured facts that seem random and useless until organised, whereas information is organised data presented in a given context (Chen et al., 2014). Although missing data (non-response) can be ignored, at the cost of added uncertainty in the system, it should also be considered an informative output. It can be argued that a failure by some entities of the system to respond to some queries is itself a response, as it provides some information.
Therefore, in such an example, an absence of data provides information (informative non-response), and this is important in instances where there is minimal data. Data thus has potential for a variety of valuable but also erroneous informative outputs, while information is limited to one informative output. Manufacturing currently uses a mixture of data and information infrastructures, such as databases of raw data collected from internal activities and a web of information for business-to-business information transfer along the supply chain.
The impact of the quality of an organization's data on its revenue is addressed by various research studies. According to an EBiz article, business data grows by a dramatic 40% each year, while approximately 20% of that data is irrelevant and useless (Tibbetts, 2012). On the other hand, an organization can generate 70% more revenue purely based on the quality of its data (SiriusDecisions, 2015). To quantify the cost of data quality, the sales and marketing research firm SiriusDecisions proposes the 1-10-100 rule as a rule of thumb. The rule states that verifying data as it is entered costs $1, cleansing data and removing duplicates costs $10, and doing nothing about the data costs $100 (SiriusDecisions, 2015). Developments in semantic web technologies such as the Resource Description Framework (RDF), used instead of traditional Hypertext Markup Language (HTML), allow data to be linked instead of linking information. There is a trend in intelligent systems for embedding RDF descriptions inside HTML pages through RDF in Attributes (RDFa). The RDF model embedded inside an HTML web page enables automated knowledge discovery and inferencing for intelligent agents (machines). The relationships between data are described by the RDF, and the description is stored within a knowledge-base instead of a database. This knowledge-base has an ontology at its foundation. When data is stored with respect to an ontology, it impacts positively on data re-usability and maintenance.
Furthermore, the ontology enables semantic web services to be developed. Taking the simple example of an online query, traditional web services will support and search only for semantic-free keywords. A semantic web service version of this would return a unified and inferred list of semantically related results. Likewise, in a manufacturing context, semantic web services can increase the visibility of good data in an automated way.
The semantic web and its services are scalable to accommodate virtually any knowledge in the manufacturing environment, be it predictive production scheduling, production control or design specifications.
On the other hand, there is a trend for manufacturing companies to shift from product-oriented production to user-oriented production (Pine, 1993; Tseng et al., 1996; Tseng & Hu, 2014). The shortening of product life cycles, global competitiveness and the economy are contributing to market volatility (Kernschmidt et al., 2015). The shortening of a product's life cycle is caused by the ability of competitors to react quickly and release innovative products. Global competition among companies in the same market segment, competing on the performance dimensions of product features, price, lead time and quality, is becoming increasingly fierce. To stay on the leading edge, manufacturers need to address the market in new ways (Ageron et al., 2012).
Effective and efficient knowledge sharing and reuse within collaborative manufacturing enterprises is a game-changing opportunity for manufacturers to stay competitive in such a dynamic environment. A seamless flow of information enables manufacturers to increase their throughput while keeping overhead costs to a minimum (Lin et al., 2011). Manufacturing systems play a crucial role in supporting the manufacturer's efforts, as they are intended to provide accurate data quickly (Bi, Da Xu, & Wang, 2014). This data comes in a variety of formats, including documents, graphics, video files and the non-human-interpretable signals produced by sensors. Data is collated within distributed databases, which are in turn kept in a variety of storage formats and programmes (Song et al., 2013). The data may be retrieved using a limited set of key search attributes within the database, such as the table, row, column, primary key and foreign key of a relational database (RDB) (Astrova, 2009; Cerbah, 2010; Sane & Shirke, 2009). The power of the database's search capacity may increase exponentially if the data is linked, allowing further attributes to be used and queries to be combined.
With regard to shop floor operations, the collected data may provide key metrics to the management systems on the shop floor. Such key metrics include the level of asset utilisation, statistical process control and job tracking, among others. The underlying intelligence in such systems utilises all available data and provides analysis, forecasts of future states and support in decision making (Khan et al., 2011). This helps manufacturers to make better decisions at all times in supporting their operations (Lee et al., 2014). The correct application of this intelligence allows manufacturers to query their manufacturing sites, make informed decisions and, in turn, perform the necessary activities in an educated manner. These activities may include design experimentation, alerts within the system, optimisation, simulation, modelling, prediction and reporting more generally (Davenport et al., 2010). This project is motivated by the lack of a systematic and intelligent data query system within the manufacturing supply chain. Better decision-making resources with intrinsic manufacturing intelligence emerge from a synchronised data sharing system for the mutual benefit of every stakeholder. In this paper, a novel ontology-based query methodology is presented in which the result of each query is evaluated based on the semantic relevance and distance of the query to the local manufacturing documents. The model is built upon a real case study, and the results are compared with a semantic-free random walk mechanism. By investigating an intelligent and efficient data query system in a manufacturing network, this work addresses a gap in current research.

Research Methodology
Intelligent data query is an emerging field of research for the manufacturing supply chain community, as the need for an efficient and effective data sharing and retrieval system is greater than ever. This section addresses the key concepts used and the foundation for an intelligent semantic query system.

Manufacturing Ontology
An ontology is defined as a data model which provides information about a domain with the aid of relationships established among the various concepts of that domain (Gruber, 1993; Guarino et al., 2009). Organizational knowledge is encapsulated to build a set of knowledge-bases that can combine details such as rules, policies, processes, definitions and relationships. Ontologies have been applied in many different areas and domains, including manufacturing. In the manufacturing domain, an ontology can describe processes ranging from simple to complicated, which makes it simpler to create patterns, algorithms and frameworks for the automation of any process.
The Process Specification Language (PSL) was proposed by Schlenoff et al. (2000) from the National Institute of Standards and Technology (NIST) to define a taxonomy to capture and exchange discrete manufacturing process information. PSL is a neutral standard language for process specifications such as production scheduling, manufacturing process planning, workflow, business process reengineering, simulation, process modelling and project management. The neutrality of PSL allows process specifications to be translated to and from software-proprietary file formats.
For the domain problem of keeping track of cost in the manufacturing process, Lemaignan et al. (2006) proposed an upper ontology in the manufacturing domain to be a common ancestor for more domain-specific ontologies. MASON (Manufacturing's Semantics Ontology) addresses a similar problem domain to that of PSL from the NIST. The difference is that PSL is written in the Resource Description Framework (RDF), while MASON uses the more up-to-date semantic language called the Web Ontology Language (OWL). Moreover, MASON shows a concrete application in expert systems for automatic cost estimation and in semantic-aware multi-agent systems. Our work, on the other hand, looked at an intelligent and efficient data query system in a manufacturing network, which is missing in the literature.
In an extensive and rigorous review of the literature, Sanfilippo and Borgo (2016) examined the semantics of feature notions and proposed an ontology-based product knowledge representation that would allow reliable data integration while preserving the intended semantics. This gives an insight into the importance of semantic web technologies for current manufacturing systems. In our paper, we have tried to demonstrate the advantages of using ontologies for data query in a quantifiable manner.
In another study, Wei et al. (2009) proposed an ontology-based approach to address manufacturing design problems. Their system is capable of integrating the whole design process within product knowledge management to support locating, reusing, integrating and transferring design knowledge. Their work can be improved and extended by developing more diverse domain-specific engineering ontologies to achieve a more extensive multidisciplinary integration. In contrast, the data query system presented in our paper is not limited to one aspect of the manufacturing supply chain, such as product design. The proposed system is designed in a scalable manner to accommodate data integration into the ontology across different tiers of the manufacturing domain. Lin and Harding (2007) have proposed a Manufacturing System Engineering (MSE) ontology which acts as a mediated meta-model for inter-enterprise collaboration.
Their ontology is structured into classes, properties and instances. However, this method still involves time and cost overheads, as the individual partners must map their vocabularies to the MSE ontology at the outset; this can be a slow process with the current manual methods. This slow mapping process limits the approach and may be a major obstacle to its large-scale use for information integration in a global supply chain. To ease this problem, formal mapping could become semi-automated via algorithms and heuristics that identify the shared characteristics between the two ontologies.
The Protégé ontology editor used in this research is compatible with ontology languages such as the Ontology Inference Layer (OIL) (Ga, 2009). Another reason why this editor is suitable for this research is that it has deductive classifiers for validating the consistency of the ontology. Furthermore, it can also export into other formats such as RDF, which is the basis of this research. When modelling in a domain, developers must be able to focus more on the concepts and relations than on the syntax of the final results. This can be achieved through a Protégé-based editor, resulting in modelling at a conceptual level (Noy, 2001; Gennari, 2003).
The concepts of the ontology and the instances are created based on the data from the case study. The scope of the ontology was limited to the workshops' attributes, such as their core competency operations. Some of the key attributes extracted from the case study are summarised and tabulated in section 2.3, and these are the foundation for the developed ontology. However, our literature research highlighted that the existing ontologies do not cover the terminologies used in this data query system research, as a result of their domain-specific nature. For instance, the concept of 'quality control' is not represented in the open-source MASON ontology, as it was specifically designed for automatic production cost estimation. In order to widen the scope of the project, the ontology will be further developed in future work and enriched by mapping its main concepts, in an offline manner, to the well-known WordNet lexical ontology (https://wordnet.princeton.edu) and to MASON. This will ensure that the implementation process is facilitated at scale. The alignment of the ontologies will lead to an improved knowledge-base, which will allow more data to be queried from the system. As an example, for this case study, the concept of 'production cost' is not considered, as it can be covered by mapping the proposed ontology to MASON. Figure 1 shows the core concepts of the proposed ontology, where each core concept is further formalised into more sub-concepts. The network comprises manufacturers, each of which provides a set of manufacturing operations. Each Order consists of a set of Jobs, which are defined by unique process plans. These process plans outline which operations are required for each job. Therefore, isNeededBy defines which operations are needed for each job and isProvidedBy defines which manufacturer provides that particular operation. Figure 2 shows an example of detailed object properties within the developed ontology.
Figure 2. Object properties of the proposed ontology
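As an illustration of how the isNeededBy and isProvidedBy object properties can answer a query, the following minimal Python sketch stores a few subject-predicate-object statements and looks up the workshops able to serve a job. Only the two property names come from the ontology described above; the operations, the job and the workshop names are hypothetical, and a plain set of tuples stands in for an RDF store.

```python
# Hypothetical instance data. Only the property names isNeededBy and
# isProvidedBy are taken from the paper; everything else is illustrative.
TRIPLES = {
    ("Milling", "isNeededBy", "Job1"),
    ("Turning", "isNeededBy", "Job1"),
    ("Milling", "isProvidedBy", "WorkshopA"),
    ("Milling", "isProvidedBy", "WorkshopB"),
    ("Turning", "isProvidedBy", "WorkshopB"),
}


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def workshops_for_job(job):
    """Return the workshops that provide all operations needed by `job`."""
    candidates = None
    for operation in subjects("isNeededBy", job):
        providers = objects(operation, "isProvidedBy")
        # Keep only workshops that also provide every earlier operation.
        candidates = providers if candidates is None else candidates & providers
    return candidates or set()


print(workshops_for_job("Job1"))
```

Combining the two properties in one lookup is what a keyword search over isolated tables cannot do directly: the query follows the links from the job to its operations and from the operations to their providers.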

Industrial Case Study
The case study was carried out at the Gruppo Fabricazione Meccanica (GFM) S.r.l. Group, Italy, and its objective was to help in understanding how the theories of semantic data query apply to real manufacturing business situations. The case study provides information to create a model of how the company under study is involved in the life cycle of its products. To do this, the study looks at the activities involved when a call for a proposal is received; an offer is generated; a firm order is received; material sourcing, production, quality control and transport logistics are triggered; and finally, after-sales service is provided. During all these phases, hundreds of queries and messages are generated. A considerable portion of these are unnecessary and could be avoided with an intelligent query system, so they are highlighted.
GFM acts as a central company to a pool of manufacturing workshops and suppliers. It collaborates with over 300 suppliers and over 40 workshops to fulfil customer production needs. Its expertise is in the production of gas turbines, steam turbines and generator parts. GFM is the single point of contact for this study, and the case study data is extracted by queries in Microsoft Structured Query Language (SQL) Server to GFM's management and information system. GFM has internally developed the software that monitors its suppliers' performance so that it can better meet its needs, and has built procedures and queries for extracting the data.

Simulation Software
In collaboration with GFM, the most important data about each individual supplier was gathered and imported into different nodes in the simulation software, namely the operations provided, workpiece materials, fleet of machines and their quantity, types of testing, tolerance dimensions and certifications. These are parsed into Java objects and stored in each node as its local database. A Java simulation platform was developed in the NetBeans integrated development environment (IDE) to implement the model of the network used by the manufacturing workshops and to simulate the message generation process during a data query. The simulation of the manufacturing network is built upon a primitive version of the software presented in previous work (Jules et al., 2015).
However, the current simulation is mainly focused on the flow of data within the network and the use of the ontology for data query, rather than on network formation and scheduling. Each node or object represents one supplier or workshop and has the capability of communicating with other nodes. During the simulation, a query list is generated from one node (GFM). This is then circulated among the nodes until the result of the query is found. The platform makes use of two protocols for the search mechanism, namely the random search protocol and the ontology search protocol. The random search setting involves randomly selected nodes, while in the ontology search protocol the query is sent to nodes with a high probability of containing the answer to the query, based on the unified ontology. These two search methods are explained in more detail in sections 2.4 and 2.5. The schematic of this distributed manufacturing network is illustrated in Figure 3.
Table 1. Examples of key extracted attributes of workshops

Figure 3 illustrates the formation of a network and the data which support the network operation for 1 sub-order. This network is constituted by supplier and workshop data, which store data at every instance. The green, orange and blue arrows represent the customer communication, the supplier data flow and the workshop data flow within the network respectively. Because the queried data can be stored in any section of the network and so cannot be accessed easily, an ontology is proposed as the most efficient way to access any data within the network.

Random Search Protocol
In a manufacturing network, each workshop has a myopic view of the system and of the data stored in different resources. Search protocols make it possible to find the best queried resources by routing the query to the right workshop. This research aims to show the potential of an ontology for the development of an intelligent and efficient query system compared to a semantic-free search method. The random search method adapted in this study is based on the Gnutella algorithm (Breadth-First Search (BFS) algorithm), which uses flooding to search all the workshops for a given query (Yeferny et al., 2011; Kapoor et al., 2013; Arour & Yeferny, 2015). This method is well documented, and it is used because of its simplicity of implementation. The aim is not to develop a random search protocol, but to modify this protocol for application within a manufacturing network and compare it with the proposed novel ontology search protocol.
The search mechanism is built upon two protocols, namely the random search protocol and the ontology protocol (outlined in section 2.5). During programme initialisation, an object called Message is created containing properties such as QueryList and Hit.
QueryList, as the name suggests, contains the query that is sent to the network, and Hit is a boolean indicating whether or not the query result is found. The simulation software is developed on a cycle-based evaluation mechanism. After each cycle, if the Message finds the answer, it triggers Hit; otherwise it is sent to a random neighbour node and the counter CrossStep is incremented. Moreover, each protocol has properties called LastMessage, IncomingMessages and ProcessList, which are used in the Observer method for the evaluation of the whole process. This performance evaluation is explained in the following sections. In each cycle, the content of the QueryList is compared with the local storage of each workshop. For each match between the QueryList and the local storage data list, a counter is incremented inside the FindRelevance method, and the resulting relevance is compared with a predefined threshold. If the value of the relevance exceeds the threshold, a Hit is triggered; otherwise the cycle proceeds. The random search protocol is given as Algorithm 1 in Figure 4.

... Lost-foam casting and Full-mold casting is two, as shown in Figure 5.
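The random search cycle described above can be sketched as follows. This is a minimal Python illustration and not the authors' Java implementation or Algorithm 1 itself: the names Message, QueryList, Hit, CrossStep and FindRelevance mirror the text, while the Workshop class, the `max_cycles` limit and the example data are assumptions made for the sketch. It follows the single random-neighbour forwarding described in the text, not full flooding.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Message:
    query_list: list        # QueryList: terms sent to the network
    hit: bool = False       # Hit: set once a workshop answers the query
    cross_step: int = 0     # CrossStep: number of hops taken so far


@dataclass
class Workshop:             # hypothetical node type for this sketch
    name: str
    storage: set            # local storage data list
    neighbours: list = field(default_factory=list)


def find_relevance(query_list, storage):
    """Count the query terms that also appear in the local storage."""
    return sum(1 for term in query_list if term in storage)


def random_search(start, message, threshold, max_cycles, rng):
    """Forward the message to random neighbours until a Hit or the cycle limit."""
    node = start
    for _ in range(max_cycles):
        # A Hit is triggered when the relevance exceeds the threshold.
        if find_relevance(message.query_list, node.storage) > threshold:
            message.hit = True
            return node
        if not node.neighbours:
            break
        node = rng.choice(node.neighbours)   # semantic-free routing
        message.cross_step += 1
    return None


# Hypothetical two-workshop network.
a = Workshop("A", {"turning"})
b = Workshop("B", {"milling", "casting"})
a.neighbours, b.neighbours = [b], [a]

msg = Message(["milling", "casting"])
found = random_search(a, msg, threshold=1, max_cycles=5, rng=random.Random(0))
print(found.name, msg.hit, msg.cross_step)
```

Passing the random number generator in as a parameter keeps a simulation run repeatable, which is convenient when two protocols are compared on the same network.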
Algorithm 2 elaborates on the procedure involved when the ontology protocol is in use. The contribution of this research can be put into perspective as follows: (i) the proposed ontology search protocol presents a novel data query system, which focuses on retrieving information with respect to the semantic content of a workshop and by considering all the relevant accessible data within a workshop; (ii) this data query algorithm evaluates the semantic richness of the database to return the most relevant results for each query, in comparison to a semantic-free approach, and, with the help of reasoning, can do so even if the answer to the query does not exist within the documents; and (iii) the proposed search mechanism results in a significantly more efficient data retrieval system than other data query systems such as Random Walk and OSQR (Himali et al., 2012), considering the recall and precision metrics of search quality from the literature.
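One way to picture semantic routing is to rank workshops by how close their concepts lie to the queried concept in the ontology. The sketch below is not Algorithm 2: it assumes that semantic distance is the number of edges on the shortest path between two concepts, and it uses a hypothetical hierarchy fragment in which Lost-foam casting and Full-mold casting share a parent, which would give the distance of two quoted above. The hierarchy, the workshop names and the ranking rule are all assumptions.

```python
from collections import deque

# Hypothetical fragment of a concept hierarchy (child -> parent).
PARENT = {
    "Lost-foam casting": "Evaporative-pattern casting",
    "Full-mold casting": "Evaporative-pattern casting",
    "Evaporative-pattern casting": "Casting",
    "Die casting": "Casting",
}


def build_graph(parent):
    """Turn child -> parent links into an undirected adjacency map."""
    graph = {}
    for child, par in parent.items():
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    return graph


def semantic_distance(a, b, graph):
    """Edges on the shortest path between two concepts, or None if unrelated."""
    if a not in graph or b not in graph:
        return None
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        concept, dist = queue.popleft()
        if concept == b:
            return dist
        for nxt in graph[concept]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def rank_workshops(query, offers, graph):
    """Order workshops by the distance from the query to their closest concept."""
    scored = []
    for name, concepts in offers.items():
        distances = [semantic_distance(query, c, graph) for c in concepts]
        distances = [d for d in distances if d is not None]
        if distances:
            scored.append((min(distances), name))
    return [name for _, name in sorted(scored)]


GRAPH = build_graph(PARENT)
OFFERS = {"W1": {"Die casting"}, "W2": {"Full-mold casting"}}  # hypothetical
print(rank_workshops("Lost-foam casting", OFFERS, GRAPH))
```

Under these assumptions a query for Lost-foam casting is routed first to the workshop offering the closest related process, which is how a semantic protocol can return the nearest available answer when no exact match exists.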

Performance Evaluation
Evaluation of the performance of a data retrieval system is a crucial task which requires a well-established test method and benchmark. The results of the queries from the ontology will be evaluated against the actual existing data from the manufacturing environment in order to determine the fidelity of the ontology query model and the simulation model. The Key Performance Indicators (KPIs) involved in calculating the fidelity of the proposed system are devised based on the Precision and Recall performance criteria. Based on the literature review, these KPIs are the most prominent performance measures when dealing with information retrieval systems and ontologies (Raghavan et al., 1989; Müller et al., 2001; Euzenat, 2007), and they have therefore been chosen for the evaluation of this research. Furthermore, these KPIs allow a comparison with other similar research papers (Himali et al., 2012). In the configuration of the simulation software, an Observer method is created to evaluate the efficiency of the algorithms developed. The Observer calculates the number of Hits from all incoming messages, the message cost, the maximum CrossStep and the sum of all CrossSteps, and these are the basis of the performance evaluation. The message cost is the number of all generated messages, and the maximum CrossStep is the longest distance travelled by the Message to trigger the Hit. The response time is taken as the time of the first occurring Hit. Precision is a measure of the relevancy of the returned results, defined as the ratio of the number of relevant retrieved instances to the total number of retrieved instances. The other measures concern hits per query and recall, the latter being defined as the ratio of the number of relevant records retrieved to the total number of relevant records in the system. All these key performance indicators are used in this research to assess the proposed protocols.
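The quantities collected by the Observer can be illustrated with a short Python sketch. It is an assumption-laden stand-in for the Observer method of the simulation: the MessageRecord type, its `hit_cycle` field and the sample values are hypothetical, and the response time is taken here as the cycle of the first Hit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageRecord:                  # hypothetical record of one message
    hit: bool                         # whether the message triggered a Hit
    cross_step: int                   # hops travelled by the message
    hit_cycle: Optional[int] = None   # cycle in which the Hit occurred


def observe(records):
    """Aggregate the KPIs named in the text from all incoming messages."""
    hits = [r for r in records if r.hit]
    hit_cycles = [r.hit_cycle for r in hits if r.hit_cycle is not None]
    return {
        "hits": len(hits),
        # Message cost: the number of all generated messages.
        "message_cost": len(records),
        # Longest distance travelled by a message that triggered a Hit.
        "max_cross_step": max((r.cross_step for r in hits), default=0),
        "total_cross_steps": sum(r.cross_step for r in records),
        # Response time: the first occurring Hit.
        "response_time": min(hit_cycles, default=None),
    }


sample = [
    MessageRecord(hit=True, cross_step=3, hit_cycle=4),
    MessageRecord(hit=False, cross_step=7),
    MessageRecord(hit=True, cross_step=5, hit_cycle=2),
]
print(observe(sample))
```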
In addition to the above, in every information retrieval task the two main metrics of Precision and Recall are used to measure the output quality of the search. These metrics are also linked through their harmonic mean, the F-Score, and can be obtained by the following formulae (https://www.crosswise.com/cross-device-learning-center):

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (1)$$

Precision is the corresponding ratio $TP / (TP + FP)$, and the F-Score is $2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$. Precision is mainly known as the measure of result relevancy, which depends on the number of true positives (TP) and false positives (FP). Recall, on the other hand, evaluates how many of the truly relevant results are obtained, and it depends on the number of true positives (TP) and false negatives (FN). The link between these metrics is a good illustration of the nature of the problem in information retrieval. Recall reflects whether all relevant documents can be accessed, while precision reflects how much junk must be sorted through to reach them. In an information retrieval system, low precision leads to a high number of incorrectly predicted results, and low recall leads to a lower number of correctly predicted results.
Therefore, the ideal system would demonstrate both high precision and high recall to achieve its most accurate results. As accuracy alone can be very misleading in evaluating the quality of information retrieval, these metrics are taken into account together to ensure a high quality of information retrieval.
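As a worked illustration of these metrics, the following Python sketch computes precision, recall and their harmonic mean from counts of true positives, false positives and false negatives, using the standard definitions. The counts are invented for the example and are not results from the simulation. The last lines show why accuracy can mislead: with many true negatives, accuracy stays high even when most relevant records are missed.

```python
import math


def precision(tp, fp):
    """Share of retrieved results that are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of relevant records that were retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(tp, tn, fp, fn):
    """Share of all decisions, positive and negative, that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Illustrative counts only.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
print(p, r, f_score(p, r))

# One relevant record found, nine missed, 990 irrelevant records
# correctly left out: accuracy is high although recall is poor.
print(accuracy(tp=1, tn=990, fp=0, fn=9), recall(tp=1, fn=9))
```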

Results and Discussion
The results of the simulation are depicted in Figures 7 to 11 and are compared with the OSQR protocol (Himali et al., 2012). As demonstrated in Figures 9 and 10, the ontology protocol reached a maximum of 7,920 Hits per Cycle, while the random protocol reached only 99 Hits. The recall rate of the proposed search mechanism is drastically higher than that of both the OSQR and Random protocols. However, the OSQR protocol was based on the WordNet ontology, which used a wide range of sample data inputs. In order to compare the results of the proposed protocol with OSQR accurately, it is essential to simulate the constraints by equalising the number of data inputs.
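The Hit counts quoted above can be related to the percentage improvement reported against the Random protocol. The paper does not state the formula used, so the sketch below assumes that improvement is the relative increase over the baseline; under that assumption, 7,920 Hits against 99 Hits corresponds to the reported 7,900%.

```python
import math


def improvement_percent(new_value, baseline):
    """Relative increase of `new_value` over `baseline`, in percent (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new_value - baseline) / baseline * 100.0


# Hit counts quoted in the text for the ontology and random protocols.
print(improvement_percent(7920, 99))
```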
Figure 11 shows how the ontology protocol outperforms the random search method and OSQR with regard to precision rate. With the proposed algorithm, the possibility of an incorrect identification of data is minimised, so that relevant data is mismatched as little as possible with information irrelevant to the query. The improvement in precision resulting from this ontology is estimated at 228.57% compared to OSQR and 604.10% compared to Random search, which points to the efficiency of the ontology protocol when considering hit messages per query. Table 2 tabulates the key performance evaluation criteria.
Table 2. Key performance evaluation criteria

The only drawback of the proposed ontology protocol is the response time. For the simulation run illustrated in this paper, the average response time for the ontology protocol is 9 minutes, while the random search completes in under a minute. This is due to the semantic calculation performed in the ontology protocol and could be alleviated in future work.

Conclusion and Future Work
A manufacturing ontology was created in this research to model the knowledge from a real case study. Accordingly, a simulation platform was developed to demonstrate the validity of an intelligent data query system. Based on the results, the effectiveness of the ontology protocol was shown in comparison with a random search mechanism and the well-established OSQR method from the literature. Using the ontology not only helped to reach the desired query results with fewer messages, but was also useful for capturing the most relevant results per query. Moreover, it is useful when query results are absent from the local storage databases. For instance, if the result for a query does not exist in the nodes, the random search will return no result after generating a huge number of messages. The ontology protocol, however, will return the closest available answer to that query through the use of semantic relations. This is especially beneficial in a dynamic manufacturing environment.
The scope of this paper was limited to just two search mechanisms and a limited number of evaluation metrics. As future work, the research will focus on the enhancement of the manufacturing ontology by using well-known existing ontologies such as WordNet and MASON. Furthermore, the authors are interested in the development of a multi-agent system in which autonomous agents automate the whole query process. The agents will communicate with each other to find the query results and also to populate the ontology automatically for knowledge reuse.

Sanfilippo and Borgo (2016) have focused on the semantics of feature notions. They have highlighted that the current feature-based CAx systems such as Computer-Aided Design (CAD), Computer-Aided Engineering (CAE), Computer-Aided Manufacturing (CAM), and Computer-Aided Process Planning (CAPP) are missing a common understanding of what counts as a feature and the way features are conceptualised. They have emphasised the product knowledge conceptualisation problems within current product life cycle management (PLM) systems and the shortcomings of existing unifying approaches. The proposed ontology-based product knowledge representation would allow reliable data integration while preserving the intended semantics.

Usman et al. (2013) presented a practical Manufacturing Reference Ontology (MRO) for the product design and production domain. Their work is motivated by the lack of a core set of manufacturing concepts for the unification of terminologies across various strands of the manufacturing domain. The emphasis was on the formalisation of these core concepts for a semantic-aware system in which the MRO will support the development of domain-specific manufacturing ontologies. Their proposed ontology was tested by a qualitative method of inserting two types of facts into the system, namely facts which violate the formal classifications of the concepts and facts which conform to the formal classification. The experimentation was mainly focused on the Feature concepts, as with the work carried out by Sanfilippo and Borgo (2016). In this study, we looked at a semantic-aware query system in a manufacturing network, rather than a comprehensive ontology development, which we believe is missing within the manufacturing research community. The contrast between the work of Usman et al. (2013) and Lin and Harding (2007) and our work is that our paper made use of the ontology editor Protégé to create an ontology as a tool for data query purposes, while the scope of their work is mainly focused on the development of an efficient and practical core manufacturing ontology.
In a separate research study, Cai et al. (2011) presented an ontology-based system called ManuHub, which acts as a mediator for manufacturing service providers through efficient and automatic retrieval of the required manufacturing services. Their work is based on Service Oriented Architecture (SOA), and it has two limitations: firstly, the question of how to merge their ontology into existing upper-level and domain-specific ontologies in the real world, and secondly, how to test their idea on a dynamically changing manufacturing data set. In order to develop a simulation platform for the formation process of networks of manufacturers, Jules et al. (2013, 2015) have developed an ontology based on the Product-Resource-Order-Staff Architecture (PROSA) to model the inter-enterprise communication during the formation process. The ontology also models the process of resource auctioning, which is part of the formation process. Saeidlou et al. (2016) have proposed a hybrid search algorithm in which a learning process is combined with data extraction from the ontology. This learning technique leads to a more intelligent data query system. The framework used set the foundation for the following research, in which an intelligent ontology-based data query mechanism is compared with a semantic-free one as well as with the concept proposed by Himali et al. (2012).
For the scope of this paper, a manufacturing ontology is built with the Protégé ontology editor software, which is a free and open-source leading ontological engineering tool. It provides an interface with other knowledge-based tools such as the Java Expert System Shell (JESS) and is compatible with various ontology languages and formats such as eXtensible Markup Language (XML), DARPA Agent Markup Language (DAML) and Ontology Inference Layer (OIL) (Ga, 2009).

Figure 1. Developed ontology for manufacturing data query

Figure 3. Case study network structure (Data flow between network entities)

Figure 4. Random search protocol

Figure 5. RDF representation of manufacturing ontology

Figure 7. Total cross step comparison of Random search and Ontology

The results are based on the aforementioned evaluation criteria. As demonstrated in Figures 7 and 8, the ontology protocol shows a considerable improvement in the number of steps travelled by the Message. The random search mechanism travels a greater distance among the nodes to find the query result. There is a maximum difference of more than 200,000 in total cross steps, as well as a maximum difference of over 1,500 in maximum cross steps, as a result of the newly proposed ontology. Figures 9 and 10 show experimental results for Hit and Hit per Query. As depicted in these figures, the ontology protocol shows a dramatic improvement of 166.67% and 7,900% in total Hit counts compared to the OSQR protocol and the Random protocol respectively (Himali et al., 2012).