Dependency graph for short text extraction and summarization

ABSTRACT The sheer amount of text generated from microblogs and social media brings huge opportunities for text mining applications. Many techniques such as sentiment analysis and opinion mining have proven effective at delivering insights from documents. However, most of these textual data come in the form of short and fragmented texts which are difficult to extract visually due to the sparsity issue, and the context of the content is often unknown. Naive yet widely used models such as term frequency and the bag-of-words never consider the semantic relationships between words, making the results relatively difficult to interpret. A well-known text mining technique such as the topic model may provide a general 'at a glance' understanding but can be difficult to interpret. One alternative is to aggregate words in a semantic order and generate an output of human-understandable sentences. In this paper, we address this direction by proposing the belief graph data model, which joins short texts by inducing part-of-speech tagging to maintain the word order and to preserve the context of the content. Extensive experiments showed that our approach improves the overall qualitative evaluation of text understanding compared to previous state-of-the-art text mining techniques.


Introduction
In the era of big data, a tremendous amount of fragmented short text is growing at a scale that makes it impossible for humans to extract information visually. This information overload creates redundant and duplicated textual data, making it difficult to understand information quickly. As an illustration, in Figure 1 we can quickly scan the news headlines, but it takes some time to understand the underlying events. For example, on the left side, in the Great Barrier Reef news, we might want to know why the reef struggles and what motivated the funding; on the right side, we might want an overall summary of public opinion. Summarizing the details of the content can save users a significant amount of time. Many data mining applications have been developed to help understand and summarize these collections of texts, for example, topic modelling (Blei, Ng, & Jordan, 2003) and text summarization (Mihalcea & Tarau, 2004).
However, these techniques emphasize generalization and fail to capture the underlying context needed to produce a concrete, profound summary.
A naive yet widely used model is term frequency, or the unigram model, which surfaces hot keywords so that a topic is represented by its most frequent keywords. By only providing some keywords, the unigram model is far too limited. The bi-gram extends the unigram by capturing the relationship between two words based on their co-occurrence. The n-gram further complements word identification by grouping words into phrases with part-of-speech tagging. However, the n-gram is modelled in the bag-of-words form, which does not consider the word order in the sentence. One option is to organize those keywords according to the grammatical order and generate an output similar to human-understandable sentences. This is a challenging task since it combines part-of-speech processing and data mining for summarization. In this paper, we conduct an empirical evaluation of this direction by using the belief graph model.
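To make the contrast concrete, the following minimal Python sketch (an illustration only, not part of our system) counts unigram and bi-gram frequencies with a standard Counter; in both cases any word order beyond adjacent pairs is discarded.

from collections import Counter

text = "marvel comic has superman and batman"
tokens = text.split()

# Unigram (term-frequency) model: each word counted in isolation.
unigrams = Counter(tokens)

# Bi-gram model: co-occurrence of adjacent word pairs.
bigrams = Counter(zip(tokens, tokens[1:]))

print(unigrams.most_common(3))
print(bigrams.most_common(3))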
In order to build the belief graph, we combine text mining with a graph linguistic approach, which has been the focus of multiple lines of research (see e.g. Mihalcea & Tarau, 2004; Sayyadi & Raschid, 2013; Scaiella, Ferragina, Marino, & Ciaramita, 2012; Zhang, Wang, Xu, & Hu, 2013; Zuo, Zhao, & Xu, 2016). At the same time, there has been growing interest in adopting the dependency parse tree (De Marneffe, MacCartney, & Manning, 2006) for a range of Natural Language Processing tasks, from natural language querying over databases (Deutch, Frost, & Gilad, 2017; Li & Jagadish, 2014) to question answering (Amsterdamer, Kukliansky, & Milo, 2015). Here, we refer to the dependency parse tree simply as the dependency tree. The dependency tree parses grammatical relations to capture dependencies between words, such as predicate-argument structure. We will use the terms dependency tree and tree interchangeably.
In this paper, we extend our previous work (Franciscus, Ren, & Stantic, 2018a) and present a scheme to build the belief graph by utilizing the graph dependency architecture. Our objective is, by using the dependency tree, to store the intermediate result of text processing in a graph database, which further enables extensive graph queries such as centrality and clustering. The basic idea of our architecture is to simplify the process by utilizing state-of-the-art language processing to enable graph computation over a large amount of text. To be specific, we provide the following contributions:
- We improve the proposed reversed dependency to convert raw short text into a graph model by preserving grammatical properties, and then reuse these properties to generate the synopsis of the text.
- We design a practical schema and storage structure to build the belief graph by using NoSQL databases, MongoDB and Neo4j. The recent update in the database version improves the writing performance significantly.
- We perform comprehensive experiments to demonstrate the system performance and the practical usage of graph queries in comparison with other well-known text mining and text summarization techniques.
The rest of the paper is organised as follows: in Section 2 we present some related works; in Section 3, we present the details of belief graph definition and construction; in Section 4, we provide the experiment results; in Section 5 we discuss our interpretation of the experiments and finally in Section 6 we conclude the paper and indicate the future work.

Related work
In this section, we present some related work in line with the purpose of our contribution, categorized into several classes.
(1) Natural Language Dependency Tree. The rise of NLP processing with the dependency tree has captured the interest of research communities with the availability of NLP toolkits. Two recent prominent works utilize natural language to interpret (Li & Jagadish, 2014) and answer (Deutch et al., 2017) queries over a relational database. Another work uses the dependency tree for a crowdsourcing platform (Amsterdamer et al., 2015). Our work is similar in concept; however, we target text mining directly without involving SQL queries and focus on a graph approach.
(2) Topic Modelling. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) is a well-known topic modelling technique which can be used to discover latent information from documents. Several variants of LDA have been proposed to alleviate short and sparse text processing. Our work differs in that we preserve the word order, in contrast to the bag-of-words assumption which does not consider word order. Our work also uses a graph instead of a sampling method to obtain keywords. Another strategy is to restrict the document-topic distribution so that each short text is sampled from a single topic (single topic per document); this is known as the Dirichlet Multinomial Mixture (DMM) or mixture of unigrams model (Yin & Wang, 2014). Given the limited content in a short text, this strategy is reasonable and it alleviates the data sparsity problem to some extent. A third strategy is to explicitly incorporate word co-occurrence information, for instance, modelling word co-occurrence patterns (Yan, Guo, Lan, & Cheng, 2013) and using a soft-clustering mechanism for further word co-occurrence augmentation (Quan, Kit, Ge, & Pan, 2015).
(3) Graph-based Model. The graph model offers a simple, effective and interactive representation based on the relations between vertices. Each vertex can be treated as an entity such as a sentence, word or character, with the edges as the label or property of the relationship between two vertices. The foundation of the graph is the idea of KeyGraph (Ohsawa, Benson, & Yachida, 1998), which converts text into a term graph based on co-occurrence relations between terms. The main strength of the graph representation lies in the connections between entities, enabling users to query the relationship between keywords. Further, graph-based models explicitly consider word co-occurrence as one of the main parameters to determine the score of a keyword (Sayyadi & Raschid, 2013; Zhang et al., 2013). In contrast to previous work, we focus specifically on the dependency tree to capture the grammatical relationship between keywords.
(4) Word Embedding. Recently, work on word embedding has intensified with the development of vector representations of words (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013; Pennington, Socher, & Manning, 2014). Word embedding can capture semantically similar words and word analogies (e.g. woman + king − man = queen) based on the position of words in the sentence. Words are mapped into a high-dimensional vector space using the skip-gram or the continuous bag-of-words (CBOW) model. However, it often requires a large corpus (millions of words) to achieve satisfactory performance, since the word context relies heavily on the surrounding words (Grave, Mikolov, Joulin, & Bojanowski, 2017; Joulin et al., 2016). Our work focuses specifically on the part of speech to explicitly capture the context based on the word order.

Belief graph
In this section, we present the definitions and characteristics of two classes of knowledge base model: the knowledge graph and the belief graph. The belief graph follows the knowledge graph intuition of information linking. Before we describe the construction of the belief graph, we outline its predecessor, the knowledge graph, and contrast the differences between the two.

Knowledge graph vs. Belief graph
The knowledge graph has been a focus of research since the introduction of the Google Knowledge Graph in 2012. 1 In practice, the knowledge graph has served as a complement to many applications, for example, assisting question answering (Chen, Fisch, Weston, & Bordes, 2017), providing auxiliary information for information retrieval (Li, Wang, Zhang, Sun, & Ma, 2016) and inferring new knowledge from its own collection (Song, Wu, & Dong, 2016). However, it does not have a universally accepted definition and is therefore prone to multiple interpretations. Here, we compose the definition and characteristics of a knowledge graph.
Definition 3.1 A knowledge graph G is a directed labelled graph (V, E, L), where V is a set of nodes and E ⊆ V × V is a set of edges. Each node v ∈ V represents an entity with label L(v) and each edge e ∈ E represents a relationship L(e) between two entities. Characteristics:
- Atomicity. Each statement has a single interpretation which refers to a subject-object relationship over a specific domain. For example, 'Shakespeare died on 23 April 1616 in Warwickshire, England' has an exact and precise meaning.
- Factoid Entities. The content is generated from a human-curated knowledge base which contains well-known real-world facts and covers a broad domain of human-life aspects, for example, Wikipedia and Freebase.
- Triplets. The subject-object relationships in a common knowledge graph are often presented in triplet form, such as the 'is-a' relationship (e.g. Apple is-a Fruit). Since a knowledge graph carries a lot of information, nodes and edges may have a name, type, and attributes represented as labels.
On the other hand, although the belief graph follows a similar schema to the knowledge graph, it has significant differences in its content and source. It does not represent grounded facts such as real-world entities and it focuses more on the word-to-word relationship. The belief graph can be used to summarize unstructured and unlabelled text, which can further be extended into topic modelling, text summarization, and text visualization. We present the definition and characteristics of the belief graph as follows:
Definition 3.2 A belief graph G is a directed labelled graph (V, E, L, P), where V is a set of nodes and E ⊆ V × V is a set of edges. Each node v ∈ V represents an entity with label L(v) and has properties P(v) = (id, t), where id is the identifier and t is the grammatical property type. Each edge e ∈ E represents a relationship label L(e) between two entities and has a property P(e) = f, where f is the co-occurrence frequency. A belief graph is formed from multiple word-to-word relations derived from sentences. Characteristics:
- Ambiguous. The network can be constructed from any text regardless of the length or size of the document, thus each connection in the belief graph is prone to multiple interpretations. Unlike the triplets in the knowledge graph, the belief graph takes into account all possible relationships, which are ranked according to their importance.
- Non-Factoid Entities. Generated from a collection of unstructured and unlabelled opinion-based text where the ground-truth identity is unknown, for example, social media and product review data. Thus, the generated summary can be a fact or a public opinion.
- Ordered. Since the belief graph uses the dependency parser, it automatically preserves the word order of the linguistic form, including active and passive voice, by using the grammatical information. Thus, it is relatively straightforward to identify the subject-object relationship; for example, 'environmentalist' is tagged as a noun (NN).
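As a concrete reading of Definition 3.2, the following minimal Python sketch (our own simplification, not the storage schema described later) models nodes with a word label and a grammatical type, and edges that accumulate a co-occurrence frequency.

from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass(frozen=True)
class Node:
    word: str   # label L(v); the word text also serves as the identifier here
    tag: str    # grammatical property type t, e.g. 'NN', 'VBD'

@dataclass
class BeliefGraph:
    # Edges keyed by (source, relation label, target) with co-occurrence frequency f.
    edges: Dict[Tuple[Node, str, Node], int] = field(default_factory=dict)

    def add(self, src: Node, rel: str, dst: Node) -> None:
        key = (src, rel, dst)
        self.edges[key] = self.edges.get(key, 0) + 1

g = BeliefGraph()
g.add(Node("LeoDiCaprio", "NNP"), "nsubj", Node("environmentalist", "NN"))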

Building belief graph
In this section, we give an overview of our belief graph system as shown in Figure 2. It can be divided into several inter-related components: (i) initial preprocessing, where we collect the raw textual data and store it in MongoDB; (ii) dependency tree, where we map a collection of texts to the dependency parser and translate them into grammatically structured trees; (iii) dependency tree join, where we merge each dependency tree into the belief graph. As a case study, we demonstrate our belief graph system using Twitter and Amazon product review data sources on NoSQL database platforms.

Initial preprocessing
An efficient storage architecture is essential to manage raw data and handle preprocessing. Our initial preprocessing is primarily built on the precomputing architecture (Franciscus, Ren, & Stantic, 2018b), where we classify unstructured data into manageable periodic content which can be drilled down or rolled up to support time-slicing queries. However, in this case, we use seed terms as the parameter to classify the collection; hence, we index the collection level according to the filtered key terms. Unlike normal text preprocessing, we do not necessarily have to tokenize or stem the words. Firstly, for Part-Of-Speech (POS) tagging the system needs to capture the original sentence including punctuation, and secondly, when we capture the grammatical relations the system requires the connections between words, which cannot be stemmed into their roots. However, basic preprocessing such as ASCII character formatting is required to remove uninterpretable emoticons or symbols.
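A minimal sketch of this preprocessing step is shown below, assuming a local MongoDB instance and hypothetical database, collection and field names ('beliefgraph', 'tweets', 'text'); only lightweight ASCII cleanup is applied so that the original sentence remains intact for the parser.

import re
from pymongo import MongoClient

def clean(text: str) -> str:
    # Keep the original sentence (no tokenizing or stemming); only strip
    # non-ASCII symbols and emoticons that the parser cannot interpret.
    return re.sub(r"[^\x00-\x7F]+", " ", text).strip()

client = MongoClient("mongodb://localhost:27017")      # assumed local instance
raw = client["beliefgraph"]["tweets"]                   # hypothetical names
prepared = client["beliefgraph"]["tweets_goldcoast"]

seed_terms = ["#Australia", "Gold Coast"]               # seed terms used for indexing
query = {"text": {"$regex": "|".join(map(re.escape, seed_terms))}}

for doc in raw.find(query):
    prepared.insert_one({"_id": doc["_id"], "text": clean(doc["text"])})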

Dependency tree
The human linguistic structure still plays a major role in determining the context of any given set of words. In this instance, we borrow the model of the linguistic structure from the Penn TreeBank, which is normally used to structure English sentences. Specifically, the dependency tree is a linguistic tree generated from the typed dependency parses of English sentences, derived from the phrase structure parses. In order to capture inherent relations occurring in corpus texts, we use the noun phrase (NP) relations which are included in the set of grammatical relations used (De Marneffe et al., 2006). Typed dependencies and phrase structures represent the structure of sentences differently: while a phrase structure parse produces a nesting of multi-word constituents, a dependency parse produces dependencies between individual words. A typed dependency parse further labels the dependencies with grammatical relations, such as subject or indirect object. The tree structure can be found in the original paper (De Marneffe et al., 2006). We use the dependency tree as the word segmentation process, which identifies the meaning of a word beyond its co-occurrence with other words. During the parsing stage, the system seeks to understand the meaning of different word relationships according to the sentence flow. We adopt the dependency tree produced by the Stanford NLP parser as the key natural language concept. In particular, the dependency tree is defined as: Definition 3.3 A dependency tree T = (V, E, L) is a node-labelled tree where the labels consist of (1) Part-of-Speech tagging: the syntactic role of the word, and (2) Relationship (REL): the grammatical relationship between words according to their associates in the dependency tree.
Each vertex (v ∈ V) represents a word (term) while each edge (e ∈ E) represents a grammatical structure in a sentence. Each vertex has a POS-tagging type property t. This property distinguishes each word according to its classification. For example, in Figure 3, for the sentence 'Marvel comic has Superman and Batman', the words Marvel, Superman and Batman belong to the PROPER NOUN (NNP) family while comic belongs to the NOUN (NN) family.
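For illustration, the sketch below uses the Stanza Python interface to the Stanford neural pipeline (an assumption for illustration; our system invokes the Stanford parser directly) to print the POS tag and typed dependency of each word in the example sentence.

import stanza

# Download the English models once with: stanza.download('en')
nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse')

doc = nlp("Marvel comic has Superman and Batman.")
for sent in doc.sentences:
    for word in sent.words:
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        # e.g. Marvel/NNP --compound--> comic, comic/NN --nsubj--> has
        print(f"{word.text}/{word.xpos} --{word.deprel}--> {head}")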
As another example, given the two sentences 'LeoDiCaprio is an amazing environmentalist. He finally won an oscar' in Figure 3, the parser will indicate the subject-object relation when a predicate (VBZ or VBD) exists. In this way, we can virtually answer that LeoDiCaprio is an Environmentalist through the NN indicator, and that he Won an Oscar via the predicate-object relationship. Once we obtain the property value for each vertex, we build the belief graph as an extension of the dependency tree. The belief graph can be seen as the network formed by a collection of dependency trees. We discuss how to join the dependency trees in the next section.

3.2.3. Dependency tree join
Each sentence forms a dependency tree after the mapping process. Each node in the tree represents a word (term) with its POS-tag type. We join a collection of dependency trees into a graph where the edges record the co-occurrence between two nodes. In this process, we emphasize the relationship (bi-gram frequency) instead of a single node (term) frequency. Therefore, a reliable estimation of term co-occurrence requires a large corpus; in the short text environment, for example tweets, the more tweets the better. Note that a word may have different POS-tag types, resulting in multiple nodes with the same word. We keep this duplication of words as part of the identification. For example, in Figure 4 we take the previous illustration 'LeoDiCaprio is an amazing environmentalist. He finally won an oscar' as the first sentence and 'LeoDicaprio won an Oscar' as the second sentence. We merge each tree into the graph by prioritizing the co-occurring pairs of nodes. In this way, we can capture the significance of both terms and phrases.
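The following sketch illustrates the join step using networkx (for exposition only; the actual system stores the merged graph in Neo4j as described later): each parsed dependency edge is keyed by (word, POS tag) so that duplicate words with different tags remain distinct, and the frequency property is incremented whenever the same pair co-occurs again.

import networkx as nx

def join_trees(parsed_sentences, graph=None):
    """Merge dependency trees into a single belief graph.

    parsed_sentences: iterable of lists of
    (head_word, head_tag, deprel, dependent_word, dependent_tag) tuples,
    one list per sentence.
    """
    g = graph if graph is not None else nx.DiGraph()
    for tree in parsed_sentences:
        for head, head_tag, rel, dep, dep_tag in tree:
            u, v = (head, head_tag), (dep, dep_tag)   # keep (word, tag) pairs distinct
            if g.has_edge(u, v):
                g[u][v]["frequency"] += 1             # co-occurrence frequency property f
            else:
                g.add_edge(u, v, rel=rel, frequency=1)
    return g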

Graph query processing
Joining the dependency trees results in a large directed network of words. Each word forms a chain with a pointer to the next word. To filter the network by importance, we use the off-the-shelf betweenness centrality graph algorithm, as it shows good performance in choosing the important topics (Sayyadi & Raschid, 2013). Betweenness centrality measures the value of a vertex based on shortest paths. Formally, recall that a geodesic path is not necessarily unique and the geodesic paths between a pair of vertices need not be node-independent, meaning they may pass through some of the same vertices. Let $n^{i}_{a,b}$ be the number of geodesic paths from a to b that pass through i and let $n_{a,b}$ be the total number of geodesic paths from a to b. Then the betweenness centrality of vertex i is
$$b_i = \sum_{a \ne b} w^{i}_{a,b} = \sum_{a \ne b} \frac{n^{i}_{a,b}}{n_{a,b}},$$
where by convention the weight ratio $w^{i}_{a,b} = 0$ if $n_{a,b} = 0$. The main intuition is that important terms are likely to be in the centre of the graph. Thus, the core of the topic can be automatically generated from the centrality measurement. For example, Figure 5 shows how betweenness centrality is able to generate keywords such as 'gold', 'I'm', and 'like'. These keywords b ∈ B determine the topics of the top m text summaries, where there can be overlapping words within each group but not the topic word (e.g. the words 'gold' and 'I'm' cannot be in the same group).
One drawback of this method is its complexity. The procedure uses a Breadth First Search (BFS) shortest path algorithm, so the complexity is O(n * m) for n nodes and m edges. Due to this cost, it is recommended to run the algorithm only on the required nodes. Specifically, Neo4j supports a betweenness centrality algorithm through the APOC 2 library, which can be called directly from a Cypher query.
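The exact APOC call is not reproduced here; as an illustrative alternative that follows the same idea, the sketch below computes betweenness centrality with networkx over the joined graph from the previous sketch and returns the top-ranked words as topic keywords.

import networkx as nx

def topic_keywords(g: nx.DiGraph, k: int = 10):
    # Betweenness centrality: nodes lying on many shortest paths score highest.
    scores = nx.betweenness_centrality(g)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Each node is a (word, tag) pair; return the k most central words.
    return [word for (word, _tag), _score in ranked[:k]]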

Experiments
We used two datasets, Twitter and Amazon product reviews (He & McAuley, 2016). In particular, the Twitter dataset was retrieved using the Twitter Streaming API during the period of May-July 2017. As our use case of interest, we specifically selected only tweets in the Gold Coast, Australia area using the bounding-box method. Both datasets are loaded into MongoDB, with each tweet or review treated as one document. We vary the number of documents from 100 to 25,000 for each setting. Note that Twitter allows a maximum of 140 characters per tweet, while a product review may range from short sentences to a long paragraph. We also address the graph analytic task after the dependency graph is stored inside Neo4j. Our experiments were conducted on a standard 4-core Intel i5 desktop, MongoDB version 3.4.2, and Neo4j version 3.4.0.

Preliminary results
The results of constructing documents into dependency trees are given in Figure 6. We report the time to label each document before the merging process for different document sizes. As we can see, the time cost of converting documents grows linearly with the size. We recorded approximately between 40 and 100 ms to convert each document, depending on the number of words per document. Note that the time can be significantly improved with a better computer specification and more memory. Since Twitter has a maximum allowance of 140 characters, the processing time appears to be stable. Amazon reviews, on the other hand, vary a lot in the number of words and tend to take roughly twice as long as Twitter, which indicates a significant discrepancy. Figure 7 depicts the results of loading dependency trees into the graph database. The time is measured as the writing time in the database; we noted a two-times improvement in Neo4j version 3.4. We use a transaction schema which writes each word-relationship-word sequence as explained in Section 3.2.3. As shown, the writing time increases gradually as the size doubles. This is due to the fact that the database has to match an existing word-relationship-word to increment the frequency property in the relationship, and one word may have different property types, which increases the matching time.
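A minimal sketch of this transactional writing schema is given below, assuming the Neo4j Python driver and a hypothetical Word node label with text and tag properties; MERGE either matches an existing word-relationship-word path and increments its frequency, or creates it with frequency 1.

from neo4j import GraphDatabase

MERGE_EDGE = """
MERGE (a:Word {text: $w1, tag: $t1})
MERGE (b:Word {text: $w2, tag: $t2})
MERGE (a)-[r:REL {type: $rel}]->(b)
ON CREATE SET r.frequency = 1
ON MATCH SET r.frequency = r.frequency + 1
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def write_tree(tx, edges):
    # edges: (head_word, head_tag, deprel, dependent_word, dependent_tag) tuples
    for w1, t1, rel, w2, t2 in edges:
        tx.run(MERGE_EDGE, w1=w1, t1=t1, rel=rel, w2=w2, t2=t2)

with driver.session() as session:
    session.write_transaction(write_tree,
        [("has", "VBZ", "nsubj", "comic", "NN")])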
The overall computation time from construction to loading into the database ranges from seconds to minutes due to the rich POS-tagging graph properties. The Stanford dependency parser accounts for more than half of the overall processing time. Note that at this point we did not consider any predefined indexing technique prior to loading into the database. However, we consider this process as an offline task which can be computed in the background. The graph analytical querying within Neo4j is real-time regardless of the size of the document collection.

Baseline methods
Since clustering can be seen as a summarization method (Liu, Chen, & Tseng, 2015), we consider several other approaches to validate the performance of the proposed method. K-Means (Lloyd, 1982) and DBScan (Ester, Kriegel, Sander, & Xu, 1996) are the most popular and the most straightforward to implement using the tf-idf vectorizer. Specifically, for K-Means and DBScan, we use the scikit-learn 3 implementation and we choose the number of clusters with the best silhouette score after several iterations. We also consider LDA (Blei et al., 2003) with Gibbs Sampling as the standard representation of topic modelling. LDA is implemented using the Gensim library 4 with a tf-idf document matrix and a varying number of topics; we found that the optimum number falls between 5 and 10 topics. LexRank (Erkan & Radev, 2004) and TextRank (Mihalcea & Tarau, 2004) are implemented using the Sumy summarizer 5 with the default parameters, since fine-tuning does not return a substantial improvement. It is worth noting that these approaches cannot be directly compared for the targeted problem; however, they can be seen as alternatives to achieve the same purpose.
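As a sketch of the K-Means baseline with silhouette-based selection of the cluster number (our reading of the setup; the exact parameters used in the experiments may differ), consider the following.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

def kmeans_baseline(documents, k_range=range(2, 11)):
    # Represent each short text as a tf-idf vector.
    X = TfidfVectorizer(stop_words="english").fit_transform(documents)
    best_k, best_score, best_model = None, -1.0, None
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        score = silhouette_score(X, model.labels_)
        if score > best_score:
            best_k, best_score, best_model = k, score, model
    return best_k, best_model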

Quantitative evaluation
For the quantitative evaluation, we use the ROUGE metrics (Lin & Och, 2004) as the intrinsic evaluation. The ROUGE metric has been a standard for evaluating text summarization quality. The common approach is that, given some gold-standard human summaries, the more words that overlap with the automated summary, the higher the ROUGE score. ROUGE is usually expressed as the ROUGE-N metric:
$$\mathrm{ROUGE\text{-}N} = \frac{\sum_{S \in M} \sum_{gram_n \in S} Count_{match}(gram_n)}{\sum_{S \in M} \sum_{gram_n \in S} Count(gram_n)},$$
where M is the set of manual summaries, n is the length of the n-grams, Count(gram_n) is the number of n-grams in the manual summary, and Count_match(gram_n) is the number of co-occurring n-grams between the manual and generated summaries. Table 1 indicates the performance of ROUGE 1-gram and 2-grams on several benchmarks.
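For concreteness, a simplified Python sketch of ROUGE-N recall following the formula above (an illustration, not the official ROUGE toolkit) is given below.

from collections import Counter

def ngrams(tokens, n):
    return Counter(zip(*[tokens[i:] for i in range(n)]))

def rouge_n(manual_summaries, generated_summary, n=1):
    generated = ngrams(generated_summary.lower().split(), n)
    match, total = 0, 0
    for manual in manual_summaries:
        reference = ngrams(manual.lower().split(), n)
        # Clipped overlap: an n-gram counts at most as often as it appears in the reference.
        match += sum(min(c, generated[g]) for g, c in reference.items())
        total += sum(reference.values())
    return match / total if total else 0.0

print(rouge_n(["the gold coast beach is crowded"], "gold coast beach crowded today", n=1))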
We sampled one thousand tweets with seed keyword filtering (e.g. tweets mentioning '#Australia' or 'Gold Coast') to obtain better relevancy, and one thousand Amazon product reviews in electronic goods. We manually generated three human gold-standard summaries for both sample datasets. The number of gold-standard summaries is set to match the number of appropriate topics. The number of topics identified is usually less than the number of clusters generated; in the experiment, we kept the number of topics to a maximum of ten. We report the average ROUGE-1 and ROUGE-2 scores against the gold standard over ten iterations on each topic. Note that it can be difficult to generate a gold-standard summary for the product reviews since human judgement is prone to subjectivity towards specific aspects. We generate the gold standard according to the deeper perspective (Ly, Sugiyama, Lin, & Kan, 2011), for example, the 'good' and 'bad' of a product.

Qualitative evaluation
For the qualitative evaluation, we judge the quality of the summarization based on human interpretation. A better approach is to present the results to someone without background knowledge of the topic and ask whether she can understand what the summary is about. Once we translate the dependency trees into the dependency graph, we perform the summarizing task by considering several parameters when querying the graph:
- Frequency Pattern. We set a certain threshold to get the most frequent word relation patterns. This method is based on the relationships that co-occur between words, since the edges preserve the frequency property. With a specific threshold, we can filter the graph to generate a summary of the text. Note that the co-occurrence frequency is generated based on the word-to-word relationship, in contrast to the bag-of-words model.
- Core Nominals. The other way is to filter on the core nominals property to get all the main objects. Core nominals usually belong to 'nsubj', 'obj', or 'iobj'. Once we obtain these, we can further combine them with the frequency threshold filtering to get a stronger summarization. In our observation, this combination benefits the product review dataset as the content is oriented towards specific aspects (products).
Both methods involve a trade-off. The limitation of the first method is that it may eliminate the main subject or object (e.g. nsubj, obj), which does not generate strong interpretations, while the second may produce excessive terms. The frequency pattern often favours a larger dataset (graph), while considering core nominals is better when the dataset is small. We balance the result by taking a subset from each method by tuning a predetermined threshold, for example (a simplified query; the node and relationship pattern is illustrative):
MATCH (n)-[r]->() WHERE r.frequency >= 3 AND (n.tag = "NN" OR n.tag = "NNS") RETURN n;
In Table 2, we present the summaries of sampled topics from K-Means, DBScan, LDA, LexRank, TextRank, and Belief Graph. For approaches other than the belief graph, we aggregate the short texts into a single large document before each run. Based on Table 2, we can see that the belief graph has a more interpretable and more cohesive structure. Some keywords have a supporting preposition, such as 'for the' audition, generated from the graph. Due to space constraints, we only list keywords that are closely related. The belief graph does not remove stopwords in order to retain the complete structure of the sentence. We discuss the results further in the next section.

Discussion
We examined the ability of the belief graph to summarize short text based on carefully tuned graph queries. The whole process consists of part-of-speech tagging and graph query execution. Based on the experiments, we showed that the belief graph achieves a satisfactory result in terms of interpretability. By leveraging the natural order of words using part-of-speech tagging, we showed that it is possible to generate a summary from word relationships. Most state-of-the-art techniques such as topic models neglect the importance of word order, which often hinders the overall understanding and interpretability. Moreover, the bag-of-words model suffers from the sparsity issue when the texts are short. The belief graph, on the other hand, can handle both normal and short texts.
The overall process to construct the belief graph takes two sequential procedures. First, we translate the dependency trees from the Stanford CoreNLP part-of-speech tagging. This process takes a considerable amount of time when the number of documents is above a thousand. In the experiment, we used the original library without further modification, which accounts for more than half of the processing time. One plausible future direction is to reduce the labelling time to make online processing possible. The second procedure is to load the dependency trees into the graph database. We chose Neo4j due to its popularity, for easier and faster implementation. The database also provides off-the-shelf graph algorithms, making it more favourable in production. The writing process gains a two-times improvement compared to the older version. Since we do not consider a distributed implementation, the writing reaches its bottleneck at ten thousand documents. Overall, it takes less than a minute for a thousand documents.
On the quantitative evaluation, based on the ROUGE metrics results, the belief graph achieved better results in the overall summarization. When the gold standard includes common stop-words (e.g. prepositions), the belief graph gets a bonus score in the ROUGE metrics, hence the slightly noticeable improvement in the metrics. Most statistical techniques emphasize the word frequency for keyword extraction, while both LexRank and TextRank apply sentence extraction. Summarizing short text, especially tweets, based on word frequency and sentence ranking does not work well. Gold-standard summarization for short text is often composed word by word instead of sentence by sentence. Therefore, sentence ranking does not fit the ROUGE evaluation for short text.
Overall, since the metrics only consider the overlapping words between the generated summary and the gold standards, it is difficult to get exact matches of words. The metrics do not acknowledge word or context similarity. We argue that probabilistic soft scoring is more appropriate to judge the relevancy. This explains why, for most techniques, the ROUGE metrics are very low.
In terms of the quality of summarization, we demonstrated that maintaining the natural structure of human language makes the summary easier to understand. This is especially true when the user does not have background knowledge of the content. We observed that word prepositions are helpful in mapping the relationships between words that appear in the final summary. One drawback arises when the grammatical structure of the texts is constantly out of order. For the other techniques, the quality of summarization still depends somewhat on the term frequency: when the frequency of words is low, the word choice in the generated summary tends to be unstable.
Finally, the belief graph offers a simple and straightforward technique to generate a short summary for quickly understanding a document. In contrast to previously developed techniques, our method embeds part-of-speech tagging in the directed graph to maintain the grammatical structure. Since the granularity of each vertex is at the word level, our proposed method works for both normal and short text as it utilizes word relationships instead of word frequency.

Conclusion
In this work, we presented the construction of the belief graph model from raw texts. Within the model, we proposed the implementation of the dependency tree from the Stanford dependency parser to build a dependency graph. Based on our model, we are able to detect important keywords and summarize text based on their word-to-word relationships. Through the performance and practical study in our experiments, using a real-world social media post dataset, we showed the effectiveness of the belief graph, especially in mining short text. Additionally, there is a need to improve the current tagging system as it is not yet effective in predicting the grammar of short text, specifically for social media content. Looking into indexing of the tree merging prior to loading into the database, which could improve the processing time, is another avenue for further improvement. At this point, the system is only suitable for offline tasks due to the long overall processing time. Moreover, short text similarity measurement is important in the aggregation process to avoid duplication, which in the end will improve the efficiency of the overall process.