Deep Learning Approach for Aspect-Based Sentiment Classification: A Comparative Review

ABSTRACT The emergence of various e-commerce sites has led to an increase in review sites for various services and products. Nowadays, people can easily learn about the products and services they intend to use by reading reviews. Sentiment analysis plays an important role here by classifying the polarity of product reviews. However, with a large number of reviews, a sentiment analysis that only gives the overall polarity is not sufficient, because it makes it difficult to find reviews of specific aspects (features) of a product. Aspect-based sentiment analysis, a fine-grained form of sentiment analysis, is able to assign a specific polarity to each aspect contained in a sentence. Various methods have been developed to provide accurate results in aspect-based sentiment analysis. This paper discusses the deep learning methods that have been applied and outlines possible directions for future research in aspect-based sentiment analysis.


Introduction
As the use of internet tools increases rapidly, the use of e-commerce platforms such as Amazon, Walmart, and Alibaba has recently increased as well. The growth of e-commerce, as a new shopping and marketing channel, is causing a surge in review sites for a variety of services and products (Pontiki, Galanis, and Papageorgiou 2015). Customers buy products online and freely express their ideas and thoughts, which generates huge amounts of data. In the business sector, considerable effort goes into mining this review data to find out customers' sentiments and opinions toward companies' products and services, which are often expressed in free text. Customer reviews are unstructured text, which makes them difficult for a computer to summarize, and manual analysis of such large amounts of data is nearly impossible.
Automatic sentiment analysis addresses these problems (Table 1). Sentiment analysis, also known as opinion mining (Pang and Lee 2008), classically classifies the overall polarity of the input. Most current approaches, however, attempt to detect the overall polarity of a sentence, paragraph, or text span and are based on the assumption that the sentiment expressed in a sentence is unified and consistent, which does not hold in reality. Sentiment analysis aims to classify text as positive, negative, or sometimes neutral according to the affective states and subjective information in the text. In this setting, sentiment analysis is called sentiment classification because it divides polarities into two or more classes. Sentiment analysis is generally divided into three levels: document level, sentence level, and aspect (or feature) level, where the aspect is sometimes also called the target. At the document level, sentiment classification examines the opinions in a whole review and determines their polarity; the system decides whether the review as a whole is positive or negative (Toqir and Cheah 2016). At the sentence level, each sentence is analyzed individually. The last level is aspect-level sentiment analysis, which is proposed to conduct fine-grained opinion mining toward specific entities or categories of entities, also called targets (Ren et al. 2020).
The goal of aspect-based sentiment analysis is to identify the sentiment polarity that a reviewer expresses toward a specific opinion target (aspect) in a comment or review. Conventional approaches mainly focus on designing a set of features, such as bag-of-words and sentiment lexicons, to train a classifier (e.g., an SVM) for aspect-based sentiment analysis (Jiang et al. 2011). To be accurate, this strategy requires a large number of features, and machine learning techniques were later proposed (Manek and Shenoy 2016; Syahrul and Dwi 2017). With the popularity of deep learning in natural language processing (NLP), researchers then began to apply it to aspect-based sentiment analysis problems (Poria, Cambria, and Gelbukh 2016; Tang, Qin, and Liu 2014).
This survey discusses the methods used to solve aspect-based sentiment analysis problems and, through them, shows the patterns in the methods applied to those problems.

Method for Collecting and Reviewing Papers
To conduct this review of aspect-based sentiment analysis, the first step is to find the related journal papers. We searched Google Scholar using several related keywords, such as ABSA, aspect extraction, and deep learning in aspect extraction. From the search results, we selected papers related to deep learning for aspect-based sentiment analysis, and we also considered the reputation of the publishers, such as IEEE, Springer, ScienceDirect, and so on. The distribution of journal sources is shown here.
After finding the related papers, we reviewed them in five stages. First, we selected titles that match the topic of the review. Second, we read the abstracts of these papers and kept those we judged to be related for the next stage. Third, we examined the proposed method and kept the paper only if the method is based on deep learning. Fourth, we examined the dataset: we compare only the datasets most widely used in the ABSA task, namely the SemEval datasets and the Twitter dataset. Finally, we collected the results of each paper, namely the accuracy and F1 values, and compared them across studies.

Aspect-Based Sentiment Analysis
Online sellers on e-commerce platforms often ask their customers to review the products they have purchased and the related services. As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in the hundreds or even thousands. This makes it difficult for a prospective customer to read them all and decide whether to purchase the product. It also makes it difficult for manufacturers to keep track of and manage customer opinions. Sentiment classification, or sentiment analysis, can help solve these problems.
In customer reviews, it has been observed that customers often comment on multiple aspects in one sentence. Traditional sentiment classification cannot handle this kind of problem, and ABSA was introduced to overcome it. Instead of predicting the overall sentiment polarity, fine-grained ABSA aims to understand reviews better than traditional sentiment analysis. Suppose that a restaurant customer comments on the restaurant he visited: "The price is reasonable although the service is poor." This sentence mentions two targets, "price" and "service," which are positive and negative, respectively. The example shows that the polarity can be opposite when different aspects are considered. In this survey, we divide aspect-based sentiment analysis (ABSA) into three sub-tasks: aspect term extraction, aspect term categorization, and aspect term sentiment classification. Figure 3 shows the sub-tasks in ABSA.

Aspect Term Extraction (ATE)
Aspect term extraction (also called aspect identification or opinion target extraction) is the ABSA subtask that identifies the different aspects mentioned within a given sentence. Extracting aspects from the review text is the major task of ABSA (Da'u and Salim 2019). Traditional approaches to aspect term extraction typically rely on handcrafted features, linear models, and integrated network architectures. Although these methods can achieve good performance, they are time-consuming to build and often very complicated. In real-life systems, a simple model with competitive results is generally more effective and preferable to a complicated one.
Aspect term extraction (ATE) for opinion mining was first studied by Hu, Liu, and Street (2004), who introduced the distinction between explicit and implicit aspects. Explicit aspects are concepts that explicitly denote targets in the opinionated sentence. For instance, in the sentence "I love the touchscreen of my phone but the battery life is so short," "touchscreen" and "battery life" are explicit aspects because they are explicitly mentioned in the sentence. An aspect can also be expressed indirectly as an implicit aspect; e.g., the sentence "This camera is sleek and very affordable" implicitly gives a positive opinion about the aspects "appearance" and "price" of the entity camera. In the first example, which contains the two aspects "touchscreen" and "battery life," a sentence-level polarity detection technique would mistakenly produce a polarity value close to neutral, since the two opinions expressed by the user are opposite. Hence, it is necessary to first decompose the sentence into product features and then assign a separate polarity value to each of them. As another example, the sentence "The screen of my phone is really nice and its resolution is superb" from a phone review has positive polarity, i.e., the author likes the phone. More specifically, however, the positive opinion is about its screen and resolution. These concepts are thus called the opinion targets, or aspects, of this opinion. The task of identifying the aspects in a given opinionated text is called aspect extraction (Poria, Cambria, and Gelbukh 2016).
Rule-based methods were popular in early ATE research. Poria et al. (2014) addressed ATE in product reviews with a novel rule-based approach that exploits common-sense knowledge and sentence dependency trees to detect both explicit and implicit aspects. Another line of rule-based work took double propagation as the base method and improved its results dramatically through aspect recommendation based on semantic similarity and aspect associations. As rule-based methods grew in popularity, several studies improved how automated systems identify the appropriate rules for ATE (Liu and Gao 2016; Rana and Cheah 2017). Rule-based methods usually do not group the extracted aspect terms into categories.
In supervised approaches, machine learning systems are trained on manually annotated data to extract targets in the reviews. The most common techniques in supervised approaches are decision trees, Support Vector Machines (SVM) (Manek and Shenoy 2016), K-nearest neighbors (Shah et al. 2020), Naïve Bayes classifiers (Kaur 2021), and neural networks (Kessler 2009; Pontiki et al. 2016). Unsupervised methods, in contrast, avoid reliance on the labeled data that supervised learning needs (He et al. 2017): they automatically extract product features using syntactic and contextual patterns without annotated data (Liu and Gao 2016).
Conditional Random Field (CRF), a supervised method, is a promising method for named entity recognition (NER). Because NER and aspect extraction are similar problems, many researchers use CRF for aspect extraction. Shu et al. (2017b) combined lifelong learning with CRF (L-CRF) to leverage the knowledge gained from extraction results on previous domains, which is unlabeled data, to improve extraction.
The methods mentioned above have their own limitations. CRF is a linear model, so it needs a large number of features to work well, while linguistic patterns must be crafted by hand and depend crucially on the grammatical accuracy of the sentence. Recently, methods based on deep learning have shown promising results in many sentiment analysis tasks, including aspect extraction. To overcome the above limitations, Poria, Cambria, and Gelbukh (2016) proposed a deep convolutional neural network (CNN), a non-linear supervised classifier that can fit the data more easily. They also introduced specific linguistic patterns and combined the linguistic pattern approach with the deep learning approach for the ATE task.
CNNs are widely used in image processing research. Because CNNs do not require complex computations, Kim (2014) proposed a CNN for sentence classification and obtained promising results. Following that result, CNNs have become popular in text classification tasks, including ATE. In the ATE task, what counts as an aspect depends on the domain and on the context of the sentence. By modifying the embedding layer, Xu et al. (2018) used a double embedding layer to improve the performance of the CNN layers (DE-CNN), combining a general-purpose embedding with a domain-specific embedding.
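To make the double-embedding idea concrete, here is a minimal pure-Python sketch of the input side of a DE-CNN-style tagger: each token's general-purpose vector is concatenated with its domain-specific vector, and one zero-padded convolution filter then produces a feature for every token, as token-level tagging requires. The function names (`double_embed`, `conv1d_same`), the toy vectors, and the zero-vector fallback for unknown words are illustrative assumptions; DE-CNN itself stacks several convolutional layers and a per-token output layer, which are omitted here.

```python
from typing import Dict, List

Vector = List[float]


def double_embed(tokens: List[str], general: Dict[str, Vector],
                 domain: Dict[str, Vector], dim_g: int, dim_d: int) -> List[Vector]:
    """Concatenate a general-purpose and a domain-specific vector per token.

    Out-of-vocabulary words fall back to zero vectors (an assumption of this sketch).
    """
    rows = []
    for tok in tokens:
        g = general.get(tok, [0.0] * dim_g)
        d = domain.get(tok, [0.0] * dim_d)
        rows.append(g + d)  # list concatenation = vector concatenation
    return rows


def conv1d_same(seq: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Apply one convolution filter of odd width k with zero padding.

    Every token gets one output, which is what token-level tagging (ATE) needs.
    """
    k = len(kernel)
    pad = k // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    feats = []
    for i in range(len(seq)):
        window = padded[i:i + k]
        s = bias + sum(w * x for w_row, x_row in zip(kernel, window)
                       for w, x in zip(w_row, x_row))
        feats.append(max(0.0, s))  # ReLU activation
    return feats


# Toy vectors (hypothetical, not trained embeddings).
general = {"battery": [1.0, 0.0], "life": [0.0, 1.0]}
domain = {"battery": [1.0]}
seq = double_embed(["the", "battery", "life"], general, domain, dim_g=2, dim_d=1)
center_only = [[0.0] * 3, [1.0] * 3, [0.0] * 3]  # width-3 filter that sums the center token
print(conv1d_same(seq, center_only))
```

In a real model the filter weights are learned, and many filters run in parallel to give each token a feature vector rather than a single number.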
Using the same double embedding method, Shu et al. (2017) modified the standard CNN of Xu et al. (2018) into what they call a controlled CNN (Ctrl), which boosts the performance of a single task by asynchronously updating control modules and CNN layers. Da'u and Salim (2019) leveraged different embedding layers in a multichannel convolutional neural network (MCNN). Like DE-CNN, MCNN uses two word embeddings, a general embedding and a domain embedding; the difference is that it adds a Part-of-Speech (POS) tag embedding layer, and its convolutional layers extract local features from the embedding layers. Jabreel, Hassan, and Moreno (2018) used a bidirectional gated recurrent neural network to extract the targets of tweets.

Aspect Term Categorization (ATC)
The second sub-task of ABSA is to cluster synonymous aspect terms into categories, where each category represents a single aspect, called an aspect category (Mukherjee and Liu 2012). For example, in the sentence "I have to say they have one of the fastest delivery time in the city," the aspect term is "delivery time." Aspect terms with similar meanings can then be clustered into categories, each representing a single aspect (e.g., "delivery time," "waiter," and "staff" can be clustered into the aspect service). Ganu, Elhadad, and Marian (2009) adopted category-specific sentiment classification for restaurant reviews. They first identified six basic categories for restaurants and then established that the text of a review is a better indicator than other metadata, such as star ratings.
Mukherjee and Liu (2012) addressed the problem in a different setting, in which the user provides seed words for several aspect categories and the model extracts and groups aspect terms into categories simultaneously; they proposed two new statistical models, the Seeded Aspect Sentiment model (SAS) and the Maximum-Entropy Seeded Aspect Sentiment model (ME-SAS). Aspect category classification (ACC) and aspect term extraction (ATE) are often treated independently, even though they are closely related; intuitively, the knowledge learned for one task should inform the other. Xue et al. (2017) proposed a multi-task learning model based on neural networks (MTNA) that solves both tasks. ACC is treated as a supervised classification task, in which the sentence is labeled with a subset of predefined aspect labels, and ATE as a sequence labeling task, in which the word tokens related to the given aspects are tagged according to a predefined tagging scheme such as IOB (Inside, Outside, Beginning). MTNA combines a BiLSTM for ATE and a CNN for ACC in a multi-task framework. Senarath, Jihan, and Ranathunga (2019) proposed mixture classifiers for aspect extraction, combining an improved CNN with an SVM that uses state-of-the-art manually engineered features.
Akhtar, Garg, and Ekbal (2020) proposed two strategies for jointly learning two tasks (aspect term extraction and aspect sentiment classification). The first is based on an end-to-end framework in which the two tasks are solved in sequence by a BiLSTM-CNN model: the BiLSTM learns the sequential pattern of tokens in a sentence; a self-attention module helps the system learn how important the other tokens in the sentence are for tagging the current token as inside (or outside) an aspect term; the CNN layer captures the local context of each token; and a softmax function performs the BIO classification.
In contrast, the second approach merges the two tasks into a single task solved within one architecture: each token is classified into one of nine classes, i.e., B-Positive, I-Positive, B-Negative, I-Negative, B-Neutral, I-Neutral, B-Conflict, I-Conflict, and O.
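Both strategies end with per-token labels that must be turned back into aspect terms. The sketch below shows one way to decode them with a hypothetical helper, `decode_tags`: it reads plain BIO tags as well as the nine unified labels, returning each aspect span with its polarity (an empty string for plain BIO). How an I tag without a matching open span is handled is a convention chosen for this sketch, not something the cited work specifies.

```python
from typing import List, Tuple

POLARITIES = ("Positive", "Negative", "Neutral", "Conflict")
# Four polarities x {B, I} plus O gives the nine classes listed above.
UNIFIED_LABELS = [f"{p}-{pol}" for pol in POLARITIES for p in ("B", "I")] + ["O"]


def decode_tags(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token tags into (aspect span, polarity) pairs.

    Works for unified tags such as "B-Positive" and for plain BIO tags
    ("B", "I", "O"), in which case the polarity is the empty string.
    An I tag that does not continue an open span of the same polarity
    starts a new span (a lenient convention assumed here).
    """
    pairs = []
    start, polarity = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, pol = tag.partition("-")
        continues = prefix == "I" and start is not None and pol == polarity
        if start is not None and not continues:
            pairs.append((" ".join(tokens[start:i]), polarity))
            start, polarity = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, polarity = i, pol
    return pairs


tokens = "The price is reasonable although the service is poor".split()
tags = ["O", "B-Positive", "O", "O", "O", "O", "B-Negative", "O", "O"]
print(decode_tags(tokens, tags))
```

The unified scheme lets a single tagger solve extraction and polarity at once, at the cost of a larger label set.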

Aspect Term Sentiment Classification
The last sub-task in ABSA is aspect term sentiment classification: after the aspect terms in a review sentence have been extracted and categorized, the sentiment polarity toward each of them is classified. Tang et al. (2016) proposed target-dependent sentiment classification using Long Short-Term Memory; the term target here has the same meaning as the term aspect in this survey. In the literature, target-dependent sentiment classification is typically regarded as a kind of text classification problem, so standard text classification approaches such as a feature-based Support Vector Machine (Jiang et al. 2011; Pang et al. 2002) can naturally be employed to build a sentiment classifier. For example, Jiang et al. (2011) manually designed target-independent and target-dependent features using expert knowledge, a syntactic parser, and external resources. Despite its effectiveness, feature engineering is labor-intensive and unable to discover the discriminative or explanatory factors of the data. To handle this problem, Tan et al. (2014) proposed transforming the dependency tree of a sentence into a target-specific recursive structure and using an Adaptive Recursive Neural Network to learn higher-level representations. Alternatively, Vo and Zhang (2015) used rich features, including sentiment-specific word embeddings and sentiment lexicons.
Sentiment polarity in ABSA depends not only on the aspect but also on the context of the review sentence. Wang et al. (2016) found that the sentiment polarity of a sentence is highly dependent on both content and aspect. For example, "Staff are not that friendly, but the taste covers all." is positive with respect to the aspect food but negative with respect to the aspect service, so the polarity can be opposite when different aspects are considered.

Method for Aspect-Based Sentiment Classification (ABSA)
Traditional approaches to ABSA manually design a set of features, such as bag-of-words and sentiment lexicons, to train a classifier (e.g., an SVM) (Jiang et al. 2011), or rely on rule-based methods (Ding et al. 2008) and statistics-based methods (Jiang et al. 2011; Zhao et al. 2010). However, feature engineering is labor-intensive, and its performance has almost reached a bottleneck (Ma et al. 2017) because it needs a lot of labeled data. With the development of deep learning techniques, researchers have designed effective neural networks that automatically generate useful low-dimensional representations of targets and their contexts, obtaining promising results on the ABSA task. This part discusses the methods used in ABSA.

Method Based on Recurrent Neural Network (RNN)
Recurrent Neural Networks (RNNs) are neural networks in which the output of the previous step is fed as input to the current step. In traditional neural networks, all inputs and outputs are independent of each other, but when the next word of a sentence must be predicted, the previous words are needed, so the network must remember them. RNNs solve this issue with a hidden state that carries information from previous steps. Here we describe the RNN-based models that researchers use in ABSA tasks.
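The recurrence described above can be written as h_t = tanh(W_x x_t + W_h h_{t-1} + b). The following minimal sketch assumes this plain (Elman) form with a tanh activation and a zero initial state; the names `rnn_forward`, `W_x`, `W_h`, and `b` are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def rnn_forward(xs: List[List[float]], W_x: Matrix, W_h: Matrix,
                b: List[float]) -> List[List[float]]:
    """Run an Elman RNN, h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.

    Returns the hidden state after every step.
    """
    d_h = len(b)
    h = [0.0] * d_h
    states = []
    for x in xs:
        # The right-hand side is built entirely from the previous h before h is rebound.
        h = [math.tanh(sum(w * v for w, v in zip(W_x[j], x))
                       + sum(w * v for w, v in zip(W_h[j], h))
                       + b[j])
             for j in range(d_h)]
        states.append(h)
    return states


# One-dimensional toy example: the second input is 0, yet h_2 is non-zero
# because the hidden state remembers the first input.
states = rnn_forward([[0.5], [0.0]], W_x=[[1.0]], W_h=[[1.0]], b=[0.0])
print(states)
```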

Long Short-Term Memory (LSTM)
A standard RNN suffers from vanishing or exploding gradients, since gradients may grow or decay exponentially over long sequences. To overcome these issues, the Long Short-Term Memory network (LSTM) was developed and achieved superior performance (Hochreiter and Schmidhuber 1997). Unlike traditional feedforward neural networks, an LSTM has feedback connections, so it can process not only single data points (such as images) but also entire sequences of data (such as text, speech, and video). The LSTM architecture has three gates (the input gate, the forget gate, and the output gate) and a cell memory state. The input gate is responsible for adding information to the cell state. The forget gate is responsible for removing information from the cell state: information that the LSTM no longer requires, or that is of less importance, is removed by multiplication with a filter, which is needed to optimize the performance of the LSTM network. The output gate selects useful information from the current cell state and exposes it as output. Figure 4 illustrates the architecture of a standard LSTM.
[Figures 5, 6, and 7]
More formally, each cell in an LSTM can be computed as follows:
$$X = [h_{t-1}; x_t]$$
$$f_t = \sigma(W_f \cdot X + b_f)$$
$$i_t = \sigma(W_i \cdot X + b_i)$$
$$o_t = \sigma(W_o \cdot X + b_o)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c \cdot X + b_c)$$
$$h_t = o_t \odot \tanh(c_t)$$
where $W_i, W_f, W_o \in \mathbb{R}^{d \times 2d}$ are weight matrices and $b_i, b_f, b_o \in \mathbb{R}^{d}$ are biases of the LSTM to be learned during training, parameterizing the transformation of the input, forget, and output gates, respectively ($W_c$ and $b_c$ likewise parameterize the candidate cell content). $\sigma$ is the sigmoid function and $\odot$ stands for element-wise multiplication. $x_t$ contains the inputs of the LSTM cell unit, representing the word embedding vectors $w_t$ in Figure 4. The hidden layer vector is $h_t$.
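The LSTM equations translate almost line for line into code. This is a minimal pure-Python sketch of a single LSTM step, not an optimized or trainable implementation; the helper names and the dictionary layout of `params` are assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bj for row, bj in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM step: X = [h_{t-1}; x_t], then the forget, input, and output gates.

    params maps "f", "i", "o", "c" to (W, b); each W has shape d x (d + d_x).
    """
    X = h_prev + x_t  # concatenation [h_{t-1}; x_t]

    def layer(name: str, act: Callable[[float], float]) -> Vector:
        W, b = params[name]
        return [act(z) for z in affine(W, b, X)]

    f_t = layer("f", sigmoid)      # forget gate
    i_t = layer("i", sigmoid)      # input gate
    o_t = layer("o", sigmoid)      # output gate
    c_hat = layer("c", math.tanh)  # candidate cell content tanh(W_c X + b_c)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical parameters: a saturated forget gate (bias 100) and a closed input gate
# (bias -100) keep the old cell value, illustrating how the cell state carries memory.
zeros = [[0.0, 0.0]]
params = {"f": (zeros, [100.0]), "i": (zeros, [-100.0]),
          "o": (zeros, [0.0]), "c": (zeros, [0.0])}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
print(h_t, c_t)
```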
Because LSTMs can capture sequential information, most researchers use LSTM-based methods in text classification and sentiment classification tasks, especially in ABSA. Target-dependent LSTM (TD-LSTM) was first proposed by Tang et al. (2016). It extends the standard LSTM, which cannot capture any target information and therefore predicts the same result for different targets in the same sentence. TD-LSTM uses two LSTM networks, a left and a right LSTM, to model the preceding and following contexts, respectively, with the target placed between them. Wang et al. (2016) proposed adding aspect embeddings to the LSTM (ATAE-LSTM) and, in addition, used an attention mechanism; this was the first work to propose aspect embeddings for the ABSA task. Building on the attention mechanism of ATAE-LSTM, Ma et al. (2017) argued that targets should be modeled separately, with the aid of their contexts, and proposed interactive attention networks (IAN), which use LSTMs and an attention mechanism to learn attention interactively over the contexts and targets and generate separate representations of the targets and contexts.
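For TD-LSTM, the main preprocessing step is building the two input sequences around the target. A small sketch, assuming the target is given as a token span `[start, end)` (a hypothetical interface): the left sequence runs from the first word through the target, and the right sequence runs from the last word backwards to the target. In TD-LSTM, the last hidden states of the two LSTMs are then concatenated and passed to a softmax classifier; that part is not shown.

```python
from typing import List, Tuple


def td_lstm_inputs(tokens: List[str], start: int, end: int) -> Tuple[List[str], List[str]]:
    """Build the two input sequences of TD-LSTM for the target span tokens[start:end].

    The left LSTM reads the preceding context plus the target, left to right;
    the right LSTM reads the following context plus the target, right to left.
    Both sequences therefore end at the target.
    """
    left = tokens[:end]
    right = tokens[start:][::-1]
    return left, right


tokens = "the price is reasonable although the service is poor".split()
left, right = td_lstm_inputs(tokens, start=6, end=7)  # target: "service"
print(left)
print(right)
```

Because the two sequences differ for every target, the same sentence yields different representations for "price" and "service", which a single standard LSTM cannot do.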
Whereas previous LSTM-based methods mainly modeled the text separately, the attention-over-attention approach models aspects and texts simultaneously using LSTMs. The target and text representations generated by the LSTMs then interact through an attention-over-attention (AOA) module, inspired by the use of AOA in question answering (Cui et al. 2017). Meanwhile, He et al. (2018) proposed a modified attention model based on LSTM.
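As a rough sketch of how an AOA module can combine the two representations, assuming dot-product interactions between sentence and target hidden states: a column-wise softmax over the interaction matrix gives target-to-sentence attention, a row-wise softmax gives sentence-to-target attention, the latter is averaged over sentence positions, and the two are combined into one weight per sentence word. The function name and the plain-list representation are illustrative; in a full model, the returned weights would be used for a weighted sum of the sentence hidden states.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def attention_over_attention(h_s: Matrix, h_t: Matrix) -> List[float]:
    """h_s: n sentence hidden states, h_t: m target hidden states (same width).

    Returns the sentence-level attention weights gamma (length n, summing to 1).
    """
    n, m = len(h_s), len(h_t)
    # Interaction matrix I[i][j] = h_s[i] . h_t[j]
    I = [[sum(a * b for a, b in zip(hs, ht)) for ht in h_t] for hs in h_s]
    # alpha: softmax over the sentence for each target word (column-wise)
    cols = [softmax([I[i][j] for i in range(n)]) for j in range(m)]
    alpha = [[cols[j][i] for j in range(m)] for i in range(n)]
    # beta: softmax over the target for each sentence word (row-wise)
    beta = [softmax(row) for row in I]
    # Average beta over sentence positions: one weight per target word
    beta_bar = [sum(beta[i][j] for i in range(n)) / n for j in range(m)]
    # gamma = alpha . beta_bar
    return [sum(alpha[i][j] * beta_bar[j] for j in range(m)) for i in range(n)]


gamma = attention_over_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0, 0.0]])
print(gamma)
```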
The joint attention LSTM network (JAT-LSTM) combines aspect attention and sentiment attention in a single LSTM network (Cai and Li 2018). The model concatenates the aspect term embeddings and sentiment term embeddings with the sentence embedding as the input of the LSTM network, which makes the input information richer.
Table 2 presents the accuracy and F1 results of several proposed ABSA models that use LSTM as the base model. JAT-LSTM has the highest accuracy among the LSTM-based models on the SemEval 2014 dataset. For the SemEval 2015, SemEval 2016, and Twitter datasets, however, we cannot compare the models here, because only one approach uses these datasets; we compare them later with other methods. Laptop and Rest stand for the SemEval 2014 domains, Twitter stands for the Twitter domain, Rest 15 stands for the

Gated Recurrent Unit (GRU)
Gated Recurrent Units (GRUs) are a variant of LSTM introduced by Cho et al. (2014). GRUs were designed to have more persistent memory, which makes them useful for capturing long-term dependencies between the elements of a sequence. A GRU combines the forget and input gates into a single update gate and merges the cell state and hidden state, among other changes. The resulting model is simpler than a standard LSTM and has become popular in many tasks. A GRU has a reset gate ($r_t$) and an update gate ($z_t$). The reset gate can completely discard the past hidden state $h_{t-1}$ if it considers it irrelevant to the computation of the new state, whereas the update gate determines how much of $h_{t-1}$ is carried forward to the next state $h_t$. The output $h_t$ of a GRU depends on the input $x_t$ and the previous state $h_{t-1}$, and it is computed as follows:
$$r_t = \sigma(W_r [h_{t-1}; x_t] + b_r)$$
$$z_t = \sigma(W_z [h_{t-1}; x_t] + b_z)$$
$$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}; x_t] + b_h)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$
where $r_t$ and $z_t$ denote the reset and update gates, respectively, $\tilde{h}_t$ is the candidate output state, and $h_t$ is the actual output state at time $t$. The symbol $\odot$ stands for element-wise multiplication, $\sigma$ is the sigmoid function, and $[\cdot\,;\cdot]$ stands for vector concatenation. $W_r, b_r$ and $W_z, b_z$ are the parameters of the reset and update gates, respectively, and $W_h, b_h$ are those of the candidate state, with $W_r, W_z, W_h \in \mathbb{R}^{d_h \times (d_h + d)}$, where $d$ is the dimension of $x_t$ and $d_h$ is the dimension of the hidden state.
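A corresponding single GRU step can be sketched in the same style, using the update-gate convention of the equations above (z_t weights the previous state). Parameter names and the `params` layout are assumptions for illustration, and the toy biases are chosen only to saturate the update gate in each direction.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bj for row, bj in zip(W, b)]


def gru_step(x_t: Vector, h_prev: Vector, params: Dict[str, Tuple[Matrix, Vector]]) -> Vector:
    """One GRU step: params maps "r", "z", "h" to (W, b), each W of shape d_h x (d_h + d)."""
    X = h_prev + x_t                                   # [h_{t-1}; x_t]
    r = [sigmoid(v) for v in affine(*params["r"], X)]  # reset gate
    z = [sigmoid(v) for v in affine(*params["z"], X)]  # update gate
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]   # r_t * h_{t-1}, element-wise
    h_cand = [math.tanh(v) for v in affine(*params["h"], reset_h + x_t)]
    # z_t decides how much of h_{t-1} is carried forward.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


zeros = [[0.0, 0.0]]
keep = {"r": (zeros, [0.0]), "z": (zeros, [100.0]), "h": (zeros, [0.5])}
replace = {"r": (zeros, [0.0]), "z": (zeros, [-100.0]), "h": (zeros, [0.5])}
print(gru_step([1.0], [0.7], keep))     # update gate ~1: keeps h_{t-1}
print(gru_step([1.0], [0.7], replace))  # update gate ~0: takes the candidate
```

Compared with the LSTM step, there is no separate cell state and one gate fewer, which is the simplification the text refers to.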
Because GRU is simpler than LSTM, many researchers have started to build ABSA models on GRU. Most ABSA approaches assume that the target or aspect is known before the sentiment polarity is determined. In contrast, Jabreel, Hassan, and Moreno (2017) proposed a model for target-dependent sentiment analysis of tweets that can identify and extract the targets of a tweet, represent the relatedness between the targets and their contexts, and identify the polarity of the tweet toward each target. They used a bidirectional GRU model (TD-BiGRU) both to extract the targets and to determine the sentiment polarities. Gu et al. (2018) showed that when an aspect term occurs in a sentence, its neighboring words should receive more attention than distant words. They proposed a position-aware bidirectional attention network (PBAN) based on a bidirectional GRU, which not only concentrates on the position information of aspect terms but also mutually models the relation between the aspect term and the sentence through a bidirectional attention mechanism. Previous ABSA work has shown that the interaction between aspects and contexts is important; however, most of those works ignore the position of the aspect when encoding the sentence. The Hierarchical Attention-based Position-aware Network (HAPN) (Li, Liu, and Zhou 2018) was proposed to solve this problem. It introduces position embeddings when modeling the sentence and generates position-aware representations, and it uses a hierarchical attention-based fusion mechanism to fuse the clues from the aspects and the contexts. The authors report that this approach is effective for aspect-level sentiment classification and outperforms state-of-the-art approaches with remarkable gains.
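Position-aware models start from each word's distance to the aspect. The sketch below computes those relative distances and, for illustration only, turns them into a hand-crafted linear decay weight so that nearby words count more. HAPN, as described above, maps positions to learned embeddings instead, so this linear weighting is an assumption and not necessarily the scheme used by PBAN or HAPN; the function names are hypothetical.

```python
from typing import List


def relative_positions(n: int, start: int, end: int) -> List[int]:
    """Distance of each of n tokens to the aspect span [start, end); aspect tokens get 0."""
    return [start - i if i < start else (i - end + 1 if i >= end else 0)
            for i in range(n)]


def linear_position_weights(n: int, start: int, end: int) -> List[float]:
    """Hand-crafted weights w_i = 1 - d_i / n: nearby words count more."""
    return [1.0 - d / n for d in relative_positions(n, start, end)]


tokens = "the price is reasonable although the service is poor".split()
print(relative_positions(len(tokens), 6, 7))  # aspect: "service"
print(linear_position_weights(len(tokens), 6, 7))
```

In a learned variant, each integer distance would index a row of a trainable position-embedding table whose vectors are combined with the word embeddings.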
Table 3 presents the accuracy and F1 results of several proposed ABSA models that use GRU as the base model. HAPN has the highest accuracy among the GRU-based models on the SemEval 2014 dataset, but compared with the LSTM-based models, JAT-LSTM still has the highest performance. On the Twitter dataset, TD-BiGRU achieves the highest accuracy.

Method Based on Convolutional Neural Network (CNN)
Convolutional Neural Networks (ConvNets or CNNs) are a category of neural networks that have proven very effective in areas such as image recognition and classification. Apart from powering vision in robots and self-driving cars, CNNs have been successful in identifying faces, objects, and traffic signs. In recent years, CNNs have shown breakthrough results in some NLP tasks, notably sentence classification (Kim 2014), i.e., classifying short phrases into a set of predefined categories.
Convolutional Neural Network (CNN) methods are widely used in text classification and can achieve state-of-the-art performance on many standard sentiment classification datasets (Le, Cerisara, and Denis 2017). A CNN model consists of an embedding layer, a one-dimensional convolutional layer, and a max-pooling layer. The embedding layer is usually initialized with pre-trained embeddings such as GloVe (Pennington, Socher, and Manning 2014).
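The convolution and max-pooling layers described above can be sketched as follows, assuming the embedding lookup has already produced one vector per word. This minimal version omits bias terms, uses no padding, and assumes the sentence is at least as long as each filter; the function name and toy values are illustrative.

```python
from typing import List

Vector = List[float]


def conv_maxpool(embeds: List[Vector], filters: List[List[Vector]]) -> List[float]:
    """Kim-style feature extraction without bias terms.

    Each filter is a width x dim matrix slid over every full window of the
    sentence (no padding); ReLU is applied and max-over-time pooling keeps
    one feature per filter.
    """
    feats = []
    for f in filters:
        width = len(f)
        scores = []
        for i in range(len(embeds) - width + 1):
            window = embeds[i:i + width]
            s = sum(w * x for w_row, x_row in zip(f, window) for w, x in zip(w_row, x_row))
            scores.append(max(0.0, s))  # ReLU
        feats.append(max(scores))       # max-over-time pooling
    return feats


embeds = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-word sentence, 2-d embeddings
filters = [[[1.0, 0.0], [0.0, 1.0]],            # width-2 filter
           [[-1.0, 0.0], [0.0, 0.0]]]           # width-2 filter that does not fire here
print(conv_maxpool(embeds, filters))
```

Max-over-time pooling yields a fixed-size vector (one value per filter) regardless of sentence length, which is then fed to a classifier.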
Table 2. Performance of proposed methods based on the LSTM model.
Since ABSA is a form of text classification, some researchers have used CNN-based methods. Because a CNN alone cannot solve the ABSA problem, some studies combine it with a sequence model such as LSTM. LSTMs employed in ABSA have weaknesses, such as a lack of position invariance and a lack of sensitivity to local key patterns. LSTMs combined with attention mechanisms are widely used in ABSA, but they are time-inefficient because they process the sequence step by step and need more training time. A CNN can address these limitations because the convolutional layer easily performs parallel operations during training without waiting for the result of the previous step, but it is weak at capturing long-distance dependencies and modeling sequence information. Many researchers have tried to tackle these limitations. For example, instead of using an attention mechanism, Xue and Li (2018) proposed a model based on a convolutional neural network and gating mechanisms with aspect embedding (GCAE), which they report to be more accurate and efficient.
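The core of GCAE, as we understand it, is a pair of convolutional filters whose outputs are multiplied (a Gated Tanh-ReLU Unit): a tanh filter extracts sentiment features, and a ReLU filter that also receives the aspect embedding acts as a gate. The minimal sketch below shows one such filter pair followed by max-over-time pooling; each convolution window is assumed to be already flattened into one vector, and all names and toy values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gtru_feature(windows: List[Vector], w_s: Vector, b_s: float,
                 w_a: Vector, v_a: Vector, aspect: Vector, b_a: float) -> float:
    """One gated tanh-ReLU filter pair followed by max-over-time pooling.

    windows: each entry is k consecutive word embeddings flattened into one vector.
    s_i = tanh(w_s . x_i + b_s)                   sentiment feature
    a_i = relu(w_a . x_i + v_a . aspect + b_a)    aspect gate
    c_i = s_i * a_i
    """
    gated = []
    for x in windows:
        s = math.tanh(dot(w_s, x) + b_s)
        a = max(0.0, dot(w_a, x) + dot(v_a, aspect) + b_a)
        gated.append(s * a)
    return max(gated)


windows = [[1.0, 0.0], [0.0, 1.0]]
print(gtru_feature(windows, [1.0, -1.0], 0.0, [0.0, 0.0], [1.0], [1.0], 0.0))   # gate open
print(gtru_feature(windows, [1.0, -1.0], 0.0, [0.0, 0.0], [1.0], [-1.0], 0.0))  # gate closed
```

The same sentence features are passed or blocked depending on the aspect, and because no step waits on a previous one, all windows can be computed in parallel.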
The sparse attention-based separable dilated convolutional neural network (SA-SDCNN) (Gan et al. 2019) is composed of a multichannel embedding layer, a separable dilated convolutional module, a sparse attention layer, and an output layer. The multichannel embedding uses three different embeddings, namely word2vec, GloVe, and SSWE (sentiment-specific word embedding). Ren et al. (2020) proposed the distillation network (DNet), a lightweight and efficient sentiment analysis model based on gated convolutional neural networks. The model first encodes the sentence with gated convolutional networks that control which information is useful for predicting the sentiment polarity, using the gating units from GCAE (Xue and Li 2018) to extract aspect-sensitive information. Meanwhile, arguing that the position of the aspect and its context are important, Wu et al. (2020) proposed a relative position and aspect attention encode model (RPAEN) for ABSA based on convolutional neural networks. The model first introduces a position encoding layer that encodes the relative position of the aspect term in the text, so that this relative position information is incorporated into the model and helps the sentiment analysis of the current aspect term. It then uses an aspect attention mechanism instead of a general attention mechanism to better capture the dependence between all words in the text and the aspect term. Table 4 shows the results of some popular ABSA methods based on the CNN model.
[Tables 5 and 6]
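SA-SDCNN relies on dilated convolutions, which enlarge the receptive field without adding weights by skipping positions between kernel taps. A minimal sketch on a scalar sequence follows; the separable, multichannel, and sparse-attention parts of SA-SDCNN are not modeled, and the function name is illustrative.

```python
from typing import List


def dilated_conv1d(xs: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Valid 1-D dilated convolution over a scalar sequence.

    Output i combines xs[i], xs[i + dilation], ..., xs[i + (k - 1) * dilation],
    so one output covers (k - 1) * dilation + 1 positions with only k weights.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output
    return [sum(kernel[j] * xs[i + j * dilation] for j in range(k))
            for i in range(len(xs) - span + 1)]


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_conv1d(xs, [1.0, 1.0], dilation=1))  # adjacent pairs
print(dilated_conv1d(xs, [1.0, 1.0], dilation=2))  # pairs two positions apart
```

Stacking layers with growing dilation lets a CNN see long-range context, which partly addresses the weakness at long-distance dependencies mentioned earlier.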

Method Based on Memory Network
The memory network, a general machine learning framework introduced by Weston, Chopra, and Bordes (2015), is coupled with multiple-hop attention to explicitly focus only on the most informative context areas when inferring the sentiment polarity toward the target word. Its central idea is inference with a long-term memory component that can be read, written to, and jointly learned with the goal of using it for prediction. Formally, a memory network consists of a memory m, which is an array of objects such as vectors, and four components, I, G, O, and R. I converts the input to an internal feature representation, G updates old memories with the new input, O generates an output representation given a new input and the current memory state, and R outputs a response based on the output representation. Methods based on memory networks explicitly hold the context information in memory and acquire the relation between the target and the context through attention mechanisms.
Tang, Qin, and Liu (2014) proposed a deep memory network (MemNet) for aspect-level sentiment classification. The method can capture the importance of each context word when inferring the sentiment polarity of an aspect. This approach consists of multiple computational layers with shared parameters.
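The multi-hop idea behind MemNet can be sketched in a few lines of Python. This is a simplified sketch that assumes dot-product attention scores and a single shared linear map `W` applied to the previous hop's vector; MemNet itself scores memory slots with a small feed-forward network and also has location-aware variants, both omitted here. All vectors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def memnet_hops(memory, aspect, W, hops):
    """Repeatedly attend over the memory (context word vectors) with a query that is
    refined after every hop; W is shared across hops, mirroring tied layer parameters."""
    v = aspect
    weights = []
    for _ in range(hops):
        scores = [sum(m_d * v_d for m_d, v_d in zip(m, v)) for m in memory]
        weights = softmax(scores)
        read = [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(len(v))]
        # Next query = attention read-out + linear transform of the current query.
        v = [r + l for r, l in zip(read, matvec(W, v))]
    return v, weights


memory = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical context word vectors
aspect = [0.0, 0.0]                  # hypothetical aspect vector
identity = [[1.0, 0.0], [0.0, 1.0]]
v1, _ = memnet_hops(memory, aspect, identity, hops=1)
v2, w2 = memnet_hops(memory, aspect, identity, hops=2)
```

Each extra hop reuses the same parameters, so depth adds reasoning steps without adding weights.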

Attention Mechanism
Attention models have recently gained popularity in training neural networks and have been applied to various natural language processing tasks, including machine translation, sentence summarization, sentiment classification, and question answering. Attention was originally proposed for machine translation, to select the source-language words that are most relevant to each word being produced in the target language. Rather than using all available information equally, an attention mechanism focuses on the information most pertinent to a task. Following the success of attention networks in translation (Bahdanau, Cho, and Bengio 2015; Luong, Pham, and Manning 2015), many studies have designed attention-based networks for aspect-based sentiment analysis. In the ABSA task, an attention mechanism can capture the importance of each context word with respect to a target by modeling their semantic association, which allows such models to obtain competitive results.
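A minimal sketch of aspect-conditioned attention: each context vector is scored against the aspect vector, the scores are normalized with a softmax, and the context is pooled with the resulting weights. The dot-product scoring function and the 2-d toy vectors are assumptions chosen for illustration; published ABSA models typically use learned scoring layers.

```python
import math


def aspect_attention(hidden, aspect):
    """Score each context vector against the aspect vector, softmax-normalize, and pool."""
    scores = [sum(h_d * a_d for h_d, a_d in zip(h, aspect)) for h in hidden]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(aspect)
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled


# Toy 2-d vectors (hypothetical): words aligned with the aspect direction get more weight.
words = ["the", "pasta", "was", "delicious"]
hidden = [[0.1, 0.0], [1.0, 0.2], [0.0, 0.1], [0.9, 0.8]]
aspect = [1.0, 0.5]
weights, pooled = aspect_attention(hidden, aspect)
print(words[weights.index(max(weights))])  # the most attended word
```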
Yang et al. (2018) proposed a feature-enhanced attention network for target-dependent sentiment classification (FANS). They improve the attention model with a multi-view co-attention network (MCN) that learns better multi-view, sentiment-aware, and target-specific sentence representations by interactively modeling the context words, target words, and sentiment words. Xu et al. (2020) proposed a multi-attention network (MAN) that uses intra-level and inter-level attention mechanisms. At the intra level, it employs a transformer encoder instead of a sequence model to reduce training time, while the inter-level attention mechanism uses global and local attention modules to capture interactive information between aspect and context at different granularities. Similar to Zhang and Lu (2019), Zhang et al. (2020) used pre-trained BERT to model the data; their model is called the multiple interactive attention network (MIN). After the pre-trained encoding, they use a partial transformer to obtain hidden states in parallel. Park, Song, and Shin (2020) proposed a deep learning model for ABSA that combines location attention and content attention. They propose two models: one applies Holistic Recurrent content attention to Target-dependent memories from One-directional, one-layered networks (HRT_one), and the other uses Bi-directional, bi-layered networks (HRT_Bi). Both models employ target-dependent LSTMs to produce memories from an input sentence and GRU cells to integrate the sentence representations generated from different content attention weights over the memories.
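The transformer encoder used at MAN's intra level is built around scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below shows only that core operation in plain Python; it omits the learned query/key/value projections, multiple heads, and feed-forward sublayers of a full transformer encoder, so it illustrates the mechanism rather than MAN's actual encoder.

```python
import math


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row by row."""
    d_k = len(K[0])
    out, all_weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        all_weights.append(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, V)) for d in range(len(V[0]))])
    return out, all_weights


# Self-attention: queries, keys, and values all come from the same (hypothetical) sentence.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = scaled_dot_product_attention(X, X, X)
```

Every output row depends only on the inputs, not on earlier outputs, so all positions can be computed in parallel; this is the property that lets a transformer encoder train faster than a recurrent sequence model.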

Method Based on Capsule Network
Capsule networks were introduced to overcome the representational limitations of CNN and RNN models by extracting features in the form of vectors. Capsule networks were applied to sentiment analysis by Wang, Sun, and Han (2018), who introduced RNN-capsule, a capsule model based on a recurrent neural network (RNN). Each capsule can not only predict the probability of its assigned sentiment but also reconstruct the representation of the input instance. Compared with most existing neural network models for sentiment analysis, the RNN-capsule model does not rely heavily on the quality of the input instance representation, and it does not require any linguistic knowledge. The number of capsules N equals the number of sentiment categories to be modeled, with each capsule corresponding to one category. For example, three capsules are used to model three fine-grained sentiment categories: "positive," "neutral," and "negative." Building on this use of capsules for sentiment analysis, Wang et al. (2019) modified RNN-capsule to suit the ABSA task. Their aspect-level sentiment capsule model (AS-Capsule) performs aspect detection and sentiment classification simultaneously, in a joint manner. To address the lack of aspect-level labeled data, Chen and Qian (2019) proposed the Transfer Capsule Network (TransCap), which transfers document-level knowledge to aspect-level sentiment classification. Document-level labeled data such as reviews are easily obtained from online websites, whereas the publicly available ABSA datasets often contain a limited number of training examples.
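The decision rule described above (one capsule per sentiment category, each producing a probability and a reconstruction) can be sketched as follows. The sketch assumes the RNN hidden states are already computed and uses hypothetical per-capsule parameters (`w_att`, `w_prob`, `b_prob`); it follows the general description in this section rather than reproducing the exact published equations.

```python
import math

LABELS = ["positive", "neutral", "negative"]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def capsule(hidden, w_att, w_prob, b_prob):
    """One sentiment capsule: attention over hidden states, then a probability and a reconstruction."""
    scores = [dot(h, w_att) for h in hidden]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    # Capsule representation: attention-weighted sum of the hidden states.
    v = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(len(hidden[0]))]
    p = 1.0 / (1.0 + math.exp(-(dot(v, w_prob) + b_prob)))  # probability of this capsule's sentiment
    reconstruction = [p * x for x in v]                      # representation scaled by p
    return p, reconstruction


def classify(hidden, capsule_params):
    """N capsules, one per sentiment category; the most active capsule gives the prediction."""
    probs = [capsule(hidden, *params)[0] for params in capsule_params]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs


hidden = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical RNN hidden states
params = [
    ([1.0, 0.0], [2.0, 0.0], 0.0),   # "positive" capsule
    ([0.0, 0.0], [0.0, 0.0], 0.0),   # "neutral" capsule
    ([0.0, 1.0], [0.0, -2.0], 0.0),  # "negative" capsule
]
label, probs = classify(hidden, params)
```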
Du, Sun, and Wang (2019) proposed another modification of the capsule network that combines it with an attention mechanism. In their aspect-level sentiment model, the capsule network tackles overlapping features through feature clustering: the EM routing algorithm clusters features and constructs vector-based feature representations. In addition, an interactive attention mechanism is introduced into the capsule routing procedure to model the semantic relationship between aspect terms and context.

Method Based on Graph Neural Network
Graph Neural Networks (GNNs) were introduced by Sperduti and Starita (1997) and Gori, Monfardini, and Scarselli (2005) as a generalization of recursive neural networks that can directly handle a more general class of graphs, e.g., cyclic, directed, and undirected graphs. A GNN consists of an iterative process that propagates the node states until equilibrium, followed by a neural network that produces an output for each node based on its state. Jeon et al. (2019) proposed graph-based aspect and rating classification, which uses a multi-modal word co-occurrence network to solve aspect and sentiment classification. The framework builds a word co-occurrence network from a given corpus, treating a word as different nodes if its source documents are labeled with different aspect or sentiment categories. The model then computes a word-aspect dispersion score and a word-rating dispersion score from the network; these are concatenated and used as input to a feedforward neural network for aspect and rating classification. Another graph-based approach, proposed by Zhao, Hou, and Wu (2019), models Sentiment Dependencies with Graph Convolutional Networks (SDGCN). For every node in the graph, a GCN encodes relevant information about its neighborhood as a new feature representation vector. Each aspect is treated as a node, and an edge represents the sentiment dependency relation between two nodes; the model learns the sentiment dependencies between aspects over this graph.
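The neighborhood encoding performed by a GCN is commonly implemented with the symmetric-normalized propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The sketch below implements one such layer with plain nested lists; we assume this widely used normalization for illustration and do not claim that SDGCN uses exactly this form.

```python
import math


def gcn_layer(adj, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), using plain nested lists."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate normalized neighbor features.
    in_dim = len(H[0])
    agg = [[sum(norm[i][k] * H[k][d] for k in range(n)) for d in range(in_dim)] for i in range(n)]
    # Apply the shared weight matrix W (in_dim x out_dim), then ReLU.
    out_dim = len(W[0])
    return [[max(0.0, sum(agg[i][d] * W[d][o] for d in range(in_dim))) for o in range(out_dim)]
            for i in range(n)]


# Two connected nodes with one scalar feature each (hypothetical values).
adj = [[0, 1], [1, 0]]
H = [[1.0], [3.0]]
print(gcn_layer(adj, H, W=[[1.0]]))  # [[2.0], [2.0]]
```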
Graph Convolutional Networks (GCNs) often perform best with two layers; deeper GCNs bring no additional gain because of the over-smoothing problem, in which repeated neighborhood aggregation makes node representations increasingly similar. Hou, Huang, and Wang (2019) designed a selective attention-based GCN block (SA-GCN) to find the most important context words and aggregate this information directly into the aspect-term representation.
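Over-smoothing can be observed even without learned weights. In this sketch, which deliberately drops the weight matrices and nonlinearities of a real GCN, each layer replaces a node's feature with the mean over itself and its neighbors; stacking many such layers drives every node toward the same value, so the nodes become indistinguishable.

```python
def mean_propagate(adj, x):
    """One parameter-free propagation step: each node averages itself and its neighbors."""
    n = len(adj)
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or i == j]
        out.append(sum(x[j] for j in neigh) / len(neigh))
    return out


# Path graph 0 - 1 - 2 with one scalar feature per node (hypothetical values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [0.0, 0.0, 3.0]
spreads = []
for layer in range(30):
    x = mean_propagate(adj, x)
    spreads.append(max(x) - min(x))  # how distinguishable the nodes still are
print(spreads[0], spreads[-1])
```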
Despite their success in ABSA tasks, RNNs and CNNs have a shortcoming: they do not take full account of the overall text structure and the relations between words in a document. To overcome this, Chen (2019) proposed a neural network method (GCNSA) in which the text is treated as a graph and the aspect as a specific region of that graph. They performed the convolution operation on the text graph to obtain a full-text hidden state and introduced an extended structural attention model, implemented with an LSTM, to capture specific information. Unlike the previous methods, Huang et al. (2019) represented a sentence as a dependency graph instead of a word sequence. They proposed a target-dependent graph attention network (TD-GAT) that explicitly utilizes the dependency relations among words. In their experiments, they used two different pre-trained representations: GloVe and BERT.
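To illustrate what representing a sentence as a dependency graph means in practice, the sketch below converts a dependency parse, given as a list of head indices, into an adjacency matrix that graph-based models can consume. The parse is written by hand for illustration; a real pipeline would obtain it from a dependency parser. Treating edges as undirected and adding self-loops are assumptions here, since some models keep edge direction or relation labels.

```python
def dependency_adjacency(heads):
    """Undirected adjacency matrix (with self-loops) from dependency heads.
    heads[i] is the index of token i's head word, or -1 for the root."""
    n = len(heads)
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = adj[head][child] = 1
    return adj


# Hand-written (hypothetical) parse of "the food was great":
# "great" is the root, "food" is its subject, "was" its copula, "the" modifies "food".
tokens = ["the", "food", "was", "great"]
heads = [1, 3, 3, -1]
adj = dependency_adjacency(heads)
```

In the word sequence, the aspect "food" and the opinion word "great" are two positions apart, but in the dependency graph they are direct neighbors, so a single graph layer already connects them.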
The use of an attention mechanism in a graph convolutional network (GCN) was proposed by Hou, Huang, and Wang (2019). They designed a selective attention-based GCN block (SA-GCN) to find the most important context words and aggregate this information directly into the aspect-term representation by applying a GCN to the dependency tree. A similar dependency-tree method, the Aspect-specific Graph Convolutional Network (ASGCN), was proposed by Zhang, Li, and Song (2019). It starts with a Bi-LSTM layer to capture contextual information regarding word order; a multi-layer graph convolution structure is then applied on top of the LSTM output to obtain aspect-specific features. More recently, Xiao et al. (2020) proposed targeted sentiment classification based on attention encoding and graph convolutional networks (AEGCN). The model uses pre-trained BERT and is composed of a graph convolutional network, improved with multi-head self-attention, built over the dependency tree of a sentence.
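The final stage of ASGCN can be sketched as aspect-specific masking followed by retrieval-based attention: the last-layer GCN states of non-aspect words are zeroed, and each Bi-LSTM hidden state is scored by its summed dot product with the remaining aspect states. This sketch follows our reading of Zhang, Li, and Song (2019) and uses toy vectors; the function names and the half-open (start, end) aspect-span convention are assumptions.

```python
import math


def aspect_mask(gcn_out, aspect_span):
    """Zero out every hidden vector outside the aspect span [start, end)."""
    start, end = aspect_span
    dim = len(gcn_out[0])
    return [list(h) if start <= t < end else [0.0] * dim for t, h in enumerate(gcn_out)]


def retrieval_attention(context, masked):
    """Score each context state by its summed dot product with the masked aspect states,
    softmax the scores, and return the weights and the weighted context summary."""
    betas = [sum(sum(c * m for c, m in zip(h_t, h_i)) for h_i in masked) for h_t in context]
    mx = max(betas)
    exps = [math.exp(b - mx) for b in betas]
    z = sum(exps)
    alphas = [e / z for e in exps]
    summary = [sum(a * h[d] for a, h in zip(alphas, context)) for d in range(len(context[0]))]
    return alphas, summary


# Hypothetical Bi-LSTM states (context) and last-layer GCN states for a 3-word sentence
# whose aspect is the middle word.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gcn_out = [[5.0, 5.0], [1.0, 0.0], [5.0, 5.0]]
masked = aspect_mask(gcn_out, (1, 2))
alphas, summary = retrieval_attention(context, masked)
```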

Word Embeddings
Word embeddings are representations of the words in a document's vocabulary that allow words with similar meanings to have similar representations. They have contributed to the impressive performance of deep learning methods on challenging natural language processing problems. Word embeddings are a class of techniques in which individual words are represented as real-valued vectors in a predefined vector space. Each word is mapped to one vector, and the vector values are learned in a way that resembles a neural network, which is why the technique is often grouped with deep learning. In this paper, we briefly discuss several word embedding methods used for the ABSA task.
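"Similar meaning, similar representation" is usually measured with cosine similarity between word vectors. The sketch below uses three hand-made 3-d vectors; the values are hypothetical and only meant to show how nearest neighbors are found in an embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy 3-d embeddings (hypothetical values chosen for illustration only).
embeddings = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.2, 0.9],
}


def nearest(word):
    """Return the other vocabulary word whose vector is most similar to `word`'s."""
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))


print(nearest("good"))
```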

Word2vec
Word2vec (Mikolov et al. 2013) is a statistical method for natural language processing that uses a neural network model to learn word associations from a large corpus of text. Word2vec represents each distinct word with a particular list of numbers called a vector. The word2vec approach introduced two different models for learning word embeddings: Continuous Bag-of-Words (CBOW), which predicts a word from its surrounding context, and Skip-Gram, which predicts the surrounding context words from a given word. In reviewing papers related to ABSA, we found several models that use word2vec; Table 7 presents several proposed models that use word2vec as their word embedding.
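The difference between CBOW and Skip-Gram is easiest to see in the training examples each one generates from a context window. The sketch below only builds those examples; the neural network training itself, along with the negative sampling and subsampling used in practice, is omitted, and the window size of 1 is chosen for brevity.

```python
def skipgram_pairs(tokens, window):
    """(center, context) pairs: Skip-Gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """(context words, center) examples: CBOW predicts the center word from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


tokens = ["the", "food", "was", "great"]
print(skipgram_pairs(tokens, window=1))
# [('the', 'food'), ('food', 'the'), ('food', 'was'), ('was', 'food'), ('was', 'great'), ('great', 'was')]
print(cbow_examples(tokens, window=1)[1])
# (['the', 'was'], 'food')
```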

GloVe
GloVe (Global Vectors for Word Representation), developed by Pennington, Socher, and Manning (2014), is an alternative to the word2vec method for efficiently learning word vectors. It is based on matrix factorization techniques applied to the word-context co-occurrence matrix. GloVe maps words into a meaningful space in which the distance between words is related to their semantic similarity. Many researchers in the NLP field use GloVe as their word embedding, especially in the ABSA task. Table 8 describes several ABSA studies that use GloVe as their word embedding.
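GloVe's training objective is a weighted least-squares fit over the nonzero co-occurrence counts X_ij, J = sum over (i, j) of f(X_ij)(w_i · w~_j + b_i + b~_j − log X_ij)^2, where w and w~ are word and context vectors and b, b~ are biases. The sketch below computes the weighting function f and a single pair's loss term, using x_max = 100 and alpha = 3/4 as in Pennington, Socher, and Manning (2014); building the co-occurrence matrix and the optimization loop are omitted.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(X_ij): down-weights rare co-occurrences and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_pair_loss(w_i, w_tilde_j, b_i, b_tilde_j, x_ij):
    """f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2 for one nonzero co-occurrence count."""
    inner = sum(a * b for a, b in zip(w_i, w_tilde_j)) + b_i + b_tilde_j - math.log(x_ij)
    return glove_weight(x_ij) * inner ** 2


print(glove_weight(50.0), glove_weight(500.0))
print(glove_pair_loss([1.0, 0.0], [1.0, 0.0], 0.0, 0.0, 1.0))
```

The cap at x_max keeps very frequent pairs such as function-word co-occurrences from dominating the objective.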

BERT
BERT (Bidirectional Encoder Representations from Transformers) is a language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks (Kenton, Kristina, and Devlin 2019). Because of these advantages, several studies have applied it to ABSA tasks. We summarize some of these studies in Table 9.
Compared with GloVe, BERT has so far been used relatively rarely as a pre-trained embedding in the ABSA field, so there is still considerable room to develop BERT-based solutions to ABSA problems.
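One common way to give BERT the aspect in ABSA is to cast the task as sentence-pair classification: the review sentence is the first segment, the aspect is the second, and a classifier reads the final [CLS] representation. The sketch below only builds the input sequence and segment ids. This setup is an assumption for illustration, not necessarily the configuration of the studies in Table 9, and whitespace splitting stands in for BERT's WordPiece tokenizer; the helper name `build_sentence_pair_input` is hypothetical.

```python
def build_sentence_pair_input(sentence, aspect):
    """Build [CLS] sentence [SEP] aspect [SEP] with segment (token type) ids.
    Whitespace splitting is a stand-in for BERT's WordPiece tokenization."""
    sent_tokens = sentence.lower().split()
    aspect_tokens = aspect.lower().split()
    tokens = ["[CLS]"] + sent_tokens + ["[SEP]"] + aspect_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the sentence, and the first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(sent_tokens) + 2) + [1] * (len(aspect_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_sentence_pair_input("The battery life is great", "battery life")
print(tokens)
print(segment_ids)
```

In a real pipeline a BERT tokenizer produces these special tokens and segment ids itself when given a text pair.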

Dataset for ABSA
For the dataset section, we use the datasets most widely used for ABSA tasks, namely the SemEval 2014 Task 4 dataset and the Twitter dataset. We chose the SemEval 2014 Task 4 dataset because it was used in a competition on building ABSA models that was divided into several subtasks: aspect term extraction, aspect term polarity, aspect category detection, and aspect category polarity. The dataset covers two domains, restaurants and laptops, and contains reviews of these products. It is the most widely used dataset for modeling ABSA problems.
Apart from the SemEval 2014 Task 4 dataset, the second most widely used dataset is the Twitter dataset. Unlike the SemEval dataset, the Twitter dataset consists of tweets written by users, so it is not restricted to a particular domain.

SemEval 2014 Dataset
The SemEval 2014 dataset was introduced by Pontiki and Pavlopoulos (2014) and contains manually annotated reviews of restaurants and laptops. Here we describe the dataset in detail: Table 10 gives the size of the dataset in sentences, and Table 11 gives the aspect terms and their polarities per domain.
The SemEval 2014 dataset, with its laptop and restaurant domains, has become popular among researchers working on ABSA. We collected the results reported for the methods discussed above and plotted them in two charts, one per evaluation metric: Figure 8 shows accuracy and Figure 9 shows F1.
As the figures show, AS-LSTM-CNN achieves the highest accuracy in both the laptop and restaurant domains, which suggests that combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) is particularly effective. Meanwhile, MAN-BERT achieves the highest F1, which suggests that pre-trained BERT can outperform other representations. These results indicate that further combination methods and pre-trained models are promising routes to better performance.
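Counts like those in Table 11 can be reproduced by reading the annotated aspect terms. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes the XML layout in which the SemEval 2014 Task 4 files are commonly distributed: `sentence` elements holding a `text` child and optional `aspectTerm` elements with `term`, `polarity`, `from`, and `to` attributes. The two-sentence sample is invented for illustration, and the official files should be checked before relying on this layout.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample in the assumed SemEval 2014 Task 4 layout.
SAMPLE = """<sentences>
  <sentence id="1">
    <text>The food was great but the service was slow.</text>
    <aspectTerms>
      <aspectTerm term="food" polarity="positive" from="4" to="8"/>
      <aspectTerm term="service" polarity="negative" from="27" to="34"/>
    </aspectTerms>
  </sentence>
  <sentence id="2">
    <text>We went there on a Friday.</text>
  </sentence>
</sentences>"""


def load_aspect_terms(xml_text):
    """Return (sentence, aspect term, polarity) triples; sentences without aspect terms add nothing."""
    root = ET.fromstring(xml_text)
    triples = []
    for sentence in root.iter("sentence"):
        text = sentence.find("text").text
        for term in sentence.iter("aspectTerm"):
            start, end = int(term.get("from")), int(term.get("to"))
            # Character offsets index into the raw sentence text.
            if text[start:end] != term.get("term"):
                raise ValueError(f"offset mismatch for {term.get('term')!r}")
            triples.append((text, term.get("term"), term.get("polarity")))
    return triples


triples = load_aspect_terms(SAMPLE)
polarity_counts = Counter(polarity for _, _, polarity in triples)
print(polarity_counts)
```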

Twitter Dataset
Besides product reviews, Twitter datasets have also received much attention for the ABSA task. Tan et al. (2014) manually annotated Twitter posts collected by querying the Twitter API with keywords such as "bill gates," "taylor swift," "xbox," "windows 7," and "google." Figure 10 shows the accuracy and F1 performance of various methods on the Twitter dataset. Among those methods, the Gated Recurrent Unit gets the best performance. In terms of accuracy, MAN performs best on the Twitter dataset, whereas MAN-BERT achieves the highest F1 score. This again suggests that the pre-trained BERT model can have a notable impact.

Classification Performance Evaluation Metrics
To measure the performance of classification methods, most researchers use standard evaluation metrics (Yang 1999). Four standard metrics are used in most ABSA tasks; in this section, we describe these four evaluation metrics.

Precision
Precision is defined as the number of true positives divided by the number of true positives plus false positives:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall
Recall is defined as the number of true positives divided by the number of true positives plus false negatives:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

F1-Score
The F1-score (also called the F-score or F-measure) is a measure of a test's accuracy that considers both the precision p and the recall r. It is the harmonic mean of p and r:
$$F_1 = \frac{2pr}{p + r}$$

Accuracy
Accuracy measures how often the classifier makes the correct prediction. It is the ratio between the number of correct predictions and the total number of predictions (the number of data points in the test set):
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Here TP stands for the number of true positives, TN for the number of true negatives, FP for the number of false positives, and FN for the number of false negatives.
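These metrics can be computed directly from gold and predicted labels. For three-way polarity classification, each polarity is treated in turn as the positive class, and many ABSA papers report the macro-averaged F1 over the classes; the sketch below follows that convention. Defining precision, recall, or F1 as 0 when the denominator is 0 is an assumption, since libraries differ in how they handle that case.

```python
def classification_report(y_true, y_pred):
    """Per-class precision, recall, and F1, plus accuracy and macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    macro_f1 = sum(f1 for _, _, f1 in per_class.values()) / len(labels)
    return per_class, accuracy, macro_f1


y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
per_class, accuracy, macro_f1 = classification_report(y_true, y_pred)
print(accuracy, round(macro_f1, 4))  # 0.75 0.6
```

Here accuracy (0.75) looks better than macro-F1 (0.6) because the single neutral example is never predicted correctly; macro-averaging exposes such weak minority classes.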

Conclusion
In this survey, we described aspect-based sentiment analysis (ABSA). First, we described the ABSA task and its three subtasks, namely aspect term extraction, aspect term categorization, and aspect term sentiment analysis, and presented several models for each subtask. Second, we described the deep learning methods used to solve ABSA tasks. Finally, we described two popular datasets used in the ABSA task. Our survey shows that ABSA remains an interesting and broad research problem: many methods can be applied to it, yet effective methods are still needed. Although many deep learning models have been used for ABSA, building an efficient model with high accuracy remains very challenging. In particular, graph-based and capsule-based models are still relatively underexplored and can be considered as a basis for future models.
We also observed that the choice of word embedding strongly influences accuracy: the same model can reach different accuracy levels depending on whether it uses GloVe or BERT. The choice of word embedding is therefore very important, and BERT in particular merits further development as a pre-trained embedding. Across the two datasets, we also found that the best-performing method differs from dataset to dataset, so it remains challenging to find a method that works well across several datasets.