Abstract
Aspect-based sentiment analysis (ABSA) aims to mine the sentiment tendencies expressed toward specific aspect terms. Studies of ABSA mainly focus on attention-based approaches and graph neural network approaches built on dependency trees. However, attention-based methods usually have difficulty capturing long-distance syntactic dependencies. Additionally, existing approaches using graph neural networks have not sufficiently exploited the syntactic dependencies between aspects and opinions. In this paper, we propose a novel Syntactic Dependency Graph Convolutional Network (SD-GCN) model for ABSA. We employ biaffine attention to model the syntactic dependencies of sentences and build syntactic dependency graphs from aspects and emotional words. This allows our SD-GCN to learn both the semantic relationships of aspects and the overall semantic meaning. Based on these graphs, long-distance syntactic dependency relationships are captured by GCNs, which enables SD-GCN to capture the syntactic dependencies between aspects and opinions more comprehensively and consequently yields enhanced aspect features. We conduct extensive experiments on four aspect-level sentiment datasets. The experimental results show that our SD-GCN outperforms other methodologies. Moreover, ablation experiments and attention visualizations further substantiate the effectiveness of SD-GCN.
1 Introduction
With the rapid proliferation of electronic devices, an enormous quantity of comments on specific aspects has emerged on the Internet. To mine specific opinion information from these comments, aspect-based sentiment analysis (ABSA) has attracted considerable interest [1]. ABSA has two main subtasks, Aspect Sentiment Classification (ASC) [2, 3] and Aspect Term Extraction (ATE) [4, 5]. In this study, we focus on ASC, which attempts to infer the emotional tendency expressed toward particular aspects of a sentence. For instance, in "The roasted pancake on this snack street is truly captivating, but the environment here is undeniably a bit unpleasant.", the aspect terms "roasted pancake" and "environment" express two contrasting sentiments. Previous sentence-level analyses can only yield a single overall sentiment, while ABSA can accurately identify the praise for "roasted pancake" and the complaint about "environment". Therefore, ABSA exhibits greater efficacy in identifying the emotional polarity of particular aspects in sentences [6].
For ABSA, early studies train classifiers with manual rules and handcrafted features. Vo et al. [7] integrate distributed word representations and sentiment lexicon information for context representation, and use neural pooling functions for feature extraction. However, because manual feature extraction is tedious, the performance of such methods quickly reaches a bottleneck. To this end, some studies employ neural networks to avoid manual rule design and automatically acquire contextual representations. Dong et al. [8] introduce an adaptive recursive neural network to classify Twitter sentiment in a target-dependent way. Moreover, Tang et al. [9] consider the relations among context words and aspects with Long Short-Term Memory (LSTM), and describe how different attention mechanisms can further improve model performance. Since then, many studies have used attention mechanisms to model the relationships between aspects and opinion words [10, 11].
Despite the effectiveness achieved by attention-based approaches, they still have limitations in dealing with syntactic dependency relationships between aspects and opinions. As the sentence in Fig. 1 shows, "should be" is a verb phrase that takes the adjective "more" as a modifier before "friendly". Under this modifying context, "friendly" does not convey its literal meaning. Thus, sentence structure and syntactic dependency relationships are crucial in determining positional and modifier relationships, and ultimately influence the semantic and emotional tendencies of the terms. If a model does not take full advantage of these syntactic dependencies, errors may arise in the attention computation. Moreover, attention weights tend to become sparsely distributed in the face of long-distance dependencies. All of these issues can affect the identification of sentiment polarity.
Considering that simple attention-based methods are unable to make full use of the dependencies within sentences, some researchers utilize syntactic dependency trees to obtain richer structural and syntactic information. Figure 1 shows a syntactic dependency tree obtained through dependency parsing, where words such as “staff”, “should”, etc. are represented as nodes, and dependencies between words are expressed by directed edges.
To utilize syntactic information within the syntactic dependency tree, Nguyen et al. [12] encode the nodes of the entire dependency tree bottom-up through an RNN to extract aspect and sentiment features in sentences. However, they map the embeddings by averaging word vectors with the same weights for different nodes, which makes it difficult to obtain more complex semantics. Therefore, He et al. [13] compute distances between internal nodes to decay the attention weights, assigning different weights to different nodes, and fuse context information with aspect terms using a bidirectional LSTM (Bi-LSTM). But since dependency trees are a type of graph-structured data, graph neural networks (GNNs) can better exploit them to capture semantic knowledge. Thus, most recent work on dependency trees is related to GNNs [14, 15]. In this line of work, Zhang et al. [16] leverage syntactic dependencies and aspect-related emotional knowledge through graph convolution operations carried out on dependency trees. Sun et al. [17] enhance embeddings using a graph convolutional network (GCN), which operates directly on phrase dependency structures, after learning sentence feature representations with a Bi-LSTM. However, these works treat all neighboring nodes in the graph equally and lack an effective mechanism to differentiate the importance of neighboring nodes. In other words, they only consider knowledge from immediate neighbors and overlook associations among long-distance nodes. Within this context, such models may have difficulty capturing all syntactic dependencies among words.
In this paper, we propose a novel Syntactic Dependency Graph Convolutional Network (SD-GCN) model for the ABSA task, which models syntactic dependencies to overcome the aforementioned problems. SD-GCN employs dependency parsing to model syntactic dependencies and build syntactic dependency graphs, which exploits the full potential of syntactic dependency relations. The structured representation obtained through syntactic dependency modeling helps to narrow the gap between aspects and opinions, making it easier for SD-GCN to capture their long-distance syntactic dependencies. Specifically, we obtain the semantic information and contextual word representations of the sentence with a BERT encoder. Next, biaffine attention is used to build the syntactic dependency graphs among words. Then, we utilize a GCN to process these syntactic dependencies and yield enhanced aspect features for aspect sentiment prediction.
The main contributions are as follows:
- (1) To effectively exploit sentence-word interactions in ABSA, we propose SD-GCN to model syntactic dependencies and enhance aspect sentiment features.
- (2) We employ biaffine attention to model syntactic dependencies between aspects and opinions, constructing syntactic dependency graphs. This enables the model to learn both the semantic relationships of aspects and the overall semantic meaning of a sentence.
- (3) We use a GCN to handle syntactic dependencies, which improves the capability to capture long-distance syntactic dependencies and accurately capture the structural information and semantic knowledge of sentences. This allows effective integration and learning of features related to aspects and opinions, consequently enhancing the aspect features.
- (4) The results of experiments carried out on four datasets prove the effectiveness of our SD-GCN.
The rest of the paper is organized as follows: Sect. 2 introduces relevant research on ABSA. Section 3 describes our proposed SD-GCN. Section 4 presents the experiments together with an analysis of the experimental results. Finally, we conclude in Sect. 5.
2 Related Work
Natural language processing encompasses various tasks, and text classification (TC) has been one of the prominent research areas [18, 19]. ABSA is a fine-grained form of TC. Unlike sentence-level analysis, which considers the sentiment tendency of a text as a whole [20, 21], ABSA performs sentiment analysis on the different aspect terms of a sentence. Previous ABSA studies rely heavily on handcrafted features and manually defined language rules to determine the affective polarity of particular aspects. Kiritchenko et al. [22] employed a supervised learning approach to examine aspects and categories in customer reviews for identifying emotional tendencies regarding aspect terms. Jiang et al. [23] incorporated syntactic features and context information for training classifiers.
Neural networks have evolved significantly, leading to their widespread application in various domains [24, 25], especially in ABSA, where their application has brought significant improvements [26]. Lakkaraju et al. [27] presented a hierarchical deep learning framework that used RNNs for modeling aspects and opinions. Wang et al. [28] combined Conditional Random Fields (CRF) and RNNs in a joint model for aspect-level affective judgments.
Recently, research on neural networks has mainly focused on attention-based neural networks. Wu et al. [10] incorporated a special residual mechanism into the convolutional neural network (CNN), which could alleviate the loss of raw information in attention mechanisms. LSTM aims to tackle the limitations of traditional RNNs in capturing and remembering long-term dependencies in sequential data. Wang et al. [29] presented an LSTM incorporating a variant of the attention mechanism that connects aspect vectors to sentence hidden representations to calculate attention weights. Huang et al. [11] suggested an attention-over-attention method that directly captures the interplay between aspect terms and contextual phrases. Song et al. [30] proposed an attentional encoder network with multiple attention mechanisms to model the hidden states and contextual interactions of targets. Wang et al. [31] introduced Multi-Attention Method Networks (MAMN), which use a pre-trained approach to build word embedding vectors and apply various attention mechanisms at internal and external levels; finally, a feature-focused attention mechanism is used to enhance sentiment identification. Ayetiran [32] proposed a CNN-BiLSTM approach with fused attention, which extracts high-level symbolic features and contextual representations of text by jointly learning additional document-level sentiment statistics. However, the aforementioned attention-based models overlook the influence of sentence structure and syntactic dependency information. Without this additional semantic information, they may produce erroneous results when identifying the sentiment orientation toward specific aspects.
An emerging trend is to utilize dependency trees, because syntactic information can establish links between aspects and their corresponding opinions. Among studies that utilize dependency trees, GCNs based on dependency trees have achieved promising results in ABSA. Compared with CNNs, GCNs [33] can better process graph-structured data and have been employed for various language tasks, such as semantic role labeling [34], machine translation [35], and relation extraction [36]. Zhang et al. [16] first used a GCN for ABSA; they obtained aspect features by employing multilayer graph convolution on the dependency tree of a sentence and applying an aspect masking layer. Sun et al. [17] presented a methodology for convolving over dependency trees, which employed a Bi-LSTM to learn feature representations and utilized graph convolution to handle the dependency trees of sentences. Liang et al. [37] extracted aspect sentiment from sentences by constructing graphs on dependency trees and integrating sentiment knowledge from SenticNet with contextual representations learned by an LSTM. Tian et al. [38] constructed graphs from dependency trees and applied attention mechanisms to weight the edges in the graph.
Apart from approaches that utilize GCNs, some methods utilize Graph Attention Networks (GAT) to process dependency trees. Wang et al. [39] extended GAT to create relational graph attention networks (R-GAT), and employed R-GAT to reshape and prune regular dependency syntax trees into a new aspect-oriented dependency tree structure. Ke et al. [40] combined syntactic dependencies with graph attention, encoding dependency paths to obtain aspect-oriented syntactic representations. They also redesigned the attention layer and used layered attention to weight and aggregate contextual terms. However, the aforementioned studies did not fully utilize syntactic dependencies, nor did they explore how to model syntactic relationships more effectively.
The aforementioned existing works can be broadly categorized into the following three groups:
- (1) Attention-based approaches (e.g., Wang et al. [29], Song et al. [30]) have the advantage of introducing fine-grained attention that allows the model to focus on crucial aspects. Their disadvantage is the difficulty of exploiting syntactic dependencies.
- (2) Syntactic dependency-based approaches (e.g., Zhang et al. [16], Liang et al. [37]) capture textual structure through syntactic dependencies, which allows a better understanding of the relationships among words. But they disregard the varying importance of different neighboring nodes, and capturing syntactic dependencies more adequately remains a challenge.
- (3) Methods that combine syntactic dependencies with attention mechanisms (e.g., Wang et al. [39], Ke et al. [40]) combine the advantages of fine-grained focus with syntactic dependency. However, the integration of multiple mechanisms leads to computational complexity.
3 Proposed Methodology
The workflow of SD-GCN is shown in Fig. 2. SD-GCN takes sentence-aspect pairs as inputs. The BERT encoder is first utilized to obtain rich contextual information. Next, we perform dimensionality reduction on the word vectors obtained from BERT. Following this, we employ the biaffine attention module to model the relationships among word pairs in sentences, which can learn both the semantic relationships of aspects and the overall semantic meaning of the sentences. Through this modeling, we obtain syntactic dependency graphs that contain rich word-pair relations. Subsequently, we utilize GCNs to process the syntactic dependency graphs. Aspects and opinions are aggregated by the GCNs, which capture the dependency relationships among words and integrate the contextual information of the entire graph. Thus, SD-GCN can comprehensively understand the sentiment expression of individual words and effectively extract aspectual features. Finally, the obtained aspectual features are aggregated to predict the sentiment tendencies of particular aspect terms at the output layer.
3.1 Problem Definition
Given a set \(X = (S,A)\) consisting of a sentence and its aspects, where \(S = \{ w_1 ,w_2 , \ldots ,w_n \}\) denotes a sentence and \(A = \{ a_1 ,a_2 , \ldots ,a_m \}\) denotes the specific aspect terms in the sentence \(S\), the target of ABSA is to predict the sentiment polarity \(Y \in \{ neutral,negative,positive\}\) of specific aspects in the sentence through \(S\) and \(A\).
3.2 Contextual Representation
To obtain vector representations of sentences with aspect terms, we employ BERT-base-uncased as the sentence encoder to obtain hidden contextual representations. BERT is a language model built on the Transformer architecture [41]. Different from traditional language models, BERT uses bidirectional encoders to learn context-dependent word embeddings, allowing it to better capture the relationships and syntactic information among words. Here, BERT is made up of numerous stacked bidirectional Transformer encoder layers. It uses residual connections and layer normalization to alleviate the vanishing-gradient problem and solve the long-term dependency problem of words, which enables it to capture bidirectional relations in sentences.
We expand the original input by constructing it as sentence-aspect pairs, which is a more suitable input form for ABSA. To match the input format of BERT, we concatenate the original text sequence with the aspects, separated by special tokens. The input of SD-GCN can be defined as:

\(X_{input} = \{ [CLS],S,[SEP],A,[SEP]\}\)
where \([CLS]\) and \([SEP]\) are special tokens used to mark the beginning and the boundaries of the input, respectively. \(S\) represents a given sentence, and \(A\) represents its aspects. The outputs of the encoding layer are \(V = \{ v_1 ,v_2 ,...,v_n \} ,v_i \in {\mathbb{R}}^{d_{bert} }\), the hidden representation sequence output by the last Transformer layer.
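For illustration, a minimal sketch of this encoding step is given below. It assumes the HuggingFace transformers library and PyTorch, neither of which is named for this step in the paper, so the API calls and variable names are illustrative rather than the authors' released code.

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")

    sentence = "The staff should be a bit more friendly."
    aspect = "staff"

    # Passing a text pair builds the [CLS] S [SEP] A [SEP] input automatically.
    inputs = tokenizer(sentence, aspect, return_tensors="pt")
    with torch.no_grad():
        outputs = encoder(**inputs)

    V = outputs.last_hidden_state  # (1, n, d_bert): the hidden sequence {v_1, ..., v_n}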
3.3 Syntactic Dependency Modeling
We employ biaffine attention to model the dependency relationships between aspects and opinions. Biaffine attention is a graph-based neural dependency parsing technique specifically designed for dependency parsing [42], and it has been proven to achieve good performance on named entity recognition tasks [43]. Biaffine attention applies different dimensionality reductions to the hidden vector of each word, resulting in two new vectors, which serve as the dependent-arc and dependent-head features of the word, respectively. A biaffine transformation is then applied to score the feature vectors of each dependent arc and head, producing an arc score matrix and a label score matrix. With this approach, the optimal parse tree can be selected directly from all possible arcs and labels.
In our study, we utilize biaffine attention to model word-pair relationships in sentences, so as to capture the syntactic dependencies between aspects and opinions. The modeling process can be formulated as:

\(v_i^{head} = {\text{MLP}}_0 (v_i ),\;\;v_j^{dep} = {\text{MLP}}_1 (v_j )\)
where \(v_i\) and \(v_j\) are the hidden representations output by the BERT encoder, and \({\text{MLP}}\) indicates a multi-layer perceptron: \({\text{MLP}}_0\) processes the output vectors of head words and \({\text{MLP}}_1\) processes the output vectors of dependent words. Our purpose is to perform dimensionality reduction and a nonlinear transformation on each output vector, removing information irrelevant to the current decision before applying the biaffine transformation, which helps to improve parsing speed and reduce the risk of overfitting. Next, the biaffine attention score is calculated with the following equations:

\(Q_{i,j} = (v_i^{head} )^T U_1 v_j^{dep} + (v_i^{head} \oplus v_j^{dep} )^T U_2 + b^{{\text{bam}}}\)

\(r_{i,j,t} = \frac{\exp (Q_{i,j,t} )}{\sum_{t^{\prime} = 1}^m \exp (Q_{i,j,t^{\prime}} )}\)
where \(U_1\) denotes a tensor of size \(d \times m \times d\) and \(U_2\) denotes a matrix of size \(2d \times m\); both are learnable weights. \(b^{{\text{bam}}}\) is a bias term, and \(\oplus\) denotes concatenation. By modeling the relationship between \(v_i^{head}\) and \(v_j^{dep}\), the score tensor \(Q_{i,j} \in {\mathbb{R}}^{n \times n \times m}\) can be obtained, and \(r_{i,j,t}\) indicates the probability of the t-th relationship type for the word pair \((v_i ,v_j )\), where \(m\) denotes the number of relationship types and is a hyper-parameter. Besides, the whole procedure from Eq. (3) to (6) can be integrated into the following formula:

\(R = {\text{Biaffine}}({\text{MLP}}_0 (V),{\text{MLP}}_1 (V))\)
where \(R \in {\mathbb{R}}^{n \times n \times m}\) denotes the probability distribution graphs obtained by modeling syntactic dependency relationships of words. The procedure of modeling syntactic dependencies for words is described in Algorithm 1.
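To make the scoring concrete, the following PyTorch sketch implements a biaffine attention module of the form given above. The class and attribute names (BiaffineAttention, mlp_head, mlp_dep) and the initialization scheme are our illustrative assumptions, not the paper's released implementation.

    import torch
    import torch.nn as nn

    class BiaffineAttention(nn.Module):
        # Scores every word pair (v_i, v_j) over m relation types,
        # yielding R in R^{n x n x m} as in Sect. 3.3.
        def __init__(self, d_bert, d, m):
            super().__init__()
            self.mlp_head = nn.Sequential(nn.Linear(d_bert, d), nn.ReLU())  # MLP_0
            self.mlp_dep = nn.Sequential(nn.Linear(d_bert, d), nn.ReLU())   # MLP_1
            self.U1 = nn.Parameter(torch.randn(d, m, d) * 0.01)   # bilinear weight, d x m x d
            self.U2 = nn.Parameter(torch.randn(2 * d, m) * 0.01)  # linear weight, 2d x m
            self.b = nn.Parameter(torch.zeros(m))                 # bias b^bam

        def forward(self, V):                       # V: (n, d_bert)
            head = self.mlp_head(V)                 # v^head: (n, d)
            dep = self.mlp_dep(V)                   # v^dep:  (n, d)
            # Bilinear term (v_i^head)^T U1 v_j^dep for all pairs -> (n, n, m)
            bilinear = torch.einsum("id,dme,je->ijm", head, self.U1, dep)
            # Linear term on the concatenated pair features (v_i^head, v_j^dep)
            n = V.size(0)
            pair = torch.cat([head.unsqueeze(1).expand(n, n, -1),
                              dep.unsqueeze(0).expand(n, n, -1)], dim=-1)
            Q = bilinear + pair @ self.U2 + self.b  # logits tensor Q_{i,j}
            return torch.softmax(Q, dim=-1)         # r_{i,j,t}: distribution over m types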
3.4 Graph Convolutional Operation
To obtain the dependencies between aspects and opinions, we utilize a GCN to capture the probability distribution relationships of the syntactic dependency graphs. The GCN is a model inspired by the CNN and is designed to process graph-structured data. A graph consists of edges and nodes. For each node, the GCN performs a convolution operation over its neighbors to capture the topological relationships and obtain a discriminative representation of the node.
For a graph \(G\) with \(k\) nodes, we can enumerate the graph \(G\) to acquire its adjacency matrix, expressed as \(A \in {\mathbb{R}}^{k \times k}\). Figure 3 illustrates an example of a GCN performing graph convolution operations. To facilitate the explanation, we define the state of node \(i\) in the l-th layer during graph convolution as \(h_i^l\), where \(l \in [1,2,...,L]\). Here, \(h_i^0\) represents the initial state of node \(i\), and \(h_i^L\) represents its final state. Accordingly, the graph convolution operation is mathematically represented as:

\(h_i^{l + 1} = \sigma \left( \sum_{j = 1}^k A_{ij} W^l h_j^l + b^l \right)\)
where \(\sigma\) is a non-linear activation function, \(W^l\) denotes the weight matrix, and \(b^l\) is a bias term.
For any given sentence, modeling the syntactic dependencies yields the probability distribution graphs, each of which acts as an \(n \times n\) adjacency matrix \(r \in R\). We perform graph convolution operations on each node in the l-th layer of the t-th channel to update its state representation. By aggregating the information of neighboring nodes, an enhanced node representation with aspectual sentiment features can be obtained. The update process follows:

\(h_{i,t}^{l + 1} = \sigma \left( \sum_{j = 1}^n r_{i,j,t} W_t^l h_{j,t}^l + b_t^l \right)\)
where \(h_{j,t}^l\) is the state representation of node \(j\) in layer \(l\), and \(h_{i,t}^{l + 1}\) is the updated output of node \(i\). \(\sigma\) is a nonlinear activation function (e.g., ReLU), \(W_t^l\) is a linear transformation weight, and \(b_t^l\) is a bias term. All of these parameters belong to the t-th channel.
The ultimate output of layer \(l\) on the t-th channel is as follows:

\(H_t^l = \{ h_{1,t}^l ,h_{2,t}^l , \ldots ,h_{n,t}^l \}\)
After \(l\) layers of graph convolution, the GCN obtains the final feature representation. Since we have a total of \(m\) channels, there are \(m\) such feature representations.
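As a concrete illustration of the channel-wise update above, the sketch below implements one multi-channel GCN layer in PyTorch. The class name and the (m, n, d) tensor layout are our illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiChannelGCNLayer(nn.Module):
        # One graph-convolution layer applied independently on each of the
        # m relation channels, using r_{:,:,t} as a soft adjacency matrix.
        def __init__(self, d, m):
            super().__init__()
            self.linears = nn.ModuleList([nn.Linear(d, d) for _ in range(m)])  # W_t^l, b_t^l

        def forward(self, H, R):
            # H: (m, n, d) node states per channel; R: (n, n, m) from biaffine attention
            outputs = []
            for t, linear in enumerate(self.linears):
                A_t = R[:, :, t]                            # channel-t adjacency r_{i,j,t}
                outputs.append(F.relu(A_t @ linear(H[t])))  # aggregate neighbors, then ReLU
            return torch.stack(outputs)                     # updated states, (m, n, d)

Stacking two such layers corresponds to the best-performing depth reported in Sect. 4.6.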
3.5 Sentiment Classification
We utilize an average pooling function to integrate the enhanced features generated by the GCN. We then obtain the aspect vector information by masking out the representations of non-aspect words, yielding the feature representation of the specific aspects. This process follows:

\(h_{a,t}^l = f(h_{a_1 ,t}^l ,h_{a_2 ,t}^l , \ldots ,h_{a_p ,t}^l )\)
where \(h_{a_p ,t}^l\) are the aspect vectors and \(f( \cdot )\) is an average pooling function, which enhances the aspect term representation of the outputs. Then, we utilize average pooling again to aggregate the feature representations over the \(m\) channels to obtain the ultimate aspect sentiment feature representation:

\(h_a = f(h_{a,1}^l ,h_{a,2}^l , \ldots ,h_{a,m}^l )\)
Finally, we employ the softmax function to classify aspect-level emotional polarity. The output probability is calculated as follows:

\(y = {\text{softmax}}(W_{asc} h_a + b_{asc} )\)

where \(W_{asc}\) and \(b_{asc}\) are the learnable weight matrix and bias term.
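A minimal sketch of this masking, pooling, and softmax pipeline follows; the function signature and the way aspect positions are passed in are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def classify_aspect(H, aspect_idx, W_asc, b_asc):
        # H: (m, n, d) final GCN node states; aspect_idx: positions of the aspect words.
        h_aspect = H[:, aspect_idx, :]     # mask: keep only aspect-word nodes
        h_a = h_aspect.mean(dim=1)         # f(.): average pool over aspect words -> (m, d)
        h_a = h_a.mean(dim=0)              # average pool over the m channels    -> (d,)
        return F.softmax(h_a @ W_asc + b_asc, dim=-1)  # polarity distribution y

    # Example: three polarity classes with d = 256, m = 4 channels, n = 10 words
    H = torch.randn(4, 10, 256)
    W_asc, b_asc = torch.randn(256, 3), torch.zeros(3)
    y = classify_aspect(H, aspect_idx=[2, 3], W_asc=W_asc, b_asc=b_asc)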
4 Experiments
To verify the usefulness of our SD-GCN for aspect sentiment classification, we carry out extensive experiments on multiple ABSA datasets. The experiments and result analysis are detailed as follows.
4.1 Datasets
Four ABSA datasets are used for aspect sentiment prediction, and details of these datasets are shown in Table 1.
These datasets include a Twitter social media dataset [8], restaurant and laptop review datasets from SemEval 2014 [44], and a restaurant review dataset from SemEval 2015 [45]. Each dataset classifies aspectual sentiment into three polarities. The Twitter dataset consists of brief and informal texts, characterized by casual language and relatively poor grammar. Both restaurant datasets involve customer comments on entities such as dishes, ambiance, or service in restaurants, and emphasize subjective feelings and emotions. The laptop dataset covers hardware, software, and performance in the computer domain, and contains numerous computer terms and numeric text.
4.2 Implementation Details
We conduct experiments on the following platform. OS: Ubuntu 20.04; CPU: Intel Core i9-10850K; GPU: GeForce RTX 3090. The SD-GCN model is implemented with the PyTorch 1.3.0 deep learning framework. The parameter settings of SD-GCN follow the study [16], and we fine-tuned them based on experiments. Additionally, we employed cross-validation to evaluate the performance of various parameter combinations and select the best parameters. The hyper-parameters of SD-GCN are described in Table 2.
We train our SD-GCN for 100 epochs with batch size 16 and evaluate the final model. For fairness, we apply accuracy and macro-F1 scores as the evaluation metrics for the experiments. Five runs were performed using distinct random seeds, and the reported experimental results are their average.
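As a sketch of how these metrics can be computed, assuming scikit-learn (which the paper does not specify):

    from sklearn.metrics import accuracy_score, f1_score

    # Illustrative labels: 0 = negative, 1 = neutral, 2 = positive
    y_true = [2, 0, 1, 2, 0]
    y_pred = [2, 0, 2, 2, 1]
    acc = accuracy_score(y_true, y_pred)                  # 0.6
    macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1

    # The reported numbers are then averaged over five runs with distinct seeds.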
4.3 Baselines
To comprehensively demonstrate the capability of our SD-GCN, we selected 16 typical aspect sentiment classification models as baselines. The particulars of the 16 comparison approaches are described as follows:
- LSTM [9]: utilized an LSTM to capture the links between aspects and their context, and used the final state vector to predict the polarity of particular aspects.
- IAN [46]: proposed interactive attention networks, which use multiple attention networks to model the interactions between targets and contexts separately.
- AOA [11]: proposed an attention-over-attention approach that captures the relationship between aspects and contextual sentences by modeling their interaction, thus obtaining aspect-level sentiment polarity.
- CNN-BiLSTM [32]: used a CNN to extract semantic features, then fed these features to a Bi-LSTM layer that captures the contextual features of a text and can learn jointly from two levels of sentiment data.
- ASGCN [16]: predicted sentiment by applying multilayer graph convolution on dependency trees together with an aspect-specific masking layer.
- CDT [17]: utilized a Bi-LSTM to acquire feature representations, then applied graph convolution operations on the dependency trees to acquire aspect sentiment.
- Hete_GNNs [47]: proposed a heterogeneous graph model that uses syntactic tree information, word relations, and sentiment lexicon information in a unified framework to capture aspect sentiment polarity.
- BiGCN [48]: constructed a conceptual hierarchy over lexical and syntactic graphs that allows functionally distinct types of relations in the graph to be treated separately.
- BERT4GCN [49]: combined the grammatical sequential features of BERT with the syntactic knowledge in dependency graphs to enhance a GCN with BERT outputs, and further incorporated relative position information between words to make the GCN position-aware.
- DualGCN [15]: combined syntactic knowledge with semantic information and captured feature representations using orthogonal and differential regularization to reduce the overlap of each word.
- CRF-GCN [50]: utilized conditional random fields to extract opinion information, integrated that information through graph convolutional networks, and predicted aspect-specific sentiment by computing a global vector representation of nodes.
- BERT [41]: adopted multi-layer self-attention that captures the bidirectional dependencies of the inputs.
- SNBAN [51]: used dependency trees and additional multi-head attention to find aspects and aspect-related words, improving the extraction of grammatical knowledge.
- MTABSA [52]: combined ATE with ASC for joint training, and correlated aspects with dependency information through multi-task learning, thereby enhancing the connection between them and improving the focus on aspects.
- T-GCN + BERT [38]: proposed a type-aware GCN that combines dependencies and their types to construct an input graph, applies attention mechanisms to weight the edges in the graph, and uses layer ensembling to synthesize different contextual information.
- R-GAT + BERT [39]: used a GAT incorporating relational knowledge to handle new dependency trees obtained by reshaping and pruning the original dependency trees.
4.4 Results
The results of SD-GCN compared with the other 16 methods on four ABSA datasets are shown in Table 3. From it, we can see that the SD-GCN model achieves satisfactory results on these datasets, acquiring the best results on three of them: Rest14, Rest15, and Twitter. We observe that the methods using GCN or GAT generally outperform the other models, which indicates that graph neural networks can better consider the syntactic structure of sentences. In particular, by modeling syntactic dependencies, our SD-GCN significantly outperforms the other GCN-based models, owing to its use of the dependencies between aspects and emotional words. In comparison to R-GAT + BERT, SD-GCN directly models the syntactic dependencies of the original sentences, which avoids the potential loss caused by reshaping and pruning ordinary dependency parse trees. This enables our SD-GCN model to comprehensively capture fine-grained linguistic nuances and syntactic dependencies. On the other hand, the models using BERT generally outperform other representation methods, which implies that BERT is better at capturing semantic information. By applying BERT and GCN in our SD-GCN model, it can better capture the syntactic dependencies between aspects and emotional words, and achieves outstanding performance on ABSA tasks.
It is worth mentioning that our model performs worse than T-GCN + BERT on the Lap14 dataset, though it still outperforms the other baselines. Comparing the Lap14 dataset with the other three datasets, we find differences in data distribution: the Lap14 dataset contains a large number of computer terms and numbers. When faced with these terms, our SD-GCN model is more likely to produce erroneous syntactic dependency modeling results. The T-GCN + BERT model, however, can utilize attention mechanisms to weight and combine semantic knowledge, integrating the aspect sentiment information learned by the model, and therefore suffers less from this influence.
Another observation is that on the Twitter dataset, our SD-GCN does not perform as well as on the other datasets. We speculate that this is due to the informality of social media posts, where grammar rules are not always strictly followed, leading to poor grammaticality in the Twitter dataset. This affects the effectiveness of syntactic dependency modeling.
4.5 Ablation Experiments
To examine the contribution of each module in SD-GCN, we design ablation experiments with SD-GCN as the baseline. The details of the ablation tests are shown in Table 4.
We generated several new models for comparison by removing or changing modules in SD-GCN. Here, PD denotes removing the probability distribution in the biaffine attention module and directly using the logits tensor to construct the adjacency matrix; RT denotes not considering the relationship types in the biaffine attention module, i.e., setting the number of defined relationship types to 1; and SA denotes replacing biaffine attention with self-attention. First, it can be noticed that when PD is removed, both Acc. and F1 decrease, with the most significant decline on the Rest14 dataset, where Acc. and F1 decrease by 0.87% and 1.31%. This indicates that normalizing the logits tensor can reduce errors in syntactic dependency parsing. Second, when RT is removed, the performance of SD-GCN also decreases, with the most significant decline on the Rest15 dataset, where Acc. and F1 decrease by 1.63% and 3.15%. This indicates that defining different numbers of relationship types can enhance the performance of SD-GCN. Finally, the effectiveness of the model decreases significantly after replacing biaffine attention with self-attention, which indicates that our model with biaffine attention can better model syntactic dependency relationships and thus better capture the syntactic dependencies between aspects and opinions.
4.6 Impact of SD-GCN Layers
We investigated the performance of SD-GCN with one to six layers on the four datasets to evaluate the impact of the number of GCN layers. Figure 4 shows that SD-GCN achieves the best performance with two GCN layers. With a single layer, the model can only learn local node information and cannot integrate long-distance syntactic dependency information into global nodes. As the number of GCN layers rises beyond two, the model parameters and the amount of redundant information also increase; as a result, the training process becomes more challenging and the accuracy decreases significantly.
4.7 Case Study
We utilize visualizations of attention scores on several examples to further investigate the performance of SD-GCN in ABSA tasks. The visualization results of attention scores are shown in Fig. 5.
In Fig. 5a, the sentence has a modal verb phrase "should be", which may be overlooked by some models. However, our SD-GCN model increases the attention weight on "should be" based on syntactic dependency, correctly predicting the polarity as "negative". In Fig. 5b, our SD-GCN correctly identifies the opinion words "fast" and "friendly", predicting the aspect term "service" as "positive". In Fig. 5c, the sentence contains two aspect terms, and our model predicts both correctly. In Fig. 5d, the sentence has two aspects with different polarities, "food" and "service"; our model accurately identifies their corresponding opinions through syntactic dependency. In Fig. 5e, SD-GCN correctly predicts the sentiment by considering the effects of the long-distance interjection "Woo" and the adjective "excited". In Fig. 5f, SD-GCN relies on syntactic dependencies to identify the modifying role of "Biggest", thereby accurately predicting the polarity as "negative". These six examples demonstrate that our SD-GCN can fully exploit the syntactic dependencies among words and correctly match specific aspects with their corresponding opinions.
5 Conclusion
In this study, we propose the SD-GCN model for the ABSA task. It can efficiently model syntactic dependencies and integrate the syntactic and semantic information of sentences by utilizing graph convolutional networks. First, we apply BERT to obtain contextual representations. After this, we model the syntactic dependencies with biaffine attention, and use a GCN to handle these dependencies and acquire enhanced features for determining the sentiment of specific aspects. Extensive experiments on four ABSA datasets validate the effectiveness of SD-GCN. We also designed ablation tests and experiments on the effect of GCN layers, and further investigated the performance of SD-GCN by visualizing attention scores.
6 Limitations and Future Scope
In our work, we have considered the dependencies inherent within texts, but we have not harnessed external domain-specific knowledge. This restricts the performance of SD-GCN in specific domains such as laptops or Twitter. Additionally, this study has other limitations due to the constraints of the datasets and the experimental environment. High-quality datasets are still lacking in ABSA, and the commonly used datasets were released by SemEval years ago. This constrains the applicability of the model in various scenarios and contexts. Moreover, due to restrictions imposed by the experimental equipment, we are unable to effectively run and evaluate large models, which prevents us from making sufficient comparisons with them.
In future work, we intend to integrate domain-specific knowledge into the model through knowledge embedding, aiming to enhance its ability to understand and process information within a specific domain. We also plan to address the dataset issues by creating new datasets from diverse domains in multiple languages. Besides, we will seek to utilize more powerful computing resources to comprehensively evaluate the performance difference between our method and large models.
Data Availability
The experimental data used to support the findings of this study are available at https://github.com/z1898/SD-GCN.
References
Nazir, A., Rao, Y., Wu, L., Sun, L.: Issues and challenges of aspect-based sentiment analysis: A comprehensive survey. IEEE Trans. Affect. Comput. 13(2), 845–863 (2022). https://doi.org/10.1109/TAFFC.2020.2970399
Phan, H.T., Nguyen, N.T., Hwang, D.: Convolutional attention neural network over graph structures for improving the performance of aspect-level sentiment analysis. Inf. Sci. 589, 416–439 (2022). https://doi.org/10.1016/j.ins.2021.12.127
Xu, L., Bing, L., Lu, W. and Huang, F.: Aspect sentiment classification with aspect-specific opinion spans. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic (Online), 3561–3567 (2020). https://doi.org/10.18653/v1/2020.emnlp-main.288
Peng, H., Xu, L., Bing, L., Huang, F., Lu, W., et al.: Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. Proc. AAAI Conf. Artif. Intell. 34(05), 8600–8607 (2020). https://doi.org/10.1609/aaai.v34i05.6383
Chen, H., Zhai, Z., Feng, F., Li, R. and Wang, X.: Enhanced multi-channel graph convolutional network for aspect sentiment triplet extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 2974–2985 (2022). https://doi.org/10.18653/v1/2022.acl-long.212
Wankhade, M., Rao, A.C.S., Kulkarni, C.: A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 55(7), 5731–5780 (2022). https://doi.org/10.1007/s10462-022-10144-1
Vo, D.-T. and Zhang, Y.: Target-dependent twitter sentiment classification with rich automatic features. In: Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 1347–1353 (2015). https://doi.org/10.5555/2832415.2832437
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M. et al.: Adaptive recursive neural network for target-dependent twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, Maryland, 49–54 (2014). https://doi.org/10.3115/v1/P14-2009
Tang, D., Qin, B., Feng, X. and Liu, T.: Effective lstms for target-dependent sentiment classification. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 3298–3307 (2016).
Wu, C., Xiong, Q., Yang, Z., Gao, M., Li, Q., et al.: Residual attention and other aspects module for aspect-based sentiment analysis. Neurocomputing 435, 42–52 (2021). https://doi.org/10.1016/j.neucom.2021.01.019
Huang, B., Ou, Y. and Carley, K. M.: Aspect level sentiment classification with attention-over-attention neural networks. In: Social, Cultural, and Behavioral Modeling: 11th International Conference, SBP-BRiMS 2018, Washington, DC, USA, 197–206 (2018).
Nguyen, T. H. and Shirai, K.: Phrasernn: Phrase recursive neural network for aspect-based sentiment analysis. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 2509–2514 (2015). https://doi.org/10.18653/v1/D15-1298
He, R., Lee, W. S., Ng, H. T. and Dahlmeier, D.: Effective attention modeling for aspect-level sentiment classification. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, 1121–1131 (2018).
Zhao, P., Hou, L., Wu, O.: Modeling sentiment dependencies with graph convolutional networks for aspect-level sentiment classification. Knowl.-Based Syst. 193, 105443–105453 (2020). https://doi.org/10.1016/j.knosys.2019.105443
Li, R., Chen, H., Feng, F., Ma, Z., Wang, X. et al.: Dual graph convolutional networks for aspect-based sentiment analysis. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Bangkok, Thailand (Online), 6319–6329 (2021). https://doi.org/10.18653/v1/2021.acl-long.494
Zhang, C., Li, Q. and Song, D.: Aspect-based sentiment classification with aspect-specific graph convolutional networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 4568–4578 (2019). https://doi.org/10.18653/v1/D19-1464
Sun, K., Zhang, R., Mensah, S., Mao, Y. and Liu, X.: Aspect-level sentiment analysis via convolution over dependency tree. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 5679–5688 (2019). https://doi.org/10.18653/v1/D19-1569
Patra, D. and Jana, B.: Fake news identification through natural language processing and machine learning approach. In: Computational Intelligence in Communications and Business Analytics, Cham, 269–279 (2022). https://doi.org/10.1007/978-3-031-10766-5_21
Patra, D., Jana, B., Mandal, S. and Sekh, A. A.: Understanding fake news detection on social media: A survey on methodologies and datasets. In: Artificial Intelligence, Cham, 226–242 (2022). https://doi.org/10.1007/978-3-031-22485-0_21
Zhu, Y., Zheng, W., Tang, H.: Interactive dual attention network for text sentiment classification. Comput. Intell. Neurosci. 2020, 1–11 (2020). https://doi.org/10.1155/2020/8858717
Tahir, M., Halim, Z., Waqas, M., Tu, S.: On the effect of emotion identification from limited translated text samples using computational intelligence. Int. J. Comput. Intell. Syst. 16(1), 107 (2023). https://doi.org/10.1007/s44196-023-00234-5
Kiritchenko, S., Zhu, X., Cherry, C. and Mohammad, S.: Nrc-canada-2014: Detecting aspects and sentiment in customer reviews. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 437–442 (2014). https://doi.org/10.3115/v1/S14-2076
Jiang, L., Yu, M., Zhou, M., Liu, X. and Zhao, T.: Target-dependent twitter sentiment classification. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, 151–160 (2011).
Luo, Q. and Zheng, W.: Pre-locator incorporating swin-transformer refined classifier for traffic sign recognition. Intelligent Automation and Soft Computing 37(2), 2227–2246 (2023). https://doi.org/10.32604/iasc.2023.040195
Liu, B., Guan, W., Yang, C., Fang, Z., Lu, Z.: Transformer and graph convolutional network for text classification. Int. J. Comput. Intell. Syst. 16(1), 161 (2023). https://doi.org/10.1007/s44196-023-00337-z
Dutta, R., Das, N., Majumder, M., Jana, B.: Aspect based sentiment analysis using multi-criteria decision-making and deep learning under COVID-19 pandemic in India. CAAI Transact. Intell. Technol. 8(1), 219–234 (2023). https://doi.org/10.1049/cit2.12144
Lakkaraju, H., Socher, R. and Manning, C.: Aspect specific sentiment analysis using hierarchical deep learning. In: NIPS Workshop on Deep Learning and Representation Learning, Montreal, Canada, 1–9 (2014).
Wang, W., Pan, S. J., Dahlmeier, D. and Xiao, X.: Recursive neural conditional random fields for aspect-based sentiment analysis. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 616–626 (2016). https://doi.org/10.18653/v1/D16-1059
Wang, Y., Huang, M., Zhu, X. and Zhao, L.: Attention-based lstm for aspect-level sentiment classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, 606–615 (2016). https://doi.org/10.18653/v1/D16-1058
Song, Y., Wang, J., Jiang, T., Liu, Z. and Rao, Y.: Targeted sentiment classification with attentional encoder network. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series, Munich, Germany, 93–103 (2019). https://doi.org/10.1007/978-3-030-30490-4_9
Wang, X., Tang, M., Yang, T., Wang, Z.: A novel network with multiple attention mechanisms for aspect-level sentiment analysis. Knowl.-Based Syst. 227, 1–12 (2021). https://doi.org/10.1016/j.knosys.2021.107196
Ayetiran, E.F.: Attention-based aspect sentiment classification using enhanced learning through cnn-bilstm networks. Knowl.-Based Syst. 252, 1–9 (2022). https://doi.org/10.1016/j.knosys.2022.109409
Kipf, T. N. and Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations, Toulon, France, 1–14 (2017). http://arxiv.org/abs/1609.02907
Shi, T., Malioutov, I. and Irsoy, O.: Semantic role labeling as syntactic dependency parsing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic (Online), 7551–7571 (2020). https://doi.org/10.18653/v1/2020.emnlp-main.610
Cai, D. and Lam, W.: Graph transformer for graph-to-sequence learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, New York, United States, 7464–7471 (2020). https://doi.org/10.1609/aaai.v34i05.6243
Li, B., Ye, W., Sheng, Z., Xie, R., Xi, X. et al.: Graph enhanced dual attention network for document-level relation extraction. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), 1551–1560 (2020). https://doi.org/10.18653/v1/2020.coling-main.136
Liang, B., Su, H., Gui, L., Cambria, E., Xu, R.: Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl.-Based Syst. 235, 1–11 (2022). https://doi.org/10.1016/j.knosys.2021.107643
Tian, Y., Chen, G. and Song, Y.: Aspect-based sentiment analysis with type-aware graph convolutional networks and layer ensemble. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, 2910–2922 (2021). https://doi.org/10.18653/v1/2021.naacl-main.231
Wang, K., Shen, W., Yang, Y., Quan, X. and Wang, R.: Relational graph attention network for aspect-based sentiment analysis. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Washington, United States (Online), 3229–3238 (2020). https://doi.org/10.18653/v1/2020.acl-main.295
Ke, W., Gao, J., Shen, H., Cheng, X.: Incorporating explicit syntactic dependency for aspect level sentiment classification. Neurocomputing 456, 394–406 (2021). https://doi.org/10.1016/j.neucom.2021.05.078
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 4171–4186 (2019). https://doi.org/10.18653/v1/N19-1423
Dozat, T. and Manning, C. D.: Deep biaffine attention for neural dependency parsing. In: International Conference on Learning Representations, Toulon, France, 1–8 (2017).
Yu, J., Bohnet, B. and Poesio, M.: Named entity recognition as dependency parsing. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Washington, United States (Online), 6470–6476 (2020). https://doi.org/10.18653/v1/2020.acl-main.577
Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I. et al.: Semeval-2014 task 4: Aspect based sentiment analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 27–35 (2014). https://doi.org/10.3115/v1/S14-2004
Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S. and Androutsopoulos, I.: Semeval-2015 task 12: Aspect based sentiment analysis. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, 486–495 (2015). https://doi.org/10.18653/v1/S15-2082
Ma, D., Li, S., Zhang, X. and Wang, H.: Interactive attention networks for aspect-level sentiment classification. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 4068–4074 (2017). https://doi.org/10.24963/ijcai.2017/568
Lu, G., Li, J., Wei, J.: Aspect sentiment analysis with heterogeneous graph neural networks. Inf. Process. Manage. 59(4), 1–10 (2022). https://doi.org/10.1016/j.ipm.2022.102953
Zhang, M. and Qian, T.: Convolution over hierarchical syntactic and lexical graphs for aspect level sentiment analysis. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, Dominican Republic (Online), 3540–3549 (2020). https://doi.org/10.18653/v1/2020.emnlp-main.286
Xiao, Z., Wu, J., Chen, Q. and Deng, C.: Bert4gcn: Using bert intermediate layers to augment gcn for aspect-based sentiment classification. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic (Online), 9193–9200 (2021). https://doi.org/10.18653/v1/2021.emnlp-main.724
Huang, B., Zhang, J., Ju, J., Guo, R., Fujita, H., et al.: Crf-gcn: An effective syntactic dependency model for aspect-level sentiment analysis. Knowl.-Based Syst. 260, 1–11 (2023). https://doi.org/10.1016/j.knosys.2022.110125
Sharma, T., Kaur, K.: Aspect sentiment classification using syntactic neighbour based attention network. Journal of King Saud University - Computer and Information Sciences 35(2), 612–625 (2023). https://doi.org/10.1016/j.jksuci.2023.01.005
Zhao, G., Luo, Y., Chen, Q., Qian, X.: Aspect-based sentiment analysis via multitask learning for online reviews. Knowl.-Based Syst. 264, 1–12 (2023). https://doi.org/10.1016/j.knosys.2023.110326
Acknowledgements
Thanks to the anonymous reviewers for their valuable comments and suggestions that improved this paper.
Funding
This work is supported by the Natural Science Foundation of Sichuan, China (No. 2022NSFSC0571), the China Scholarship Council (No. 201908510026), and the Sichuan Science and Technology Program (No. 2019YJ0532 and No. 2021YFH0107).
Author information
Contributions
All authors have substantially contributed to this manuscript and have approved the final submitted version. The specific contributions of each author are as follows: FZ was involved in methodology, designing the framework of the paper, drafting the manuscript, and experiments. WZ was involved in designing the research methods, revising the manuscript, and providing guidance support. YY was involved in the statistical analysis and implementation of the research process.
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflicts of interest to report in this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, F., Zheng, W. & Yang, Y. Graph Convolutional Network with Syntactic Dependency for Aspect-Based Sentiment Analysis. Int J Comput Intell Syst 17, 37 (2024). https://doi.org/10.1007/s44196-024-00419-6