Article

Mining Nuanced Weibo Sentiment with Hierarchical Graph Modeling and Self-Supervised Learning

1
Faculty of Engineering, Rajamangala University of Technology Krungthep, Bangkok 10120, Thailand
2
School of Information Engineering, Jiangsu College of Finance and Accounting, Lianyungang 222061, China
*
Author to whom correspondence should be addressed.
Submission received: 14 November 2024 / Revised: 15 December 2024 / Accepted: 23 December 2024 / Published: 26 December 2024
(This article belongs to the Special Issue Artificial Intelligence in Graphics and Images)

Abstract

Weibo sentiment analysis has gained prominence, particularly during the COVID-19 pandemic, as a means to monitor public emotions and detect emerging mental health trends. However, challenges arise from Weibo’s informal language, nuanced expressions, and stylistic features unique to social media, which complicate the accurate interpretation of sentiments. Existing models often fall short, relying on text-based methods that inadequately capture the rich emotional texture of Weibo posts, and are constrained by single loss functions that limit emotional depth. To address these limitations, we propose a novel framework incorporating a sentiment graph and self-supervised learning. Our approach introduces a “sentiment graph” that leverages both word-to-post and post-to-post relational connections, allowing the model to capture fine-grained sentiment cues and context-dependent meanings. Enhanced by a gated mechanism within the graph, our model selectively filters emotional signals based on intensity and relevance, improving its sensitivity to subtle variations such as sarcasm. Additionally, a self-supervised objective enables the model to generalize beyond labeled data, capturing latent emotional structures within the graph. Through this integration of sentiment graph and self-supervised learning, our approach advances Weibo sentiment analysis, offering a robust method for understanding the complex emotional landscape of social media.

1. Introduction

Weibo sentiment analysis [1,2,3] has become increasingly important, especially during the COVID-19 pandemic [4]. Platforms like Weibo have evolved into vital public spaces where people share opinions [5], express emotions, and foster community interactions. The surge in online engagement during this period highlighted Weibo as an invaluable resource for monitoring public sentiment in real-time. This has been particularly critical given the rise in mental health issues, such as anxiety, depression, and stress, that have accompanied prolonged periods of social isolation, health concerns, and economic uncertainty [6]. Accurate sentiment analysis of Weibo posts is essential to understanding collective moods, identifying emerging mental health risks, and providing timely insights to stakeholders such as government agencies, healthcare providers, and mental health professionals who rely on these analyses to measure societal emotional states and plan interventions as needed.
Yet, performing effective sentiment analysis on Weibo is complex and presents a series of challenges. Unlike traditional platforms, Weibo posts often utilize highly informal language, fragmented expressions, and numerous stylistic features unique to digital communication [7]. These include sarcasm, regional slang, dialects, abbreviations, and frequently abbreviated or implicit expressions due to character limits, which limit the context available for precise emotional interpretation [8]. Such nuances pose difficulties for sentiment analysis, as models struggle to recognize context-dependent emotions and attitudes [9,10,11]. For example, identifying sarcasm in a post or discerning the nuanced feelings of frustration or irony becomes particularly challenging. Accurately interpreting these complex and sensitive emotional cues is crucial, as it may enable the early detection of individuals who could be at risk for mental health issues, thereby contributing positively to public health and societal well-being [12,13].
Traditional Weibo sentiment analysis methods [2] largely rely on text-based models that use natural language processing (NLP) techniques [14,15,16] for sentiment classification. Although these NLP models have achieved notable success in detecting broad emotional categories (e.g., positive, negative, or neutral sentiments) [9,17,18], they often fall short of capturing the rich emotional texture present in Weibo posts. For instance, conventional models treat posts as simple sequences of text, neglecting the nuanced cues embedded in specific words or phrases that reveal the underlying emotions of the poster. In the example post, “Will Malaysia Airlines still deny it? What exactly are they hiding?”, the term “still” hints at ongoing frustration and distrust. Standard models, like BERT, may process this post as a single text sequence, potentially diluting the emphasis on the word “still” and missing the full emotional impact. This limitation underscores the need for advanced approaches that go beyond text processing, such as graph neural networks (GNNs), which can capture nuanced relationships between words and more accurately infer the sentiment expressed by the user [19].
Another significant limitation of existing sentiment analysis models is their dependency on a single loss function, typically a classification loss, to guide sentiment inference [20,21]. While this approach can work for simpler tasks, it constrains the model’s ability to capture complex and layered emotional states, as it reduces sentiment to predefined categories that may not fully encapsulate the diversity of human emotions. Single loss functions are insufficient for representing multi-faceted emotional content, as they ignore potential latent structures within emotional expressions [19,22]. In contrast, self-supervised learning offers a pathway to learn generalized emotional representations by leveraging large amounts of unlabeled data [23]. Self-supervised learning can reveal underlying patterns and structures within data, allowing models to infer emotions that go beyond traditional categories [24,25]. Without incorporating these self-supervised objectives, existing models struggle to generalize effectively beyond labeled datasets, limiting their ability to detect subtle emotional cues that might arise from complex linguistic and contextual nuances.
To address these limitations, we propose a novel graph-based framework that integrates self-supervised learning to significantly enhance Weibo sentiment analysis. Our approach introduces a unique “sentiment graph” structure that leverages both word-to-post and post-to-post connections. Unlike traditional models that treat text as isolated sequences, this sentiment graph forms a relational network where words, phrases, and entire posts are treated as interconnected nodes. These connections enable the model to capture fine-grained emotional cues and context-dependent meanings within Weibo posts, particularly those subtle cues that are often overlooked in sequence-based processing. The sentiment graph goes beyond conventional sentiment analysis by embedding two distinct types of relationships: semantic connections between words within individual posts and contextual connections between posts based on thematic or emotional similarity. This dual-level relational structure allows the model to understand not only the immediate sentiment expressed by a post but also how similar sentiments might manifest across different posts in nuanced ways. For instance, it can discern shifts in emotional tone across a user’s posts over time, detect recurring themes of frustration or distrust, and differentiate between subtle variations in sentiment that traditional text-based approaches might miss.
Beyond simply applying a graph neural network, our approach is further distinguished by the introduction of a novel gated mechanism within the graph, a key contribution that enhances the model’s ability to capture nuanced sentiment variations. This gating function, integrated within the sentiment graph, allows the model to dynamically filter sentiment signals from neighboring nodes based on their intensity and relevance. By selectively controlling the flow of information, this gated mechanism enhances sensitivity to subtle emotional cues, such as sarcasm or shifts in sentiment intensity, which are common in social media discourse. This enables the model to better differentiate and represent the complex emotional landscape within Weibo posts.
Further distinguishing our approach is the integration of self-supervised learning within the graph framework. We employ a novel self-supervised loss function that operates on the structure of the sentiment graph itself, enabling the model to learn nuanced representations of emotional relationships without relying solely on labeled data. This self-supervised objective is designed to capture latent emotional structures within the graph, such as implicit hierarchies or clusters of sentiment expressions, which can enhance the model’s ability to generalize beyond predefined sentiment categories. By dynamically adjusting to patterns uncovered within unlabeled data, our model can detect complex, layered emotional states that emerge from both the linguistic and relational context in Weibo posts. By integrating the sentiment graph with self-supervised learning, we improve Weibo sentiment classification by a large margin.
In summary, this paper makes four key contributions to advancing Weibo sentiment analysis. First, we introduce a novel sentiment graph framework that leverages the relational connections between words, phrases, and posts, enabling richer contextual understanding than sequence-based models. Second, our dual-level relational structure captures both semantic and contextual relationships, allowing the model to interpret nuanced sentiment patterns across posts. Third, we propose a gated mechanism within the graph framework, enabling the model to selectively adjust information flow based on sentiment intensity—a critical improvement for capturing the nuanced sentiment shifts in social media. Fourth, by incorporating a self-supervised learning objective tailored to the sentiment graph, our model learns complex emotional representations without heavy reliance on labeled data, making it adaptable to diverse, unlabeled social media contexts. Together, these innovations represent a significant step forward in accurately capturing the dynamic, multi-layered emotional landscape of social media platforms like Weibo.

2. Literature Review

2.1. Traditional Weibo Sentiment Analysis

Sentiment analysis [26] on Weibo, a popular microblogging platform in China, has become an essential tool for gauging public opinion and emotional responses, particularly during critical events such as the COVID-19 pandemic. This section reviews recent studies that have employed various methodologies to analyze sentiment on Weibo, focusing on advancements in natural language processing (NLP) techniques and their implications for understanding large-scale public sentiment dynamics.
Recent research highlights the effectiveness of deep learning models, such as BERT [27], in enhancing sentiment classification accuracy on Weibo posts [3]. By leveraging pretrained language models, these approaches can capture nuanced contextual meanings, which has led to improvements in correctly identifying sentiment polarity in short, informal posts [28]. Additionally, the development of resources tailored specifically to Weibo, such as customized sentiment dictionaries, has further enhanced the accuracy of sentiment analysis in this context [29]. These resources have proven especially beneficial in handling the unique linguistic characteristics of Weibo, which often include colloquialisms and cultural expressions that general sentiment analysis tools may overlook. Specialized NLP toolkits, such as BosonNLP, have also gained prominence, providing powerful mechanisms for extracting sentiment and other insights from social media text [30].
The application of sentiment analysis on Weibo has been particularly significant during the COVID-19 pandemic, as researchers sought to capture real-time public emotions and gauge responses to the unfolding crisis. Studies have utilized sentiment analysis to detect public anxiety and panic as the pandemic evolved [2], while others have analyzed temporal patterns of emotional fluctuations in response to significant COVID-19 events [2]. In conjunction with topic modeling, sentiment analysis has enabled researchers to uncover complex public responses to the pandemic, illustrating the diverse applications of sentiment analysis in crisis contexts [31].
Beyond crisis analysis, machine learning techniques have broadened the scope of sentiment analysis on Weibo [32]. Models such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) have demonstrated effectiveness in sentiment classification, providing richer insights into the emotional landscape of Weibo users [33,34]. Additionally, studies have explored the correlation between Weibo sentiment trends and external phenomena, such as stock market fluctuations, underscoring the broader implications of Weibo sentiment analysis beyond social media [35].

2.2. Graph Learning for Weibo Sentiment Analysis

Graph Neural Networks (GNNs) have emerged as a promising tool for enhancing sentiment analysis on social media platforms like Weibo [36,37,38], where they can capture intricate relationships between words and their contextual meanings—a crucial factor for accurately interpreting sentiment in text. This section synthesizes recent advancements in the application of GNNs for Weibo sentiment analysis, highlighting innovative methodologies, challenges, and potential future directions.
The integration of GNN frameworks has gained traction due to their ability to model relational structures in text, offering a step forward from traditional sequence-based approaches. For instance, Zhao et al. [39] introduced a Knowledge-Enhanced Graph convolutional network for aspect-based sentiment analysis, which incorporates external sentiment lexicons to enrich the model’s understanding of sentiment-laden words and phrases. This method illustrates how GNNs can leverage external knowledge sources to improve sentiment detection. Similarly, Lan et al. [40] proposed dual-channel interactive graph convolutional networks that use attention mechanisms to capture both syntactic structures and multi-aspect sentiment dependencies, allowing the model to better interpret sentiment in complex, multi-faceted sentences. Recent advancements in general GNN methodologies include techniques such as Gaussian Similarity Modeling [41] and Cross-Channel Graph Information [42] to improve GNN performance.
In the specific context of Weibo, various studies have explored unique ways of applying GNNs to sentiment analysis. Chen et al. [43] laid the groundwork by combining character embeddings with dual-channel convolutional neural networks for Weibo sentiment classification, thereby setting the stage for integrating graph-based methods within this specific social media platform. Additionally, Liao et al. developed a multi-level graph neural network that captures hierarchical relationships in the data, making it particularly suitable for analyzing the informal and rapidly evolving nature of Weibo content [39]. These studies underscore the adaptability of GNNs to the unique characteristics of Weibo, such as its colloquial language and real-time content shifts.
Despite these advancements, challenges persist in the application of GNNs for Weibo sentiment analysis. A primary challenge lies in effectively modeling both nuanced word relationships within posts and connections between posts in a unified framework. Capturing these relationships jointly is essential for reflecting the true emotional landscape on Weibo, as sentiment can be influenced by both the semantic context of individual words and the broader relational context across posts. Developing models that can seamlessly integrate these layers of information remains a key area for future research.

3. Problem Definition

Weibo sentiment classification aims to accurately identify the sentiment (positive, negative, or neutral) of posts on China’s Weibo platform. This task is challenging due to the platform’s informal language, cultural nuances, and non-standard syntax, as well as the complex, layered emotions that users often express. Effective classification models must capture subtle emotional cues within words and contextual shifts, adapting to Weibo’s rapidly evolving content and diverse topics. Solving this problem is essential for applications in social media analytics, public opinion monitoring, and crisis response, where timely insight into public sentiment is valuable.

4. Methodology

As shown in Figure 1, this section describes the steps in our proposed approach for Weibo sentiment classification, including the construction of the sentiment graph, feature transformation using FastText embeddings [44], and model training with a novel self-supervised loss. Specifically, FastText is a word-representation method that maps words to embeddings, which we use to vectorize Weibo posts.

4.1. Construction of Sentiment Graph

To capture the nuanced sentiments in Weibo posts, we construct a sentiment graph $G = (V, E)$, where $V$ denotes the set of nodes and $E$ denotes the set of edges. The nodes in $V$ are of two types, “word nodes” and “post nodes”, connected through two types of edges: “word-to-post” and “post-to-post”.
For each post $p \in V$, we link a word node $w \in V$ to $p$ if $w$ appears in $p$. Formally, we define an edge $(w, p) \in E$ if $w \in p$. This inclusion-based link enables the model to associate specific words with posts, capturing the lexical structure of posts and the contribution of individual words to post sentiment.
To capture the semantic similarity between posts, we define edges between post nodes based on content similarity. Given the FastText embeddings $e_{p_i}$ and $e_{p_j}$ of two posts $p_i$ and $p_j$, we compute the cosine similarity:
$$\mathrm{sim}(p_i, p_j) = \frac{e_{p_i} \cdot e_{p_j}}{\lVert e_{p_i} \rVert \, \lVert e_{p_j} \rVert}.$$
If $\mathrm{sim}(p_i, p_j) > 0.6$, we establish an edge $(p_i, p_j) \in E$. This thresholded connection allows the model to capture relational context across similar posts, reflecting trends and contextual sentiment relationships.
The resulting graph G captures both lexical content through word-to-post edges and contextual similarity through post-to-post edges, forming a hybrid structure that models both individual post composition and broader relational patterns.
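The construction described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `cosine` and `build_sentiment_graph`, the toy tokens, and the 2-dimensional vectors standing in for FastText post embeddings are all assumptions. Only the inclusion rule for word-to-post edges and the 0.6 similarity threshold come from the text.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors (an assumption)
    return dot / (norm_a * norm_b)


def build_sentiment_graph(posts, post_vecs, threshold=0.6):
    """Build the two edge sets of the sentiment graph.

    posts:     {post_id: list of tokens}
    post_vecs: {post_id: embedding vector}
    Returns (word_to_post_edges, post_to_post_edges) as sets of tuples.
    """
    # Word-to-post: an edge (w, p) exists whenever word w appears in post p.
    word_post = {(w, pid) for pid, tokens in posts.items() for w in set(tokens)}

    # Post-to-post: an edge exists when cosine similarity exceeds the threshold.
    post_post = set()
    for pi, pj in combinations(sorted(posts), 2):
        if cosine(post_vecs[pi], post_vecs[pj]) > threshold:
            post_post.add((pi, pj))
    return word_post, post_post


# Toy data: tokens and vectors are made up for illustration only.
posts = {"p1": ["still", "deny"], "p2": ["still", "hiding"], "p3": ["thanks"]}
post_vecs = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0]}
word_post, post_post = build_sentiment_graph(posts, post_vecs)
```

In this toy example only `p1` and `p2` are similar enough to be linked, while the shared word `still` connects to both posts through word-to-post edges. Comparing all post pairs is quadratic in the number of posts, so a large corpus would likely need an approximate nearest-neighbor search instead.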

4.2. Feature Transformation Using FastText Embeddings

To leverage the sentiment graph $G$, we transform each node in $V$ into an embedding that captures sentiment information via a graph neural network (GNN).
We initialize word nodes and post nodes with FastText embeddings. Let $x_w \in \mathbb{R}^d$ represent the initial embedding of a word node $w$, and $x_p \in \mathbb{R}^d$ represent the initial embedding of a post node $p$. FastText’s subword-based embeddings are particularly effective for informal language and slang, which are common in Weibo posts, by capturing morphological nuances within words and enhancing the model’s understanding of variations in sentiment expression.
Using a graph neural network (GNN) with a novel gated feature transformation mechanism, we refine the initial embeddings to learn nuanced, context-sensitive representations of Weibo sentiment. For each node $v \in V$, the GNN updates its embedding by aggregating information from neighboring nodes while using a gating mechanism, which we introduce as a key contribution, to selectively control the flow of sentiment information. Let $h_v^{(k)}$ denote the embedding of node $v$ at the $k$-th layer. The update rule at each layer $k$ is defined as
$$h_v^{(k+1)} = \sigma\left( \sum_{u \in N(v)} g_{vu}^{(k)} \, \alpha_{vu} \, W^{(k)} h_u^{(k)} \right),$$
where we have the following:
  • $N(v)$ denotes the neighbors of $v$;
  • $\alpha_{vu}$ represents attention weights that capture the importance of neighboring nodes;
  • $W^{(k)}$ is a learnable weight matrix;
  • $g_{vu}^{(k)} = \mathrm{sigmoid}(w_g^{(k)} \cdot h_u^{(k)})$ is our proposed learnable gating function that dynamically controls the influence of each neighboring node based on its sentiment features;
  • $\sigma(\cdot)$ is a non-linear activation function.
The gate $g_{vu}^{(k)}$ acts as a sentiment-sensitive filter, strengthening or weakening the information flow based on the sentiment intensity or nuance present in each node’s content. This iterative process produces a final embedding $h_p$ for each post node $p$, capturing both structural and word-level nuances as well as sentiment-specific context of interactions. The gating mechanism, as our contribution, enables the GNN to handle the rich sentiment variance in Weibo data more effectively, leading to more accurate sentiment representation and analysis.
The final embedding $h_p$ of each post node $p$ serves as input for the sentiment classifier. The classifier maps $h_p$ to a sentiment label $y \in \{\text{positive}, \text{negative}, \text{neutral}\}$, allowing us to predict sentiment based on the aggregated emotional context of each post.
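To make the gated update rule concrete, the following sketch runs one propagation step over a tiny graph in plain Python. It is an illustration under stated assumptions, not the authors' code: the text does not specify how the attention weights are computed or which activation is used, so the sketch assumes uniform attention (one over the number of neighbors) and ReLU. The names `gated_layer`, `matvec`, and the toy inputs are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def gated_layer(h, neighbors, W, w_g):
    """One gated propagation step.

    h:         {node: embedding vector at layer k}
    neighbors: {node: list of neighboring nodes}
    W:         weight matrix W^(k) as a list of rows
    w_g:       gate vector w_g^(k)
    Returns {node: embedding vector at layer k+1}.
    """
    out = {}
    for v, nbrs in neighbors.items():
        agg = [0.0] * len(W)
        for u in nbrs:
            # Gate depends on the neighbor's features: sigmoid(w_g . h_u).
            gate = sigmoid(sum(a * b for a, b in zip(w_g, h[u])))
            alpha = 1.0 / len(nbrs)  # ASSUMPTION: uniform attention weights
            msg = matvec(W, h[u])
            agg = [s + gate * alpha * m for s, m in zip(agg, msg)]
        out[v] = [max(0.0, s) for s in agg]  # ASSUMPTION: ReLU as sigma
    return out


# Toy graph: one word node "w" linked to one post node "p".
h0 = {"w": [1.0, 0.0], "p": [0.0, 1.0]}
nbrs = {"w": ["p"], "p": ["w"]}
W_identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = gated_layer(h0, nbrs, W_identity, w_g=[0.0, 0.0])
```

With a zero gate vector every gate evaluates to 0.5, so each node receives half of its neighbor's transformed embedding. A trained gate vector would instead scale each neighbor differently depending on its features.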

4.3. Model Training with Additional Self-Supervised Loss

To enhance the model’s generalization ability, we introduce a novel self-supervised loss that reconstructs the affinity structure of the sentiment graph $G$, complementing the primary classification loss.
Our self-supervised objective encourages the model to predict the affinity between nodes based on their embeddings, thereby reinforcing the learned graph structure. For each pair of connected nodes $(u, v) \in E$, we define a reconstruction loss:
$$\mathcal{L}_{\text{self-supervised}} = \sum_{(u, v) \in E} \left( \mathrm{sim}(h_u, h_v) - A_{uv} \right)^2,$$
where $\mathrm{sim}(h_u, h_v)$ is the cosine similarity between embeddings $h_u$ and $h_v$, and $A_{uv}$ is a binary indicator for the existence of an edge between $u$ and $v$. This loss encourages nodes with an edge to have similar embeddings, while non-connected nodes remain less similar.
The final objective combines the classification loss $\mathcal{L}_{\text{class}}$ with the self-supervised loss $\mathcal{L}_{\text{self-supervised}}$:
$$\mathcal{L} = \mathcal{L}_{\text{class}} + \lambda \mathcal{L}_{\text{self-supervised}},$$
where $\lambda$ is a hyperparameter that balances the two objectives. This combined loss encourages the model to learn both the sentiment classification task and the inherent relational structure within the sentiment graph, ultimately improving its ability to capture nuanced emotions in Weibo posts.
This methodology enables our model to effectively learn from the unique relational patterns in Weibo data, providing a robust approach to sentiment classification in social media contexts.
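The combined objective can be illustrated with a small numeric sketch. Several details here are assumptions, since the text does not state them: cross-entropy is used as the classification loss, the value 0.1 for the balancing weight is a placeholder, and the names `self_supervised_loss`, `cross_entropy`, and `total_loss` are hypothetical. The sum runs over connected pairs, for which the edge indicator equals 1.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def self_supervised_loss(h, edges):
    """Sum over edges of (sim(h_u, h_v) - A_uv)^2, with A_uv = 1 on every edge."""
    return sum((cosine(h[u], h[v]) - 1.0) ** 2 for u, v in edges)


def cross_entropy(probs, label):
    """ASSUMPTION: classification loss is cross-entropy on predicted probabilities."""
    return -math.log(probs[label])


def total_loss(class_loss, ssl_loss, lam=0.1):
    """L = L_class + lambda * L_self-supervised (lam=0.1 is a placeholder value)."""
    return class_loss + lam * ssl_loss


# Toy embeddings: "a" and "b" agree, "a" and "c" are orthogonal.
h = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
edges = [("a", "b"), ("a", "c")]

ssl = self_supervised_loss(h, edges)                   # 0 for (a, b), 1 for (a, c)
cls = cross_entropy([0.5, 0.25, 0.25], label=0)        # log(2)
loss = total_loss(cls, ssl)
```

The edge between the orthogonal embeddings contributes the entire self-supervised penalty, which is the signal that pulls connected nodes toward similar representations during training.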

5. Experiment

5.1. Datasets

The data for the experiment were sourced from two primary datasets. We show the statistics in Table 1 and Figure 2 (for Dataset 1).
Dataset 1, sourced from the “SMP2020 Weibo Emotion Classification Technology Evaluation”, contains COVID-19-related Weibo posts categorized into six emotional labels: “neutral”, “happy”, “angry”, “sad”, “fear”, and “surprise”. For this study, the dataset was simplified to two sentiment categories: “Positive” (comprising “happy” and “surprise” emotions) with 4620 posts, and “Negative” (including “angry”, “sad”, and “fear” emotions) with 2526 posts. The “neutral” posts were removed to refine the focus on binary sentiment classification and improve sentiment analysis reliability. In total, the processed Dataset 1 includes 8606 training samples, 2000 validation samples, and 3000 test samples.
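The label simplification for Dataset 1 amounts to a lookup table plus a filter. The sketch below assumes records arrive as `(text, emotion)` pairs, which is a hypothetical format; the names `EMOTION_TO_SENTIMENT` and `to_binary` are likewise illustrative. The emotion grouping itself follows the description above.

```python
# Mapping from the six original emotion labels to binary sentiment.
# "neutral" is deliberately absent, so neutral posts are dropped.
EMOTION_TO_SENTIMENT = {
    "happy": "Positive",
    "surprise": "Positive",
    "angry": "Negative",
    "sad": "Negative",
    "fear": "Negative",
}


def to_binary(records):
    """Convert (text, emotion) pairs to (text, sentiment), skipping unmapped labels."""
    return [
        (text, EMOTION_TO_SENTIMENT[emotion])
        for text, emotion in records
        if emotion in EMOTION_TO_SENTIMENT
    ]


# Toy records; the texts are placeholders, not real dataset entries.
sample = [("post A", "happy"), ("post B", "neutral"), ("post C", "fear")]
binary = to_binary(sample)
```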
Dataset 2 is derived from the “weibo_senti_100k” dataset, featuring 119,984 labeled Sina Weibo comments—59,993 positive and 59,991 negative. This dataset provides straightforward sentiment labels, supporting precise sentiment classification tasks and allowing us to evaluate the model’s performance and generalization on Weibo comment data.

5.2. Example of Data

Table 2 shows some example posts.
The sample covers a diverse set of posts categorized by sentiment (labeled 0 for negative/neutral and 1 for positive) and reflects the various emotional tones and expressions often found on social media platforms.
  • Sentiment Label Distribution: The table predominantly features posts labeled “0”, representing neutral or negative sentiments, with a smaller portion labeled “1” for positive sentiments. This distribution illustrates the range of emotions present in social media, from frustration and disappointment to gratitude and joy.
  • Role of Emoticons and Social Media Language: Emoticons like [Tears], [Dizzy], [Haha], and [Playful] are integral to sentiment interpretation, as these symbols often convey emotions more directly than words alone. This underscores the unique challenge of sentiment analysis on platforms like Weibo, where textual and visual elements combine to express sentiment.
  • Contextual Complexity of Posts: Some posts, such as ID 62050, mention specific events or contexts (e.g., “More negative news about CMB”), making sentiment difficult to assess without background information. Additionally, comments on shared content (e.g., ID 81472) add complexity, as accurate sentiment analysis requires understanding both the primary text and the referenced content.
  • Direct vs. Indirect Sentiment Expression: Positive posts labeled “1” often express clear sentiments, such as gratitude or well wishes (e.g., IDs 7777 and 6598). In contrast, posts labeled “0” show negative sentiments, including frustration and confusion, as seen in IDs 100399 and 82398, where users express dissatisfaction with experiences like getting lost or facing unclear airline policies.
This analysis highlights the nuanced challenges in Weibo sentiment analysis, where accurate assessment requires interpreting not only direct emotional cues but also contextual references and emoticons.

5.3. Environment

The system configuration and tool settings used in the experiment are shown in Table 3. The experiments were conducted on an Ubuntu 18.04 operating system, using PyCharm as the development environment and Python 3.11 as the programming language. A laptop equipped with an RTX 3090 Super GPU was utilized to accelerate model training and inference, benefiting from enhanced GPU computing power to optimize processing speed.

5.4. Data Preprocessing

This section illustrates the flow of the primary preprocessing steps used in our experiments.
1. Text Cleaning: To remove noise from the text, we applied regular expressions to filter out HTML tags, irrelevant URLs, emoticons, and extraneous letters and numbers, retaining only the essential text content.
2. Word Segmentation: We employed the Jieba word segmentation library, a versatile tool for Chinese word segmentation. Jieba offers several segmentation modes: precise, full, and search engine modes. Additionally, it allows for custom dictionaries, enabling tailored segmentation for specific domain vocabularies.
3. Stopword Removal: Stopwords are commonly removed in text preprocessing, as they do not contribute meaningful information. We utilized the Harbin Institute of Technology’s stopword list as our primary source and performed secondary filtering with the Baidu stopword list to enhance text quality.
4. Numerical Conversion: We used a Tokenizer to transform the cleaned text data into numerical format, making them compatible with model processing.
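The four steps above can be sketched as a small pipeline. This is a hedged illustration that uses only the Python standard library: the regular expressions are plausible guesses rather than the patterns actually used, the `segment` function is a whitespace-splitting stand-in for Jieba (in practice a call such as `jieba.lcut(text)` would go there), the one-word stopword set is a placeholder for the HIT and Baidu lists, and reserving id 0 for padding is an assumption.

```python
import re

HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+")
BRACKET_EMOTICON = re.compile(r"\[[^\[\]]{1,10}\]")  # e.g. [Tears], [Haha]
LATIN_OR_DIGIT = re.compile(r"[A-Za-z0-9]+")


def clean(text):
    """Step 1: strip HTML tags, URLs, bracketed emoticons, letters and digits."""
    for pattern in (HTML_TAG, URL, BRACKET_EMOTICON, LATIN_OR_DIGIT):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text):
    """Step 2 stand-in: whitespace split. A real pipeline would call Jieba here."""
    return text.split()


def remove_stopwords(tokens, stopwords):
    """Step 3: drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def build_vocab(token_lists):
    """Step 4a: assign integer ids in first-seen order (0 reserved for padding)."""
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab) + 1)
    return vocab


def encode(tokens, vocab):
    """Step 4b: map tokens to ids, skipping out-of-vocabulary tokens."""
    return [vocab[t] for t in tokens if t in vocab]


# Toy post, pre-spaced so the whitespace stand-in can segment it.
raw = "<p>今天 天气 真 好 [哈哈] http://t.cn/abc 123</p>"
tokens = remove_stopwords(segment(clean(raw)), stopwords={"真"})
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
```

The order of the cleaning patterns matters: URLs are removed before the letter-and-digit pattern, otherwise fragments such as `://` would be left behind.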

5.5. Experimental Parameters

For model training, the Embedding layer was used to encode the input integer sequences into dense vector representations. The graph neural network architecture included 128 hidden units with a dropout rate of 0.5. We set the batch size to 16 and the learning rate to 0.0001.

5.6. Baseline Models

We evaluate our model’s performance against several established baselines in text classification and sentiment analysis:
  • Word2Vec: A neural network model that learns word embeddings by predicting context words (skip-gram) or target words (CBOW), effectively capturing semantic relationships between words.
  • FastText: An extension of Word2Vec that represents each word as a collection of character n-grams, enabling the model to capture subword information and improve performance with rare or misspelled words through enhanced morphological understanding.
  • K-Nearest Neighbors (KNN): A non-parametric, distance-based classification method that assigns labels based on the majority label among the k nearest neighbors, often utilizing word embeddings or document vectors to measure proximity in text data.
  • Convolutional Neural Networks (CNNs): Apply 1D convolutional layers on word embeddings to extract local n-gram features, which are aggregated via pooling layers to produce fixed-size vectors for classification tasks.
  • Long Short-Term Memory (LSTM): A recurrent neural network (RNN) variant designed for sequential data, employing gates to regulate information flow and effectively capturing long-term dependencies in text, which is particularly useful for sentiment analysis.
  • CNN-BiLSTM: Combines CNNs to extract local patterns with a bidirectional LSTM (BiLSTM) to capture contextual information from both past and future tokens, leveraging the strengths of both architectures for improved performance.
  • Gated Recurrent Unit (GRU): A streamlined alternative to LSTM with fewer gates and no separate cell state, providing efficient training and effective modeling of sequential dependencies, especially for shorter text sequences.
  • Dual-Channel Graph [40]: Utilizes attention mechanisms to capture syntactic structures and multi-aspect sentiment dependencies, improving the model’s interpretation of complex, multi-faceted sentences.
  • Knowledge-Enhanced Graph [39]: Incorporates external sentiment vocabularies to enrich aspect-based sentiment analysis by enhancing the model’s understanding of sentiment-heavy words and phrases, representing the current state of the art.
  • Gaussian Similarity Modeling (GSM) [41]: Employs Gaussian similarity metrics to enhance GNN performance by improving the representation of node relationships.
  • Cross-Channel Graph Information Bottleneck (CCGIB) [42]: Leverages cross-channel graph information to improve GNN performance by effectively balancing node information and graph sparsity.

5.7. Classification Results

The results presented in Table 4 highlight the performance of various sentiment classification algorithms across two distinct datasets. Each model’s efficacy is measured through accuracy, precision, and F1 scores, which together provide a well-rounded understanding of each model’s classification ability. The models range from simpler approaches, like K-Nearest Neighbors (KNN), to more complex neural network-based models, including CNN, LSTM, GRU, and several graph-enhanced architectures. The performance metrics indicate a clear trend: more sophisticated architectures generally achieve higher accuracy and F1 scores, particularly on Dataset 2, which may suggest that this dataset has characteristics that benefit from advanced, high-capacity models.
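For reference, the three reported metrics can be computed from predictions as in the sketch below. The function name `binary_metrics` and the toy label vectors are illustrative assumptions, and the choice of label 1 as the positive class follows the labeling convention described for Table 2. The numbers produced here are unrelated to the results in Table 4.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, f1


# Toy labels for illustration only.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
accuracy, precision, f1 = binary_metrics(y_true, y_pred)
```

Because F1 folds recall into the score, a model can post high accuracy and precision on an imbalanced dataset while still showing a lower F1 if it misses many positive posts.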
Among the traditional word embedding models, FastText outperforms Word2Vec by a notable margin, achieving a 4% higher accuracy on Dataset 1 and a 3% increase on Dataset 2. This improvement in performance highlights the benefits of subword-level information in FastText, which likely aids in capturing finer nuances in sentiment classification. KNN, as expected, yields the lowest scores among all algorithms, underscoring its limitations in handling complex sentiment data compared to deep learning models. Notably, the CNN and LSTM models both demonstrate strong performance, with accuracy and F1 scores surpassing 90% on both datasets, showing the advantages of capturing spatial and sequential information in text data, respectively.
The CNN-BiLSTM hybrid and the GRU model improve incrementally on the single CNN and LSTM models. The hybrid combines convolutional and recurrent layers, which leads to better contextual understanding and feature extraction: it achieves 93.08% accuracy on Dataset 1 and 98.16% on Dataset 2. The GRU model is slightly ahead on Dataset 1 (93.66%) and slightly behind on Dataset 2 (97.50%).
Among the baselines, the graph-based models achieve the highest accuracy. The Dual-Channel Graph model reaches 96.20% accuracy on Dataset 1 and 98.85% on Dataset 2, while the Knowledge-Enhanced Graph model slightly surpasses it in accuracy, with 96.75% and 99.00% on Datasets 1 and 2, respectively, although its Dataset 1 F1 score (95.10%) is below that of the Dual-Channel Graph (95.80%). These models capitalize on graph-based techniques that enhance text representations by capturing complex relationships and contextual dependencies. The Knowledge-Enhanced Graph model, with added semantic information, generalizes well across both datasets, which is particularly advantageous for sentiment classification tasks with nuanced expressions.
Our model, which combines both dual-channel and Knowledge-Enhanced Graph techniques, sets a new performance benchmark with 97.51% accuracy on Dataset 1 and an impressive 99.56% on Dataset 2. This model achieves the highest precision and F1 scores as well, indicating a robust capacity to handle both positive and negative sentiments accurately. Its enhanced architecture likely allows for the extraction of more comprehensive sentiment features, leading to superior generalization. The consistently high performance across both datasets suggests that this model could be well suited for real-world applications in sentiment analysis, where data diversity and complexity often challenge simpler models.
In summary, these results affirm that complex, graph-based models provide the best performance for sentiment classification, with each incremental architectural enhancement translating to measurable improvements in classification metrics. The higher performance of “Our Model” indicates the effectiveness of combining dual-channel and knowledge-enhanced techniques, especially for nuanced sentiment classification tasks, and underscores the potential of these advanced architectures in real-world applications.

5.8. Ablation

The ablation study in Table 5 shows how removing individual components of our architecture affects performance on the two datasets. For Dataset 1, removing the Post-to-Post Link lowers the F1 score to 95.50%, compared with 96.12% for the full model. Removing the Word-to-Post Link lowers it to 95.60%, and excluding the self-supervised loss lowers it to 95.80%. Removing the gating mechanism has the smallest effect, with an F1 score of 95.90%. The complete model achieves the highest accuracy (97.51%) and F1 score (96.12%) on Dataset 1, indicating that every component, including the gate, contributes to performance.
For Dataset 2, the pattern is consistent, with the complete model outperforming the ablated versions. The model without the Post-to-Post Link has an F1 score of 98.30%, slightly below the full model’s 98.82%. Removing the Word-to-Post Link or the self-supervised loss leads to similar minor drops (98.45% and 98.50%, respectively), suggesting that each component contributes incrementally to model effectiveness. Finally, removing the gating mechanism also causes a performance drop, to an F1 score of 98.35%. The full model achieves the highest performance across all metrics, emphasizing the importance of each component in enhancing accuracy, precision, and F1 scores.
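The component contributions can be read off Table 5 more directly by computing the F1 drop of each ablated variant relative to the full model. The sketch below does this with the F1 values copied from Table 5; the variable and function names are hypothetical.

```python
# F1 scores (%) copied from Table 5: (Dataset 1, Dataset 2).
FULL_MODEL_F1 = (96.12, 98.82)
ABLATED_F1 = {
    "No Post-to-Post Link": (95.50, 98.30),
    "No Word-to-Post Link": (95.60, 98.45),
    "No Self-Supervised Loss": (95.80, 98.50),
    "No Gate Mechanism": (95.90, 98.35),
}


def f1_drops(full, ablated):
    """Percentage-point F1 drop of each ablated variant vs. the full model."""
    return {
        name: tuple(round(f - a, 2) for f, a in zip(full, scores))
        for name, scores in ablated.items()
    }


drops = f1_drops(FULL_MODEL_F1, ABLATED_F1)
# Rank components by their Dataset 1 drop, largest first.
ranking = sorted(drops, key=lambda name: drops[name][0], reverse=True)
```

On these numbers, removing the Post-to-Post Link gives the largest F1 drop on both datasets (0.62 and 0.52 points), while the smallest drop is the gate on Dataset 1 (0.22 points) and the self-supervised loss on Dataset 2 (0.32 points).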

5.9. Sensitivity Check

Recall that we use a weight λ to balance the two losses:
$$L = L_{\text{class}} + \lambda L_{\text{self-supervised}}$$
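The weighted objective above can be sketched in a few lines. The example assumes a cross-entropy classification term and uses a placeholder number for the self-supervised term, since its exact form is not given in this section; all names and values are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class given predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))


def combined_loss(class_loss, self_supervised_loss, lam):
    """Weighted sum: class_loss + lam * self_supervised_loss."""
    if lam < 0:
        raise ValueError("lambda is expected to be non-negative")
    return class_loss + lam * self_supervised_loss


# Hypothetical values for a single post.
class_loss = cross_entropy([0.2, 0.8], label=1)  # classification term
ssl_loss = 0.5                                   # placeholder self-supervised term
total = combined_loss(class_loss, ssl_loss, lam=0.1)
```

With `lam=0` the objective reduces to the classification loss alone, and larger values give the self-supervised term more influence on training.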
In the sensitivity analysis shown in Figure 3, we explore how varying λ affects the performance of our model relative to a fixed reference, the Knowledge-Enhanced Graph (static). This baseline has no λ parameter, so its score does not change as λ is adjusted; it serves purely as a benchmark against which to evaluate the fluctuations in our model’s performance.
The chart shows two lines: the flat blue line representing the Knowledge-Enhanced Graph (static), and the dynamic orange dashed line representing our model. As λ is varied along the x-axis, our model exhibits significant changes in its output, indicating sensitivity to this parameter. The performance of our model initially rises, peaking at a certain λ value before gradually declining. This pattern suggests that our model is optimized for a specific range of λ values, where it performs most effectively, and that its performance diminishes when λ moves beyond this optimal range.
Despite the sensitivity of our model to the λ parameter, it consistently outperforms the Knowledge-Enhanced Graph (static) across all tested λ values. This finding is significant because it highlights our model’s ability to adapt to parameter variations while maintaining an edge over the static baseline. The reference line, though stable, is effectively outpaced by our model at every point, demonstrating the latter’s flexibility and superior performance.
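A sensitivity check of this kind amounts to a small grid search over λ. The following sketch shows the procedure under stated assumptions: `toy_validation_f1` is a stand-in for training and validating the model at a given λ, its curve is invented (it is not the curve in Figure 3), and the candidate values and baseline score are hypothetical.

```python
def sweep_lambda(candidates, evaluate):
    """Evaluate each candidate weight and return (best_lambda, scores)."""
    scores = {lam: evaluate(lam) for lam in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def toy_validation_f1(lam):
    """Stand-in for training and validating a model at a given lambda.

    Hypothetical curve that rises, peaks, and then declines.
    """
    return 0.95 - (lam - 0.3) ** 2


candidates = [0.0, 0.1, 0.3, 0.5, 0.8]
best_lambda, scores = sweep_lambda(candidates, toy_validation_f1)

baseline = 0.60  # hypothetical flat reference score
always_above = all(score > baseline for score in scores.values())
```

In practice `evaluate` would train the model with the given weight and return a validation metric, so each candidate costs one full training run.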

5.10. Training and Validation Losses

Figure 4 demonstrates the consistent and smooth training of our method as indicated by the gradual decline in training loss over time. Unlike many models that tend to overfit quickly, our method shows a minimal gap between the training and validation losses, even after extended epochs. The validation loss remains stable and does not exhibit the sharp upward spikes typical of overfitting, reflecting the robustness of our approach. This suggests that our model generalizes well across both training and validation data, making it less prone to overfitting and capable of maintaining performance over time.
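The two signals discussed here, the gap between training and validation loss and the absence of a sustained rise in validation loss, can be checked programmatically. The sketch below is one common way to do so, using a patience-based early-stopping rule; the loss values are hypothetical and are not read off Figure 4.

```python
def generalization_gaps(train_losses, val_losses):
    """Per-epoch gap between validation and training loss."""
    return [v - t for t, v in zip(train_losses, val_losses)]


def early_stop_epoch(val_losses, patience=3):
    """Index of the epoch at which training would stop, or None.

    Stops once validation loss has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical loss curves.
train = [0.90, 0.60, 0.45, 0.38, 0.34, 0.32]
val = [0.95, 0.66, 0.50, 0.44, 0.41, 0.39]
gaps = generalization_gaps(train, val)
stop = early_stop_epoch(val, patience=3)
```

A small, stable gap together with `stop` being `None` corresponds to the behavior described above; a widening gap or an early stop would point toward overfitting.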

6. Visualization of Attention Maps

The visualization of attention score maps (Figure 5) reveals key insights into the model’s mechanisms for interpreting sentiment across different types of review content. By examining the distribution of attention scores in each sample, we can assess the model’s effectiveness in identifying sentiment-bearing elements within text.
Sample 1: Negative Sentiment. In the first sample, “Too much! @Rexzhenghao: More negative news about CMB lately…”, high attention scores are assigned to sentiment-rich phrases such as “Too much!” and “negative news”. This distribution indicates the model’s focus on words that express strong sentiment, as these words likely inform the negative classification of the post. By emphasizing these emotionally charged terms, the model shows its ability to prioritize critical phrases that contribute to an overall negative tone.
Sample 2: Mixed Sentiment. The second sample, “A little tempted to join???? [Sneak smile] Still deciding on the time [Frustrated]”, presents a more complex sentiment structure. Attention scores are distributed across phrases such as “tempted to join” and “still deciding”, reflecting a balance between positive curiosity and hesitation. The model appears to account for emoticons like “[Sneak smile]” and “[Frustrated]” in its score allocation, suggesting that it treats these symbols as mood indicators. This spread of attention underscores the model’s capacity to capture ambivalence, an essential feature in sentiment analysis when dealing with mixed signals.
Sample 3: Positive Sentiment. In the final sample, “[Great] Thanks to everyone supporting Juanwa’s sesame! [Love you]”, attention scores are concentrated on overtly positive expressions like “Great”, “Thanks”, and “Love you”. These words carry strong positive connotations, which the model prioritizes in its interpretation. By focusing on these appreciative and affectionate terms, the attention mechanism identifies signals of positive sentiment, supporting an accurate sentiment prediction.
Overall, the attention score maps show a consistent pattern where the model prioritizes words that convey emotional tone, especially those that are sentiment laden or directly indicative of the review’s mood. This pattern suggests an effective alignment between attention distribution and sentiment-bearing elements within text. Such insights can be instrumental in refining the model’s attention mechanisms, ensuring a greater focus on sentiment-relevant words and improving sentiment analysis accuracy. These observations further highlight the potential for using attention-based interpretability to validate and adjust the model behavior in sentiment prediction tasks.
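To make the reading of an attention map concrete, the sketch below normalizes raw token scores with a softmax and lists the highest-weighted tokens. It is a generic illustration rather than the model’s word-to-post attention: the tokens are taken from Sample 3, while the raw scores and function names are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_tokens(tokens, raw_scores, k=2):
    """Return the k (token, weight) pairs with the highest attention weight."""
    weights = softmax(raw_scores)
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Tokens from Sample 3; the raw scores are invented for illustration.
tokens = ["[Great]", "Thanks", "to", "everyone", "[Love you]"]
raw_scores = [2.0, 1.5, -1.0, 0.0, 1.8]
highlighted = top_tokens(tokens, raw_scores, k=3)
```

Listing the top-weighted tokens in this way is a simple check on whether attention concentrates on sentiment-bearing words and emoticons rather than on function words.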

6.1. Time and Resource Analysis

The training process is efficient, as evidenced by the consistent growth of the F1 score over time in Figure 6. Starting from 0.70 in the initial phase, the model reaches a final F1 score of 0.95 within three hours. This rapid convergence highlights the effectiveness of the learning algorithm and its ability to optimize performance over a relatively short period. Such efficiency is crucial for iterative development, enabling researchers to test and deploy updates quickly without extensive computational delays. Additionally, the steady increase in performance metrics indicates robust generalization and the absence of significant overfitting during training.
One of the most notable strengths of the model is its resource-efficient architecture. The training process uses only 16 GB of memory, underlining its suitability for environments with limited computational resources. This efficiency is further amplified by the adoption of a mini-batch training strategy, which allows the model to handle large-scale datasets effectively without requiring excessive hardware. By dividing the data into manageable mini-batches, the model ensures that memory usage remains low while maintaining high performance. This design choice not only reduces the cost of infrastructure but also ensures scalability, making the model adaptable for a wide range of applications, from small-scale research projects to large-scale industrial tasks.
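The mini-batch strategy described above can be sketched as a generator that yields one batch at a time, so that only the current batch needs to be processed in memory. The batch size, seed, and corpus below are hypothetical; the batch size used in the experiments is not stated in this section.

```python
import random


def iter_minibatches(samples, batch_size, shuffle=True, seed=0):
    """Yield successive mini-batches so only one batch is processed at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(len(samples)))
    if shuffle:
        # A seeded generator keeps the shuffle reproducible.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


# Hypothetical corpus of ten post IDs, processed four at a time.
posts = list(range(10))
batches = list(iter_minibatches(posts, batch_size=4))
```

Every sample appears in exactly one batch per pass, and the last batch is simply smaller when the corpus size is not a multiple of the batch size.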

6.2. Applying to Other Social Media and Language

The results of our sentiment analysis on Twitter data [45], shown in Table 6, demonstrate the robustness of our proposed model compared to existing approaches. Our model achieves the highest accuracy (76.21%) and F1 score (75.32%); its precision (74.32%) matches that of the Knowledge-Enhanced Graph and is slightly below that of CCGIB (74.89%). It outperforms the GRU model, Dual-Channel Graph, and GSM on all three metrics, indicating the effectiveness of our enhancements in capturing nuanced sentiment patterns. The Knowledge-Enhanced Graph model and CCGIB, which also leverage external information and graph structures, come closest: CCGIB trails by 0.54 points in accuracy and 0.04 points in F1. This suggests that our integration of domain-specific knowledge and refined feature representation offers an edge in classification tasks, albeit a modest one on this dataset.
The promising performance of our model on Twitter sentiment analysis suggests its potential applicability to other social media platforms, such as Facebook, Instagram, and YouTube. These platforms often feature diverse linguistic styles, including short posts, comments, and hashtags, where the adaptability of our approach in capturing contextual and semantic nuances can prove valuable. Furthermore, extending our model to support multilingual sentiment analysis could enhance its utility for global applications, particularly in addressing sentiment dynamics in languages with limited annotated datasets. By leveraging transfer learning or cross-lingual embedding techniques, our model can be fine-tuned to analyze sentiment across languages, enabling insights into cultural and regional sentiment trends and fostering broader applications in marketing, policy-making, and social impact analysis.

7. Conclusions and Future Work

In this paper, we have presented a novel approach to Weibo sentiment analysis, addressing the unique linguistic and emotional complexities of social media discourse through a graph-based framework that integrates self-supervised learning. By leveraging a sentiment graph with relational structures and an innovative gated mechanism, our model captures the nuanced emotional cues that traditional sequence-based models often miss. Our approach enhances the ability to interpret multi-layered emotional expressions, making it particularly relevant for the real-time monitoring of public sentiment, especially during crises such as the COVID-19 pandemic. Through this framework, we demonstrate significant improvements in accurately identifying and interpreting the subtle emotional shifts and intense sentiment fluctuations that characterize Weibo posts. These advancements underscore the potential of our model to support applications in mental health, policy-making, and societal well-being by offering more reliable insights into collective moods and emerging emotional trends.
Future Work: Future research could expand upon our sentiment graph framework by incorporating multimodal data sources, such as images and videos, which are prevalent in Weibo posts and can enhance emotional interpretation. Additionally, the application of cross-lingual transfer learning to this framework may allow it to be adapted to other social media platforms with different languages and cultural nuances. Another promising direction is the refinement of the self-supervised loss function to further capture temporal dynamics, enabling the model to track sentiment changes within user posts over time. Finally, exploring privacy-preserving mechanisms to analyze social media sentiment while safeguarding user data could broaden the adoption of this technology in sensitive contexts like mental health and crisis response, making it a valuable tool for both researchers and practitioners.

Author Contributions

Conceptualization, C.W.; Methodology, C.W.; Validation, C.W.; Formal analysis, C.W.; Investigation, C.W.; Resources, C.W. and A.S.; Data curation, C.W.; Writing—original draft, C.W.; Writing—review & editing, J.K. and S.T.; Visualization, C.W.; Supervision, J.K.; Project administration, J.K. and A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lyu, X.; Chen, Z.; Wu, D.; Wang, W. Sentiment analysis on Chinese Weibo regarding COVID-19. In Proceedings of the Natural Language Processing and Chinese Computing: 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, 14–18 October 2020; Part I 9. Springer: Berlin/Heidelberg, Germany, 2020; pp. 710–721.
  2. Wu, W. A sentiment analysis approach to discover public panic: Based on weibo COVID-19 data. Soc. Netw. 2022, 11, 33–39.
  3. Li, H.; Ma, Y.; Ma, Z.; Zhu, H. Weibo text sentiment analysis based on bert and deep learning. Appl. Sci. 2021, 11, 10774.
  4. Li, S.; Wang, Y.; Xue, J.; Zhao, N.; Zhu, T. The impact of COVID-19 epidemic declaration on psychological consequences: A study on active Weibo users. Int. J. Environ. Res. Public Health 2020, 17, 2032.
  5. Zhang, L.; Pentina, I. Motivations and usage patterns of Weibo. Cyberpsychology Behav. Soc. Netw. 2012, 15, 312–317.
  6. Brenner, M.H.; Bhugra, D. Acceleration of anxiety, depression, and suicide: Secondary effects of economic disruption related to COVID-19. Front. Psychiatry 2020, 11, 592467.
  7. Lin, M. Discursive Construction of Personal and Social Identities by Chinese Celebrities on Sina Weibo. Ph.D. Thesis, Hong Kong Polytechnic University, Hong Kong, China, 2018.
  8. Liu, P.; Chen, W.; Ou, G.; Wang, T.; Yang, D.; Lei, K. Sarcasm detection in social media based on imbalanced classification. In Proceedings of the Web-Age Information Management: 15th International Conference, WAIM 2014, Macau, China, 16–18 June 2014; Proceedings 15. Springer: Berlin/Heidelberg, Germany, 2014; pp. 459–471.
  9. Jim, J.R.; Talukder, M.A.R.; Malakar, P.; Kabir, M.M.; Nur, K.; Mridha, M. Recent advancements and challenges of nlp-based sentiment analysis: A state-of-the-art review. Nat. Lang. Process. J. 2024, 6, 100059.
  10. Poria, S.; Hazarika, D.; Majumder, N.; Mihalcea, R. Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Trans. Affect. Comput. 2020, 14, 108–132.
  11. Mostafavi, M.; Porter, M.D.; Robinson, D.T. Contextual Embeddings in Sociological Research: Expanding the Analysis of Sentiment and Social Dynamics. Sociol. Methodol. 2024, 00811750241260729.
  12. Thieme, A.; Belgrave, D.; Doherty, G. Machine learning in mental health: A systematic review of the HCI literature to support the development of effective and implementable ML systems. ACM Trans. Comput.-Hum. Interact. (TOCHI) 2020, 27, 1–53.
  13. Zhong, B.; Huang, Y.; Liu, Q. Mental health toll from the coronavirus: Social media usage reveals Wuhan residents’ depression and secondary trauma in the COVID-19 outbreak. Comput. Hum. Behav. 2021, 114, 106524.
  14. Resnik, P.; Lin, J. Evaluation of NLP systems. In The Handbook of Computational Linguistics and Natural Language Processing; Wiley: Hoboken, NJ, USA, 2010; pp. 271–295.
  15. Mihalcea, R.; Liu, H.; Lieberman, H. NLP (natural language processing) for NLP (natural language programming). In Proceedings of the Computational Linguistics and Intelligent Text Processing: 7th International Conference, CICLing 2006, Mexico City, Mexico, 19–25 February 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 319–330.
  16. Nadkarni, P.M.; Ohno-Machado, L.; Chapman, W.W. Natural language processing: An introduction. J. Am. Med Inform. Assoc. 2011, 18, 544–551.
  17. Acheampong, F.A.; Nunoo-Mensah, H.; Chen, W. Transformer models for text-based emotion detection: A review of BERT-based approaches. Artif. Intell. Rev. 2021, 54, 5789–5829.
  18. Alswaidan, N.; Menai, M.E.B. A survey of state-of-the-art approaches for emotion recognition in text. Knowl. Inf. Syst. 2020, 62, 2937–2987.
  19. Kumari, Y.V. Explainable AI Framework Through Multi-Context Multi-Dimensional Graph Neural Network; University of Missouri-Kansas City: Kansas City, MO, USA, 2023.
  20. Yue, L.; Chen, W.; Li, X.; Zuo, W.; Yin, M. A survey of sentiment analysis in social media. Knowl. Inf. Syst. 2019, 60, 617–663.
  21. Yadav, A.; Vishwakarma, D.K. Sentiment analysis using deep learning architectures: A review. Artif. Intell. Rev. 2020, 53, 4335–4385.
  22. Peng, S.; Cao, L.; Zhou, Y.; Ouyang, Z.; Yang, A.; Li, X.; Jia, W.; Yu, S. A survey on deep learning for textual emotion analysis in social networks. Digit. Commun. Networks 2022, 8, 745–762.
  23. Singh, A.; Nowak, R.; Zhu, J. Unlabeled data: Now it helps, now it doesn’t. Adv. Neural Inf. Process. Syst. 2008, 21.
  24. Montero Quispe, K.G.; Utyiama, D.M.; Dos Santos, E.M.; Oliveira, H.A.; Souto, E.J. Applying self-supervised representation learning for emotion recognition using physiological signals. Sensors 2022, 22, 9102.
  25. Wu, Y.; Daoudi, M.; Amad, A. Transformer-based self-supervised multimodal representation learning for wearable emotion recognition. IEEE Trans. Affect. Comput. 2023, 15, 157–172.
  26. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113.
  27. Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  28. Yadav, A.; Chaudhary, A.; Gupta, A. Advancing Sentiment Understanding in social media through Dynamic Contextual Embedding. In Proceedings of the 2024 3rd International Conference for Innovation in Technology (INOCON), Bangalore, India, 1–3 March 2024; pp. 1–7.
  29. Zhao, S.; Chen, L.; Liu, Y.; Yu, M.; Han, H. Deriving anti-epidemic policy from public sentiment: A framework based on text analysis with microblog data. PLoS ONE 2022, 17, e0270953.
  30. Duan, J.; Zhai, W.; Cheng, C. Crowd detection in mass gatherings based on social media data: A case study of the 2014 shanghai new year’s eve stampede. Int. J. Environ. Res. Public Health 2020, 17, 8640.
  31. Xie, R.; Chu, S.; Chiu, D.; Wang, Y. Exploring public response to COVID-19 on weibo with lda topic modeling and sentiment analysis. Data Inf. Manag. 2021, 5, 86–99.
  32. Kwon, H. AudioGuard: Speech Recognition System Robust against Optimized Audio Adversarial Examples. Multimed. Tools Appl. 2024, 83, 57943–57962.
  33. Ma, J.; Chen, G.; Lv, W. Research and analysis of psychological data based on machine learning methods. Int. J. Wirel. Mob. Comput. 2022, 22, 1.
  34. Chandrasekaran, G.; Dhanasekaran, S.; Moorthy, C.; Arul Oli, A. Multimodal sentiment analysis leveraging the strength of deep neural networks enhanced by the XGBoost classifier. Comput. Methods Biomech. Biomed. Eng. 2024, 1–23.
  35. Xu, Y.; Liu, Z.; Zhao, J.; Su, C. Weibo sentiments and stock return: A time-frequency view. PLoS ONE 2017, 12, e0180723.
  36. Das, N.; Sadhukhan, B.; Chatterjee, R.; Chakrabarti, S. Integrating sentiment analysis with graph neural networks for enhanced stock prediction: A comprehensive survey. Decis. Anal. J. 2024, 10, 100417.
  37. Mahdi, A.S.; Shati, N.M. A Survey on Fake News Detection in Social Media Using Graph Neural Networks. J. Al-Qadisiyah Comput. Sci. Math. 2024, 16, 23–41.
  38. Rad, R.A.; Yamaghani, M.R.; Nourbakhsh, A. A survey of sentiment analysis methods based on graph neural network. 2023.
  39. Zhao, Y.; Mamat, M.; Aysa, A.; Ubul, K. Knowledge-fusion-based iterative graph structure learning framework for implicit sentiment identification. Sensors 2023, 23, 6257.
  40. Lan, Z.; He, Q.; Yang, L. Dual-channel interactive graph convolutional networks for aspect-level sentiment analysis. Mathematics 2022, 10, 3317.
  41. Fan, X.; Gong, M.; Wu, Y.; Tang, Z.; Liu, J. Neural Gaussian Similarity Modeling for Differential Graph Structure Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 11919–11926.
  42. Fan, X.; Gong, M.; Wu, Y.; Zhang, M.; Li, H.; Jiang, X. CCGIB: A Cross-Channel Graph Information Bottleneck Principle. IEEE Trans. Neural Netw. Learn. Syst. 2024.
  43. Chen, S.; Ding, Y.; Xie, Z.; Liu, S.; Ding, H. Chinese weibo sentiment analysis based on character embedding with dual-channel convolutional neural network. In Proceedings of the 2018 IEEE 3rd International conference on cloud computing and big data analysis (ICCCBDA), Chengdu, China, 20–22 April 2018; pp. 107–111.
  44. Baltrušaitis, T.; Ahuja, C.; Morency, L. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 423–443.
  45. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 49–54.
Figure 1. Our framework models the sentiment graph with self-supervised learning to enhance sentiment predictions.
Figure 2. Label statistics for Dataset 1.
Figure 3. Sensitivity analysis of λ.
Figure 4. Training losses.
Figure 5. Attention maps between word and post.
Figure 6. Training Time.
Table 1. Dataset statistics.
Dataset | Category | Number
SMP2020 Weibo Emotion Classification (Dataset 1) | Negative (0) | 2526
weibo_senti_100k (Dataset 2) | Positive (1) | 4620
 | Negative (0) | 59,991
Table 2. Example data table (translated) for weibo_senti_100k.
ID | Label | Review (Translated)
62050 | 0 | Too much! @Rexzhenghao //@Janie_Zhang: More negative news about CMB lately…
68263 | 0 | Hope you’re fine? My “fat blood history” [Dizzy] [Haha] @Pete Uncle
81472 | 0 | A little tempted to join???? [Sneak smile] Still deciding on the time [Frustrated] //@black_crystal: @slender_aunt
42021 | 1 | [Great] Thanks to everyone supporting Juanwa’s sesame! [Love you]
7777 | 1 | The last day of 2013, happily spent in Singapore, wishing all friends: Happy New Year! In 2014, we will be even better [Playful]
100399 | 0 | Went out at noon, got lost, and now I’m getting sunburned. Couldn’t get more tragic. [Tears][Tears][Sweat]
82398 | 0 | Will Malaysia Airlines still deny it? What exactly are they hiding? [Frustrated] //@Headlines: Share on Weibo
106423 | 0 | Croatian fans really love fireworks! The ball didn’t even go in, and smoke is everywhere. [Dizzy]
24798 | 1 | [Hug] Blessings TangRouLou special 8.8 discount >>> http://t.cn/z...
6598 | 1 | Reply to @QianXumingQXM: [Chuckles][Chuckles] //@QianXumingQXM: Brother Yang [good][good][good]
Table 3. System configuration and tool settings used in the experiments.
Environment and Tool Names | Specific Setting Instructions
Operating System | Ubuntu 18.04
CPU | i9-10980HK
Memory | 32 GB
GPU and VRAM Capacity | NVIDIA GeForce RTX 3090
Programming Language | Python 3.11
Deep Learning Framework | PyTorch
Development Tool | Anaconda Environmental Manager
Table 4. Sentiment classification results.
Algorithm | Dataset 1 Accuracy | Dataset 1 Precision | Dataset 1 F1 | Dataset 2 Accuracy | Dataset 2 Precision | Dataset 2 F1
Word2vec model | 85.12% | 84.30% | 84.70% | 88.75% | 87.90% | 88.30%
FastText model | 89.75% | 89.00% | 89.20% | 91.33% | 90.80% | 91.10%
KNN | 78.92% | 77.50% | 78.20% | 82.41% | 81.70% | 82.05%
CNN model | 92.23% | 91.80% | 92.00% | 96.41% | 96.00% | 96.20%
LSTM model | 92.13% | 91.50% | 91.80% | 95.87% | 95.40% | 95.60%
CNN-BiLSTM model | 93.08% | 92.70% | 92.90% | 98.16% | 97.80% | 98.00%
GRU model | 93.66% | 93.30% | 93.45% | 97.50% | 97.20% | 97.35%
Dual-Channel Graph | 96.20% | 95.50% | 95.80% | 98.85% | 98.30% | 98.00%
Knowledge-Enhanced Graph | 96.75% | 94.90% | 95.10% | 99.00% | 98.40% | 98.16%
GSM | 95.82% | 94.60% | 94.85% | 98.70% | 98.10% | 98.25%
CCGIB | 96.45% | 95.20% | 95.40% | 98.90% | 98.35% | 98.50%
Our Model | 97.51% | 96.12% | 96.12% | 99.56% | 98.82% | 98.82%
Table 5. Ablation.
Algorithm | Dataset 1 Accuracy | Dataset 1 Precision | Dataset 1 F1 | Dataset 2 Accuracy | Dataset 2 Precision | Dataset 2 F1
No Post-to-Post Link | 97.00% | 95.30% | 95.50% | 99.20% | 98.50% | 98.30%
No Word-to-Post Link | 97.10% | 95.50% | 95.60% | 99.25% | 98.55% | 98.45%
No Self-Supervised Loss | 97.30% | 95.70% | 95.80% | 99.30% | 98.60% | 98.50%
No Gate Mechanism | 97.40% | 95.90% | 95.90% | 99.35% | 98.35% | 98.35%
Our Model | 97.51% | 96.12% | 96.12% | 99.56% | 98.82% | 98.82%
Table 6. Twitter sentiment classification results.
Algorithm | Accuracy (%) | Precision (%) | F1 (%)
GRU model | 72.45 | 71.32 | 71.88
Dual-Channel Graph | 74.21 | 73.12 | 73.66
Knowledge-Enhanced Graph | 75.11 | 74.32 | 74.72
GSM | 74.89 | 73.87 | 74.38
CCGIB | 75.67 | 74.89 | 75.28
Our Model | 76.21 | 74.32 | 75.32
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, C.; Konpang, J.; Sirikham, A.; Tian, S. Mining Nuanced Weibo Sentiment with Hierarchical Graph Modeling and Self-Supervised Learning. Electronics 2025, 14, 41. https://doi.org/10.3390/electronics14010041