Mining Nuanced Weibo Sentiment with Hierarchical Graph Modeling and Self-Supervised Learning
Abstract
1. Introduction
2. Literature Review
2.1. Traditional Weibo Sentiment Analysis
2.2. Graph Learning for Weibo Sentiment Analysis
3. Problem Definition
4. Methodology
4.1. Construction of Sentiment Graph
4.2. Feature Transformation Using FastText Embeddings
The neighborhood aggregation rule for a node v combines the following terms:
- the set of neighbors of v;
- attention weights that capture the importance of neighboring nodes;
- a learnable weight matrix;
- our proposed learnable gating function, which dynamically controls the influence of each neighboring node based on its sentiment features;
- a non-linear activation function.
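The components listed above could fit together as in the following minimal sketch. The exact update rule is not recoverable from the text here, so everything specific is an assumption made for illustration: the update form ReLU(sum over neighbors u of alpha_vu * g(h_u) * W h_u), dot-product attention normalized with a softmax, a scalar sigmoid gate, and the names `gated_attention_update` and `gate_vec`. It should not be read as the authors' implementation.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_attention_update(v, features, neighbors, W, gate_vec):
    """One aggregation step for node v (assumes v has at least one neighbor).

    Assumed form: ReLU( sum_u alpha_vu * g(h_u) * W h_u ), where
    alpha_vu is a softmax over dot-product scores between projected features
    and g(h_u) = sigmoid(gate_vec . h_u) is a scalar gate per neighbor.
    """
    h_v = matvec(W, features[v])
    projected = [matvec(W, features[u]) for u in neighbors[v]]
    alphas = softmax([dot(h_v, h_u) for h_u in projected])
    gates = [sigmoid(dot(gate_vec, features[u])) for u in neighbors[v]]
    out = [0.0] * len(h_v)
    for alpha, gate, h_u in zip(alphas, gates, projected):
        for i, value in enumerate(h_u):
            out[i] += alpha * gate * value
    return [max(0.0, x) for x in out]  # ReLU as the non-linear activation

# Toy graph: node 0 has neighbors 1 and 2; all values are made up.
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbors = {0: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
print(gated_attention_update(0, features, neighbors, W, gate_vec=[0.0, 0.0]))
```

With a gate vector that responds strongly negatively to the second feature (for example `[0.0, -50.0]`), the contribution of node 2 is suppressed almost entirely, which is the kind of per-neighbor control the gating function is described as providing.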
4.3. Model Training with Additional Self-Supervised Loss
5. Experiment
5.1. Datasets
5.2. Example of Data
- Sentiment Label Distribution: The table predominantly features posts labeled “0”, representing neutral or negative sentiments, with a smaller portion labeled “1” for positive sentiments. This distribution illustrates the range of emotions present in social media, from frustration and disappointment to gratitude and joy.
- Role of Emoticons and Social Media Language: Emoticons like [Tears], [Dizzy], [Haha], and [Playful] are integral to sentiment interpretation, as these symbols often convey emotions more directly than words alone. This underscores the unique challenge of sentiment analysis on platforms like Weibo, where textual and visual elements combine to express sentiment.
- Contextual Complexity of Posts: Some posts, such as ID 62050, mention specific events or contexts (e.g., “More negative news about CMB”), making sentiment difficult to assess without background information. Additionally, comments on shared content (e.g., ID 81472) add complexity, as accurate sentiment analysis requires understanding both the primary text and the referenced content.
- Direct vs. Indirect Sentiment Expression: Positive posts labeled “1” often express clear sentiments, such as gratitude or well wishes (e.g., IDs 7777 and 6598). In contrast, posts labeled “0” show negative sentiments, including frustration and confusion, as seen in IDs 100399 and 82398, where users express dissatisfaction with experiences like getting lost or facing unclear airline policies.
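To make the point about emoticons concrete, the sketch below pulls bracketed emoticon tokens out of a post. It assumes the bracket notation used in the translated samples (for example [Tears]); the function name `extract_emoticons` is hypothetical and is not part of the described pipeline.

```python
import re
from collections import Counter

# Matches a bracketed token such as [Tears] and captures the name inside.
EMOTICON = re.compile(r"\[([^\[\]]+)\]")

def extract_emoticons(post):
    """Return the emoticon names in the order they appear in the post."""
    return EMOTICON.findall(post)

post = "Went out at noon, got lost, and now I am getting sunburned. [Tears][Tears][Sweat]"
names = extract_emoticons(post)
print(names)                          # ['Tears', 'Tears', 'Sweat']
print(Counter(names).most_common(1))  # [('Tears', 2)]
```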
5.3. Environment
5.4. Data Preprocessing
1. Text Cleaning: To remove noise, we applied regular expressions to filter out HTML tags, irrelevant URLs, emoticons, and extraneous letters and numbers, retaining only the essential text content.
2. Word Segmentation: We employed Jieba, a versatile Chinese word segmentation library. Jieba offers three segmentation modes (precise, full, and search engine) and supports custom dictionaries, enabling tailored segmentation for domain-specific vocabularies.
3. Stopword Removal: Stopwords are commonly removed in text preprocessing because they contribute little meaningful information. We used the Harbin Institute of Technology stopword list as our primary source and performed secondary filtering with the Baidu stopword list to enhance text quality.
4. Numerical Conversion: We used a Tokenizer to transform the cleaned text into numerical form, making it compatible with model processing.
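The four steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' code: the regular expressions, the function names, the tiny stopword set, and the sample sentence are all hypothetical, and the integer mapping is a simple stand-in for whichever Tokenizer was actually used. Segmentation calls `jieba.lcut` (precise mode by default) when Jieba is installed and otherwise falls back to one token per character.

```python
import re

# Illustrative patterns; the exact expressions used in the paper are not given.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://[A-Za-z0-9_./?=&%#~:-]+")
EMOTICON = re.compile(r"\[[^\[\]]+\]")  # bracketed emoticons such as [Tears]
ALNUM = re.compile(r"[A-Za-z0-9]+")     # extraneous letters and numbers

def clean_text(text):
    """Step 1: strip HTML tags, URLs, emoticons, letters, and digits."""
    for pattern in (HTML_TAG, URL, EMOTICON, ALNUM):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

def segment(text, cut=None):
    """Step 2: split text into words with the given cut function."""
    if cut is None:
        try:
            import jieba
            cut = jieba.lcut  # precise mode is the default
        except ImportError:
            cut = list        # fallback assumption: one token per character
    return [tok for tok in cut(text) if tok.strip()]

def remove_stopwords(tokens, stopwords):
    """Step 3: drop tokens that appear in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]

def build_vocab(token_lists):
    """Step 4a: assign ids from 1 in order of first appearance (0 = unknown)."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(tokens, vocab):
    """Step 4b: map tokens to integer ids."""
    return [vocab.get(tok, 0) for tok in tokens]

# Pre-spaced sample so that str.split can stand in for a real segmenter.
stopwords = {"的", "了"}
raw = "<b>今天</b> 的 天气 真好 [Haha] http://t.cn/abc 123"
tokens = remove_stopwords(segment(clean_text(raw), cut=str.split), stopwords)
vocab = build_vocab([tokens])
print(tokens, encode(tokens, vocab))
```

In practice the stopword set would be loaded from the two lists named in step 3, and the `cut` argument would be left at its default so that Jieba performs the segmentation.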
5.5. Experimental Parameters
5.6. Baseline Models
- Word2Vec: A neural network model that learns word embeddings by predicting context words (skip-gram) or target words (CBOW), effectively capturing semantic relationships between words.
- FastText: An extension of Word2Vec that represents each word as a collection of character n-grams, enabling the model to capture subword information and improve performance with rare or misspelled words through enhanced morphological understanding.
- K-Nearest Neighbors (KNN): A non-parametric, distance-based classification method that assigns labels based on the majority label among the k nearest neighbors, often utilizing word embeddings or document vectors to measure proximity in text data.
- Convolutional Neural Networks (CNNs): Apply 1D convolutional layers on word embeddings to extract local n-gram features, which are aggregated via pooling layers to produce fixed-size vectors for classification tasks.
- Long Short-Term Memory (LSTM): A recurrent neural network (RNN) variant designed for sequential data, employing gates to regulate information flow and effectively capturing long-term dependencies in text, which is particularly useful for sentiment analysis.
- CNN-BiLSTM: Combines CNNs to extract local patterns with a bidirectional LSTM (BiLSTM) to capture contextual information from both past and future tokens, leveraging the strengths of both architectures for improved performance.
- Gated Recurrent Unit (GRU): A streamlined alternative to LSTM with fewer gates and no separate cell state, providing efficient training and effective modeling of sequential dependencies, especially for shorter text sequences.
- Dual-Channel Graph [40]: Utilizes attention mechanisms to capture syntactic structures and multi-aspect sentiment dependencies, improving the model’s interpretation of complex, multi-faceted sentences.
- Knowledge-Enhanced Graph [39]: Incorporates external sentiment vocabularies to enrich aspect-based sentiment analysis by enhancing the model’s understanding of sentiment-heavy words and phrases, representing the current state of the art.
- Gaussian Similarity Modeling (GSM) [41]: Employs Gaussian similarity metrics to enhance GNN performance by improving the representation of node relationships.
- Cross-Channel Graph Information Bottleneck (CCGIB) [42]: Leverages cross-channel graph information to improve GNN performance by effectively balancing node information and graph sparsity.
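Two of the simpler baselines above can be illustrated in a few lines. The sketch below shows how a FastText-style model decomposes a word into character n-grams (the word is wrapped in boundary markers first) and how a KNN classifier votes over the nearest document vectors by cosine similarity. The function names, the toy vectors, and the choice of cosine similarity are assumptions made for illustration, not the experimental configuration.

```python
import math
from collections import Counter

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def knn_predict(query, vectors, labels, k=3):
    """Majority label among the k vectors most similar to the query."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']

# Toy 2-D document vectors with sentiment labels (1 = positive, 0 = negative).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]
print(knn_predict([1.0, 0.2], vectors, labels, k=3))  # 1
```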
5.7. Classification Results
5.8. Ablation
5.9. Sensitivity Check
5.10. Training and Validation Losses
6. Visualization of Attention Maps
6.1. Time and Resource Analysis
6.2. Applying to Other Social Media and Language
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Lyu, X.; Chen, Z.; Wu, D.; Wang, W. Sentiment analysis on Chinese Weibo regarding COVID-19. In Proceedings of the Natural Language Processing and Chinese Computing: 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, 14–18 October 2020; Part I 9. Springer: Berlin/Heidelberg, Germany, 2020; pp. 710–721.
2. Wu, W. A sentiment analysis approach to discover public panic: Based on Weibo COVID-19 data. Soc. Netw. 2022, 11, 33–39.
3. Li, H.; Ma, Y.; Ma, Z.; Zhu, H. Weibo text sentiment analysis based on BERT and deep learning. Appl. Sci. 2021, 11, 10774.
4. Li, S.; Wang, Y.; Xue, J.; Zhao, N.; Zhu, T. The impact of COVID-19 epidemic declaration on psychological consequences: A study on active Weibo users. Int. J. Environ. Res. Public Health 2020, 17, 2032.
5. Zhang, L.; Pentina, I. Motivations and usage patterns of Weibo. Cyberpsychology Behav. Soc. Netw. 2012, 15, 312–317.
6. Brenner, M.H.; Bhugra, D. Acceleration of anxiety, depression, and suicide: Secondary effects of economic disruption related to COVID-19. Front. Psychiatry 2020, 11, 592467.
7. Lin, M. Discursive Construction of Personal and Social Identities by Chinese Celebrities on Sina Weibo. Ph.D. Thesis, Hong Kong Polytechnic University, Hong Kong, China, 2018.
8. Liu, P.; Chen, W.; Ou, G.; Wang, T.; Yang, D.; Lei, K. Sarcasm detection in social media based on imbalanced classification. In Proceedings of the Web-Age Information Management: 15th International Conference, WAIM 2014, Macau, China, 16–18 June 2014; Proceedings 15. Springer: Berlin/Heidelberg, Germany, 2014; pp. 459–471.
9. Jim, J.R.; Talukder, M.A.R.; Malakar, P.; Kabir, M.M.; Nur, K.; Mridha, M. Recent advancements and challenges of NLP-based sentiment analysis: A state-of-the-art review. Nat. Lang. Process. J. 2024, 6, 100059.
10. Poria, S.; Hazarika, D.; Majumder, N.; Mihalcea, R. Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Trans. Affect. Comput. 2020, 14, 108–132.
11. Mostafavi, M.; Porter, M.D.; Robinson, D.T. Contextual Embeddings in Sociological Research: Expanding the Analysis of Sentiment and Social Dynamics. Sociol. Methodol. 2024, 00811750241260729.
12. Thieme, A.; Belgrave, D.; Doherty, G. Machine learning in mental health: A systematic review of the HCI literature to support the development of effective and implementable ML systems. ACM Trans. Comput.-Hum. Interact. (TOCHI) 2020, 27, 1–53.
13. Zhong, B.; Huang, Y.; Liu, Q. Mental health toll from the coronavirus: Social media usage reveals Wuhan residents’ depression and secondary trauma in the COVID-19 outbreak. Comput. Hum. Behav. 2021, 114, 106524.
14. Resnik, P.; Lin, J. Evaluation of NLP systems. In The Handbook of Computational Linguistics and Natural Language Processing; Wiley: Hoboken, NJ, USA, 2010; pp. 271–295.
15. Mihalcea, R.; Liu, H.; Lieberman, H. NLP (natural language processing) for NLP (natural language programming). In Proceedings of the Computational Linguistics and Intelligent Text Processing: 7th International Conference, CICLing 2006, Mexico City, Mexico, 19–25 February 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 319–330.
16. Nadkarni, P.M.; Ohno-Machado, L.; Chapman, W.W. Natural language processing: An introduction. J. Am. Med Inform. Assoc. 2011, 18, 544–551.
17. Acheampong, F.A.; Nunoo-Mensah, H.; Chen, W. Transformer models for text-based emotion detection: A review of BERT-based approaches. Artif. Intell. Rev. 2021, 54, 5789–5829.
18. Alswaidan, N.; Menai, M.E.B. A survey of state-of-the-art approaches for emotion recognition in text. Knowl. Inf. Syst. 2020, 62, 2937–2987.
19. Kumari, Y.V. Explainable AI Framework Through Multi-Context Multi-Dimensional Graph Neural Network; University of Missouri-Kansas City: Kansas City, MO, USA, 2023.
20. Yue, L.; Chen, W.; Li, X.; Zuo, W.; Yin, M. A survey of sentiment analysis in social media. Knowl. Inf. Syst. 2019, 60, 617–663.
21. Yadav, A.; Vishwakarma, D.K. Sentiment analysis using deep learning architectures: A review. Artif. Intell. Rev. 2020, 53, 4335–4385.
22. Peng, S.; Cao, L.; Zhou, Y.; Ouyang, Z.; Yang, A.; Li, X.; Jia, W.; Yu, S. A survey on deep learning for textual emotion analysis in social networks. Digit. Commun. Networks 2022, 8, 745–762.
23. Singh, A.; Nowak, R.; Zhu, J. Unlabeled data: Now it helps, now it doesn’t. Adv. Neural Inf. Process. Syst. 2008, 21.
24. Montero Quispe, K.G.; Utyiama, D.M.; Dos Santos, E.M.; Oliveira, H.A.; Souto, E.J. Applying self-supervised representation learning for emotion recognition using physiological signals. Sensors 2022, 22, 9102.
25. Wu, Y.; Daoudi, M.; Amad, A. Transformer-based self-supervised multimodal representation learning for wearable emotion recognition. IEEE Trans. Affect. Comput. 2023, 15, 157–172.
26. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113.
27. Devlin, J. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
28. Yadav, A.; Chaudhary, A.; Gupta, A. Advancing Sentiment Understanding in social media through Dynamic Contextual Embedding. In Proceedings of the 2024 3rd International Conference for Innovation in Technology (INOCON), Bangalore, India, 1–3 March 2024; pp. 1–7.
29. Zhao, S.; Chen, L.; Liu, Y.; Yu, M.; Han, H. Deriving anti-epidemic policy from public sentiment: A framework based on text analysis with microblog data. PLoS ONE 2022, 17, e0270953.
30. Duan, J.; Zhai, W.; Cheng, C. Crowd detection in mass gatherings based on social media data: A case study of the 2014 Shanghai New Year’s Eve stampede. Int. J. Environ. Res. Public Health 2020, 17, 8640.
31. Xie, R.; Chu, S.; Chiu, D.; Wang, Y. Exploring public response to COVID-19 on Weibo with LDA topic modeling and sentiment analysis. Data Inf. Manag. 2021, 5, 86–99.
32. Kwon, H. AudioGuard: Speech Recognition System Robust against Optimized Audio Adversarial Examples. Multimed. Tools Appl. 2024, 83, 57943–57962.
33. Ma, J.; Chen, G.; Lv, W. Research and analysis of psychological data based on machine learning methods. Int. J. Wirel. Mob. Comput. 2022, 22, 1.
34. Chandrasekaran, G.; Dhanasekaran, S.; Moorthy, C.; Arul Oli, A. Multimodal sentiment analysis leveraging the strength of deep neural networks enhanced by the XGBoost classifier. Comput. Methods Biomech. Biomed. Eng. 2024, 1–23.
35. Xu, Y.; Liu, Z.; Zhao, J.; Su, C. Weibo sentiments and stock return: A time-frequency view. PLoS ONE 2017, 12, e0180723.
36. Das, N.; Sadhukhan, B.; Chatterjee, R.; Chakrabarti, S. Integrating sentiment analysis with graph neural networks for enhanced stock prediction: A comprehensive survey. Decis. Anal. J. 2024, 10, 100417.
37. Mahdi, A.S.; Shati, N.M. A Survey on Fake News Detection in Social Media Using Graph Neural Networks. J. Al-Qadisiyah Comput. Sci. Math. 2024, 16, 23–41.
38. Rad, R.A.; Yamaghani, M.R.; Nourbakhsh, A. A survey of sentiment analysis methods based on graph neural network. 2023.
39. Zhao, Y.; Mamat, M.; Aysa, A.; Ubul, K. Knowledge-fusion-based iterative graph structure learning framework for implicit sentiment identification. Sensors 2023, 23, 6257.
40. Lan, Z.; He, Q.; Yang, L. Dual-channel interactive graph convolutional networks for aspect-level sentiment analysis. Mathematics 2022, 10, 3317.
41. Fan, X.; Gong, M.; Wu, Y.; Tang, Z.; Liu, J. Neural Gaussian Similarity Modeling for Differential Graph Structure Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 11919–11926.
42. Fan, X.; Gong, M.; Wu, Y.; Zhang, M.; Li, H.; Jiang, X. CCGIB: A Cross-Channel Graph Information Bottleneck Principle. IEEE Trans. Neural Netw. Learn. Syst. 2024.
43. Chen, S.; Ding, Y.; Xie, Z.; Liu, S.; Ding, H. Chinese Weibo sentiment analysis based on character embedding with dual-channel convolutional neural network. In Proceedings of the 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 20–22 April 2018; pp. 107–111.
44. Baltrušaitis, T.; Ahuja, C.; Morency, L. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 423–443.
45. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive recursive neural network for target-dependent Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 22–27 June 2014; pp. 49–54.
Dataset | Category | Number |
---|---|---|
SMP2020 Weibo Emotion Classification (Dataset 1) | Negative (0) | 2526 |
weibo_senti_100k (Dataset 2) | Positive (1) | 4620 |
weibo_senti_100k (Dataset 2) | Negative (0) | 59,991 |
ID | Label | Review (Translated) |
---|---|---|
62050 | 0 | Too much! @Rexzhenghao //@Janie_Zhang: More negative news about CMB lately… |
68263 | 0 | Hope you’re fine? My “fat blood history” [Dizzy] [Haha] @Pete Uncle |
81472 | 0 | A little tempted to join???? [Sneak smile] Still deciding on the time [Frustrated] //@black_crystal: @slender_aunt |
42021 | 1 | [Great] Thanks to everyone supporting Juanwa’s sesame! [Love you] |
7777 | 1 | The last day of 2013, happily spent in Singapore, wishing all friends: Happy New Year! In 2014, we will be even better [Playful] |
100399 | 0 | Went out at noon, got lost, and now I’m getting sunburned. Couldn’t get more tragic. [Tears][Tears][Sweat] |
82398 | 0 | Will Malaysia Airlines still deny it? What exactly are they hiding? [Frustrated] //@Headlines: Share on Weibo |
106423 | 0 | Croatian fans really love fireworks! The ball didn’t even go in, and smoke is everywhere. [Dizzy] |
24798 | 1 | [Hug] Blessings TangRouLou special 8.8 discount »> http://t.cn/z... |
6598 | 1 | Reply to @QianXumingQXM: [Chuckles][Chuckles] //@QianXumingQXM: Brother Yang [good][good][good] |
Environment or Tool | Setting |
---|---|
Operating System | Ubuntu 18.04 |
CPU | Intel Core i9-10980HK |
Memory | 32 GB |
GPU and VRAM Capacity | NVIDIA GeForce RTX 3090 |
Programming Language | Python 3.11 |
Deep Learning Framework | PyTorch |
Development Tool | Anaconda environment manager |
Algorithm | Dataset 1 Accuracy | Dataset 1 Precision | Dataset 1 F1 | Dataset 2 Accuracy | Dataset 2 Precision | Dataset 2 F1 |
---|---|---|---|---|---|---|
Word2Vec model | | | | | | |
FastText model | | | | | | |
KNN | | | | | | |
CNN model | | | | | | |
LSTM model | | | | | | |
CNN-BiLSTM model | | | | | | |
GRU model | | | | | | |
Dual-Channel Graph | | | | | | |
Knowledge-Enhanced Graph | | | | | | |
GSM | | | | | | |
CCGIB | | | | | | |
Our Model | | | | | | |
Algorithm | Dataset 1 Accuracy | Dataset 1 Precision | Dataset 1 F1 | Dataset 2 Accuracy | Dataset 2 Precision | Dataset 2 F1 |
---|---|---|---|---|---|---|
No Post-to-Post Link | | | | | | |
No Word-to-Post Link | | | | | | |
No Self-Supervised Loss | | | | | | |
No Gate Mechanism | | | | | | |
Our Model | | | | | | |
Algorithm | Accuracy (%) | Precision (%) | F1 (%) |
---|---|---|---|
GRU model | 72.45 | 71.32 | 71.88 |
Dual-Channel Graph | 74.21 | 73.12 | 73.66 |
Knowledge-Enhanced Graph | 75.11 | 74.32 | 74.72 |
GSM | 74.89 | 73.87 | 74.38 |
CCGIB | 75.67 | 74.89 | 75.28 |
Our Model | 76.21 | 74.32 | 75.32 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, C.; Konpang, J.; Sirikham, A.; Tian, S. Mining Nuanced Weibo Sentiment with Hierarchical Graph Modeling and Self-Supervised Learning. Electronics 2025, 14, 41. https://doi.org/10.3390/electronics14010041