research-article

Two New Large Corpora for Vietnamese Aspect-based Sentiment Analysis at Sentence Level

Published: 26 May 2021

Abstract

Aspect-based sentiment analysis has been studied in both the research and industrial communities in recent years. For low-resource languages, standard benchmark corpora play an important role in the development of methods. In this article, we introduce the two largest sentence-level benchmark corpora for two Vietnamese tasks: Aspect Category Detection and Aspect Polarity Classification. The corpora cover the restaurant and hotel domains and are annotated with high inter-annotator agreement. We expect their release to push forward research in the low-resource language processing community. In addition, we implement and compare supervised learning methods based on deep learning architectures under both single-task and multi-task approaches. Experimental results on our corpora show that the multi-task approach based on the BERT architecture outperforms the other neural network architectures and the single-task approach. Our corpora and source code are published at the site given in footnote 1.
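The two tasks named in the abstract can be made concrete with a small label-encoding sketch. It assumes a SemEval-style annotation in which each sentence carries (aspect category, polarity) pairs; the category names, the three-way polarity set, and the functions `encode_acd`, `encode_apc`, and `encode_joint` are hypothetical illustrations, not the article's actual label inventory or code. Aspect Category Detection becomes a multi-label problem (one binary flag per category), Aspect Polarity Classification assigns a polarity to each mentioned category, and the joint vector shows one common way a single multi-task model could be supervised on both tasks at once.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of aspect categories (SemEval-style ENTITY#ATTRIBUTE);
# the real category inventory of the corpora is not listed in the abstract.
CATEGORIES: List[str] = [
    "FOOD#QUALITY",
    "SERVICE#GENERAL",
    "AMBIENCE#GENERAL",
    "PRICES#GENERAL",
]
# Assumed polarity set.
POLARITIES: List[str] = ["positive", "negative", "neutral"]


def encode_acd(pairs: List[Tuple[str, str]]) -> List[int]:
    """Aspect Category Detection target: one binary flag per category."""
    present = {category for category, _ in pairs}
    return [1 if c in present else 0 for c in CATEGORIES]


def encode_apc(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Aspect Polarity Classification target: a polarity index for each
    category that is actually mentioned in the sentence."""
    return {category: POLARITIES.index(polarity) for category, polarity in pairs}


def encode_joint(pairs: List[Tuple[str, str]]) -> List[int]:
    """Joint target: per category, 0 = not mentioned, 1..3 = polarity index + 1,
    so one output vector supervises detection and polarity together."""
    lookup = dict(pairs)
    return [
        0 if c not in lookup else POLARITIES.index(lookup[c]) + 1
        for c in CATEGORIES
    ]


if __name__ == "__main__":
    # Hypothetical annotation of a sentence such as
    # "The food was great but the staff were rude." (English gloss; the corpora are Vietnamese)
    annotation = [("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "negative")]
    print(encode_acd(annotation))    # [1, 1, 0, 0]
    print(encode_apc(annotation))    # {'FOOD#QUALITY': 0, 'SERVICE#GENERAL': 1}
    print(encode_joint(annotation))  # [1, 2, 0, 0]
```

With the joint encoding, a model would typically place one softmax per category with an extra "not mentioned" class, so both tasks share a single output layer; a single-task setup would instead train separate models on the `encode_acd` and `encode_apc` targets. Whether the article's multi-task models use exactly this formulation is not stated in the abstract.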


    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 4
    July 2021
    419 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3465463
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 May 2021
    Accepted: 01 January 2021
    Revised: 01 December 2020
    Received: 01 February 2020
    Published in TALLIP Volume 20, Issue 4

    Author Tags

    1. Aspect-based sentiment analysis
    2. deep neural network
    3. multi-task learning
    4. Vietnamese corpora

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Vietnam National University Ho Chi Minh City (VNU-HCM)

    Article Metrics

    • Downloads (last 12 months): 45
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 24 Jan 2025
