DOI: 10.1145/3589334.3648141
Research article · Open access

Human vs ChatGPT: Effect of Data Annotation in Interpretable Crisis-Related Microblog Classification

Published: 13 May 2024

Abstract

Recent studies have highlighted the vital role of microblogging platforms, such as Twitter, in crisis situations. Various machine-learning approaches have been proposed to identify and prioritize crucial information from different humanitarian categories for preparedness and rescue purposes. In the crisis domain, explaining a model's output decisions is gaining significant research momentum. Some previous works relied on human-annotated rationales to train models and to extract supporting evidence for interpretability. However, such annotations are usually expensive, labor-intensive, and not always available in the real-time situation of a new crisis event. In this paper, we investigate recent advances in large language models (LLMs) as data annotators for informal tweet text. We perform a detailed qualitative and quantitative evaluation of ChatGPT rationale annotations in a few-shot setup. ChatGPT annotations are quite close to human ones but less precise in nature. Further, we propose an active learning-based interpretable classification model trained from a small set of annotated data. Our experiments show that (a) ChatGPT has the potential to extract rationales for crisis tweet classification tasks, although a model trained on its annotations performs slightly worse (~3--6%) than one trained on human-annotated rationale data, and (b) an active learning setup can help reduce the burden of manual annotation while maintaining a trade-off between performance and data size.
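The abstract's pipeline (LLM-assisted rationale annotation plus active learning over a small labeled pool) is not spelled out on this page, but the annotation-selection step it describes can be sketched as pool-based uncertainty sampling. The prompt template, the `uncertainty` scorer, and the stand-in probability model below are illustrative assumptions, not the authors' actual implementation.

```python
import math
import random

# Hypothetical few-shot style prompt for LLM rationale annotation
# (illustrative only; not the paper's exact prompt).
RATIONALE_PROMPT = (
    "Label the tweet with a humanitarian category and copy the exact words "
    "from the tweet that justify the label.\n"
    "Tweet: {tweet}\nCategory and rationale:"
)

def uncertainty(prob):
    # Binary-entropy uncertainty: maximal when the model predicts prob ~ 0.5.
    p = min(max(prob, 1e-9), 1.0 - 1e-9)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def select_for_annotation(pool, predict, budget):
    """Pick the `budget` most uncertain unlabeled tweets to send for annotation."""
    return sorted(pool, key=lambda t: uncertainty(predict(t)), reverse=True)[:budget]

# Toy demo with a stand-in model that returns a fixed random probability per tweet.
random.seed(0)
pool = [f"tweet_{i}" for i in range(100)]
probs = {t: random.random() for t in pool}
selected = select_for_annotation(pool, lambda t: probs[t], budget=5)
```

Each selected tweet would then be annotated (by a human, or by an LLM via a prompt like `RATIONALE_PROMPT`), added to the labeled set, and the classifier retrained; iterating this loop is what lets a small annotation budget trade off against performance, as the abstract describes.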

Supplemental Material

MP4 file: supplemental video


Cited By

  • (2024) AI Simulation by Digital Twins: Systematic Survey of the State of the Art and a Reference Framework. In Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 401--412. DOI: 10.1145/3652620.3688253. Online publication date: 22-Sep-2024.
  • (2024) Agenda Formation and Prediction of Voting Tendencies for European Parliament Election using Textual, Social and Network Features. Information Systems Frontiers. DOI: 10.1007/s10796-024-10568-w. Online publication date: 23-Dec-2024.


Published In

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024
4826 pages
ISBN:9798400701719
DOI:10.1145/3589334
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. active learning
  2. crisis events
  3. interpretability
  4. large language model
  5. semi-supervised learning
  6. twitter

Qualifiers

  • Research-article

Funding Sources

  • the Science and Engineering Research Board, Department of Science and Technology, Government of India
  • the DST-INSPIRE Faculty Fellowship in the year 2021 under Engineering Sciences
  • Microsoft Academic Partnership Grant 2023
  • the European Union's Horizon 2020 research and innovation action program

Conference

WWW '24: The ACM Web Conference 2024
May 13--17, 2024
Singapore, Singapore

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)


Article Metrics

  • Downloads (last 12 months): 399
  • Downloads (last 6 weeks): 101
Reflects downloads up to 22 Dec 2024

