
Hostility Detection in Online Hindi-English Code-Mixed Conversations

Authors:

Kamal Shrestha,

Kaushal Maurya,

Maunendra Sankar Desarkar

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022

Pages 390 - 400

https://doi.org/10.1145/3501247.3531579

Published: 26 June 2022

Abstract

With the rise in accessibility and popularity of social media platforms, people increasingly express and communicate their ideas, opinions, and interests online. While these platforms are active sources of entertainment and idea-sharing, they also attract hostile and offensive content. Identifying hostile posts is an essential and challenging task. Hindi-English Code-Mixed online posts of a conversational nature (which have a hierarchy of posts, comments, and replies) make the task harder still. There are two major challenges: (1) the complex structure of Code-Mixed text and (2) filtering the relevant previous context for a given utterance. To overcome these challenges, we propose a novel hierarchical neural network architecture to identify hostile posts, comments, and replies in online Hindi-English Code-Mixed conversations. We leverage large multilingual pre-trained (mLPT) models such as mBERT, XLMR, and MuRIL. The mLPT models provide a rich representation of Code-Mixed text, and hierarchical modeling leads to a natural abstraction and selection of the relevant context. The proposed model consistently outperformed all baselines and achieved state-of-the-art performance. We conducted multiple analyses and ablation studies to demonstrate the robustness of the proposed model.
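The two-level idea in the abstract (encode each utterance, then select relevant earlier context for the target utterance) can be illustrated with a minimal sketch. This is not the authors' architecture, whose details are not given here. It assumes a hypothetical stand-in encoder, `encode_utterance`, which uses a hashed bag of tokens in place of an mLPT model such as mBERT, XLMR, or MuRIL, and it assumes plain dot-product attention as the context-selection step. The example thread is invented for illustration.

```python
import hashlib
import math

DIM = 16  # size of the toy utterance vectors (arbitrary, for illustration only)


def encode_utterance(text, dim=DIM):
    """Hypothetical stand-in for an mLPT encoder: hashed bag of tokens, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_context(target_vec, context_vecs):
    """Weight each earlier utterance by its dot-product similarity to the target."""
    if not context_vecs:
        # A top-level post has no previous context: return an all-zero summary.
        return [], [0.0] * len(target_vec)
    scores = [sum(t * c for t, c in zip(target_vec, ctx)) for ctx in context_vecs]
    weights = softmax(scores)
    summary = [
        sum(w * ctx[i] for w, ctx in zip(weights, context_vecs))
        for i in range(len(target_vec))
    ]
    return weights, summary


def build_features(thread, target_index):
    """Concatenate the target vector with the attention-weighted context summary."""
    vecs = [encode_utterance(u) for u in thread]
    target = vecs[target_index]
    weights, summary = attend_to_context(target, vecs[:target_index])
    return weights, target + summary


# Invented post -> comment -> reply thread; the reply is the utterance to classify.
thread = [
    "yeh movie bahut acchi thi",
    "sahi kaha, acting was great",
    "tum log kuch nahi jaante",
]
weights, features = build_features(thread, target_index=2)
```

In a real system, a trained classifier would consume `features` to predict hostile or non-hostile, and the attention weights would be learned rather than computed from raw similarity.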

Supplementary Material

MP4 File (WS22_S7_114.mp4)

Presentation video




Index Terms

Hostility Detection in Online Hindi-English Code-Mixed Conversations
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Index terms have been assigned to the content through auto-classification.


Published In

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022

June 2022

479 pages

ISBN:9781450391917

DOI:10.1145/3501247

General Chairs:
Ricardo Baeza-Yates (Northeastern University, MA, USA & Universitat Pompeu Fabra, Spain)
Katrin Weller (GESIS & Center for Advanced Internet Studies, Germany)

Organizing Chair:
Manuel Portela (Universitat Pompeu Fabra, Spain)

Program Chairs:
Oshani Seneviratne (Rensselaer Polytechnic Institute, NY, USA)
Ingmar Weber (Qatar Computing Research Institute, Qatar)
Taha Yasseri (University College Dublin, Ireland)

Publications Chairs:
Anna Bon (Vrije Universiteit Amsterdam, Netherlands)
Srinath Srinivas (International Institute of Information Technology, Bangalore, India)
Luis-Daniel Ibáñez (University of Southampton, UK)

Copyright © 2022 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WebSci '22

Sponsor:

SIGWEB

WebSci '22: 14th ACM Web Science Conference 2022

June 26 - 29, 2022

Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 218
Downloads (Last 12 months): 30
Downloads (Last 6 weeks): 4

Reflects downloads up to 28 Jan 2025

Citations

Cited By

Chakraborty A, Joardar S, Sekh A (2024). Ensemble Classifier for Hindi Hostile Content Detection. ACM Transactions on Asian and Low-Resource Language Information Processing 23:1 (1-17). DOI: 10.1145/3591353. Online publication date: 15-Jan-2024.
https://dl.acm.org/doi/10.1145/3591353
Sharma D, Singh V, Gupta V (2024). TABHATE: A Target-based hate speech detection dataset in Hindi. Social Network Analysis and Mining 14:1. DOI: 10.1007/s13278-024-01355-1. Online publication date: 21-Sep-2024.
https://doi.org/10.1007/s13278-024-01355-1
Rawat A, Kumar S, Samant S (2024). Hate speech detection in social media. WIREs Computational Statistics 16:2. DOI: 10.1002/wics.1648. Online publication date: 11-Mar-2024.
https://dl.acm.org/doi/10.1002/wics.1648
Chakraborty A, Joardar S, Sekh A (2023). Comparative Analysis of Different BERT-Based Machine Learning Models for Hostile Hindi Content Detection. 2023 IEEE 3rd Applied Signal Processing Conference (ASPCON) (109-113). DOI: 10.1109/ASPCON59071.2023.10395999. Online publication date: 24-Nov-2023.
https://doi.org/10.1109/ASPCON59071.2023.10395999
Hsu C, Tung H, Shuai H, Chang Y (2023). Predicting and Exploring Abandonment Signals in a Banking Task-Oriented Chatbot Service. International Journal of Human–Computer Interaction 40:24 (8497-8511). DOI: 10.1080/10447318.2023.2282220. Online publication date: 20-Nov-2023.
https://doi.org/10.1080/10447318.2023.2282220
Chakraborty A, Joardar S, Ahmed Sekh A (2023). BSVM: A BERT-Based Support Vector Machine for Hindi Hostile Content Detection. Proceedings of the 4th International Conference on Communication, Devices and Computing (57-68). DOI: 10.1007/978-981-99-2710-4_6. Online publication date: 28-Jul-2023.
https://doi.org/10.1007/978-981-99-2710-4_6
Sharma D, Singh A, Singh V. THAR- Targeted Hate Speech Against Religion: A high-quality Hindi-English code-mixed Dataset with the Application of Deep Learning Models for Automatic Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3653017.
https://dl.acm.org/doi/10.1145/3653017
