DOI: 10.1145/3501247.3531579
research-article

Hostility Detection in Online Hindi-English Code-Mixed Conversations

Published: 26 June 2022

Abstract

As social media platforms have become more accessible and popular, people increasingly express and communicate their ideas, opinions, and interests online. While these platforms are active sources of entertainment and idea-sharing, they also attract hostile and offensive content. Identifying hostile posts is an essential and challenging task, and Hindi-English code-mixed online posts of a conversational nature (organized as a hierarchy of posts, comments, and replies) make it harder still. There are two major challenges: (1) the complex structure of code-mixed text and (2) filtering the relevant previous context for a given utterance. To address these challenges, we propose a novel hierarchical neural network architecture to identify hostile posts, comments, and replies in online Hindi-English code-mixed conversations. We leverage large multilingual pre-trained (mLPT) models such as mBERT, XLMR, and MuRIL. The mLPT models provide a rich representation of code-mixed text, and hierarchical modeling leads to a natural abstraction and selection of the relevant context. The proposed model consistently outperformed all the baselines and achieved state-of-the-art performance. We conducted multiple analyses and ablation studies to demonstrate the robustness of the proposed model.
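The abstract does not specify how the conversational context is assembled. As a minimal sketch, assuming each utterance records the ID of the utterance it responds to, the post → comment → reply hierarchy can be walked to gather the context of a given utterance and join it into a single input string for an mLPT encoder. The `Utterance` class, the `context_chain` and `build_input` functions, and the `[SEP]` joining scheme are hypothetical illustrations, not the authors' hierarchical architecture.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Utterance:
    """One node of a conversation thread (hypothetical schema)."""
    uid: str
    text: str
    level: str                    # "post", "comment", or "reply"
    parent: Optional[str] = None  # uid of the utterance this one responds to; None for the root post


def context_chain(thread: Dict[str, Utterance], uid: str) -> List[Utterance]:
    """Walk parent links from `uid` up to the root post and return the path root-first."""
    chain: List[Utterance] = []
    seen = set()
    current: Optional[str] = uid
    while current is not None:
        if current in seen:  # guard against malformed, cyclic threads
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        node = thread[current]
        chain.append(node)
        current = node.parent
    chain.reverse()  # order as post, comment, reply, with the target utterance last
    return chain


def build_input(thread: Dict[str, Utterance], uid: str, sep: str = " [SEP] ") -> str:
    """Concatenate the ancestor context and the target utterance into one encoder input.

    `sep` is a placeholder; the appropriate separator depends on the tokenizer
    of the chosen mLPT model (e.g. mBERT, XLMR, or MuRIL).
    """
    return sep.join(node.text for node in context_chain(thread, uid))


# Hypothetical three-level thread with placeholder text.
thread = {
    "p1": Utterance("p1", "post text", "post"),
    "c1": Utterance("c1", "comment text", "comment", parent="p1"),
    "r1": Utterance("r1", "reply text", "reply", parent="c1"),
}

print(build_input(thread, "r1"))  # post text [SEP] comment text [SEP] reply text
```

Flat concatenation is only the simplest way to expose context to an encoder. A hierarchical model would more plausibly encode each level and then combine the resulting representations; this sketch does not attempt that step.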

Supplementary Material

MP4 File (WS22_S7_114.mp4)
Presentation video

Published In

WebSci '22: Proceedings of the 14th ACM Web Science Conference 2022
June 2022
479 pages
ISBN: 9781450391917
DOI: 10.1145/3501247
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Code-Mixed data
2. Neural networks
3. Hostility detection

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WebSci '22: 14th ACM Web Science Conference 2022
June 26–29, 2022
Barcelona, Spain

Acceptance Rates

Overall Acceptance Rate 245 of 933 submissions, 26%

Article Metrics

• Downloads (last 12 months): 30
• Downloads (last 6 weeks): 4
Reflects downloads up to 28 Jan 2025
