DOI: 10.1145/3442188.3445875
Research Article
Open Access

Differential Tweetment: Mitigating Racial Dialect Bias in Harmful Tweet Detection

Published: 01 March 2021

Abstract

Automated systems for detecting harmful social media content are afflicted by a variety of biases, some of which originate in their training datasets. In particular, some systems have been shown to propagate racial dialect bias: they systematically classify content aligned with the African American English (AAE) dialect as harmful at a higher rate than content aligned with White English (WE). This perpetuates prejudice by silencing the Black community. To address this problem, we adapt and apply two existing bias mitigation approaches: preferential sampling, a pre-processing method, and adversarial debiasing, an in-processing method. We analyse the impact of our interventions on model performance and propagated bias. We find that when bias mitigation is employed, a high degree of predictive accuracy is maintained relative to baseline, and in many cases bias against AAE in harmful tweet predictions is reduced. However, the specific effects of these interventions on bias and performance vary widely across dataset contexts, suggesting that autonomous harmful content detection behaves unpredictably outside its development context. We argue that this unpredictability, together with the low baseline performance of these systems, raises questions about the reliability and role of such systems in high-impact, real-world settings.
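The two interventions named above can be made concrete with brief sketches. These are illustrative reconstructions in Python, not the authors' implementations; the function names, hyperparameters, and data below are hypothetical.

Preferential sampling is a pre-processing method: the training set is resampled so that the protected attribute (here, dialect group) and the class label (harmful or not) become statistically independent. The sketch below is a simplified, uniform-random variant; Kamiran and Calders' original method additionally ranks candidates by a classifier's scores so that borderline examples are duplicated or dropped first.

```python
import numpy as np

def preferential_resample(X, y, group, rng=None):
    """Resample (X, y) so the label and the group attribute are independent.

    Simplified uniform-random variant of Kamiran & Calders' preferential
    sampling (the original prioritises examples near the decision boundary).
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    keep = []
    for s in np.unique(group):
        for c in np.unique(y):
            cell = np.where((group == s) & (y == c))[0]
            if len(cell) == 0:
                continue
            # Expected cell size if group and label were independent.
            expected = int(round((group == s).sum() * (y == c).sum() / n))
            # Under-sample over-represented cells, duplicate under-represented ones.
            keep.append(rng.choice(cell, size=expected, replace=len(cell) < expected))
    idx = np.concatenate(keep)
    return X[idx], y[idx]

# Toy usage: a deliberately skewed dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
group = (rng.random(1000) < 0.3).astype(int)                          # 0 = WE-aligned, 1 = AAE-aligned
y = (rng.random(1000) < np.where(group == 1, 0.6, 0.2)).astype(int)   # labels biased against group 1
X_fair, y_fair = preferential_resample(X, y, group, rng=0)
```

Adversarial debiasing is an in-processing method: the main classifier is trained jointly with an adversary that tries to recover the protected attribute, and the shared representation is updated in the direction that makes the adversary's job harder. The sketch below uses a gradient-reversal layer, a common simplification; Zhang, Lemoine, and Mitchell's formulation instead projects the predictor's gradient away from the adversary's.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips and scales gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, n_features, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.label_head = nn.Linear(32, 1)  # main task: harmful vs. not
        self.adv_head = nn.Linear(32, 1)    # adversary: dialect group

    def forward(self, x):
        h = self.encoder(x)
        # The adversary sees the representation through the reversal layer, so the
        # encoder is pushed to remove group information while the adversary
        # simultaneously learns to extract whatever remains.
        return self.label_head(h), self.adv_head(GradReverse.apply(h, self.lambd))

# Toy training loop on synthetic data.
torch.manual_seed(0)
X = torch.randn(512, 10)
y = (X[:, 0] > 0).float().unsqueeze(1)  # stand-in harmfulness label
g = (X[:, 1] > 0).float().unsqueeze(1)  # stand-in dialect group
model = DebiasedClassifier(10)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
for _ in range(200):
    y_logit, a_logit = model(X)
    loss = bce(y_logit, y) + bce(a_logit, g)
    opt.zero_grad()
    loss.backward()
    opt.step()
```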


Published In

FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency
March 2021, 899 pages
ISBN: 9781450383097
DOI: 10.1145/3442188
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bias
  2. content moderation
  3. dialect
  4. fairness
  5. racial disparities

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FAccT '21