User Tampering in Reinforcement Learning Recommender Systems

Authors:

Atoosa Kasirzadeh,

Charles Evans

AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society

Pages 58 - 69

https://doi.org/10.1145/3600211.3604669

Published: 29 August 2023

Abstract

In this paper, we introduce new formal methods and provide empirical evidence to highlight a unique safety concern prevalent in reinforcement learning (RL)-based recommendation algorithms – ‘user tampering.’ User tampering is a situation where an RL-based recommender system may manipulate a media user’s opinions through its suggestions as part of a policy to maximize long-term user engagement. We use formal techniques from causal modeling to critically analyze prevailing solutions proposed in the literature for implementing scalable RL-based recommendation systems, and we observe that these methods do not adequately prevent user tampering. Moreover, we evaluate existing mitigation strategies for reward tampering issues, and show that these methods are insufficient in addressing the distinct phenomenon of user tampering within the context of recommendations. We further reinforce our findings with a simulation study of an RL-based recommendation system focused on the dissemination of political content. Our study shows that a Q-learning algorithm consistently learns to exploit its opportunities to polarize simulated users with its early recommendations in order to have more consistent success with subsequent recommendations that align with this induced polarization. Our findings emphasize the necessity for developing safer RL-based recommendation systems and suggest that achieving such safety would require a fundamental shift in the design away from the approaches we have seen in the recent literature.
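
To make the simulation setting in the abstract more concrete, the following is a minimal sketch of tabular Q-learning on a toy user model. It is not the authors' simulation: the five-level opinion scale, the click model in `click_probability`, the opinion dynamics in `step`, and all hyperparameters are assumptions chosen only for illustration. In a model of this kind, content aligned with an already polarized user is clicked more often, so a return-maximizing agent may have an incentive to push a neutral user toward an extreme early on.

```python
import random

N_OPINIONS = 5          # opinion states 0..4; 2 is neutral (hypothetical discretization)
NEUTRAL = 2
ACTIONS = (-1, +1)      # recommend left-leaning or right-leaning content


def click_probability(opinion, direction):
    """Assumed click model: content aligned with the user's opinion is clicked more often."""
    return 0.5 + 0.1 * (opinion - NEUTRAL) * direction


def step(opinion, action_index, rng):
    """Simulate one recommendation and return (next_opinion, reward)."""
    direction = ACTIONS[action_index]
    clicked = rng.random() < click_probability(opinion, direction)
    if clicked:
        # Assumed opinion dynamics: a click nudges the user one step toward the content.
        opinion = min(N_OPINIONS - 1, max(0, opinion + direction))
    return opinion, (1.0 if clicked else 0.0)


def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning; each episode starts from a neutral user."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_OPINIONS)]
    for _ in range(episodes):
        opinion = NEUTRAL
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[opinion][i])  # exploit
            nxt, reward = step(opinion, a, rng)
            # Standard Q-learning update toward the bootstrapped one-step target.
            target = reward + gamma * max(q[nxt])
            q[opinion][a] += alpha * (target - q[opinion][a])
            opinion = nxt
    return q


q_table = train()
```

Inspecting `q_table[NEUTRAL]` against the rows for the extreme opinions gives a rough sense of whether the learned values favor moving the user away from neutrality. Any such pattern in this toy depends entirely on the assumed click model and dynamics.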

Cited By

Q. Ruan, B. Mac Namee, and R. Dong. 2024. The Effects of Media Bias on News Recommendations. IEEE Access 12 (2024), 83391–83404. https://doi.org/10.1109/ACCESS.2024.3413772

N. Schuster and S. Lazar. 2024. Attention, moral skill, and algorithmic recommendation. Philosophical Studies. Online publication date: 22-Jan-2024. https://doi.org/10.1007/s11098-023-02083-6

E. Bezou-Vrakatseli, B. Brückner, and L. Thorburn. 2023. SHAPE: A Framework for Evaluating the Ethicality of Influence. Multi-Agent Systems, 167–185. Online publication date: 14-Sep-2023. https://dl.acm.org/doi/10.1007/978-3-031-43264-4_11

Index Terms

User Tampering in Reinforcement Learning Recommender Systems
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Causal reasoning and diagnostics
  2. Machine learning
    1. Learning paradigms
      1. Reinforcement learning

Information & Contributors

Information

Published In

AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society

August 2023

1026 pages

ISBN:9798400702310

DOI:10.1145/3600211

Editors:
Francesca Rossi (IBM), Sanmay Das (George Mason University), Jenny Davis (Australian National University), Kay Firth-Butterfield (Centre for Trustworthy Technology), Alex John London (Carnegie Mellon University)

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIES '23

Sponsor:

SIGAI

AIES '23: AAAI/ACM Conference on AI, Ethics, and Society

August 8 - 10, 2023

Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 61 of 162 submissions, 38%
