DOI: 10.1145/3600211.3604669
research-article

User Tampering in Reinforcement Learning Recommender Systems

Published: 29 August 2023

Abstract

In this paper, we introduce new formal methods and provide empirical evidence to highlight a unique safety concern prevalent in reinforcement learning (RL)-based recommendation algorithms: ‘user tampering’. User tampering is a situation where an RL-based recommender system may manipulate a media user’s opinions through its suggestions as part of a policy to maximize long-term user engagement. We use formal techniques from causal modeling to critically analyze prevailing solutions proposed in the literature for implementing scalable RL-based recommendation systems, and we observe that these methods do not adequately prevent user tampering. Moreover, we evaluate existing mitigation strategies for reward tampering issues and show that these methods are insufficient to address the distinct phenomenon of user tampering within the context of recommendations. We further reinforce our findings with a simulation study of an RL-based recommendation system focused on the dissemination of political content. Our study shows that a Q-learning algorithm consistently learns to exploit its opportunities to polarize simulated users with its early recommendations, so that its subsequent recommendations, which align with this induced polarization, succeed more consistently. Our findings emphasize the necessity of developing safer RL-based recommendation systems and suggest that achieving such safety would require a fundamental shift in design away from the approaches we have seen in the recent literature.
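The incentive described in the abstract can be illustrated with a very small toy model. The sketch below is not the authors' simulation: the five-point opinion scale, the engagement values, the rule that partisan content pulls the user one step toward it, and the names `step`, `q_learning`, and `greedy_action` are all assumptions made for illustration. It uses tabular Q-learning with expected engagement as a deterministic reward, and compares a myopic learner (discount 0) with a far-sighted one (discount 0.9).

```python
import random

OPINIONS = [-2, -1, 0, 1, 2]   # hypothetical opinion scale: far left .. far right
ACTIONS = [-1, 0, 1]           # recommend left, neutral, or right content


def step(opinion, action):
    """Return (expected engagement, next opinion). All numbers are assumed."""
    if action == 0:
        return 0.5, opinion                    # neutral: steady engagement, no shift
    if action * opinion > 0:
        reward = 0.3 + 0.3 * abs(opinion)      # content agrees with the user's lean
    elif opinion == 0:
        reward = 0.3                           # partisan content shown to a centrist
    else:
        reward = 0.1                           # content opposes the user's lean
    # Assumed dynamics: partisan content pulls the user one step toward it.
    next_opinion = max(-2, min(2, opinion + action))
    return reward, next_opinion


def q_learning(gamma, episodes=5000, horizon=20, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on the toy model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in OPINIONS for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(OPINIONS)               # each episode starts with a random user
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)        # explore
            else:
                a = max(ACTIONS, key=lambda b: q[(s, b)])   # exploit
            r, s2 = step(s, a)
            target = r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


def greedy_action(q, opinion):
    """Action with the highest learned value for a user at `opinion`."""
    return max(ACTIONS, key=lambda a: q[(opinion, a)])


if __name__ == "__main__":
    for gamma in (0.0, 0.9):
        q = q_learning(gamma)
        print(f"gamma={gamma}: action for a centrist user =", greedy_action(q, 0))
```

Under these assumed numbers, the myopic learner shows a centrist user neutral content, because that has the highest immediate engagement. The far-sighted learner instead prefers partisan content for the same user: it accepts lower engagement now in exchange for a more polarized user whose later engagement is easier to predict. The comparison only shows how such an incentive can arise from long-horizon engagement maximization in a model built to contain it; it says nothing about real users.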


Published In

AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society
August 2023
1026 pages
ISBN: 9798400702310
DOI: 10.1145/3600211
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AI Ethics
  2. AI Safety
  3. Recommendation Systems
  4. Recommender Systems
  5. Reinforcement Learning
  6. Value Alignment

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIES '23: AAAI/ACM Conference on AI, Ethics, and Society
August 8-10, 2023
Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 61 of 162 submissions, 38%

Article Metrics

  • Downloads (last 12 months): 153
  • Downloads (last 6 weeks): 11
Reflects downloads up to 30 Aug 2024
