short-paper

Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems

Authors:

Hui Fang

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2286 - 2290

https://doi.org/10.1145/3626772.3657924

Published: 11 July 2024

Abstract

Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, applying LLMs to CRS has exposed a notable behavior discrepancy between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This behavior discrepancy can lead to decreased recommendation accuracy and lower user satisfaction. Despite its importance, existing CRS research does not address how to measure this discrepancy. To fill this gap, we propose Behavior Alignment, a new evaluation metric that measures how consistent the recommendation strategies of an LLM-based CRS are with those of human recommenders. Our experimental results show that the new metric is better aligned with human preferences and differentiates system performance better than existing evaluation metrics. Because Behavior Alignment requires explicit and costly human annotations of the recommendation strategies, we also propose a classification-based method that measures Behavior Alignment implicitly from the responses. The evaluation results confirm the robustness of the method.

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods
2. Information systems
  1. Information retrieval
    1. Users and interactive retrieval

Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2024

3164 pages

ISBN: 9798400704314

DOI: 10.1145/3626772

General Chairs: Grace Hui Yang (Georgetown University, USA), Hongning Wang (Tsinghua University, China), Sam Han (The Washington Post, USA)

Program Chairs: Claudia Hauff (Spotify, Netherlands), Guido Zuccon (The University of Queensland, Australia), Yi Zhang (University of California Santa Cruz, USA)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

JP Morgan Chase Stipend

Conference

SIGIR 2024: The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval

Sponsor: SIGIR

July 14 - 18, 2024

Washington DC, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
