Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

Shao, Han; Cohen, Lee; Blum, Avrim; Mansour, Yishay; Saha, Aadirupa; Walter, Matthew R.

Computer Science > Machine Learning

arXiv:2302.03805 (cs)

[Submitted on 7 Feb 2023 (v1), last revised 1 Nov 2023 (this version, v2)]


Abstract: In classic reinforcement learning (RL) and decision making problems, policies are evaluated with respect to a scalar reward function, and all optimal policies are equivalent with regard to their expected return. However, many real-world problems involve balancing multiple, sometimes conflicting, objectives whose relative priority varies according to the preferences of each user. Consequently, a policy that is optimal for one user might be sub-optimal for another. In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
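
To make the first feedback model concrete, the following is a minimal sketch of the setup described in the abstract, not the algorithm proposed in the paper. It assumes that each candidate policy is summarized by a vector of expected returns (one entry per objective), that the user's preference is a hidden weight vector, and that the user answers every comparison noiselessly by the weighted sum. The names `scalarized_value`, `make_comparison_oracle`, and `best_by_comparisons` are hypothetical, and the linear scan over a finite candidate list is only an illustrative baseline.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def scalarized_value(w: Vector, v: Vector) -> float:
    """Weighted sum of a policy's per-objective expected returns."""
    return sum(wi * vi for wi, vi in zip(w, v))


def make_comparison_oracle(hidden_w: Vector) -> Callable[[Vector, Vector], int]:
    """Simulate a user with an unknown preference vector (assumption: noise-free).

    The returned function takes the value vectors of two policies and
    returns 0 if the first is preferred (ties go to the first), else 1.
    """
    def compare(v_a: Vector, v_b: Vector) -> int:
        if scalarized_value(hidden_w, v_a) >= scalarized_value(hidden_w, v_b):
            return 0
        return 1

    return compare


def best_by_comparisons(
    values: Sequence[Vector],
    compare: Callable[[Vector, Vector], int],
) -> Tuple[int, int]:
    """Find the user's preferred candidate using only pairwise comparisons.

    Returns (index of preferred candidate, number of queries used).
    This scan uses len(values) - 1 queries; it never reads the hidden weights.
    """
    best = 0
    queries = 0
    for i in range(1, len(values)):
        queries += 1
        if compare(values[best], values[i]) == 1:
            best = i
    return best, queries


# Three hypothetical policies over two objectives.
policy_values = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]

# Two users with different (hidden) priorities end up with different policies.
user_a = make_comparison_oracle((0.2, 0.8))
user_b = make_comparison_oracle((0.5, 0.5))
print(best_by_comparisons(policy_values, user_a))
print(best_by_comparisons(policy_values, user_b))
```

The two simulated users select different candidates from the same list, which illustrates the point that a policy optimal for one user can be sub-optimal for another. The scan needs one query per additional candidate, so it does not scale to the full policy space of an MDP; the abstract's claim concerns algorithms that need only a small number of queries.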

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2302.03805 [cs.LG]
	(or arXiv:2302.03805v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.03805

Submission history

From: Han Shao
[v1] Tue, 7 Feb 2023 23:58:19 UTC (73 KB)
[v2] Wed, 1 Nov 2023 03:06:11 UTC (74 KB)
