Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Prashanth L.A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvari
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1406-1415, 2016.

Abstract

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimations of the entire distribution of the value function and finding a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also empirically demonstrate the usefulness of our algorithms.
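As a rough illustration of the empirical-distribution idea mentioned in the abstract, the sketch below estimates a CPT-value from i.i.d. samples by sorting them and weighting each order statistic with increments of a probability-weighting function. This is a minimal sketch under stated assumptions, not the authors' reference implementation: the function names (tk_weight, cpt_estimate), the Tversky-Kahneman style weight with gamma = 0.61, and the piecewise-linear utilities are all illustrative choices that do not come from this page.

```python
import numpy as np

def tk_weight(p, gamma=0.61):
    """Inverse-S probability weight w(p); the form and gamma are assumptions."""
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def cpt_estimate(samples, w_plus=tk_weight, w_minus=tk_weight,
                 u_plus=lambda x: np.maximum(x, 0.0),
                 u_minus=lambda x: np.maximum(-x, 0.0)):
    """Order-statistics estimate of a CPT-value (hypothetical helper).

    Gains are weighted by increments of w_plus applied to the empirical
    tail probability; losses by increments of w_minus applied to the
    empirical CDF. The estimate is gains minus losses.
    """
    x = np.sort(np.asarray(samples, dtype=float))  # X_[1] <= ... <= X_[n]
    n = x.size
    i = np.arange(1, n + 1)
    gains = np.sum(u_plus(x) * (w_plus((n + 1 - i) / n) - w_plus((n - i) / n)))
    losses = np.sum(u_minus(x) * (w_minus(i / n) - w_minus((i - 1) / n)))
    return gains - losses
```

With identity weights (w(p) = p) and these utilities, the estimate reduces to the ordinary sample mean, which is a convenient sanity check. In a control loop of the kind the abstract describes, an estimator like this would be called at perturbed policy parameters to form an SPSA gradient estimate.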

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-la16,
  title = {Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control},
  author = {L.A., Prashanth and Jie, Cheng and Fu, Michael and Marcus, Steve and Szepesvari, Csaba},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages = {1406--1415},
  year = {2016},
  editor = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = {48},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {20--22 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v48/la16.pdf},
  url = {https://proceedings.mlr.press/v48/la16.html},
  abstract = {Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimations of the entire distribution of the value function and finding a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also empirically demonstrate the usefulness of our algorithms.}
}
Endnote
%0 Conference Paper
%T Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
%A Prashanth L.A.
%A Cheng Jie
%A Michael Fu
%A Steve Marcus
%A Csaba Szepesvari
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-la16
%I PMLR
%P 1406--1415
%U https://proceedings.mlr.press/v48/la16.html
%V 48
%X Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimations of the entire distribution of the value function and finding a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also empirically demonstrate the usefulness of our algorithms.
RIS
TY - CPAPER
TI - Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
AU - Prashanth L.A.
AU - Cheng Jie
AU - Michael Fu
AU - Steve Marcus
AU - Csaba Szepesvari
BT - Proceedings of The 33rd International Conference on Machine Learning
DA - 2016/06/11
ED - Maria Florina Balcan
ED - Kilian Q. Weinberger
ID - pmlr-v48-la16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 48
SP - 1406
EP - 1415
L1 - http://proceedings.mlr.press/v48/la16.pdf
UR - https://proceedings.mlr.press/v48/la16.html
AB - Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimations of the entire distribution of the value function and finding a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also empirically demonstrate the usefulness of our algorithms.
ER -
APA
L.A., P., Jie, C., Fu, M., Marcus, S. & Szepesvari, C. (2016). Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1406-1415. Available from https://proceedings.mlr.press/v48/la16.html.
