DOI: 10.5555/2936924.2937002
Research Article | Public Access

Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds

Published: 09 May 2016

Abstract

Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space, offering a template for reinforcement learning under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent setting. We present a new template, MCES-IP, which extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. We instantiate MCES-IP to be approximately locally optimal with a given probability by deriving a theoretical bound on the sample size, which depends in part on the error allowed in the sampling; we refer to this algorithm as MCESIP+PAC. Our experiments demonstrate that MCESIP+PAC learns policies whose values are comparable to or better than those from MCESP+PAC in multiagent domains, while using far fewer samples for each transformation.
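For readers who want a concrete picture of the MCES-P template that the paper builds on, the following minimal Python sketch illustrates the idea: evaluate the current policy and a candidate one-step transformation with Monte Carlo rollouts, and accept the candidate only if its estimated value exceeds the current one by more than the allowed error, with the per-transformation sample size set by a generic Hoeffding-style bound. The simulate and neighbors helpers and the specific bound are assumptions made for illustration, not the paper's exact derivation or API.

import math

# Hypothetical interface (an assumption, not the paper's API):
#   simulate(policy, horizon) -> return of one Monte Carlo episode, with the
#       starting condition drawn according to the exploring-starts scheme
#   neighbors(policy)         -> iterable of one-step policy transformations

def hoeffding_samples(epsilon, delta, value_range):
    """Episodes needed so the empirical mean of returns spanning value_range
    is within epsilon of the true value with probability at least 1 - delta."""
    return math.ceil((value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def mces_p_template(initial_policy, neighbors, simulate, horizon,
                    epsilon, delta, value_range, max_transforms=100):
    """Local search over policy space in the spirit of MCES-P: accept a
    transformed policy only when its Monte Carlo estimate beats the current
    policy's estimate by more than epsilon."""
    policy = initial_policy
    k = hoeffding_samples(epsilon, delta, value_range)  # samples per comparison
    for _ in range(max_transforms):
        improved = False
        for candidate in neighbors(policy):
            v_current = sum(simulate(policy, horizon) for _ in range(k)) / k
            v_candidate = sum(simulate(candidate, horizon) for _ in range(k)) / k
            if v_candidate > v_current + epsilon:  # accept clear improvements only
                policy, improved = candidate, True
                break
        if not improved:  # no neighbor wins: approximate local optimum
            return policy
    return policy

In the MCES-IP extension described in the abstract, the learner would additionally maintain a belief over the other agent's models, update it as observations arrive, and use the predicted actions of the other agent when attributing sampled returns inside the rollouts; that machinery, and the tighter MCESIP+PAC sample bound, are not reproduced in this sketch.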

References

[1]
Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning. MIT Press, 1998.
[2]
Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3--4):279--292, 1992.
[3]
Leemon Baird et al. Residual algorithms: Reinforcement learning with function approximation. In ICML, pages 30--37, 1995.
[4]
Steven D Whitehead. Reinforcement learning for the adaptive control of perception and action. Technical report, DTIC Document, 1992.
[5]
Theodore J Perkins. Reinforcement learning for POMDPs based on action values and stochastic optimization. In AAAI/IAAI, pages 199--204, 2002.
[6]
Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[7]
Daniel S Bernstein, Shlomo Zilberstein, and Neil Immerman. The complexity of decentralized control of Markov decision processes. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 32--37. Morgan Kaufmann Publishers Inc., 2000.
[8]
Prashant J Doshi. Optimal sequential planning in partially observable multiagent settings. PhD thesis, University of Illinois at Chicago, 2005.
[9]
Bikramjit Banerjee, Jeremy Lyle, Landon Kraemer, and Rajesh Yellamraju. Sample bounded distributed reinforcement learning for decentralized POMDPs. In AAAI, 2012.
[10]
Landon Kraemer and Bikramjit Banerjee. Reinforcement learning of informed initial policies for decentralized planning. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 9(4):18, 2014.
[11]
Christopher Amato and Frans A Oliehoek. Scalable planning and learning for multiagent POMDPs. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), 2015.
[12]
Brenda Ng, Kofi Boakye, Carol Meyers, and Andrew Wang. Bayes-adaptive interactive POMDPs. In AAAI, 2012.
[13]
Trong Nghia Hoang and Kian Hsiang Low. A general framework for interacting Bayes-optimally with self-interested agents using arbitrary parametric model and model prior. In Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 1394--1400, 2013.
[14]
Russell Greiner. PALO: A probabilistic hill-climbing algorithm. Artificial Intelligence, 84(1):177--208, 1996.
[15]
Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-theoretic, and Logical Foundations. Cambridge University Press, 2009.
[16]
Piotr J Gmytrasiewicz and Prashant Doshi. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, pages 49--79, 2005.
[17]
Ekhlas Sonu and Prashant Doshi. Generalized and bounded policy iteration for finitely-nested interactive POMDPs: Scaling up. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pages 1039--1048. International Foundation for Autonomous Agents and Multiagent Systems, 2012.
[18]
Brenda Ng, Carol Meyers, Kofi Boakye, and John Nitao. Towards applying interactive POMDPs to real-world adversary modeling. In Innovative Applications in Artificial Intelligence (IAAI), pages 1814--1820, 2010.

    Information

    Published In

    AAMAS '16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems
    May 2016
    1580 pages
    ISBN:9781450342391

    Sponsors

    • IFAAMAS

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. multiple agents
    2. partial observability
    3. probably approximately correct
    4. reinforcement learning

    Qualifiers

    • Research-article

    Funding Sources

    • ONR
    • NSF

    Conference

    AAMAS '16

    Acceptance Rates

    AAMAS '16 paper acceptance rate: 137 of 550 submissions, 25%
    Overall acceptance rate: 1,155 of 5,036 submissions, 23%

