DOI: 10.5555/2936924.2937002
Research Article | Public Access

Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds

Published: 09 May 2016

Abstract

Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space, offering a template for reinforcement learning under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent setting. We present a new template, MCES-IP, which extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. We instantiate MCES-IP to be approximately locally optimal with a given probability by deriving a theoretical bound on the sample size, which depends in part on the error allowed in the sampling; we refer to this algorithm as MCESIP+PAC. Our experiments demonstrate that MCESIP+PAC learns policies whose values are comparable to or better than those from MCESP+PAC in multiagent domains, while using far fewer samples for each transformation.
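For readers who want a concrete picture of the MCES-P template that the paper builds on, the following minimal Python sketch illustrates the idea: evaluate the current policy and a candidate one-step transformation with Monte Carlo rollouts, and accept the candidate only if its estimated value exceeds the current one by more than the allowed error, with the per-transformation sample size set by a generic Hoeffding-style bound. The simulate and neighbors helpers and the specific bound are assumptions made for illustration, not the paper's exact derivation or API.

import math

# Hypothetical interface (an assumption, not the paper's API):
#   simulate(policy, horizon) -> return of one Monte Carlo episode, with the
#       starting condition drawn according to the exploring-starts scheme
#   neighbors(policy)         -> iterable of one-step policy transformations

def hoeffding_samples(epsilon, delta, value_range):
    """Episodes needed so the empirical mean of returns spanning value_range
    is within epsilon of the true value with probability at least 1 - delta."""
    return math.ceil((value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def mces_p_template(initial_policy, neighbors, simulate, horizon,
                    epsilon, delta, value_range, max_transforms=100):
    """Local search over policy space in the spirit of MCES-P: accept a
    transformed policy only when its Monte Carlo estimate beats the current
    policy's estimate by more than epsilon."""
    policy = initial_policy
    k = hoeffding_samples(epsilon, delta, value_range)  # samples per comparison
    for _ in range(max_transforms):
        improved = False
        for candidate in neighbors(policy):
            v_current = sum(simulate(policy, horizon) for _ in range(k)) / k
            v_candidate = sum(simulate(candidate, horizon) for _ in range(k)) / k
            if v_candidate > v_current + epsilon:  # accept clear improvements only
                policy, improved = candidate, True
                break
        if not improved:  # no neighbor wins: approximate local optimum
            return policy
    return policy

In the MCES-IP extension described in the abstract, the learner would additionally maintain a belief over the other agent's models, update it as observations arrive, and use the predicted actions of the other agent when attributing sampled returns inside the rollouts; that machinery, and the tighter MCESIP+PAC sample bound, are not reproduced in this sketch.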

References

[1]
Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning. MIT Press, 1998.
[2]
Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3--4):279--292, 1992.
[3]
Leemon Baird et al. Residual algorithms: Reinforcement learning with function approximation. In ICML, pages 30--37, 1995.
[4]
Steven D Whitehead. Reinforcement learning for the adaptive control of perception and action. Technical report, DTIC Document, 1992.
[5]
Theodore J Perkins. Reinforcement learning for POMDPs based on action values and stochastic optimization. In AAAI/IAAI, pages 199--204, 2002.
[6]
Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[7]
Daniel S Bernstein, Shlomo Zilberstein, and Neil Immerman. The complexity of decentralized control of Markov decision processes. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 32--37. Morgan Kaufmann Publishers Inc., 2000.
[8]
Prashant J Doshi. Optimal sequential planning in partially observable multiagent settings. PhD thesis, University of Illinois at Chicago, 2005.
[9]
Bikramjit Banerjee, Jeremy Lyle, Landon Kraemer, and Rajesh Yellamraju. Sample bounded distributed reinforcement learning for decentralized POMDPs. In AAAI, 2012.
[10]
Landon Kraemer and Bikramjit Banerjee. Reinforcement learning of informed initial policies for decentralized planning. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 9(4):18, 2014.
[11]
Christopher Amato and Frans A Oliehoek. Scalable planning and learning for multiagent POMDPs. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), 2015.
[12]
Brenda Ng, Kofi Boakye, Carol Meyers, and Andrew Wang. Bayes-adaptive interactive POMDPs. In AAAI, 2012.
[13]
Trong Nghia Hoang and Kian Hsiang Low. A general framework for interacting Bayes-optimally with self-interested agents using arbitrary parametric model and model prior. In Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 1394--1400, 2013.
[14]
Russell Greiner. PALO: A probabilistic hill-climbing algorithm. Artificial Intelligence, 84(1):177--208, 1996.
[15]
Yoav Shoham and Kevin Leyton-Brown. Multiagent Systems: Algorithmic, Game-theoretic, and Logical Foundations. Cambridge University Press, 2009.
[16]
Piotr J Gmytrasiewicz and Prashant Doshi. A framework for sequential planning in multi-agent settings. Journal of Artificial Intelligence Research, pages 49--79, 2005.
[17]
Ekhlas Sonu and Prashant Doshi. Generalized and bounded policy iteration for finitely-nested interactive POMDPs: Scaling up. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pages 1039--1048. International Foundation for Autonomous Agents and Multiagent Systems, 2012.
[18]
Brenda Ng, Carol Meyers, Kofi Boakye, and John Nitao. Towards applying interactive POMDPs to real-world adversary modeling. In Innovative Applications in Artificial Intelligence (IAAI), pages 1814--1820, 2010.

    Information

    Published In

    AAMAS '16: Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems
    May 2016
    1580 pages
    ISBN:9781450342391

    Sponsors

    • IFAAMAS

    Publisher

    International Foundation for Autonomous Agents and Multiagent Systems

    Richland, SC

    Author Tags

    1. multiple agents
    2. partial observability
    3. probably approximately correct
    4. reinforcement learning

    Qualifiers

    • Research-article

    Funding Sources

    • ONR
    • NSF

    Conference

    AAMAS '16

    Acceptance Rates

    AAMAS '16 paper acceptance rate: 137 of 550 submissions, 25%
    Overall acceptance rate: 1,155 of 5,036 submissions, 23%

