DOI: 10.1145/1553374.1553422
research-article

Dynamic analysis of multiagent Q-learning with ε-greedy exploration

Published: 14 June 2009

Abstract

The development of mechanisms to understand and model the expected behaviour of multiagent learners is becoming increasingly important as the area rapidly finds application in a variety of domains. In this paper we present a framework to model the behaviour of Q-learning agents using the ε-greedy exploration mechanism. For this, we analyse a continuous-time version of the Q-learning update rule and study how the presence of other agents and the ε-greedy mechanism affect it. We then model the problem as a system of difference equations which is used to theoretically analyse the expected behaviour of the agents. The applicability of the framework is tested through experiments in typical games selected from the literature.
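
As a rough illustration of the setting the abstract describes, the following is a minimal sketch of two independent Q-learning agents that use ε-greedy exploration in a repeated two-action matrix game. It is not the paper's model or its experimental code. The stateless update with no discount term, the function names, the parameter values, and the Prisoner's Dilemma payoffs are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise pick an action with the highest Q-value (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def q_update(q_values, action, reward, alpha):
    """Stateless Q-learning step (assumed form, no discounting):
    Q(a) <- Q(a) + alpha * (r - Q(a))."""
    q_values[action] += alpha * (reward - q_values[action])


def run(payoffs, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Let two independent learners play a repeated 2x2 game.

    payoffs[a0][a1] is the pair (reward of agent 0, reward of agent 1).
    Returns the final Q-values of both agents.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        a0 = epsilon_greedy(q[0], epsilon, rng)
        a1 = epsilon_greedy(q[1], epsilon, rng)
        r0, r1 = payoffs[a0][a1]
        q_update(q[0], a0, r0, alpha)
        q_update(q[1], a1, r1, alpha)
    return q


# Hypothetical payoff table (Prisoner's Dilemma): action 0 = cooperate, 1 = defect.
PRISONERS_DILEMMA = [[(3, 3), (0, 5)], [(5, 0), (1, 1)]]

if __name__ == "__main__":
    print(run(PRISONERS_DILEMMA))
```

Each agent's reward depends on the other agent's action, so from one learner's point of view the environment is non-stationary. That coupling, together with the switch between greedy and random action selection, is the kind of effect an analysis of expected behaviour has to account for.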

Published In

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN:9781605585161
DOI:10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICML '09
Sponsor:
  • Microsoft Research

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%
