Research Article
DOI: 10.1145/3409501.3409517

Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy

Published: 25 August 2020

Abstract

In recent years, reinforcement learning has played an important role in the study of decision problems in computer games. To address the problem of estimating the value function well under limited computational resources, this paper proposes a dynamic value function estimation method based on data adequacy. Since the states of an MDP vary in complexity, the proposed method adapts the estimation effort spent on each state, in contrast to the fixed estimation schemes of traditional methods. We compare the new method against existing techniques on the PigChase challenge of Microsoft's Project Malmo, launched in 2017, and experimental results show that the proposed algorithm outperforms the traditional algorithms.
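The abstract, together with the author tags listed below (q-learning, confidence interval, probability distribution), suggests a scheme that allocates value-estimation effort per state according to how much data has been observed for that state. The following sketch is only an illustration of that idea under stated assumptions, not the paper's published algorithm: it runs tabular Q-learning and treats a state's data as "adequate" once a Hoeffding-style confidence interval on its value estimate is narrow enough, after which it stops spending updates on that state. All names, constants, and the adequacy test itself are assumptions.

```python
import math
import random
from collections import defaultdict

# Illustrative sketch only: tabular Q-learning in which per-state update
# effort is gated by a data-adequacy test. The adequacy criterion (a
# Hoeffding-style confidence-interval width over visit counts) is an
# assumption, not the paper's published method.

GAMMA = 0.95         # discount factor (assumed)
ALPHA = 0.1          # learning rate (assumed)
EPSILON = 0.1        # exploration rate (assumed)
CI_THRESHOLD = 0.05  # interval width below which a state is "adequate" (assumed)
DELTA = 0.05         # confidence level for the interval (assumed)
R_MAX = 1.0          # assumed bound on per-step reward magnitude

Q = defaultdict(float)     # Q[(state, action)] -> value estimate
visits = defaultdict(int)  # visits[state] -> number of observed transitions

def ci_width(state):
    """Hoeffding-style width of a confidence interval on V(state)."""
    n = visits[state]
    if n == 0:
        return float("inf")
    v_max = R_MAX / (1.0 - GAMMA)  # crude bound on the value range
    return v_max * math.sqrt(math.log(2.0 / DELTA) / (2.0 * n))

def data_adequate(state):
    """A state's data is adequate once its value estimate is tightly bounded."""
    return ci_width(state) < CI_THRESHOLD

def update(state, action, reward, next_state, actions):
    """One Q-learning step, skipped for states whose data is already adequate."""
    visits[state] += 1
    if data_adequate(state):
        return  # spend no further estimation effort on this state
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

def act(state, actions):
    """Epsilon-greedy action selection over the current estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

Under this reading, "dynamic estimation" means the per-state update budget shrinks as confidence in that state's value grows, whereas a traditional fixed scheme applies the same update rule to every state on every visit.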



    Published In

    HPCCT & BDAI '20: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence
    July 2020
    276 pages
ISBN: 9781450375603
DOI: 10.1145/3409501

    In-Cooperation

• Xi'an Jiaotong-Liverpool University

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. confidence interval
    2. dynamic programming
    3. probability distribution
    4. q-learning
    5. reinforcement learning

