Research Article
DOI: 10.1145/3409501.3409517

Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy

Published: 25 August 2020

Abstract

In recent years, reinforcement learning has played an important role in the study of decision problems in computer games. To address the problem of estimating the value function well under limited computational resources, this paper proposes a dynamic value function estimation method based on data adequacy. Since the states of an MDP vary in complexity, the proposed method adapts the estimation effort spent on each state, in contrast to the fixed estimation schemes of traditional methods. We compare the new method against existing techniques on the PigChase challenge of Microsoft's Project Malmo, launched in 2017, and experimental results show that the proposed algorithm outperforms the traditional algorithms.
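The abstract, together with the author tags listed below (q-learning, confidence interval, probability distribution), suggests a scheme that allocates value-estimation effort per state according to how much data has been observed for that state. The following sketch is only an illustration of that idea under stated assumptions, not the paper's published algorithm: it runs tabular Q-learning and treats a state's data as "adequate" once a Hoeffding-style confidence interval on its value estimate is narrow enough, after which it stops spending updates on that state. All names, constants, and the adequacy test itself are assumptions.

```python
import math
import random
from collections import defaultdict

# Illustrative sketch only: tabular Q-learning in which per-state update
# effort is gated by a data-adequacy test. The adequacy criterion (a
# Hoeffding-style confidence-interval width over visit counts) is an
# assumption, not the paper's published method.

GAMMA = 0.95         # discount factor (assumed)
ALPHA = 0.1          # learning rate (assumed)
EPSILON = 0.1        # exploration rate (assumed)
CI_THRESHOLD = 0.05  # interval width below which a state is "adequate" (assumed)
DELTA = 0.05         # confidence level for the interval (assumed)
R_MAX = 1.0          # assumed bound on per-step reward magnitude

Q = defaultdict(float)     # Q[(state, action)] -> value estimate
visits = defaultdict(int)  # visits[state] -> number of observed transitions

def ci_width(state):
    """Hoeffding-style width of a confidence interval on V(state)."""
    n = visits[state]
    if n == 0:
        return float("inf")
    v_max = R_MAX / (1.0 - GAMMA)  # crude bound on the value range
    return v_max * math.sqrt(math.log(2.0 / DELTA) / (2.0 * n))

def data_adequate(state):
    """A state's data is adequate once its value estimate is tightly bounded."""
    return ci_width(state) < CI_THRESHOLD

def update(state, action, reward, next_state, actions):
    """One Q-learning step, skipped for states whose data is already adequate."""
    visits[state] += 1
    if data_adequate(state):
        return  # spend no further estimation effort on this state
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

def act(state, actions):
    """Epsilon-greedy action selection over the current estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```

Under this reading, "dynamic estimation" means the per-state update budget shrinks as confidence in that state's value grows, whereas a traditional fixed scheme applies the same update rule to every state on every visit.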



    Published In

    HPCCT & BDAI '20: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence
    July 2020
    276 pages
ISBN: 9781450375603
DOI: 10.1145/3409501

    In-Cooperation

• Xi'an Jiaotong-Liverpool University

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. confidence interval
    2. dynamic programming
    3. probability distribution
    4. q-learning
    5. reinforcement learning

