On Actor-Critic Algorithms

Authors:

Vijay R. Konda,

John N. Tsitsiklis

SIAM Journal on Control and Optimization, Volume 42, Issue 4

Pages 1143 - 1166

https://doi.org/10.1137/S0363012901385691

Published: 01 April 2003

Abstract

In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study actor-critic algorithms for Markov decision processes with Polish state and action spaces. We state and prove two results regarding their convergence.
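
The abstract gives no update formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. A critic learns a linear estimate of the action values on a fast time scale, using the "compatible" features psi(s, a) = grad log pi(a | s) that the actor's parameterization prescribes, and the actor moves along an approximate gradient direction on a slow time scale. The toy two-state MDP, the tabular softmax policy, the use of plain TD(0) with an average-reward baseline, the constant step sizes, and every function name are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random

# Hypothetical toy MDP with 2 states and 2 actions (illustration only).
# P[s][a] = probability of moving to state 1; R[s][a] = immediate reward.
P = [[0.1, 0.9], [0.2, 0.8]]
R = [[0.0, 1.0], [0.5, 2.0]]
N_STATES, N_ACTIONS = 2, 2


def policy(theta, s):
    """Softmax policy pi_theta(. | s) over tabular preferences theta[s][a]."""
    m = max(theta[s])
    exps = [math.exp(t - m) for t in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def compatible_features(theta, s, a):
    """psi(s, a) = grad_theta log pi_theta(a | s), flattened to length S*A."""
    probs = policy(theta, s)
    psi = [0.0] * (N_STATES * N_ACTIONS)
    for b in range(N_ACTIONS):
        psi[s * N_ACTIONS + b] = (1.0 if b == a else 0.0) - probs[b]
    return psi


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sample_action(probs, rng):
    return 0 if rng.random() < probs[0] else 1


def actor_critic(steps=20000, critic_step=0.05, actor_step=0.001, seed=0):
    """Two-time-scale loop: the critic step is much larger than the actor step."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    w = [0.0] * (N_STATES * N_ACTIONS)                    # critic weights
    avg_reward = 0.0                                      # running average reward
    s = 0
    a = sample_action(policy(theta, s), rng)
    for _ in range(steps):
        r = R[s][a]
        s_next = 1 if rng.random() < P[s][a] else 0
        a_next = sample_action(policy(theta, s_next), rng)
        psi = compatible_features(theta, s, a)
        psi_next = compatible_features(theta, s_next, a_next)
        q = dot(w, psi)  # linear critic estimate Q_w(s, a) = w . psi(s, a)
        # Critic (fast): average-reward TD(0) update of the linear weights.
        td_error = r - avg_reward + dot(w, psi_next) - q
        avg_reward += critic_step * (r - avg_reward)
        w = [wi + critic_step * td_error * f for wi, f in zip(w, psi)]
        # Actor (slow): step along Q_w(s, a) * psi(s, a), an approximate gradient.
        for i in range(N_STATES):
            for b in range(N_ACTIONS):
                theta[i][b] += actor_step * q * psi[i * N_ACTIONS + b]
        s, a = s_next, a_next
    return theta, w, avg_reward
```

Calling `theta, w, avg_reward = actor_critic()` and then `policy(theta, 0)` shows the action probabilities learned for state 0. The sketch carries no convergence guarantee; it only illustrates how the critic's feature space is tied to the actor's parameterization.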

Cited By

Panda P, Bhatnagar S, Kiyavash N, Mooij J (2024) Finite-time analysis of three-timescale constrained actor-critic and constrained natural actor-critic algorithms. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, 10.5555/3702676.3702809, (2787-2834). Online publication date: 15-Jul-2024
https://dl.acm.org/doi/10.5555/3702676.3702809

Wang Y, Wang Y, Zhou Y, Zou S, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024) Non-asymptotic analysis for single-loop (natural) actor-critic with compatible function approximation. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3694193, (51771-51824). Online publication date: 21-Jul-2024
https://dl.acm.org/doi/10.5555/3692070.3694193

Agrawal S, A. P, Maguluri S, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024) Policy evaluation for variance in average reward reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, 10.5555/3692070.3692091, (471-502). Online publication date: 21-Jul-2024
https://dl.acm.org/doi/10.5555/3692070.3692091

Recommendations

Actor-Critic--Type Learning Algorithms for Markov Decision Processes

Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm in the artificial intelligence ...

Actor-critic algorithms for hierarchical Markov decision processes

We consider the problem of control of hierarchical Markov decision processes and develop a simulation based two-timescale actor-critic algorithm in a general framework. We also develop certain approximation algorithms that require less computation and ...

Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes

This article proposes several two-timescale simulation-based actor-critic algorithms for solution of infinite horizon Markov Decision Processes with finite state-space under the average cost criterion. Two of the algorithms are for the compact (non-...

Information

Published In

SIAM Journal on Control and Optimization Volume 42, Issue 4

2003

380 pages

ISSN:0363-0129

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 01 April 2003

Bibliometrics

Total Citations: 125
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 13 Jan 2025

Citations

Cited By

Singh N, Saha I, Dastani M, Sichman J, Alechina N, Dignum V (2024) Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences. Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 10.5555/3635637.3663037, (1754-1762). Online publication date: 6-May-2024
https://dl.acm.org/doi/10.5555/3635637.3663037

Paul S, Witter J, Chowdhury S, Hong J, Park J (2024) Graph Learning-based Fleet Scheduling for Urban Air Mobility under Operational Constraints, Varying Demand & Uncertainties. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 10.1145/3605098.3635976, (638-645). Online publication date: 8-Apr-2024
https://dl.acm.org/doi/10.1145/3605098.3635976

Wang J, Wang Z, Li X, Kuang Y, Shi Z, Zhu F, Yuan M, Zeng J, Zhang Y, Wu F (2024) Learning to Cut via Hierarchical Sequence/Set Model for Efficient Mixed-Integer Programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10.1109/TPAMI.2024.3432716, 46:12, (9697-9713). Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1109/TPAMI.2024.3432716

Ghaffari A, Asgharian M, Savaria Y (2024) Statistical Hardware Design With Multimodel Active Learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10.1109/TCAD.2023.3320984, 43:2, (562-572). Online publication date: 1-Feb-2024
https://dl.acm.org/doi/10.1109/TCAD.2023.3320984

Garg A, Jha S (2024) Deep deterministic policy gradient based multi-UAV control for moving convoy tracking. Engineering Applications of Artificial Intelligence, 10.1016/j.engappai.2023.107099, 126:PD. Online publication date: 27-Feb-2024
https://dl.acm.org/doi/10.1016/j.engappai.2023.107099

Zhang H, Lv X, Liu Y, Zou X (2024) Hedge transfer learning routing for dynamic searching and reconnoitering applications in 3D multimedia FANETs. Multimedia Tools and Applications, 10.1007/s11042-023-15932-7, 83:3, (7505-7539). Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1007/s11042-023-15932-7

Ping Y, Liu Y, Zhang L, Wang L, Xu X (2024) Enterprise and service-level scheduling of robot production services in cloud manufacturing with deep reinforcement learning. Journal of Intelligent Manufacturing, 10.1007/s10845-023-02285-z, 35:8, (3889-3916). Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1007/s10845-023-02285-z