DOI: 10.5555/1630659.1630804
Article

SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes

Published: 09 August 2003 Publication History

Abstract

To operate effectively in complex environments, learning agents require the ability to selectively ignore irrelevant details and to form useful abstractions. In this article we consider the question of what constitutes a useful abstraction in a stochastic sequential decision problem modeled as a semi-Markov decision process (SMDP). We introduce the notion of an SMDP homomorphism and argue that it provides a useful tool for a rigorous study of abstraction for SMDPs. We present an SMDP minimization framework and an abstraction framework for factored MDPs based on SMDP homomorphisms. We also model different classes of abstraction that arise in hierarchical systems. Although we use the options framework for purposes of illustration, the ideas are more generally applicable. We also show that the abstraction conditions we employ generalize earlier work by Dietterich as applied to the options framework.
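The homomorphism conditions described in the abstract can be illustrated with a minimal sketch. The toy MDP below is hypothetical (the paper's SMDP definition also involves transition-time distributions, omitted here): a homomorphism pairs a state surjection `f` with per-state action surjections `g[s]`, and requires that any two state-action pairs with the same image agree on their expected reward and on the probability of landing in each block of the induced state partition.

```python
from collections import defaultdict

# Toy MDP (hypothetical, for illustration): states 0 and 1 behave
# identically up to the aggregation f, as do states 2 and 3.
states = [0, 1, 2, 3]
actions = ["a"]
P = {  # P[(s, a)] -> {next_state: probability}
    (0, "a"): {2: 1.0},
    (1, "a"): {3: 1.0},
    (2, "a"): {2: 1.0},
    (3, "a"): {3: 1.0},
}
R = {(0, "a"): 1.0, (1, "a"): 1.0, (2, "a"): 0.0, (3, "a"): 0.0}

f = {0: "s0", 1: "s0", 2: "s1", 3: "s1"}   # state surjection
g = {s: {"a": "a"} for s in states}        # action surjections (identity here)

def block_probs(s, a):
    """Probability of landing in each abstract block from (s, a)."""
    out = defaultdict(float)
    for s2, p in P[(s, a)].items():
        out[f[s2]] += p
    return dict(out)

def is_homomorphism():
    # Every (s, a) mapping to the same abstract (f(s), g[s](a)) must
    # induce the same block transition probabilities and reward.
    image = {}
    for s in states:
        for a in actions:
            key = (f[s], g[s][a])
            val = (block_probs(s, a), R[(s, a)])
            if key in image and image[key] != val:
                return False
            image[key] = val
    return True

print(is_homomorphism())  # True for this toy MDP
```

Breaking the reward symmetry (e.g. setting `R[(1, "a")]` to a different value than `R[(0, "a")]`) makes the check fail, since the two states no longer aggregate consistently.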

References

[1]
{Arbib and Manes, 1975} M. A. Arbib and E. G. Manes. Arrows, Structures and Functors. Academic Press, New York, NY, 1975.
[2]
{Boutilier and Dearden, 1994} C. Boutilier and R. Dearden. Using abstractions for decision theoretic planning with time constraints. In Proceedings of AAAI-94, pages 1016-1022. AAAI, 1994.
[3]
{Boutilier et al., 1995} C. Boutilier, R. Dearden, and M. Goldszmidt. Exploiting structure in policy construction. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1104-1111, 1995.
[4]
{Boutilier et al., 2001} Craig Boutilier, Ray Reiter, and Robert Price. Symbolic dynamic programming for first-order MDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pages 541-547, 2001.
[5]
{Dean and Givan, 1997} T. Dean and R. Givan. Model minimization in Markov decision processes. In Proceedings of AAAI-97, pages 106-111. AAAI, 1997.
[6]
{Dean and Kanazawa, 1989} Thomas Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142-150, 1989.
[7]
{Dietterich, 2000} T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
[8]
{Emerson and Sistla, 1996} E. A. Emerson and A. P. Sistla. Symmetry and model checking. Formal Methods in System Design, 9(1/2):105-131, 1996.
[9]
{Givan et al., 2000} R. Givan, S. Leach, and T. Dean. Bounded-parameter Markov decision processes. Artificial Intelligence, 122:71-109, 2000.
[10]
{Givan et al., 2003} R. Givan, T. Dean, and M. Greig. Equivalence notions and model minimization in Markov decision processes. To appear in Artificial Intelligence, 2003.
[11]
{Hartmanis and Stearns, 1966} J. Hartmanis and R. E. Stearns. Algebraic Structure Theory of Sequential Machines. Prentice-Hall, Englewood Cliffs, NJ, 1966.
[12]
{Iba, 1989} Glenn A. Iba. A heuristic approach to the discovery of macro-operators. Machine Learning, 3:285-317, 1989.
[13]
{Kemeny and Snell, 1960} J. G. Kemeny and J. L. Snell. Finite Markov Chains. Van Nostrand, Princeton, NJ, 1960.
[14]
{Parr and Russell, 1997} Ronald Parr and Stuart Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10, pages 1043-1049. MIT Press, 1997.
[15]
{Ravindran and Barto, 2001} B. Ravindran and A. G. Barto. Symmetries and model minimization of Markov decision processes. Technical Report 01-43, University of Massachusetts, Amherst, 2001.
[16]
{Ravindran and Barto, 2002} Balaraman Ravindran and Andrew G. Barto. Model minimization in hierarchical reinforcement learning. In Sven Koenig and Robert C. Holte, editors, Proceedings of the Fifth Symposium on Abstraction, Reformulation and Approximation (SARA 2002), Lecture Notes in Artificial Intelligence 2371, pages 196-211, New York, NY, August 2002. Springer-Verlag.
[17]
{Sutton et al., 1999} Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and Semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
[18]
{Zeigler, 1972} Bernard P. Zeigler. On the formulation of problems in simulation and modelling in the framework of mathematical system theory. In Proceedings of the Sixth International Congress on Cybernetics, pages 363-385. Association Internationale de Cybernétique, 1972.
[19]
{Zinkevich and Balch, 2001} M. Zinkevich and T. Balch. Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In Proceedings of the 18th International Conference on Machine Learning, pages 632-640, San Francisco, CA, 2001. Morgan Kaufmann.


Published In

IJCAI'03: Proceedings of the 18th international joint conference on Artificial intelligence
August 2003
1674 pages

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States

