DOI: 10.5555/1630659.1630804
Article

SMDP homomorphisms: an algebraic approach to abstraction in semi-Markov decision processes

Published: 09 August 2003 Publication History

Abstract

To operate effectively in complex environments, learning agents require the ability to selectively ignore irrelevant details and to form useful abstractions. In this article we consider the question of what constitutes a useful abstraction in a stochastic sequential decision problem modeled as a semi-Markov decision process (SMDP). We introduce the notion of an SMDP homomorphism and argue that it provides a useful tool for a rigorous study of abstraction for SMDPs. We present an SMDP minimization framework and an abstraction framework for factored MDPs based on SMDP homomorphisms. We also model different classes of abstraction that arise in hierarchical systems. Although we use the options framework for purposes of illustration, the ideas are more generally applicable. We also show that the abstraction conditions we employ generalize earlier work by Dietterich as applied to the options framework.
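The homomorphism conditions described in the abstract can be illustrated with a minimal sketch. The toy MDP below is hypothetical (the paper's SMDP definition also involves transition-time distributions, omitted here): a homomorphism pairs a state surjection `f` with per-state action surjections `g[s]`, and requires that any two state-action pairs with the same image agree on their expected reward and on the probability of landing in each block of the induced state partition.

```python
from collections import defaultdict

# Toy MDP (hypothetical, for illustration): states 0 and 1 behave
# identically up to the aggregation f, as do states 2 and 3.
states = [0, 1, 2, 3]
actions = ["a"]
P = {  # P[(s, a)] -> {next_state: probability}
    (0, "a"): {2: 1.0},
    (1, "a"): {3: 1.0},
    (2, "a"): {2: 1.0},
    (3, "a"): {3: 1.0},
}
R = {(0, "a"): 1.0, (1, "a"): 1.0, (2, "a"): 0.0, (3, "a"): 0.0}

f = {0: "s0", 1: "s0", 2: "s1", 3: "s1"}   # state surjection
g = {s: {"a": "a"} for s in states}        # action surjections (identity here)

def block_probs(s, a):
    """Probability of landing in each abstract block from (s, a)."""
    out = defaultdict(float)
    for s2, p in P[(s, a)].items():
        out[f[s2]] += p
    return dict(out)

def is_homomorphism():
    # Every (s, a) mapping to the same abstract (f(s), g[s](a)) must
    # induce the same block transition probabilities and reward.
    image = {}
    for s in states:
        for a in actions:
            key = (f[s], g[s][a])
            val = (block_probs(s, a), R[(s, a)])
            if key in image and image[key] != val:
                return False
            image[key] = val
    return True

print(is_homomorphism())  # True for this toy MDP
```

Breaking the reward symmetry (e.g. setting `R[(1, "a")]` to a different value than `R[(0, "a")]`) makes the check fail, since the two states no longer aggregate consistently.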

References

[1]
{Arbib and Manes, 1975} M. A. Arbib and E. G. Manes. Arrows, Structures and Functors. Academic Press, New York, NY, 1975.
[2]
{Boutilier and Dearden, 1994} C. Boutilier and R. Dearden. Using abstractions for decision theoretic planning with time constraints. In Proceedings of AAAI-94, pages 1016-1022. AAAI, 1994.
[3]
{Boutilier et al., 1995} C. Boutilier, R. Dearden, and M. Goldszmidt. Exploiting structure in policy construction. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1104-1111, 1995.
[4]
{Boutilier et al., 2001} Craig Boutilier, Ray Reiter, and Robert Price. Symbolic dynamic programming for first-order MDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pages 541-547, 2001.
[5]
{Dean and Givan, 1997} T. Dean and R. Givan. Model minimization in Markov decision processes. In Proceedings of AAAI-97, pages 106-111. AAAI, 1997.
[6]
{Dean and Kanazawa, 1989} Thomas Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142-150, 1989.
[7]
{Dietterich, 2000} T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
[8]
{Emerson and Sistla, 1996} E. A. Emerson and A. P. Sistla. Symmetry and model checking. Formal Methods in System Design, 9(1/2):105-131, 1996.
[9]
{Givan et al., 2000} R. Givan, S. Leach, and T. Dean. Bounded-parameter Markov decision processes. Artificial Intelligence, 122:71-109, 2000.
[10]
{Givan et al., 2003} R. Givan, T. Dean, and M. Greig. Equivalence notions and model minimization in Markov decision processes. To appear in Artificial Intelligence, 2003.
[11]
{Hartmanis and Stearns, 1966} J. Hartmanis and R. E. Stearns. Algebraic Structure Theory of Sequential Machines. Prentice-Hall, Englewood Cliffs, NJ, 1966.
[12]
{Iba, 1989} Glenn A. Iba. A heuristic approach to the discovery of macro-operators. Machine Learning, 3:285-317, 1989.
[13]
{Kemeny and Snell, 1960} J. G. Kemeny and J. L. Snell. Finite Markov Chains. Van Nostrand, Princeton, NJ, 1960.
[14]
{Parr and Russell, 1997} Ronald Parr and Stuart Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10, pages 1043-1049. MIT Press, 1997.
[15]
{Ravindran and Barto, 2001} B. Ravindran and A. G. Barto. Symmetries and model minimization of Markov decision processes. Technical Report 01-43, University of Massachusetts, Amherst, 2001.
[16]
{Ravindran and Barto, 2002} Balaraman Ravindran and Andrew G. Barto. Model minimization in hierarchical reinforcement learning. In Sven Koenig and Robert C. Holte, editors, Proceedings of the Fifth Symposium on Abstraction, Reformulation and Approximation (SARA 2002), Lecture Notes in Artificial Intelligence 2371, pages 196-211, New York, NY, August 2002. Springer-Verlag.
[17]
{Sutton et al., 1999} Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and Semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
[18]
{Zeigler, 1972} Bernard P. Zeigler. On the formulation of problems in simulation and modelling in the framework of mathematical system theory. In Proceedings of the Sixth International Congress on Cybernetics, pages 363-385. Association Internationale de Cybernétique, 1972.
[19]
{Zinkevich and Balch, 2001} M. Zinkevich and T. Balch. Symmetry in Markov decision processes and its implications for single agent and multi agent learning. In Proceedings of the 18th International Conference on Machine Learning, pages 632-640, San Francisco, CA, 2001. Morgan Kaufmann.


Published In

IJCAI'03: Proceedings of the 18th international joint conference on Artificial intelligence
August 2003
1674 pages

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States

