DOI: 10.1145/509907.510021
Article

Approximating the smallest grammar: Kolmogorov complexity in natural models

Published: 19 May 2002

Abstract

We consider the problem of finding the smallest context-free grammar that generates exactly one given string of length n. The size of this grammar is of theoretical interest as an efficiently computable variant of Kolmogorov complexity. The problem is of practical importance in areas such as data compression and pattern extraction.

The smallest grammar is known to be hard to approximate to within a constant factor, and an o(log n / log log n) approximation would require progress on a long-standing algebraic problem [10]. Previously, the best proved approximation ratio was O(n^{1/2}) for the Bisection algorithm [8]. Our main result is an exponential improvement of this ratio; we give an O(log(n/g*)) approximation algorithm, where g* is the size of the smallest grammar.

We then consider other computable variants of Kolmogorov complexity. In particular, we give an O(log^2 n) approximation for the smallest non-deterministic finite automaton with advice that produces a given string. We also apply our techniques to "advice-grammars" and "edit-grammars", two other natural models of string complexity.
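To make the objects in the abstract concrete, the following Python sketch represents a grammar that generates exactly one string, measures its size, and builds such a grammar by recursive halving. It is an illustration under stated assumptions, not the algorithm of this paper and not necessarily the exact Bisection algorithm of [8]: the names `expand`, `grammar_size`, and `bisection_grammar` are hypothetical, the size convention (total number of symbols on all right-hand sides) is assumed, and the split point (the largest power of two strictly below the substring length) is a simplification.

```python
from typing import Dict, List, Tuple

# Assumed representation: nonterminal name -> list of right-hand-side symbols.
# Terminals are single characters; nonterminals are named "N0", "N1", ...
Grammar = Dict[str, List[str]]


def expand(grammar: Grammar, symbol: str) -> str:
    """Return the unique string derived from `symbol`."""
    if symbol not in grammar:  # terminal character
        return symbol
    return "".join(expand(grammar, s) for s in grammar[symbol])


def grammar_size(grammar: Grammar) -> int:
    """Assumed size measure: total symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in grammar.values())


def bisection_grammar(text: str) -> Tuple[Grammar, str]:
    """Build a grammar for `text` by recursive halving (simplified sketch).

    Each distinct substring reached by the recursion gets one rule, so
    repeated substrings share a nonterminal. Returns (grammar, start symbol).
    """
    if not text:
        raise ValueError("text must be non-empty")
    grammar: Grammar = {}
    names: Dict[str, str] = {}  # substring -> nonterminal already assigned

    def rule_for(sub: str) -> str:
        if len(sub) == 1:
            return sub  # a single character is its own terminal
        if sub not in names:
            name = f"N{len(names)}"
            names[sub] = name
            # Largest power of two strictly less than len(sub).
            cut = 1 << ((len(sub) - 1).bit_length() - 1)
            grammar[name] = [rule_for(sub[:cut]), rule_for(sub[cut:])]
        return names[sub]

    start = rule_for(text)
    return grammar, start


if __name__ == "__main__":
    g, s = bisection_grammar("abababab")
    for name, rhs in g.items():
        print(name, "->", " ".join(rhs))
    print("size:", grammar_size(g), "expands to:", expand(g, s))
```

For the input `abababab` this sketch yields three rules of two symbols each (size 6) for a string of length 8, which illustrates how a grammar can be much smaller than the string it generates when the string is repetitive.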

References

[1]
A. Apostolico and S. Lonardi. Some theory and practice of greedy off-line textual substitution. In J. A. Storer and M. Cohn, editors, Data Compression Conference, pages 119--128, Snowbird, Utah, 1998.
[2]
V. Chvátal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):233--235, 1979.
[3]
T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, NY, USA, 1991.
[4]
C. de Marcken. Unsupervised Language Acquisition. PhD thesis, Massachusetts Institute of Technology, 1996.
[5]
M. Farach-Colton. Personal communication.
[6]
T. Kida, Y. Shibata, M. Takeda, A. Shinohara, and S. Arikawa. A unifying framework for compressed pattern matching. In SPIRE/CRIWG, pages 89--96, 1999.
[7]
J. C. Kieffer and E.-h. Yang. Grammar-based codes: A new class of universal lossless source codes. IEEE Transactions on Information Theory, 46(3):737--754, 2000.
[8]
J. C. Kieffer, E.-h. Yang, G. J. Nelson, and P. Cosman. Universal lossless compression via multilevel pattern matching. IEEE Transactions on Information Theory, 46(4):1227--1245, 2000.
[9]
N. J. Larsson and A. Moffat. Offline dictionary-based compression. In Data Compression Conference, pages 296--305, 1999.
[10]
E. Lehman and A. Shelat. Approximation algorithms for grammar-based compression. In Thirteenth Annual Symposium on Discrete Algorithms (SODA'02), 2002.
[11]
C. Nevill-Manning. Inferring Sequential Structure. PhD thesis, University of Waikato, 1996.
[12]
C. G. Nevill-Manning and I. H. Witten. Compression and explanation using hierarchical grammars. The Computer Journal, 40(2/3):103--116, 1997.
[13]
J. A. Storer. Data Compression: Methods and Complexity Issues. PhD thesis, Princeton University, 1979.
[14]
E.-h. Yang and J. C. Kieffer. Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform--part one: Without context models. IEEE Transactions on Information Theory, 46(3):755--777, 2000.
[15]
J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337--343, 1977.
[16]
J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, IT-24:530--536, 1978.

Published In

STOC '02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
May 2002
840 pages
ISBN:1581134959
DOI:10.1145/509907

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

STOC02: Symposium on the Theory of Computing
May 19 - 21, 2002
Montreal, Quebec, Canada

Acceptance Rates

STOC '02 Paper Acceptance Rate 91 of 287 submissions, 32%;
Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
