DOI: 10.3115/1218955.1219013

Alternative approaches for generating bodies of grammar rules

Published: 21 July 2004

Abstract

We compare two approaches for describing and generating bodies of rules used for natural language parsing. In today's parsers, rule bodies do not exist a priori but are generated on the fly, usually with methods based on n-grams, which are one particular way of inducing probabilistic regular languages. We compare two approaches for inducing such languages: one based on n-grams, the other on minimization of the Kullback-Leibler divergence. The inferred regular languages are used to generate bodies of rules inside a parsing procedure. We evaluate the two approaches along two dimensions: the quality of the probabilistic regular language they produce, and the performance of the parser built with them. The approach based on Kullback-Leibler divergence minimization outperforms the n-gram approach along both dimensions.
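As a rough illustration of the two notions the abstract relies on, the sketch below estimates a bigram model over rule bodies (sequences of category symbols) and measures the Kullback-Leibler divergence between the empirical distribution of the observed bodies and the distribution the model assigns to them. It is a toy under stated assumptions, not the authors' procedure: the function names, the boundary symbols, the toy rule bodies, and the unsmoothed relative-frequency estimates are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical boundary markers for the start and end of a rule body.
START, END = "<s>", "</s>"


def train_bigram(bodies):
    """Estimate P(next | prev) by relative frequency (no smoothing)."""
    counts = defaultdict(Counter)
    for body in bodies:
        symbols = [START, *body, END]
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def body_probability(model, body):
    """Probability the bigram model assigns to one complete rule body."""
    symbols = [START, *body, END]
    p = 1.0
    for prev, nxt in zip(symbols, symbols[1:]):
        p *= model.get(prev, {}).get(nxt, 0.0)
    return p


def empirical(bodies):
    """Relative-frequency distribution over the observed rule bodies."""
    counts = Counter(tuple(b) for b in bodies)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def kl_divergence(p, q):
    """D(p || q) in bits; p and q map outcomes to probabilities."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q has none
        total += px * math.log2(px / qx)
    return total


def kl_to_model(bodies, model):
    """Divergence from the empirical distribution to the bigram model."""
    p = empirical(bodies)
    q = {b: body_probability(model, list(b)) for b in p}
    return kl_divergence(p, q)


if __name__ == "__main__":
    # Toy noun-phrase rule bodies, invented for this example.
    toy = [["DT", "NN"], ["DT", "JJ", "NN"], ["DT", "JJ", "JJ", "NN"], ["NN"]]
    model = train_bigram(toy)
    print(body_probability(model, ["DT", "JJ", "JJ", "JJ", "NN"]))
    print(kl_to_model(toy, model))
```

The bigram model assigns nonzero probability to bodies it never observed, such as one with three adjectives, which is the sense in which rule bodies are generated on the fly. A lower divergence from the observed data is one possible way to read "quality of the probabilistic regular language"; the paper's own measures may differ.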


Published In

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
