DOI: 10.3115/1075096.1075124

Closing the gap: learning-based information extraction rivaling knowledge-engineering methods

Published: 07 July 2003

Abstract

In this paper, we present a learning approach to the scenario template task of information extraction, in which the information filling a single template may come from multiple sentences. When tested on the MUC-4 task, our learning approach achieves accuracy competitive with the best of the MUC-4 systems, all of which were built with manually engineered rules. Our analysis reveals that our use of full parsing and state-of-the-art learning algorithms has contributed to this performance. To our knowledge, this is the first work to demonstrate that a learning approach to the full-scale information extraction task can achieve performance rivaling that of the knowledge-engineering approach.


      Published In

      ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
      July 2003
      571 pages

      Publisher

      Association for Computational Linguistics

      United States

      Acceptance Rates

      Overall Acceptance Rate 85 of 443 submissions, 19%

      Article Metrics

      • Downloads (last 12 months): 64
      • Downloads (last 6 weeks): 7
      Reflects downloads up to 26 Jan 2025