Article (DOI: 10.5555/1642293.1642459)

A probabilistic model of redundancy in information extraction

Published: 30 July 2005

Abstract

Unsupervised Information Extraction (UIE) is the task of extracting knowledge from text without using hand-tagged training examples. A fundamental problem for both UIE and supervised IE is assessing the probability that extracted information is correct. In massive corpora such as the Web, the same extraction is found repeatedly in different documents. How does this redundancy impact the probability of correctness?
This paper introduces a combinatorial "balls-and-urns" model that computes the impact of sample size, redundancy, and corroboration from multiple distinct extraction rules on the probability that an extraction is correct. We describe methods for estimating the model's parameters in practice and demonstrate experimentally that for UIE the model's log likelihoods are 15 times better, on average, than those obtained by Pointwise Mutual Information (PMI) and the noisy-or model used in previous work. For supervised IE, the model's performance is comparable to that of Support Vector Machines and Logistic Regression.
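
To make the intuition concrete, the following is a minimal sketch contrasting a noisy-or estimate with a simplified single-urn calculation. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy urn contents, and the premise that the number of balls per label is known in advance are all hypothetical (the paper instead estimates its parameters from data). Draws are assumed to be made with replacement, so the count of a label follows a binomial distribution.

```python
def noisy_or(k, p):
    """Noisy-or estimate: the extraction is correct unless all k
    sightings are wrong, each sighting being right with probability p.
    Note that the sample size n plays no role here."""
    return 1.0 - (1.0 - p) ** k


def urn_posterior(k, n, correct_counts, error_counts):
    """Simplified single-urn estimate (hypothetical helper).

    Each distinct label has some number of balls in the urn.
    correct_counts lists the ball counts of the correct labels,
    error_counts those of the erroneous labels. Returns the
    probability that a label drawn k times in n draws (with
    replacement) is one of the correct labels.
    """
    total = sum(correct_counts) + sum(error_counts)

    def weight(balls):
        # Binomial likelihood of seeing this label k times in n draws;
        # the binomial coefficient cancels in the ratio below.
        q = balls / total
        return q ** k * (1.0 - q) ** (n - k)

    correct_mass = sum(weight(r) for r in correct_counts)
    error_mass = sum(weight(r) for r in error_counts)
    denom = correct_mass + error_mass
    return correct_mass / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy urn (assumed values): 5 correct labels with 10 balls each,
    # 50 erroneous labels with 1 ball each.
    correct = [10] * 5
    errors = [1] * 50
    for n in (20, 200):
        for k in (1, 3):
            print(n, k, round(urn_posterior(k, n, correct, errors), 4),
                  round(noisy_or(k, 0.9), 4))
```

In this toy setting the urn estimate rises as the repetition count k grows and falls as the sample size n grows with k held fixed, whereas the noisy-or estimate depends on k alone.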

Published In

IJCAI'05: Proceedings of the 19th international joint conference on Artificial intelligence
July 2005
1775 pages

Sponsors

  • The International Joint Conferences on Artificial Intelligence, Inc.

Publisher

Morgan Kaufmann Publishers Inc.

San Francisco, CA, United States
