DOI: 10.1145/345508.345563
Article
Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods

Published: 01 July 2000

Abstract

The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily on the availability of large dictionaries of domain-specific Proper Nouns, and on a certain amount of manual work for rule writing or manual tagging. Relying on an existing PN dictionary is not a heavy requirement, since such resources are often available on the web, but in the absence of manual updating its coverage of a domain corpus may be rather low. In this paper we propose a technique for the automatic updating of a PN dictionary through the cooperation of an inductive and a probabilistic classifier. In our experiments we show that, whenever an existing PN dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
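The figures quoted in the abstract can be combined into a rough overall coverage estimate. The sketch below is only a back-of-the-envelope calculation, not part of the paper: the function name `overall_recall` and its parameters are hypothetical, and it assumes that every proper noun matched by the dictionary is recognized correctly and that the "about 90%" applies uniformly to the nouns the dictionary misses.

```python
def overall_recall(dictionary_coverage: float, recovery_rate: float) -> float:
    """Estimate the fraction of all proper nouns recognized.

    dictionary_coverage: share of proper nouns already found via the dictionary.
    recovery_rate: share of the remaining proper nouns recognized by the
                   automatic updating technique.
    Both names are illustrative assumptions, not terms from the paper.
    """
    remaining = 1.0 - dictionary_coverage
    return dictionary_coverage + remaining * recovery_rate


# Abstract's scenario: 50% dictionary coverage, about 90% of the rest recovered.
print(round(overall_recall(0.5, 0.9), 2))  # 0.95
```

Under these assumptions, the reported result corresponds to recognizing roughly 95% of the proper nouns in the corpus.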

Published In

SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
July 2000
396 pages
ISBN:1581132263
DOI:10.1145/345508

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information extraction
  2. machine learning and IR
  3. natural language processing for IR
  4. text data mining

Qualifiers

  • Article

Conference

SIGIR00
Sponsor:
  • Greek Com Soc
  • SIGIR
  • Athens U of Econ & Business

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
