DOI: 10.1145/1141277.1141524
Article

Using the structure of documents to improve the discovery of unexpected information

Published: 23 April 2006

Abstract

This paper investigates how the structure of documents can be taken into account when discovering unexpected information in textual databases. Earlier work designed measures for evaluating the unexpectedness of documents and integrated them into the UnexpectedMiner system; here we extend that system to exploit the structure of the documents it processes. Each part of a document is weighted by a coefficient whose value is determined by optimization techniques. The system then uses these coefficients in its unexpectedness measures to decide whether a document contains unexpected information. We evaluate the new system experimentally, and the results show the improvements obtained by using the structure of the documents.
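The weighting idea described in the abstract can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration only: the part names (`title`, `abstract`, `body`), the score (the share of weighted term mass not found in a set of known terms), the decision threshold, and the exhaustive grid search that stands in for the unspecified optimization techniques. None of these is taken from the paper or from the UnexpectedMiner system.

```python
from collections import Counter
from itertools import product

# Hypothetical part names; the abstract does not say which parts are used.
PARTS = ("title", "abstract", "body")


def weighted_term_freq(doc, weights):
    """Sum term counts over the parts of a document, scaling each part by its coefficient."""
    freq = Counter()
    for part in PARTS:
        for term in doc.get(part, "").lower().split():
            freq[term] += weights[part]
    return freq


def unexpectedness(doc, known_terms, weights):
    """Illustrative score (an assumption): share of weighted term mass not in known_terms."""
    freq = weighted_term_freq(doc, weights)
    total = sum(freq.values())
    if total == 0:
        return 0.0
    unknown = sum(w for term, w in freq.items() if term not in known_terms)
    return unknown / total


def fit_weights(labeled_docs, known_terms, grid=(0.0, 0.5, 1.0), threshold=0.5):
    """Exhaustive grid search, a simple stand-in for the optimization step.

    labeled_docs is a list of (doc, is_unexpected) pairs. The returned
    coefficients are the first grid point that maximizes classification
    accuracy at the given threshold.
    """
    best_weights, best_acc = None, -1.0
    for combo in product(grid, repeat=len(PARTS)):
        weights = dict(zip(PARTS, combo))
        correct = sum(
            (unexpectedness(doc, known_terms, weights) >= threshold) == label
            for doc, label in labeled_docs
        )
        acc = correct / len(labeled_docs)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc
```

In this sketch, raising the coefficient of a part makes the terms found there count for more in the score, so a document whose novel terms sit in a heavily weighted part is more likely to cross the threshold. A grid search is practical only for a handful of parts and candidate values; a stochastic method would be the natural replacement for larger search spaces.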

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN:1595931082
DOI:10.1145/1141277
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information retrieval
  2. structure of documents
  3. text mining
  4. unexpected information

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
