Article

Using the structure of documents to improve the discovery of unexpected information

Authors:

François Jacquenet,

Christine Largeron

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing

Pages 1036 - 1042

https://doi.org/10.1145/1141277.1141524

Published: 23 April 2006

Abstract

In this paper we take the structure of documents into account when discovering unexpected information in textual databases. Following earlier work that designed measures for evaluating the unexpectedness of documents and integrated them into the UnexpectedMiner system, we improve the system by exploiting the structure of the documents it processes. Each part of a document is weighted by a coefficient whose value is determined by optimization techniques. The system then uses these coefficients in its unexpectedness measures to decide whether a document contains unexpected information. We evaluate the efficiency of the new system, and the experiments highlight the improvements gained by using the structure of the documents.
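
The idea of weighting document parts inside an unexpectedness measure can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the part names (`title`, `abstract`, `body`), the measure itself (the weighted share of terms missing from a known vocabulary), and the exhaustive grid search that stands in for the unspecified optimization techniques. It is not the UnexpectedMiner implementation.

```python
from itertools import product

# Hypothetical part names; the abstract does not say which parts are used.
PARTS = ("title", "abstract", "body")


def unexpectedness(doc, known_terms, weights):
    """Weighted share of a document's terms that are absent from known_terms.

    doc maps each part name to a list of terms; weights maps each part
    name to a non-negative coefficient. This measure is an illustrative
    stand-in, not one of UnexpectedMiner's measures.
    """
    total = 0.0
    unknown = 0.0
    for part in PARTS:
        w = weights[part]
        for term in doc.get(part, []):
            total += w
            if term not in known_terms:
                unknown += w
    return unknown / total if total else 0.0


def fit_weights(labelled_docs, known_terms, grid=(0.0, 0.5, 1.0)):
    """Exhaustive grid search (a stand-in for the paper's optimizer).

    Picks the weights that maximise the gap between the mean score of
    documents labelled unexpected and the mean score of the others.
    Assumes at least one document of each label.
    """
    best_weights, best_gap = None, float("-inf")
    for combo in product(grid, repeat=len(PARTS)):
        if not any(combo):
            continue  # all-zero weights carry no signal
        weights = dict(zip(PARTS, combo))
        pos = [unexpectedness(d, known_terms, weights)
               for d, label in labelled_docs if label]
        neg = [unexpectedness(d, known_terms, weights)
               for d, label in labelled_docs if not label]
        gap = sum(pos) / len(pos) - sum(neg) / len(neg)
        if gap > best_gap:
            best_weights, best_gap = weights, gap
    return best_weights
```

Given a handful of documents labelled as unexpected or not, `fit_weights` returns one coefficient per part, and `unexpectedness` can then score new documents with those coefficients. A stochastic optimizer could replace the grid search without changing the scoring function.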

Cited By

Kamaruddin S, Hamdan A, Bakar A, Mat Nor F (2018). Deviation detection in text using conceptual graph interchange format and error tolerance dissimilarity function. Intelligent Data Analysis 16:3 (487-511). DOI: 10.5555/2608519.2608528. Online publication date: 27-Dec-2018
https://dl.acm.org/doi/10.5555/2608519.2608528
Nishiyama R, Takeuchi H, Nasukawa T, Watanabe H (2008). Extracting Advantage Phrases That Hint at a New Technology's Potentials. Proceedings of the 7th International Conference on Practical Aspects of Knowledge Management (98-110). DOI: 10.1007/978-3-540-89447-6_11. Online publication date: 22-Nov-2008
https://dl.acm.org/doi/10.1007/978-3-540-89447-6_11

Information & Contributors

Information

Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing

April 2006

1967 pages

ISBN:1595931082

DOI:10.1145/1141277

Conference Chair:
Hisham M. Haddad
Kennesaw State University, Kennesaw, Georgia

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SAC06

Sponsor:

SIGAPP

SAC06: The 2006 ACM Symposium on Applied Computing

April 23 - 27, 2006

Dijon, France

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics

Article Metrics

Total Citations: 2
Total Downloads: 171
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Jan 2025
