- Article, August 2002
Adaptive information extraction for document annotation in Amilcare
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Page 451
https://doi.org/10.1145/564376.564492
Amilcare is a tool for Adaptive Information Extraction (IE) designed for supporting active annotation of documents for the Semantic Web (SW). It can be used either for unsupervised document annotation or as a support for human annotation. Amilcare is ...
- Article, August 2002
Correlating multilingual documents via bipartite graph modeling
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 443–444
https://doi.org/10.1145/564376.564485
There is an enormous amount of multilingual documents from various sources and possibly from different countries describing a single event or a set of related events. It is desirable to construct text mining methods that can compare and highlight ...
- Article, August 2002
Topic structure modeling
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 417–418
https://doi.org/10.1145/564376.564472
In this paper, we present a method based on document probes to quantify and diagnose topic structure, distinguishing topics as monolithic, structured, or diffuse. The method also yields a structure analysis that can be used directly to optimize filter (...
- Article, August 2002
Building thematic lexical resources by term categorization
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 415–416
https://doi.org/10.1145/564376.564471
We discuss the automatic generation of thematic lexicons by means of term categorization, a novel task employing techniques from information retrieval (IR) and machine learning (ML). Specifically, we view the generation of such lexicons as an iterative ...
- Article, August 2002
Modeling (in)variability of human judgments for text summarization
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 407–408
https://doi.org/10.1145/564376.564467
The paper proposes and empirically motivates an integration of supervised learning with unsupervised learning to deal with human biases in summarization. In particular, we explore the use of a probabilistic decision tree within the clustering framework to ...
- Article, August 2002
Automatic metadata generation & evaluation
- Elizabeth D. Liddy,
- Eileen Allen,
- Sarah Harwell,
- Susan Corieri,
- Ozgur Yilmazel,
- N. Ercan Ozgencil,
- Anne Diekema,
- Nancy McCracken,
- Joanne Silverstein,
- Stuart Sutton
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 401–402
https://doi.org/10.1145/564376.564464
The poster reports on a project in which we are investigating methods for breaking the human metadata-generation bottleneck that plagues Digital Libraries. The research question is whether metadata elements and values can be automatically generated from ...
- Article, August 2002
User-centered interface design for cross-language information retrieval
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 383–384
https://doi.org/10.1145/564376.564455
This paper reports on the user-centered design methodology and techniques used for the elicitation of user requirements and how these requirements informed the first phase of the user interface design for a Cross-Language Information Retrieval System. ...
- Article, August 2002
ICA and SOM in text document analysis
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 361–362
https://doi.org/10.1145/564376.564444
In this study we show experimental results on using Independent Component Analysis (ICA) and the Self-Organizing Map (SOM) in document analysis. Our documents are segments of spoken dialogues carried out over the telephone in a customer service, ...
- Article, August 2002
Using self-supervised word segmentation in Chinese information retrieval
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 349–350
https://doi.org/10.1145/564376.564438
We propose a self-supervised word-segmentation technique for Chinese information retrieval. This method combines the advantages of traditional dictionary based approaches with character based approaches, while overcoming many of their shortcomings. ...
- Article, August 2002
Using part-of-speech patterns to reduce query ambiguity
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 307–314
https://doi.org/10.1145/564376.564430
Query ambiguity is a generally recognized problem, particularly in Web environments where queries are commonly only one or two words in length. In this study, we explore one technique that finds commonly occurring patterns of parts of speech near a one-...
- Article, August 2002
Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 275–282
https://doi.org/10.1145/564376.564425
Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence ...
- Article, August 2002
Empirical studies in strategies for Arabic retrieval
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 269–274
https://doi.org/10.1145/564376.564424
This work evaluates a few search strategies for Arabic monolingual and cross-lingual retrieval, using the TREC Arabic corpus as the test-bed. The release by NIST in 2001 of an Arabic corpus of nearly 400k documents with both monolingual and cross-...
- Article, August 2002
Methods and metrics for cold-start recommendations
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 253–260
https://doi.org/10.1145/564376.564421
We have developed a method for recommending items that combines content and collaborative data under a single probabilistic framework. We benchmark our algorithm against a naïve Bayes classifier on the cold-start problem, where we wish to recommend ...
- Article, August 2002
Probabilistic combination of text classifiers using reliability indicators: models and results
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 207–214
https://doi.org/10.1145/564376.564413
The intuition that different text classifiers behave in qualitatively different ways has long motivated attempts to build a better metaclassifier via some combination of classifiers. We introduce a probabilistic method for combining classifiers that ...
- Article, August 2002
A new family of online algorithms for category ranking
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 151–158
https://doi.org/10.1145/564376.564404
We describe a new family of topic-ranking algorithms for multi-labeled documents. The motivation for the algorithms stems from recent advances in online learning algorithms. The algorithms we present are simple to implement and are time and memory ...
- Article, August 2002
Unsupervised document classification using sequential information maximization
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 129–136
https://doi.org/10.1145/564376.564401
We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential (sIB) approach is guaranteed to converge to a local maximum of the ...
- Article, August 2002
Cross-document summarization by concept classification
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 121–128
https://doi.org/10.1145/564376.564399
In this paper we describe a Cross Document Summarizer XDoX designed specifically to summarize large document sets (50-500 documents and more). Such sets of documents are typically obtained from routing or filtering systems run against a continuous ...
- Article, August 2002
Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering
SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, Pages 113–120
https://doi.org/10.1145/564376.564398
A novel method for simultaneous keyphrase extraction and generic text summarization is proposed by modeling text documents as weighted undirected and weighted bipartite graphs. Spectral graph clustering algorithms are used for partitioning sentences of ...