- Article, September 2001
Query clustering using content words and user feedback
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 442–443
https://doi.org/10.1145/383952.384083
Query clustering is crucial for automatically discovering frequently asked queries (FAQs) or most popular topics on a question-answering search engine. Due to the short length of queries, the traditional approaches based on keywords are not suitable for ...
- Article, September 2001
Automatic web search query generation to create minority language corpora
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 432–433
https://doi.org/10.1145/383952.384072
The Web is a valuable source of language-specific resources, but collecting, organizing and utilizing this information is difficult. We describe CorpusBuilder, an approach for automatically generating Web-search queries to collect documents in a minority ...
- Article, September 2001
Generic topic segmentation of document texts
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 418–419
https://doi.org/10.1145/383952.384065
Topic segmentation is an important initial step in many text-based tasks. A hierarchical representation of a text's topics is useful in retrieval and allows judging relevancy at different levels of detail. This short paper describes research on generic ...
- Article, September 2001
Query-biased web page summarisation: a task-oriented evaluation
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 412–413
https://doi.org/10.1145/383952.384062
We present a system that offers a new way of assessing web document relevance and a new approach to the web-based evaluation of such a system. Provisionally named WebDocSum, the system is a query-biased web page summariser that aims to provide an ...
- Article, September 2001
Structure and content-based segmentation of speech transcripts
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 404–405
https://doi.org/10.1145/383952.384041
An algorithm for the segmentation of an audio/video source into topically cohesive segments based on automatic speech recognition (ASR) transcriptions is presented. A novel two-pass algorithm is described that combines a boundary-based method with a ...
- Article, September 2001
Quantifying the utility of parallel corpora
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 398–399
https://doi.org/10.1145/383952.384037
Our English-Chinese cross-language IR system is trained from parallel corpora; we investigate its performance as a function of training corpus size for three different training corpora. We find that the performance of the system as trained on the three ...
- Article, September 2001
Anchor text mining for translation extraction of query terms
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 388–389
https://doi.org/10.1145/383952.384031
This paper presents an approach to automatically extracting the bilingual translations of many Web query terms by mining Web anchor texts. Some preliminary experiments are conducted using 109,416 Web pages containing both Chinese and English ...
- Article, September 2001
Searcher performance in question answering
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 375–381
https://doi.org/10.1145/383952.384028
There are many tasks that require information finding. Some can be largely automated, and others greatly benefit from successful interaction between system and searcher. We are interested in the task of answering questions where some synthesis of ...
- Article, September 2001
High performance question/answering
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 366–374
https://doi.org/10.1145/383952.384025
In this paper we present the features of a Question/Answering (Q/A) system that had unparalleled performance in the TREC-9 evaluations. We explain the accuracy of our system through the unique characteristics of its architecture: (1) usage of a wide-...
- Article, September 2001
Exploiting redundancy in question answering
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 358–365
https://doi.org/10.1145/383952.384024
Our goal is to automatically answer brief factual questions of the form "When was the Battle of Hastings?" or "Who wrote The Wind in the Willows?". Since the answer to nearly any such question can now be found somewhere on the Web, the problem ...
- Article, September 2001
Finding topic words for hierarchical summarization
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 349–357
https://doi.org/10.1145/383952.384022
Hierarchies have long been used for organization, summarization, and access to information. In this paper we define summarization in terms of a probabilistic language model and use the definition to explore a new technique for automatically ...
- Article, September 2001
Topic segmentation with an aspect hidden Markov model
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 343–348
https://doi.org/10.1145/383952.384021
We present a novel probabilistic method for topic segmentation on unstructured text. One previous approach to this problem utilizes the hidden Markov model (HMM) method for probabilistically modeling sequence data [7]. The HMM treats a document as ...
- Article, September 2001
A study of smoothing methods for language models applied to Ad Hoc information retrieval
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 334–342
https://doi.org/10.1145/383952.384019
Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech ...
- Article, September 2001
A meta-learning approach for text categorization
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 303–309
https://doi.org/10.1145/383952.384011
We investigate a meta-model approach, called Meta-learning Using Document Feature characteristics (MUDOF), for the task of automatic textual document categorization. It employs a meta-learning phase using document feature characteristics. Document ...
- Article, September 2001
Enhanced topic distillation using text, markup tags, and hyperlinks
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 208–216
https://doi.org/10.1145/383952.383990
Topic distillation is the analysis of hyperlink graph structure to identify mutually reinforcing authorities (popular pages) and hubs (comprehensive lists of links to authorities). Topic distillation is becoming common in Web search engines, but the ...
- Article, September 2001
Automatic generation of concise summaries of spoken dialogues in unrestricted domains
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 199–207
https://doi.org/10.1145/383952.383989
Automatic summarization of open domain spoken dialogues is a new research area. This paper introduces the task, the challenges involved, and presents an approach to obtain automatic extract summaries for multi-party dialogues of four different genres, ...
- Article, September 2001
On feature distributional clustering for text categorization
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 146–153
https://doi.org/10.1145/383952.383976
We describe a text categorization approach that is based on a combination of feature distributional clusters with a support vector machine (SVM) classifier. Our feature selection approach employs distributional clustering of words via the recently ...
- Article, September 2001
A study of thresholding strategies for text categorization
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Pages 137–145
https://doi.org/10.1145/383952.383975
Thresholding strategies in automated text categorization are an underexplored area of research. This paper presents an examination of the effect of thresholding strategies on the performance of a classifier under various conditions. Using k-Nearest ...