Efficient passage ranking for document databases

Published: 01 October 1999

Abstract

Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can improve effectiveness and identify relevant material in documents that are too large for users to consider as a whole. However, ranking of passages can considerably increase retrieval costs. In this article we explore alternative query evaluation techniques, and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new “DO-TOS” passage-ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.
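
To make the idea of ranking passages rather than whole documents concrete, the following is a minimal illustrative sketch. It is not the DO-TOS algorithm, which the abstract does not describe, and it uses no inverted file. The overlapping fixed-length word windows, the window size and step, the TF-IDF style score, and the names `passages` and `rank_passages` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def passages(words, size=4, step=2):
    """Yield (start, window) pairs of overlapping fixed-length word windows.

    The window size and step are illustrative assumptions.
    """
    stop = max(len(words) - size, 0) + step
    for start in range(0, stop, step):
        yield start, words[start:start + size]


def rank_passages(docs, query, size=4, step=2):
    """Score every passage of every document and return them best-first.

    docs maps a document id to its text. The score is a plain TF-IDF sum,
    picked only for brevity; it is an assumption, not the paper's measure.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    query_terms = set(query.lower().split())

    # Document frequency of each query term, used for the IDF weight.
    df = Counter()
    for words in tokenized.values():
        df.update(query_terms & set(words))

    n_docs = len(tokenized)
    scored = []
    for doc_id, words in tokenized.items():
        for start, window in passages(words, size, step):
            tf = Counter(w for w in window if w in query_terms)
            score = sum(
                (1 + math.log(count)) * math.log(1 + n_docs / df[term])
                for term, count in tf.items()
            )
            if score > 0:
                scored.append((score, doc_id, start))

    # Highest score first; ties broken by document id, then position.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored
```

Even in this toy form, every document contributes many scored candidates instead of one, which hints at why naive passage ranking costs more than document ranking. A document can then be ranked by its best passage, so a long document with one short relevant fragment is still found.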

Published In

ACM Transactions on Information Systems  Volume 17, Issue 4
Oct. 1999
123 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/326440

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. inverted files
  2. passage retrieval
  3. query evaluation
  4. text databases
  5. text retrieval