The effectiveness of GlOSS for the text database discovery problem

Published: 24 May 1994

Abstract

The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.
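
The abstract does not spell out how the result size is estimated. As a minimal sketch, assume each database exports only its document count and per-term document frequencies, and assume query terms occur independently of one another. Under those assumptions, the result size of a conjunctive query can be estimated per database and the databases ranked by that estimate. The function names, the summary layout, and the independence assumption are illustrative and are not taken from the text above.

```python
from math import prod


def estimate_result_size(query_terms, db_size, doc_freq):
    """Estimate how many documents in one database contain ALL query terms.

    Assumption (hypothetical, for illustration): terms occur independently,
    so the matching fraction is the product of per-term fractions.
    """
    if db_size == 0:
        return 0.0
    # doc_freq maps a term to the number of documents that contain it.
    fractions = (doc_freq.get(term, 0) / db_size for term in query_terms)
    return db_size * prod(fractions)


def rank_databases(query_terms, summaries):
    """Return (name, estimate) pairs with a nonzero estimate, best first.

    summaries maps a database name to (db_size, doc_freq); this layout is
    an assumption of the sketch, not a documented format.
    """
    estimates = [
        (name, estimate_result_size(query_terms, size, freqs))
        for name, (size, freqs) in summaries.items()
    ]
    candidates = [pair for pair in estimates if pair[1] > 0]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy summaries for three hypothetical databases.
summaries = {
    "A": (100, {"database": 50, "discovery": 10}),
    "B": (1000, {"database": 100}),
    "C": (200, {"database": 100, "discovery": 100}),
}
ranking = rank_databases(["database", "discovery"], summaries)
```

Only the summaries are consulted, so a query can be routed without contacting any database. The storage needed is one frequency per distinct term per database, plus the database size.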

Published In

ACM SIGMOD Record, Volume 23, Issue 2
June 1994
522 pages
ISSN: 0163-5808
DOI: 10.1145/191843
  • SIGMOD '94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data
    May 1994
    525 pages
    ISBN: 0897916395
    DOI: 10.1145/191839
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 May 1994
Published in SIGMOD Volume 23, Issue 2
