DOI: 10.1145/1376616.1376708
research-article

Keyword proximity search in complex data graphs

Published: 09 June 2008

Abstract

In keyword search over data graphs, an answer is a nonredundant subtree that includes the given keywords. An algorithm for enumerating answers is presented within an architecture that has two main components: an engine that generates a set of candidate answers and a ranker that evaluates their scores. To be effective, the engine must have three fundamental properties: it should not miss relevant answers, it has to be efficient, and it must generate the answers in an order that is highly correlated with the desired ranking. It is shown that none of the existing systems has implemented an engine that has all of these properties. In contrast, this paper presents an engine that generates all the answers with provable guarantees. Experiments show that the engine performs well in practice. It is also shown how to adapt this engine to queries under the OR semantics. In addition, this paper presents a novel approach for implementing rankers designed to eliminate redundancy. Essentially, an answer is ranked according to its individual properties (relevance) and its intersection with the answers that have already been presented to the user. Within this approach, experiments with specific rankers are described.
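
The redundancy-aware ranking idea described in the abstract can be illustrated with a small sketch. The abstract does not specify the paper's rankers, so everything below is an assumption made for illustration: the `Answer` type, the Jaccard-based `overlap` measure, the `penalty` weight, and the greedy selection loop are hypothetical and are not taken from the paper. The sketch only shows the general shape of scoring an answer by its own relevance minus its intersection with answers already shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    # Hypothetical stand-in for an answer subtree of the data graph.
    name: str
    nodes: frozenset   # ids of the graph nodes in the subtree
    relevance: float   # score of the answer considered in isolation


def overlap(a, b):
    """Jaccard similarity of two answers' node sets (an assumed measure)."""
    union = a.nodes | b.nodes
    return len(a.nodes & b.nodes) / len(union) if union else 0.0


def rerank(candidates, k, penalty=0.5):
    """Greedily pick up to k answers, discounting overlap with earlier picks."""
    remaining = list(candidates)
    shown = []
    while remaining and len(shown) < k:
        def adjusted(ans):
            # Largest overlap with any answer already presented to the user.
            worst = max((overlap(ans, s) for s in shown), default=0.0)
            return ans.relevance - penalty * worst

        best = max(remaining, key=adjusted)
        remaining.remove(best)
        shown.append(best)
    return shown


if __name__ == "__main__":
    candidates = [
        Answer("a1", frozenset({1, 2, 3}), 1.0),
        Answer("a2", frozenset({1, 2, 4}), 0.9),  # shares two nodes with a1
        Answer("a3", frozenset({5, 6, 7}), 0.7),  # disjoint from a1 and a2
    ]
    print([a.name for a in rerank(candidates, 3)])
```

With these made-up inputs, `a3` is presented before `a2` even though its relevance is lower, because `a2` largely repeats nodes the user has already seen in `a1`. Setting `penalty=0.0` falls back to ordering by relevance alone.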


Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN: 9781605581026
DOI: 10.1145/1376616
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. approximate top-k answers
  2. information retrieval on graphs
  3. keyword proximity search
  4. redundancy elimination
  5. subtree enumeration by height

Conference

SIGMOD/PODS '08
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
