Effective, design-independent XML keyword search

Published: 02 November 2009
DOI: 10.1145/1645953.1645970

Abstract

Keyword search techniques that take advantage of XML structure make it very easy for ordinary users to query XML databases, but current approaches to processing these queries rely on intuitively appealing heuristics that are ultimately ad hoc. These approaches often retrieve irrelevant answers, overlook relevant answers, and cannot rank answers appropriately. To address these problems for data-centric XML, we propose coherency ranking (CR), a domain- and database design-independent ranking method for XML keyword queries that is based on an extension of the concept of mutual information. With CR, the results of a keyword query are invariant under schema reorganization. We analyze how previous approaches to XML keyword search approximate CR, and present efficient algorithms to perform CR. Our empirical evaluation with 65 user-supplied queries over two real-world XML data sets shows that CR has better precision and recall and provides better ranking than all previous approaches.
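
The abstract says only that CR is based on an extension of mutual information and does not give the formula. As a rough illustration of what such an extension can look like, the sketch below computes total correlation, a well-known multivariate generalization of mutual information (the sum of the marginal entropies minus the joint entropy). This is not claimed to be the authors' exact measure; the function names and the toy (title, author) data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def total_correlation(rows):
    """Sum of marginal entropies minus the joint entropy of the columns.

    `rows` is a list of equal-length tuples, one tuple per observation.
    For two columns this reduces to ordinary mutual information.
    """
    columns = list(zip(*rows))
    marginal = sum(entropy(col) for col in columns)
    joint = entropy(rows)
    return marginal - joint


# Hypothetical toy data: (title, author) value pairs.
# In `dependent`, each title determines its author; in `independent`,
# every combination of title and author occurs equally often.
dependent = [("t1", "a1"), ("t1", "a1"), ("t2", "a2"), ("t2", "a2")]
independent = [("t1", "a1"), ("t1", "a2"), ("t2", "a1"), ("t2", "a2")]

print(total_correlation(dependent))
print(total_correlation(independent))
```

On this toy data the dependent pairs score higher than the independent ones, which matches the intuition that strongly correlated values are more likely to form a meaningful answer together.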

Published In

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
November 2009
2162 pages
ISBN: 9781605585123
DOI: 10.1145/1645953

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. XML
  2. correlation mining
  3. keyword queries

Qualifiers

  • Research-article

Conference

CIKM '09

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
