DOI: 10.1145/1458082.1458087
research-article

Dynamic faceted search for discovery-driven analysis

Published: 26 October 2008

Abstract

We propose a dynamic faceted search system for discovery-driven analysis on data with both textual content and structured attributes. From a keyword query, we want to dynamically select a small set of "interesting" attributes and present aggregates on them to a user. Similar to work in OLAP exploration, we define "interestingness" as how surprising an aggregated value is, based on a given expectation. We make two new contributions by proposing a novel "navigational" expectation that's particularly useful in the context of faceted search, and a novel interestingness measure through judicious application of p-values. Through a user survey, we find the new expectation and interestingness metric quite effective. We develop an efficient dynamic faceted search system by improving a popular open source engine, Solr. Our system exploits compressed bitmaps for caching the posting lists in an inverted index, and a novel directory structure called a bitset tree for fast bitset intersection. We conduct a comprehensive experimental study on large real data sets and show that our engine performs 2 to 3 times faster than Solr.
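The abstract does not say which statistical test underlies the p-value-based interestingness measure, so the following is only a minimal sketch under stated assumptions. It assumes a binomial model: a facet value is expected to occur in a fraction `expected_rate` of documents (taken, for example, from the whole corpus or from the previous navigation step), and it is observed in `count` of the `n` documents matching the query. A smaller p-value is read as a more surprising, and therefore more interesting, facet value. The function names and the choice of a two-sided binomial test are hypothetical and are not taken from the paper.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials, computed in log space."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def p_value(count, n, expected_rate):
    """Two-sided binomial p-value: twice the smaller tail, capped at 1.

    This is an assumed test, chosen for illustration only.
    """
    upper = sum(binom_pmf(i, n, expected_rate) for i in range(count, n + 1))
    lower = sum(binom_pmf(i, n, expected_rate) for i in range(0, count + 1))
    return min(1.0, 2.0 * min(upper, lower))


def rank_facet_values(observed, expected_rates, n):
    """Sort facet values from most to least surprising (ascending p-value)."""
    scored = [
        (value, p_value(count, n, expected_rates[value]))
        for value, count in observed.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical example: 100 query results, each year expected in 10% of them.
observed = {"2007": 40, "2006": 12, "2005": 10}
expected_rates = {"2007": 0.1, "2006": 0.1, "2005": 0.1}
ranking = rank_facet_values(observed, expected_rates, n=100)
```

In this made-up example, the value "2007" appears far more often than expected and so ranks first. A system along these lines would present only the top few facet values or attributes to the user.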


Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

CIKM08: Conference on Information and Knowledge Management
October 26 - 30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
