DOI: 10.1145/2661829.2661964
research-article

Extending Faceted Search to the General Web

Published: 03 November 2014

Abstract

Faceted search helps users by offering drill-down options as a complement to the keyword input box, and it has been used successfully for many vertical applications, including e-commerce and digital libraries. However, this idea is not well explored for general web search, even though it holds great potential for assisting multi-faceted queries and exploratory search. In this paper, we explore this potential by extending faceted search into the open-domain web setting, which we call Faceted Web Search. To tackle the heterogeneous nature of the web, we propose to use query-dependent automatic facet generation, which generates facets for a query instead of the entire corpus. To incorporate user feedback on these query facets into document ranking, we investigate both Boolean filtering and soft ranking models. We evaluate Faceted Web Search systems by their utility in assisting users to clarify search intent and find subtopic information. We describe how to build reusable test collections for such tasks, and propose an evaluation method that considers both gain and cost for users. Our experiments testify to the potential of Faceted Web Search, and show Boolean filtering feedback models, which are widely used in conventional faceted search, are less effective than soft ranking models.
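
The contrast between the two feedback styles named in the abstract can be illustrated with a minimal sketch. It is not the paper's actual models: the toy documents, the scores, the OR-style filter, and the additive `weight` bonus are all hypothetical choices made only for illustration. Boolean filtering discards every document that matches none of the selected facet terms, while soft ranking keeps all documents and only boosts the ones that match.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy collection: doc id -> (initial retrieval score, terms in the doc).
DOCS: Dict[str, Tuple[float, Set[str]]] = {
    "d1": (0.9, {"baggage", "allowance", "united"}),
    "d2": (0.8, {"baggage", "fees"}),
    "d3": (0.5, {"baggage", "allowance", "delta"}),
}


def boolean_filter(docs: Dict[str, Tuple[float, Set[str]]],
                   selected: Set[str]) -> List[str]:
    """Keep only docs containing at least one selected facet term,
    ordered by their initial score (an assumed OR-style filter)."""
    kept = [d for d, (_, terms) in docs.items() if terms & selected]
    return sorted(kept, key=lambda d: docs[d][0], reverse=True)


def soft_rank(docs: Dict[str, Tuple[float, Set[str]]],
              selected: Set[str], weight: float = 0.5) -> List[str]:
    """Re-rank all docs: initial score plus a bonus proportional to the
    fraction of selected facet terms the doc contains (assumed formula)."""
    def score(d: str) -> float:
        base, terms = docs[d]
        if not selected:
            return base  # no feedback given: keep the initial ranking
        return base + weight * len(terms & selected) / len(selected)
    return sorted(docs, key=score, reverse=True)


if __name__ == "__main__":
    feedback = {"delta"}  # the user picks one facet term
    print(boolean_filter(DOCS, feedback))  # ['d3']
    print(soft_rank(DOCS, feedback))       # ['d3', 'd1', 'd2']
```

In this toy setting the filter returns only `d3`, whereas the soft model promotes `d3` but still shows `d1` and `d2`, so documents that are relevant yet lack the exact facet term are not lost.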

    Published In

    CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
    November 2014
    2152 pages
    ISBN: 9781450325981
    DOI: 10.1145/2661829
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. faceted web search
    2. interactive feedback
    3. query facets

    Qualifiers

    • Research-article

    Conference

    CIKM '14

    Acceptance Rates

    CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
