Selective Cluster Presentation on the Search Results Page

Published: 28 February 2018

Abstract

Web search engines present, for some queries, a cluster of results from the same specialized domain (“vertical”) on the search results page (SERP). We introduce a comprehensive analysis of the presentation of such clusters from seven different verticals based on the logs of a commercial Web search engine. This analysis reveals several unique characteristics—such as size, rank, and clicks—of result clusters from community question-and-answer websites. The study of properties of this result cluster—specifically as part of the SERP—has received little attention in previous work. Our analysis also motivates the pursuit of a long-standing challenge in ad hoc retrieval, namely, selective cluster retrieval. In our setting, the specific challenge is to select for presentation the documents most highly ranked either by a cluster-based approach (those in the top-retrieved cluster) or by a document-based approach. We address this classification task by representing queries with features based on those utilized for ranking the clusters, query-performance predictors, and properties of the document-clustering structure. Empirical evaluation performed with TREC data shows that our approach outperforms a recently proposed state-of-the-art cluster-based document-retrieval method as well as state-of-the-art document-retrieval methods that do not account for inter-document similarities.
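
The selective decision described in the abstract can be illustrated with a minimal sketch: a per-query binary classifier picks either the cluster-based list or the document-based list. Everything concrete here is an assumption made for illustration and does not come from the article: the two feature values (a top-cluster ranking score and a query-performance prediction), the toy training data, and the use of a simple perceptron as the learner.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (query features, label)


def train_perceptron(examples: Sequence[Example], epochs: int = 20) -> Tuple[List[float], float]:
    """Fit a linear classifier over query features.

    Label 1 means the cluster-based list was the better choice for the
    training query; label 0 means the document-based list was better.
    The perceptron is a stand-in learner chosen for brevity.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if score > 0 else 0
            error = label - predicted  # -1, 0, or +1
            if error:
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
    return weights, bias


def select_ranking(features: Sequence[float], weights: Sequence[float], bias: float,
                   cluster_based: List[str], document_based: List[str]) -> List[str]:
    """Return the list that the classifier predicts to be better for this query."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return cluster_based if score > 0 else document_based


# Hypothetical features per query: (top-cluster score, query-performance prediction).
training = [
    ((0.9, 0.2), 1),
    ((0.8, 0.1), 1),
    ((0.2, 0.9), 0),
    ((0.1, 0.7), 0),
]
weights, bias = train_perceptron(training)

cluster_list = ["d3", "d7", "d1"]   # documents of the top-retrieved cluster
document_list = ["d1", "d2", "d3"]  # top documents of a document-based ranking
chosen = select_ranking((0.85, 0.15), weights, bias, cluster_list, document_list)
```

With these toy values, a query whose top cluster scores highly is routed to the cluster-based list, and a query with a weak top cluster falls back to the document-based list.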

Published In

ACM Transactions on Information Systems, Volume 36, Issue 3
July 2018
402 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/3146384

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 February 2018
Accepted: 01 October 2017
Revised: 01 July 2017
Received: 01 December 2016
Published in TOIS Volume 36, Issue 3

Author Tags

  1. Cluster-based retrieval
  2. aggregated search

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Technion-Microsoft Electronic Commerce Research Center
  • Israel Science Foundation
  • Yahoo faculty research and engagement award
