DOI: 10.1145/1835449.1835536
research-article

Query similarity by projecting the query-flow graph

Published: 19 July 2010

Abstract

Defining a measure of similarity between queries is an interesting and difficult problem. A reliable query-similarity measure can be used in a variety of applications such as query recommendation, query expansion, and advertising.
In this paper, we exploit the information present in query logs to develop a measure of semantic similarity between queries. Our approach relies on the concept of the query-flow graph, which aggregates query reformulations from many users: nodes in the graph represent queries, and two queries are connected if they are likely to appear as part of the same search goal. Our query-similarity measure is obtained by projecting the graph (or appropriate subgraphs of it) onto a low-dimensional Euclidean space. Our experiments show that the resulting measure captures a notion of semantic similarity between queries and that it is useful for diversifying query recommendations.
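The abstract describes the pipeline only at a high level (build a graph of reformulations, project it onto a low-dimensional Euclidean space, compare queries there). The following is a minimal sketch of that idea under assumptions that are not taken from the paper: edges are weighted by raw counts of consecutive reformulations within a session (a crude stand-in for "likely part of the same search goal"), the graph is connected, and the projection uses the first `k` nontrivial eigenvectors of the normalized graph Laplacian, rescaled by degree, which is one common kind of spectral projection. The function names (`build_query_flow_graph`, `jacobi_eigh`, `spectral_embedding`, `query_distance`, `diversify`) and the toy sessions are hypothetical. A small Jacobi eigenvalue routine keeps the example free of third-party dependencies.

```python
import math
from collections import defaultdict


def build_query_flow_graph(sessions):
    """Undirected weights: +1 per consecutive (q, q_next) pair (assumed weighting)."""
    weights = defaultdict(float)
    for session in sessions:
        for q, q_next in zip(session, session[1:]):
            if q != q_next:  # ignore resubmissions of the same query
                weights[tuple(sorted((q, q_next)))] += 1.0
    return dict(weights)


def jacobi_eigh(matrix, tol=1e-12, max_sweeps=100):
    """Symmetric eigen-decomposition by cyclic Jacobi rotations; column j of V pairs with value j."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def spectral_embedding(weights, k=2):
    """k coordinates per query from L_sym = I - D^{-1/2} W D^{-1/2} (connected graph assumed)."""
    nodes = sorted({q for edge in weights for q in edge})
    idx = {q: i for i, q in enumerate(nodes)}
    n = len(nodes)
    w = [[0.0] * n for _ in range(n)]
    for (u, x), c in weights.items():
        w[idx[u]][idx[x]] += c
        w[idx[x]][idx[u]] += c
    deg = [sum(row) for row in w]
    lap = [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigh(lap)
    order = sorted(range(n), key=lambda j: eigvals[j])
    chosen = order[1:k + 1]  # skip the trivial eigenvector (eigenvalue 0)
    # Dividing by sqrt(degree) gives solutions of (D - W) x = lambda D x.
    return {q: [eigvecs[idx[q]][j] / math.sqrt(deg[idx[q]]) for j in chosen] for q in nodes}


def query_distance(emb, q1, q2):
    """Euclidean distance in the projected space (smaller means more similar)."""
    return math.dist(emb[q1], emb[q2])


def diversify(emb, candidates, m):
    """Greedy max-min: each new pick is farthest from the queries already chosen."""
    chosen = [candidates[0]]
    while len(chosen) < min(m, len(candidates)):
        rest = [c for c in candidates if c not in chosen]
        chosen.append(max(rest, key=lambda c: min(query_distance(emb, c, s) for s in chosen)))
    return chosen


cars = ["jaguar car", "jaguar xf price", "jaguar dealer"]
cats = ["jaguar animal", "big cats", "leopard"]
macs = ["mac os jaguar", "os x 10.2", "apple update"]
sessions = []
for x, y, z in (cars, cats, macs):
    sessions += [[x, y, z, x]] * 10  # a tight triangle of reformulations per topic
sessions += [[cars[2], cats[0]], [cats[2], macs[0]]]  # two weak cross-topic bridges
emb = spectral_embedding(build_query_flow_graph(sessions), k=2)
print(diversify(emb, ["jaguar car", "jaguar xf price", "big cats"], 2))  # prints ['jaguar car', 'big cats']
```

In this toy graph, "jaguar car" and "jaguar xf price" are structurally interchangeable, so they receive essentially identical coordinates, while queries from the other two topics land far away; the greedy max-min rule therefore skips the near-duplicate. That selection rule is only one illustrative way to turn projected distances into diverse recommendations, not necessarily the procedure evaluated in the paper.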



Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN:9781450301534
DOI:10.1145/1835449
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. query reformulations
  2. query similarity
  3. spectral projections

Conference

SIGIR '10

Acceptance Rates

SIGIR '10 paper acceptance rate: 87 of 520 submissions (17%)
Overall acceptance rate: 792 of 3,983 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 2
Reflects downloads up to 04 Feb 2025.
