DOI: 10.1145/1935826.1935888
poster

Multidimensional mining of large-scale search logs: a topic-concept cube approach

Published: 09 February 2011

Abstract

In addition to search queries and the corresponding click-through information, search engine logs record multidimensional information about user search activities, such as search time, location, vertical, and search device. Multidimensional mining of search logs can provide novel insights and useful knowledge for both search engine users and developers. In this paper, we describe our topic-concept cube project, which addresses the business need of supporting multidimensional mining of search logs effectively and efficiently. We address two challenges. First, search queries and click-through data are widely recognized to be sparse, and thus have to be aggregated properly for effective analysis. Second, there is often a gap between the topic hierarchies used in multidimensional aggregate analysis and the queries in search logs. To address these challenges, we develop a novel topic-concept model that learns a hierarchy of concepts and topics automatically from search logs. Enabled by the topic-concept model, we construct a topic-concept cube that supports online multidimensional mining of search log data. A distinct feature of our approach is that, in addition to the standard dimensions such as time and location, our topic-concept cube has a dimension of topics and concepts, which substantially facilitates the analysis of log data. To handle a huge amount of log data, we develop distributed algorithms for learning model parameters efficiently. We also devise approaches to computing a topic-concept cube. We report an empirical study verifying the effectiveness and efficiency of our approach on a real data set of 1.96 billion queries and 2.73 billion clicks.
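As a rough illustration of what a cube with a topic-concept dimension makes possible, the following minimal sketch aggregates click counts over every combination of time, location, topic, and concept. It is not the paper's algorithm: the records, the dimension names, and the `build_cube` helper are hypothetical. The topic and concept labels are assumed to come from some already-learned model, and the cube is fully materialized in memory, which would not scale to billions of log entries.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log records: (day, location, topic, concept, clicks).
# The topic and concept labels stand in for the output of a learned model.
LOG = [
    ("2011-02-01", "US", "travel", "cheap flights", 3),
    ("2011-02-01", "US", "travel", "hotel deals", 2),
    ("2011-02-01", "UK", "travel", "cheap flights", 1),
    ("2011-02-02", "US", "autos", "used cars", 4),
]
DIMS = ("day", "location", "topic", "concept")


def build_cube(records):
    """Sum clicks for every subset of dimensions (a naive full cube)."""
    cube = defaultdict(int)
    for *values, clicks in records:
        for k in range(len(DIMS) + 1):
            for kept in combinations(range(len(DIMS)), k):
                # "*" marks a dimension that has been rolled up.
                cell = tuple(
                    values[i] if i in kept else "*" for i in range(len(DIMS))
                )
                cube[cell] += clicks
    return cube


cube = build_cube(LOG)
print(cube[("*", "US", "travel", "*")])  # clicks on the travel topic from the US: 5
print(cube[("*", "*", "*", "*")])        # grand total over all records: 10
```

Rolling up the concept dimension while keeping the topic, as in the first query, is the kind of aggregation that counters the sparsity of individual queries: many rare queries contribute to one topic-level cell.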



    Published In

    WSDM '11: Proceedings of the fourth ACM international conference on Web search and data mining
    February 2011
    870 pages
    ISBN:9781450304931
    DOI:10.1145/1935826
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. olap
    2. search log
    3. topic-concept cube

    Qualifiers

    • Poster

    Acceptance Rates

    WSDM '11 Paper Acceptance Rate 83 of 372 submissions, 22%;
    Overall Acceptance Rate 498 of 2,863 submissions, 17%
