Stratified analysis of AOL query log

Published: 01 May 2009

Abstract

Characterizing users' intent and behaviour while they use an information retrieval tool (e.g. a search engine) is a key question in web research, as it holds the key to knowing how users interact, what they expect and how we can provide them with information in the most beneficial way. Previous research has focused on identifying the average characteristics of user interactions. This paper proposes a stratified method for analyzing query logs that groups queries and sessions according to their hit frequency and analyzes the characteristics of each group in order to determine how representative the average values are. Findings show that behaviours typically associated with the average user do not fit most of the aforementioned groups.
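
The general idea of stratifying a query log can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the stratum boundaries and the toy log are hypothetical, and the number of times a query occurs in the log is assumed here as a stand-in for its hit frequency. The sketch groups distinct queries into frequency strata and compares a per-stratum statistic (mean number of terms) with the overall mean.

```python
from collections import Counter
from statistics import mean


def stratify_by_frequency(queries, boundaries=(1, 10, 100)):
    """Group distinct queries into strata by how often each occurs in the log.

    `boundaries` are hypothetical inclusive upper bounds, one per stratum;
    queries more frequent than the last bound fall into a final open stratum.
    """
    counts = Counter(queries)
    strata = {i: [] for i in range(len(boundaries) + 1)}
    for query, freq in counts.items():
        # Index of the first boundary that the frequency does not exceed.
        idx = next(
            (i for i, bound in enumerate(boundaries) if freq <= bound),
            len(boundaries),
        )
        strata[idx].append(query)
    return strata


def mean_terms_per_stratum(strata):
    """Mean number of whitespace-separated terms per query in each stratum."""
    return {
        idx: mean(len(q.split()) for q in qs) if qs else None
        for idx, qs in strata.items()
    }


# Toy log (invented for illustration, not taken from the AOL data).
log = ["google", "google", "google", "ebay", "ebay", "cheap flights to madrid"]

strata = stratify_by_frequency(log, boundaries=(1, 2))
per_stratum = mean_terms_per_stratum(strata)
overall = mean(len(q.split()) for q in set(log))
```

In this toy log the overall mean is 2 terms per distinct query, yet no stratum has that value: the rare stratum averages 4 terms and the frequent strata average 1. This is the kind of mismatch between an average and its underlying groups that a stratified analysis is meant to expose.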


    Published In

    Information Sciences: an International Journal  Volume 179, Issue 12
    May, 2009
    258 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Query log analysis
    2. User behaviour
    3. User intent
    4. User interactions
    5. User profiling
