DOI: 10.1145/1835449.1835471
research-article

Freshness matters: in flowers, food, and web authority

Published: 19 July 2010

Abstract

The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is obviously a significant ranking factor. However, traditional link-based web ranking algorithms typically run on a single web snapshot without concern for user activities associated with the dynamics of web pages and links. Therefore, a stale page that was popular many years ago may still achieve a high authority score due to its accumulated in-links. To remedy this situation, we propose a temporal web link-based ranking scheme, which incorporates features from historical author activities. We quantify web page freshness over time from page and in-link activity, and design a web surfer model that incorporates web freshness, based on a temporal web graph composed of multiple web snapshots at different time points. The model includes authority propagation among snapshots, enabling link structures at distinct time points to influence each other when estimating web page authority. Experiments on a real-world archival web corpus show that our approach improves upon PageRank in both relevance and freshness of the search results.
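
The general idea in the abstract, letting recent in-link activity influence link-based authority, can be illustrated with a small Python sketch. This is not the paper's model: the abstract gives no formulas, so the exponential decay, the freshness-biased teleport step, and the names `freshness_scores` and `freshness_biased_pagerank` are all assumptions made for illustration. The sketch also ranks only the latest snapshot and does not propagate authority among snapshots.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source page, destination page)


def freshness_scores(snapshots: List[Set[Edge]], decay: float = 0.5) -> Dict[str, float]:
    """Hypothetical freshness: decayed count of newly appearing in-links.

    Snapshots are ordered oldest to newest. A link counts as activity in the
    first snapshot where it appears; the decay factor is an assumption.
    """
    scores: Dict[str, float] = {}
    seen: Set[Edge] = set()
    for edges in snapshots:
        # Age all existing scores by one time step.
        for page in scores:
            scores[page] *= decay
        # Make sure every page in this snapshot has an entry.
        for src, dst in edges:
            scores.setdefault(src, 0.0)
            scores.setdefault(dst, 0.0)
        # Credit the destination of each link not seen in earlier snapshots.
        for _src, dst in edges - seen:
            scores[dst] += 1.0
        seen |= edges
    return scores


def freshness_biased_pagerank(
    edges: Set[Edge],
    freshness: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power-iteration PageRank whose random jump favors fresher pages."""
    pages = sorted({p for edge in edges for p in edge} | set(freshness))
    out_links: Dict[str, List[str]] = {p: [] for p in pages}
    for src, dst in edges:
        out_links[src].append(dst)

    # Jump distribution proportional to freshness, lightly smoothed so that
    # pages with zero freshness keep a nonzero probability.
    eps = 1e-9
    total = sum(freshness.get(p, 0.0) + eps for p in pages)
    jump = {p: (freshness.get(p, 0.0) + eps) / total for p in pages}

    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for p in pages:
            targets = out_links[p]
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: redistribute its mass via the jump distribution.
                for t in pages:
                    new_rank[t] += damping * rank[p] * jump[t]
        rank = new_rank
    return rank


if __name__ == "__main__":
    # "old" gained its two in-links in the first snapshot, "new" in the second.
    older = {("a", "old"), ("b", "old")}
    latest = older | {("c", "new"), ("d", "new")}
    fresh = freshness_scores([older, latest])
    ranks = freshness_biased_pagerank(latest, fresh)
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In the toy graph both target pages have two in-links in the latest snapshot, so a single-snapshot PageRank with a uniform jump would score them equally; the freshness bias is what separates them here.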

    Published In

    SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
    July 2010
    944 pages
    ISBN: 9781450301534
    DOI: 10.1145/1835449

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. pagerank
    2. temporal link analysis
    3. web freshness
    4. web search engine

    Qualifiers

    • Research-article

    Conference

    SIGIR '10

    Acceptance Rates

    SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 17 Jan 2025
