DOI: 10.1145/1557914.1557933
Research article

The scalable hyperlink store

Published: 29 June 2009

Abstract

This paper describes the Scalable Hyperlink Store (SHS), a distributed in-memory "database" for storing large portions of the web graph. SHS enables research on the structural properties of the web graph as well as on new link-based ranking algorithms. Previous work on specialized hyperlink databases focused on finding efficient compression algorithms for web graphs. By contrast, this work focuses on the systems issues of building such a database: specifically, how to build a hyperlink database that is fast, scalable, fault-tolerant, and incrementally updatable.
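To make the idea of a partitioned in-memory hyperlink store concrete, here is a toy, single-process sketch: URLs are mapped to integer IDs, and each partition (standing in for one server) keeps forward and backward link sets for the pages it owns, so edges can be added incrementally. The abstract does not describe SHS's internals, so everything here is an assumption for illustration only: partitioning by a CRC32 hash of the host name, the ID layout (partition index in the high bits), and all names (`ToyHyperlinkStore`, `add_links`, `out_links`, `in_links`) are hypothetical and are not SHS's actual design or API.

```python
from urllib.parse import urlsplit
import zlib

# Assumed (hypothetical) ID layout: high bits = partition index, low bits = local index.
LOCAL_BITS = 32


class Partition:
    """One simulated server; owns every URL whose host hashes to it."""

    def __init__(self, index):
        self.index = index
        self.urls = []  # local index -> URL
        self.ids = {}   # URL -> local index
        self.fwd = {}   # local index -> set of UIDs this page links to
        self.bwd = {}   # local index -> set of UIDs linking to this page


class ToyHyperlinkStore:
    """Hypothetical single-process toy of a host-partitioned, in-memory link store."""

    def __init__(self, num_partitions):
        self.parts = [Partition(i) for i in range(num_partitions)]

    def _partition_for(self, url):
        # Assumption: keep all URLs of one host together by hashing the host name.
        host = urlsplit(url).hostname or ""
        return self.parts[zlib.crc32(host.encode("utf-8")) % len(self.parts)]

    def _split(self, uid):
        # Decode a UID into (partition, local index).
        return self.parts[uid >> LOCAL_BITS], uid & ((1 << LOCAL_BITS) - 1)

    def url_to_uid(self, url, create=False):
        """Return the UID for url; assign a new one only if create is True."""
        p = self._partition_for(url)
        local = p.ids.get(url)
        if local is None:
            if not create:
                return None
            local = len(p.urls)
            p.urls.append(url)
            p.ids[url] = local
        return (p.index << LOCAL_BITS) | local

    def uid_to_url(self, uid):
        p, local = self._split(uid)
        return p.urls[local]

    def add_links(self, src_url, dst_urls):
        """Incrementally add src -> dst edges, updating both link directions."""
        src = self.url_to_uid(src_url, create=True)
        sp, slocal = self._split(src)
        for dst_url in dst_urls:
            dst = self.url_to_uid(dst_url, create=True)
            dp, dlocal = self._split(dst)
            sp.fwd.setdefault(slocal, set()).add(dst)
            dp.bwd.setdefault(dlocal, set()).add(src)

    def _neighbors(self, url, direction):
        uid = self.url_to_uid(url)
        if uid is None:
            return []  # unknown URL: no links
        p, local = self._split(uid)
        table = p.fwd if direction == "out" else p.bwd
        return sorted(self.uid_to_url(u) for u in table.get(local, ()))

    def out_links(self, url):
        return self._neighbors(url, "out")

    def in_links(self, url):
        return self._neighbors(url, "in")


if __name__ == "__main__":
    store = ToyHyperlinkStore(num_partitions=4)
    store.add_links("http://a.example/", ["http://b.example/x", "http://c.example/"])
    store.add_links("http://b.example/x", ["http://c.example/"])
    print(store.out_links("http://a.example/"))  # ['http://b.example/x', 'http://c.example/']
    print(store.in_links("http://c.example/"))   # ['http://a.example/', 'http://b.example/x']
```

Keeping both forward and backward links lets a single lookup answer "who does this page point to" and "who points to this page", which link-based ranking algorithms need. The sketch deliberately omits everything the paper is actually about at the systems level (compression, networking between servers, fault tolerance, persistence).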

Information

Published In

HT '09: Proceedings of the 20th ACM conference on Hypertext and hypermedia
June 2009
410 pages
ISBN:9781605584867
DOI:10.1145/1557914
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hyperlink database
  2. scalability
  3. web graph

Conference

HT '09
Acceptance Rates

Overall Acceptance Rate 378 of 1,158 submissions, 33%

Cited By

  • (2021) Efficient Scalable Temporal Web Graph Store. 2021 IEEE International Conference on Big Data (Big Data), pages 263-273. DOI: 10.1109/BigData52589.2021.9671984. Online publication date: 15-Dec-2021.
  • (2016) Weaver. Proceedings of the VLDB Endowment 9(11), pages 852-863. DOI: 10.14778/2983200.2983202. Online publication date: 1-Jul-2016.
  • (2013) Naiad. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 439-455. DOI: 10.1145/2517349.2522738. Online publication date: 3-Nov-2013.
  • (2012) Towards effective partition management for large graphs. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 517-528. DOI: 10.1145/2213836.2213895. Online publication date: 20-May-2012.
  • (2012) Of hammers and nails. Proceedings of the fifth ACM international conference on Web search and data mining, pages 103-112. DOI: 10.1145/2124295.2124310. Online publication date: 8-Feb-2012.
  • (2012) Earlybird. Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, pages 1360-1369. DOI: 10.1109/ICDE.2012.149. Online publication date: 1-Apr-2012.
  • (2011) Scalable manipulation of archival web graphs. Proceedings of the 9th workshop on Large-scale and distributed informational retrieval, pages 27-32. DOI: 10.1145/2064730.2064739. Online publication date: 28-Oct-2011.
  • (2011) HipG. ACM SIGOPS Operating Systems Review 45(2), pages 3-13. DOI: 10.1145/2007183.2007185. Online publication date: 18-Jul-2011.
  • (2010) Querying the web graph. Proceedings of the 17th international conference on String processing and information retrieval, pages 1-12. DOI: 10.5555/1928328.1928330. Online publication date: 11-Oct-2010.
