Inside PageRank

Authors: Monica Bianchini, Marco Gori, Franco Scarselli

ACM Transactions on Internet Technology (TOIT), Volume 5, Issue 1

Pages 92 - 128

https://doi.org/10.1145/1052934.1052938

Published: 01 February 2005

Abstract

Although the interest of a Web page is strictly related to its content and to the readers' subjective cultural background, a measure of page authority can be provided that depends only on the topological structure of the Web. PageRank is a notable way to attach a score to Web pages on the basis of Web connectivity. In this article, we look inside PageRank to disclose its fundamental properties concerning stability, the complexity of the computational scheme, and the critical role of the parameters involved in the computation. Moreover, we introduce a circuit analysis that allows us to understand the distribution of the page score, the way different Web communities interact with each other, the role of dangling pages (pages with no outlinks), and the secrets for promotion of Web pages.
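As a rough illustration of a score that depends only on link structure, the sketch below iterates the commonly stated PageRank recurrence x_p = (1 - d) + d * sum(x_q / h_q), where the sum runs over the pages q linking to p and h_q is the number of outlinks of q. The function name `pagerank`, the damping value d = 0.85, the stopping rule, and the toy graphs are assumptions made for illustration and are not taken from this page. Dangling pages here simply pass nothing on, which is only one possible convention.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate x_p = (1 - d) + d * sum(x_q / h_q) over a small link graph.

    links maps each page to the list of pages it links to (hypothetical input format).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    x = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {p: 1.0 - d for p in pages}
        for q in pages:
            outs = links.get(q, [])
            if not outs:
                continue  # dangling page: its score is not passed on (assumed convention)
            share = d * x[q] / len(outs)
            for p in outs:
                new[p] += share
        delta = sum(abs(new[p] - x[p]) for p in pages)
        x = new
        if delta < tol:
            break
    return x


ring = {"a": ["b"], "b": ["c"], "c": ["a"]}   # no dangling pages
leaky = {"a": ["b"], "b": []}                 # "b" is a dangling page
print(sum(pagerank(ring).values()))   # total stays at the number of pages, 3.0
print(sum(pagerank(leaky).values()))  # about 0.4275, well below the 2 pages present
```

Under these assumptions, the second graph shows the total score falling below the number of pages once a dangling page is present, while the closed ring keeps it intact.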


Cited By

Weinstein, N., Carlsen, J., Schulz, S., Stapleton, T., Henriksen, H., Travnik, E., and Johansson, P. 2024. A Lifelike guided journey through the pathophysiology of pulmonary hypertension—from measured metabolites to the mechanism of action of drugs. Frontiers in Cardiovascular Medicine 11. Online publication date: 23-May-2024.
https://doi.org/10.3389/fcvm.2024.1341145

Ben-Gigi, N., Zhitomirsky-Geffet, M., Katzoff, B., and Schler, J. 2024. Citation network analysis for viewpoint plurality assessment of historical corpora: The case of the medieval rabbinic literature. PLOS ONE 19, 7, e0307115. Online publication date: 22-Jul-2024.
https://doi.org/10.1371/journal.pone.0307115

Wu, S., Wu, D., Quan, J., Chan, T., and Lu, K. 2024. Efficient and Accurate PageRank Approximation on Large Graphs. Proceedings of the ACM on Management of Data 2, 4, 1--26. Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3677132

Recommendations

PageRank revisited

PageRank, one part of the search engine Google, is one of the most prominent link-based rankings of documents in the World Wide Web. Usually it is described as a Markov chain modeling a specific random surfer. In this article, an alternative ...

Beyond PageRank: machine learning for static ranking
WWW '06: Proceedings of the 15th international conference on World Wide Web

Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are ...

Associated pagerank: improved pagerank measured by frequent term sets
VECIMS'09: Proceedings of the 2009 IEEE international conference on Virtual Environments, Human-Computer Interfaces and Measurement Systems

Web search engines encounter many new challenges while the amount of information on the web increases rapidly. Web documents have been a main resource for various purposes, and people rely on search engines to retrieve the desired documents. This paper ...

Reviews

Reviewer: Kipp Jones

Web searching continues to be a popular topic, of both commercial and academic interest. This paper presents an in-depth mathematical analysis of the properties of Google's PageRank algorithm, which relies on the topological structure of the Web to calculate the relative value of a given Web page. In addition to the rigorous mathematical treatment, the paper provides insights into the implications of these calculations, and how they impact stability, computability, convergence, maintenance, and vulnerability.

The PageRank algorithm exploits the inherent properties of the Markovian matrices that represent the Web's structure to calculate individual page ranks. The authors go on to show that it is possible to perform an optimal computation, based on the limited precision requirements imposed by the PageRank formula. This result is an important factor for the scalability of the PageRank algorithm. However, as pointed out by Langville and Meyer [1], "if the holy grail of real-time personalized search is ever to be realized, then drastic speed improvements must be made..."

The authors also introduce the notion of energy, and demonstrate the mathematical properties of this concept through the use of circuit analysis on a subgraph or community of Web pages. This analysis leads to some interesting properties, including the energy loss introduced by "dangling" pages, and "outlinks" from a given community. Following this line, the authors also detail mechanisms by which the PageRank can be increased (or decreased), to influence the relative ranking of the pages within the ranking system. Indeed, one of the authors' findings is the fact that the energy of a given target community can be driven to grow linearly with the growth of a "promoting community." This results in a promotion mechanism that is very difficult to detect.

Another nugget suggested by this research is that hyperlinks to outside pages should be placed in pages with a small PageRank, in addition to a large number of internal links. This property, as well as others discussed by the authors, can be exploited to retain as much energy as possible in a given community, and may provide some insights for site designers.

With this paper, the authors provide a valuable resource for those interested in analyzing the behavior of, and optimizing, large-scale ranking algorithms, such as the one used by Google. The math should keep those interested in the theory busy, while the conclusions provide some practical advice for practitioners in the field.

Online Computing Reviews Service

Information & Contributors

Information

Published In

ACM Transactions on Internet Technology Volume 5, Issue 1

February 2005

297 pages

ISSN:1533-5399

EISSN:1557-6051

DOI:10.1145/1052934

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2005

Published in TOIT Volume 5, Issue 1

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 373
Total Downloads: 7,245
Downloads (Last 12 months): 294
Downloads (Last 6 weeks): 22

Reflects downloads up to 25 Jan 2025

Citations

Cited By

Zhang, S., Zhao, Y., Xiong, X., Sun, Y., Nie, X., Zhang, J., Wang, F., Zheng, X., Zhang, Y., Pei, D., and d'Amorim, M. 2024. Illuminating the Gray Zone: Non-intrusive Gray Failure Localization in Server Operating Systems. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 126--137. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3663529.3663834

Khan, M., Mello, G., Habib, L., Engelstad, P., and Yazidi, A. 2024. HITS-based Propagation Paradigm for Graph Neural Networks. ACM Transactions on Knowledge Discovery from Data 18, 4, 1--23. Online publication date: 13-Feb-2024.
https://dl.acm.org/doi/10.1145/3638779

Shah, R., Jain, K., Manchanda, S., Medya, S., Ranu, S., Baeza-Yates, R., and Bonchi, F. 2024. NeuroCut: A Neural Approach for Robust Graph Partitioning. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2584--2595. Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671815

Yang, M., Wang, H., Wei, Z., Wang, S., and Wen, J. 2024. Efficient Algorithms for Personalized PageRank Computation: A Survey. IEEE Transactions on Knowledge and Data Engineering 36, 9, 4582--4602. Online publication date: Sep-2024.
https://doi.org/10.1109/TKDE.2024.3376000

Zeng, Z., Xiang, T., Guo, S., He, J., Zhang, Q., Xu, G., and Zhang, T. 2024. Contrast-Then-Approximate: Analyzing Keyword Leakage of Generative Language Models. IEEE Transactions on Information Forensics and Security 19, 5166--5180. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1109/TIFS.2024.3392535

Pandey, D. and Shu, T. 2024. AM-DGCNN: Leveraging Graph Attention Networks and Edge Attributes for Link Classification in Knowledge Graphs. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1037--1045. Online publication date: 17-Nov-2024.
https://doi.org/10.1109/SCW63240.2024.00144

Tushara, S. and Sudarsan, S. 2024. Study on the Webpage properties and prediction using Google. 2024 International Conference on Social and Sustainable Innovations in Technology and Engineering (SASI-ITE), 311--314. Online publication date: 23-Feb-2024.
https://doi.org/10.1109/SASI-ITE58663.2024.00066
