skip to main content
article

Inside PageRank

Published: 01 February 2005 Publication History

Abstract

Although the interest of a Web page is strictly related to its content and to the subjective readers' cultural background, a measure of the page authority can be provided that only depends on the topological structure of the Web. PageRank is a noticeable way to attach a score to Web pages on the basis of the Web connectivity. In this article, we look inside PageRank to disclose its fundamental properties concerning stability, complexity of computational scheme, and critical role of parameters involved in the computation. Moreover, we introduce a circuit analysis that allows us to understand the distribution of the page score, the way different Web communities interact each other, the role of dangling pages (pages with no outlinks), and the secrets for promotion of Web pages.

References

[1]
Bharat, K. and Henzinger, M. R. 1998. Improved algorithms for topic distillation in hyperlinked environments. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, New York, 104--111.
[2]
Bianchini, M., Fanelli, S., and Gori, M. 2001. Optimal algorithms for well-conditioned nonlinear systems of equations. IEEE Trans. Comput. 50, 7, 689--698.
[3]
Björck, A. 1996. Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics.
[4]
Bomze, I. and Gutjahr, W. 1994. The dinamics of self--evaluation. Appl. Math. Comput. 64, 47--63.
[5]
Bomze, I. and Gutjahr, W. 1995. Estimating qualifications in a self-evaluating group. Qual. Quant. 29, 241--250.
[6]
Borodin, A., Roberts, G. O., Rosenthal, J. S., and Tsaparas, P. 2001. Finding authorities and hubs from link structures on the world wide web. In Proceedings of the 10th International World Wide Web Conference.
[7]
Brin, S., Motwani, R., Page, L., and Winograd, T. 1998. What can you do with a web in your pocket? IEEE Bulle. Techn. Comm. Data Eng., IEEE Comput. Soc. 21, 2, 37--47.
[8]
Brin, S. and Page, L. 1998. The anatomy of a large--scale hypertextual Web search engine. In Proceedings of the 7th World Wide Web Conference (WWW7).
[9]
Brin, S., Page, L., Motwani, R., and Winograd, T. 1999. The PageRank citation ranking: Bringing order to the Web. Tech. Rep. 1999-66, Stanford University. Available on the Internet at http://dbpubs.stanford.edu:8090/pub/1999-66.
[10]
Cohn, D. and Chang, H. 2000. Learning to probabilistically identify authoritative documents. In Proceedings of 17th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, Calif., 167--174.
[11]
Cohn, D. and Hofmann, T. 2001. The missing link---A probabilistic model of document content and hypertext connectivity. In Neural Inf. Proc. Syst. 13.
[12]
Diligenti, M., Gori, M., and Maggini, M. 2002. Web page scoring systems for horizontal and vertical search. In Proceedings of the 11th World Wide Web Conference (WWW11).
[13]
Golub, G. H. and Van Loan, C. F. 1993. Matrix computation. The Johns Hopkins University Press.
[14]
Haveliwala, T. H. 1999. Efficient computation of pagerank. Tech. Rep. 1999-66, Stanford University. Available on the Internet at http://dbpubs.stanford.edu:8090/pub/1999-66.
[15]
Haveliwala, T. H. 2002. Topic sensitive pagerank. In Proceedings of the 11th World Wide Web Conference (WWW11). Available on the Internet at http://dbpubs.stanford.edu:8090/pub/2002-6.
[16]
Henzinger, M. 2001. Hyperlink analysis for the Web. IEEE Internet Computing 5, 1, 45--50.
[17]
Kleinberg, J. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632.
[18]
Lempel, R. and Moran, S. 2000. The stochatic approach for link--structure analysis (SALSA) and the TKC effect. In Proceedings of the 9th World Wide Web Conference (WWW9). Elsevier Science, 387--401.
[19]
Marchiori, M. 1997. The quest for correct information on the Web: Hyper search engines. Computer Networks and ISDN Systems 29, 1225--1235.
[20]
Motwani, R. and Raghavan, P. 1995. Randomized algorithms. Cambridge University Press.
[21]
Ng, A. Y., Zheng, A. X., and Jordan, M. I. 2001a. Link analysis, eigenvectors and stability. In Proceedings of International Conference on Research and Development in Information Retrieval (SIGIR 2001). ACM, New York.
[22]
Ng, A. Y., Zheng, A. X., and Jordan, M. I. 2001b. Stable algorithms for link analysis. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI'2001).
[23]
Pringle, G., Allison, L., and Dowe, D. L. 1998. What is tall poppy among the Web pages? Comput. Netwo. ISDN Syst. 30, 369--377.
[24]
Richardson, M. and Domingos, P. 2002. The intellingent surfer: probabilistic combination of link and content information in pagerank. In Advances in Neural Information Processing Systems, 14. MIT Press, Cambridge, Mass.
[25]
Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning representations by back-propagating errors. Nature 323, 533--536.
[26]
Seneta, E. 1981. Non-negative matrices and Markov chains. Springer-Verlag, New York, Chap. 4, pp. 112--158.
[27]
Varga, R. S. 1962. Matrix Iterative Analysis. Prentice--Hall, Englewood Cliffs, N.J.
[28]
Zhang, D. and Dong, Y. 2000. An efficient algorithm to rank web resources. In Proceedings of the 9th International World Wide Web Conference (WWW9). Elsevier Science, Amsterdam, The Netherlands.

Cited By

View all

Recommendations

Reviews

Kipp Jones

Web searching continues to be a popular topic, of both commercial and academic interest. This paper presents an in-depth mathematical analysis of the properties of Google's PageRank algorithm, which relies on the topological structure of the Web to calculate the relative value of a given Web page. In addition to the rigorous mathematical treatment, the paper provides insights into the implications of these calculations, and how they impact stability, computability, convergence, maintenance, and vulnerability. The PageRank algorithm exploits the inherent properties of the Markovian matrices that represent the Web's structure to calculate individual page ranks. The authors go on to show that it is possible to perform an optimal computation, based on the limited precision requirements imposed by the PageRank formula. This result is an important factor for the scalability of the PageRank algorithm. However, as pointed out by Langville and Meyer [1], "if the holy grail of real-time personalized search is ever to be realized, then drastic speed improvements must be made..." The authors also introduce the notion of energy, and demonstrate the mathematical properties of this concept through the use of circuit analysis on a subgraph or community of Web pages. This analysis leads to some interesting properties, including the energy loss introduced by "dangling" pages, and "outlinks" from a given community. Following this line, the authors also detail mechanisms by which the PageRank can be increased (or decreased), to influence the relative ranking of the pages within the ranking system. Indeed, one of the authors' findings is the fact that the energy of a given target community can be driven to grow linearly with the growth of a "promoting community." This results in a promotion mechanism that is very difficult to detect. Another nugget suggested by this research is that hyperlinks to outside pages should be placed in pages with a small PageRank, in addition to a large number of internal links. This property, as well as others discussed by the authors, can be exploited to retain as much energy as possible in a given community, and may provide some insights for site designers. With this paper, the authors provide a valuable resource for those interested in analyzing the behavior of, and optimizing, large-scale ranking algorithms, such as the one used by Google. The math should keep those interested in the theory busy, while the conclusions provide some practical advice for practitioners in the field. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Internet Technology
ACM Transactions on Internet Technology  Volume 5, Issue 1
February 2005
297 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/1052934
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2005
Published in TOIT Volume 5, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Information retrieval
  2. Markov chains
  3. PageRank
  4. Web page scoring
  5. search engines
  6. searching the Web

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)294
  • Downloads (Last 6 weeks)22
Reflects downloads up to 25 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media