Authoritative sources in a hyperlinked environment

Published: 01 September 1999

Abstract

The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
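The connection to eigenvectors mentioned in the abstract can be made concrete with a short derivation. The notation here is chosen for this sketch and is not quoted from the paper: E is the set of links, A is the adjacency matrix of the link graph with A_{pq} = 1 when page p links to page q, x holds authority weights and y holds hub weights. Each round updates x from y, then y from the new x:

```latex
\begin{aligned}
x_p &\leftarrow \sum_{q:\,(q,p)\in E} y_q
  \quad\Longleftrightarrow\quad x \leftarrow A^{\mathsf T} y,\\
y_p &\leftarrow \sum_{q:\,(p,q)\in E} x_q
  \quad\Longleftrightarrow\quad y \leftarrow A x,\\
y^{(k)} &\propto \left(A A^{\mathsf T}\right)^{k} y^{(0)},
  \qquad x^{(k)} \propto \left(A^{\mathsf T} A\right)^{k-1} A^{\mathsf T} y^{(0)}.
\end{aligned}
```

With normalization after each round and a start vector y^{(0)} of all ones, this is power iteration, so x and y tend toward the principal eigenvectors of A^T A and A A^T respectively, assuming each matrix has a unique largest eigenvalue.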

Reviews

Lynda Hardman

Searching for relevant information on the World Wide Web can be very much a hit-or-miss experience. To improve matters, Kleinberg first introduces the notion of broad-topic queries, in which the user is interested in information on a particular topic and a standard text search may return hundreds or thousands of hits of uncertain relevance. He then introduces the notions of hubs and authorities: an authority is a page linked to by many hubs, and a hub is a page that links to many authorities. Although this at first seems to be a circular definition, he presents a computationally inexpensive algorithm that identifies hubs and authorities reliably. (Note that Kleinberg does not claim that the algorithm finds all hubs and authorities relevant to the query.) Moreover, the motivation for the algorithm is highly intuitive and is, in itself, an interesting and insightful contribution. Besides presenting his own work, the author devotes a large part of the paper to a thorough discussion of related work, covering studies not only of online sources but also of printed materials (such as journal citation indices). Not only is the paper authoritative in its own right, it is also a hub pointing to other works on the topic.
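The reviewer's point that the apparently circular definition is resolved by a cheap computation can be illustrated with a minimal plain-Python sketch. The function name `hits`, its `iterations` parameter, and the toy graph of pages h1, h2, h3, a1, a2 are hypothetical choices made for this illustration; the code follows the spirit of the paper's alternating hub/authority updates but is not the author's implementation, and the graph is not data from the paper.

```python
import math


def hits(edges, iterations=50):
    """Compute authority and hub weights by alternating the two updates.

    edges: iterable of (source, target) pairs, meaning `source` links to `target`.
    Returns (authority, hub) dicts, each scaled to unit Euclidean length.
    """
    edges = sorted(set(edges))  # ignore duplicate links; fix iteration order
    nodes = sorted({page for edge in edges for page in edge})
    auth = {page: 1.0 for page in nodes}
    hub = {page: 1.0 for page in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub weights of the pages that link in.
        new_auth = {page: 0.0 for page in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the just-updated authority weights of the pages linked to.
        new_hub = {page: 0.0 for page in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize so the weights stay bounded; guard against an all-zero vector.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {page: v / a_norm for page, v in new_auth.items()}
        hub = {page: v / h_norm for page, v in new_hub.items()}
    return auth, hub


# Hypothetical toy graph: three hub-like pages pointing at two candidate authorities.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
authority, hubness = hits(links)
print(max(authority, key=authority.get))  # a1
```

On this toy graph, a1 receives the highest authority weight because all three hubs point to it, while h1 and h2 outscore h3 as hubs because they point to both authorities. A fixed iteration count keeps the sketch simple; a convergence tolerance would be a more typical stopping rule.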

Published In

Journal of the ACM, Volume 46, Issue 5
Sept. 1999
210 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/324133

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. World Wide Web
  2. graph algorithms
  3. hypertext structure
  4. link analysis

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 1,627
  • Downloads (last 6 weeks): 302
Reflects downloads up to 06 Nov 2024
