DOI: 10.5555/1793274.1793309
Article

The importance of link evidence in Wikipedia

Published: 30 March 2008

Abstract

Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: internal links in Wikipedia are typically based on words naturally occurring in a page, and link to another semantically related entry. Our main aim is to find out whether Wikipedia's link structure can be exploited to improve ad hoc information retrieval. We first analyse the relation between Wikipedia links and the relevance of pages. We then experiment with the use of link evidence in the focused retrieval of Wikipedia content, based on the test collection of INEX 2006. Our main findings are as follows. First, our analysis reveals that the Wikipedia link structure is a (possibly weak) indicator of relevance. Second, our experiments on INEX ad hoc retrieval tasks reveal that, if the link evidence is made sensitive to the local context, we see a significant improvement in retrieval effectiveness. Hence, in contrast with earlier TREC experiments using crawled Web data, we have shown that Wikipedia's link structure can help improve the effectiveness of ad hoc retrieval.

Published In

ECIR'08: Proceedings of the IR research, 30th European conference on Advances in information retrieval
March 2008
718 pages
ISBN: 3540786457
Editors: Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, Ryen W. White

Sponsors

• Yahoo! Research
• Google Inc.
• Microsoft Research
• Matrixware Information Services

Publisher

Springer-Verlag, Berlin, Heidelberg
