DOI:10.1145/1390334.1390394
research-article

Retrieval and feedback models for blog feed search

Published: 20 July 2008

Abstract

Blog feed search poses challenges that differ from those of traditional ad hoc document retrieval. The units of retrieval, the blogs, are collections of documents, the blog posts. In this work we adapt a state-of-the-art federated search model to the feed retrieval task, showing a significant improvement over algorithms based on the best performing submissions in the TREC 2007 Blog Distillation task [12]. We also show that typical query expansion techniques, such as pseudo-relevance feedback using the blog corpus, do not provide any significant performance improvement and in many cases dramatically hurt performance. We perform an in-depth analysis of the behavior of pseudo-relevance feedback for this task and develop a novel query expansion technique using the link structure in Wikipedia. This query expansion technique provides significant and consistent performance improvements for this task, yielding 22% and 14% improvements in MAP over the unexpanded query for our baseline and federated algorithms, respectively.

References

[1]
J. Arguello, J. L. Elsas, J. Callan, and J. G. Carbonell. Document representation and query expansion models for blog recommendation. In Proc. of the 2nd Intl. Conf. on Weblogs and Social Media (ICWSM), 2008.
[2]
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107--117, 1998.
[3]
J. Callan. Distributed information retrieval. In W. Croft, editor, Advances in Information Retrieval, pages 127--150. Kluwer Academic Publishers, 2000.
[4]
C. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2004 terabyte track. In Proc. of the 2004 Text Retrieval Conf., 2004.
[5]
C. Clarke, F. Scholer, and I. Soboroff. Overview of the TREC 2005 terabyte track. In Proc. of the 2005 Text Retrieval Conf., 2005.
[6]
F. Diaz and D. Metzler. Improving the estimation of relevance models using large external corpora. In Proc. of the 29th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 154--161, 2006.
[7]
J. Elsas, J. Arguello, J. Callan, and J. Carbonell. Retrieval and feedback models for blog distillation. In Proc. of the 2007 Text Retrieval Conf., 2007.
[8]
D. Hannah, C. Macdonald, J. Peng, B. He, and I. Ounis. University of Glasgow at TREC 2007: Experiments with blog and enterprise tracks with Terrier. In Proc. of the 2007 Text Retrieval Conf., 2007.
[9]
P. Kolari, A. Java, and T. Finin. Characterizing the splogosphere. In Proc. of the 3rd Annl. Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conf., 2006.
[10]
V. Lavrenko and W. B. Croft. Relevance based language models. In Proc. of the 24th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 120--127, 2001.
[11]
C. Macdonald and I. Ounis. The TREC blog06 collection: Creating and analysing a blog test collection. Technical Report TR-2006-224, Department of Computing Science, U. of Glasgow, 2006.
[12]
C. Macdonald, I. Ounis, and I. Soboroff. Overview of the TREC 2007 blog track. In Proc. of the 2007 Text Retrieval Conf., 2007.
[13]
D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proc. of the 28th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, pages 472--479, 2005.
[14]
D. Metzler, T. Strohman, H. Turtle, and W. Croft. Indri at TREC 2004: Terabyte track. In Proc. of the 2004 Text Retrieval Conf., 2004.
[15]
D. Metzler, T. Strohman, Y. Zhou, and W. Croft. Indri at TREC 2005: Terabyte track. In Proc. of the 2005 Text Retrieval Conf., 2005.
[16]
J. Seo and W. B. Croft. UMass at TREC 2007 blog distillation task. In Proc. of the 2007 Text Retrieval Conf., 2007.
[17]
L. Si and J. Callan. Relevant document distribution estimation method for resource selection. In Proc. of the 26th Annl. Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 2003.
[18]
I. Soboroff, A. de Vries, and N. Craswell. Overview of the TREC 2006 enterprise track. In Proc. of the 2006 Text Retrieval Conf., 2006.
[19]
C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems (TOIS), 22(2):179--214, 2004.

Published In

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN:9781605581644
DOI:10.1145/1390334

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. blog retrieval
  2. federated search
  3. query expansion

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%