10.5555/2816272.2816349
research-article

Information retrieval on mixed written and spoken documents

Published: 26 April 2004

Abstract

While advances have been made in the structuring, indexing and retrieval of multimedia documents, we propose to study the less explored problem of information retrieval on heterogeneous media sets composed of written and spoken documents. The coverage of modalities in retrieved results appears to be an important part of the user's information need. We show that the usual bag-of-words models do not address this problem, and we propose a method to balance modalities within the query expansion process of the probabilistic model. As experiments do not appear to have been conducted in this domain, we suggest that building evaluation data for the addressed media (text and speech), as well as for other media such as images, is important for the multimedia information retrieval community.

Published In

RIAO '04: Coupling approaches, coupling media and coupling languages for information retrieval
April 2004
935 pages
ISBN:905450096

Publisher

LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE

Paris, France
