DOI: 10.1145/383952.384021

Topic segmentation with an aspect hidden Markov model

Published: 01 September 2001

Abstract

We present a novel probabilistic method for topic segmentation on unstructured text. One previous approach to this problem utilizes the hidden Markov model (HMM) method for probabilistically modeling sequence data [7]. The HMM treats a document as mutually independent sets of words generated by a latent topic variable in a time series. We extend this idea by embedding Hofmann's aspect model for text [5] into the segmenting HMM to form an aspect HMM (AHMM). In doing so, we provide an intuitive topical dependency between words and a cohesive segmentation model. We apply this method to segment unbroken streams of New York Times articles as well as noisy transcripts of radio programs on SpeechBot, an online audio archive indexed by an automatic speech recognition engine. We provide experimental comparisons which show that the AHMM outperforms the HMM for this task.
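
The baseline idea described above, an HMM whose hidden state is the topic and whose observations are sets of words, can be sketched as follows. This is a minimal illustration of plain HMM segmentation, not the AHMM itself. It assumes that decoding uses the Viterbi algorithm [8], that words within a window are independent given the topic, and that topics persist with a fixed probability. The function names, the `stay_prob` and `floor` parameters, and the toy probabilities are all hypothetical.

```python
import math


def viterbi_segment(windows, topic_word_probs, stay_prob=0.9, floor=1e-6):
    """Return the most likely topic index for each window of words.

    Assumes at least two topics. `topic_word_probs` is a list of
    {word: probability} dicts, one per topic (illustrative values only).
    """
    n_topics = len(topic_word_probs)
    switch_prob = (1.0 - stay_prob) / (n_topics - 1)

    def log_emit(topic, window):
        # Words in a window are treated as independent given the topic;
        # unseen words get a small floor probability.
        probs = topic_word_probs[topic]
        return sum(math.log(probs.get(word, floor)) for word in window)

    def log_trans(prev, cur):
        return math.log(stay_prob if prev == cur else switch_prob)

    # Uniform prior over the starting topic.
    scores = [math.log(1.0 / n_topics) + log_emit(t, windows[0])
              for t in range(n_topics)]
    backptrs = []
    for window in windows[1:]:
        new_scores, ptrs = [], []
        for cur in range(n_topics):
            best_prev = max(range(n_topics),
                            key=lambda p: scores[p] + log_trans(p, cur))
            ptrs.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans(best_prev, cur)
                              + log_emit(cur, window))
        scores = new_scores
        backptrs.append(ptrs)

    # Trace the best path backwards from the best final topic.
    state = max(range(n_topics), key=lambda t: scores[t])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Indices of windows where the decoded topic changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Toy data: two made-up topics and four two-word windows.
TOY_TOPICS = [
    {"market": 0.4, "stocks": 0.4, "game": 0.1, "team": 0.1},
    {"market": 0.1, "stocks": 0.1, "game": 0.4, "team": 0.4},
]
TOY_WINDOWS = [["market", "stocks"], ["stocks", "market"],
               ["game", "team"], ["team", "game"]]

if __name__ == "__main__":
    toy_path = viterbi_segment(TOY_WINDOWS, TOY_TOPICS)
    print(toy_path)              # [0, 0, 1, 1]
    print(boundaries(toy_path))  # [2]
```

On the toy input, the decoded topic changes at the third window, which is where a segment boundary would be placed. In this sketch each topic emits words from its own fixed unigram table. The AHMM described in the abstract instead embeds Hofmann's aspect model [5] into the HMM.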

References

[1] Doug Beeferman, Adam Berger, and John Lafferty. Statistical models for text segmentation. Machine Learning, 1999.
[2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1-38, 1977.
[3] Daniel Gildea and Thomas Hofmann. Topic-based language models using EM. EuroSpeech-99, pages 2167-2170, 1999.
[4] Marti A. Hearst. Context and structure in automated full-text information access. University of California at Berkeley dissertation. Computer Science Division Technical Report, 1994.
[5] Thomas Hofmann. Probabilistic latent semantic indexing. Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval, 1999.
[6] Pedro Moreno, JM Van Thong, Beth Logan, Blair Fidler, Katrina Maffey, and Matthew Moores. SpeechBot: a content-based search index for multimedia on the web. Proceedings of the first IEEE Pacific-Rim Conference on Multimedia, 2000.
[7] P. van Mulbregt, I. Carp, L. Gillick, S. Lowe, and J. Yamron. Text segmentation and topic tracking on broadcast news via a hidden Markov model approach. 1998.
[8] Andrew J. Viterbi. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory, 13:260-269, 1967.

Published In

SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
September 2001
454 pages
ISBN:1581133316
DOI:10.1145/383952
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGIR '01

Acceptance Rates

SIGIR '01 Paper Acceptance Rate 47 of 201 submissions, 23%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%
