Transfer Topic Modeling with Ease and Scalability

Kang, Jeon-Hyung; Ma, Jun; Liu, Yan

arXiv:1301.5686 (cs)

[Submitted on 24 Jan 2013 (v1), last revised 26 Jan 2013 (this version, v2)]

Abstract: The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal because it handles short texts with fast-changing topics poorly and raises scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, yielding better model fitting and result interpretation. Specifically, we develop the Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections, including the AP and RCV1 datasets.
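
To make the phrase "informative priors" concrete, the sketch below shows one generic way labeled source-domain documents can shape a Dirichlet prior over words, so that a very short target text still gets a sensible word distribution. This is not the thLDA model from the paper (which is hierarchical and is not specified in this abstract); the function names `informative_prior` and `posterior_mean`, the `base` and `strength` parameters, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def informative_prior(labeled_docs, vocab, base=0.01, strength=1.0):
    """Build one Dirichlet pseudo-count vector per label (hypothetical helper).

    labeled_docs: list of (label, tokens) pairs from a source domain.
    Returns {label: {word: pseudo_count}}; every vocabulary word gets at least `base`.
    """
    counts = {}
    for label, tokens in labeled_docs:
        counts.setdefault(label, Counter()).update(t for t in tokens if t in vocab)
    prior = {}
    for label, c in counts.items():
        total = sum(c.values()) or 1
        # Relative source-domain frequencies contribute `strength` pseudo-counts in total.
        prior[label] = {w: base + strength * c[w] / total for w in vocab}
    return prior

def posterior_mean(prior_vec, target_counts):
    """Posterior mean of a multinomial under a Dirichlet prior: (n_w + beta_w) / sum(n + beta)."""
    denom = sum(prior_vec.values()) + sum(target_counts.values())
    return {w: (target_counts.get(w, 0) + b) / denom for w, b in prior_vec.items()}

# Toy data (assumed): two labeled source documents and a one-word target "tweet".
vocab = {"game", "team", "vote", "senate"}
source = [("sports", ["game", "team", "team"]),
          ("politics", ["vote", "senate", "vote"])]
prior = informative_prior(source, vocab, base=0.01, strength=5.0)
tweet_counts = Counter(["game"])
phi = posterior_mean(prior["sports"], tweet_counts)
```

In this toy setting the target text contains only "game", yet "team" receives far more probability than "vote" because the sports-labeled source documents raised its pseudo-count. A symmetric prior would have treated the two unseen words identically.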

Comments:	2012 SIAM International Conference on Data Mining (SDM12), pages 564-575
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1301.5686 [cs.CL]
	(or arXiv:1301.5686v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1301.5686

Submission history

From: Jeon-Hyung Kang
[v1] Thu, 24 Jan 2013 02:02:13 UTC (994 KB)
[v2] Sat, 26 Jan 2013 18:00:19 UTC (994 KB)
