Improving Medical Short Text Classification with Semantic Expansion Using Word-Cluster Embedding

Shen, Ying; Zhang, Qiang; Zhang, Jin; Huang, Jiyue; Lu, Yuming; Lei, Kai


arXiv:1812.01885 (cs)

[Submitted on 5 Dec 2018]



Abstract: Automatic text classification (TC) can be applied to real-world problems such as classifying in-patient discharge summaries and medical text reports, which helps make medical documents more understandable to doctors. However, sentences in electronic medical records (EMR) are shorter than those in the general domain, which leads to a lack of semantic features and to semantic ambiguity. To tackle this challenge, we propose adding word-cluster embeddings to a deep neural network to improve short text classification. Concretely, we first use hierarchical agglomerative clustering to cluster the word vectors in the semantic space. We then calculate the cluster center vector, which represents the implicit topic information of the words in the cluster. Finally, we expand each word vector with its cluster center vector and implement classifiers using a CNN and an LSTM, respectively. To evaluate the proposed method, we conduct experiments on the public TREC data set and on a medical short-sentence data set that we constructed and released. The experimental results demonstrate that the proposed method outperforms state-of-the-art baselines in short sentence classification in both the medical domain and the general domain.
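
The three steps named in the abstract (cluster the word vectors, compute a center per cluster, expand each word vector with its center) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not state the linkage criterion, the distance measure, the number of clusters, or how the expansion is performed, so the sketch assumes centroid linkage, Euclidean distance, the arithmetic mean as the cluster center, and concatenation as the expansion. The function names and the toy two-dimensional vectors are hypothetical.

```python
from math import dist

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def agglomerative_clusters(vectors, num_clusters):
    """Naive bottom-up clustering: start with one cluster per vector and
    repeatedly merge the two clusters whose centroids are closest.
    Centroid linkage and Euclidean distance are assumptions."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_clusters:
        centers = [centroid([vectors[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(centers[a], centers[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def expand_embeddings(word_vectors, num_clusters):
    """Return each word vector concatenated with its cluster center.
    Concatenation is an assumed reading of 'expand'."""
    words = list(word_vectors)
    vectors = [word_vectors[w] for w in words]
    expanded = {}
    for members in agglomerative_clusters(vectors, num_clusters):
        center = centroid([vectors[i] for i in members])
        for i in members:
            expanded[words[i]] = list(vectors[i]) + center
    return expanded

# Hypothetical toy embeddings; real word vectors would have hundreds of dimensions.
toy = {
    "fever": [0.9, 0.1],
    "cough": [0.8, 0.2],
    "aspirin": [0.1, 0.9],
    "ibuprofen": [0.2, 0.8],
}
expanded = expand_embeddings(toy, num_clusters=2)
```

In this toy run each expanded vector has twice the original dimension, and words that fall in the same cluster share the same trailing half. The expanded vectors would then replace the plain word vectors as input to the CNN or LSTM classifier.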

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1812.01885 [cs.CL]
	(or arXiv:1812.01885v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1812.01885
Journal reference:	International Conference on Information Science and Applications ICISA 2018: Information Science and Applications 2018 pp 401-411

Submission history

From: Ying Shen
[v1] Wed, 5 Dec 2018 10:02:59 UTC (612 KB)
