Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation

Ionescu, Radu Tudor; Butnaru, Andrei M.

arXiv:1902.08850 (cs)

[Submitted on 23 Feb 2019 (v1), last revised 6 May 2019 (this version, v3)]

Abstract: In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings. Our approach is inspired by the Vector of Locally-Aggregated Descriptors used for image representation, and it works as follows. First, the word embeddings gathered from a collection of documents are clustered by k-means in order to learn a codebook of semantically related word embeddings. Each word embedding is then associated with its nearest cluster centroid (codeword). The Vector of Locally-Aggregated Word Embeddings (VLAWE) representation of a document is then computed by accumulating the differences between each codeword vector and each word vector (from the document) associated with the respective codeword. We plug the VLAWE representation, which is learned in an unsupervised manner, into a classifier and show that it is useful for a diverse set of text classification tasks. We compare our approach with a broad range of recent state-of-the-art methods, demonstrating the effectiveness of our approach. Furthermore, we obtain a considerable improvement on the Movie Review data set, reporting an accuracy of 93.3%, which represents an absolute gain of 10% over the state-of-the-art approach. Our code is available at this https URL.
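
The three steps described in the abstract (k-means codebook, nearest-codeword assignment, accumulation of differences) can be illustrated with a minimal pure-Python sketch. This is not the authors' released code. The function names `nearest`, `kmeans` and `vlawe`, the toy 2-D vectors, and the sign convention (word vector minus codeword, as in VLAD) are assumptions made for illustration, and any normalization applied in the paper is omitted.

```python
import random


def nearest(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, centroids[j])),
    )


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the codebook)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def vlawe(doc_vectors, centroids):
    """Accumulate per-codeword residuals and concatenate them.

    Assumed sign convention: word vector minus codeword. The result has
    k * d components for k codewords of dimension d.
    """
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for v in doc_vectors:
        j = nearest(v, centroids)
        for i in range(d):
            residuals[j][i] += v[i] - centroids[j][i]
    return [x for row in residuals for x in row]


# Toy usage with a hand-picked codebook of two 2-D codewords.
codebook = [[0.0, 0.0], [10.0, 10.0]]
document = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
print(vlawe(document, codebook))  # prints [1.0, 2.0, -1.0, 0.0]
```

In practice the codebook would come from `kmeans` run over the word embeddings of the whole collection, and the resulting fixed-length vector would be passed to a classifier, as the abstract describes.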

Comments:	Accepted at NAACL 2019
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1902.08850 [cs.CL]
	(or arXiv:1902.08850v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1902.08850

Submission history

From: Radu Tudor Ionescu
[v1] Sat, 23 Feb 2019 21:35:54 UTC (207 KB)
[v2] Sun, 24 Mar 2019 21:58:36 UTC (207 KB)
[v3] Mon, 6 May 2019 05:19:39 UTC (208 KB)
