What the Vec? Towards Probabilistically Grounded Embeddings

Allen, Carl; Balažević, Ivana; Hospedales, Timothy

arXiv:1805.12164 (cs)

[Submitted on 30 May 2018 (v1), last revised 11 Nov 2019 (this version, v3)]

Abstract: Word2Vec (W2V) and GloVe are popular, fast and efficient word embedding algorithms. Their embeddings are widely used and perform well on a variety of natural language processing tasks. Moreover, W2V has recently been adopted in the field of graph embedding, where it underpins several leading algorithms. However, despite their ubiquity and relatively simple model architecture, a theoretical understanding of what the embedding parameters of W2V and GloVe learn and why that is useful in downstream tasks has been lacking. We show that different interactions between PMI vectors reflect semantic word relationships, such as similarity and paraphrasing, that are encoded in low-dimensional word embeddings under a suitable projection, theoretically explaining why embeddings of W2V and GloVe work. As a consequence, we also reveal an interesting mathematical interconnection between the considered semantic relationships themselves.

Comments:	Advances in Neural Information Processing, 2019
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1805.12164 [cs.CL]
	(or arXiv:1805.12164v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1805.12164

Submission history

From: Carl Allen
[v1] Wed, 30 May 2018 18:19:38 UTC (173 KB)
[v2] Sun, 26 May 2019 14:38:29 UTC (1,382 KB)
[v3] Mon, 11 Nov 2019 15:11:25 UTC (801 KB)
