Do Neural Nets Learn Statistical Laws behind Natural Language?

Takahashi, Shuntaro; Tanaka-Ishii, Kumiko

doi:10.1371/journal.pone.0189326

arXiv:1707.04848 (cs)

[Submitted on 16 Jul 2017 (v1), last revised 28 Nov 2017 (this version, v2)]

Abstract: The performance of deep learning in natural language processing has been spectacular, but the reasons for this success remain unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a neural language model based on long short-term memory (LSTM) effectively reproduces Zipf's law and Heaps' law, two representative statistical properties underlying natural language. We discuss the quality of reproducibility and the emergence of Zipf's law and Heaps' law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical property of natural language. This understanding could provide a direction for improving the architectures of neural networks.
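
For readers unfamiliar with the two laws named in the abstract: Zipf's law says that the frequency of the r-th most frequent word decays roughly as a power of r, and Heaps' law says that the number of distinct words grows roughly as a power (with exponent below 1) of the text length. The following is a minimal illustrative sketch of how both quantities could be measured on a tokenized text. It is not the authors' code; the function names `rank_frequency`, `vocabulary_growth`, and `loglog_slope` are hypothetical, and the plain least-squares fit in log-log space is an assumption made here for simplicity.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Word frequencies sorted in descending order (index 0 is rank 1)."""
    return sorted(Counter(tokens).values(), reverse=True)


def vocabulary_growth(tokens):
    """growth[i] = number of distinct words among the first i + 1 tokens."""
    seen, growth = set(), []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x).

    Assumption: a simple linear fit in log-log space, used here only
    as a rough estimate of a power-law exponent.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    tokens = "a b a c a b d".split()
    freqs = rank_frequency(tokens)        # frequency at rank 1, 2, ...
    growth = vocabulary_growth(tokens)    # vocabulary size vs. text length
    ranks = range(1, len(freqs) + 1)
    lengths = range(1, len(growth) + 1)
    print("Zipf slope estimate:", loglog_slope(ranks, freqs))
    print("Heaps slope estimate:", loglog_slope(lengths, growth))
```

Applied to a corpus and to text sampled from a language model, the two slope estimates give a rough way to compare them: a Zipf slope near -1 and a Heaps slope between 0 and 1 are what the laws would lead one to expect for natural language text. The toy input in the usage section is far too short for the estimates to be meaningful.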

Comments:	21 pages, 11 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1707.04848 [cs.CL]
	(or arXiv:1707.04848v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1707.04848
Related DOI:	https://doi.org/10.1371/journal.pone.0189326

Submission history

From: Kumiko Tanaka-Ishii
[v1] Sun, 16 Jul 2017 09:08:42 UTC (3,288 KB)
[v2] Tue, 28 Nov 2017 07:36:25 UTC (4,775 KB)
