Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features

Haider, Thomas

arXiv:2102.08858 (cs)

[Submitted on 17 Feb 2021 (v1), last revised 21 Apr 2021 (this version, v2)]

Abstract: A prerequisite for the computational study of literature is the availability of properly digitized texts, ideally with reliable metadata and ground-truth annotation. Poetry corpora exist for a number of languages, but larger collections lack consistency and are encoded in various standards, while annotated corpora are typically constrained to a particular genre and/or were designed for the analysis of certain linguistic features (like rhyme). In this work, we provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models that enable robust large-scale analysis.
We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches. In a multi-task setup, particular beneficial task relations illustrate the inter-dependence of poetic features: a model learns foot boundaries better when jointly predicting syllable stress; aesthetic emotions and verse measures benefit from each other; and we find that caesuras are quite dependent on syntax and also integral to shaping the overall measure of the line.

Comments:	EACL 2021
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2102.08858 [cs.CL]
	(or arXiv:2102.08858v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2102.08858

Submission history

From: Thomas Haider
[v1] Wed, 17 Feb 2021 16:38:57 UTC (1,929 KB)
[v2] Wed, 21 Apr 2021 09:35:47 UTC (1,929 KB)
