Rethinking the Authorship Verification Experimental Setups

Brad, Florin; Manolache, Andrei; Burceanu, Elena; Barbalau, Antonio; Ionescu, Radu; Popescu, Marius

arXiv:2112.05125 (cs)

[Submitted on 9 Dec 2021 (v1), last revised 1 Nov 2022 (this version, v2)]

Abstract: One of the main drivers of the recent advances in authorship verification is the PAN large-scale authorship dataset. Despite generating significant progress in the field, inconsistent performance differences between the closed and open test sets have been reported. To this end, we improve the experimental setup by proposing five new public splits over the PAN dataset, specifically designed to isolate and identify biases related to the text topic and to the author's writing style. We evaluate several BERT-like baselines on these splits, showing that such models are competitive with authorship verification state-of-the-art methods. Furthermore, using explainable AI, we find that these baselines are biased towards named entities. We show that models trained without the named entities obtain better results and generalize better when tested on DarkReddit, our new dataset for authorship verification.

Comments:	Accepted as a short paper at the EMNLP 2022 conference. 10 pages, 5 figures, 9 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2112.05125 [cs.CL]
	(or arXiv:2112.05125v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2112.05125

Submission history

From: Florin Brad
[v1] Thu, 9 Dec 2021 18:57:29 UTC (1,091 KB)
[v2] Tue, 1 Nov 2022 11:20:36 UTC (1,205 KB)
