DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition

Coquenet, Denis; Chatelain, Clément; Paquet, Thierry

doi:10.1109/TPAMI.2023.3235826

arXiv:2203.12273 (cs)

[Submitted on 23 Mar 2022 (v1), last revised 13 Dec 2022 (this version, v4)]

Abstract: Unconstrained handwritten text recognition is a challenging computer vision task. It is traditionally handled with a two-step approach: line segmentation followed by text line recognition. For the first time, we propose an end-to-end segmentation-free architecture for the task of handwritten document recognition: the Document Attention Network. In addition to text recognition, the model is trained to label text parts using begin and end tags in an XML-like fashion. The model is made up of an FCN encoder for feature extraction and a stack of transformer decoder layers for a recurrent token-by-token prediction process. It takes whole text documents as input and sequentially outputs characters, as well as logical layout tokens. Contrary to existing segmentation-based approaches, the model is trained without using any segmentation label. We achieve competitive results on the READ 2016 dataset at page level and at double-page level, with a character error rate (CER) of 3.43% and 3.70%, respectively. We also provide results for the RIMES 2009 dataset at page level, reaching a CER of 4.54%.
We provide all source code and pre-trained model weights at this https URL.
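The results above are reported as a character error rate. As a minimal sketch of how such a figure can be computed, the snippet below divides the total Levenshtein edit distance between predictions and ground truth by the total number of ground-truth characters. The function names `edit_distance` and `cer` are illustrative, and the corpus-level normalization is an assumption; the authors' exact evaluation protocol (for example, how layout tokens are handled) may differ.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER (assumed normalization: total edits / total ref chars)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # One substituted character out of ten reference characters.
    print(f"{cer(['hello', 'world'], ['hallo', 'world']):.2%}")
```

Normalizing over the whole corpus rather than averaging per-document rates keeps long pages from being under-weighted; either convention is plausible, so it is worth checking which one a given benchmark uses before comparing numbers.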

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.12273 [cs.CV]
	(or arXiv:2203.12273v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.12273
Journal reference:	IEEE Transactions on Pattern Analysis and Machine Intelligence 2023
Related DOI:	https://doi.org/10.1109/TPAMI.2023.3235826

Submission history

From: Denis Coquenet
[v1] Wed, 23 Mar 2022 08:40:42 UTC (3,548 KB)
[v2] Thu, 7 Apr 2022 09:26:23 UTC (4,329 KB)
[v3] Mon, 1 Aug 2022 15:28:39 UTC (12,128 KB)
[v4] Tue, 13 Dec 2022 10:06:59 UTC (11,832 KB)
