DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

Wang, Hao; Wang, Qingxuan; Li, Yue; Wang, Changqing; Chu, Chenhui; Wang, Rui

arXiv:2310.14802 (cs)

[Submitted on 23 Oct 2023]

Abstract: The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires overcoming technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce DocTrack, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at this https URL.

Comments:	14 pages, 8 figures, accepted by Findings of EMNLP 2023
Subjects:	Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Cite as:	arXiv:2310.14802 [cs.HC]
	(or arXiv:2310.14802v1 [cs.HC] for this version)
	https://doi.org/10.48550/arXiv.2310.14802

Submission history

From: Hao Wang
[v1] Mon, 23 Oct 2023 10:58:09 UTC (8,087 KB)
