A Dataset and Baselines for Visual Question Answering on Art

Garcia, Noa; Ye, Chentao; Liu, Zihua; Hu, Qingtao; Otani, Mayu; Chu, Chenhui; Nakashima, Yuta; Mitamura, Teruko

arXiv:2008.12520 (cs)

[Submitted on 28 Aug 2020]

Abstract: Answering questions related to art pieces (paintings) is a difficult task, as it implies the understanding of not only the visual information that is shown in the picture, but also the contextual knowledge that is acquired through the study of the history of art. In this work, we introduce our first attempt towards building a new dataset, coined AQUA (Art QUestion Answering). The question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset. The QA pairs are cleansed by crowdsourcing workers with respect to their grammatical correctness, answerability, and answers' correctness. Our dataset inherently consists of visual (painting-based) and knowledge (comment-based) questions. We also present a two-branch model as baseline, where the visual and knowledge questions are handled independently. We extensively compare our baseline model against the state-of-the-art models for question answering, and we provide a comprehensive study about the challenges and potential future directions for visual question answering on art.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:2008.12520 [cs.CV]
	(or arXiv:2008.12520v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2008.12520

Submission history

From: Noa Garcia
[v1] Fri, 28 Aug 2020 07:33:30 UTC (10,256 KB)
