
A Crowdsourcing Tool for Data Augmentation in Visual Question Answering Tasks

Published: 16 October 2018

Abstract

Visual Question Answering (VQA) is a task that connects the fields of Computer Vision and Natural Language Processing. Taking as input an image I and a natural language question Q about I, a VQA model must produce a coherent answer R (also in natural language) to Q. A particular type of visual question is the binary question, i.e., a question whose answer belongs to the set {yes, no}. Deep neural networks are currently the state-of-the-art technique for training VQA models. Despite their success, applying neural networks to the VQA task requires a very large amount of data to produce models with adequate accuracy. The datasets currently used for training VQA models are the result of laborious manual labeling processes (i.e., performed by humans). This context motivates the study of approaches for augmenting these datasets in order to train more accurate prediction models. This paper describes a crowdsourcing tool that can be used collaboratively to augment an existing VQA dataset for binary questions. Our tool actively integrates candidate items from an external data source in order to optimize the selection of queries presented to curators.
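The abstract describes actively choosing which candidate (image, binary question) pairs to show curators. One common way to realize such a selection step (a minimal sketch, not the authors' implementation; the data and field names here are hypothetical) is uncertainty sampling: send curators the items whose predicted probability of "yes" is closest to 0.5, where the current model is least confident.

```python
# Hypothetical sketch of query selection for curators via uncertainty
# sampling. The candidate structure and p_yes scores are illustrative
# assumptions, not taken from the paper.

def select_queries(candidates, k):
    """Pick the k candidates whose predicted P(yes) is closest to 0.5,
    i.e. the binary questions the current VQA model is least sure about."""
    return sorted(candidates, key=lambda c: abs(c["p_yes"] - 0.5))[:k]

candidates = [
    {"image": "img_001.jpg", "question": "Is there a dog?",    "p_yes": 0.93},
    {"image": "img_002.jpg", "question": "Is it raining?",     "p_yes": 0.51},
    {"image": "img_003.jpg", "question": "Is the light on?",   "p_yes": 0.07},
    {"image": "img_004.jpg", "question": "Is there a person?", "p_yes": 0.48},
]

queries = select_queries(candidates, k=2)
# The two most ambiguous items go to human curators first.
print([q["image"] for q in queries])  # → ['img_002.jpg', 'img_004.jpg']
```

Labels collected this way can be fed back to retrain the model, so each round of curation targets the examples expected to improve it most.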


Published In

WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web
October 2018
437 pages
ISBN:9781450358675
DOI:10.1145/3243082

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Crowdsourcing
  2. Data Augmentation
  3. Human Computation
  4. Image Annotation

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

WebMedia '18
WebMedia '18: Brazilian Symposium on Multimedia and the Web
October 16 - 19, 2018
Salvador, BA, Brazil

Acceptance Rates

WebMedia '18 Paper Acceptance Rate: 37 of 111 submissions, 33%
Overall Acceptance Rate: 270 of 873 submissions, 31%

