
A Crowdsourcing Tool for Data Augmentation in Visual Question Answering Tasks

Published: 16 October 2018

Abstract

Visual Question Answering (VQA) is a task that connects the fields of Computer Vision and Natural Language Processing. Taking as input an image I and a natural language question Q about I, a VQA model must produce a coherent answer R (also in natural language) to Q. A particular type of visual question is the binary question, i.e., a question whose answer belongs to the set {yes, no}. Deep neural networks are currently the state-of-the-art technique for training VQA models. Despite their success, applying neural networks to the VQA task requires a very large amount of data to produce models with adequate accuracy. The datasets currently used for training VQA models are the result of laborious manual labeling processes (i.e., performed by humans). This context motivates the study of approaches for augmenting these datasets in order to train more accurate prediction models. This paper describes a crowdsourcing tool that can be used collaboratively to augment an existing VQA dataset for binary questions. Our tool actively integrates candidate items from an external data source in order to optimize the selection of queries presented to curators.
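The abstract describes actively choosing which candidate (image, binary question) pairs to show curators. One common way to realize such a selection step (a minimal sketch, not the authors' implementation; the data and field names here are hypothetical) is uncertainty sampling: send curators the items whose predicted probability of "yes" is closest to 0.5, where the current model is least confident.

```python
# Hypothetical sketch of query selection for curators via uncertainty
# sampling. The candidate structure and p_yes scores are illustrative
# assumptions, not taken from the paper.

def select_queries(candidates, k):
    """Pick the k candidates whose predicted P(yes) is closest to 0.5,
    i.e. the binary questions the current VQA model is least sure about."""
    return sorted(candidates, key=lambda c: abs(c["p_yes"] - 0.5))[:k]

candidates = [
    {"image": "img_001.jpg", "question": "Is there a dog?",    "p_yes": 0.93},
    {"image": "img_002.jpg", "question": "Is it raining?",     "p_yes": 0.51},
    {"image": "img_003.jpg", "question": "Is the light on?",   "p_yes": 0.07},
    {"image": "img_004.jpg", "question": "Is there a person?", "p_yes": 0.48},
]

queries = select_queries(candidates, k=2)
# The two most ambiguous items go to human curators first.
print([q["image"] for q in queries])  # → ['img_002.jpg', 'img_004.jpg']
```

Labels collected this way can be fed back to retrain the model, so each round of curation targets the examples expected to improve it most.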


Published In

WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web
October 2018
437 pages
ISBN:9781450358675
DOI:10.1145/3243082

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Crowdsourcing
  2. Data Augmentation
  3. Human Computation
  4. Image Annotation

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

WebMedia '18
WebMedia '18: Brazilian Symposium on Multimedia and the Web
October 16 - 19, 2018
Salvador, BA, Brazil

Acceptance Rates

WebMedia '18 Paper Acceptance Rate: 37 of 111 submissions, 33%
Overall Acceptance Rate: 270 of 873 submissions, 31%

