Learning Visual Commonsense for Robust Scene Graph Generation

Zareian, Alireza; Wang, Zhecan; You, Haoxuan; Chang, Shih-Fu

arXiv:2006.09623 (cs)

[Submitted on 17 Jun 2020 (v1), last revised 18 Jul 2020 (this version, v2)]

Abstract: Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild. Perception errors often lead to nonsensical compositions in the output scene graph, which do not follow real-world rules and patterns, and can be corrected using commonsense knowledge. We propose the first method to acquire visual commonsense such as affordance and intuitive physics automatically from data, and use that to improve the robustness of scene understanding. To this end, we extend Transformer models to incorporate the structure of scene graphs, and train our Global-Local Attention Transformer on a scene graph corpus. Once trained, our model can be applied on any scene graph generation model and correct its obvious mistakes, resulting in more semantically plausible scene graphs. Through extensive experiments, we show our model learns commonsense better than any alternative, and improves the accuracy of state-of-the-art scene graph generation methods.

Comments:	To be presented at ECCV 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2006.09623 [cs.CV]
	(or arXiv:2006.09623v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2006.09623

Submission history

From: Alireza Zareian
[v1] Wed, 17 Jun 2020 03:07:53 UTC (5,127 KB)
[v2] Sat, 18 Jul 2020 11:10:45 UTC (7,991 KB)
