Jan 4, 2022 · We propose a depth and segmentation based visual attention mechanism for Embodied Question Answering. First, we extract local semantic features.
We presented a novel depth and video segmentation based visual attention mechanism to improve the performance of EQA systems.
[Architecture figure: RGB frames and segmentation masks feed a bottom-up attention block and a segmentation-attention module; the question is embedded with a GRU, drives a top-down attention block, and linear layers over the embedded question features and attended visual features produce the answer.]
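To make the attention blocks above concrete, here is a minimal sketch of the core idea: per-segment visual features are scored against an embedded question vector, softmax-normalized, and pooled into a single attended feature. All names (`segmentation_attention`, `region_feats`, `question_feat`) are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def segmentation_attention(region_feats, question_feat):
    """Hypothetical sketch of question-guided attention over segments.

    region_feats: (K, D) array, one feature vector per segmented region.
    question_feat: (D,) embedded question vector (e.g. a GRU state).
    Returns softmax attention weights (K,) and the attended feature (D,).
    """
    scores = region_feats @ question_feat          # relevance of each segment
    scores = scores - scores.max()                 # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over segments
    attended = weights @ region_feats              # weighted sum of features
    return weights, attended

# Toy usage with random features for 5 segments of dimension 8.
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 8))
q = rng.standard_normal(8)
w, ctx = segmentation_attention(feats, q)
```

In the actual model these features would come from learned convolutional and segmentation networks; the sketch only shows how a question embedding can re-weight per-segment features.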
In the paper [66], a top-down visual attention mechanism was used in image captioning and visual question answering (VQA) to understand images more deeply ...
To tackle these problems, we propose a segmentation-based visual attention mechanism for Embodied Question Answering. First, we extract the local semantic ...
Depth and video segmentation based visual attention for embodied question answering. H Luo, G Lin, Y Yao, F Liu, Z Liu, Z Tang. IEEE Transactions on ...
SegEQA: video segmentation based visual attention for embodied question answering. Proceedings of the International Conference on Computer Vision (ICCV), 2019.
We present a transformer-based framework that aligns vision and language information for the task of robot navigation and question answering.
EmbodiedQA requires a range of AI skills – language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long ...
Nov 30, 2017 · Depth and Video Segmentation Based Visual Attention for Embodied Question Answering ... a Visual Question Answering (VQA) sub-task and a ...