Depth and Video Segmentation Based Visual Attention for Embodied Question Answering.

Depth and Video Segmentation Based Visual Attention for Embodied ...

Jan 4, 2022 · We propose a depth and segmentation based visual attention mechanism for Embodied Question Answering. First, we extract local semantic features.

Scholarly articles for Depth and Video Segmentation Based Visual Attention for Embodied Question Answering.

scholar.google.com › citations

Depth and video segmentation based visual attention …
Luo · Cited by 15

Depth and Video Segmentation Based Visual Attention for Embodied ...

www.computer.org › journal › 2023/06

We presented a novel depth and video segmentation based visual attention mechanism to improve the performance of EQA systems.

[PDF] SegEQA: Video Segmentation Based Visual Attention for Embodied ...

openaccess.thecvf.com › papers › L...

Figure labels: Answer; bottom-up attention block; top-down attention block; RGB; mask; embedding + GRU; segmentation attention; linear layers; embedded question features.

SegEQA: Video Segmentation Based Visual Attention for Embodied ...

www.researchgate.net › publication › 33...

In the paper [66], a top-up visual attention mechanism was used in image captioning and visual question answering (VQA) that can understand the images deeper ...

ICCV 2019 Open Access Repository

openaccess.thecvf.com › html › Luo_Seg...

To tackle these problems, we propose a segmentation based visual attention mechanism for Embodied Question Answering. First, we extract the local semantic ...

Haonan Luo - Google Scholar

scholar.google.com › citations

Depth and video segmentation based visual attention for embodied question answering. H Luo, G Lin, Y Yao, F Liu, Z Liu, Z Tang. IEEE Transactions on ...

[PDF] SegEQA : video segmentation based visual attention for embodied ...

dr.ntu.edu.sg › bitstream

SegEQA : video segmentation based visual attention for embodied question answering. Proceedings of the International Conference on Computer Vision (ICCV) 2019.

Transformer-based vision-language alignment for robot navigation and ...

www.sciencedirect.com › article › abs › pii

We present a transformer-based framework that aligns vision and language information for the task of robot navigation and question answering.

[PDF] paper.pdf - Embodied Question Answering

embodiedqa.org › paper

EmbodiedQA requires a range of AI skills – language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long ...

[PDF] Embodied Question Answering - Semantic Scholar

www.semanticscholar.org › paper › Emb...

Nov 30, 2017 · Depth and Video Segmentation Based Visual Attention for Embodied Question Answering ... Visual Question Answering (VQA) sub-task and a ...