"Where am I?" Scene Retrieval with Language

Chen, Jiaqi; Barath, Daniel; Armeni, Iro; Pollefeys, Marc; Blum, Hermann


arXiv:2404.14565 (cs)

[Submitted on 22 Apr 2024 (v1), last revised 8 Nov 2024 (this version, v2)]


Abstract: Natural language interfaces to embodied AI are becoming increasingly common in our daily lives. This opens up further opportunities for language-based interaction with embodied agents, such as a user verbally instructing an agent to execute some task in a specific location. For example, "put the bowls back in the cupboard next to the fridge" or "meet me at the intersection under the red sign." As such, we need methods that interface between natural language and map representations of the environment. To this end, we explore whether an open-set natural language query can be used to identify a scene represented by a 3D scene graph. We define this task as "language-based scene-retrieval"; it is closely related to "coarse-localization," but we instead search for a match within a collection of disjoint scenes and not necessarily a large-scale continuous map. We present Text2SceneGraphMatcher, a "scene-retrieval" pipeline that learns joint embeddings between text descriptions and scene graphs to determine whether they match. The code, trained models, and datasets will be made public.
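As a rough illustration of retrieval in a joint embedding space, the sketch below ranks candidate scenes by cosine similarity to a query embedding. It is not the Text2SceneGraphMatcher implementation: the function names `cosine` and `retrieve`, the hand-picked toy vectors, and the use of cosine similarity as the matching score are all assumptions made for this example. In the described pipeline, the embeddings would come from learned text and scene-graph encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(text_emb, scene_embs, k=1):
    """Return the names of the k scenes most similar to the text embedding.

    Hypothetical helper: scene_embs maps a scene name to its embedding.
    """
    ranked = sorted(
        scene_embs.items(),
        key=lambda item: cosine(text_emb, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy, hand-picked embeddings (assumption): stand-ins for encoder outputs.
text_emb = [0.9, 0.1, 0.0]
scene_embs = {
    "kitchen": [1.0, 0.0, 0.0],
    "office": [0.0, 1.0, 0.0],
    "hallway": [0.0, 0.0, 1.0],
}

top = retrieve(text_emb, scene_embs, k=2)
print(top)
```

Because the scenes form a disjoint collection rather than one continuous map, a ranking over independent candidates like this is one natural way to frame the retrieval step.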

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2404.14565 [cs.CV]
 (or arXiv:2404.14565v2 [cs.CV] for this version)
 https://doi.org/10.48550/arXiv.2404.14565

Submission history

From: Jiaqi Chen
[v1] Mon, 22 Apr 2024 20:21:32 UTC (8,061 KB)
[v2] Fri, 8 Nov 2024 14:33:06 UTC (8,049 KB)
