ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Gong, Ran; Huang, Jiangyong; Zhao, Yizhou; Geng, Haoran; Gao, Xiaofeng; Wu, Qingyang; Ai, Wensi; Zhou, Ziheng; Terzopoulos, Demetri; Zhu, Song-Chun; Jia, Baoxiong; Huang, Siyuan

Computer Science > Artificial Intelligence

arXiv:2304.04321 (cs)

[Submitted on 9 Apr 2023 (v1), last revised 11 Sep 2023 (this version, v2)]

Title:ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Authors:Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

View PDF

Abstract:Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for the learning of complex tasks and transferring learned policy from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD is comprised of 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance by utilizing the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulations continue to experience significant challenges in novel goal-state generalizations, scene generalizations, and object generalizations. These findings highlight the need to develop new algorithms that address this gap and underscore the potential for further research in this area. Project website: this https URL.

Comments:	The first two authors contributed equally; 20 pages; 17 figures; project availalbe: this https URL ICCV 2023
Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2304.04321 [cs.AI]
	(or arXiv:2304.04321v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2304.04321

Submission history

From: Ran Gong [view email]
[v1] Sun, 9 Apr 2023 21:42:57 UTC (43,018 KB)
[v2] Mon, 11 Sep 2023 11:27:53 UTC (24,715 KB)

Computer Science > Artificial Intelligence

Title:ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Artificial Intelligence

Title:ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators