ActionVOS: Actions as Prompts for Video Object Segmentation

Ouyang, Liangyang; Liu, Ruicong; Huang, Yifei; Furuta, Ryosuke; Sato, Yoichi

arXiv:2407.07402 (cs)

[Submitted on 10 Jul 2024]

Abstract: In egocentric vision, advances in referring video object segmentation (RVOS) are pivotal to understanding human activities. However, the existing RVOS task relies primarily on static attributes such as object names to segment target objects, which makes it hard to distinguish target objects from background objects and to identify objects undergoing state changes. To address these problems, this work proposes a novel action-aware RVOS setting called ActionVOS, which aims to segment only active objects in egocentric videos using human actions as a key language prompt. Human actions precisely describe the behavior of humans, thereby helping to identify the objects truly involved in the interaction and to understand possible state changes. We also build a method tailored to this setting. Specifically, we develop an action-aware labeling module with an efficient action-guided focal loss. These designs enable the ActionVOS model to prioritize active objects using existing, readily available annotations. Experimental results on the VISOR dataset show that ActionVOS significantly reduces the mis-segmentation of inactive objects, confirming that actions help the ActionVOS model understand objects' involvement. Further evaluations on the VOST and VSCOS datasets show that the novel ActionVOS setting improves segmentation performance in challenging circumstances involving object state changes. We will make our implementation available at this https URL.
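
The abstract names an action-guided focal loss but does not give its form. As a rough illustration only, the sketch below shows the standard binary focal loss with an extra per-pixel weight. The function name `weighted_focal_loss`, the `weights` argument, and the idea that such weights would be derived from action-aware labels are assumptions made for this example, not the authors' formulation.

```python
import math


def weighted_focal_loss(probs, targets, weights, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over pixels, scaled by a per-pixel weight.

    probs   -- predicted foreground probabilities, each in [0, 1]
    targets -- ground-truth labels (1 = active object, 0 = anything else)
    weights -- hypothetical per-pixel weights (an assumption for this
               sketch; the paper's actual weighting is not given here)
    gamma   -- focusing parameter of the standard focal loss
    """
    if not probs:
        raise ValueError("probs must contain at least one pixel")
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Probability assigned to the correct class for this pixel.
        p_t = p if y == 1 else 1.0 - p
        # (1 - p_t)^gamma down-weights pixels that are already easy.
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` and all weights equal to 1, this reduces to ordinary binary cross-entropy; setting a pixel's weight to 0 removes its contribution entirely.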

Comments:	This paper is accepted by ECCV2024. Code will be released at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.07402 [cs.CV]
	(or arXiv:2407.07402v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.07402

Submission history

From: Liangyang Ouyang
[v1] Wed, 10 Jul 2024 06:57:04 UTC (4,937 KB)
