Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models

Li, Liulei; Wang, Wenguan; Yang, Yi

Computer Science > Computer Vision and Pattern Recognition

arXiv:2410.20155 (cs)

[Submitted on 26 Oct 2024]

Abstract: Prevalent human-object interaction (HOI) detection approaches typically leverage large-scale visual-linguistic models to help recognize events involving humans and objects. Though promising, models trained via contrastive learning on text-image pairs often neglect mid/low-level visual cues and struggle with compositional reasoning. In response, we introduce DIFFUSIONHOI, a new HOI detector that draws on text-to-image diffusion models. Unlike the aforementioned models, diffusion models, being generative, excel at discerning mid/low-level visual concepts and possess strong compositionality for handling novel concepts expressed in text inputs. Since diffusion models usually emphasize object instances, we first devise an inversion-based strategy that learns, in the embedding space, how relation patterns between humans and objects are expressed. These learned relation embeddings then serve as textual prompts that steer diffusion models to generate images depicting specific interactions and to extract HOI-relevant cues from images without heavy fine-tuning. Benefiting from these designs, DIFFUSIONHOI achieves state-of-the-art (SOTA) performance on three datasets under both regular and zero-shot setups.

Comments:	NeurIPS 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2410.20155 [cs.CV]
	(or arXiv:2410.20155v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2410.20155

Submission history

From: Liulei Li
[v1] Sat, 26 Oct 2024 12:00:33 UTC (9,434 KB)