Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning

Park, Dongmin; Qian, Zhaofang; Han, Guangxing; Lim, Ser-Nam

arXiv:2403.10492 (cs)

[Submitted on 15 Mar 2024 (v1), last revised 3 Oct 2024 (this version, v3)]

Abstract: Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhance their reliability for general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dialogues. To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues powered by our novel Adversarial Question Generator (AQG), which can automatically generate image-related yet adversarial dialogues by adopting adversarial attacks on LVLMs. On our benchmark, the zero-shot performance of state-of-the-art LVLMs drops significantly for both the VQA and captioning tasks. Next, we further reveal that this hallucination is mainly due to the prediction bias toward preceding dialogues rather than visual content. To reduce this bias, we propose Adversarial Instruction Tuning (AIT), which robustly fine-tunes LVLMs against hallucinatory dialogues. Extensive experiments show that our proposed approach successfully reduces dialogue hallucination while maintaining performance.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.10492 [cs.CV]
	(or arXiv:2403.10492v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.10492

Submission history

From: Zhaofang Qian
[v1] Fri, 15 Mar 2024 17:27:12 UTC (4,097 KB)
[v2] Sat, 25 May 2024 06:31:18 UTC (5,276 KB)
[v3] Thu, 3 Oct 2024 18:08:57 UTC (5,756 KB)
