AnyAttack: Targeted Adversarial Attacks on Vision-Language Models toward Any Images

Zhang, Jiaming; Ye, Junhong; Ma, Xingjun; Li, Yige; Yang, Yunfan; Sang, Jitao; Yeung, Dit-Yan

Computer Science > Machine Learning

arXiv:2410.05346 (cs)

[Submitted on 7 Oct 2024 (v1), last revised 17 Dec 2024 (this version, v2)]


Abstract: Due to their multimodal capabilities, Vision-Language Models (VLMs) have found numerous impactful applications in real-world scenarios. However, recent studies have revealed that VLMs are vulnerable to image-based adversarial attacks, particularly targeted adversarial images that manipulate the model to generate harmful content specified by the adversary. Current attack methods rely on predefined target labels to create targeted adversarial attacks, which limits their scalability and applicability for large-scale robustness evaluations. In this paper, we propose AnyAttack, a self-supervised framework that generates targeted adversarial images for VLMs without label supervision, allowing any image to serve as a target for the attack. Our framework employs the pre-training and fine-tuning paradigm, with the adversarial noise generator pre-trained on the large-scale LAION-400M dataset. This large-scale pre-training endows our method with powerful transferability across a wide range of VLMs. Extensive experiments on five mainstream open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, and MiniGPT-4) across three multimodal tasks (image-text retrieval, multimodal classification, and image captioning) demonstrate the effectiveness of our attack. Additionally, we successfully transfer AnyAttack to multiple commercial VLMs, including Google Gemini, Claude Sonnet, Microsoft Copilot, and OpenAI GPT. These results reveal an unprecedented risk to VLMs, highlighting the need for effective countermeasures.
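The key idea in the abstract is that the attack target is specified by an image rather than a label. The following is a minimal illustrative sketch of that idea under loudly stated assumptions, not the authors' method: a fixed random linear map stands in for a VLM image encoder, and per-image L-infinity PGD on embedding cosine similarity is swapped in for AnyAttack's pre-trained noise generator. The names `encode`, `cos_sim_and_grad`, `targeted_attack`, `eps`, and `alpha` are hypothetical, and the budget 8/255 is an assumed value commonly used in the adversarial-examples literature.

```python
import numpy as np

# ASSUMPTION: hypothetical stand-in for a VLM image encoder (a fixed random linear map).
# Real encoders (CLIP, BLIP, ...) are deep networks whose gradients come from autodiff.
rng = np.random.default_rng(0)
DIM_IN, DIM_EMB = 64, 16
W = rng.normal(size=(DIM_EMB, DIM_IN))


def encode(x):
    """Map a flattened 'image' x in [0, 1]^DIM_IN to an embedding."""
    return W @ x


def cos_sim_and_grad(x, t):
    """Cosine similarity between encode(x) and target embedding t, plus d(sim)/dx."""
    z = encode(x)
    nz, nt = np.linalg.norm(z), np.linalg.norm(t)
    s = float(z @ t / (nz * nt))
    dz = t / (nz * nt) - s * z / nz**2  # derivative of cosine similarity w.r.t. z
    return s, W.T @ dz                  # chain rule through the linear encoder


def targeted_attack(x_clean, x_target, eps=8 / 255, alpha=1 / 255, steps=100):
    """L-inf PGD pulling encode(x_clean + delta) toward encode(x_target).

    No label is involved: the target is specified only by an image.
    """
    t = encode(x_target)
    delta = np.zeros_like(x_clean)
    for _ in range(steps):
        _, g = cos_sim_and_grad(x_clean + delta, t)
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)  # ascend, stay within budget
        delta = np.clip(x_clean + delta, 0.0, 1.0) - x_clean    # keep pixels in [0, 1]
    return x_clean + delta


# Usage: any image can serve as the target.
x_clean = rng.uniform(size=DIM_IN)
x_target = rng.uniform(size=DIM_IN)
t = encode(x_target)
before, _ = cos_sim_and_grad(x_clean, t)
x_adv = targeted_attack(x_clean, x_target)
after, _ = cos_sim_and_grad(x_adv, t)
print(f"cosine to target: {before:.3f} -> {after:.3f}")
```

This sketch optimizes each adversarial image separately. A pre-trained noise generator, as the abstract describes, would instead produce the perturbation in a single forward pass; that part, along with the LAION-400M pre-training and the fine-tuning stage, is not modeled here.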

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.05346 [cs.LG]
	(or arXiv:2410.05346v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.05346

Submission history

From: Jiaming Zhang
[v1] Mon, 7 Oct 2024 09:45:18 UTC (2,230 KB)
[v2] Tue, 17 Dec 2024 15:32:04 UTC (2,351 KB)
