PromptAttack: Prompt-based Attack for Language Models via Gradient Search

Shi, Yundi; Li, Piji; Yin, Changchun; Han, Zhaoyang; Zhou, Lu; Liu, Zhe

arXiv:2209.01882 (cs)

[Submitted on 5 Sep 2022]

Abstract: As pre-trained language models (PLMs) continue to grow, so do the hardware and data requirements for fine-tuning them. Researchers have therefore proposed a lighter-weight alternative called Prompt Learning. However, in our investigation we observe that prompt learning methods are vulnerable: they can easily be attacked with maliciously constructed prompts, which cause classification errors and pose serious security problems for PLMs. Most current research ignores the security of prompt-based methods. In this paper, we therefore propose a malicious prompt template construction method (PromptAttack) to probe the security of PLMs. Several unfriendly template construction approaches are investigated to guide the model into misclassification. Extensive experiments on three datasets and three PLMs demonstrate the effectiveness of PromptAttack. We also conduct experiments to verify that our method is applicable in few-shot scenarios.

Comments:	12 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2209.01882 [cs.CL]
	(or arXiv:2209.01882v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.01882

Submission history

From: Piji Li
[v1] Mon, 5 Sep 2022 10:28:20 UTC (1,452 KB)
