Imperceptible Face Forgery Attack via Adversarial Semantic Mask

Decheng Liu, Qixuan Su, Chunlei Peng, Nannan Wang, and Xinbo Gao
D. Liu, Q. Su, and C. Peng are with the State Key Laboratory of Integrated Services Networks, School of Cyber Engineering, Xidian University, Xi’an 710071, Shaanxi, P. R. China, and with the Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai 201112, P. R. China (e-mail: [email protected]; [email protected]; [email protected]).
N. Wang is with the State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, Shaanxi, P. R. China (e-mail: [email protected]).
X. Gao is with the Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, P. R. China (e-mail: [email protected]).
Abstract

With the rapid development of generative models, face forgery detection has drawn increasing attention. Researchers have found that existing face forgery detection models remain vulnerable to adversarial examples generated by perturbing pixels across the whole image. However, such adversarial examples are still unsatisfactory because the perturbations are easy to detect. To address these problems, we propose an Adversarial Semantic Mask Attack framework (ASMA) that generates adversarial examples with good transferability and invisibility. Specifically, we propose a novel adversarial semantic mask generative model that constrains the generated perturbations to local semantic regions for good stealthiness. The designed adaptive semantic mask selection strategy leverages the class activation values of different semantic regions to further improve attack transferability and stealthiness. Extensive experiments on a public face forgery dataset show that the proposed method achieves superior performance compared with several representative adversarial attack methods. The code is publicly available at https://github.com/clawerO-O/ASMA.

Index Terms:
Forgery detection, face forgery, adversarial learning, adversarial attack.

I Introduction

Deep learning-based AI systems have been found to be susceptible to small, well-designed perturbations that cause significantly incorrect yet high-confidence predictions. With the development of image forgery techniques, forgery detection models are under increasing threat. Researchers often use attack algorithms to generate adversarial images that make detection models misclassify. Beyond their negative implications for security and privacy, adversarial images also play an important role in optimizing forgery detection models: the robustness and detection accuracy of models trained with adversarial samples improve to varying degrees. Therefore, studying attack algorithms that generate adversarial examples is significant and valuable for promoting the development of face forgery detection models. Existing attack algorithms mainly add generated adversarial noise to the original image, which causes the deep network to produce abnormal inferences. Because the adversarial noise covers a large area, the generated adversarial examples have poor stealthiness and are easy for human eyes to recognize.

Thus, it is necessary to explore a stealthy adversarial attack algorithm capable of deceiving both human eyes and forgery detectors. However, most existing adversarial attack methods generate adversarial samples by adding perturbations to the whole face area, without considering the diverse properties of different semantic face regions. Such methods tend to generate redundant adversarial noise, that is, the generation model adds too much noise in regions irrelevant to the model’s decision. It therefore remains a challenge to enhance the attack ability against face forgery detection models while generating high-quality faces and maintaining visual stealthiness. To improve the transferability and stealthiness of the generated adversarial examples, we propose an adversarial semantic mask attack algorithm for face forgery detection. We leverage class activation mapping and a face semantic parsing module to locate the key semantic regions adaptively. We then constrain the region of adversarial noise and enhance the stealthiness of the adversarial sample by adding the noise only to the key parts of the face image. We conduct experiments on the Deepfake Detection Challenge (DFDC) dataset [1]. The effectiveness of the proposed method is demonstrated by comparison with diverse representative adversarial attack algorithms.

The main contributions of our paper can be summarized as follows:

  1. We explore a novel adversarial semantic mask generation pipeline for attacking face forgery detection, which constrains the generated perturbations to local semantic regions for good stealthiness.

  2. We further propose the adaptive semantic mask selection strategy, which leverages class activation mapping to select more suitable adversarial semantic mask regions and aims to maintain low perceptibility in real applications.

  3. Experimental results on the public large-scale DFDC dataset illustrate the superior performance of the proposed ASMA compared with representative adversarial attack algorithms. The code is publicly available at https://github.com/clawerO-O/ASMA.

II Related Work

II-A Forgery Detection Methods

Over the past decade, with the emergence and continued maturation of deep learning, forgery techniques, especially image forgery, have had a significant impact on people’s lives. With the development of generative adversarial models, high-quality forged images are becoming increasingly difficult for humans to recognize. To mitigate the danger of malicious use, researchers have attempted to detect whether images have been tampered with, which is treated as a binary classification problem. Existing forgery detection techniques fall into two main categories: spatial-domain and frequency-domain methods. In the spatial (image) domain, some works [2, 3] extract content features of an image to detect unusual noise. These methods use only spatial-domain information and generally overfit the classification boundary. In the frequency domain, some works [4, 5] leverage the differences between real and fake images in the frequency domain. These methods make forgery detection more reliable, but detection takes longer. Researchers have also explored more accurate and efficient detection by combining the image domain and the frequency domain [6]. However, few works explore adversarial attack samples for face forgery detection methods.

II-B Adversarial Attack Methods

Adversarial attack examples aim to mislead deep learning models and transfer across different target models. Generally, given a well-trained network, the goal of the adversarial attack is to generate adversarial examples that make the network predict wrongly.

Adversarial attack noise. Gradient-based attack. To mislead a pre-trained classification or detection model, the attack strength often has to be increased to the point where the changes to the image become discernible to the human eye. The classic case of noise-based adversarial example generation is the experiment by Goodfellow et al. [7], which shows that the recognition results of a model can be misled by adding a small amount of perturbation to the image. These attack algorithms are usually single-step or multi-step methods that calculate the perturbation from the gradient of the adversarial loss. Several gradient-based methods have been proposed, including the Basic Iterative Method (BIM) and projected gradient descent (PGD). In addition, DeepFool keeps the distance between the adversarial sample and the original sample as small as possible throughout the iterations, which minimizes the generated perturbation. Although adversarial examples generated in this way can confuse the model, as the attack intensity increases, the human eye can observe the difference between the source image and the adversarial image.

Optimization-based attack. As in model training, researchers have treated the generation of adversarial samples as an optimization task whose goal is to fool the forgery detection model, with an optimizer continually tuning the adversarial samples toward the model’s decision boundary. In these methods, the attack becomes partly model-dependent because pixels that have little effect on the model’s classification are excluded. Optimization-based methods keep the noise range of the adversarial samples small, but they are more time-consuming and less transferable.
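For reference, the single-step and iterative gradient-based attacks named above are usually written as follows. The notation is an assumption introduced here for illustration: $J$ denotes the classification loss of the target model, $y$ the true label, $\epsilon$ the perturbation budget, and $\alpha$ the step size.

```latex
% FGSM (single step)
x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(x, y)\big)

% BIM / PGD (iterative, projected back into the \epsilon-ball around x)
x'_{t+1} = \mathrm{clip}_{x,\epsilon}\Big(x'_t + \alpha \cdot \mathrm{sign}\big(\nabla_{x'_t} J(x'_t, y)\big)\Big)
```

BIM starts from $x'_0 = x$, whereas PGD typically starts from a random point inside the $\epsilon$-ball around $x$.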

Adversarial attack on face analysis. With the development of forgery technology, forgery detection models are increasingly replacing human eyes as the primary means of forgery detection. It has also been argued that adversarial images can be generated by substituting face regions, mainly in the form of patches [8]. Some works [9, 10] generate makeup-style perturbations by extracting the makeup features of a target face image and adding the resulting noise to the face. Xiao et al. [11] extend their GenAP methods to other tasks, e.g., image classification, via adversarial patches in the query-free black-box setting. Hou et al. [12] propose a method that evades forgery detectors by minimizing the statistical differences between natural and fake images. In this paper, we propose an adversarial semantic mask attack algorithm that can mislead the forgery detection model while ensuring good stealthiness.

III Proposed Approach

To generate images with better transferability and stealthiness, our work leverages the adaptive semantic mask selection strategy and a face semantic parsing module. In this section, we first introduce the motivation for the algorithm and then present the details of the proposed framework.

III-A Motivation

Existing attack methods against forgery detection tend to modify the original face globally by generating adversarial noise, without considering the diverse properties of different face semantic regions. This kind of attack may be effective on non-face data. However, given the specific and rich semantic structure of face images, it is not suitable to apply traditional adversarial attack algorithms directly to the forgery detection attack task, because adversarial noise over the global area easily produces unnatural visual artifacts and thus high detectability. Moreover, since these methods focus only on the whole face area and ignore the specific properties of different semantic regions, the generated adversarial samples may contain redundant perturbations in regions unrelated to the model’s decision. The resulting adversarial artifacts are obvious and easily perceived by the human eye, which leads to low stealthiness. Considering the specific properties of different face semantic regions, we propose a novel adversarial semantic mask attack on face forgery detection that maintains both high transferability and low detectability.

Figure 1: The framework of the proposed adversarial semantic mask attack for face forgery detection method.

III-B Adversarial Semantic Mask Attack Generation

The interpretability of neural networks has long been an active research topic [13], yet deep convolutional neural networks are usually regarded as black-box models whose internal mechanisms are difficult to understand. Class Activation Mapping (CAM) is a feature visualization technique typically used to obtain response heat maps for classification models with a specific type of architecture. The class activation map reflects the model’s attention over the input image, and CAM-based visualization techniques play an important role in understanding how the model works. Since interfering with class activation features may directly affect the target model’s output, the algorithm uses class activation features to create adversarial examples, inspired by [14]. Given a natural sample $x$, the algorithm first initializes the adversarial sample $x'_0$ as $x$. Feeding $x$ and $x'_0$ into the pre-trained forgery detection model yields their class activation features $\Phi_x$ and $\Phi_{x'}$, and the gradient is then computed from the class activation feature distance $\Delta(x, x')$:

$$g_t = \nabla_{x}\,\Delta(x, x'), \tag{1}$$

where $\Delta(\cdot)$ is the feature distance measure function. The adversarial sample $x'_t$ is then updated with the gradient to obtain $x'_{t+1}$:

$$x'_{t+1} = x'_t + \alpha \cdot \mathrm{sign}(g_t). \tag{2}$$

Here $\alpha$ is the attack step size. The perturbation is then constrained to stay within $\epsilon$ of the original sample:

$$x'_{t+1} = \mathrm{clip}(x'_{t+1},\, x - \epsilon,\, x + \epsilon). \tag{3}$$
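The update in Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature map `W @ v` is a hypothetical linear stand-in for the class activation features, the distance is taken to be a squared Euclidean distance, and the gradient is taken with respect to the adversarial sample. The values of `eps`, `alpha`, and `steps` mirror the settings reported in the implementation details.

```python
import numpy as np

def feature_distance_and_grad(W, x, x_adv):
    """Toy stand-in for the class activation feature distance.

    Phi(v) = W @ v is a hypothetical linear feature map and
    Delta(x, x') = ||Phi(x') - Phi(x)||^2. Returns Delta and its
    gradient with respect to x_adv.
    """
    diff = W @ x_adv - W @ x
    return float(diff @ diff), 2.0 * W.T @ diff

def iterative_attack(W, x, eps=0.20, alpha=0.015, steps=20, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: a tiny random start, because the gradient of this toy
    # distance is exactly zero when x_adv == x.
    x_adv = x + rng.uniform(-1e-3, 1e-3, size=x.shape)
    for _ in range(steps):
        _, g = feature_distance_and_grad(W, x, x_adv)  # Eq. (1)
        x_adv = x_adv + alpha * np.sign(g)             # Eq. (2)
        x_adv = np.clip(x_adv, x - eps, x + eps)       # Eq. (3)
    return x_adv

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))          # hypothetical feature extractor
x = rng.uniform(0.0, 1.0, size=16)    # flattened toy "image"
x_adv = iterative_attack(W, x)
dist, _ = feature_distance_and_grad(W, x, x_adv)
```

Because 20 steps of size 0.015 could move a pixel by up to 0.30, the clip in Eq. (3) is what keeps every pixel within 0.20 of the original. A real pipeline would also clamp to the valid pixel range, which is omitted here.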

The class activation map $CAM_c(w, h)$ of the input sample is then obtained to locate the regions the model attends to as forged, and the face is segmented by the face parsing module. The mean activation value over the pixels of each semantic region is computed as its forgery correlation score; the regions are sorted by this score, labels are selected to obtain the semantic mask, and the generated adversarial noise is restricted to the semantic mask region. By iterating this update process, the algorithm maximizes $\Delta(x, x')$ to obtain the final adversarial sample. The final adversarial attack mask is obtained by merging the masks of the selected face regions. The training process and further algorithm details are given in Algorithm 1.
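As an illustration of how a map such as $CAM_c(w, h)$ can be obtained, the sketch below uses the classic global-average-pooling formulation, in which the map for class $c$ is the sum of the last convolutional feature maps weighted by that class's classifier weights. The text does not state which CAM variant is used, so this choice, the function name, and the toy shapes are assumptions.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Classic CAM: weighted sum of feature maps, rescaled to [0, 1].

    features:   (K, H, W) activations of the last convolutional layer
    fc_weights: (C, K) weights of the final linear classifier
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

features = np.zeros((2, 3, 3))
features[0, 1, 1] = 1.0                 # channel 0 fires at the centre
features[1, 0, 0] = 1.0                 # channel 1 fires at the top-left corner
fc_weights = np.array([[1.0, 0.0],      # class 0 relies on channel 0
                       [0.0, 2.0]])     # class 1 relies on channel 1
cam_fake = class_activation_map(features, fc_weights, class_idx=1)
```

In this toy case the map for class 1 peaks at the top-left pixel, which is the only location where the channel that class 1 depends on is active.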

III-C Adaptive Semantic Mask Selection

To select suitable face semantic regions for generating adversarial masks, we leverage the class activation values to choose semantic regions adaptively. The face parsing network performs semantic segmentation of the whole face image, classifying each pixel into a particular semantic label. We use a publicly available face parsing algorithm to segment the face into multiple semantic categories: left eye, right eye, left eyebrow, right eyebrow, nose, upper lip, lower lip, inside of mouth, face, hair, and background. The corresponding semantic regions are extracted by selecting the labels of the facial regions. Combining these regions with their class activation values, we select the most suitable face regions to generate the adversarial attack mask, which covers smaller but more important areas for the face forgery detection task.
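The selection step can be sketched as follows: score each semantic region by its mean class activation value, rank the regions, and merge the highest-scoring ones into a binary mask. The top-k rule, the integer label ids, and the function name are assumptions made for illustration, since the text only says that regions are sorted and labels are selected.

```python
import numpy as np

def select_semantic_mask(cam, parsing, candidate_labels, top_k=3):
    """Rank semantic regions by mean CAM value and merge the top_k.

    cam:     (H, W) class activation map
    parsing: (H, W) integer label map from a face parsing model
    """
    scores = {}
    for label in candidate_labels:
        region = parsing == label
        if region.any():                       # skip labels absent from the image
            scores[label] = float(cam[region].mean())
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = ranked[:top_k]
    mask = np.isin(parsing, chosen).astype(np.float32)
    return mask, chosen, scores

# Toy 4x4 example with hypothetical labels: 0 background, 1 eye, 2 nose, 3 skin.
parsing = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [2, 2, 3, 3],
                    [2, 2, 3, 3]])
cam = np.array([[0.1, 0.1, 0.9, 0.9],
                [0.1, 0.1, 0.9, 0.9],
                [0.5, 0.5, 0.2, 0.2],
                [0.5, 0.5, 0.2, 0.2]])
mask, chosen, scores = select_semantic_mask(cam, parsing, [1, 2, 3], top_k=2)
```

Here the eye and nose regions have the two highest mean activations, so the returned mask covers those eight pixels and excludes both the background and the skin.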

IV Experiments

In this section, we evaluate the proposed method on the Deepfake Detection Challenge (DFDC) dataset. We compare it with other state-of-the-art methods, and the experimental results show that our method achieves satisfactory performance on the image attack task. We then investigate the effect of different parameters on performance. Finally, we conduct an ablation study to evaluate the effectiveness of the proposed ASMA.

IV-A Databases

The DFDC dataset contains 472 GB of data comprising 119,197 face videos, of which 100,000 are fake face videos and 19,197 are real videos of real people with more lifelike content. The fake face videos were generated using a variety of face generation techniques, including DeepFakes, Face2Face, and other face forgery and expression editing algorithms, as well as non-learned methods, so that the dataset covers as many kinds of fake face videos as possible. Each video has a duration of 10 seconds, a frame rate of 15 to 30 fps, and a resolution ranging from 320×240 to 3840×2160, which makes DFDC more varied than other datasets. Compared with other datasets, DFDC is the largest and contains the richest collection of fake faces.

Algorithm 1 The detailed training process of ASMA.

Input: A pretrained face forgery detection model $\mathcal{P}$, input original face image $x$.
Parameter: number of iterations $T$ and attack step size $\alpha$.
Output: The adversarial example $\bar{x}$.

1:  $\bar{x}_0 \leftarrow x$.
2:  for $t = 0$ to $T - 1$ do
3:     Forward $x$ and $\bar{x}_t$ to $\mathcal{P}$, and obtain class activation features $\Phi_x$ and $\Phi_{\bar{x}_t}$.
4:     Compute the feature distance $\Delta(x, \bar{x}_t) = \delta(\Phi_x, \Phi_{\bar{x}_t})$.
5:     Compute gradients with Eq. (1).
6:     Update the adversarial example $\bar{x}_t$ with Eq. (2).
7:     Project $\bar{x}_{t+1}$ to the vicinity of $x$ with Eq. (3).
8:  end for
9:  Compute global noise: $x_g = \bar{x}_T - x$.
10:  Generate semantic mask $x_m$ by the adaptive semantic mask selection strategy.
11:  Update the adversarial example $\bar{x}_T$: $\bar{x}_T = x + x_g \odot x_m$.
12:  return $\bar{x}_T$.
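Steps 9 to 11 of Algorithm 1 amount to an element-wise product between the global noise and the binary semantic mask. A minimal NumPy sketch is given below; the function name and the channel-last image layout are assumptions.

```python
import numpy as np

def apply_semantic_mask(x, x_adv_global, mask):
    """Keep the adversarial noise only inside the semantic mask.

    x, x_adv_global: (H, W, C) original and globally perturbed images
    mask:            (H, W) binary mask, 1 inside the selected regions
    """
    noise = x_adv_global - x              # step 9: global noise x_g
    if mask.ndim == 2 and x.ndim == 3:
        mask = mask[..., None]            # broadcast the mask over channels
    return x + noise * mask               # step 11: x + x_g (elementwise) x_m

x = np.full((2, 2, 3), 0.5)
x_adv_global = x + 0.1                    # noise everywhere
mask = np.array([[1.0, 0.0],
                 [0.0, 0.0]])             # keep noise only at pixel (0, 0)
x_masked = apply_semantic_mask(x, x_adv_global, mask)
```

Only the top-left pixel keeps its perturbation; every other pixel is identical to the original image, which is the property the masking is meant to provide.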

IV-B Experimental Settings

TABLE I: Evaluation of ASMA and other adversarial attack algorithms.

| Model | Method | Xception ASR (%) | ResNet-50 ASR (%) | EfficientNet-b0 ASR (%) | EfficientNet-b4 ASR (%) |
|---|---|---|---|---|---|
| XceptionNet | FGSM [7] | 42.04 | 13.19 | 2.85 | 9.70 |
| | BIM [15] | 85.20 | 17.92 | 5.94 | 14.63 |
| | PGD [16] | 85.23 | 18.47 | 6.14 | 15.88 |
| | C&W [17] | 68.39 | 11.10 | 1.25 | 0.05 |
| | DeepFool [18] | 77.62 | 8.30 | 0.42 | 0.18 |
| | TRM [19] | 81.19 | 16.25 | 7.45 | 19.16 |
| | ASMA | 85.33 | 31.68 | 11.46 | 23.02 |
| ResNet-50 | FGSM [7] | 9.35 | 74.84 | 10.04 | 14.91 |
| | BIM [15] | 16.44 | 75.49 | 10.96 | 17.66 |
| | PGD [16] | 16.66 | 75.54 | 11.94 | 17.72 |
| | C&W [17] | 3.03 | 58.22 | 0.15 | 0.31 |
| | DeepFool [18] | 1.67 | 75.15 | 0.18 | 0.77 |
| | TRM [19] | 15.76 | 66.24 | 19.63 | 22.95 |
| | ASMA | 33.72 | 75.57 | 24.67 | 27.17 |
| EfficientNet-b0 | FGSM [7] | 16.86 | 28.23 | 10.04 | 25.69 |
| | BIM [15] | 27.01 | 29.41 | 26.87 | 28.17 |
| | PGD [16] | 27.70 | 31.26 | 26.91 | 28.20 |
| | C&W [17] | 5.98 | 3.49 | 1.01 | 0.09 |
| | DeepFool [18] | 4.03 | 3.45 | 1.45 | 0.16 |
| | TRM [19] | 30.18 | 37.48 | 26.33 | 52.59 |
| | ASMA | 34.90 | 58.05 | 43.05 | 62.25 |
| EfficientNet-b4 | FGSM [7] | 14.08 | 4.31 | 18.12 | 39.31 |
| | BIM [15] | 21.18 | 5.49 | 20.91 | 47.78 |
| | PGD [16] | 21.14 | 6.94 | 21.02 | 48.44 |
| | C&W [17] | 9.92 | 1.53 | 0.21 | 16.37 |
| | DeepFool [18] | 12.62 | 0.82 | 0.18 | 16.06 |
| | TRM [19] | 25.95 | 27.56 | 28.33 | 52.54 |
| | ASMA | 27.46 | 29.44 | 28.54 | 65.93 |

Implementation details. During the experiments, 4000 videos are randomly selected from the dataset, each containing 300 frames. We adopt MobileNet_SSD as the face detection model and extract aligned face images frame by frame; if more than one face is detected in a frame, only the largest face is extracted. To mitigate the imbalance between positive and negative samples, 15 frames are randomly extracted from each fake face video and 75 frames from each real video during training. The cropped faces are then preprocessed, and all aligned face images are scaled to 320 × 320 and saved. To evaluate the adversarial performance of the algorithm, 1000 fake face images are extracted with the above procedure and used as network inputs for generating adversarial samples. The forgery detection models are trained for 50 rounds with the Adam optimizer, with hyper-parameters $\beta_1 = 0.9$, $\beta_2 = 0.999$, and learning rate $l_r = 0.01$. When generating the adversarial samples, the perturbation size is $\epsilon = 0.20$, the number of iterations is $T = 20$, and the step size is $\alpha = 0.015$.
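The frame sampling and largest-face rule described above can be sketched as below. This is an illustrative sketch only: the box format `(x1, y1, x2, y2)` and the helper names are assumptions, and the face detector itself is not shown.

```python
import random

FRAMES_PER_FAKE_VIDEO = 15   # from the implementation details
FRAMES_PER_REAL_VIDEO = 75

def largest_face(boxes):
    """Return the detection with the largest area; boxes are (x1, y1, x2, y2)."""
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))

def sample_frame_indices(num_frames, is_fake, rng):
    """Randomly pick frame indices, fewer for fake videos than for real ones."""
    k = FRAMES_PER_FAKE_VIDEO if is_fake else FRAMES_PER_REAL_VIDEO
    return sorted(rng.sample(range(num_frames), min(k, num_frames)))

rng = random.Random(0)
fake_idx = sample_frame_indices(300, is_fake=True, rng=rng)
real_idx = sample_frame_indices(300, is_fake=False, rng=rng)
kept_box = largest_face([(0, 0, 10, 10), (5, 5, 50, 40)])
```

If the sampled videos follow the dataset-wide ratio of 100,000 fake to 19,197 real videos, this 15-to-75 split yields roughly 1,500,000 fake frames against 1,439,775 real frames, which is close to balanced.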
Target forgery detection models. This paper uses the classical forgery detection models XceptionNet, ResNet50, EfficientNet-B0, and EfficientNet-B4 to study the white-box and black-box attack ability of the adversarial samples. These models are retrained on the DFDC dataset, and the input to every forgery detection model is a 320×320×3 image.

IV-C Comparison Results

This section compares the proposed algorithm with representative adversarial attack algorithms [7, 15, 16, 17, 18, 20]. Adversarial examples are generated on each of the four classical models listed in the first column and transferred to the other networks for evaluation. As shown in Table I, the proposed face adversarial image generation algorithm based on semantic mask noise has stronger attack performance than attack algorithms such as FGSM and PGD. ASMA achieves the highest white-box attack success rate. For example, when adversarial samples generated on XceptionNet attack XceptionNet, its success rate is, despite the reduced attack area, 43.29 percentage points higher than FGSM, 0.13 higher than BIM, 0.10 higher than PGD, 16.94 higher than C&W, 7.71 higher than DeepFool, and 4.14 higher than TRM, which shows very good white-box attack performance. Compared with other attack methods, ASMA produces adversarial examples whose attack success rate varies less across different models, which indicates good transferability. Meanwhile, the adversarial examples generated by ASMA on ResNet50 have an attack success rate 17.06 percentage points higher than PGD when attacking XceptionNet, 12.73 higher when attacking EfficientNet-B0, and 9.45 higher when attacking EfficientNet-B4, which verifies the black-box transfer performance of the algorithm.
In addition, adversarial samples generated on XceptionNet have a lower success rate when attacking EfficientNet-B4, and adversarial examples generated on EfficientNet-B4 are likewise difficult to transfer to XceptionNet. This is because of the obvious structural difference between EfficientNet-B4 and XceptionNet, which limits how well adversarial attacks transfer between the two networks. The experimental results show that adding a small amount of adversarial perturbation to the critical forged regions of the image significantly improves the attack success rate on the XceptionNet, ResNet50, EfficientNet-B0, and EfficientNet-B4 models.
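The margins discussed in this section can be recomputed directly from Table I as a sanity check. The short sketch below hard-codes the ASR values copied from the table; the dictionary and variable names are illustrative only.

```python
# White-box ASR (%) on XceptionNet, source model XceptionNet (Table I).
white_box_xception = {
    "FGSM": 42.04, "BIM": 85.20, "PGD": 85.23,
    "C&W": 68.39, "DeepFool": 77.62, "TRM": 81.19, "ASMA": 85.33,
}
white_box_margin = {
    method: round(white_box_xception["ASMA"] - asr, 2)
    for method, asr in white_box_xception.items()
    if method != "ASMA"
}

# Black-box ASR (%) of examples generated on ResNet-50 (Table I).
targets = ["XceptionNet", "EfficientNet-b0", "EfficientNet-b4"]
asma_from_resnet = [33.72, 24.67, 27.17]
pgd_from_resnet = [16.66, 11.94, 17.72]
transfer_margin = {
    t: round(a - p, 2)
    for t, a, p in zip(targets, asma_from_resnet, pgd_from_resnet)
}

for method, gap in white_box_margin.items():
    print(f"ASMA vs {method}: +{gap:.2f} percentage points (white-box)")
for target, gap in transfer_margin.items():
    print(f"ASMA vs PGD on {target}: +{gap:.2f} percentage points (transfer)")
```

All gaps are differences in percentage points between two success rates, not relative improvements.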

IV-D Algorithm Analysis

Table II shows the experimental results after adding adversarial noise to different attribute regions of the face during adversarial sample generation; the experiments reflect the impact of the perturbation region on attack success rate and visual quality. Since face forgery mainly tampers with the facial features, the attention of the forgery detection model is also concentrated on these regions, so the facial skin, eyes, nose, eyebrows, and hair are selected as the perturbation regions. We reflect the model’s prioritization of each region by computing the number of pixels in that region as a proportion of all pixels. Combining the attack success rate after adding noise to each region, the visual effect, and the importance of the region to the model, we finally choose the eyes, nose, and eyebrows as the joint attack regions. Among the single-region perturbations, attacking the skin and the eyes yields the highest success rate, but because the skin covers a large area, image quality drops after the attack and the overall image becomes less stealthy. The results also show that attacking a combination of multiple regions achieves a better attack effect. The proposed algorithm therefore attacks a combination of selected regions, chosen with the effect on image quality in mind, to generate the most aggressive adversarial examples while maintaining a high degree of concealment.

TABLE II: Quality analysis of adversarial images generated by selecting different features.

| Feature | MSE | MAE | PSNR | SSIM | Value Rate |
|---|---|---|---|---|---|
| Skin | 0.0725 | 0.0354 | 82.7299 | 0.8797 | 0.031 |
| Nose | 0.0692 | 0.0022 | 74.8923 | 0.9249 | 0.016 |
| Eye | 0.0651 | 0.0006 | 70.5987 | 0.9372 | 0.004 |
| Brow | 0.0636 | 0.0008 | 78.9069 | 0.9852 | 0.009 |
| Hair | 0.0527 | 0.0526 | 81.4216 | 0.9349 | 0.001 |
| Eye+Nose+Brow | 0.0757 | 0.0075 | 63.0589 | 0.8939 | 0.029 |

IV-E Parameter Analysis

To evaluate the effect of the perturbation threshold $\epsilon$ on the model performance, the experiments in this section generate adversarial examples on XceptionNet with 6 groups of perturbation sizes, varying from 0 to 0.25 in increments of 0.05. The experiments measure the success rate of the generated adversarial face images against the XceptionNet forgery detection model and compute MSE, MAE, PSNR, SSIM, and other metrics to evaluate the attack performance and the quality of the generated images. Table III shows that the larger the perturbation, the higher the attack success rate but the worse the image quality metrics. When the perturbation is too large (greater than 0.15), the gain in attack success rate from further increases diminishes, while the image quality metrics decline rapidly. Based on these results, the perturbation hyperparameter $\epsilon$ is set to 0.15 in the adversarial attack module for training and testing.

TABLE III: Evaluation of different perturbation sizes in terms of attack success rate and visual quality.

| Perturbation | ASR (%) | MSE | MAE | PSNR | SSIM |
|---|---|---|---|---|---|
| 0.10 | 35.64 | 0.0118 | 0.2223 | 64.1251 | 0.9998 |
| 0.15 | 69.38 | 0.0387 | 0.1088 | 63.8271 | 0.9985 |
| 0.20 | 85.33 | 0.0757 | 0.0075 | 63.0589 | 0.9939 |
| 0.25 | 91.77 | 0.1254 | 0.0031 | 60.8061 | 0.9878 |
| 0.30 | 95.28 | 0.2181 | 0.0019 | 58.5085 | 0.9741 |
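The trade-off in Table III can be made concrete by computing the ASR gained for each additional 0.05 of perturbation. The sketch below copies the values from the table; the variable names are illustrative.

```python
perturbation = [0.10, 0.15, 0.20, 0.25, 0.30]
asr = [35.64, 69.38, 85.33, 91.77, 95.28]        # ASR (%) from Table III
psnr = [64.1251, 63.8271, 63.0589, 60.8061, 58.5085]

# Gain in ASR and loss in PSNR for each 0.05 step of perturbation.
asr_gain = [round(b - a, 2) for a, b in zip(asr, asr[1:])]
psnr_drop = [round(a - b, 4) for a, b in zip(psnr, psnr[1:])]

for eps, gain, drop in zip(perturbation[1:], asr_gain, psnr_drop):
    print(f"up to eps={eps:.2f}: ASR +{gain:.2f} points, PSNR -{drop:.4f}")
```

The ASR gain shrinks at every step (33.74, 15.95, 6.44, and 3.51 percentage points), while the PSNR loss per step grows, which is the diminishing-returns pattern described in the text.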

IV-F Quantitative Analysis

Figure 2: Visualization comparison of feature activation area changes before and after adversarial attacks on real faces.

This section compares the quality of images generated by ASMA and other adversarial algorithms through qualitative and quantitative analysis. An important quality requirement for adversarial images is concealment; different adversarial algorithms generate images in different ways and with different degrees of concealment. The qualitative experiments observe the difference between the generated adversarial image and the source image, together with the added adversarial noise, to judge each generation algorithm. To better reflect the quality of the generated images, commonly used image quality assessment metrics are used to measure the difference from the original image: the quantitative experiments compare MSE, MAE, PSNR, and SSIM. For a better comparison of the results, we use Xception to generate the adversarial images and, for ASMA, set the maximum perturbation to 0.15 and the step size of each iteration to 0.05.
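For readers unfamiliar with these metrics, the pixel-wise ones can be written in a few lines of NumPy. This sketch uses the standard textbook definitions; the peak value of 255 and the function names are assumptions, and the pixel scale used for the reported numbers is not stated in the text, so the outputs are not directly comparable with the tables. SSIM is omitted because it requires windowed local statistics.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def mae(a, b):
    """Mean absolute error between two images."""
    a, b = np.asarray(a, dtype=np.float64), np.asarray(b, dtype=np.float64)
    return float(np.mean(np.abs(a - b)))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

original = np.zeros((4, 4), dtype=np.uint8)
adversarial = original.copy()
adversarial[0, 0] = 4                     # perturb a single pixel by 4 levels

m, a, p = mse(original, adversarial), mae(original, adversarial), psnr(original, adversarial)
```

Lower MSE and MAE and higher PSNR all indicate that the adversarial image is closer to the original. The inputs are cast to floating point first so that unsigned 8-bit images do not wrap around on subtraction.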

IV-G Qualitative Analysis

Figure 3: Comparison of images generated by different adversarial generation algorithms.

The first column of Figure 3 shows the original face, and the remaining columns show the adversarial faces generated by FGSM, BIM, PGD, C&W, DeepFool, and ASMA. In the ASMA attack, the nose, the left and right eyes, and the left and right eyebrows are selected as the regions to which adversarial perturbation is added. Since these are the key forged areas of face images and have more complex texture, adding appropriate adversarial perturbations there yields better attack performance while making the perturbations harder for the human eye to perceive. In this section, the masks of the left and right eyes and the left and right eyebrows are selected, and the semantic mask is multiplied by the adversarial noise so that the adversarial samples generated in each iteration retain only the perturbation inside the mask region. Because the perturbed area is smaller, the attack performance may be reduced, so the algorithm sets a reasonable threshold on the perturbation size to preserve attack performance. As can be seen from Figure 3, the proposed algorithm restricts the adversarial perturbation to the local semantic mask region: the image background and the rest of the face show no traces of perturbation, and the extent of the perturbed region is significantly smaller than that of FGSM and PGD. The global perturbations generated by FGSM and PGD add too much adversarial noise to background regions, which have simple, single-color texture, so the resulting adversarial samples show an obvious adversarial texture that is easily detected by the human eye. This qualitative comparison shows that the adversarial faces generated by the proposed algorithm are more visually covert.

As shown in Table IV, ASMA achieves significantly lower MSE and MAE than the other methods by adding noise only within the local mask. In addition, its PSNR and SSIM are the highest in the table, so the images generated by ASMA are the most similar to the original images. It is worth noting that the perturbations generated by FGSM, BIM, PGD, and similar algorithms are global, whereas the proposed method generates adversarial noise only in the semantic mask region, so it leads the other methods on every image-quality metric. For a fair comparison, the experiment also compares global adversarial faces generated by the proposed method with those of the other methods. ASMA(w/o mask) in Table IV denotes the variant that does not use a semantic mask to constrain the noise region and instead iteratively generates adversarial samples directly on the whole face image. The results show that, even without constraining the perturbed regions, this variant attains a higher SSIM than FGSM, BIM, and PGD. Overall, the experimental results show that the proposed method effectively improves the similarity between the adversarial face image and the original image, giving stronger concealment.

TABLE IV: Evaluation of images generated by different adversarial generation algorithms.
| Method | MSE | MAE | PSNR | SSIM |
| FGSM | 54.0564 | 0.0879 | 30.0652 | 0.6631 |
| BIM | 24.7829 | 0.0861 | 34.1893 | 0.8249 |
| PGD | 26.7434 | 0.0645 | 34.8586 | 0.8132 |
| C&W | 0.5005 | 0.0254 | 51.1374 | 0.9795 |
| DeepFool | 0.3723 | 0.0167 | 59.5134 | 0.9897 |
| ASMA(w/o mask) | 63.6884 | 0.0859 | 33.0589 | 0.8839 |
| ASMA | 0.0757 | 0.0075 | 63.0589 | 0.9939 |
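
For reference, three of the four metrics in Table IV follow standard definitions, sketched below. This is an illustrative sketch only: the flat-list image representation and the peak value max_val are assumptions, since the pixel scale and evaluation code behind Table IV are not specified here. SSIM is omitted because it relies on windowed local statistics and is usually taken from an image-processing library.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def mae(a, b):
    """Mean absolute error between two equally sized flat pixel lists."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val is the assumed peak pixel value."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)
```

Lower MSE and MAE and higher PSNR all indicate an adversarial image that is closer to the original. A perturbation confined to a small mask leaves most pixel differences at exactly zero, which is why it lowers the two error averages and raises PSNR.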

IV-H Visualization Analysis

Figure 2 shows that, on real faces, the detection model focuses on specific regions of the face. After the ASMA attack, the model is misled during face forgery detection: it tends to focus on other areas that are not attacked, so its attention is drawn away from the key areas, thus preserving the invisibility of the adversarial examples. The experimental results show that the model’s attention to the forged image regions changes before and after the attack. In addition, the CAM of the original face shows that, for the original fake face, the model’s attention concentrates on the facial features, especially the eyes and eyebrows. This is because face forgery generation algorithms mainly tamper with the facial features, which makes the model’s prediction more sensitive to these areas. Therefore, adding subtle adversarial perturbations to these regions can shift the model’s attention and mislead its prediction. The comparison of the class activation maps also confirms that the attack method based on class activation features is effective against the forgery detection model.
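
The idea of using class activation values to decide which semantic regions to perturb can be sketched as follows. This is a hypothetical illustration rather than the selection rule used by ASMA: it assumes that a class activation map and binary region masks are already available as flat lists, scores each region by its mean activation, and keeps the top_k regions. The function name rank_regions_by_cam, the mean-activation score, and top_k are all assumptions.

```python
def rank_regions_by_cam(cam, region_masks, top_k=2):
    """Rank semantic regions by mean class activation and keep the top_k.

    cam          : flat list of class activation values, one per pixel.
    region_masks : dict mapping a region name to a flat 0/1 mask of the same length.
    Returns (selected region names, dict of mean activation per region).
    """
    scores = {}
    for name, mask in region_masks.items():
        area = sum(mask)
        total = sum(c * m for c, m in zip(cam, mask))
        scores[name] = total / area if area else 0.0  # empty region scores 0

    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores
```

Under this assumed scoring, regions such as the eyes and eyebrows, where the detector’s activation is concentrated on fake faces, would be ranked first and therefore chosen as the mask for the perturbation.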

V Conclusions

In this paper, we propose an adversarial semantic mask attack framework (ASMA) for face forgery detection tasks. The proposed method designs a novel adversarial semantic mask generation pipeline for attacking face forgery detection, which constrains the generated perturbations to local semantic regions for good stealthiness. To further improve stealthiness and transferability, we design an adaptive semantic mask selection strategy, which leverages class activation mapping to select more suitable adversarial semantic mask regions. Experiments on the public large-scale DFDC dataset demonstrate the superior performance of the proposed ASMA. In the future, we will evaluate the proposed method in more complex real-world scenarios.

References

  • [1] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. C. Ferrer, “The deepfake detection challenge (dfdc) dataset,” arXiv preprint arXiv:2006.07397, 2020.
  • [2] W. Bai, Y. Liu, Z. Zhang, B. Li, and W. Hu, “Aunet: Learning relations between action units for face forgery detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 24709–24719.
  • [3] S. Xiao, G. Lan, J. Yang, W. Lu, Q. Meng, and X. Gao, “Mcs-gan: A different understanding for generalization of deep forgery detection,” IEEE Transactions on Multimedia, 2023.
  • [4] C. Miao, Z. Tan, Q. Chu, H. Liu, H. Hu, and N. Yu, “F2Trans: High-frequency fine-grained transformer for face forgery detection,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 1039–1051, 2023.
  • [5] J. Li, H. Xie, L. Yu, X. Gao, and Y. Zhang, “Discriminative feature mining based on frequency information and metric learning for face forgery detection,” IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [6] D. Liu, Z. Zheng, C. Peng, Y. Wang, N. Wang, and X. Gao, “Hierarchical forgery classifier on multi-modality face forgery clues,” IEEE Transactions on Multimedia, pp. 1–12, 2023.
  • [7] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • [8] S. Komkov and A. Petiushko, “Advhat: Real-world adversarial attack on arcface face id system,” in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 819–826.
  • [9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
  • [10] X. Yang, Y. Dong, T. Pang, H. Su, J. Zhu, Y. Chen, and H. Xue, “Towards face encryption by generating adversarial identity masks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3897–3907.
  • [11] Z. Xiao, X. Gao, C. Fu, Y. Dong, W. Gao, X. Zhang, J. Zhou, and J. Zhu, “Improving transferability of adversarial patches on face recognition with generative models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11845–11854.
  • [12] Y. Hou, Q. Guo, Y. Huang, X. Xie, L. Ma, and J. Zhao, “Evading deepfake detectors via adversarial statistical consistency,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12271–12280.
  • [13] Y. Zhang, P. Tiňo, A. Leonardis, and K. Tang, “A survey on neural network interpretability,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 5, no. 5, pp. 726–742, 2021.
  • [14] D. Zhou, N. Wang, C. Peng, X. Gao, X. Wang, J. Yu, and T. Liu, “Removing adversarial noise in class activation feature space,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 7878–7887.
  • [15] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in Artificial Intelligence Safety and Security. Chapman and Hall/CRC, 2018, pp. 99–112.
  • [16] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.
  • [17] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 39–57.
  • [18] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple and accurate method to fool deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2574–2582.
  • [19] Y. Liu, X. Feng, Y. Wang, W. Yang, and D. Ming, “Trm-uap: Enhancing the transferability of data-free universal adversarial perturbation via truncated ratio maximization,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 4762–4771.
  • [20] X. Tang, P. Yin, Z. Zhou, and D. Huang, “Adversarial perturbation elimination with gan based defense in continuous-variable quantum key distribution systems,” Electronics, vol. 12, no. 11, p. 2437, 2023.