
FaceSigns: Semi-fragile Watermarks for Media Authentication

Published: 12 September 2024

Abstract

Manipulated media is becoming a prominent threat due to the recent advances in realistic image and video synthesis techniques. There have been several attempts at detecting synthetically tampered media using machine learning classifiers. However, such classifiers do not generalize well to black-box image synthesis techniques and have been shown to be vulnerable to adversarial examples. To address these challenges, we introduce FaceSigns—a deep learning-based semi-fragile watermarking technique that allows media authentication by verifying an invisible secret message embedded in the image pixels. Instead of identifying and detecting manipulated media using visual artifacts, we propose to proactively embed a semi-fragile watermark into a real image or video so that we can prove its authenticity when needed. FaceSigns is designed to be fragile to malicious manipulations or tampering while being robust to benign operations such as image/video compression, scaling, saturation, contrast adjustments, and so forth. This allows images and videos shared over the internet to retain the verifiable watermark as long as a malicious modification technique is not applied. We demonstrate that our framework can embed a 128-bit secret as an imperceptible image watermark that can be recovered with a high bit recovery accuracy at several compression levels, while being non-recoverable when unseen malicious manipulations are applied. For a set of unseen benign and malicious manipulations studied in our work, our framework can reliably detect manipulated content with an AUC score of 0.996, which is significantly higher than prior image watermarking and steganography techniques.

1 Introduction

Media authentication, despite having been a long-term challenge, has become even more difficult with the advent of deep learning-based generative models. Deep Neural Network (DNN)-based generative models [12, 25, 26, 34, 50, 58] have enabled the creation of high-quality synthetic media in various domains. Such techniques can be used to easily manipulate real images, videos, and audio to fuel misinformation; tamper with sensitive documents; defame individuals; and reduce trust in social media platforms [32]. Media authentication is crucial in ensuring the accuracy of news and maintaining public trust to safeguard against the potential misuse of generative models. Media authentication also plays a crucial role in law enforcement, where videos and images are often used as evidence. Recent methods to detect fake media rely on DNN-based classifiers to distinguish synthetic videos from real videos [14, 41]. However, classifiers trained in a supervised manner on existing media synthesis techniques do not generalize reliably to black-box image synthesis methods. Moreover, the current best-performing detectors for synthetic media can be easily bypassed by attackers using adversarial examples [16, 21, 33].
As an alternate solution, proactively embedding a secret verifiable message into images and videos at the time of their capture from a device can establish the provenance of authentic images and videos and circumvent the limitations of classifiers for synthetic media. Several prior works have explored digital image watermarking and deep learning-based steganography techniques [11, 15, 30, 49, 62] to hide secret messages in image pixels. However, these works are either fragile to basic image processing operations such as compression and color adjustments or overly robust to the point that the secret can be recovered even after occluding major portions of the embedded image [49]. In fact, we experimentally demonstrate that past works on robust neural network-generated watermarks [49, 62] can recover messages even from images that have undergone face swapping manipulations. Moreover, past neural network-based watermarking frameworks are not designed to be robust to common video compression codecs that apply temporal compression along with per-frame spatial compression. For solving the challenge of media authentication, the watermarking framework should have the following desirable properties: (1) the watermark data should be recoverable if the image/video undergoes benign transformations such as compression or minor adjustments; (2) the watermark recovery should break if the image/video has been maliciously manipulated, e.g. replacing the face, occluding/replacing significant portions of the image; and (3) the watermark should be visually imperceptible.
To address the above challenges of synthetic media classifiers and watermarking frameworks, we introduce FaceSigns—a deep learning-based semi-fragile watermarking system that embeds a recoverable message as an imperceptible perturbation in the image pixels. The watermark can contain a secret message or device-specific codes that can be used for authenticating images and videos. The desirable property of the watermark is that it should break if a malicious manipulation such as occlusion, face swapping, or content manipulation is applied to the image/video, but it should be robust against harmless transformations such as image compression, video compression, and color and lighting adjustments, which are commonly applied on pictures and videos before uploading them to online sharing platforms. To achieve this goal, we develop an encoder-decoder-based training framework that encourages message recovery under benign transformations and discourages message recovery if the watermark has been spatially tampered in certain parts of the image. An overview of the FaceSigns watermarking framework is depicted in Figure 1. In contrast to hand-designed pipelines used in previous work for semi-fragile watermarking [6, 18, 28, 39, 54], our framework is end-to-end and learns to be robust to a wide range of real-world digital image processing operations such as social media filters and compression techniques, while being fragile to various Deepfake tampering techniques. The technical contributions of our work are as follows:
Fig. 1.
Fig. 1. Overview of FaceSigns watermarking framework: The encoder network embeds a secret encrypted message into a given image as an imperceptible watermark that is designed to be robust against benign image transformations and photo editing tools but fragile toward malicious image manipulations such as Deepfakes.
(1)
We develop a neural semi-fragile image watermarking framework that can certify the authenticity of digital media and serve as a proactive defense against media manipulation.
(2)
We propose a novel training procedure to make the watermark retrieval robust against both image and video compression techniques like JPEG, H264, and MPEG4. We overcome the challenge of non-differentiable video compression codecs during training by estimating the gradients using a straight-through estimator in the backward pass.
(3)
We design a differentiable procedure to simulate watermark tampering during training such that our framework can achieve selective fragility against unseen malicious transformations (Section 3.3.2).
(4)
For a set of previously unseen benign and malicious image transformations, FaceSigns achieves the goal of selective fragility and reliably detects malicious manipulations with an area under the ROC curve (AUC) score of \(\mathbf {0.996,}\) which is significantly higher than alternate robust and semi-fragile watermarking frameworks.

2 Background

2.1 Media Forgery

Media forgery refers to the manipulation of digital content such as documents, images, videos, and audio to create convincing but fabricated media. Traditional media forgery techniques like image compositing [38] aim to selectively remove important context or to create a misleading narrative. For example, a compositing attack could be used to alter the background of an image to misrepresent the location where the photo was taken or to selectively remove individuals or objects from a video to distort the events that occurred. The layer-based compositing [38] technique involves breaking down the image into multiple layers, each of which contains different elements (e.g., foreground objects, background, shadows, highlights). Each layer is then composited separately, and the final image is the sum of all the layers. Alpha blending [38] is another common technique for compositing images, where the transparency of each pixel is specified by an alpha value. The resulting image is a linear combination of the foreground and background images, weighted by their respective alpha values. Compositing techniques can be difficult to detect, especially when the manipulation is subtle, making them a common tool for propagating fake or misleading information. These types of media forgery have been used for many years and are often employed to manipulate public opinion, discredit individuals or groups, or create sensational news stories. Due to this, the task of verifying the authenticity of an image is becoming a crucial aspect of image security.

2.2 Facial Forgery

Until recently, the ease of generating manipulated faces in photos and videos has been limited by manual editing tools. However, since the advent of deep learning, there has been significant work in developing new techniques for automatic digital forgery. It has now become easier to create realistic-looking synthetic media that are difficult to distinguish from authentic media. Particularly, DNN-based facial manipulation methods [8, 9, 12, 26] operate end-to-end on a source video and target face and require minimal human expertise to generate fake videos in real time. In our work, we show effectiveness against popular Generative Adversarial Network (GAN)-based Deepfake generation methods, SimSwap [8] and Few-Shot Face Translation (FSFT) [43], and a classical computer graphics-based face replacement approach, FaceSwap [26].
The best-performing Deepfake detectors [1, 14, 41, 42, 52] rely on convolutional neural network (CNN)-based architectures. Such Deepfake detectors model Deepfake detection as a per-frame binary classification problem, applying a face-tracking method prior to CNN classification to effectively detect facial forgeries in both uncompressed and compressed videos. While CNN-based classifiers achieve promising detection accuracy on a fixed in-domain test set of real and fake videos, they suffer from two main drawbacks: (1) lack of generalizability to unseen Deepfake synthesis techniques and (2) vulnerability to adversarial examples in both black-box and white-box attack settings. Classifiers trained in a supervised manner on existing Deepfake generation methods cannot be relied upon to detect Deepfakes produced by generation methods not seen during training. With the advances in deep learning-based generative models, classification methods fail to stay a step ahead in the race to reliably detect synthetic videos. This lack of generalizability is a significant drawback, as it means that CNN-based classifiers may not be able to keep up with the constantly evolving landscape of manipulated videos. Moreover, the current best-performing Deepfake classifiers can be easily bypassed using adversarial examples. Prior work [16, 20, 21, 33] demonstrates that an attacker can bypass most state-of-the-art Deepfake detectors by adding an imperceptible perturbation to each frame of a given video, causing the detector to misclassify a given Deepfake as real. We refer the reader to past works [16, 20, 21, 31] that explore such limitations of CNN-based Deepfake detectors.

2.3 Digital Watermarking

Digital watermarking [11], similar to steganography [15], is the task of embedding information into an image in a visually imperceptible manner. These techniques broadly seek to generate three different types of watermarks: fragile [6, 13], robust [3, 7, 10, 36, 37, 45, 62], and semi-fragile [28, 48, 57]. Fragile and semi-fragile watermarks are primarily used to certify the integrity and authenticity of image data. Fragile watermarks are used to achieve accurate authentication of digital media, where even a 1-bit change to an image will cause it to fail the certification system. In contrast, robust watermarks aim to be recoverable under several image manipulations in order to allow media producers to assert ownership over their content even if the video is redistributed and modified. Semi-fragile watermarks combine the advantages of both robust and fragile watermarks and are mainly used for fuzzy authentication of digital images and identification of image tampering [57]. The use of semi-fragile watermarks is justified by the fact that images and videos are generally transmitted and stored in a compressed form, which should not break the watermark. However, when the image is tampered with, the watermark should also be damaged, indicating the tampering.
Several past works have proposed hand-engineered pipelines to embed semi-fragile watermark information in the spatial and frequency (transform) domain of images and videos. In the spatial domain, the pixels of digital images are processed directly using block-based embedding [6] and least significant bits modification [53, 54] to embed watermarks. In the frequency domain, the watermark can be embedded by modifying the coefficients produced with transformations such as the Discrete Cosine Transform (DCT) [3, 18, 39] and Discrete Wavelet Transform [5, 27, 44]. However, we demonstrate in our experiments that the major limitations of traditional approaches lie in higher visibility of the embedded watermarks, increased distortions in generated images, and low robustness to compression techniques like JPEG transforms. Moreover, these works have not been designed to be fragile against Deepfake manipulations.
More recently, CNNs have been used to provide an end-to-end solution to the watermarking problem. They replace hand-crafted hiding procedures with neural network encoding [2, 17, 30, 49, 59, 62]. Notably, both StegaStamp [49] and HiDDeN [62] propose frameworks to embed robust watermarks that can hide and transmit data in a way that is robust to various real-world transformations. All of these works focus on generating robust watermarks, with the goal of ensuring robustness and recovery of the embedded secret information under various physical and digital image distortions. We empirically demonstrate that these techniques are unable to generate semi-fragile watermarks and are therefore not suitable for identifying tampered media such as Deepfakes.

3 Methodology

We aim to develop an image watermarking framework that is robust to a set of benign image and video transformations while being fragile to malicious transforms. Additionally, it is desirable to have an imperceptible watermark so that devices can store only the watermarked images without revealing the original image to the end user. The set of benign and malicious transformations depends on the application of the media authentication system and can be modified as desired. For example, for applications like document verification, it may be desirable to limit the set of benign transformations to have only compression, while for social media platforms, it is desirable to allow operations such as artistic image filtering. We propose a general-purpose framework that can be adapted for any set of benign and malicious transforms.
With this objective in mind, our system consists of three main components: an encoder network \(E_\alpha\), a decoder network \(D_\beta\), and an adversarial discriminator network \(A_\gamma\), where \(\alpha , \beta ,\) and \(\gamma\) are learnable parameters. An overview of our system is provided in Figure 2. The encoder network E takes as input an image x and a bit string \(s \in \lbrace 0, 1\rbrace ^L\) of length L and produces an encoded (watermarked) image \(x_w\). That is, \(x_w=E(x, s)\). The watermarked image then goes through two image transformation functions—one sampled from a set of benign transformations (\(g_b \sim G_b\)) and the other sampled from a set of malicious transformations (\(g_m \sim G_m\)) to produce a benign image \(x_b=g_b(x_w)\) and a malicious image \(x_m=g_m(x_w)\). The benign and malicious watermarked images are then fed to the decoder network, which predicts the messages \(s_b=D(x_b)\) and \(s_m=D(x_m),\) respectively.
Fig. 2.
Fig. 2. Model overview: The encoder and decoder networks are trained by encouraging message retrieval from watermarked images that have undergone benign transformations and discouraging retrieval from maliciously transformed watermarked images. Image reconstruction and adversarial loss from the discriminator ensure the imperceptibility of the watermark.
For optimizing secret retrieval during training, we use the \(L_1\) distortion between the predicted and ground-truth bit strings. The decoder is encouraged to be robust to benign transformations by minimizing the message distortion \(L_1(s, s_b)\), and fragile for malicious manipulations by maximizing the error \(L_1(s, s_m)\). Therefore, the secret retrieval error for an image \(L_M(x)\) is obtained as follows:
\begin{equation} L_M(x) = L_1(s, s_b) - L_1(s, s_m). \end{equation}
(1)
The watermarked image is encouraged to look visually similar to the original image by optimizing three image distortion metrics: \(L_1\), \(L_2\), and \(L_{\it pips}\) [60] distortions. Additionally, we use an adversarial loss \(L_G(x_w) = \log (1 - A(x_w))\) from the discriminator, which is trained simultaneously to distinguish original images from watermarked images. That is, our image reconstruction loss \(L_{\it img}\) is obtained as follows:
\begin{equation} \begin{split}& L_{\it d}(x, x_w) = L_1(x, x_w) + L_2(x, x_w) + c_p L_{\it pips}(x, x_w) \\ & L_{\it img}(x, x_w) = L_{\it d}(x, x_w) + c_g L_G(x_w). \end{split} \end{equation}
(2)
Therefore, the parameters \(\alpha ,\beta\) of the encoder and decoder network are trained using mini-batch gradient descent to optimize the following loss over a distribution of input messages and images:
\begin{equation} \mathbb {E}_{x, s, g_b, g_m} [ L_{\it img}(x, x_w) + c_M L_M(x) ]. \end{equation}
(3)
The discriminator parameters \(\gamma\) are trained to distinguish original images x from watermarked images \(x_w\) as follows:
\begin{equation} \mathbb {E}_{x, s} [\log (1 - A(x)) + \log (A(x_w)) ]. \end{equation}
(4)
In the above equations, \(c_p\), \(c_g\), and \(c_M\) are scalar coefficients for the respective loss terms. We use the following values for our loss coefficients: \(c_p=1\), \(c_g=0.1\), \(c_M=1\). We use the Adam optimizer during training with a learning rate of \(2e-4\).
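To make the optimization concrete, the following PyTorch-style sketch shows how the losses in Equations (1) through (3) could be assembled for one encoder/decoder update. The `encoder`, `decoder`, `discriminator`, `lpips_fn`, and transform samplers are assumed to be defined elsewhere; the helper names are illustrative and are not taken from a released implementation.

```python
import torch
import torch.nn.functional as F

def training_step(encoder, decoder, discriminator, lpips_fn,
                  x, s, sample_benign, sample_malicious,
                  c_p=1.0, c_g=0.1, c_M=1.0):
    """One encoder/decoder update following Eqs. (1)-(3).

    x: batch of images in [0, 1], shape (B, 3, H, W)
    s: batch of secret bit strings in {0, 1}, shape (B, L)
    sample_benign / sample_malicious: callables returning g_b(x_w) and g_m(x_w)
    """
    x_w = encoder(x, s)                      # watermarked image
    x_b = sample_benign(x_w)                 # benign transform g_b
    x_m = sample_malicious(x_w, x)           # malicious transform g_m (blends original back)

    s_b = decoder(x_b)                       # decoded bits after benign transform
    s_m = decoder(x_m)                       # decoded bits after malicious transform

    # Eq. (1): encourage recovery under benign, discourage under malicious transforms
    L_M = F.l1_loss(s_b, s) - F.l1_loss(s_m, s)

    # Eq. (2): image reconstruction + adversarial loss L_G = log(1 - A(x_w))
    L_d = F.l1_loss(x_w, x) + F.mse_loss(x_w, x) + c_p * lpips_fn(x_w, x).mean()
    L_G = torch.log(1.0 - discriminator(x_w) + 1e-8).mean()   # small eps for stability
    L_img = L_d + c_g * L_G

    # Eq. (3): total loss for the encoder/decoder parameters
    return L_img + c_M * L_M
```

The discriminator parameters would be updated separately following Equation (4), alternating with the encoder/decoder updates.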

3.1 Message Encoding

The encoder network accepts watermarking data as a bit string s of length L. This watermarking data can contain information about the device that captured the image or a secret message that can be used to authenticate the image. To prevent adversaries (who have gained white-box access to the encoder network) from encoding a target message, we can encrypt the message using symmetric or asymmetric encryption algorithms or hashing. In our experiments, we embed encrypted messages of size 128 bits, which allows the network to encode \(2^{128}\) unique messages. We discuss the possible threats and defenses to our watermarking framework in Section 5.
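As an illustration of how such a message could be prepared, the sketch below derives a 128-bit string from a device identifier with a keyed hash and converts it to the tensor format consumed by the encoder. The use of HMAC-SHA256 here is our own illustrative choice; the framework only requires that the message come from some symmetric/asymmetric encryption or hashing scheme.

```python
import hashlib
import hmac
import secrets
import torch

def make_message_bits(secret_key: bytes, device_id: str, L: int = 128) -> torch.Tensor:
    """Derive an L-bit message from a device identifier using a keyed hash
    (illustrative; any encryption or hashing scheme can be substituted)."""
    digest = hmac.new(secret_key, device_id.encode(), hashlib.sha256).digest()
    bits = []
    for byte in digest[: L // 8]:
        bits.extend((byte >> i) & 1 for i in reversed(range(8)))
    return torch.tensor(bits, dtype=torch.float32)   # shape (L,)

key = secrets.token_bytes(32)                         # shared secret key
s = make_message_bits(key, device_id="camera-0042")   # 128-bit message for the encoder
```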

3.2 Network Architectures

Our encoder and decoder networks are based on the U-Net CNN architecture [23, 40, 49] and operate on \(256\times 256\) images. The encrypted message s, which is an L length bit string, is first projected to a tensor \(s_{{\it Proj}}\) of size \(96\times 96\) using a trainable fully connected layer, then resized to \(256 \times 256\) using bilinear interpolation, and finally added as the fourth channel to the original RGB image to be fed as an input to the encoder network. The encoder U-Net contains eight downsampling and eight upsampling layers. We modify the original U-Net architecture and replace the transposed convolution in the upsampling layers with convolutions followed by nearest-neighbor upsampling as per the recommendations given by [35]. In our preliminary experiments, we found this change to significantly improve the image quality and training speed of our framework. The downsampling and upsampling layers have skip-connections between the corresponding layers with the same output size. The decoder network also follows the U-Net architecture similar to our encoder network. The decoder U-Net first outputs a \(256\times 256\) intermediate output, which is downsized to \(96\times 96\) using bilinear downsampling to produce \(s_{\it ProjDecoded}\) and then projected to a vector of size L using a fully connected layer followed by a sigmoid layer to scale values between 0 and 1. We use batch normalization layers in the encoder network and instance normalization layers in the decoder network.
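A minimal sketch of the encoder input construction described above, assuming PyTorch and a message length of L = 128; the module boundaries and names are our own and may differ from the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MessageToChannel(nn.Module):
    """Project an L-bit message to a fourth input channel for the encoder U-Net."""
    def __init__(self, L=128, proj_size=96, image_size=256):
        super().__init__()
        self.proj_size = proj_size
        self.image_size = image_size
        self.fc = nn.Linear(L, proj_size * proj_size)   # trainable projection

    def forward(self, x_rgb, s):
        # s: (B, L) bit string -> s_Proj of size (B, 1, 96, 96)
        s_proj = self.fc(s).view(-1, 1, self.proj_size, self.proj_size)
        # Bilinear resize to the image resolution (B, 1, 256, 256)
        s_img = F.interpolate(s_proj, size=(self.image_size, self.image_size),
                              mode="bilinear", align_corners=False)
        # Concatenate as the fourth channel of the RGB image -> encoder U-Net input
        return torch.cat([x_rgb, s_img], dim=1)          # (B, 4, 256, 256)
```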
For the discriminator network, we use the patch discriminator from [23]. The discriminator is trained to classify each \(N\times N\) image patch as real or fake. We average discriminator responses across all patches to obtain the discriminator output. Our discriminator network consists of three convolutional blocks of stride 2, thereby classifying patches of size \(32\times 32\).
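The discriminator could look roughly like the following PatchGAN-style sketch with three stride-2 convolutional blocks whose per-patch responses are averaged; the kernel sizes, channel widths, and normalization choices shown here are assumptions rather than the exact configuration.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Three stride-2 conv blocks producing per-patch real/fake scores,
    averaged into a single response as described above."""
    def __init__(self, in_ch=3, base_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base_ch, base_ch * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base_ch * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base_ch * 2, base_ch * 4, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base_ch * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base_ch * 4, 1, 1),                # per-patch logit map
        )

    def forward(self, x):
        patch_logits = self.net(x)                       # (B, 1, H/8, W/8)
        # Average the patch probabilities to obtain one response per image
        return torch.sigmoid(patch_logits).mean(dim=[1, 2, 3])
```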

3.3 Transformation Functions

The choice of benign and malicious transformation functions is critical to achieve selective fragility and robustness of the watermark. While we can only use a limited set of image transformations during training, the list of possible benign and malicious transforms in real-world settings is non-exhaustive. In our experiments (Section 4.3), we demonstrate that by incorporating the transformation functions described below, we are able to generalize to unseen benign and malicious transformations that are commonly used across social media platforms.

3.3.1 Benign Transforms.

Our goal is to authenticate real images and videos shared over online platforms that generally undergo compression and diverse color or lighting adjustments (e.g., Instagram filters). To approximate standard image processing distortions, we apply a diverse set of differentiable benign image transformations (\(G_b\)) to our watermarked images during training:
(1)
Gaussian blur: We convolve the original image with a Gaussian kernel \(\mathit {k}\). This transform is given by \(t(x)= k \ast x,\) where \(\ast\) is the convolution operator. We use kernel sizes ranging from \(\mathit {k} = 3\) to \(\mathit {k} = 7.\)
(2)
JPEG compression: Digital images are usually stored in a lossy format such as JPEG. We approximate JPEG compression with the differentiable JPEG function proposed in [46]. During training, we apply JPEG compression with quality 40, 60, and 80.
(3)
Saturation adjustments: To account for various color adjustments from social media filters, we randomly linearly interpolate between the original (full RGB) image and its grayscale equivalent.
(4)
Contrast adjustments: We linearly rescale the image histogram using a contrast factor \(\sim \mathcal {U}[0.5, 1.5]\).
(5)
Downsizing and upsizing: The image is first downsized by a factor \({\it scale}\) and then up-sampled by the same factor using bilinear upsampling. We use \({\it scale} \sim \mathcal {U}[2, 5].\)
(6)
Translation and rotation: The image is shifted horizontally and vertically by \(n_h\) and \(n_w\) pixels, where \(n_h, n_w \sim \mathcal {U}[-10, 10],\) and rotated by r degrees, where \(r \sim \mathcal {U}[-10, 10]\).
(7)
Video compression: Simulating video compression distortions during training is more challenging because common video compression codecs such as MPEG4 and H264 cannot be easily implemented using differentiable functions. Such codecs not only compress each frame of a given video but also apply temporal compression across the time-steps for a more optimized compression. Since video compression is applied to almost all videos uploaded on the internet, it is essential to ensure robustness to these codecs to make the watermark suitable for videos. To this end, we propose the first technique to ensure robustness of the generated watermark to a benign non-differentiable video transform \(g_b\):
When training the watermarking framework on videos, each mini-batch of images x corresponds to consecutive frames of a single video.
We obtain watermarked frames \(x_w\) by embedding unique signatures into each frame using the encoder network. Next, we detach \(x_w\) from the computational graph, extract each frame, and write the frames into a video file. The video file is then compressed with the H264 codec using FFmpeg at a quantization factor sampled from the interval \([5,25]\).
Next, we read each frame of the compressed file and stack them together to obtain the transformed image batch \(g_b(x_w),\) which is then reinserted in the computational graph to be fed as input to the decoder.
During the backward pass, we use the straight-through estimator [4] to estimate the gradient across the transformation function \(g_b\). That is:
\begin{equation} \left. \nabla _{x_w} L_M(g_b(x_w)) \right|_{x_w = \hat{x_w}} \approx \left. \nabla _{x_w} L_M(x_w) \right|_{x_w = g_b(\hat{x_w})}, \end{equation}
(5)
where \(L_M(x_w)\) indicates the message recovery loss from the decoder for an input \(x_w\). We illustrate the video compression procedure used during training in Figure 3.
Fig. 3.
Fig. 3. Training procedure to make watermarks robust against video compression codecs. We use the actual implementation of the video compression codec in the forward pass and estimate the gradients in the backward pass using a straight-through estimator.
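A sketch of how the straight-through estimator of Equation (5) could be implemented as a custom autograd function is shown below. The `h264_roundtrip` helper is hypothetical: it stands in for writing the detached frames to a temporary file, compressing with FFmpeg and the H264 codec at a sampled quantization factor, and reading the compressed frames back.

```python
import torch

def h264_roundtrip(frames, qp_range=(5, 25)):
    """Hypothetical helper: write `frames` to a temporary video file with FFmpeg,
    compress with the H264 codec at a quantization factor sampled from qp_range,
    read the compressed frames back, and return a tensor of the same shape.
    The actual file I/O is implementation specific and omitted here."""
    raise NotImplementedError

class VideoCompressionSTE(torch.autograd.Function):
    """Non-differentiable H264 round-trip in the forward pass, identity
    (straight-through) gradient in the backward pass, following Eq. (5)."""

    @staticmethod
    def forward(ctx, frames):
        # frames: (T, 3, H, W) consecutive watermarked frames of one video in [0, 1]
        compressed = h264_roundtrip(frames.detach().cpu())
        return compressed.to(frames.device)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass gradients unchanged to the
        # uncompressed watermarked frames.
        return grad_output

# Usage: x_b = VideoCompressionSTE.apply(x_w)   # serves as g_b(x_w) for video batches
```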
For each mini-batch iteration, we sample one transformation function from the above list (which also includes an identity transform) and apply it to all the images in the batch, as sketched below.
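A minimal sketch of this per-iteration sampling, assuming the benign transforms above are available as callables operating on a batch of watermarked images; the transform names in the usage comment are placeholders.

```python
import random

def sample_benign_transform(benign_transforms):
    """Pick one benign transform (or the identity) per mini-batch and return it."""
    return random.choice(benign_transforms + [lambda x: x])

# Usage (transform callables assumed to be defined as in Section 3.3.1):
# g_b = sample_benign_transform([gaussian_blur, diff_jpeg, adjust_saturation,
#                                adjust_contrast, down_up_sample,
#                                translate_rotate, video_compression])
# x_b = g_b(x_w)   # the sampled transform is applied to every image in the batch
```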

3.3.2 Malicious Transforms.

Our semi-fragile watermarks have to be unrecoverable when malicious transforms such as image compositing, occlusion, or face replacement are applied. The common operation across these manipulation techniques is to modify certain spatial areas of the image.
To simulate such transforms during training, we propose a watermark occlusion transform as follows: We first generate a tampering mask that indicates what modifications we want to retain or partially discard in the signed image. Given such a tampering mask, we partially remove the added perturbation in the signed image from the areas indicated by the mask. We consider two kinds of spatial tampering masks during training:
Image compositing mask: For each image, we initialize a mask \(M_{h\times w\times c}\) of all ones. Next, we randomly select n rectangular patches in the mask and set the value of all pixels in the patches to a small watermark retention percentage \(w_r \in [0,1]\).
Facial manipulation mask: For each image, we initialize a mask \(M_{h\times w\times c}\) of all ones. Next, we extract the facial feature polygons for eyes, nose, and lips and set the values for all pixels inside the polygons to a small watermark retention percentage \(w_r \in [0,1]\).
That is, \(M[i,j,:] = w_r\) for all \(i,j\) in the selected spatial polygons. Finally, the maliciously transformed image \(g_m(x_w)\) is obtained as follows:
\begin{equation*} g_m(x_w) = M\cdot x_w + (1-M)\cdot x. \end{equation*}
Figure 4 illustrates the malicious transform procedure.
Fig. 4.
Fig. 4. Malicious transform: To simulate image tampering during training, the watermark is partially removed from the areas indicated by a manipulation mask.
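The sketch below shows one way the compositing mask and the blending step g_m(x_w) = M * x_w + (1 - M) * x could be implemented. The patch counts and patch-size ranges are illustrative assumptions, and a facial-manipulation mask would instead fill landmark polygons with the retention value w_r.

```python
import random
import torch

def compositing_mask(x, n_patches=3, w_r=0.1):
    """Tampering mask of ones with n random rectangular patches set to the
    watermark retention fraction w_r (the compositing mask described above)."""
    B, C, H, W = x.shape
    M = torch.ones_like(x)
    for b in range(B):
        for _ in range(n_patches):
            ph = random.randint(H // 8, H // 2)          # patch height (illustrative range)
            pw = random.randint(W // 8, W // 2)          # patch width (illustrative range)
            top = random.randint(0, H - ph)
            left = random.randint(0, W - pw)
            M[b, :, top:top + ph, left:left + pw] = w_r
    return M

def malicious_transform(x_w, x, mask_fn=compositing_mask):
    """g_m(x_w) = M * x_w + (1 - M) * x: blend the original image back into the
    masked regions, partially removing the watermark perturbation there."""
    M = mask_fn(x)
    return M * x_w + (1.0 - M) * x
```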

4 Experiments

4.1 Datasets and Experimental Setup

We conduct our experiments on the CelebA [29], MIRFLICKR [19], and UCF-101 [47] datasets. CelebA is a large-scale database of over \(200{,}000\) face images of \(10{,}000\) unique celebrities. The MIRFLICKR dataset is a diverse image retrieval dataset containing 1 million images. For training our watermarking framework to be robust to video compression, we use the UCF-101 dataset that contains \(13{,}320\) short clips for action recognition. We set aside \(1{,}000\) images/videos for testing from each dataset and split the remaining data into \(80\%\) training and \(20\%\) validation. We train our models for 200K mini-batch iterations with a batch size of 64 and use an Adam optimizer with a fixed learning rate of \(2e-4\). All our models are trained using images/video frames of size \(256\times 256,\) which are obtained after center-cropping and resizing the images. We conduct experiments with message length \(L=128\). To evaluate the effectiveness of using transformation functions during training, we conduct an ablation study by training a FaceSigns (No Transform) model that does not incorporate any input transformations and a FaceSigns (Robust) model that uses only benign transformations during training. We evaluate watermarking techniques primarily on the following aspects:
(1)
Imperceptibility: We compare the original and watermarked images to compute peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Higher values for both PSNR and SSIM are desirable for a more imperceptible watermark.
(2)
Robustness and fragility: To measure the robustness and fragility of the watermarking system, we measure the bit recovery accuracy (BRA) of the bit string s when unseen (not used in training) benign and malicious image transformations are applied. BRA is calculated by comparing the decoded secret bit string with the secret bit string that was embedded by the encoder into the given image. The number of matched bits divided by the length of the bit string gives the bit recovery accuracy of a single image. We average this over our test set to report the BRA. For robustness, it is desirable to have a high BRA against benign transformations like social media filters and image compression. For fragility against malicious tampering, it is desirable to have a low BRA when facial manipulation or image compositing is applied. To make a fair comparison with past works, we do not apply any bit error correcting codes while calculating the BRA and compare the input string s with the raw decoder output. A detector can classify an input as manipulated if the BRA of the decoded message is below a set threshold and as benign if the BRA is above the threshold (see the sketch after this list). We measure the performance of such a detector using the AUC score.
(3)
Capacity: This measures the amount of information that can be embedded in the image. We measure the capacity as the bits per pixel (BPP), i.e., the number of bits of the encrypted message embedded per pixel of the image: \(\text{BPP} = L/(H \times W \times C)\).
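The sketch below illustrates how the BRA from item (2) and the threshold-based manipulation detector could be computed; the 0.75 threshold and the use of scikit-learn for the AUC are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bit_recovery_accuracy(s_true, s_pred):
    """Fraction of matching bits after thresholding the decoder output at 0.5."""
    s_true = np.asarray(s_true)
    s_pred = (np.asarray(s_pred) > 0.5).astype(int)
    return float((s_true == s_pred).mean())

def detect_manipulation(bra, threshold=0.75):
    """Label an image as manipulated if its BRA falls below the threshold."""
    return bra < threshold

def detector_auc(bra_benign, bra_malicious):
    """AUC of the detector: positives are manipulated images, negatives benign ones.
    bra_benign / bra_malicious are lists of per-image BRA values."""
    labels = [0] * len(bra_benign) + [1] * len(bra_malicious)
    # Lower BRA should indicate manipulation, so use negative BRA as the score.
    scores = [-b for b in bra_benign + bra_malicious]
    return roc_auc_score(labels, scores)
```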
It is important to note the tradeoff between the above metrics: for example, models with higher capacity sacrifice imperceptibility or bit recovery accuracy. Similarly, more robust models sacrifice capacity or imperceptibility. We compare our watermarking framework against three prior works on image watermarking: a DCT-based semi-fragile watermarking system [18] and two neural image watermarking systems, HiDDeN [62] and StegaStamp [49]. Both HiDDeN and StegaStamp embed a bit string message into a square RGB image while ensuring robustness to a set of image transformations. We present examples of original and watermarked images along with the added perturbation from different techniques in Figure 5.
Fig. 5.
Fig. 5. Examples of original and watermarked images using prior works and our FaceSigns (Semi-fragile) model. The image perturbation has been linearly scaled between 0 and 1 for visualization. The quantitative metrics evaluating the capacity and imperceptibility of the watermark are reported in Table 1.

4.2 Imperceptibility and Capacity

We report the image similarity and capacity metrics of different watermarking techniques in Table 1. We find that even at a higher message capacity, FaceSigns can encode messages with better imperceptibility as compared to StegaStamp and HiDDeN. As noted by the authors of StegaStamp and visible in Figure 5, the residual added by their model is perceptible in large low-frequency regions of the image. We believe that this is primarily due to the difference in our network architecture choices. In our initial experiments, we found that using a UNet architecture for the decoder with an intermediate message reconstruction loss described in Section 3.2 performed significantly better than a downsampling CNN architecture used in prior work. Additionally, we use nearest neighbor upsampling instead of transposed convolutions in our U-Net architectures, which helps reduce the perceptibility of the watermark by removing upsampling artifacts.
Table 1.
The first three data columns (H, W; L; BPP) measure capacity; PSNR and SSIM measure imperceptibility.

| Method | H, W | L | BPP | PSNR | SSIM |
| --- | --- | --- | --- | --- | --- |
| Semi-fragile DCT [18] | 128 | 256 | 5.2e-3 | 22.49 | 0.871 |
| HiDDeN [62] | 128 | 30 | 6.1e-4 | 27.57 | 0.934 |
| StegaStamp [49] | 400 | 100 | 2.0e-4 | 29.39 | 0.925 |
| FaceSigns (No Transform) | 256 | 128 | 6.5e-4 | 36.38 | 0.973 |
| FaceSigns (Robust) | 256 | 128 | 6.5e-4 | 35.56 | 0.964 |
| FaceSigns (Semi-Fragile) | 256 | 128 | 6.5e-4 | 35.43 | 0.962 |
Table 1. Capacity and Imperceptibility Metrics of Different Watermarking Systems
\(H,W\) indicate the height and width of the input image.

4.3 Robustness and Fragility

To study the robustness and fragility of different DNN-based watermarking techniques, we transform the watermarked images using unseen benign and malicious transformations and then attempt to decode the message from the transformed image. We perform ablation studies to evaluate the effectiveness of the proposed transforms by training three versions of our watermarking framework: FaceSigns (No Transform), which does not use any benign or malicious transformations during training; FaceSigns (Robust), which is only trained to be robust against benign transformations and does not use malicious transformations during training; and FaceSigns (Semi-fragile), which uses both benign and malicious transformations during training.

4.3.1 Benign Image Transforms.

For benign transforms, we first consider real-world image operations that are commonly used when uploading pictures on the internet. We compress the image using different levels of JPEG compression (separate from training) and also apply Instagram filters, namely Aden, Brooklyn, and Clarendon, which we use from an open-source Python library, Pilgram [24]. Some example images from these transformations are shown in Figure 6. We report the BRA of different watermarking frameworks after undergoing benign transformations in Table 3. We find that both StegaStamp and our robust and semi-fragile models can decode secrets with a high BRA for these image transformations. We find that FaceSigns (Robust), which does not use malicious transforms during training, is slightly more robust to benign transformations as compared to FaceSigns (Semi-fragile). However, this improved robustness comes at the cost of being non-fragile to malicious transformations and being able to decode messages with high BRA even for Deepfake manipulations. The model FaceSigns (No Transform), which does not incorporate any benign or malicious transformations during training, is fragile to both JPEG compression and malicious transforms as indicated by the low BRA for both methods.
Table 2.
All values are BRA (%).

| Method | Blur | Cropping | Rotation | Contrast | Brightness | Translation |
| --- | --- | --- | --- | --- | --- | --- |
| FaceSigns (No Transform) | 78.32 | 65.62 | 80.22 | 88.23 | 92.23 | 62.62 |
| FaceSigns (Robust) | 99.71 | 97.39 | 99.82 | 99.88 | 99.91 | 97.12 |
| FaceSigns (Semi-fragile) | 99.68 | 97.45 | 99.77 | 99.85 | 99.82 | 96.54 |
Table 2. Bit Recovery Accuracy (BRA) of Different Techniques against Benign Transformations Used During Training
The hyperparameter values for these transforms are sampled from the range given in Section 3.3.1.
Table 3.
All values are BRA (%); None through Clarendon are benign transforms, and SimSwap through Compositing are malicious transforms.

| Method | None | JPG-75 | JPG-50 | Aden | Brooklyn | Clarendon | SimSwap [8] | FSFT [43] | FS [26] | Compositing |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Semi-fragile DCT [18] | 99.81 | 56.65 | 55.04 | 94.98 | 96.41 | 95.06 | 57.62 | 57.61 | 88.59 | 82.31 |
| HiDDeN [62] | 97.06 | 72.71 | 68.48 | 94.52 | 94.52 | 94.52 | 85.48 | 72.33 | 74.23 | 73.27 |
| StegaStamp [49] | 99.92 | 99.91 | 99.87 | 99.84 | 99.73 | 99.39 | 98.34 | 97.42 | 97.43 | 98.21 |
| FaceSigns (No Transform) | 99.96 | 50.51 | 50.07 | 98.39 | 99.67 | 99.65 | 51.04 | 52.00 | 51.36 | 53.36 |
| FaceSigns (Robust) | 99.96 | 99.74 | 97.26 | 99.53 | 99.19 | 99.37 | 97.29 | 89.76 | 68.99 | 97.26 |
| FaceSigns (Semi-fragile) | 99.68 | 99.49 | 98.38 | 97.40 | 98.34 | 99.32 | 64.93 | 52.21 | 31.77 | 51.61 |
Table 3. Bit Recovery Accuracy (BRA) of Different Watermarking Techniques against Benign and Malicious Transforms Unseen during Training
For benign transforms, we consider two JPEG compression levels and three Instagram filters: Aden, Brooklyn, and Clarendon. For malicious transforms, we consider various face manipulation/swapping techniques and general image compositing transforms. A higher BRA against benign transforms and a lower BRA against malicious transforms is desirable to achieve our goal of semi-fragile watermarking.
Fig. 6.
Fig. 6. Watermarked images with unseen benign transformations applied. Benign transformations depicted in this diagram include Instagram filters [24] Brooklyn, Clarendon, and Aden and various levels of JPEG compression.
We also evaluate FaceSigns watermark recovery against the benign transformations used during training. The hyper-parameters used for these transformations are sampled randomly from the intervals described in Section 3.3.1. For cropping, we use center-cropping with crop factor sampled from \((1.2, 1.5)\). We present sample images undergoing these transformations in Figure 7 and the results in Table 2. We also study the BRA at different magnitudes of distortions for Gaussian blurring and JPEG compressions and present the results in Figure 8. We find that both FaceSigns (Robust) and FaceSigns (Semi-fragile) can effectively recover the watermark data even at high magnitudes of distortions for benign transforms.
Fig. 7.
Fig. 7. Watermarked image samples from FaceSigns undergoing benign transformations from the training set with transforms such as cropping, contrast adjustment, Gaussian blur, and rotations.
Fig. 8.
Fig. 8. Bit recovery accuracy (BRA) of FaceSigns framework at different levels of distortion for benign transforms. (A) BRA vs. JPEG compression levels (lower values indicate higher compression). (B) BRA vs. sigma value used for Gaussian blur (higher sigma corresponds to higher distortion). (C) BRA vs. quantization factor for H264 video codec (higher quantization factor indicates more compressed video).
Robustness to Video Compression: For watermarking videos, we use the FaceSigns encoder to insert the watermark data into each video frame. Similarly, for decoding, we decode watermark data by passing each frame of the watermarked video to the FaceSigns decoder network. In our initial experiments, we found that training FaceSigns to be robust against spatial image transforms does not ensure robustness against video compression codecs. This is because besides compressing each frame spatially, video compression codecs like H264 also compress data temporally. To address this challenge, we incorporate video compression during training using the gradient-estimation procedure described in Section 3.3.1. As indicated by the results in Figure 8(C), incorporating video compression codecs during training significantly improves watermark recovery from highly compressed videos. Robustness to H264 compression makes FaceSigns a practical framework for inserting recoverable watermarks in videos shared on the internet.

4.3.2 Malicious Transforms.

To evaluate the fragility of the watermark against unseen facial manipulations, we apply three face-swapping techniques on the watermarked images from the CelebA dataset: FaceSwap [26], SimSwap [8], and Few-Shot Face Translation (FSFT) [43]. FaceSwap [26] is a computer graphics-based technique that swaps the face by aligning the facial landmarks of the two images. SimSwap [8] and FSFT [43] are deep learning-based techniques that use CNN encoder-decoder networks trained using adversarial loss to generate Deepfakes. Figure 9 shows examples of swapped faces using these techniques. Additionally, we consider a general image compositing operation for all test images where we randomly select image patches covering 10% to 50% of the image and replace the patches with those from an alternate image.
Fig. 9.
Fig. 9. Facially manipulated images created through SimSwap [8], FSFT [43], and FaceSwap [26] techniques for evaluating the fragility of the watermark.
As reported in Table 3, we find that StegaStamp and FaceSigns (Robust) can decode signatures from maliciously transformed images with a high BRA, thereby making them unsuitable for authenticating the integrity of digital media. This is understandable since these methods prioritize robustness over fragility. StegaStamp has been shown to be robust to occlusions even though occlusions were not explicitly a part of their set of training transformations. In contrast, watermark data recovery for the FaceSigns (Semi-fragile) model breaks against malicious transforms, which is desirable for malicious tampering detection.
Based on the bit recovery accuracy of the watermark data, we can define a manipulation detector as follows: The detector labels an image as maliciously tampered if the BRA of the predicted bit string is less than a threshold \(\tau\). The ROC curve of such a detector is shown in Figure 10. As is evident from the ROC plots and AUC scores shown in Figure 10, in contrast to prior works, our semi-fragile model demonstrates robustness to benign transformations while being fragile toward out-of-domain malicious Deepfake transformations, thereby achieving our goal of selective fragility and an AUC score of 0.996 for manipulation detection.
Fig. 10.
Fig. 10. Manipulation detection ROC plots and AUC scores for different watermarking techniques. A positive example represents a facially manipulated image, while a negative example represents a benign transformed image (all the transformations listed in Table 3). The watermarking framework labels an example as manipulated if the BRA for an image is less than a given threshold.

4.4 Watermarking Images with Multiple Faces

For watermarking images containing multiple faces, we can adapt our framework to insert the watermark into each face to detect facial tampering of any identity. To this end, we use a face detection model to extract a square bounding box of each face in the image containing multiple faces. The faces from the bounding boxes are cropped out, resized to \(256\times 256\), and passed as input to our encoder model to embed individual semi-fragile watermarks. The watermarked faces are then resized back to their original size and placed back into the original image. During decoding, a similar process is repeated where the faces are cropped and resized to \(256\times 256\) before being fed into the decoder. Since the benign transforms used during training include small image translations, our watermarking is robust to small shifts in the bounding boxes produced by the face detection network. We conduct experiments on 400 test images containing two to six faces each from the Celebrity Together dataset [61]. Our FaceSigns (Semi-fragile) model achieves a BRA of 99.50%, demonstrating that we can effectively encode and retrieve watermarks embedded in images with multiple faces. Figure 11 shows watermarked images with multiple identities.
Fig. 11.
Fig. 11. Examples of watermarked images with multiple faces using FaceSigns (Semi-fragile). Our model reliably embeds watermarks into faces present in the foreground and background.
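A rough sketch of the crop, embed, and paste-back procedure for multi-face images is given below. The face detector itself is outside the sketch, and the `boxes` format (square top/left/size tuples) and helper names are assumptions.

```python
import torch
import torch.nn.functional as F

def watermark_faces(image, boxes, encoder, s):
    """Embed a watermark into each detected face region, as described above.
    image: (3, H, W) tensor in [0, 1]
    boxes: list of (top, left, size) square face bounding boxes from a detector
    s: (L,) secret bit string shared by all faces in this image
    """
    out = image.clone()
    for (top, left, size) in boxes:
        # Crop the face and resize it to the encoder resolution
        face = image[:, top:top + size, left:left + size].unsqueeze(0)
        face_256 = F.interpolate(face, size=(256, 256), mode="bilinear",
                                 align_corners=False)
        face_w = encoder(face_256, s.unsqueeze(0))                # watermark the crop
        # Resize back to the original face size and paste into the image
        face_back = F.interpolate(face_w, size=(size, size), mode="bilinear",
                                  align_corners=False)
        out[:, top:top + size, left:left + size] = face_back[0]
    return out
```

Decoding follows the same crop-and-resize procedure before passing each face to the decoder.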

5 Discussion - Threat Models

Both watermark embedding techniques and Deepfake detection systems face adversarial threats from attackers who attempt to bypass the detectors by authenticating manipulated media. In this section, we discuss some of the threat models faced by our system and how these challenges can be addressed:
Attack 1. Querying the decoder network for performing adversarial attacks: The attacker may query the decoder network with an image to get the decoded message and adversarially perturb the query image until the decoded message matches the target message.
Defense: The attacker does not know what target messages can prove media authenticity since these messages can be kept as a secret and updated frequently. If the attacker gains access to the secret message by querying the decoder with a watermarked image, the secrecy of the encryption key can prevent the attacker from knowing the target encrypted message for the decoder. Lastly, the decoder network can be hosted securely and can only output a binary label indicating whether the image is authentic or manipulated by matching the decoded secret with the list of trusted secrets. This would make the decoder's signal unusable for performing adversarial attacks to match a target message out of the total possible \(2^{128}\) messages.
Attack 2. Copying the watermark perturbation from one image to another: The adversary may attempt to extract the added perturbation of the watermark and add it onto a Deepfake image to authenticate the manipulated media.
Defense: Since FaceSigns generates an image- and message-specific perturbation, we hypothesize that the same perturbation, when applied to alternate images, should not be recoverable by the decoder. We verify this hypothesis by conducting an experiment in which we extract the added perturbations from 100 watermarked images and apply the extracted perturbations to 100 alternate images. The bit recovery accuracy of such an attack is just 17.6%, which is worse than random prediction.
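The perturbation-transfer check described above can be expressed as a short sketch; `bra_fn` is assumed to be a bit recovery accuracy function such as the one sketched in Section 4.1.

```python
def perturbation_transfer_bra(encoder, decoder, x_src, x_tgt, s, bra_fn):
    """Check whether a watermark perturbation copied from one image can be
    decoded from another image (Attack 2). A low BRA indicates the attack fails."""
    delta = encoder(x_src, s) - x_src                 # image-specific perturbation
    x_forged = (x_tgt + delta).clamp(0.0, 1.0)        # paste it onto another image
    s_decoded = decoder(x_forged)
    return bra_fn(s, s_decoded)
```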
Attack 3. Training a proxy encoder: The adversary can collect a dataset of original and watermarked images and train a neural network-based encoder-decoder image-to-image translation network to map any new image to a watermarked image.
Defense: One defense strategy is to only store watermarked images on devices so that an attacker never gains access to pairs of original and watermarked images. Also, the above attack can only work if the encoded images all contain the same secret message, so that the adversary can learn a generator for watermarking a new image with the same secret message. To prevent the creation of such a dataset, some bits of the message can be kept dynamic and contain a unique timestamp and device-specific codes so that each embedded bit string is different. Regularly updating the trusted message or encryption key is another preventative strategy against such attacks.

6 Related Work

In this section, we discuss various concurrent efforts that tackle the challenges related to media ownership and digital watermarking through deep neural networks. One such effort is FakeTagger [51], which aims to create a highly robust watermark that remains intact even after facial manipulation or Deepfake modification. Although this work does not consider fragility, the rationale is that one can track the origin of an image or video. However, FakeTagger has a different objective compared to our work, as we aim to identify facial tampering through semi-fragile watermarking. Additionally, unlike our approach, the practical implementation of FakeTagger is memory intensive since it requires the storage of the original photo along with its tag to retrieve the authentic image, necessitating additional data (i.e., every photo) to be saved. FaceGuard [55] is another contemporaneous work that performs digital watermarking using DNNs. However, the authors only consider robustness during their training process and do not introduce any technique to make the watermark fragile to facial manipulations. In their experiments, the authors only evaluate the recovery of the embedded watermark and do not apply Deepfake manipulations to check whether the watermark is still recoverable. Specifically, the authors embed a watermark in real images, assume that Deepfake images have not been watermarked, and use a standard Deepfake dataset to evaluate how accurately the decoder can identify real images. In contrast, our work demonstrates that relying only on robustness to benign transformations during training is not sufficient to achieve semi-fragility in watermarks.
Past work such as [22] proposed a hardware accelerator that focuses on optimizing the hardware design of image watermarking with reconfigurable modules. However, in contrast to our work, they do not conduct thorough assessments against real-world Deepfake manipulations, compositing attacks, and video compression codecs.
Another related work [56] seeks to embed robust watermarks into the weights of DNN-based generative models in order to assert ownership of models and their generated images. This solution is reliant on the assumption that Deepfake synthesizers will only utilize generative models that have been watermarked, thereby allowing detectors to identify the source model of the Deepfakes.

7 Future Directions

As neural networks continue to advance and make high-quality synthetic content more accessible, it is crucial to prioritize safeguards against potential misuse of this technology. Addressing both the generation and detection of synthesized media is key for the responsible use of media synthesis technology. Since AI-generated content is expected to increase across social media platforms in the foreseeable future, the reliable detection of such content is essential to ensure trust in social media platforms and prevent potential harms of synthesis technologies. With the proliferation of Deepfakes and manipulated content, there is a growing need for real-time detection systems capable of identifying even the most subtle alterations. Future work integrating multi-modal analyses that leverage text, audio, and video data can enhance the accuracy and robustness of synthetic media detection. Future directions should also study the potential for watermarking video and audio content together in order to authenticate real media.

8 Conclusion

We introduce a deep learning-based semi-fragile watermarking system that can certify the integrity of digital images and videos and reliably detect tampering. Through our experiments and evaluations, we demonstrate that FaceSigns generates more imperceptible watermarks than previous state-of-the-art methods while upholding the desired semi-fragile characteristics. By carefully designing a fixed set of benign and malicious transformations during training, our framework achieves generalizability to real-world image and video transformations and can reliably detect Deepfake facial and image compositing manipulations, unlike prior image watermarking techniques. Additionally, our work is a significant step forward in the field of covert watermarking for videos. FaceSigns can be vital to media authenticators in social media platforms, news agencies, and legal offices and help create more trustworthy platforms and establish consumer trust in digital media.


A Appendix

A.1 Message Length Experiments

We conduct additional experiments to study the effectiveness of the FaceSigns (Semi-fragile) framework in embedding different message lengths ranging from 64 to 512 bits into an image and present the results in Table 4. We study different lengths of the embedded watermark message: 64 bits, 128 bits, 256 bits, and 512 bits. We compute the BRA Benign for all the test benign transforms (Identity, JPG-75, JPG-50, Aden, Brooklyn, and Clarendon) and BRA Malicious for the four malicious transforms (SimSwap, FSFT, FS, and Compositing) and report the mean BRA in Table 4. As we increase the message length, we observe a slight decrease in PSNR and SSIM metrics, indicating that the watermark gets more perceptible. At all message lengths, the watermark maintains the desired fragility to malicious transforms as indicated by low BRA Malicious. However, we observe a slight decrease in robustness to benign transforms as we increase the message length to 256 and 512.
Table 4.
| Message Length | PSNR | SSIM | BPP | BRA Benign (%) | BRA Malicious (%) |
| --- | --- | --- | --- | --- | --- |
| 64 | 37.48 | 0.979 | 3.2e-4 | 98.91 | 48.22 |
| 128 | 35.43 | 0.962 | 6.5e-4 | 98.76 | 49.52 |
| 256 | 35.32 | 0.959 | 1.3e-3 | 93.42 | 52.23 |
| 512 | 32.17 | 0.938 | 2.6e-3 | 90.29 | 51.23 |
Table 4. Comparison of FaceSigns (Semi-fragile) at Different Message Lengths
The Bit recovery accuracy (BRA) for benign and malicious transforms is averaged across the transformations given in Table 3.

A.2 Discriminator Ablation Study

Table 5 presents a comparison of results for the FaceSigns (Semi-fragile) watermarking technique with and without the use of a discriminator loss. This comparison is conducted using a fixed message length of 128 bits. Comparing the PSNR values, it is evident that the watermarked images generated using the discriminator loss in the training objective have a higher PSNR (35.43) than those generated without the discriminator loss (34.21). This suggests that incorporating the discriminator loss has a positive impact on preserving image quality after watermarking. The SSIM values reinforce this observation.
Table 5.
| Technique | PSNR | SSIM | BRA Benign (%) | BRA Malicious (%) |
| --- | --- | --- | --- | --- |
| With Discriminator Loss | 35.43 | 0.962 | 98.76 | 49.52 |
| Without Discriminator Loss | 34.21 | 0.941 | 98.61 | 50.18 |
Table 5. Comparison of FaceSigns (Semi-fragile) with and without the Discriminator Loss Using a Message Length of 128

A.3 Additional Image Examples

We present additional examples of original and watermarked images in Figure 12.
Fig. 12.
Fig. 12. Additional examples of original and watermarked images using prior works and our method (FaceSigns). Observe the change in the perturbation pattern as we incorporate both robust and benign transformations in the FaceSigns (Semi-fragile) model.

References

[1]
Darius Afchar, Vincent Nozick, Junichi Yamagishi, and Isao Echizen. 2018. Mesonet: A compact facial video forgery detection network. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS’18). IEEE, 1–7.
[2]
Shumeet Baluja. 2017. Hiding images in plain sight: Deep steganography. NIPS 30 (2017).
[3]
Yi-Lin Bei, Xiao-Rong Zhu, Qian Zhang, and Sai Qiao. 2022. A robust image watermarking algorithm based on content authentication and intelligent optimization. In Proceedings of the 5th International Conference on Control and Computer Vision (ICCCV’22). ACM, New York, NY, USA, 164–170.
[4]
Yoshua Bengio, Nicholas Léonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013).
[5]
Oussama Benrhouma, Houcemeddine Hermassi, and Safya Belghith. 2015. Tamper detection and self-recovery scheme by DWT watermarking. Nonlinear Dynamics 79 (2015), 1817–1833.
[6]
Siddharth Bhalerao, Irshad Ahmad Ansari, and Anil Kumar. 2021. A secure image watermarking for tamper detection and localization. Journal of Ambient Intelligence and Humanized Computing 12, 1 (2021), 1057–1068.
[7]
Ning Bi, Qiyu Sun, Daren Huang, Zhihua Yang, and Jiwu Huang. 2007. Robust image watermarking based on multiband wavelets and empirical mode decomposition. IEEE Transactions on Image Processing 16, 8 (2007), 1956–1966.
[8]
Renwang Chen, Xuanhong Chen, Bingbing Ni, and Yanhao Ge. 2020. SimSwap: An efficient framework for high fidelity face swapping. In The 28th ACM International Conference on Multimedia (MM’20). ACM.
[9]
Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. 2020. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10]
Ingemar J. Cox, Joe Kilian, F. Thomson Leighton, and Talal Shamoon. 1997. Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6, 12 (1997), 1673–1687.
[11]
Ingemar J. Cox, Matthew L. Miller, Jeffrey Adam Bloom, and Chris Honsinger. 2002. Digital Watermarking. Vol. 53. Springer.
[12]
DeepFakes. 2017. https://github.com/deepfakes/faceswap
[13]
Ferdinando Di Martino and Salvatore Sessa. 2019. Fragile watermarking tamper detection via bilinear fuzzy relation equations. Journal of Ambient Intelligence and Humanized Computing 10 (2019), 2041–2061.
[14]
Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, and Cristian Canton Ferrer. 2020. The DeepFake Detection Challenge (DFDC) dataset. arXiv preprint arXiv:2006.07397 (2020).
[15]
Jessica Fridrich. 2009. Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press.
[16]
Apurva Gandhi and Shomik Jain. 2020. Adversarial perturbations fool deepfake detectors. In 2020 International Joint Conference on Neural Networks (IJCNN’20). IEEE.
[17]
Jamie Hayes and George Danezis. 2017. Generating steganographic images via adversarial training. In International Conference on Neural Information Processing Systems.
[18]
Chi Kin Ho and Chang-Tsun Li. 2004. Semi-fragile watermarking scheme for authentication of JPEG images. In Proceedings of the International Conference on Information Technology: Coding and Computing, 2004 (ITCC’04). IEEE.
[19]
Mark J. Huiskes and Michael S. Lew. 2008. The MIR flickr retrieval evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval (MIR’08). ACM, New York, NY, USA.
[20]
Shehzeen Hussain, Paarth Neekhara, Brian Dolhansky, Joanna Bitton, Cristian Canton Ferrer, Julian McAuley, and Farinaz Koushanfar. 2022. Exposing vulnerabilities of deepfake detection systems with robust attacks. Digital Threats: Research and Practice (DTRAP) 3, 3 (2022), 1–23.
[21]
Shehzeen Hussain, Paarth Neekhara, Malhar Jere, Farinaz Koushanfar, and Julian McAuley. 2021. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). 3348–3357.
[22]
Shehzeen Hussain, Nojan Sheybani, Paarth Neekhara, Xinqiao Zhang, Javier Duarte, and Farinaz Koushanfar. 2022. FastStamp: Accelerating neural steganography and digital watermarking of images on FPGAs. In Proceedings of the 41st IEEE/ACM International Conference on Computer-aided Design (ICCAD’22). ACM, New York, NY, USA.
[23]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[25]
Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27]
Chunlei Li, Aihua Zhang, Zhoufeng Liu, Liang Liao, and Di Huang. 2015. Semi-fragile self-recoverable watermarking algorithm based on wavelet group quantization and double authentication. Multimedia Tools and Applications 74 (2015), 10581–10604.
[28]
Eugene T. Lin, Christine I. Podilchuk, and Edward J. Delp III. 2000. Detection of image alterations using semifragile watermarks. In Security and Watermarking of Multimedia Contents II. International Society for Optics and Photonics.
[29]
Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
[30]
Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, and Peyman Milanfar. 2020. Distortion agnostic deep watermarking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31]
Siwei Lyu. 2020. Deepfake detection: Current challenges and next steps. In 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW’20).
[32]
Yisroel Mirsky and Wenke Lee. 2021. The creation and detection of deepfakes: A survey. ACM Computing Surveys 54, 1, Article 7 (Jan. 2021), 41 pages.
[33]
Paarth Neekhara, Brian Dolhansky, Joanna Bitton, and Cristian Canton Ferrer. 2021. Adversarial threats to DeepFake detection: A practical perspective. In 2021 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’21). IEEE, 923–932.
[34]
Yuval Nirkin, Yosi Keller, and Tal Hassner. 2019. FSGAN: Subject agnostic face swapping and reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 7184–7193.
[35]
Augustus Odena, Vincent Dumoulin, and Chris Olah. 2016. Deconvolution and checkerboard artifacts. Distill 1, 10 (2016), e3.
[36]
Shelby Pereira and Thierry Pun. 2000. Robust template matching for affine resistant image watermarks. IEEE Transactions on Image Processing 9, 6 (2000), 1123–1129.
[37]
Shelby Pereira, Joseph J. K. O. Ruanaidh, Frederic Deguillaume, Gabriela Csurka, and Thierry Pun. 1999. Template based recovery of Fourier-based watermarks using log-polar and log-log maps. In Proceedings IEEE International Conference on Multimedia Computing and Systems. IEEE.
[38]
Thomas Porter and Tom Duff. 1984. Compositing digital images. In Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’84). ACM, New York, NY, USA.
[39]
R. O. Preda and D. N. Vizireanu. 2015. Watermarking-based image authentication robust to JPEG compression. Electronics Letters 51, 23 (2015), 1873–1875.
[40]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-assisted Intervention. Springer.
[41]
Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Niessner. 2019. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
[42]
Selim Seferbekov. 2020. https://github.com/selimsef/dfdc_deepfake_challenge
[44]
S. Shefali and S. M. Deshpande. 2007. Information security through semi-fragile watermarking. In International Conference on Computational Intelligence and Multimedia Applications (ICCIMA’07). IEEE.
[45]
Abdulaziz Shehab, Mohamed Elhoseny, Khan Muhammad, Arun Kumar Sangaiah, Po Yang, Haojun Huang, and Guolin Hou. 2018. Secure and robust fragile watermarking scheme for medical images. IEEE Access 6 (2018), 10269–10278.
[46]
Richard Shin and Dawn Song. 2017. JPEG-resistant adversarial images. In NIPS 2017 Workshop on Machine Learning and Computer Security.
[47]
Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012).
[48]
Rui Sun, Hong Sun, and Tianren Yao. 2002. A SVD- and quantization based semi-fragile watermarking technique for image authentication. In 6th International Conference on Signal Processing, 2002. IEEE.
[49]
Matthew Tancik, Ben Mildenhall, and Ren Ng. 2020. StegaStamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2387–2395.
[50]
Justus Thies, Michael Zollhofer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2387–2395.
[51]
Run Wang, Felix Juefei-Xu, Meng Luo, Yang Liu, and Lina Wang. 2021. FakeTagger: Robust safeguards against deepfake dissemination via provenance tracking. In Proceedings of the 29th ACM International Conference on Multimedia (MM’21). ACM, New York, NY, USA, 3546–3555.
[52]
Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. 2020. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8695–8704.
[53]
Jun Xiao and Ying Wang. 2008. A semi-fragile watermarking tolerant of Laplacian sharpening. In 2008 International Conference on Computer Science and Software Engineering. IEEE.
[54]
Hengfu Yang, Xingming Sun, and Guang Sun. 2010. A semi-fragile watermarking algorithm using adaptive least significant bit substitution. Information Technology Journal 9 (2010), 20–26.
[55]
Yuankun Yang, Chenyue Liang, Hongyu He, Xiaoyu Cao, and Neil Zhenqiang Gong. 2021. FaceGuard: Proactive deepfake detection. CoRR abs/2109.05673. https://arxiv.org/abs/2109.05673
[56]
Ning Yu, Vladislav Skripniuk, Sahar Abdelnabi, and Mario Fritz. 2021. Artificial fingerprinting for generative models: Rooting deepfake attribution in training data. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
[57]
Xiaoyan Yu, Chengyou Wang, and Xiao Zhou. 2017. Review on semi-fragile watermarking algorithms for content authentication of digital images. Future Internet 9, 4 (2017), 56.
[58]
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, and Victor Lempitsky. 2019. Few-shot adversarial learning of realistic neural talking head models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 9459–9468.
[59]
Ru Zhang, Shiqi Dong, and Jianyi Liu. 2019. Invisible steganography via generative adversarial networks. Multimedia Tools and Applications 78 (2019), 8559–8575.
[60]
Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[61]
Yujie Zhong, Relja Arandjelović, and Andrew Zisserman. 2018. Compact deep aggregation for set retrieval. In Workshop on Compact and Efficient Feature Representation and Learning in Computer Vision (ECCV’18).
[62]
Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. 2018. HiDDeN: Hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV). 657–672.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 11 (November 2024), 702 pages. EISSN: 1551-6865. DOI: 10.1145/3613730.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 12 September 2024
Online AM: 13 January 2024
Accepted: 26 December 2023
Revised: 19 November 2023
Received: 01 April 2023
Published in TOMM Volume 20, Issue 11

Author Tags

  1. Media forensics
  2. Deepfakes
  3. watermarking
  4. semi-fragile watermarking
  5. video watermarking

Qualifiers

  • Research-article
