1 Introduction
Machine learning (ML) techniques are employed in multiple fields, such as speech recognition [22], computer vision [35], natural language processing [40], and anomaly detection [49]. Nowadays, machine learning solutions are also widely used to detect banking frauds [4, 7, 14, 15, 18, 37, 47, 61]. Unfortunately, machine learning models are vulnerable to adversarial machine learning (AML) attacks [6, 54]. For example, an attacker may craft “adversarial examples,” i.e., maliciously perturbed inputs with additional non-random noise that the machine learning model confidently misclassifies. Such examples can be exploited to systematically evade the classifier (evasion attacks) and, if included in the training set, may degrade the performance of the learned model (poisoning attacks) [12]. Interestingly, adversarial examples crafted against a model are usually effective against similar models. This phenomenon, referred to as the “transferability property” [21, 51], suggests that a malicious agent can craft adversarial examples to conduct an attack without comprehensive knowledge of the target system. The good news is that the success of adversarial examples can be reduced via defense techniques such as adversarial training [26], anomaly detection [44], and feature distillation [39], or through statistical tests [27].
Only a small fraction of research work demonstrates the feasibility of AML attacks and defenses in the fraud detection domain. Among them, Carminati et al. [17] show how a determined attacker, with limited resources and standard machine learning algorithms, can build a surrogate of the original fraud detection system (FDS) and use it to evaluate carefully crafted frauds before submitting them on behalf of their victims. Other approaches solve an optimization problem to generate transactions that mimic the victim's behavior [16] or adapt existing solutions from the image recognition domain [19]. Since an attacker can only indirectly interact with the target model, traditional approaches are hardly applicable to this context. The same problem also affects the proposed countermeasures. Existing mitigation strategies in computer vision require assumptions that do not hold for AML attacks against FDSs. In particular, adversarial training may not generalize to our problem: fraudulent transactions that evade the classifier may not resemble adversarial examples, because they are not obtained by directly adding noise in feature space. Other mitigations for poisoning attacks involve outlier detection mechanisms [44, 50]. To directly identify poisoned samples in the training set before the ML model is trained, they apply label sanitization algorithms on the training dataset [45] and robust learning algorithms that detect input samples degrading the performance of the model [30]. In the attacks against FDSs, malicious transactions are not outliers with respect to legitimate transactions, but inliers crafted to mimic the original behavior of the victim. Furthermore, the attacker cannot tamper with the training dataset of the FDS, since the labels of the transactions are directly assigned by the financial institution. This demonstrates the importance of deepening the research in the fraud detection field from the point of view of both attacking and defensive strategies.
In this article, we study the application of poisoning attacks and their possible countermeasures in the financial fraud detection context. We propose ① a novel approach to craft poisoning samples that adapts and expands existing solutions, overcoming the challenges of the domain under analysis; ② a novel defense strategy in the form of an adversarial data augmentation scheme, directly inspired by adversarial training [26], for reducing the probability of successful evasion of the FDS classifier.
The proposed AML attack is based on a hybrid combination of machine learning techniques and parametric decision rules, working together to generate and validate evasion and poisoning samples, which are iteratively refined to reduce their suspiciousness. First, we generate candidate adversarial transactions by exploiting parametric decision rules (i.e., heuristics) that consider the specific constraints of the fraud detection domain. Then, we use a machine learning-based Oracle, which simulates the target fraud detection system, to validate the generated frauds. After that, we test the frauds validated by the Oracle against the detection system under attack, aiming to evade it. If the target fraud detection system flags the adversarial examples as legitimate, then it will integrate them into its training set, poisoning its model and shifting its decision boundary in favor of the attacker—e.g., allowing the attacker to progressively increase the stolen amount of funds during the poisoning attack. We model an attacker with different degrees of knowledge of the target system: perfect knowledge (white-box), partial knowledge (gray-box), and no knowledge (black-box). We also design different attack strategies, which define the attacker's behavior, i.e., the number, amount, and nationality of the injected frauds and the speed of the poisoning process. Finally, we study different update policies—i.e., how often the models are retrained to include new data.
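For clarity, the following minimal Python sketch summarizes one round of this attack loop. All names and signatures (generate_candidates, oracle_is_legitimate, fds_is_legitimate) are illustrative placeholders under our assumptions, not the authors' implementation.

from typing import Callable, Iterable, List, Tuple

def poisoning_attack_round(
    victims: Iterable[object],
    generate_candidates: Callable[[object], List[dict]],
    oracle_is_legitimate: Callable[[dict], bool],
    fds_is_legitimate: Callable[[dict], bool],
) -> Tuple[List[dict], List[Tuple[dict, str]]]:
    """One round of the attack loop described above (illustrative sketch)."""
    evaded, poison = [], []
    for victim in victims:
        # 1. Domain-aware heuristics propose candidate frauds that mimic the victim.
        candidates = generate_candidates(victim)
        # 2. The surrogate Oracle filters out candidates it would flag as fraudulent.
        stealthy = [tx for tx in candidates if oracle_is_legitimate(tx)]
        # 3. Frauds the target FDS misclassifies are stored with a "legitimate"
        #    label and will poison its next training phase.
        for tx in stealthy:
            if fds_is_legitimate(tx):
                evaded.append(tx)
                poison.append((tx, "legitimate"))
    return evaded, poison

After each retraining of the FDS on the poisoned data, the attacker can repeat the round with larger fraud amounts, which is the boundary-shifting effect described above.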
The proposed defense strategy aims at countering adversarial attackers by training the target detection system to recognize their stealthy patterns. First, we simulate adversarial attacks by generating artificial frauds that mimic the user's behavior through simple heuristics. Then, we use the FDS to classify them, similarly to what the attacker does with the Oracle. Finally, we add the misclassified samples to the training dataset and retrain the FDS model. By adopting such a strategy, we show examples of adversarial frauds to the model before the attack occurs, anticipating the attacker's behavior. In particular, to replicate the adversarial frauds submitted by the attacker, we modify random original legitimate transactions based on prior knowledge of the input features that the attacker can directly control while committing fraud.
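A minimal sketch of this adversarial data augmentation step, under the same caveat as above (function names and signatures are assumptions, not the authors' code):

from typing import Callable, Iterable, List, Tuple

def adversarial_augmentation(
    legitimate_transactions: Iterable[dict],
    perturb_controllable_fields: Callable[[dict], dict],
    fds_is_legitimate: Callable[[dict], bool],
) -> List[Tuple[dict, str]]:
    """Build artificial adversarial frauds to add to the FDS training set.

    perturb_controllable_fields mimics an attacker by modifying only the
    fields a fraudster controls (e.g., amount and timestamp)."""
    augmentation = []
    for tx in legitimate_transactions:
        artificial_fraud = perturb_controllable_fields(tx)
        # Keep only the artificial frauds the current FDS misclassifies:
        # these are the blind spots the retrained model should learn.
        if fds_is_legitimate(artificial_fraud):
            augmentation.append((artificial_fraud, "fraud"))
    return augmentation

The retrained FDS thus sees correctly labeled examples of the stealthy, inlier-like frauds it previously missed.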
Using two real anonymized bank datasets with only legitimate transactions, augmented with synthetic frauds, we show how a malicious attacker can compromise state-of-the-art fraud detection systems by deploying adversarial attacks. Our attack—with no mitigation—achieves different outcomes, depending on the experimental settings. Unsurprisingly, our attacks go unnoticed in the white-box scenario, and the attacker steals up to a million euros from 30 pseudo-randomly selected victims. In the more realistic gray-box and black-box scenarios, the attacks are detected between 55% and 91% of the time. However, the attacks last, on average, enough time—up to a couple of months—to steal large amounts of money from victims' funds (on average, between € 41,995 and € 388,342). Our defense strategy mitigates such attacks by detecting most adversarial frauds placed by the attacker at their first evasion attempt. When the attacker has limited knowledge of the system, we can sometimes detect all the frauds and completely stop the attack. In worse cases, we can still detect a sufficient number of frauds and decrease the average stolen amount by 31.89% to 95.55%. In the full-knowledge scenario, where the attacker is aware of all the system details and its mitigation, our countermeasure still reduces the overall stolen amount by between 18.12% and 84.26% with respect to the base configuration of the system. The only exception is the active learning (AL)-based model, a variant of References [57] and [37], for which we record lower improvements and even some cases of deterioration. In brief, our results show that we thwart the attempts of the attacker to find new evasive transactions and, therefore, force them to search for more complex strategies to evade classification. In addition, we evaluate the performance tradeoff between the detection of adversarial transactions and the increase of the false positive rate, varying the number of artificial frauds injected during training with our countermeasure. We observe a collateral increase of false positives, between 13.37% and 82.72%, depending on the ML model used by the FDS. Finally, we compare the proposed defense technique with state-of-the-art mitigations based on anomaly detection [44] and adversarial training [26]. Our solution outperforms these mitigations, which achieve no significant improvement against adversarial attacks and, in some cases, even slightly degrade the system's performance.
The main contributions of this article are:
– A novel approach based on a hybrid combination of machine learning techniques and parametric decision rules to perform poisoning attacks against fraud detection methods under different degrees of attacker's knowledge. To the best of our knowledge, this is the first work about poisoning attacks in the fraud detection context. We evaluate our attack against state-of-the-art fraud detection systems, simulating different fraudsters' strategies.
– A novel plug-and-play and non-intrusive adversarial-training-based approach to mitigate adversarial machine learning attacks in the fraud detection domain.
The remainder of the article is structured as follows: In Section 2, we review the theoretical background required to understand the problem statement; then we present the given problem, the state-of-the-art solutions, and the challenges that we must overcome. In Section 3, we report the threat model of the poisoning attack. In Section 4, we provide an overview of the banking datasets used for our experimental validation, while in Section 7.2, we describe the fraud detection systems under attack. In Sections 5 and 6, respectively, we describe in detail our attack and mitigation approaches. In Section 7, we describe the experimental validation process and discuss the obtained results. In Section 8, we describe the main limitations of our approach and give possible directions for future research work. In Section 9, we summarize the results of our work and draw conclusions.
2 Background and Related Works
Along with electronic payments, Internet banking fraud keeps increasing in volume and value each year, resulting in considerable financial losses for institutions and their customers [33]. Among the different typologies of fraud, banks consider large-scale cyberattacks as the most dangerous threat [33]. Financial institutions face well-organized—and sometimes even state-backed—cybercriminal groups, who are responsible for digital heists [53]. Malicious actors also foster a virtual underground economy [25] on the dark web, where they sell malware tools and customers' private information. Fraud is a costly phenomenon for financial institutions, which estimate that they recover less than 25% of the economic losses, leading to the conclusion that fraud prevention is essential [33]. Typical schemes of Internet banking fraud are information stealing and transaction hijacking [16]. In information stealing, the fraudster steals the credentials and other relevant information from their victim, like a one-time password (OTP) code, to connect to their account and perform fraudulent transactions. In this case, the connection is established on the attacker's device. With the transaction hijacking scheme, the attacker takes over legitimate transactions made by the victim and redirects them to controlled bank accounts. This scheme is more challenging to identify because the connection originates on the victim's device. Common means that fraudsters exploit for Internet banking fraud are phishing, the practice of deceiving a victim by presenting them with a highly credible fraudulent website, usually as an alternate version of a legitimate one [23]; banking trojans, a class of malware purposely designed to steal credentials and financial information from infected devices; and social engineering, the practice of manipulating individuals into divulging sensitive information to a malicious actor [36].
2.1 Banking Fraud Detection Systems
As manual inspections of the entire flow of banking transactions by human experts may require an unreasonable amount of resources, researchers and experts in the cybersecurity field have produced automated tools, fraud detection systems (FDSs). Such tools efficiently distinguish fraudulent behaviors from legitimate ones based on historical data. Fraud detection is essential for a variety of domains, such as banking fraud [2, 14, 15, 20], credit card transactions [31, 43], and e-Commerce fraud [41]. Research in banking fraud detection is restrained by the lack of publicly available datasets and privacy-related issues [15].
In the literature, there are examples of FDS solutions based on supervised ML algorithms, such as neural networks (NNs) [7, 13, 43], support vector machines (SVMs) [8, 34, 48], random forest (RF) [4, 8, 34, 58, 59], logistic regression (LR) [4, 29, 58], Extreme Gradient Boosting (XGBoost) [61], and hierarchical attention mechanisms [2]. There are also examples of unsupervised solutions [41] and works that adopt hybrid approaches, combining different techniques to provide multiple perspectives on the same problem [13, 15, 37, 57]. In past years, the banking fraud detection problem has also been cast as a pseudo-recommender-system problem. We refer the reader to [1, 3] for more information on fraud detection systems.
2.2 AML Attacks against Fraud Detection Systems
Banking fraud detection systems, like other ML-based solutions, are vulnerable to adversarial machine learning attacks. AML is an emerging field that studies machine learning under the condition that an adversary tries to disrupt the correct functioning of a learning algorithm [28]. The goal of AML is to produce robust learning models, i.e., models capable of resisting the opponent. Popular kinds of AML attacks are evasion and poisoning attacks. In evasion attacks, smart attackers craft examples that evade classification at inference time [9]. In poisoning attacks, attackers craft or modify examples in the training dataset, causing the ML algorithms to learn poor-performing models [11, 12]. The success of these approaches has been mostly proven in the domain of image classification [26, 55]. However, in the context of fraud detection, such approaches hardly adapt, since samples are manipulated and evaluated with aggregated features and not with their direct features. An attacker can only submit raw transactions to the banking system, where the only input is represented by a few attributes, e.g., the amount, the recipient's IBAN, and the time instant of the operation execution [17]. The aggregated features on which the FDS evaluates the transactions are computed using historical data of the banking customers. Therefore, as directly operating on aggregated examples may lead to unfeasible sequences of transactions, the attacker has to find the sequences of raw adversarial transactions that, once aggregated, resemble the desired aggregated adversarial example.
In the fraud detection context, AML attacks pose a direct threat in terms of economic damage to financial institutions and their customers. Moreover, in the literature, only a few research works demonstrate AML attacks against FDSs [16, 17, 19].
These systems are vulnerable to mimicry attacks [16], in which the attacker disguises their frauds as legitimate transactions to avoid alerting the detector. The attacker infects their victims' devices through different means (see Section 2), observes their spending patterns, and uses such information to mimic them (i.e., reproduce their behavior). Meanwhile, the attacker tries to maximize their illegitimate profit while remaining undetected. This attack can be formulated as an optimization problem [16].
FDSs are also vulnerable to evasion attacks, as shown by Carminati et al. [17]. Their approach works as follows: The attacker, depending on their knowledge of the target FDS in terms of features, dataset, learning algorithm, parameters/hyperparameters, and past data of their victims, builds a surrogate of the real FDS, called Oracle. After observing their victims' behavior, the attacker crafts stealthy frauds to be submitted on their victims' behalf by choosing the least suspicious timestamp and amount. Finally, the attacker aggregates such transactions with the victims' data at their disposal, classifies them using their Oracle, and sends only the fraudulent transfers classified as legitimate. The described attack results in an integrity violation.
Cartella et al. [19] adapt state-of-the-art adversarial attacks to tabular data with custom constraints on non-editable input variables of transactions, decision thresholds, and loss function. The authors also address the imperceptibility of the adversarial samples, i.e., the similarity of the generated adversarial transactions to regular ones to the eye of a human operator, by evaluating their distance from the original samples with a custom norm.
2.3 AML Mitigations
In the past few years, various mitigations have been proposed against AML attacks. Given the scope of this work, we focus on the defense mechanisms against poisoning attacks. We refer the reader to References [5, 38, 60] for a detailed overview of mitigations against evasion attacks.
The setting of poisoning attack mitigations can be modeled as a game between two players [50]: the attacker and the defender. The former wants the ML algorithm to learn a bad model, and the latter tries to learn the correct one. Training on a poisoned dataset means that the defender has failed. Defenses against poisoning can be fixed, if they do not depend on the generated poisonous data, or data-dependent otherwise. In practice, unless the defender knows the true distribution that generates poisonous data, fixed defenses are unfeasible. In general, such defenses require strong assumptions on the attacker's goal and the procedure that generates the poisoning samples [24, 30, 44, 45]. The approach proposed by Paudice et al. [44] consists in building a distance-based anomaly detector to find poisoning data using only a small fraction of trusted data points, the trusted dataset. Their solution cannot be applied to our approach, since one of its main assumptions does not hold against our attack: adversarial transactions are not outliers with respect to the users' regular behaviors [17]. Other approaches [45] work under the assumption that the attacker can directly control the labels in the training dataset of the FDS, which also does not hold in our case. Jagielski et al. [30] propose an algorithm for regression with high robustness against a wide class of poisoning attacks, in particular poisoning attacks formalized as a bi-level optimization problem and solved with gradient ascent. The algorithm removes points with large residuals, focusing on inliers, poisoned points with a distribution similar to that of the training dataset. Adversarial transactions are inliers that are meant to degrade the performance of the model only for a particular class of users, i.e., the victims of the attack. Furthermore, the attacker's poisoning process may be so slow that observing a significant performance reduction would require collecting a large number of adversarial transactions over a long period of time. The economic damage posed by fraud requires our approach to reject adversarial transactions as soon as possible, to avoid losing large amounts of capital to the fraudster. Consequently, the approach proposed by Jagielski et al. [30] may not be directly applied to this research work.
Adversarial training, introduced by Goodfellow et al. [26], is a mitigation approach that reduces the success of evasion by adversarial examples against deep convolutional neural networks. The principle of adversarial training is to teach the model how to recognize adversarial examples by encoding a procedure that generates adversarial examples, such as the Fast Gradient Sign Method (FGSM), within the training algorithm. The authors achieve this goal by including an adversarial regularization term in the loss function of the ML model.
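In the formulation of Goodfellow et al. [26] (restated here in standard notation, which may differ from the notation used elsewhere in this article; \(J\) is the original training loss, \(\alpha\) a weighting hyperparameter, and \(\epsilon\) the perturbation budget), the adversarially regularized objective is:
\[
\tilde{J}(\theta, x, y) = \alpha\, J(\theta, x, y) + (1-\alpha)\, J\!\left(\theta,\; x + \epsilon\, \operatorname{sign}\!\big(\nabla_{x} J(\theta, x, y)\big),\; y\right).
\]
The second term evaluates the loss on the FGSM-perturbed version of each input, so the model is explicitly penalized when it misclassifies such adversarial examples.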
Geiping et al. [24] extend such a framework and propose a generic approach for defending against training-time poisoning attacks. During each iteration of batch gradient descent, the drawn batch of samples is split into two sets with probability p. Then, a data poisoning attack is applied to one of the sets until its samples are reclassified with the desired label. Last, the batches are merged into a single one, the poisoned samples keep their correct labels, and the model is trained as usual. The proposed approach proves effective in image classification, but there is no direct way to apply such a strategy to our domain and to models that cannot be optimized with gradient descent. Furthermore, Bai et al. [5] examine the generalization capability of adversarial training from three perspectives: standard generalization, adversarially robust generalization, and generalization to unseen attacks. With respect to the first two properties, the authors argue that adversarial training falls short because of its performance tradeoff on regular examples and the tendency of adversarially trained models to overfit on perturbed examples of the training set. However, the last property is the one that interests us the most: adversarial training generalizes poorly to new, unseen attacks. This suggests that the attacks developed so far do not represent the space of all possible perturbations; because of this limitation, a model trained with adversarial regularization should be able to resist the evasion of examples generated with FGSM, but not to recognize the adversarial transactions generated according to our attack approach.
3 Threat Model
To properly define the scope of our work, we identify the threat model according to a framework commonly used in the literature [6, 10, 12, 21]. This framework defines the attacker's goals, their knowledge of the system under attack, and their influence over the input data, i.e., which manipulations are allowed.
3.1 Attacker’s Goal
AML attacks may violate different security properties of the system, manipulate a different set of samples, and have different influences on the target algorithm. An attacker may bypass the defending system, thus gaining access to the services or the resources protected (integrity violation), or compromise the system functionalities for everyone, including legitimate users (availability violation), or retrieve confidential information from the learning algorithm, such as user personal data (privacy violation). The attack specificity property refers to which samples the attacker wants the model to misclassify. They may either misclassify a small set of selected samples (targeted attack) or find and then exploit misclassified samples starting from any possible set (indiscriminate attack). The influence refers to the impact the attacker has on the classifier itself. In a causative attack, the attacker interacts with the training set. They directly influence the learning algorithm by changing its decision boundaries. Instead, in an exploratory attack, the attacker only interacts with the test set. They use their knowledge of the target classifier (already trained) to craft samples that are misclassified at test time. The error specificity refers to what kind of misclassification the attacker is interested in and is relevant only in multi-class scenarios. The objective can either be to classify the sample to any class different from the true one (generic misclassification) or to classify the sample as a specific target class, different from the true one (specific misclassification).
In our work, the threat model is defined as follows: The attacker performs a causative poisoning attack, which is an integrity violation. The attack is targeted against some specific users (i.e., the ones that had their credit card information stolen and/or have installed a Trojan), but usually, the information-gathering process is generic (e.g., whoever falls for a phishing campaign). The classification is binary: The error specificity corresponds to performing fraudulent transactions on behalf of a legitimate user and having them accepted as legitimate.
3.2 Attacker’s Knowledge
We model the attacker's knowledge of the target FDS as a tuple \(\theta = (F, A, w, D, U, P)\), where F represents the knowledge of the machine learning model features; A the knowledge of the particular learning algorithm; w the knowledge of the parameters/hyperparameters of the model; D the knowledge of the model training set; U the knowledge of the past transactions of the victims; and P the knowledge of the update policy of the FDS, i.e., the time interval between each new training phase of the system. The poisoning attack unfolds in different scenarios, depending on the attacker's knowledge of the system: white-box, gray-box, and black-box. We identify the type of knowledge of each variable with the following symbols: x refers to full knowledge, \(\tilde{x}\) to partial knowledge, and \(\hat{x}\) to no knowledge.
Black-box. The black-box scenario represents the best case for the defender, the one in which the attacker has zero knowledge of the target system. They only have at their disposal some of the victims' past transactions, which they use to compute aggregated features. The tuple describing this scenario is \(\theta _{bb} = (\hat{F},\hat{A},\hat{w},\hat{D},\tilde{U},\hat{P})\).
Gray-box. In this scenario, the attacker has only partial knowledge of the target system. They know the features used by the learning algorithm, its training update policy, and some past data of the victims but have no knowledge of the actual algorithm, parameters/hyperparameters, and training dataset. The tuple describing this scenario is \(\theta _{gb} = (F,\hat{A},\hat{w},\hat{D},\tilde{U},P)\).
White-box. The white-box scenario, instead, represents the worst case for the defender: The attacker has full knowledge of the target system. This scenario is hardly achievable in reality, considering the vast security measures applied by banks, but it provides an overview of the worst-case economic damage. The tuple describing this scenario is \(\theta _{wb} = (F,A,w,D,U,P)\).
3.3 Attacker’s Capability
The constraints on the attacker's actions and data manipulation within the target system's data processing pipeline are defined by the attacker's capabilities. In the context of banking fraud detection, the attacker can observe the banking movements of their victims and submit transactions on their behalf or hijack legitimate ones toward a controlled account. These assumptions are realistic if we consider the possible technological means at their disposal. As for data manipulation, the attacker can place examples in the FDS training dataset, but they cannot control all of the input and aggregated features. Aggregated features of the transactions are out of the attacker's control: the FDS automatically aggregates the transactions with past user data. The attacker can place an arbitrary number of transactions against a victim and, for each of them, they control the input amount and the timestamp, i.e., the time instant of the banking transfer execution. The attacker also has no label influence, meaning that they cannot control the labels of their fraudulent transactions in the FDS dataset, which are instead assigned by the system itself.
7 Experimental Evaluation
In this section, we present the experimental setting and discuss the final results. We describe the preliminary steps of data augmentation with synthetic frauds to build a dataset close to a real-world scenario with multiple fraudulent campaigns. Using the augmented dataset, we perform the feature engineering and optimization steps for the fraud detection systems targeted by our attack. In addition, we evaluate each ML model against the synthetic fraudulent scenarios and select the attacker’s Oracle. Finally, we present the metrics that we use for the evaluation of our approaches and the final results. We perform four experiments. First, we evaluate the effectiveness of the poisoning attack against the fraud detection systems, considering the different attacking strategies and knowledge scenarios. Then, we evaluate the impact of the number of artificial adversarial transactions injected in the training set on the classifier performances. Finally, we test our mitigation against the AML attacks, comparing its performance with state-of-the-art mitigations.
7.1 Dataset Augmentation with Synthetic Frauds
Real banking datasets are highly imbalanced: frauds represent around 1% of the entire dataset [15]. To replicate a real banking dataset, we augment our datasets, briefly described in Section 4, with fraudulent artificial transfers generated by a procedure validated by banking domain experts [15]. In our procedure, we group users of the datasets in different sets of "banking profiles," depending on the number of banking transfers recorded and the average volume of expenses. We randomly pick an equal number of victims from the three profiles and generate 1% of fraudulent transactions. We simulate the two most common real attack schemes: information stealing and transaction hijacking, briefly described in Section 2. The main difference between the two schemes is that, with the latter, the fraudster uses the user's connection, so frauds will show as transactions having the same IP, SessionID, and ASN_CC as the legitimate transactions initially submitted. We generate national transactions (i.e., IBAN_CC is "IT") only in 40% of the cases. We also simulate different strategies for our fraudsters. We model these strategies by selecting different values for three parameters: the amount of each transaction, the count (i.e., the number of frauds against a victim), and the attack duration. The fraudster may prioritize short-term gains, selecting a high amount or a high count in a short duration. They may also adopt an opposite approach, performing long-term attacks and committing multiple low-amount transactions over time. We also assume that, to avoid detection, the fraudster may study the victim's spending behavior and stealthily craft frauds that try to mimic it. Last, with a single fraud attack, the fraudster performs only one high-value transfer during their attack. Using the aforementioned technique, we create artificial frauds for our datasets. Finally, we merge real and artificial data into two datasets, DA2012_13 and DA2014_15, briefly described in Table 2. DA2012_13 contains real legitimate transactions from DO2012_13 and synthetic frauds; DA2014_15 contains real legitimate transactions from DO2014_15a, real frauds from DA2014_15b, and synthetic frauds.
7.2 Modelling Target Fraud Detection Systems
We evaluate our attack and mitigation approaches against six fraud detection systems, built on top of the most common algorithms used in the literature for fraud detection: logistic regression (LR) [4, 29, 58], support vector machine (SVM) [8, 34, 48], random forest (RF) [4, 8, 34, 58, 59], neural network (NN) [7, 13, 43], Extreme Gradient Boosting (XGBoost) [61], and a variant of an active learning (AL) system [37, 57]. We follow a system-centric approach [16] by training a supervised machine learning model to recognize anomalous global patterns from aggregated transactions. We show in Table 3 the results of our performance evaluation of the experimental FDSs.
Feature Engineering and Aggregated Features. From the augmented datasets, we calculate the aggregated datasets, which comprise a set of direct features and aggregated features. Direct features are obtained from the input features of each sample, while aggregated features are obtained by aggregating transactions with past legitimate transactions of the same user. The direct features are:
– amount: no transformation from the original attribute Amount;
– time_{x,y}: cyclic encoding of the time of the transaction execution, directly calculated from the Timestamp attribute. Using sine and cosine transformations, time is encoded in two dimensions: time_x and time_y. This encoding solves the distance calculation between hours directly indicated with a number in the range \([0, 24)\). For example, the distance between 23 and 22 is \(23-22=1\), but the distance between midnight and 23 is \(0-23=-23\). We obtain the encoding as shown after this list;
– is_national_iban: a Boolean value indicating whether the beneficiary IBAN has the same nationality as the online bank (i.e., the IT country code);
– is_international: a Boolean value indicating whether the beneficiary IBAN has the same nationality as the customer;
– confirm_sms: a Boolean value indicating whether the transaction requires an SMS message for confirmation.
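The standard sine/cosine encoding consistent with the description above, with \(h \in [0, 24)\) the hour of the transaction execution (the exact time granularity used by the authors may differ), is:
\[
\mathit{time}_x = \sin\!\left(\frac{2\pi h}{24}\right), \qquad \mathit{time}_y = \cos\!\left(\frac{2\pi h}{24}\right).
\]
With this encoding, midnight and 23:00 are mapped to nearby points on the unit circle, so their distance correctly reflects a one-hour gap.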
Before proceeding with the aggregated features, we define three sets: group, function, and time.
• group is the set of original attributes composed of IP, IBAN, IBAN_CC, ASN_CC, and SessionID;
• function is the set of operations composed of count, sum, mean, and std, where:
– count returns the number of the given instances;
– sum returns the sum of the amounts of the transactions;
– mean returns the average amount of the given transactions;
– std returns the standard deviation of the transaction amounts;
• time is the set of possible time spans 1h, 1d, 7d, 14d, and 30d, respectively indicating 1 hour, 1 day, 7 days, 14 days, and 30 days.
Considering group, function, and time as values taken from the corresponding sets, the aggregated features are:
– group_function_time: obtained by grouping past user transactions by the given group attribute, then sliding a time window of length time and applying function on the resulting set of transactions. For example, iban_count_1d is the aggregated feature that indicates the count of transactions in the past 24 hours toward the same IBAN.
– time_since_same_group: time elapsed in hours since the last transaction made by the same user and toward the same group attribute value. For example, time_since_same_ip is the time elapsed in hours from the last transaction executed with the same IP address.
– time_from_previous_trans_global: time elapsed in hours since the last transaction made by the same user.
– difference_from_group_meantime: difference between the amount of the current transaction and the mean amount of the transactions within a time window of length time and toward the same group attribute value.
– is_new_group: a Boolean value that indicates whether the user is submitting a transaction toward the value of the given group attribute for the first time. For example, is_new_asn_cc indicates whether it is the first time a user has connected from a certain ASN and associated country code.
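As an illustration of how such aggregates can be derived from a user's transaction history, the following minimal pandas sketch computes iban_count_1d. Column names and the aggregation pipeline are assumptions for this example, not the authors' implementation.

import pandas as pd

def iban_count_1d(history: pd.DataFrame, new_tx: pd.Series) -> int:
    """Count the user's transactions toward the same IBAN in the 24 hours
    preceding the new transaction (illustrative only)."""
    window_start = new_tx["timestamp"] - pd.Timedelta(hours=24)
    past = history[
        (history["iban"] == new_tx["iban"])
        & (history["timestamp"] >= window_start)
        & (history["timestamp"] < new_tx["timestamp"])
    ]
    return len(past)

# Example usage with a toy history of one user's past transfers.
history = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-03-01 08:00", "2015-03-01 18:30", "2015-02-20 09:00"]),
    "iban": ["IT60X054281101", "IT60X054281101", "DE44500105175407"],
    "amount": [120.0, 80.0, 300.0],
})
new_tx = pd.Series({"timestamp": pd.Timestamp("2015-03-02 09:00"),
                    "iban": "IT60X054281101", "amount": 450.0})
print(iban_count_1d(history, new_tx))  # -> 1 (only the 18:30 transfer falls within the last 24 h)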
Update Policy and Concept Drift. The simulation of the poisoning attack covers a period of two months of incoming transactions. We design our FDS to deal with concept drift, intended as the changes of the customers' spending power over time, by adopting two different solutions. First, we include new data in batches at two fixed time intervals of, respectively, one and two weeks, which are chosen before the simulation of the attack. We refer to the intervals as weekly and bi-weekly update policies. Then, we assign discount weights to each example in the dataset. The exponential discount function increases the importance of the most recent transactions. Given t, the time difference in hours between the timestamp of the training phase and the timestamp of the transaction, and a constant \(k = 4380h\) (\(0.5y\)), we assign transaction weights as follows: \(w_{\text{transaction}} = e^{-\frac{t}{k}}\).
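As a worked example of this weighting: a transaction recorded 720 hours (30 days) before a training phase receives weight \(e^{-720/4380} \approx 0.85\), while a transaction from half a year earlier receives \(e^{-4380/4380} = e^{-1} \approx 0.37\).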
Feature Selection and Hyperparameter Tuning. We split dataset DA2014_15 (see Section 4) into a training and a test set. The resulting test set comprises the last two months of transactions, accounting for 35.76% of the original dataset. We use the training set to select the hyperparameters and feature sets of the supervised models for fraud detection under attack. We start by reducing the large initial feature space, which accounts for 174 direct and aggregated features, with a filter method for feature selection: from each pair of highly correlated features, we exclude the one with lower correlation to the target variable. Then, for each model, we search for an initial optimal set of hyperparameters, following a grid search approach. In particular, as a validation strategy, we minimize the cross-validation error on the training set split into 3 folds of increasing size. Naively assessing the generalization performance of our models in terms of standard accuracy, given the high class imbalance of our datasets, may lead to incorrect model choices: a model that always outputs the legitimate class label for every test sample scores an accuracy close to 99% [13]. Another remark is that false positives and false negatives do not bring the same cost to the financial institution: the highest damage is brought by undetected frauds, not by false alarms [37]. We solve this problem by evaluating the performance of the FDSs with a custom performance metric inspired by other works [37, 58], which we refer to as C-Accuracy and which builds on the definitions of Cost [58] and Normalized Cost [37]. This metric drastically increases the weight associated with the correct classification of frauds (false negatives, true positives) with respect to legitimate transactions and estimates the costs saved by the financial institution. Instead of arbitrarily setting the value of k, the weight of false negatives, we empirically estimate it as the ratio of legitimate transactions over frauds, so that the metric resembles a balanced accuracy: \(k = N_{\text{legit}} / N_{\text{fraud}}\).
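One possible formalization consistent with this description (stated purely as an illustrative assumption, not necessarily the exact definitions of Cost [58], Normalized Cost [37], or the authors' C-Accuracy) is:
\[
\text{Cost} = \text{FP} + k \cdot \text{FN}, \qquad
\text{Cost}_{\text{norm}} = \frac{\text{FP} + k \cdot \text{FN}}{\text{TN} + \text{FP} + k\,(\text{TP} + \text{FN})}, \qquad
\text{C-Accuracy} = 1 - \text{Cost}_{\text{norm}},
\]
so that every missed fraud weighs as much as \(k\) misclassified legitimate transactions.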
Then, we run an additional round of feature selection with a wrapper method to further reduce the dimensionality of the feature space and obtain unique feature sets for each FDS. We finally optimize the model hyperparameters on the final feature sets.
According to our selection steps, we obtain five different FDS models, with five different feature sets, as shown in Table 4. Logistic regression (LR) uses L2 regularization with \(C = \frac{1}{\lambda } = 5.46\). The neural network (NN) model is a feed-forward neural network composed of multiple dense layers. The first input layer has a fixed dimension given by the selected input features (see Table 4). There are two hidden layers, each comprising 32 neurons with the "tanh" activation function. A dropout layer with a dropout rate of 0.30 is placed between the hidden layers. The last layer is responsible for the binary classification task, containing a single neuron activated by the sigmoid activation function. The random forest (RF) model is composed of 40 decision trees with a maximum depth of 5 and the "entropy" criterion. The SVM has a linear kernel with the squared hinge as loss function and \(C = \frac{1}{\lambda } = 0.28\). The Extreme Gradient Boosting (XGBoost) model uses 32 decision trees as base learners with a maximum depth of 2 and a learning rate of 0.4. Our active learning (AL) model adopts an ensemble of two models, a supervised and an unsupervised method: we use the previously described random forest model alongside an autoencoder for the unsupervised part.
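For concreteness, a minimal sketch of the described feed-forward network, assuming a Keras implementation (the article does not state the framework; n_features stands for the size of the selected feature set in Table 4, and the optimizer and loss are assumptions):

from tensorflow.keras import layers, models

def build_nn(n_features: int) -> models.Model:
    """Feed-forward classifier matching the description: two tanh hidden
    layers of 32 neurons, dropout 0.30 between them, sigmoid output."""
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(32, activation="tanh"),
        layers.Dropout(0.30),
        layers.Dense(32, activation="tanh"),
        layers.Dense(1, activation="sigmoid"),  # fraud probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model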
Detection Performance Evaluation. We evaluate the performance of our FDSs on the test set. We use precision, recall, F1-score, false positive rate (FPR), false negative rate (FNR), the area under the receiver operating characteristic curve (AUC-ROC), the area under the precision-recall curve (AUC-PRC), the Matthews Correlation Coefficient (MCC), and C-Accuracy. Table 3 collects the values scored by the FDSs. All of the FDSs achieve a recall higher than 90% but have low precision values (and, consequently, F1-scores) between 14.54% and 21.36%. This is a direct consequence of selecting models that maximize our C-Accuracy metric: the chosen FDSs prefer to raise many false alarms in exchange for a high fraud detection rate. However, attacking suspicious classifiers represents a worst-case scenario for the attacker. In fact, a suspicious FDS may flag higher volumes of transactions as false positives, requiring more effort from the attacker to craft adversarial transactions that remain undetected.
Attacker's Oracle. We select the Oracle model on the dataset DA2012_13, in accordance with Assumption 2. In particular, we split the dataset into a training and a test set, the latter being the last 20% of the recorded transactions. Among the same five options that we consider for our FDSs, we choose the XGBoost model for the Oracle, as it achieves the highest C-Accuracy score on the training set. Then, we optimize its feature set and hyperparameters following the same steps we use for the FDSs. The final performance scores are reported in Table 3.
7.3 Attack Evaluation Metrics
We evaluate the impact of the attack and the effectiveness of the mitigations using ad hoc evaluation metrics. First, we define the following terms: \(F_{g}\) is the set of all the adversarial transactions generated by the attacker; \(F_{f}\) is the set of the adversarial transactions filtered by the attacker’s Oracle, where \(F_{f}\subseteq F_{g}\); \(F_{a}\) is the set of the adversarial transactions generated by the attacker and misclassified by the FDS, where \(F_{a}\subseteq F_{f}\); \(F_{r}\) is the set of the adversarial transactions generated by the attacker and correctly classified by the FDS, where \(F_{r}\subseteq F_{f}\); D is the set of all the transactions, where \((F_{a}\cup F_{r}) \subseteq D\), respectively, with the wrong (i.e., legitimate) and correct (i.e., fraud) class labels; V is the set of the victims of the poisoning attack; P is the set of victims protected by the FDS during the attack; W is the set of weeks of the attack simulation, where \(W= \lbrace 0..7\rbrace\); \(A_{f}\) is the amount of a fraud f, where \(f \in F_{g}\); \(A_{w}\) is the total stolen amount in week \(w \in W\); \(\Delta {T_{f}}\) is the time difference between the time of execution of fraud f and the beginning of the attack. We evaluate the performance of our attack according to the following metrics:
– Detection Rate. The number of victims of the attack protected by the FDS with respect to the total number of victims.
– Weekly Increase. The average week-over-week increase of the capital stolen from all victims.
– Evasion Rate. The ratio of frauds that successfully evade the FDS with respect to the total number of frauds submitted.
– Injection Rate. The proportion of adversarial transactions crafted by the attacker and classified as legitimate by the attacker's Oracle with respect to the number of frauds generated.
– Poisoning Rate. The ratio of adversarial transactions injected into the banking dataset with respect to the total number of transactions.
– Detection Time. The median time in days before an adversarial transaction is detected by the FDS.
– Money Stolen. The total amount stolen from all the victims with a single attack.
From the defender’s perspective, given the baseline performance of the system against the attack, a mitigation approach should achieve lower values of Money Stolen, Poisoning Rate, Injection Rate, Evasion Rate, Detection Time, Weekly Increase, and higher value of Detection Rate.
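For clarity, plausible formalizations of the main metrics in terms of the sets defined above (stated here only as a reading aid; the exact definitions are the authors') are:
\[
\text{Detection Rate} = \frac{|P|}{|V|}, \quad
\text{Injection Rate} = \frac{|F_f|}{|F_g|}, \quad
\text{Evasion Rate} = \frac{|F_a|}{|F_f|}, \quad
\text{Poisoning Rate} = \frac{|F_a|}{|D|}, \quad
\text{Money Stolen} = \sum_{f \in F_a} A_f.
\]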
7.4 Experiment 1: Poisoning Attack against Banking FDSs
In this experiment, we simulate our poisoning attack approach against all of the chosen FDS models with the feature sets listed in Table 4, using every combination of attacker's strategy (greedy, medium, conservative; see Section 5.4), attacker's knowledge of the system (white-box, gray-box, black-box; see Section 3.2), and FDS update policy (weekly, bi-weekly; see Section 7.2). For each simulation, the attacker selects 30 random victims with different spending capabilities from the banking dataset DA2014_15. To reduce the variance of the results, we run three simulations and provide the average of the results as the final estimate. We consider the results of this experiment as the baseline performance of our FDSs against the poisoning attack. Our results show no substantial differences in the success of the attack against FDSs using weekly and bi-weekly update policies. Therefore, in Tables 5 and 6, we report only the results obtained against FDSs with a bi-weekly update policy, as in such a scenario their update timestamps and those of the attacker's Oracle are synchronized. We mention the meaningful differences between the two update policies where present. As shown in Table 5, the attacker generally meets their goal if no mitigation is employed, even with little to no knowledge of the target system. Most of the FDSs block an insufficient number of EPTs within the first two weeks of the attack, as their Detection Rate ranges from 53.33% to 91.11%. This phenomenon allows attackers to poison the detection system and steal substantial capital from their victims over time. The attacker increases their profit, on average, between 22.92% and 127.69%, as shown by the Weekly Increase metric. We also observe that, in all knowledge scenarios, the greediest strategies tend to be more rewarding for the attacker than the more conservative ones. Under certain conditions, the medium and conservative strategies outperform the greedy one. However, we find only two meaningful patterns. First, the medium strategy wins against the random forest model only if the FDS uses a weekly update policy. Second, against the neural network, more conservative strategies win only if the model is attacked under a white-box knowledge scenario. From these patterns, we deduce that the attacker has to balance their long- and short-term goals if the detector under attack is strongly suspicious (i.e., with a high number of FPs).
As shown in Table 5, the Injection Rate values obtained within the limited knowledge settings provide an idea of the effort required by the attacker to successfully craft adversarial transactions. On average, the attacker's Oracle rejects 9 out of 10 generated adversarial transactions. The remaining transactions, which are effectively injected by the attacker, are accepted by the target fraud detection systems with acceptance rates between 49.18% and 83.12%. The latter values are given by the Evasion Rate and depend on the particular classifier employed by the financial institution. As a side note, these findings also show that adversarial transactions generated against one Oracle transfer to different ML models, as shown by Carminati et al. [17]. However, it is important to note that the Injection Rate metric, in this scenario, is constrained by the attacker's use of the same ML algorithm as the Oracle (i.e., XGBoost; see Section 7.2). Conversely, in the white-box scenario, the attacker's Oracle is an exact replica of the target FDS, meaning that the Injection Rate in this case depends on the defender's classifier. Table 6 highlights the varying levels of effort required by the attacker across the target systems. The results show that the hardest model to attack is AL, while the least effort is required against SVM.
Finally, we observe that the economic impact of the attack strongly depends on the classifier employed by the FDS. The choice of hyperparameters and feature sets, besides the particular learning algorithm, may also influence this phenomenon. In general, our results highlight that shallower models such as logistic regression, neural network, and SVM allow the attacker to steal the highest amounts of money from their victims. For such models, we record the worst values of Evasion Rate, Weekly Increase, and Money Stolen. The attacker gains the highest profits mostly from the SVM classifier.
Black-box. In the black-box scenario, the attacker has no information about the target FDS, and their attacks use surrogate knowledge. The attacker gains from a minimum of € 41,995 up to € 349,876. The latter value, obtained from the SVM classifier, is around 5.37 times the amount stolen from an FDS using random forest, with the same attack strategy and update policy setup. Out of the five detectors, we observe that only random forest is able to shield the victims from the attack at a consistent rate. The attacker can achieve an Evasion Rate of 61.91% against the latter model. Other fraud detectors allow for higher values of Evasion Rate, up to 83.12% against neural network.
Gray-box. Our results show that the additional knowledge of the gray-box scenario does not bring a consistent benefit to the attacker, as observed by Carminati et al. [17]. Interestingly, the attack success against the FDS using XGBoost is mostly unchanged from the black-box scenario. Against this model, the attacker steals at most € 190,480.21. Even when unknowingly using the same algorithm as the FDS for their Oracle, the knowledge of the relative feature set does not prove beneficial. However, the attacker has slightly more success only against the random forest model: for every considered configuration, they obtain higher profits, peaking at € 105,473, around 1.61 times the attacker's best record in the black-box scenario against the same random forest model.
White-box. In this scenario, the attacker possesses the knowledge required to probe the exact blind spots of the FDS models. This enables the attacker to remain undetected and maximize their illegal profits. For the random forest model, which proved to be the best-performing model in the limited knowledge scenarios, the attacker steals from 4.74 up to 6.32 times more money than in the black-box scenario, reaching at most a capital of € 347,093. However, these results are still noticeably better than the ones obtained by the other models, even when considering limited attacker's knowledge scenarios. The outcome for SVM and logistic regression shows the full potential of the attack. For example, by adopting a greedy strategy, the attacker manages to steal € 1,029,231 by the end of the attack from an FDS that adopts an SVM model and a bi-weekly update policy. This scenario also highlights that the attacker spends the most effort to craft their adversarial transactions against random forest and XGBoost and the least effort against SVM, as shown in Table 6 by the corresponding values of Injection Rate. Furthermore, when employing a weekly update policy, the outcome against the neural network model is noticeably worse for the defender under this knowledge scenario. In fact, the attacker reaches amounts of Money Stolen between € 728,843 and € 913,697, almost double the Money Stolen against the same FDS with a bi-weekly update policy. However, we do not record such an increase for the other detectors.
7.5 Experiment 2: Performance Tradeoff with the Proposed Countermeasure
The goal of this experiment is to evaluate the impact of the number of artificial adversarial transactions injected in the training set on the classifier performance. In particular, we assess the performance ① on adversarial transactions and ② on regular examples of fraud. As a side effect, this experiment estimates the percentage of artificial adversarial transactions to inject to guarantee an effective mitigation of adversarial attacks. First, as in Goodfellow et al. [26], we evaluate the error rate of the FDSs, employing our countermeasure with different hyperparameters, on an adversarial validation set, i.e., a set comprising only adversarial examples. We refer to such an error as the adversarial set error. We build our adversarial validation set by randomly picking adversarial transactions generated by the white-box attacks of the previous experiment. Thus, we avoid leaking decisions made by the attacker's Oracle trained on the attacker's dataset. We calculate the adversarial set error by assigning to each adversarial transaction a weight that decreases with the distance in hours between its timestamp and the beginning of the attack. Therefore, transactions that are far in time have the smallest weights, as correctly identifying them will not bring any meaningful economic benefit to the defender. The final error estimate is given by the sum of the weights of the misclassified adversarial transactions over the sum of all the weights. We test different configurations of our countermeasure by varying the percentage of artificial transactions injected into the training set of the FDSs. We refer to this percentage as the Injection Percentage. As shown in Figure 5, as we inject more artificial adversarial transactions, the adversarial set errors of most of the FDSs tend to decrease, but not monotonically. The errors of logistic regression and SVM are the only ones that keep decreasing after 10%. The errors of the other models tend to decrease until 30%, after which they start increasing again but never reach the error recorded in the absence of mitigation, with the only exception of active learning. Finally, we observe that not all the models achieve comparable errors: active learning, random forest, and XGBoost achieve the best scores, reaching the lowest errors on adversarial transactions. The other models generally score worse, with logistic regression and SVM obtaining the worst results, even when trained on a dataset with half regular examples of frauds and half artificial ones generated by our mitigation.
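In formula form (a direct restatement of the procedure just described), with \(w_f\) the weight assigned to adversarial transaction \(f\) and \(M\) the set of misclassified adversarial transactions, the adversarial set error is
\[
e_{\text{adv}} = \frac{\sum_{f \in M} w_f}{\sum_{f} w_f}.
\]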
Using the same set of hyperparameters of our countermeasure approach, we evaluate the performance tradeoff of the FDSs on the test set of dataset DA2014_15, which does not contain adversarial transactions generated with our attack approach. By doing so, we assess the extent and the nature of the generalization performance tradeoff of our FDSs due to the application of our defense approach. Specifically, we focus on their ability to recognize regular legitimate transactions and fraudulent transactions. We slide a time window of two weeks over the set and, at each iteration, we train our FDSs with our countermeasure, evaluate the classification error on the following two weeks of transactions, and, last, merge them into the training set. We then estimate the error as the average of the computed errors. In general, as we increase the quantity of artificial adversarial transactions injected, the C-Accuracy scores of the models slowly decrease. Our results show again that the behavior of the FDSs differs, with unequal performance variations. Some models are complex enough (in terms of hyperparameters and feature sets) to distinguish artificial adversarial transactions from legitimate transactions to a greater extent. As shown in Figure 6(b), the C-Accuracy scores of random forest, active learning, XGBoost, and SVM do not drop below 0.9 even with 50% of artificial frauds injected. Meanwhile, the performance of the neural network drops significantly after 20%, and that of logistic regression after 40%. In particular, as shown in Figures 6(c) and 6(d), the performance reduction of the FDSs is mostly associated with the increase of false positives. Again, the only exception is active learning, which shows a decrease of both true and false positive rates for Injection Percentages higher than 30%. However, as the rate of true positives does not generally decrease, except for SVM and active learning, our countermeasure approach mostly preserves the ability of the fraud detectors to recognize regular examples of frauds. Furthermore, the variation of false positives for most of the models is relatively contained, steadily increasing each week. The only model that shows a significant divergence of false positives is the neural network, which records the steepest decrease of C-Accuracy. Therefore, it is possible to set a configuration of our countermeasure that possibly improves the response of the FDSs against the attack while also limiting the collateral increase of the false positive rate.
7.6 Experiment 3: Performance of Our Countermeasure against the Attack
In this experiment, we discuss the performance of our countermeasure against the poisoning attack. For each model, we use a configuration of hyperparameters that minimizes the adversarial set error without increasing the false positives to more than double the value obtained without any mitigation. Accordingly, we inject 50% artificial adversarial transactions for SVM, 40% for logistic regression, 30% for random forest, 20% for neural network and active learning, and 10% for XGBoost. The final results, listed in Tables 7 and 8, show that our approach substantially reduces the monetary damage against all of the FDSs, in all of the attacker's knowledge scenarios and for all the adopted strategies. In general, the XGBoost model achieves the best detection of the attack, almost completely halting it in the black- and gray-box scenarios. In one of the tests performed against the XGBoost model, the attacker cannot evade the system even with complete knowledge of it (this explains the Evasion Rate dropping to 66.67%). The FDSs that scored the best results without any mitigation, random forest and active learning, achieve more contained improvements with our countermeasure. The former reaches Detection Rate values between 95.56% and 100%, meaning that we can also stop the attack against random forest. The latter, instead, gains less benefit from our mitigation and represents the only exception where our approach occasionally increases the attack's success, as we observe a single increase of Money Stolen of up to 31.89%. Moreover, results collected with the weekly policy show that the attacker occasionally has slightly more success in the other knowledge scenarios as well, recording a few increases of Money Stolen, up to 38.70%. For the FDS that shows the weakest response against the attack in the absence of mitigation, SVM, our defense approach reduces the Money Stolen by up to 99.50% and lowers the Evasion Rate to a minimum of 8.94%, 88.80% less than the corresponding value obtained without our approach. We observe the weakest improvements on neural network, logistic regression, and active learning, for which our mitigation approach reduces the Money Stolen by up to 91.88%.
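For reference, the injection percentages selected above can be expressed as a simple configuration mapping; the model keys are our own shorthand, while the percentages are those listed in this paragraph.

# Per-model injection percentage of artificial adversarial transactions
# chosen for Experiment 3 (illustrative configuration object).
INJECTION_PCT = {
    "svm": 50,
    "logistic_regression": 40,
    "random_forest": 30,
    "neural_network": 20,
    "active_learning": 20,
    "xgboost": 10,
}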
Black- and gray-box: evasion phase mitigation. We observe from Table 7 that, for the black- and gray-box configurations, with our countermeasure the Detection Time metric generally decreases with respect to the baseline, by at most 83.63%. This suggests that our countermeasure improves the detection of the attack during the evasion phase and blocks most, if not all, of the EPTs. Depending on the number of adversarial transactions detected by the first week of the attack, we observe various improvements in Detection Rate, which rises to between 75.56% (25.93% more than the baseline value) and 100%. Consistent with these improvements in Detection Time and Detection Rate, we also record significant improvements in the other metrics with respect to the baseline. This is to be expected: By blocking most of the EPTs, we force the attacker to commit transactions on behalf of fewer victims, thus stealing a reduced amount of capital and poisoning the system to a lesser extent.
Black- and gray-box: poisoning phase mitigation. For the poisoning phase of the attack, the meaningful metrics are the Weekly Increase, Evasion Rate, Injection Rate, and Poisoning Rate. As shown in Table 7, our mitigation approach always reduces the Poisoning Rate and the Evasion Rate with respect to the baseline, by at least 45.81% and 14.11%, respectively. At first glance, our countermeasure seems to achieve worse results than the baseline on the Injection Rate metric in the limited knowledge scenarios. This counter-intuitive result has a simple explanation: Since the attacker has fewer victims at their disposal after the evasion phase, they generate fewer frauds, but most of the frauds they do generate are accepted by the already poisoned system. Furthermore, the experimental results show that, despite the overall decrease of Money Stolen and Poisoning Rate, a poisoning trend is still present, as shown by the Weekly Increase metric. The latter metric is generally lower than the baseline; however, in some situations, our approach leads to a few occasional higher increases. We record the largest increases of Weekly Increase for SVM and active learning, up to 207.91% and 124.94%, respectively. Recall that, in such cases, the Money Stolen is still reduced by 91.24% and 34.29%, respectively. This phenomenon, however, suggests that once the attacker successfully evades the FDS with an EPT, the target FDS is, at that point, poisoned. The attacker can keep exploiting the FDS, which becomes gradually more accustomed to their behavior and struggles to detect the PPTs submitted by the attacker. In conclusion, our approach can stop most of the transactions during the first evasion attempts, reducing the attack's impact as soon as possible, but it does not detect PPTs with the same efficiency.
White-box scenario. Since a defense should not rely on security through obscurity, we evaluate our countermeasure in a white-box scenario, assuming that the attacker knows every detail of the system and of its mitigation. In this scenario, the attacker achieves the best results for all the considered metrics, but they cannot reach the devastating results obtained without mitigation. Having complete knowledge of the system, the attacker is still never detected by the FDSs, achieving again a 0% Detection Rate and a 100% Evasion Rate (see Table 8). However, with our mitigation approach, the overall Money Stolen is reduced by between 18.12% and 84.26% with respect to the baseline. To explain this phenomenon, consider the Poisoning Rate metric, which tells us how many transactions have been successfully injected into the banking dataset by the attacker. With our countermeasure, this metric is reduced by between 8.12% and 86.34%. This shows that the attacker can no longer generate EPTs simply by choosing the least-suspicious timestamp and the most-probable amount for any given victim. Even by progressively reducing the value of EPTs, the attacker evades the system against fewer victims. The reduction of the Poisoning Rate, even in the white-box scenario, hints that the attacker needs to improve their strategy for evading the detector. Furthermore, in this scenario, we observe the first consistent improvement of the Injection Rate for three of the models: logistic regression, neural network, and SVM. We quantify these reductions between 10.69% and 55.50%. Recall that the Injection Rate tells us how many of the frauds generated by the attacker are effectively accepted by their Oracle, which, in this case, is the FDS itself. The lower this value, the fewer of the generated frauds are accepted by the FDS, and the greater the computational effort required to find transactions that evade classification. In conclusion, this result suggests that, for three of the tested models, the attacker must craft and inject adversarial transactions with a computational effort higher than the one needed without mitigation.
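To keep the discussion of these metrics self-contained, the sketch below shows one plausible way to derive them from a simulation log. The formulas follow our reading of the metric names and of the prose above (Injection Rate as the share of generated frauds accepted by the attacker's Oracle, Poisoning Rate as the share of poisoning-phase transactions that silently enter the banking dataset); they are illustrative and may differ in detail from the definitions used in our implementation.

# Hedged sketch of the attack metrics, derived from simple counters.
from dataclasses import dataclass

@dataclass
class AttackLog:
    generated: int          # candidate frauds crafted by the attacker
    oracle_accepted: int    # candidates the attacker's Oracle expects to evade
    epts_submitted: int     # evasion-phase transactions (EPTs) sent to the bank
    epts_undetected: int    # EPTs the FDS misses
    ppts_submitted: int     # poisoning-phase transactions (PPTs) sent to the bank
    ppts_undetected: int    # PPTs that silently enter the training data
    money_stolen: float     # total amount of all undetected frauds

def injection_rate(log: AttackLog) -> float:
    return log.oracle_accepted / log.generated

def evasion_rate(log: AttackLog) -> float:
    submitted = log.epts_submitted + log.ppts_submitted
    return (log.epts_undetected + log.ppts_undetected) / submitted

def detection_rate(log: AttackLog) -> float:
    return 1.0 - evasion_rate(log)

def poisoning_rate(log: AttackLog) -> float:
    return log.ppts_undetected / log.ppts_submitted

def weekly_increase(money_this_week: float, money_last_week: float) -> float:
    # relative week-over-week growth of Money Stolen
    return (money_this_week - money_last_week) / money_last_week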
7.7 Experiment 4: Performance Comparison with Other Countermeasures
In this experiment, we compare the performance of our countermeasure approach with two state-of-the-art approaches for mitigating AML attacks. First, we test an approach based on anomaly detection with two different unsupervised algorithms: an autoencoder (AE), a type of feed-forward neural network, and one-time sampling (OTS) [52], based on k-Nearest Neighbors. Then, we study an adversarial training strategy originally conceived for adversarial examples in image classification [26]. Our results show no meaningful differences in the effect of the tested mitigations depending on the attacker's strategies or on the two update policies. Therefore, in Tables 9 and 10 we report the average of each metric over all the attacker's strategies against the fraud detection systems, again adopting the bi-weekly update policy as the baseline.
Anomaly Detection. As shown in Tables 9 and 10, the mitigation based on anomaly detection, with both one-time sampling (“AD (OTS)”) and the autoencoder (“AD (AE)”), does not bring any consistent benefit against the attack with respect to the baseline performance of the FDSs. This result confirms our expectations, given the assumptions about the adversarial transactions generated by the attacker: Such frauds are not outliers with respect to regular banking transactions. On average, the results obtained with both anomaly detectors tend to oscillate between lower and higher values of Money Stolen. We attribute such oscillations to the different victims chosen by the attacker at the beginning of the simulation rather than to the tested approach actively improving the performance of the FDSs; we mitigated this side effect by repeating the experiments three times. If we consider metrics linked to both the evasion and the poisoning phases of the attack, we again observe no distinct improvement. However, with the one-time sampling algorithm, this approach achieves slightly better values in terms of Money Stolen in the white-box scenario for all of the models except active learning and the neural network, for which it allows the attacker to steal, on average, 4.48% and 39.13% more capital, respectively. This result may suggest that, even if the attacker is slowly poisoning the system, in the long run this mitigation may start detecting transactions that drift far from the regular user behavior, forcing the attacker to commit transactions with lower monetary value. OTS also reduces the average Weekly Increase in the white-box scenario, by amounts ranging from 2.98% to 11.58%. With our countermeasure, in the same knowledge scenario, the total capital stolen by the attacker is, on average, reduced by between 24.40% and 75.86%.
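For clarity, the following sketch outlines the kind of autoencoder-based detector behind “AD (AE)”, assuming standardized numeric transaction features; the architecture, bottleneck size, and threshold are illustrative assumptions rather than the exact configuration used in our experiments, and the kNN-based OTS detector [52] is omitted for brevity.

# Autoencoder anomaly detector: train the network to reconstruct legitimate
# transactions and flag inputs whose reconstruction error is unusually high.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

def fit_autoencoder_detector(legit_X, bottleneck=8, quantile=0.99):
    scaler = StandardScaler().fit(legit_X)
    Xs = scaler.transform(legit_X)
    ae = MLPRegressor(hidden_layer_sizes=(32, bottleneck, 32),
                      max_iter=500, random_state=0).fit(Xs, Xs)
    errors = np.mean((ae.predict(Xs) - Xs) ** 2, axis=1)
    threshold = np.quantile(errors, quantile)   # flag the worst-reconstructed tail
    def is_anomalous(X):
        Z = scaler.transform(X)
        return np.mean((ae.predict(Z) - Z) ** 2, axis=1) > threshold
    return is_anomalous

Because the attacker's transactions are crafted to be inliers with respect to the victim's behavior, their reconstruction error rarely crosses such a threshold, which is consistent with the lack of improvement reported above.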
Adversarial Training. For this mitigation, the only meaningful comparison is with the FDS configurations that adopt the neural network model. The neural network trained with adversarial training, referred to as “Adv. training” in Tables 9 and 10, shows neither consistent improvements nor excessive decay of the baseline performance of the neural network model against the poisoning attack. On average, this mitigation achieves slightly better results than the baseline only in the gray- and white-box scenarios, but it does not match the results obtained with our mitigation approach. Averaging over the attacker's strategies, the Money Stolen is, respectively, 3.85% and 5.79% lower than the result obtained with the regularly trained neural network model, whereas with our countermeasure the monetary damage is reduced, on average, by 64.83% and 34.92%. Adversarial training obtains noticeably worse results in the black-box scenario, where it allows the attacker to steal, on average, 13.18% more than in the absence of the mitigation. With respect to the other metrics, we observe no meaningful improvement over the standard neural network model. In conclusion, adopting adversarial training based on FGSM [26] as a generic defense strategy brings no benefit to the FDSs with respect to their standard performance. This is consistent with the observation that adversarial training techniques bring small benefits against unseen attacks [5]: Strengthening a model against FGSM may not provide sufficient coverage of the adversarial transactions generated according to our poisoning attack and to the evasion attack proposed by Carminati et al. [17]. The adversarial samples crafted by the attacker resemble legitimate transactions only in their direct features (i.e., Amount, Timestamp, etc.): The attacker mimics the victim during the evasion phase, selecting amounts and timestamps that appear less suspicious to our FDSs. In feature space, the same samples likely exhibit perturbations very different from those obtained with the Fast Gradient Sign Method.
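For completeness, the sketch below shows the kind of FGSM-based adversarial training [26] compared against in this experiment, applied to a small feed-forward fraud classifier in PyTorch; the architecture, epsilon, and training loop are illustrative assumptions rather than the exact setup of our experiments.

# FGSM adversarial training: at each epoch, perturb the inputs in the
# direction that increases the loss and train on clean and perturbed data.
import torch
import torch.nn as nn

def fgsm_perturb(model, loss_fn, X, y, eps=0.05):
    X_adv = X.clone().detach().requires_grad_(True)
    loss_fn(model(X_adv), y).backward()
    return (X_adv + eps * X_adv.grad.sign()).detach()  # one signed-gradient step

def adversarial_training(X, y, n_features, epochs=20, eps=0.05):
    model = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 2))
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        X_adv = fgsm_perturb(model, loss_fn, X, y, eps)
        opt.zero_grad()
        loss = loss_fn(model(X), y) + loss_fn(model(X_adv), y)
        loss.backward()
        opt.step()
    return model

Note that the perturbation is applied directly in feature space, whereas the attacker manipulates raw transaction attributes such as amount and timestamp, which helps explain why this defense transfers poorly to our setting.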