Authors:
Roberto Saia; Ludovico Boratto and Salvatore Carta
Affiliation:
Università di Cagliari, Italy
Keyword(s):
Fraud Detection, Pattern Recognition, User Model.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Analytics; Computational Intelligence; Data Analytics; Data Engineering; Data Mining in Electronic Commerce; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
The rapid, exponential growth of e-commerce, driven both by the new opportunities offered by the Internet and by the spread of debit and credit cards in online purchases, has sharply increased the number of frauds, causing large economic losses to the businesses involved. Designing effective strategies to face this problem is, however, particularly challenging due to several factors, such as the heterogeneity and non-stationary distribution of the data stream, as well as the presence of an imbalanced class distribution. The problem is further complicated by the scarcity of public datasets, owing to confidentiality issues, which prevents researchers from verifying new strategies in many data contexts. Unlike the canonical state-of-the-art strategies, instead of defining a single model based on the users' past transactions, we follow a Divide and Conquer strategy, defining multiple models (user behavioral patterns) that we exploit to evaluate a new transaction in order to detect potential fraud attempts. We can act on some parameters of this process to adapt the models' sensitivity to the operating environment. Since our models are trained only on a user's past legitimate transactions, rather than on both legitimate and fraudulent ones, we can operate proactively, detecting fraudulent transactions that have never occurred in the past. This way of proceeding also overcomes the data imbalance problem that afflicts machine learning approaches. The proposed approach is evaluated by comparing it with one of the best-performing state-of-the-art approaches, Random Forests, on a real-world credit card dataset.
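The general idea of evaluating a new transaction against several models built only from legitimate history can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how the behavioral patterns are defined, so the chunking of the history, the use of transaction amounts as the only feature, and the names `build_patterns`, `is_suspicious`, `tolerance`, `min_votes` and `min_spread` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def build_patterns(legit_amounts, n_models=3):
    """Split a user's legitimate amounts into chunks; each chunk yields one pattern.

    Assumption: a 'pattern' is just the (mean, standard deviation) of a chunk.
    No fraudulent examples are needed to build it.
    """
    if not legit_amounts:
        raise ValueError("at least one legitimate transaction is required")
    size = max(1, len(legit_amounts) // n_models)
    chunks = [legit_amounts[i:i + size] for i in range(0, len(legit_amounts), size)]
    return [(mean(chunk), pstdev(chunk)) for chunk in chunks]


def is_suspicious(amount, patterns, tolerance=3.0, min_votes=0.5, min_spread=1.0):
    """Flag a transaction when too few patterns accept it.

    tolerance and min_votes are hypothetical sensitivity parameters:
    a pattern accepts the amount if it lies within tolerance * spread of the
    pattern mean, and the transaction is flagged when the fraction of
    accepting patterns is below min_votes.
    """
    accepted = sum(
        1
        for mu, sigma in patterns
        if abs(amount - mu) <= tolerance * max(sigma, min_spread)
    )
    return accepted / len(patterns) < min_votes


# Hypothetical legitimate purchase amounts for one user.
history = [20, 22, 19, 21, 23, 20, 18, 22, 21]
patterns = build_patterns(history, n_models=3)
```

In this sketch, lowering `tolerance` or raising `min_votes` makes the check stricter, which is one simple way to picture adapting sensitivity to the operating environment.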