Authors:
Astha Agrawal 1; Herna L. Viktor 1 and Eric Paquet 2
Affiliations:
1 University of Ottawa, Canada; 2 University of Ottawa and National Research Council of Canada, Canada
Keyword(s):
Multi-Class Imbalance, Undersampling, Oversampling, Classification, Clustering.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the
two-class problem has received interest from researchers in recent years, leading to solutions for oil spill
detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class
imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited
attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority
classes and to incorrectly classify instances from the minority classes as belonging to the majority classes,
leading to poor predictive accuracy. Further, there is a need both to handle the imbalance between classes
and to address the selection of examples within a class (i.e. the so-called within-class imbalance). In this
paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training
examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through
the generation of synthetic examples and employs cluster analysis in order to undersample majority classes.
In addition, it handles both within-class and between-class imbalance. Our experimental results on a
number of multi-class problems show that, when the SCUT method is used to pre-process the data
before classification, we obtain highly accurate models that compare favourably to the state of the art.
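
The hybrid idea described above (oversample minority classes with synthetic examples, undersample majority classes via clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the synthetic-example generator, or the target class size. The sketch assumes a plain k-means for the cluster analysis, nearest-neighbour interpolation for synthetic examples, and the mean class size as the balancing target. All function names (`scut_like_balance`, `oversample`, `undersample`, `kmeans`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def _dist2(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def oversample(points, target, rng, k=3):
    """Add synthetic points by interpolating between a point and a near neighbour."""
    out = list(points)
    while len(out) < target:
        p = rng.choice(points)
        neighbours = sorted((q for q in points if q is not p),
                            key=lambda q: _dist2(p, q))[:k]
        q = rng.choice(neighbours) if neighbours else p
        t = rng.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(p, q)))
    return out


def kmeans(points, k, rng, iters=10):
    """Plain k-means (an assumption; stands in for 'cluster analysis')."""
    centres = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: _dist2(p, centres[i]))
            clusters[idx].append(p)
        centres = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]


def undersample(points, target, rng, k=3):
    """Keep `target` original points, drawn round-robin from each cluster."""
    clusters = kmeans(points, min(k, len(points)), rng)
    for cl in clusters:
        rng.shuffle(cl)
    out = []
    # Round-robin so every region of the class stays represented.
    while len(out) < target:
        for cl in clusters:
            if cl and len(out) < target:
                out.append(cl.pop())
    return out


def scut_like_balance(X, y, seed=0):
    """Bring every class to the mean class size (assumed target)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(tuple(xi))
    target = round(len(X) / len(by_class))
    Xb, yb = [], []
    for label, pts in by_class.items():
        if len(pts) > target:
            pts = undersample(pts, target, rng)
        elif len(pts) < target:
            pts = oversample(pts, target, rng)
        Xb.extend(pts)
        yb.extend([label] * len(pts))
    return Xb, yb


# Toy three-class dataset with 60, 25 and 5 examples (made-up data).
_rng = random.Random(1)
sizes = {"a": 60, "b": 25, "c": 5}
X, y = [], []
for i, (label, n) in enumerate(sizes.items()):
    X += [(_rng.gauss(4 * i, 1), _rng.gauss(0, 1)) for _ in range(n)]
    y += [label] * n

Xb, yb = scut_like_balance(X, y)
print(Counter(y), Counter(yb))
```

In this sketch the majority class keeps only original examples, spread across its clusters, while the minority classes keep all their originals and gain interpolated ones. The balanced set would then be passed to any classifier for training.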