POP: A Parallel Optimized Preparation of Data for Data Mining

Christian Ernst; Youssef Hmamouche; Alain Casali

Topics: Data Reduction and Quality Assessment; Pre-Processing and Post-Processing for Data Mining

In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: IC3K, pages 36-45, 2015, Lisbon, Portugal

Authors: Christian Ernst ¹; Youssef Hmamouche ² and Alain Casali ²

Affiliations: ¹ Ecole des Mines de St Etienne and LIMOS, France; ² Aix Marseille Universite, France

Keyword(s): Data Mining, Data Preparation, Outliers, Discretization Methods, Parallelism and Multicore Encoding.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Data Reduction and Quality Assessment ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Pre-Processing and Post-Processing for Data Mining ; Symbolic Systems

Abstract: Because data preparation has a substantial impact on data mining results, we provide an original framework for automatically preparing the data of any given database. For each attribute of the database, our research focuses on two points: (i) specifying an optimized outlier detection method, and (ii) identifying the most appropriate discretization method. Concerning the former, we show that the detection of an outlier depends on whether the data distribution is normal. For the latter, what matters is the shape of the density function of the attribute's distribution law. For this reason, we propose an automatic choice of the optimized discretization method based on a multi-criteria (Entropy, Variance, Stability) evaluation. Processing is performed in parallel using multicore capabilities. The experiments conducted validate our approach, showing that the best discretization method is not always the same one.
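The per-attribute pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. The abstract does not say which normality test, outlier rules, or candidate discretization methods POP uses, so the sketch makes these assumptions: a crude moment-based normality heuristic, the three-sigma rule for normal-looking data and Tukey fences otherwise, and equal-width versus equal-frequency binning. The score combines only entropy and within-bin variance; the stability criterion is omitted. A thread pool stands in for the multicore execution.

```python
import math
import statistics
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def looks_normal(values, tol=1.0):
    """Crude heuristic (assumption): small skewness and excess kurtosis."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    if sd == 0:
        return False
    n = len(values)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    excess_kurt = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return abs(skew) < tol and abs(excess_kurt) < tol


def remove_outliers(values):
    """Choose the outlier rule from the distribution shape (rules are assumptions)."""
    if looks_normal(values):
        mean, sd = statistics.fmean(values), statistics.pstdev(values)
        low, high = mean - 3 * sd, mean + 3 * sd        # three-sigma rule
    else:
        q1, _, q3 = statistics.quantiles(values, n=4)   # Tukey fences
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if low <= v <= high]


def equal_width(values, k):
    """Assign each value to one of k intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency(values, k):
    """Assign each value to one of k bins holding roughly the same count."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // len(values)
    return bins


def entropy(bins):
    """Shannon entropy (in bits) of the bin labels."""
    n = len(bins)
    return -sum(c / n * math.log2(c / n) for c in Counter(bins).values())


def within_bin_variance_ratio(values, bins):
    """Share of the total variance that remains inside the bins."""
    total = statistics.pvariance(values)
    if total == 0:
        return 0.0
    groups = {}
    for v, b in zip(values, bins):
        groups.setdefault(b, []).append(v)
    within = sum(len(g) * statistics.pvariance(g) for g in groups.values())
    return within / len(values) / total


def prepare_attribute(values, k=4):
    """Return (best discretization name, number of removed outliers)."""
    cleaned = remove_outliers(values)
    candidates = {"equal_width": equal_width, "equal_frequency": equal_frequency}
    scores = {}
    for name, method in candidates.items():
        bins = method(cleaned, k)
        # Illustrative score: reward balanced bins, penalize spread inside bins.
        scores[name] = entropy(bins) / math.log2(k) - within_bin_variance_ratio(cleaned, bins)
    return max(scores, key=scores.get), len(values) - len(cleaned)


def prepare_table(columns, k=4):
    """Prepare every attribute concurrently; columns maps name -> list of numbers."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda col: prepare_attribute(col, k), columns.values())
        return dict(zip(columns.keys(), results))
```

Calling `prepare_table({"age": [...], "income": [...]})` returns, for each attribute, the name of the selected discretization method and the number of values discarded as outliers. Because the work is CPU-bound, threads in CPython will not give a real multicore speedup; a process pool would be the natural replacement, at the cost of making the worker function picklable.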

CC BY-NC-ND 4.0

Paper citation in several formats:

Ernst, C.; Hmamouche, Y. and Casali, A. (2015). POP: A Parallel Optimized Preparation of Data for Data Mining. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - KDIR; ISBN 978-989-758-158-8; ISSN 2184-3228, SciTePress, pages 36-45. DOI: 10.5220/0005594700360045

@conference{kdir15,
author={Christian Ernst and Youssef Hmamouche and Alain Casali},
title={POP: A Parallel Optimized Preparation of Data for Data Mining},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - KDIR},
year={2015},
pages={36-45},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005594700360045},
isbn={978-989-758-158-8},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - KDIR
TI - POP: A Parallel Optimized Preparation of Data for Data Mining
SN - 978-989-758-158-8
IS - 2184-3228
AU - Ernst, C.
AU - Hmamouche, Y.
AU - Casali, A.
PY - 2015
SP - 36
EP - 45
DO - 10.5220/0005594700360045
PB - SciTePress