Data Mining algorithms have been the focus of much research in recent years and new techniques are being developed regularly. This thesis describes EvRFind, an application for rule discovery in the task of Data Mining.
EvRFind is a hybrid Genetic Algorithm that also employs techniques from statistics and machine learning to improve the efficiency and performance of the search. The non-evolutionary components include gradient ascent local search (Hill Climbing), optimization methods designed to improve search speed, automatic concept generalization, and automatic expansion of the description language.
EvRFind creates predictive models in the form of a default hierarchy. Each hierarchy is composed of a set of rules that are ordered by generality and selected with a bias towards minimum length and comprehensibility.
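As an illustration of the default-hierarchy idea, the following minimal sketch shows one common way such a model can be applied: rules are checked from most specific to most general, and the first rule that matches decides the prediction. This is not EvRFind's implementation. The `Rule` class, the `classify` function, the ordering direction, and the toy bird example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """Hypothetical rule: a readable description, a test, and a predicted class."""
    description: str
    condition: Callable[[Record], bool]
    prediction: str


def classify(hierarchy: List[Rule], record: Record) -> str:
    """Return the prediction of the first rule whose condition matches.

    Assumes the hierarchy is ordered from most specific (exceptions)
    to most general (defaults), so exceptions override defaults.
    """
    for rule in hierarchy:
        if rule.condition(record):
            return rule.prediction
    raise ValueError("hierarchy has no rule matching this record")


# Toy hierarchy (illustrative data only): an exception, a general rule,
# and a catch-all default that matches every record.
hierarchy = [
    Rule("penguins do not fly", lambda r: r.get("species") == "penguin", "cannot_fly"),
    Rule("birds fly", lambda r: r.get("class") == "bird", "can_fly"),
    Rule("default", lambda r: True, "cannot_fly"),
]

penguin = {"class": "bird", "species": "penguin"}
sparrow = {"class": "bird", "species": "sparrow"}
dog = {"class": "mammal", "species": "dog"}

results = [classify(hierarchy, r) for r in (penguin, sparrow, dog)]
```

In this sketch the penguin record is caught by the exception before the general bird rule is consulted, which is what lets a short list of rules stay readable while still handling special cases.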
Experiments on several datasets are run to evaluate EvRFind, and the results are compared to published work. To properly evaluate and illustrate the features and expressive power of EvRFind, the Poker Hand Dataset was created. This dataset represents a very large, imbalanced, and challenging domain. There are several target concepts, each with varying distribution within the dataset. The results achieved by EvRFind are compared to those generated by several other machine learning algorithms.