An enhanced relevance criterion for more concise supervised pattern discovery

H Grosskreutz, D Paurat, S Rüping - Proceedings of the 18th ACM …, 2012 - dl.acm.org
Proceedings of the 18th ACM SIGKDD international conference on Knowledge …, 2012 - dl.acm.org
Supervised local pattern discovery aims to find subsets of a database in which the distribution of a target attribute is statistically highly unusual. Local pattern discovery is often used to generate a human-understandable representation of the most interesting dependencies in a data set. Hence, the crisper and more concise the output is, the better. Unfortunately, standard algorithms often produce very large and redundant outputs.
In this paper, we introduce delta-relevance, a stricter criterion of relevance. It allows us to significantly reduce the output space while guaranteeing that every local pattern has a delta-relevant representative that is almost as good in a clearly defined sense. We show empirically that delta-relevance leads to a considerable reduction in the number of returned patterns. We also demonstrate that, in a top-k setting, removing patterns that are not delta-relevant improves the quality of the result set.