Mining outlying aspects on numeric data

Authors:

Akiko Campbell,

Changjie Tang

Data Mining and Knowledge Discovery, Volume 29, Issue 5

Pages 1116 - 1151

https://doi.org/10.1007/s10618-014-0398-2

Published: 01 September 2015

Abstract

When we are investigating an object in a data set, which itself may or may not be an outlier, can we identify unusual (i.e., outlying) aspects of the object? In this paper, we identify the novel problem of mining outlying aspects on numeric data. Given a query object $o$ in a multidimensional numeric data set $O$, in which subspace is $o$ most outlying? Technically, we use the rank of the probability density of an object in a subspace to measure the outlyingness of the object in the subspace. A minimal subspace where the query object is ranked the best is an outlying aspect. Computing the outlying aspects of a query object is far from trivial. A naïve method has to calculate the probability densities of all objects and rank them in every subspace, which is very costly when the dimensionality is high. We systematically develop a heuristic method that is capable of searching data sets with tens of dimensions efficiently. Our empirical study using both real data and synthetic data demonstrates that our method is effective and efficient.
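To make the problem statement concrete, the following is a minimal sketch of the naïve baseline the abstract describes, not the authors' heuristic method. It assumes a Gaussian product kernel with a single fixed bandwidth `h`, treats "ranked the best" as having the lowest density (rank 1), and reads "minimal" as "no proper subset of the subspace achieves the same best rank". The function names, the bandwidth choice, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def density(point, data, dims, h):
    """Unnormalised Gaussian product-kernel density of `point` in subspace `dims`.

    The normalising constant is the same for every object within one
    subspace, so it can be dropped when only the ranking is needed.
    """
    total = 0.0
    for other in data:
        sq = sum(((point[d] - other[d]) / h) ** 2 for d in dims)
        total += math.exp(-0.5 * sq)
    return total / len(data)


def rank_in_subspace(query_idx, data, dims, h):
    """Rank of the query by density; rank 1 means lowest density (most outlying)."""
    dens = [density(p, data, dims, h) for p in data]
    return 1 + sum(1 for v in dens if v < dens[query_idx])


def outlying_aspects(query_idx, data, h=1.0, max_dim=None):
    """Exhaustively rank the query in every subspace (exponential in dimensionality)."""
    n_dims = len(data[0])
    max_dim = max_dim or n_dims
    ranks = {}
    for k in range(1, max_dim + 1):
        for dims in combinations(range(n_dims), k):
            ranks[dims] = rank_in_subspace(query_idx, data, dims, h)
    best = min(ranks.values())
    winners = [set(s) for s, r in ranks.items() if r == best]
    # Keep only subspaces with no proper subset that also reaches the best rank.
    minimal = [s for s in winners if not any(t < s for t in winners)]
    return best, sorted(tuple(sorted(s)) for s in minimal)


# Toy data (assumed): the last object is unusual only in dimension 1.
data = [
    (0.00, 0.0, 0.00),
    (0.10, 0.2, 0.10),
    (0.20, 0.1, 0.30),
    (0.30, 0.3, 0.20),
    (0.15, 5.0, 0.15),
]
best, aspects = outlying_aspects(query_idx=4, data=data, h=0.5)
print(best, aspects)
```

Each subspace costs a number of kernel evaluations quadratic in the number of objects, and the number of subspaces grows exponentially with the dimensionality, which is the cost the abstract attributes to the naïve method.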

Published In

Data Mining and Knowledge Discovery, Volume 29, Issue 5

September 2015

392 pages

ISSN: 1384-5810

Copyright © 2015 The Author(s).

Publisher

Kluwer Academic Publishers

United States
