Mining outlying aspects on numeric data

Published: 01 September 2015

Abstract

When we are investigating an object in a data set, which itself may or may not be an outlier, can we identify unusual (i.e., outlying) aspects of the object? In this paper, we identify the novel problem of mining outlying aspects on numeric data. Given a query object $o$ in a multidimensional numeric data set $O$, in which subspace is $o$ most outlying? Technically, we use the rank of the probability density of an object in a subspace to measure the outlyingness of the object in that subspace. A minimal subspace in which the query object is ranked best is an outlying aspect. Computing the outlying aspects of a query object is far from trivial. A naïve method has to calculate the probability densities of all objects and rank them in every subspace, which is very costly when the dimensionality is high. We systematically develop a heuristic method that is capable of efficiently searching data sets with tens of dimensions. Our empirical study using both real and synthetic data demonstrates that our method is effective and efficient.
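The naïve baseline described above can be sketched in a few lines. The sketch estimates every object's density in a subspace, ranks the query among all objects (rank 1 = lowest density = most outlying), repeats this for every subspace, and keeps the subspaces where the query ranks best that have no proper subset also ranking best. It assumes a product Gaussian kernel density estimate with per-dimension bandwidths from Silverman's rule of thumb. The bandwidth rule, the tie handling, the fallback for constant columns, and all function names (`bandwidth`, `densities`, `density_rank`, `outlying_aspects`) are illustrative assumptions. This is not the heuristic method the paper develops to avoid this exhaustive enumeration.

```python
import math
from itertools import combinations


def bandwidth(values):
    """Assumed bandwidth choice: Silverman's rule of thumb, h = 1.06 * sigma * n^(-1/5)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
    sigma = math.sqrt(var)
    # Hypothetical safeguard: a small constant bandwidth for constant columns.
    return 1.06 * sigma * n ** (-0.2) if sigma > 0 else 1e-3


def densities(data, subspace):
    """Product-Gaussian KDE of every object, evaluated only on the subspace's dimensions."""
    n = len(data)
    h = {d: bandwidth([row[d] for row in data]) for d in subspace}
    norm = 1.0
    for d in subspace:
        norm *= h[d] * math.sqrt(2 * math.pi)
    out = []
    for x in data:
        total = 0.0
        for y in data:
            # exp(-0.5 * sum of squared standardized distances) = product of 1-D Gaussian kernels.
            z = sum(((x[d] - y[d]) / h[d]) ** 2 for d in subspace)
            total += math.exp(-0.5 * z)
        out.append(total / (n * norm))
    return out


def density_rank(data, q, subspace):
    """Rank of object q: 1 + number of objects with strictly lower density (ties share a rank)."""
    dens = densities(data, subspace)
    return 1 + sum(1 for f in dens if f < dens[q])


def outlying_aspects(data, q, max_dim=None):
    """Naive search: rank q in every subspace up to max_dim dimensions, keep minimal best ones."""
    dims = range(len(data[0]))
    max_dim = max_dim or len(data[0])
    ranks = {}
    for k in range(1, max_dim + 1):
        for s in combinations(dims, k):
            ranks[s] = density_rank(data, q, s)
    best = min(ranks.values())
    winners = [s for s, r in ranks.items() if r == best]
    # A winner is minimal if no proper subset of it is also a winner.
    minimal = [s for s in winners if not any(set(t) < set(s) for t in winners)]
    return best, minimal


data = [
    (0.5, 10.0),  # query: typical on dimension 0, extreme on dimension 1
    (0.0, 0.0), (1.0, 0.1), (0.4, 0.2),
    (0.6, 0.3), (0.2, 0.1), (0.8, 0.2),
]
print(outlying_aspects(data, 0))  # → (1, [(1,)])
```

In this toy data the query ranks first both in dimension 1 alone and in the full space (0, 1), but only (1,) is minimal. Because the sketch visits all 2^d − 1 subspaces and performs an O(n²) density computation in each, it is practical only for small d, which is exactly the cost the abstract attributes to the naïve method.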

Published In

Data Mining and Knowledge Discovery, Volume 29, Issue 5
September 2015
392 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Kernel density estimation
2. Outlying aspect
3. Outlyingness degree
4. Subspace search
