
Outlier detection for high dimensional data

Published: 01 May 2001

Abstract

The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
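The abstract's claim that proximity loses its meaning in high dimensional space can be illustrated with a small experiment. The sketch below is not the paper's method; it assumes uniformly random points in the unit hypercube and Euclidean distance, and the function names `relative_contrast` and `contrast_by_dimension` are hypothetical. It measures how much farther the farthest point is from a query than the nearest one, relative to the nearest distance. As the dimensionality grows, this ratio tends to shrink, so every point looks almost equally distant.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^n_dims."""
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def contrast_by_dimension(dims=(2, 20, 200), n_points=500, seed=0):
    """Map each dimensionality in dims to its measured relative contrast.
    The sample size and seed are illustrative assumptions."""
    rng = random.Random(seed)
    return {d: relative_contrast(n_points, d, rng) for d in dims}


if __name__ == "__main__":
    for d, c in contrast_by_dimension().items():
        print(f"dims={d:4d}  relative contrast={c:.3f}")
```

A low relative contrast means that a distance threshold separating "near" from "far" points is hard to set, which is the difficulty with proximity-based outlier definitions that the abstract describes.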



Published In

ACM SIGMOD Record, Volume 30, Issue 2
June 2001
625 pages
ISSN: 0163-5808
DOI: 10.1145/376284

SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN: 1581133324
DOI: 10.1145/375663

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGMOD Volume 30, Issue 2
