research-article

Online heterogeneous mixture modeling with marginal and copula selection

Published: 21 August 2011
DOI: 10.1145/2020408.2020509

Abstract

This paper proposes an online mixture modeling methodology in which individual components can have different marginal distributions and dependency structures. Mixture models have been widely studied and applied in areas such as density estimation, fraud and failure detection, and image segmentation. Previous research has focused almost exclusively on mixture models whose components are all of a single type (e.g., a Gaussian mixture model). However, the growing need to model complex data calls for more flexible mixture models (e.g., for medical analytics, a mixture of a lognormal distribution for medical costs and a Gaussian distribution for blood pressure). Our key ideas are 1) separating marginal distributions from their dependencies by using copulas and 2) an online extension of the recently developed "expectation minimization of description length," which together enable us to efficiently learn the types of both the marginal distributions and the copulas, as well as their parameters. The proposed method not only performs well in applications but also provides scalable, automatic model selection, which greatly reduces the cost of intensive modeling in data mining processes. We show that the proposed method outperforms state-of-the-art methods in density estimation and anomaly detection.
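The copula decomposition in key idea 1) rests on Sklar's theorem: a joint density factors as f(x1, x2) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2), where Fi and fi are the marginal CDFs and densities and c is the copula density. The Python sketch below illustrates only this factorization for the medical-analytics example above. It assumes a bivariate Gaussian copula and fixed, hand-picked parameters; it is not the authors' implementation, and it does not perform the online description-length-based learning or the marginal/copula type selection the paper proposes. All class and function names (GaussianMarginal, LogNormalMarginal, CopulaComponent, mixture_logpdf) and all parameter values are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)
_EPS = 1e-12  # keeps copula arguments strictly inside (0, 1) for inv_cdf
_LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


@dataclass
class GaussianMarginal:
    mu: float
    sigma: float

    def logpdf(self, x: float) -> float:
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - _LOG_SQRT_2PI

    def cdf(self, x: float) -> float:
        return _STD_NORMAL.cdf((x - self.mu) / self.sigma)


@dataclass
class LogNormalMarginal:
    mu: float     # mean of log(x)
    sigma: float  # standard deviation of log(x)

    def logpdf(self, x: float) -> float:
        if x <= 0:
            return -math.inf  # lognormal support is x > 0
        z = (math.log(x) - self.mu) / self.sigma
        return -0.5 * z * z - math.log(x * self.sigma) - _LOG_SQRT_2PI

    def cdf(self, x: float) -> float:
        if x <= 0:
            return 0.0
        return _STD_NORMAL.cdf((math.log(x) - self.mu) / self.sigma)


def gaussian_copula_logpdf(u1: float, u2: float, rho: float) -> float:
    """Log-density of the bivariate Gaussian copula with correlation rho."""
    z1 = _STD_NORMAL.inv_cdf(min(max(u1, _EPS), 1 - _EPS))
    z2 = _STD_NORMAL.inv_cdf(min(max(u2, _EPS), 1 - _EPS))
    s = 1.0 - rho * rho
    return -0.5 * math.log(s) - (rho * rho * (z1 * z1 + z2 * z2) - 2 * rho * z1 * z2) / (2 * s)


@dataclass
class CopulaComponent:
    marginals: tuple  # (marginal of x1, marginal of x2); their types may differ
    rho: float        # Gaussian-copula correlation, |rho| < 1

    def logpdf(self, x: tuple) -> float:
        m1, m2 = self.marginals
        lp1, lp2 = m1.logpdf(x[0]), m2.logpdf(x[1])
        if lp1 == -math.inf or lp2 == -math.inf:
            return -math.inf  # x lies outside this component's support
        # Sklar's theorem: f(x1, x2) = c(F1(x1), F2(x2)) * f1(x1) * f2(x2)
        return gaussian_copula_logpdf(m1.cdf(x[0]), m2.cdf(x[1]), self.rho) + lp1 + lp2


def mixture_logpdf(x: tuple, weights, components) -> float:
    """log(sum_k w_k * f_k(x)), using log-sum-exp to avoid underflow."""
    terms = [math.log(w) + c.logpdf(x) for w, c in zip(weights, components) if w > 0]
    top = max(terms)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Hypothetical two-component model of (medical cost, blood pressure).
# The components use different marginal types for cost: lognormal vs. Gaussian.
components = [
    CopulaComponent((LogNormalMarginal(7.0, 0.8), GaussianMarginal(120.0, 12.0)), rho=0.3),
    CopulaComponent((GaussianMarginal(2000.0, 500.0), GaussianMarginal(140.0, 15.0)), rho=-0.1),
]
weights = [0.7, 0.3]
print(mixture_logpdf((1500.0, 125.0), weights, components))
```

Working in log space with the log-sum-exp trick keeps the mixture density from underflowing for points far from every component, which matters if, for example, the log-density is used as an anomaly score. Swapping in a different copula family (such as a Clayton or Gumbel copula) would only require replacing gaussian_copula_logpdf.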

Published In
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2011
1446 pages
ISBN: 9781450308137
DOI: 10.1145/2020408

Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
1. copula
2. expectation minimization of description length
3. heterogeneous mixture model
4. online model selection

Conference
KDD '11

Acceptance Rates
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
