DOI: 10.1145/3188745.3188758
research-article

List-decodable robust mean estimation and learning mixtures of spherical gaussians

Published: 20 June 2018

Abstract

We study the problem of list-decodable (robust) Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. In the former problem, we are given a set $T$ of points in $\mathbb{R}^n$ with the promise that an $\alpha$-fraction of points in $T$, where $0 < \alpha < 1/2$, are drawn from an unknown-mean, identity-covariance Gaussian $G$, and no assumptions are made about the remaining points. The goal is to output a small list of candidate vectors with the guarantee that at least one of the candidates is close to the mean of $G$. In the latter problem, we are given samples from a $k$-mixture of spherical Gaussians on $\mathbb{R}^n$ and the goal is to estimate the unknown model parameters up to small accuracy. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. Specifically, our main contributions are as follows:
List-Decodable Mean Estimation. Fix any $d \in \mathbb{Z}_+$ and $0 < \alpha < 1/2$. We design an algorithm with sample complexity $O_d(\mathrm{poly}(n^d/\alpha))$ and runtime $O_d(\mathrm{poly}(n/\alpha)^d)$ that outputs a list of $O(1/\alpha)$ many candidate vectors such that with high probability one of the candidates is within $\ell_2$-distance $O_d(\alpha^{-1/(2d)})$ from the mean of $G$. The only previous algorithm for this problem achieved error $\tilde{O}(\alpha^{-1/2})$ under second moment conditions. For $d = O(1/\epsilon)$, where $\epsilon > 0$ is a constant, our algorithm runs in polynomial time and achieves error $O(\alpha^{-\epsilon})$. For $d = \Theta(\log(1/\alpha))$, our algorithm runs in time $(n/\alpha)^{O(\log(1/\alpha))}$ and achieves error $O(\log^{3/2}(1/\alpha))$, almost matching the information-theoretically optimal bound of $\Theta(\log^{1/2}(1/\alpha))$ that we establish. We also give a Statistical Query (SQ) lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible.
Learning Mixtures of Spherical Gaussians. We give a learning algorithm for mixtures of spherical Gaussians, with unknown spherical covariances, that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform $k$-mixture of identity covariance Gaussians we obtain the following: For any $\epsilon > 0$, if the pairwise separation between the means is at least $\Omega(k^{\epsilon} + \sqrt{\log(1/\delta)})$, our algorithm learns the unknown parameters within accuracy $\delta$ with sample complexity and running time $\mathrm{poly}(n, 1/\delta, (k/\epsilon)^{1/\epsilon})$. Moreover, our algorithm is robust to a small dimension-independent fraction of corrupted data. The previously best known polynomial time algorithm required separation at least $k^{1/4}\,\mathrm{polylog}(k/\delta)$. Finally, our algorithm works under separation of $\tilde{O}(\log^{3/2}(k) + \sqrt{\log(1/\delta)})$ with sample complexity and running time $\mathrm{poly}(n, 1/\delta, k^{\log k})$. This bound is close to the information-theoretically minimum separation of $\Omega(\sqrt{\log k})$.
Our main technical contribution is a new technique, using degree-d multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.
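
To make the list-decodable setting concrete, the following is a minimal sketch of a toy problem instance, not the algorithm from the paper. It assumes an adversary that fills the corrupted part of the data with decoy Gaussian clusters, so that no single estimate can be correct but a list of about 1/α candidates can contain a good one. The names `make_instance` and `list_error`, the decoy construction, and all parameter values are illustrative assumptions; the candidates are computed from ground-truth cluster labels purely to illustrate the output guarantee.

```python
import numpy as np

def make_instance(n_dim=20, alpha=0.25, m=4000, spread=10.0, seed=0):
    """Toy instance: an alpha-fraction of points from N(mu, I), the rest
    placed in decoy clusters that look exactly as Gaussian as the true one.
    (Hypothetical construction, chosen only for illustration.)"""
    rng = np.random.default_rng(seed)
    n_clusters = int(round(1 / alpha))           # 1 true cluster + decoys
    centers = rng.normal(scale=spread, size=(n_clusters, n_dim))
    labels = np.arange(m) % n_clusters           # cluster 0 holds the inliers
    points = centers[labels] + rng.normal(size=(m, n_dim))
    return points, labels, centers[0]

def list_error(candidates, mu):
    """Error of a candidate list: l2 distance from mu to the closest candidate."""
    return float(np.min(np.linalg.norm(candidates - mu, axis=1)))

points, labels, mu = make_instance()

# A single estimate (the overall empirical mean) is pulled toward the decoys.
single = points.mean(axis=0, keepdims=True)

# A list with one candidate per cluster (built from the known labels,
# which a real algorithm would not have access to).
candidates = np.stack(
    [points[labels == j].mean(axis=0) for j in range(labels.max() + 1)]
)

print("error of single estimate:", list_error(single, mu))
print("error of candidate list: ", list_error(candidates, mu))
```

In this sketch the decoy clusters are statistically interchangeable with the true one, which is why the guarantee is stated for a list of candidates rather than for a single output vector.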

    Published In

    STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
    June 2018
    1332 pages
    ISBN: 9781450355599
    DOI: 10.1145/3188745

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. learning mixtures of spherical Gaussians
    2. list-decodable learning
    3. robust statistics

    Qualifiers

    • Research-article

    Conference

    STOC '18: Symposium on Theory of Computing
    June 25 - 29, 2018
    Los Angeles, CA, USA

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
