Abstract
We study the problem of list-decodable (robust) Gaussian mean estimation and the related problem of learning mixtures of separated spherical Gaussians. In the former problem, we are given a set T of points in n with the promise that an α-fraction of points in T, where 0< α < 1/2, are drawn from an unknown mean identity covariance Gaussian G, and no assumptions are made about the remaining points. The goal is to output a small list of candidate vectors with the guarantee that at least one of the candidates is close to the mean of G. In the latter problem, we are given samples from a k-mixture of spherical Gaussians on n and the goal is to estimate the unknown model parameters up to small accuracy. We develop a set of techniques that yield new efficient algorithms with significantly improved guarantees for these problems. Specifically, our main contributions are as follows:
List-Decodable Mean Estimation. Fix any d ∈ + and 0< α <1/2. We design an algorithm with sample complexity Od ((nd/α)) and runtime Od ((n/α)d) that outputs a list of O(1/α) many candidate vectors such that with high probability one of the candidates is within ℓ2-distance Od(α−1/(2d)) from the mean of G. The only previous algorithm for this problem achieved error Õ(α−1/2) under second moment conditions. For d = O(1/), where >0 is a constant, our algorithm runs in polynomial time and achieves error O(α). For d = Θ(log(1/α)), our algorithm runs in time (n/α)O(log(1/α)) and achieves error O(log3/2(1/α)), almost matching the information-theoretically optimal bound of Θ(log1/2(1/α)) that we establish. We also give a Statistical Query (SQ) lower bound suggesting that the complexity of our algorithm is qualitatively close to best possible.
Learning Mixtures of Spherical Gaussians. We give a learning algorithm for mixtures of spherical Gaussians, with unknown spherical covariances, that succeeds under significantly weaker separation assumptions compared to prior work. For the prototypical case of a uniform k-mixture of identity covariance Gaussians we obtain the following: For any >0, if the pairwise separation between the means is at least Ω(k+√log(1/δ)), our algorithm learns the unknown parameters within accuracy δ with sample complexity and running time (n, 1/δ, (k/)1/). Moreover, our algorithm is robust to a small dimension-independent fraction of corrupted data. The previously best known polynomial time algorithm required separation at least k1/4 (k/δ). Finally, our algorithm works under separation of Õ(log3/2(k)+√log(1/δ)) with sample complexity and running time (n, 1/δ, klogk). This bound is close to the information-theoretically minimum separation of Ω(√logk).
Our main technical contribution is a new technique, using degree-d multivariate polynomials, to remove outliers from high-dimensional datasets where the majority of the points are corrupted.