Near-optimal Sample Complexity Bounds for Robust Learning of Gaussians Mixtures via Compression Schemes

Ashtiani, Hassan; Ben-David, Shai; Harvey, Nick; Liaw, Christopher; Mehrabian, Abbas; Plan, Yaniv

arXiv:1710.05209 (cs)

[Submitted on 14 Oct 2017 (v1), last revised 21 Jul 2020 (this version, v5)]

Abstract: We prove that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians.
The upper bound is shown using a novel technique for distribution learning based on a notion of "compression." Any class of distributions that allows such a compression scheme can also be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
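
To give a feel for how the two stated rates scale, here is a minimal Python sketch that evaluates only the leading-order terms $k d^2 / \varepsilon^2$ and $k d / \varepsilon^2$. It is an illustration, not the paper's result in usable form: the function names are hypothetical, and the constants and polylogarithmic factors hidden by the $\tilde{\Theta}$ and $\tilde{O}$ notation are assumed away, so the outputs are not actual sample counts.

```python
def general_rate(k: int, d: int, eps: float) -> float:
    """Leading-order term k * d^2 / eps^2 for general Gaussian mixtures.

    Assumption: constants and polylog factors are dropped, so this only
    shows how the bound scales, not how many samples are really needed.
    """
    if k < 1 or d < 1 or not 0.0 < eps < 1.0:
        raise ValueError("need k >= 1, d >= 1 and 0 < eps < 1")
    return k * d**2 / eps**2


def axis_aligned_rate(k: int, d: int, eps: float) -> float:
    """Leading-order term k * d / eps^2 for axis-aligned Gaussian mixtures.

    Same assumption as above: hidden factors are ignored.
    """
    if k < 1 or d < 1 or not 0.0 < eps < 1.0:
        raise ValueError("need k >= 1, d >= 1 and 0 < eps < 1")
    return k * d / eps**2


if __name__ == "__main__":
    k, d = 2, 10
    for eps in (0.5, 0.25):
        print(eps, general_rate(k, d, eps), axis_aligned_rate(k, d, eps))
```

Under these assumptions, halving $\varepsilon$ multiplies both terms by four, and the general term exceeds the axis-aligned term by a factor of $d$.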

Comments:	To appear in Journal of the ACM. 46 pages. An extended abstract appeared in NeurIPS 2018. This version contains all the proofs, generalizes the results to agnostic learning, and improves the bounds by logarithmic factors
Subjects:	Machine Learning (cs.LG); Statistics Theory (math.ST)
Cite as:	arXiv:1710.05209 [cs.LG]
	(or arXiv:1710.05209v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1710.05209

Submission history

From: Abbas Mehrabian
[v1] Sat, 14 Oct 2017 16:39:24 UTC (18 KB)
[v2] Fri, 16 Feb 2018 18:40:00 UTC (59 KB)
[v3] Thu, 29 Nov 2018 07:59:54 UTC (46 KB)
[v4] Sat, 20 Jul 2019 16:25:31 UTC (64 KB)
[v5] Tue, 21 Jul 2020 19:48:26 UTC (139 KB)
