Canopy Fast Sampling with Cover Trees

Manzil Zaheer, Satwik Kottur, Amr Ahmed, José Moura, Alex Smola
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3977-3986, 2017.

Abstract

Hierarchical Bayesian models often capture distributions over a very large number of distinct atoms. The need for these models arises when organizing huge amount of unsupervised data, for instance, features extracted using deep convnets that can be exploited to organize abundant unlabeled images. Inference for hierarchical Bayesian models in such cases can be rather nontrivial, leading to approximate approaches. In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure. We provide theory for Canopy and demonstrate its effectiveness on both synthetic and real datasets, consisting of over 100 million images.

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-zaheer17b, title = {Canopy Fast Sampling with Cover Trees}, author = {Manzil Zaheer and Satwik Kottur and Amr Ahmed and Jos{\'e} Moura and Alex Smola}, booktitle = {Proceedings of the 34th International Conference on Machine Learning}, pages = {3977--3986}, year = {2017}, editor = {Precup, Doina and Teh, Yee Whye}, volume = {70}, series = {Proceedings of Machine Learning Research}, month = {06--11 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v70/zaheer17b/zaheer17b.pdf}, url = {https://proceedings.mlr.press/v70/zaheer17b.html}, abstract = {Hierarchical Bayesian models often capture distributions over a very large number of distinct atoms. The need for these models arises when organizing huge amount of unsupervised data, for instance, features extracted using deep convnets that can be exploited to organize abundant unlabeled images. Inference for hierarchical Bayesian models in such cases can be rather nontrivial, leading to approximate approaches. In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure. We provide theory for Canopy and demonstrate its effectiveness on both synthetic and real datasets, consisting of over 100 million images.} }
Endnote
%0 Conference Paper %T Canopy Fast Sampling with Cover Trees %A Manzil Zaheer %A Satwik Kottur %A Amr Ahmed %A José Moura %A Alex Smola %B Proceedings of the 34th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2017 %E Doina Precup %E Yee Whye Teh %F pmlr-v70-zaheer17b %I PMLR %P 3977--3986 %U https://proceedings.mlr.press/v70/zaheer17b.html %V 70 %X Hierarchical Bayesian models often capture distributions over a very large number of distinct atoms. The need for these models arises when organizing huge amount of unsupervised data, for instance, features extracted using deep convnets that can be exploited to organize abundant unlabeled images. Inference for hierarchical Bayesian models in such cases can be rather nontrivial, leading to approximate approaches. In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space. In other words, the algorithm is as fast as search over a hierarchical data structure. We provide theory for Canopy and demonstrate its effectiveness on both synthetic and real datasets, consisting of over 100 million images.
APA
Zaheer, M., Kottur, S., Ahmed, A., Moura, J. & Smola, A.. (2017). Canopy Fast Sampling with Cover Trees. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3977-3986 Available from https://proceedings.mlr.press/v70/zaheer17b.html.

Related Material