Distributed Nonparametric Regression under Communication Constraints

Yuancheng Zhu, John Lafferty
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:6009-6017, 2018.

Abstract

This paper studies the problem of nonparametric estimation of a smooth function with data distributed across multiple machines. We assume an independent sample from a white noise model is collected at each machine, and an estimator of the underlying true function needs to be constructed at a central machine. We place limits on the number of bits that each machine can use to transmit information to the central machine. Our results give both asymptotic lower bounds and matching upper bounds on the statistical risk under various settings. We identify three regimes, depending on the relationship among the number of machines, the size of data available at each machine, and the communication budget. When the communication budget is small, the statistical risk depends solely on this communication bottleneck, regardless of the sample size. In the regime where the communication budget is large, the classic minimax risk in the non-distributed estimation setting is recovered. In an intermediate regime, the statistical risk depends on both the sample size and the communication budget.
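For context on the large-budget regime described above, the rate being recovered is the standard non-distributed minimax rate. A brief sketch in generic notation (the paper's own symbols may differ): for a function of Sobolev smoothness $\beta$ observed through a white noise model with effective sample size $N$ (here $N = mn$ for $m$ machines holding $n$ observations each), the classical minimax squared-$L_2$ risk scales as

```latex
\[
\inf_{\hat f}\; \sup_{f \in \mathcal{F}_\beta}\; \mathbb{E}\,\|\hat f - f\|_2^2
\;\asymp\; N^{-\frac{2\beta}{2\beta+1}},
\]
```

so when the communication budget is large enough, the distributed problem is asymptotically no harder than pooling all $mn$ observations at the central machine.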

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-zhu18a,
  title     = {Distributed Nonparametric Regression under Communication Constraints},
  author    = {Zhu, Yuancheng and Lafferty, John},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {6009--6017},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/zhu18a/zhu18a.pdf},
  url       = {https://proceedings.mlr.press/v80/zhu18a.html},
  abstract  = {This paper studies the problem of nonparametric estimation of a smooth function with data distributed across multiple machines. We assume an independent sample from a white noise model is collected at each machine, and an estimator of the underlying true function needs to be constructed at a central machine. We place limits on the number of bits that each machine can use to transmit information to the central machine. Our results give both asymptotic lower bounds and matching upper bounds on the statistical risk under various settings. We identify three regimes, depending on the relationship among the number of machines, the size of data available at each machine, and the communication budget. When the communication budget is small, the statistical risk depends solely on this communication bottleneck, regardless of the sample size. In the regime where the communication budget is large, the classic minimax risk in the non-distributed estimation setting is recovered. In an intermediate regime, the statistical risk depends on both the sample size and the communication budget.}
}
Endnote
%0 Conference Paper
%T Distributed Nonparametric Regression under Communication Constraints
%A Yuancheng Zhu
%A John Lafferty
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-zhu18a
%I PMLR
%P 6009--6017
%U https://proceedings.mlr.press/v80/zhu18a.html
%V 80
%X This paper studies the problem of nonparametric estimation of a smooth function with data distributed across multiple machines. We assume an independent sample from a white noise model is collected at each machine, and an estimator of the underlying true function needs to be constructed at a central machine. We place limits on the number of bits that each machine can use to transmit information to the central machine. Our results give both asymptotic lower bounds and matching upper bounds on the statistical risk under various settings. We identify three regimes, depending on the relationship among the number of machines, the size of data available at each machine, and the communication budget. When the communication budget is small, the statistical risk depends solely on this communication bottleneck, regardless of the sample size. In the regime where the communication budget is large, the classic minimax risk in the non-distributed estimation setting is recovered. In an intermediate regime, the statistical risk depends on both the sample size and the communication budget.
APA
Zhu, Y. & Lafferty, J. (2018). Distributed Nonparametric Regression under Communication Constraints. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:6009-6017. Available from https://proceedings.mlr.press/v80/zhu18a.html.