
Minimax estimation of kernel mean embeddings

Published: 01 January 2017

Abstract

In this paper, we study the minimax estimation of the Bochner integral μ_k(P) := ∫_X k(·, x) dP(x), also called the kernel mean embedding, based on random samples drawn i.i.d. from P, where k : X × X → R is a positive definite kernel. Various estimators (including the empirical estimator) θ_n of μ_k(P) have been studied in the literature, all of which satisfy ||θ_n − μ_k(P)||_{H_k} = O_P(n^{−1/2}), with H_k being the reproducing kernel Hilbert space induced by k. The main contribution of the paper is to show that the above rate of n^{−1/2} is minimax in the ||·||_{H_k} and ||·||_{L²(R^d)} norms over the class of discrete measures and the class of measures with an infinitely differentiable density, where k is a continuous translation-invariant kernel on R^d. An interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and of the density of P (if it exists).
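The n^{−1/2} rate for the empirical estimator μ̂_n = (1/n) Σ_i k(·, x_i) can be checked numerically via the kernel trick, since ||μ̂_n − μ_k(P)||²_{H_k} expands into kernel evaluations. The sketch below is illustrative only and is not from the paper: it assumes a Gaussian kernel k(x, y) = exp(−||x − y||²/(2h²)) and P = N(0, I_d), for which μ_k(P) and ||μ_k(P)||²_{H_k} have closed forms (Gaussian convolution identities); the names `gauss_kernel`, `true_embedding`, and `rkhs_error` and the parameter choices d = 2, h² = 1 are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h2 = 2, 1.0  # assumed dimension and squared bandwidth

def gauss_kernel(X, Y):
    # k(x, y) = exp(-||x - y||^2 / (2 h^2)), computed via dot products
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * h2))

def true_embedding(Y):
    # closed form of mu_k(P)(y) for P = N(0, I_d): a Gaussian convolved
    # with the Gaussian kernel is again Gaussian, with variance h^2 + 1
    c = (h2 / (h2 + 1.0)) ** (d / 2)
    return c * np.exp(-(Y**2).sum(-1) / (2 * (h2 + 1.0)))

def rkhs_error(n):
    # ||mu_hat_n - mu_k(P)||_{H_k} via the kernel-trick expansion:
    # mean_ij k(x_i, x_j) - 2 mean_i mu_k(P)(x_i) + ||mu_k(P)||^2
    X = rng.standard_normal((n, d))
    t1 = gauss_kernel(X, X).mean()
    t2 = true_embedding(X).mean()
    t3 = (h2 / (h2 + 2.0)) ** (d / 2)  # ||mu_k(P)||^2_{H_k}, closed form
    return np.sqrt(max(t1 - 2 * t2 + t3, 0.0))
```

Plotting `rkhs_error(n)` against n on a log-log scale should show a slope of roughly −1/2, consistent with the O_P(n^{−1/2}) rate the paper proves to be minimax optimal.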




Published In

The Journal of Machine Learning Research, Volume 18, Issue 1
January 2017, 8830 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Author Tags

  1. Bochner integral
  2. Bochner's theorem
  3. kernel mean embeddings
  4. minimax lower bounds
  5. reproducing kernel Hilbert space
  6. translation invariant kernel

Cited By
  • (2024)Sample complexity bounds for estimating probability divergences under invariancesProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3694001(47396-47417)Online publication date: 21-Jul-2024
  • (2023)Sparse learning of dynamical systems in RKHSProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618951(13325-13352)Online publication date: 23-Jul-2023
  • (2023)Compressed decentralized learning of conditional mean embedding operators in reproducing kernel hilbert spacesProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i7.25956(7902-7909)Online publication date: 7-Feb-2023
  • (2022)Generalization bounds with minimal dependency on hypothesis class via distributionally robust optimizationProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3602270(27576-27590)Online publication date: 28-Nov-2022
  • (2022)A kernelised stein statistic for assessing implicit generative modelsProceedings of the 36th International Conference on Neural Information Processing Systems10.5555/3600270.3600798(7277-7289)Online publication date: 28-Nov-2022
  • (2022)Sample-Efficient Kernel Mean Estimator with Marginalized Corrupted DataProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3534678.3539318(2110-2119)Online publication date: 14-Aug-2022
  • (2020)Breaking the curse of many agentsProceedings of the 37th International Conference on Machine Learning10.5555/3524938.3525874(10092-10103)Online publication date: 13-Jul-2020
  • (2020)Robust density estimation under besov IPM lossesProceedings of the 34th International Conference on Neural Information Processing Systems10.5555/3495724.3496173(5345-5355)Online publication date: 6-Dec-2020
  • (2018)Nonparametric density estimation with adversarial lossesProceedings of the 32nd International Conference on Neural Information Processing Systems10.5555/3327546.3327686(10246-10257)Online publication date: 3-Dec-2018

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media