Minimax estimation of kernel mean embeddings

Published: 01 January 2017


In this paper, we study the minimax estimation of the Bochner integral $\mu_k(P) := \int_{\mathcal{X}} k(\cdot, x)\, dP(x)$, also called the kernel mean embedding, based on random samples drawn i.i.d. from $P$, where $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a positive definite kernel. Various estimators $\hat{\theta}_n$ of $\mu_k(P)$ have been studied in the literature, including the empirical estimator $\frac{1}{n}\sum_{i=1}^{n} k(\cdot, X_i)$, and all of them satisfy $\|\hat{\theta}_n - \mu_k(P)\|_{\mathcal{H}_k} = O_P(n^{-1/2})$, with $\mathcal{H}_k$ being the reproducing kernel Hilbert space induced by $k$. The main contribution of the paper is showing that this rate of $n^{-1/2}$ is minimax in the $\|\cdot\|_{\mathcal{H}_k}$ and $\|\cdot\|_{L^2(\mathbb{R}^d)}$ norms over the class of discrete measures and over the class of measures that have an infinitely differentiable density, with $k$ being a continuous translation-invariant kernel on $\mathbb{R}^d$. The interesting aspect of this result is that the minimax rate is independent of the smoothness of the kernel and of the density of $P$ (if it exists).
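As an illustration of the $O_P(n^{-1/2})$ behavior of the empirical estimator $\hat{\theta}_n = \frac{1}{n}\sum_{i=1}^{n} k(\cdot, X_i)$ (an upper-bound illustration, not the paper's minimax lower-bound argument), here is a minimal Python sketch under assumptions that are not from the paper: $d = 1$, a Gaussian kernel with bandwidth `sigma`, and $P = N(0, \tau^2)$, for which $\mu_k(P)$ and $\|\mu_k(P)\|_{\mathcal{H}_k}^2$ have closed forms. All function names are hypothetical. Expanding the squared RKHS norm with the reproducing property gives $\mathbb{E}\|\hat{\theta}_n - \mu_k(P)\|_{\mathcal{H}_k}^2 = (1 - \|\mu_k(P)\|_{\mathcal{H}_k}^2)/n$ in this setting (because $k(x, x) = 1$), so the error norm itself scales like $n^{-1/2}$.

```python
import math
import random

# Illustrative assumptions (not from the paper): d = 1, the Gaussian kernel
# k(x, y) = exp(-(x - y)^2 / (2 sigma^2)), and P = N(0, tau^2).


def gaussian_kernel(x: float, y: float, sigma: float) -> float:
    """Translation-invariant Gaussian kernel; note that k(x, x) = 1."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def true_embedding(y: float, sigma: float, tau: float) -> float:
    """Closed form of mu_k(P)(y) = E[k(y, X)] for X ~ N(0, tau^2)."""
    s2 = sigma ** 2 + tau ** 2
    return sigma / math.sqrt(s2) * math.exp(-(y ** 2) / (2.0 * s2))


def true_embedding_sq_norm(sigma: float, tau: float) -> float:
    """||mu_k(P)||^2 = E[k(X, X')] for independent X, X' ~ N(0, tau^2)."""
    return sigma / math.sqrt(sigma ** 2 + 2.0 * tau ** 2)


def rkhs_sq_error(sample: list[float], sigma: float, tau: float) -> float:
    """||theta_n - mu_k(P)||^2 in H_k for theta_n = (1/n) sum_i k(., x_i).

    Expanded via the reproducing property <k(., x), f> = f(x):
    (1/n^2) sum_ij k(x_i, x_j) - (2/n) sum_i mu_k(P)(x_i) + ||mu_k(P)||^2.
    """
    n = len(sample)
    gram_mean = sum(gaussian_kernel(a, b, sigma) for a in sample for b in sample) / n ** 2
    cross = sum(true_embedding(a, sigma, tau) for a in sample) / n
    return gram_mean - 2.0 * cross + true_embedding_sq_norm(sigma, tau)


def expected_sq_error(n: int, sigma: float, tau: float) -> float:
    """E||theta_n - mu_k(P)||^2 = (E[k(X, X)] - ||mu_k(P)||^2) / n, with k(x, x) = 1 here."""
    return (1.0 - true_embedding_sq_norm(sigma, tau)) / n


def monte_carlo_sq_error(n: int, trials: int, sigma: float, tau: float, seed: int = 0) -> float:
    """Average of rkhs_sq_error over independent samples of size n from N(0, tau^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, tau) for _ in range(n)]
        total += rkhs_sq_error(sample, sigma, tau)
    return total / trials


if __name__ == "__main__":
    sigma, tau = 1.0, 1.0
    for n in (25, 100):
        mc = monte_carlo_sq_error(n, trials=200, sigma=sigma, tau=tau)
        # The RKHS error itself shrinks like sqrt(mean sq. error), i.e. at rate n^{-1/2}.
        print(f"n={n:4d}  MC mean sq. error={mc:.5f}  theory={expected_sq_error(n, sigma, tau):.5f}")
```

Quadrupling $n$ should roughly quarter the mean squared error, that is, halve the RKHS error. Both the Gaussian kernel and the Gaussian density are infinitely differentiable here, yet the rate stays at $n^{-1/2}$; this is consistent with, but does not prove, the minimax statement above, which concerns all estimators.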



Cited By

  • (2024) Sample complexity bounds for estimating probability divergences under invariances. Proceedings of the 41st International Conference on Machine Learning, pages 47396-47417. DOI: 10.5555/3692070.3694001. Online publication date: 21-Jul-2024.
  • (2023) Sparse learning of dynamical systems in RKHS. Proceedings of the 40th International Conference on Machine Learning, pages 13325-13352. DOI: 10.5555/3618408.3618951. Online publication date: 23-Jul-2023.
  • (2023) Compressed decentralized learning of conditional mean embedding operators in reproducing kernel Hilbert spaces. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, pages 7902-7909. DOI: 10.1609/aaai.v37i7.25956. Online publication date: 7-Feb-2023.





Published In

The Journal of Machine Learning Research, Volume 18, Issue 1
January 2017
8830 pages



Author Tags

  1. Bochner integral
  2. Bochner's theorem
  3. kernel mean embeddings
  4. minimax lower bounds
  5. reproducing kernel Hilbert space
  6. translation invariant kernel





Article Metrics

  • Downloads (last 12 months): 90
  • Downloads (last 6 weeks): 12
Reflects downloads up to 02 Feb 2025
