Learning with optimal interpolation norms

Published: 01 June 2019

Abstract

We analyze a class of norms defined via an optimal interpolation problem involving the composition of norms and a linear operator. This construction, known as infimal postcomposition in convex analysis, is shown to encompass various norms that have been used as regularizers in machine learning, signal processing, and statistics, including the latent group lasso, the overlapping group lasso, and certain norms used for learning tensors. We establish basic properties of this class of norms and provide their dual norms. The extension to more general classes of convex functions is also discussed. A stochastic block-coordinate version of the Douglas-Rachford algorithm is devised to solve minimization problems involving these regularizers. A prominent feature of the algorithm is that its iterates converge to a solution even when the losses are nonsmooth and the blocks are updated randomly. Finally, we present numerical experiments on problems involving the latent group lasso penalty.
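
To make the construction concrete (in our own notation, a sketch rather than the paper's exact formulation): given norms ‖·‖₍ᵢ₎ on spaces Hᵢ, a monotone norm F on ℝᵐ, and a linear operator B from the direct sum H₁ ⊕ ⋯ ⊕ Hₘ to H, the induced optimal interpolation norm of a point x is

    |||x||| = inf { F(‖v₁‖₍₁₎, …, ‖vₘ‖₍ₘ₎) : v = (v₁, …, vₘ), Bv = x }.

The latent group lasso arises by letting each vᵢ be supported on a prescribed (possibly overlapping) group of coordinates, B sum the components, and F be the ℓ₁ norm, so that ‖x‖ = inf { Σ_g ‖v_g‖₂ : Σ_g v_g = x, supp(v_g) ⊆ g }.

The sketch below evaluates this latent group lasso instance by solving the underlying decomposition problem as a small convex program; the function name and the use of cvxpy are our illustrative choices, not code from the paper.

    import numpy as np
    import cvxpy as cp

    def latent_group_lasso_norm(x, groups):
        # One latent vector v_g per group; each must vanish outside its
        # group, and together they must decompose x.
        d = x.size
        V = [cp.Variable(d) for _ in groups]
        constraints = [sum(V) == x]
        for v, g in zip(V, groups):
            outside = [i for i in range(d) if i not in g]
            if outside:
                constraints.append(v[outside] == 0)
        # The norm value is the minimal sum of Euclidean norms of the components.
        problem = cp.Problem(cp.Minimize(sum(cp.norm(v, 2) for v in V)),
                             constraints)
        problem.solve()
        return problem.value

    x = np.array([1.0, -2.0, 0.5, 0.0])
    groups = [(0, 1), (1, 2), (2, 3)]  # overlapping groups covering all coordinates
    print(latent_group_lasso_norm(x, groups))

For reference, the deterministic two-operator Douglas-Rachford iteration for minimizing f + g, of which the paper's method is a stochastic block-coordinate extension, takes the standard form

    yₙ = prox_{γg}(xₙ),   zₙ = prox_{γf}(2yₙ − xₙ),   xₙ₊₁ = xₙ + λₙ (zₙ − yₙ),

with step size γ > 0 and relaxation parameters λₙ.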


Published In

Numerical Algorithms, Volume 81, Issue 2 (June 2019)

Publisher: Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Block-coordinate proximal algorithm
2. Douglas-Rachford splitting
3. Infimal postcomposition
4. Latent group lasso
5. Machine learning
6. Optimal interpolation norm

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 22 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media