Gradient Networks
Pages 324 - 339
Abstract
Directly parameterizing and learning gradients of functions has widespread significance, with specific applications in inverse problems, generative modeling, and optimal transport. This paper introduces gradient networks (GradNets): novel neural network architectures that parameterize gradients of various function classes. GradNets exhibit specialized architectural constraints that ensure correspondence to gradient functions. We provide a comprehensive GradNet design framework that includes methods for transforming GradNets into monotone gradient networks (mGradNets), which are guaranteed to represent gradients of convex functions. Our results establish that our proposed GradNets (and mGradNets) universally approximate the gradients of (convex) functions. Furthermore, these networks can be customized to correspond to specific spaces of potential functions, including transformed sums of (convex) ridge functions. Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results demonstrate that these architectures provide efficient parameterizations and outperform existing methods by up to 15 dB in gradient field tasks and by up to 11 dB in Hamiltonian dynamics learning tasks.
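To make the core idea concrete, the following is a minimal sketch, and not the architecture proposed in the paper: one simple monotone gradient parameterization is the gradient of a sum of convex ridge functions, g(x) = sum_i softplus(w_i^T x + b_i), whose closed-form gradient W^T sigmoid(Wx + b) has a symmetric positive semidefinite Jacobian and is therefore a monotone map. The function names, sizes, and the softplus choice below are illustrative assumptions; the JAX snippet only verifies that this explicit network equals the autodiff gradient of its convex potential.

import jax
import jax.numpy as jnp

def potential(params, x):
    # Convex potential g(x) = sum_i softplus(w_i^T x + b_i): a sum of convex ridge functions.
    W, b = params
    return jnp.sum(jax.nn.softplus(W @ x + b))

def monotone_gradient(params, x):
    # Closed-form gradient of the potential: W^T sigmoid(Wx + b).
    # Its Jacobian W^T diag(sigmoid'(Wx + b)) W is symmetric PSD, so the map is monotone.
    W, b = params
    return W.T @ jax.nn.sigmoid(W @ x + b)

key = jax.random.PRNGKey(0)
W = jax.random.normal(key, (8, 3))   # 8 ridge directions in R^3 (illustrative sizes)
b = jnp.zeros(8)
x = jnp.ones(3)

# The explicit network matches the autodiff gradient of its convex potential.
grad_autodiff = jax.grad(potential, argnums=1)((W, b), x)
assert jnp.allclose(monotone_gradient((W, b), x), grad_autodiff, atol=1e-5)

Any monotone gradient parameterization must satisfy the same defining property illustrated here: an everywhere symmetric, positive semidefinite Jacobian, which characterizes gradients of twice-differentiable convex functions.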
Published In
IEEE Transactions on Signal Processing (ISSN 1053-587X). © 2024 IEEE.
Publisher
IEEE Press
Publication History
Published: 13 November 2024