Gradient Networks

Published: 13 November 2024

Abstract

Directly parameterizing and learning gradients of functions has widespread significance, with specific applications in inverse problems, generative modeling, and optimal transport. This paper introduces gradient networks (GradNets): novel neural network architectures that parameterize gradients of various function classes. GradNets exhibit specialized architectural constraints that ensure correspondence to gradient functions. We provide a comprehensive GradNet design framework that includes methods for transforming GradNets into monotone gradient networks (mGradNets), which are guaranteed to represent gradients of convex functions. Our results establish that the proposed GradNets (and mGradNets) universally approximate the gradients of (convex) functions. Furthermore, these networks can be customized to correspond to specific spaces of potential functions, including transformed sums of (convex) ridge functions. Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results demonstrate that these architectures provide efficient parameterizations and outperform existing methods by up to 15 dB in gradient field tasks and by up to 11 dB in Hamiltonian dynamics learning tasks.
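To make the idea of a monotone gradient network concrete, the following is a minimal, hypothetical sketch (not the paper's GradNet/mGradNet architecture; layer name and sizes are illustrative). It implements a single layer of the form g(x) = Wᵀσ(Wx + b), which is the gradient of the convex potential f(x) = Σᵢ ψ(wᵢᵀx + bᵢ) whenever σ = ψ′ is nondecreasing, i.e., a sum of convex ridge functions as mentioned in the abstract.

```python
# Minimal sketch (assumed construction, not the authors' implementation):
# a single-layer monotone gradient map  g(x) = W^T sigma(Wx + b),
# the gradient of the convex potential  f(x) = sum_i psi(w_i^T x + b_i)
# when sigma = psi' is nondecreasing (here sigma = softplus).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneGradientLayer(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(hidden, dim) / dim ** 0.5)
        self.b = nn.Parameter(torch.zeros(hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Jacobian of this map is W^T diag(sigma'(Wx + b)) W, which is PSD
        # because softplus is nondecreasing, so g is a monotone (gradient) map.
        return F.softplus(x @ self.W.t() + self.b) @ self.W

# Usage: regress the layer onto samples of a target gradient field.
g = MonotoneGradientLayer(dim=2, hidden=64)
x = torch.randn(8, 2)
print(g(x).shape)  # torch.Size([8, 2])
```

A layer of this form covers only sums of convex ridge functions; the richer (m)GradNet families described in the abstract are designed to go beyond this class while preserving the gradient/monotonicity guarantees.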

Published In

IEEE Transactions on Signal Processing, Volume 73, 2025

Publisher

IEEE Press