Gradient Networks
Pages 324 - 339
Abstract
Directly parameterizing and learning gradients of functions has widespread significance, with specific applications in inverse problems, generative modeling, and optimal transport. This paper introduces gradient networks (GradNets): novel neural network architectures that parameterize gradients of various function classes. GradNets exhibit specialized architectural constraints that ensure correspondence to gradient functions. We provide a comprehensive GradNet design framework that includes methods for transforming GradNets into monotone gradient networks (mGradNets), which are guaranteed to represent gradients of convex functions. Our results establish that our proposed GradNets (and mGradNets) universally approximate the gradients of (convex) functions. Furthermore, these networks can be customized to correspond to specific spaces of potential functions, including transformed sums of (convex) ridge functions. Our analysis leads to two distinct GradNet architectures, GradNet-C and GradNet-M, and we describe the corresponding monotone versions, mGradNet-C and mGradNet-M. Our empirical results demonstrate that these architectures provide efficient parameterizations and outperform existing methods by up to 15 dB in gradient field tasks and by up to 11 dB in Hamiltonian dynamics learning tasks.
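To make the core idea concrete, the following is a minimal sketch, and not the architecture proposed in the paper: one simple monotone gradient parameterization is the gradient of a sum of convex ridge functions, g(x) = sum_i softplus(w_i^T x + b_i), whose closed-form gradient W^T sigmoid(Wx + b) has a symmetric positive semidefinite Jacobian and is therefore a monotone map. The function names, sizes, and the softplus choice below are illustrative assumptions; the JAX snippet only verifies that this explicit network equals the autodiff gradient of its convex potential.

import jax
import jax.numpy as jnp

def potential(params, x):
    # Convex potential g(x) = sum_i softplus(w_i^T x + b_i): a sum of convex ridge functions.
    W, b = params
    return jnp.sum(jax.nn.softplus(W @ x + b))

def monotone_gradient(params, x):
    # Closed-form gradient of the potential: W^T sigmoid(Wx + b).
    # Its Jacobian W^T diag(sigmoid'(Wx + b)) W is symmetric PSD, so the map is monotone.
    W, b = params
    return W.T @ jax.nn.sigmoid(W @ x + b)

key = jax.random.PRNGKey(0)
W = jax.random.normal(key, (8, 3))   # 8 ridge directions in R^3 (illustrative sizes)
b = jnp.zeros(8)
x = jnp.ones(3)

# The explicit network matches the autodiff gradient of its convex potential.
grad_autodiff = jax.grad(potential, argnums=1)((W, b), x)
assert jnp.allclose(monotone_gradient((W, b), x), grad_autodiff, atol=1e-5)

Any monotone gradient parameterization must satisfy the same defining property illustrated here: an everywhere symmetric, positive semidefinite Jacobian, which characterizes gradients of twice-differentiable convex functions.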
Published In
IEEE Transactions on Signal Processing (ISSN 1053-587X). © 2024 IEEE.
Publisher
IEEE Press
Publication History
Published: 13 November 2024