DOI: 10.1609/aaai.v38i13.29324

Layer Collaboration in the Forward-Forward Algorithm

Published: 20 February 2024

Abstract

Backpropagation, which uses the chain rule, is the de facto standard algorithm for optimizing neural networks today. Recently, the Forward-Forward algorithm has been proposed as an alternative. It optimizes neural networks layer by layer, without propagating gradients throughout the network. Although this approach has several advantages over backpropagation and shows promising results, the fact that each layer is trained independently limits the optimization process. Specifically, it prevents the network's layers from collaborating to learn complex and rich features. In this work, we study layer collaboration in the Forward-Forward algorithm. We show that the current version of the Forward-Forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between the network's layers. We propose an improved version that supports layer collaboration and better exploits the network structure, while requiring no additional assumptions or computation. We empirically demonstrate the efficacy of the proposed version on both information-flow and objective metrics. Additionally, we provide a theoretical motivation for the proposed method, inspired by functional entropy theory.
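
The sketch below illustrates the layer-by-layer training scheme the abstract refers to: each layer is optimized with a local "goodness" objective (in the spirit of Hinton, 2022) on inputs detached from the previous layer, so no gradients propagate through the network. It is a minimal illustration only; the layer sizes, threshold, optimizer choice, and the use of PyTorch are assumptions for exposition and do not reflect the authors' implementation or the proposed collaborative variant.

    # Minimal, illustrative sketch of layer-wise Forward-Forward training.
    # Not the authors' code: dimensions, threshold, and optimizer are assumed.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FFLayer(nn.Module):
        """A single layer trained with a local objective; no gradient flows to earlier layers."""
        def __init__(self, in_dim, out_dim, threshold=2.0, lr=1e-3):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)
            self.threshold = threshold
            self.opt = torch.optim.Adam(self.parameters(), lr=lr)

        def forward(self, x):
            # Normalize the input so only its direction is passed upward.
            x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
            return F.relu(self.linear(x))

        def train_step(self, x_pos, x_neg):
            # Goodness = mean squared activation; push it above the threshold
            # for positive data and below it for negative data.
            g_pos = self.forward(x_pos).pow(2).mean(dim=1)
            g_neg = self.forward(x_neg).pow(2).mean(dim=1)
            loss = F.softplus(torch.cat([self.threshold - g_pos,
                                         g_neg - self.threshold])).mean()
            self.opt.zero_grad()
            loss.backward()          # gradient stays inside this layer
            self.opt.step()
            # Detach outputs so the next layer cannot send gradients back here.
            return self.forward(x_pos).detach(), self.forward(x_neg).detach()

    # Layer-by-layer training: each layer only sees the detached output of
    # the previous one, so there is no end-to-end gradient propagation.
    layers = [FFLayer(784, 500), FFLayer(500, 500)]
    x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
    for layer in layers:
        x_pos, x_neg = layer.train_step(x_pos, x_neg)

In this baseline, the detach between layers is what makes each layer's optimization independent; the lack of collaboration this induces is the limitation the paper studies.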



Information

Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
