DOI: 10.1609/aaai.v38i13.29324

Layer Collaboration in the Forward-Forward Algorithm

Published: 20 February 2024

Abstract

Backpropagation, which uses the chain rule, is the de facto standard algorithm for optimizing neural networks today. Recently, the Forward-Forward algorithm has been proposed as an alternative. It optimizes neural networks layer by layer, without propagating gradients throughout the network. Although this approach has several advantages over backpropagation and shows promising results, the fact that each layer is trained independently limits the optimization process. Specifically, it prevents the network's layers from collaborating to learn complex and rich features. In this work, we study layer collaboration in the Forward-Forward algorithm. We show that the current version of the Forward-Forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between the network's layers. We propose an improved version that supports layer collaboration and better exploits the network structure, while requiring no additional assumptions or computation. We empirically demonstrate the efficacy of the proposed version on both information-flow and objective metrics. Additionally, we provide a theoretical motivation for the proposed method, inspired by functional entropy theory.
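
The sketch below illustrates the layer-by-layer training scheme the abstract refers to: each layer is optimized with a local "goodness" objective (in the spirit of Hinton, 2022) on inputs detached from the previous layer, so no gradients propagate through the network. It is a minimal illustration only; the layer sizes, threshold, optimizer choice, and the use of PyTorch are assumptions for exposition and do not reflect the authors' implementation or the proposed collaborative variant.

    # Minimal, illustrative sketch of layer-wise Forward-Forward training.
    # Not the authors' code: dimensions, threshold, and optimizer are assumed.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FFLayer(nn.Module):
        """A single layer trained with a local objective; no gradient flows to earlier layers."""
        def __init__(self, in_dim, out_dim, threshold=2.0, lr=1e-3):
            super().__init__()
            self.linear = nn.Linear(in_dim, out_dim)
            self.threshold = threshold
            self.opt = torch.optim.Adam(self.parameters(), lr=lr)

        def forward(self, x):
            # Normalize the input so only its direction is passed upward.
            x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
            return F.relu(self.linear(x))

        def train_step(self, x_pos, x_neg):
            # Goodness = mean squared activation; push it above the threshold
            # for positive data and below it for negative data.
            g_pos = self.forward(x_pos).pow(2).mean(dim=1)
            g_neg = self.forward(x_neg).pow(2).mean(dim=1)
            loss = F.softplus(torch.cat([self.threshold - g_pos,
                                         g_neg - self.threshold])).mean()
            self.opt.zero_grad()
            loss.backward()          # gradient stays inside this layer
            self.opt.step()
            # Detach outputs so the next layer cannot send gradients back here.
            return self.forward(x_pos).detach(), self.forward(x_neg).detach()

    # Layer-by-layer training: each layer only sees the detached output of
    # the previous one, so there is no end-to-end gradient propagation.
    layers = [FFLayer(784, 500), FFLayer(500, 500)]
    x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
    for layer in layers:
        x_pos, x_neg = layer.train_step(x_pos, x_neg)

In this baseline, the detach between layers is what makes each layer's optimization independent; the lack of collaboration this induces is the limitation the paper studies.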



Information

Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
