Vector Forward Mode Automatic Differentiation on SIMD/SIMT architectures

Published: 17 August 2020


Automatic differentiation, back-propagation, differentiable programming, and related methods have received widespread attention because they can compute accurate gradients of numerical programs for optimization, uncertainty quantification, and machine learning. Two strategies are commonly used. The forward mode is easy to implement, but its overhead relative to the original program grows linearly with the number of inputs. The reverse mode can compute gradients with respect to an arbitrary number of program inputs at a constant-factor overhead, although that constant can be large, more memory is required, and the implementation is often challenging. Previous literature has shown that the forward mode can be parallelized and vectorized more easily than the reverse mode, but case studies investigating when either mode is the better choice are lacking, especially for modern CPUs and GPUs. In this paper, we demonstrate that the forward mode can outperform the reverse mode for programs with tens or hundreds of directional derivatives, a number that may yet increase if current hardware trends continue.
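To make the idea of vector forward mode concrete, here is a minimal sketch in Python, not the authors' implementation. It assumes a dual-number design in which each value carries a vector of partial derivatives, one entry per seeded direction, and every arithmetic operation updates all entries together. A plain Python list stands in for the SIMD/SIMT vector of partials, and the names `VecDual`, `vsin`, and `gradient` are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence


class VecDual:
    """A value plus a vector of partials (one lane per seeded direction).

    Illustrative sketch: on SIMD/SIMT hardware the per-lane updates below
    would map onto vector lanes; here a Python list plays that role.
    """

    __slots__ = ("val", "eps")

    def __init__(self, val: float, eps: Sequence[float]):
        self.val = float(val)
        self.eps = list(eps)

    def _lift(self, other):
        # Plain numbers are constants: zero partial in every direction.
        if isinstance(other, VecDual):
            return other
        return VecDual(other, [0.0] * len(self.eps))

    def __add__(self, other):
        o = self._lift(other)
        return VecDual(self.val + o.val, [a + b for a, b in zip(self.eps, o.eps)])

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return VecDual(self.val - o.val, [a - b for a, b in zip(self.eps, o.eps)])

    def __rsub__(self, other):
        # other - self, where other is a plain number.
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule applied lane by lane: d(uv) = u dv + v du.
        return VecDual(self.val * o.val,
                       [self.val * b + o.val * a for a, b in zip(self.eps, o.eps)])

    __rmul__ = __mul__


def vsin(x: VecDual) -> VecDual:
    # Chain rule: d sin(u) = cos(u) du, one scalar factor applied to every lane.
    c = math.cos(x.val)
    return VecDual(math.sin(x.val), [c * a for a in x.eps])


def gradient(f, xs: Sequence[float]) -> List[float]:
    """Full gradient of f at xs in a single evaluation.

    Input i is seeded with the unit vector e_i, so lane i of the result
    holds the partial derivative with respect to input i.
    """
    n = len(xs)
    seeded = [VecDual(x, [1.0 if j == i else 0.0 for j in range(n)])
              for i, x in enumerate(xs)]
    return f(*seeded).eps


# Usage: f(x, y, z) = x*y + sin(z)*x, whose gradient is
# (y + sin(z), x, x*cos(z)).
g = gradient(lambda x, y, z: x * y + vsin(z) * x, [2.0, 3.0, 0.5])
```

In this sketch, scalar forward mode would need three separate evaluations, each seeded with one unit vector, to obtain the same gradient. Vector mode needs one evaluation, but each operation now touches `n` lanes, so the work still grows linearly with the number of inputs. The abstract's argument is that when those lane updates run on wide SIMD/SIMT hardware, this cost can remain competitive with reverse mode for tens to hundreds of directions.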





ICPP '20: Proceedings of the 49th International Conference on Parallel Processing
August 2020
844 pages
Author Tags

  1. Automatic Differentiation
  2. GPU
  3. Julia Language
  4. Reduced Precision
  5. SIMD
  6. Vector Forward Mode


ICPP '20

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%


