Cited By
- Deshmukh S, Yokota R, Bosilca G (2023). Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors. ACM Transactions on Mathematical Software 49:3 (1-29). doi:10.1145/3595178. Online publication date: 19-Sep-2023
- Abdelfattah A, Haidar A, Tomov S, Dongarra J (2016). On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures. 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (1249-1258). doi:10.1109/IPDPSW.2016.190. Online publication date: May-2016
- Abdelfattah A, Costa T, Dongarra J, Gates M, Haidar A, Hammarling S, Higham N, Kurzak J, Luszczek P, Tomov S, Zounon M (2021). A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines. ACM Transactions on Mathematical Software 47:3 (1-23). doi:10.1145/3431921. Online publication date: 26-Jun-2021