Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

Lei Wang; Lingxiao Ma; Shijie Cao; Quanlu Zhang; Jilong Xue; Yining Shi; Ningxin Zheng; Ziming Miao; Fan Yang; Ting Cao; Yuqing Yang; Mao Yang

Authors:

Lei Wang, University of Chinese Academy of Sciences & Microsoft Research; Lingxiao Ma, Shijie Cao, Quanlu Zhang, and Jilong Xue, Microsoft Research; Yining Shi, Peking University & Microsoft Research; Ningxin Zheng, Ziming Miao, Fan Yang, Ting Cao, Yuqing Yang, and Mao Yang, Microsoft Research

Abstract:

The increasing demand for improving deep learning model performance has led to a paradigm shift in supporting low-precision computation to harness the robustness of deep learning to errors. Despite the emergence of new low-precision data types and optimization approaches, existing hardware and software have insufficient and inefficient support for those evolving data types, making it challenging to achieve real performance gains through low-precision computing.

This paper introduces Ladder, a novel compiler designed to bridge the gap between evolving custom data types and the fixed precision formats supported by current hardware. Leveraging a general type system, tType, and an extended tensor expression, Ladder transforms deep neural network (DNN) computations into optimized computing pipelines with custom data types as first-class citizens, exposing an optimization space for efficiently handling data storage, accesses, and type conversions. Ladder employs a new set of tensor scheduling primitives and a hardware-aware optimization policy to navigate the complex transformation space, ensuring optimal performance across different memory layers and DNN operators. Our evaluation demonstrates Ladder's capability to systematically support a wide array of low-bit precision custom data types, significantly enhancing the performance of DNN computations on modern accelerators without necessitating hardware modifications. This innovation empowers model designers with the ability to explore data type optimizations and offers hardware vendors a flexible solution to expand their support for diverse precision formats.
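
To make the storage and type-conversion problem concrete, the sketch below packs signed 4-bit values into ordinary bytes and converts them back to floats before computing. It is a minimal illustration of the general idea of holding a custom low-bit type in a hardware-supported container, not Ladder's implementation or API: the function names `pack_int4`, `unpack_int4`, and `dequantize`, the low-nibble-first layout, and the single per-tensor `scale` are all assumptions made for this example.

```python
# Illustrative only: these helpers are hypothetical and are not Ladder's API.

def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit integer")
        # Mask to the two's-complement nibble, then combine into one byte.
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Recover the signed 4-bit integers from the packed bytes."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 represent -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def dequantize(packed, scale):
    """Convert packed int4 storage to floats that can be computed on."""
    return [v * scale for v in unpack_int4(packed)]


weights = [-8, 7, 0, -1, 3, -4]
storage = pack_int4(weights)            # half as many bytes as values
assert unpack_int4(storage) == weights  # the round trip is lossless
print(dequantize(storage, 0.5))
```

Even in this toy form, the packing layout and the point at which values are converted are free choices, which hints at the kind of optimization space the abstract describes.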

BibTeX

@inproceedings {298699,
author = {Lei Wang and Lingxiao Ma and Shijie Cao and Quanlu Zhang and Jilong Xue and Yining Shi and Ningxin Zheng and Ziming Miao and Fan Yang and Ting Cao and Yuqing Yang and Mao Yang},
title = {Ladder: Enabling Efficient {Low-Precision} Deep Learning Computing through Hardware-aware Tensor Transformation},
booktitle = {18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)},
year = {2024},
isbn = {978-1-939133-40-3},
address = {Santa Clara, CA},
pages = {307--323},
url = {https://www.usenix.org/conference/osdi24/presentation/wang-lei},
publisher = {USENIX Association},
month = jul
}
