Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

Xue, Zi Yu; Wu, Yannan Nellie; Emer, Joel S.; Sze, Vivienne

doi:10.1145/3613424.3623793


arXiv:2310.00192 (cs)

[Submitted on 29 Sep 2023 (v1), last revised 26 Jun 2024 (this version, v2)]



Abstract: Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but typically allocate tile size in a given buffer for the worst-case data occupancy. This severely limits the utilization of available memory resources and reduces data reuse. Other accelerators employ complex tiling during preprocessing or at runtime to determine the exact tile size based on its occupancy. This paper proposes a speculative tensor tiling approach, called overbooking, to improve buffer utilization by taking advantage of the distribution of nonzero elements in sparse tensors to construct larger tiles with greater data reuse. To ensure correctness, we propose a low-overhead hardware mechanism, Tailors, that can tolerate data overflow by design while ensuring reasonable data reuse. We demonstrate that Tailors can be easily integrated into the memory hierarchy of an existing sparse tensor algebra accelerator. To ensure high buffer utilization with minimal tiling overhead, we introduce a statistical approach, Swiftiles, to pick a tile size so that tiles usually fit within the buffer's capacity, but can potentially overflow, i.e., it overbooks the buffers. Across a suite of 22 sparse tensor algebra workloads, we show that our proposed overbooking strategy introduces an average speedup of $52.7\times$ and $2.3\times$ and an average energy reduction of $22.5\times$ and $2.5\times$ over ExTensor without and with optimized tiling, respectively.
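To make the contrast between worst-case tiling and overbooking concrete, here is a minimal Python sketch. It is not the paper's Swiftiles algorithm or the Tailors hardware mechanism. It assumes a simplified setting that the abstract does not specify: a sparse matrix tiled along rows only, a buffer whose capacity is counted in nonzeros, and an exhaustive scan over tile sizes. All function names and the `max_overflow` parameter are hypothetical and chosen for illustration.

```python
def tile_occupancies(row_nnz, rows_per_tile):
    """Number of nonzeros in each tile when consecutive rows are grouped."""
    return [sum(row_nnz[i:i + rows_per_tile])
            for i in range(0, len(row_nnz), rows_per_tile)]


def overflow_fraction(row_nnz, rows_per_tile, capacity):
    """Fraction of tiles whose occupancy exceeds the buffer capacity."""
    occ = tile_occupancies(row_nnz, rows_per_tile)
    return sum(o > capacity for o in occ) / len(occ)


def pick_rows_per_tile(row_nnz, capacity, max_overflow):
    """Largest tile height whose overflow fraction is at most max_overflow.

    max_overflow = 0.0 models worst-case sizing (no tile may overflow);
    a positive value models overbooking (some tiles may overflow).
    Falls back to 1 if no tile height meets the bound.
    """
    best = 1
    for r in range(1, len(row_nnz) + 1):
        if overflow_fraction(row_nnz, r, capacity) <= max_overflow:
            best = r
    return best


# Assumed toy input: 40 rows, one dense row (6 nonzeros), the rest 1 nonzero each.
row_nnz = [6] + [1] * 39
capacity = 8  # buffer holds 8 nonzeros (illustrative value)

worst_case = pick_rows_per_tile(row_nnz, capacity, max_overflow=0.0)
overbooked = pick_rows_per_tile(row_nnz, capacity, max_overflow=0.25)
print("rows per tile, worst-case sizing:", worst_case)
print("rows per tile, overbooked sizing:", overbooked)
```

In this toy input the single dense row forces worst-case sizing down to small tiles, while allowing a bounded fraction of tiles to overflow permits larger tiles for the remaining sparse rows. A real design would also need a mechanism to handle the overflowing tiles, which is the role the abstract assigns to Tailors.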

Comments:	17 pages, 13 figures, in MICRO 2023
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2310.00192 [cs.AR]
	(or arXiv:2310.00192v2 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2310.00192
Journal reference:	56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23), 2023
Related DOI:	https://doi.org/10.1145/3613424.3623793

Submission history

From: Zi Yu Xue
[v1] Fri, 29 Sep 2023 23:56:04 UTC (1,683 KB)
[v2] Wed, 26 Jun 2024 15:07:58 UTC (1,683 KB)
