
A sound and complete abstraction for reasoning about parallel prefix sums

Authors: Nathan Chong, Alastair F. Donaldson, Jeroen Ketema

ACM SIGPLAN Notices, Volume 49, Issue 1

Pages 397 - 409

https://doi.org/10.1145/2578855.2535882

Published: 08 January 2014

Abstract

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance.

We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length n if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require O(n lg(n)) space. This improves upon an existing result by Sheeran where the input requires O(n lg(n)) space and the result O(n² lg(n)) space, and is more feasible for large n than a method by Voigtlaender that uses O(n) space for the input and result but requires running O(n²) tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction.

We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.
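
The single-test idea described in the abstract can be illustrated with a small Python sketch. It is not the paper's formal definition or tooling. It assumes one plausible encoding of the interval of summations: a pair (i, j) stands for the contiguous summation x_i + ... + x_j, two pairs combine only when they are adjacent and in order, and any other combination collapses to an absorbing error element. The names `combine`, `IDENTITY`, `TOP`, `passes_single_test`, and the three scan functions are hypothetical and chosen for illustration only.

```python
from typing import Callable, List, Tuple, Union

# Hypothetical encoding (an assumption, not taken from the paper's artifact):
#   (i, j)    the contiguous summation x_i + ... + x_j
#   IDENTITY  the neutral element of the monoid
#   TOP       an absorbing error element for invalid combinations
IDENTITY = "identity"
TOP = "top"
Elem = Union[str, Tuple[int, int]]
Scan = Callable[[List[Elem], Callable[[Elem, Elem], Elem]], List[Elem]]


def combine(a: Elem, b: Elem) -> Elem:
    """Associative operator: join two intervals only if they are adjacent."""
    if a == TOP or b == TOP:
        return TOP
    if a == IDENTITY:
        return b
    if b == IDENTITY:
        return a
    (i, j), (k, l) = a, b
    return (i, l) if j + 1 == k else TOP


def sequential_scan(xs, op):
    """Generic left-to-right inclusive prefix sum."""
    out = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def kogge_stone_scan(xs, op):
    """Generic Kogge-Stone style inclusive scan, simulated sequentially."""
    out = list(xs)
    d = 1
    while d < len(out):
        prev = list(out)  # snapshot: every "thread" reads the previous step
        for i in range(d, len(out)):
            out[i] = op(prev[i - d], prev[i])
        d *= 2
    return out


def swapped_operand_scan(xs, op):
    """Deliberately buggy variant: operands are passed in the wrong order."""
    out = list(xs)
    d = 1
    while d < len(out):
        prev = list(out)
        for i in range(d, len(out)):
            out[i] = op(prev[i], prev[i - d])
        d *= 2
    return out


def passes_single_test(scan: Scan, n: int) -> bool:
    """Run the one test: input (i, i) per element, expect (0, i) per element."""
    inputs = [(i, i) for i in range(n)]
    expected = [(0, i) for i in range(n)]
    return scan(inputs, combine) == expected


if __name__ == "__main__":
    for scan in (sequential_scan, kogge_stone_scan, swapped_operand_scan):
        print(scan.__name__, passes_single_test(scan, 8))
```

In this sketch the swapped-operand variant still produces correct results under ordinary integer addition, because addition is commutative, but it fails the single interval test because the intervals are combined out of order. Each interval is a pair of indices below n, which is consistent with the O(n lg(n)) space bound quoted above for the input and result.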


