research-article

Open access

Verified tensor-program optimization via high-level scheduling rewrites

Authors:

Gilbert Louis Bernstein,

Jonathan Ragan-Kelley

Proceedings of the ACM on Programming Languages, Volume 6, Issue POPL

Article No.: 55, Pages 1 - 28

https://doi.org/10.1145/3498717

Published: 12 January 2022

Abstract

We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional array language. Optimizations rely on user scheduling using a series of verified, semantics-preserving rewrites. Unusually for compilation targeting imperative code with arrays and nested loops, all rewrites are source-to-source within a purely functional language. Our language comprises a set of core constructs for expressing high-level computation detail and a set of what we call reshape operators, which can be derived from core constructs but trigger low-level decisions about storage patterns and ordering. We demonstrate that not only is this system capable of deriving the optimizations of existing state-of-the-art languages like Halide and generating comparably performant code, it is also able to schedule a family of useful program transformations beyond what is reachable in Halide.
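
As an informal illustration of what a semantics-preserving, source-to-source scheduling rewrite can look like, the sketch below uses Python rather than the paper's Coq framework. The names `gen`, `flatten`, `scale`, and `scale_tiled` are hypothetical and are not taken from the paper. `gen` stands in for a core array constructor and `flatten` for a reshape-style operator. The sketch only spot-checks that the flat and tiled kernels agree on sample inputs, whereas the paper establishes such equivalences by proof.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def gen(n: int, f: Callable[[int], T]) -> List[T]:
    """Hypothetical pure array constructor: builds [f(0), ..., f(n-1)]."""
    return [f(i) for i in range(n)]


def flatten(xss: List[List[T]]) -> List[T]:
    """Hypothetical reshape-style operator: concatenates rows into one array."""
    return [x for xs in xss for x in xs]


def scale(xs: List[float], a: float) -> List[float]:
    """Original kernel: a single flat traversal of the input."""
    return gen(len(xs), lambda i: a * xs[i])


def scale_tiled(xs: List[float], a: float, k: int) -> List[float]:
    """Rewritten kernel: an outer traversal over tiles of width k and an
    inner traversal within each tile, flattened back to one array.

    Assumption: k divides len(xs). A general rewrite would need a guard
    or padding for the leftover elements.
    """
    n = len(xs)
    if k <= 0 or n % k != 0:
        raise ValueError("k must be positive and divide len(xs)")
    return flatten(
        gen(n // k, lambda io: gen(k, lambda ii: a * xs[io * k + ii]))
    )
```

Both functions compute the same array; only the loop structure differs, which is the kind of decision a schedule is meant to control without changing the result.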

Supplementary Material

Auxiliary Presentation Video (popl22main-p523-p-video.mp4)

This is a presentation video for our paper at POPL 2022 accepted into the research track. In this paper we introduce a verified framework for optimizing tensor programs and a small evaluation presenting preliminary results.


Index Terms

Verified tensor-program optimization via high-level scheduling rewrites
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
    2. Context specific languages
      1. Domain specific languages
  2. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Software verification

Information

Published In

Proceedings of the ACM on Programming Languages Volume 6, Issue POPL

January 2022

1886 pages

EISSN:2475-1421

DOI:10.1145/3511309

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States
