research-article
Open access

Verified tensor-program optimization via high-level scheduling rewrites

Published: 12 January 2022

Abstract

We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional array language. Optimizations are scheduled by the user as a series of verified, semantics-preserving rewrites. Unusually for compilation targeting imperative code with arrays and nested loops, all rewrites are source-to-source within a purely functional language. Our language comprises a set of core constructs for expressing high-level computation detail and a set of what we call reshape operators, which can be derived from core constructs but trigger low-level decisions about storage patterns and ordering. We demonstrate that this system can not only derive the optimizations of existing state-of-the-art languages like Halide and generate comparably performant code, but also schedule a family of useful program transformations beyond what is reachable in Halide.

Supplementary Material

Auxiliary Presentation Video (popl22main-p523-p-video.mp4)
This is a presentation video for our paper, accepted into the research track at POPL 2022. In the paper we introduce a verified framework for optimizing tensor programs and a small evaluation presenting preliminary results.

Published In

Proceedings of the ACM on Programming Languages, Volume 6, Issue POPL
January 2022
1886 pages
EISSN:2475-1421
DOI:10.1145/3511309
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. array programming
  2. formal verification
  3. optimization
  4. proof assistants

Qualifiers

  • Research-article
