A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism

Authors:

Ioannis Latifis,

Karthick Parashar,

Grigoris Dimitroulakos,

Christakis Lezos,

Konstantinos Masselos,

Francky Catthoor

ACM Transactions on Embedded Computing Systems (TECS), Volume 19, Issue 6

Article No.: 50, Pages 1 - 27

https://doi.org/10.1145/3391898

Published: 03 October 2020

Abstract

This article presents a MATLAB-to-C compiler that exploits custom instructions present in state-of-the-art processor architectures and supports semi-automatic vectorization. A parameterized processor model is used to describe the target instruction set architecture to achieve user-friendly retargetability. Custom instructions are represented via specialized intrinsic functions in the generated code, which can then be used as input to any C/C++ compiler supporting the target processor. In addition, the compiler supports the generation of data-parallel (vectorized) code through the introduction of data packing/unpacking statements. The compiler has been used for code generation targeting ARM and x86 architectures for several benchmarks. Compared to scalarized code, the vectorized code generated by the compiler achieves an average speedup of 4.1× and 2.7× for packed fixed-point and floating-point data, respectively, on the ARM architecture, and an average speedup of 3.1× and 1.5× for packed fixed-point and floating-point data, respectively, on the x86 architecture. Implementing data-parallel instructions directly in assembly code would have required considerable design effort and would not have been sustainable across evolving platform variants. Thus, the compiler can be employed to efficiently speed up critical sections of the target application. The compiler is therefore potentially employable to raise the design abstraction and reduce development time for both embedded and general-purpose applications.
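The abstract describes generated C code in which custom instructions appear as intrinsic function calls and vectorization is expressed through packing/unpacking statements. The following is only a minimal sketch of that general idea, not the compiler's actual output: the packed type `vec4_i16`, the helpers `pack4_i16` and `unpack4_i16`, and the intrinsic `intrin_add4_i16` are hypothetical names, and the intrinsic is emulated in portable C so the example compiles on any host. On a real target it would presumably map to a single SIMD add instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* assumed SIMD width: four 16-bit fixed-point lanes */

/* Hypothetical packed type standing in for a target vector register. */
typedef struct { int16_t lane[LANES]; } vec4_i16;

/* Packing statement: gather LANES consecutive scalars into one packed value. */
static vec4_i16 pack4_i16(const int16_t *src) {
    vec4_i16 v;
    for (int k = 0; k < LANES; ++k) v.lane[k] = src[k];
    return v;
}

/* Unpacking statement: scatter a packed value back to scalar storage. */
static void unpack4_i16(int16_t *dst, vec4_i16 v) {
    for (int k = 0; k < LANES; ++k) dst[k] = v.lane[k];
}

/* Hypothetical intrinsic, emulated here lane by lane in plain C. */
static vec4_i16 intrin_add4_i16(vec4_i16 x, vec4_i16 y) {
    vec4_i16 r;
    for (int k = 0; k < LANES; ++k)
        r.lane[k] = (int16_t)(x.lane[k] + y.lane[k]);
    return r;
}

/* Scalarized reference: one element per iteration. */
void add_scalar(int16_t *dst, const int16_t *a, const int16_t *b, size_t n) {
    assert(dst && a && b);
    for (size_t i = 0; i < n; ++i) dst[i] = (int16_t)(a[i] + b[i]);
}

/* Vectorized form: pack, apply the intrinsic, unpack; scalar epilogue for leftovers. */
void add_packed(int16_t *dst, const int16_t *a, const int16_t *b, size_t n) {
    assert(dst && a && b);
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        vec4_i16 va = pack4_i16(a + i);
        vec4_i16 vb = pack4_i16(b + i);
        unpack4_i16(dst + i, intrin_add4_i16(va, vb));
    }
    for (; i < n; ++i) dst[i] = (int16_t)(a[i] + b[i]);
}
```

Calling `add_packed` and `add_scalar` on the same inputs should give identical results. Retargeting in this sketch would amount to swapping the body of the intrinsic and the packed type for the target's own definitions while the calling code stays unchanged.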

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Retargetable compilers

Published In

ACM Transactions on Embedded Computing Systems Volume 19, Issue 6

Special Issue on LCETES, Part 2, Learning, Distributed, and Optimizing Compilers

November 2020

271 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3427195

Editor:
Sandeep K. Shukla
Indian Institute of Technology, India

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 03 October 2020

Online AM: 07 May 2020

Accepted: 01 March 2020

Revised: 01 March 2020

Received: 01 November 2019

Published in TECS Volume 19, Issue 6

Qualifiers

Research-article
Research
Refereed
