research-article

A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism

Published: 03 October 2020

Abstract

This article presents a MATLAB-to-C compiler that exploits the custom instructions present in state-of-the-art processor architectures and supports semi-automatic vectorization. To make retargeting user-friendly, the target instruction set architecture is described by a parameterized processor model. Custom instructions are represented in the generated code by specialized intrinsic functions, so the output can be passed to any C/C++ compiler that supports the target processor. In addition, the compiler generates data-parallel (vectorized) code by introducing data packing and unpacking statements. The compiler has been used to generate code for several benchmarks targeting the ARM and x86 architectures. Compared to scalarized code, the vectorized code it generates achieves average speedups of 4.1× and 2.7× for packed fixed-point and floating-point data, respectively, on ARM, and of 3.1× and 1.5×, respectively, on x86. Implementing data-parallel instructions directly in assembly code would have required considerable design effort and would not have been sustainable across evolving platform variants. The compiler can therefore be used to efficiently speed up critical sections of a target application, and it has the potential to raise the design abstraction level and reduce development time for both embedded and general-purpose applications.
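As a rough illustration of the two code-generation mechanisms described above (custom instructions exposed as intrinsic functions, and packing/unpacking statements around data-parallel operations), the following C sketch shows how the MATLAB statement `c = a + b` on `int16` vectors might be translated once in scalarized form and once in packed form. It is not the compiler's actual output. Every identifier (`packed_s16x4`, `sat_add_s16`, `hyp_pack_s16x4`, `hyp_vqadd_s16x4`, `hyp_unpack_s16x4`, `add_scalar`, `add_packed`) is hypothetical, and the four-lane "custom instruction" is emulated in portable C so the sketch compiles on any host. On real hardware, a 16-bit saturating vector add of this kind is available as, for example, NEON's `vqadd_s16` on ARM or SSE2's `_mm_adds_epi16` on x86. Saturation is used because MATLAB integer arithmetic saturates rather than wrapping around.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed register: four signed 16-bit lanes (64 bits in total).
   On a real target this would be a vector type such as NEON's int16x4_t. */
typedef struct {
    int16_t lane[4];
} packed_s16x4;

/* Scalar saturating add, matching MATLAB's int16 semantics. */
static int16_t sat_add_s16(int16_t x, int16_t y) {
    int32_t s = (int32_t)x + (int32_t)y;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Hypothetical intrinsic standing in for a lane-wise saturating add
   instruction; emulated here in portable C. */
static packed_s16x4 hyp_vqadd_s16x4(packed_s16x4 a, packed_s16x4 b) {
    packed_s16x4 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = sat_add_s16(a.lane[i], b.lane[i]);
    return r;
}

/* Packing statement: load four consecutive scalars into one packed value. */
static packed_s16x4 hyp_pack_s16x4(const int16_t *src) {
    packed_s16x4 p;
    memcpy(p.lane, src, sizeof p.lane);
    return p;
}

/* Unpacking statement: store the four lanes back to scalar memory. */
static void hyp_unpack_s16x4(int16_t *dst, packed_s16x4 p) {
    memcpy(dst, p.lane, sizeof p.lane);
}

/* Scalarized translation of the MATLAB statement c = a + b. */
void add_scalar(int16_t *c, const int16_t *a, const int16_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        c[i] = sat_add_s16(a[i], b[i]);
}

/* Vectorized translation: pack, operate on four lanes at once, unpack.
   A scalar epilogue handles the elements left over when n % 4 != 0. */
void add_packed(int16_t *c, const int16_t *a, const int16_t *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        packed_s16x4 va = hyp_pack_s16x4(a + i);
        packed_s16x4 vb = hyp_pack_s16x4(b + i);
        hyp_unpack_s16x4(c + i, hyp_vqadd_s16x4(va, vb));
    }
    for (; i < n; ++i)
        c[i] = sat_add_s16(a[i], b[i]);
}
```

Calling `add_scalar` and `add_packed` on the same inputs should produce identical arrays: the packed loop processes four elements per iteration, and the scalar epilogue covers any remainder. In a real retargetable flow, the body of `hyp_vqadd_s16x4` would be replaced by the target's own intrinsic, which the downstream C compiler would typically lower to a single instruction.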



    Published In

    ACM Transactions on Embedded Computing Systems  Volume 19, Issue 6
    Special Issue on LCTES, Part 2, Learning, Distributed, and Optimizing Compilers
    November 2020
    271 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/3427195
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Publication History

    Published: 03 October 2020
    Online AM: 07 May 2020
    Accepted: 01 March 2020
    Revised: 01 March 2020
    Received: 01 November 2019
    Published in TECS Volume 19, Issue 6


    Author Tags

    1. ARM
    2. MATLAB
    3. auto-vectorization
    4. compilation
    5. compiler
    6. retargetable
    7. x86

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Total Citations: 0
    • Total Downloads: 70
    • Downloads (last 12 months): 9
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 03 Jan 2025
