
Auto-vectorization of interleaved data for SIMD

Published: 11 June 2006

Abstract

Most implementations of the Single Instruction Multiple Data (SIMD) model available today require that data elements be packed in vector registers. Operations on disjoint vector elements are not supported directly and require explicit data reorganization. Computations on non-contiguous, and especially interleaved, data appear in important applications and can benefit greatly from SIMD instructions once the data is properly reorganized. Vectorizing such computations efficiently is therefore an ambitious challenge for both programmers and vectorizing compilers. We present an automatic compilation scheme that facilitates data reorganization and thereby supports effective vectorization of interleaved data with constant strides that are powers of 2. We demonstrate how our vectorization scheme applies to dominant SIMD architectures, and present experimental results on a wide range of key kernels, showing speedups in execution time of up to 3.7x for interleaving levels (strides) as high as 8.
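The core idea of reorganizing interleaved elements with permutations, so that each field ends up in its own vector, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: a "vector" is modeled as a Python list of VF elements, and the helper names `extract_even`, `extract_odd`, `interleave_low`, `interleave_high`, `deinterleave`, and `interleave` are hypothetical stand-ins for permute operations a target ISA would have to provide (for example through a general permute such as AltiVec's `vperm`). For a stride of 2^k, loading `stride` contiguous vectors and applying k rounds of even/odd extraction yields one vector per field; the inverse, built from interleave operations, restores memory order for stores.

```python
# Minimal model of power-of-2-stride de-interleaving with even/odd permutes.
# Assumption: a "vector" is a Python list of VF elements (VF a power of 2, >= 2).
# All helper names are hypothetical, not a real ISA or compiler API.

def extract_even(v1, v2):
    """Even-indexed elements of the concatenation v1 ++ v2 (VF elements)."""
    return (v1 + v2)[0::2]


def extract_odd(v1, v2):
    """Odd-indexed elements of the concatenation v1 ++ v2 (VF elements)."""
    return (v1 + v2)[1::2]


def interleave_low(v1, v2):
    """Alternate the low halves of v1 and v2: v1[0], v2[0], v1[1], v2[1], ..."""
    half = len(v1) // 2
    return [x for pair in zip(v1[:half], v2[:half]) for x in pair]


def interleave_high(v1, v2):
    """Alternate the high halves of v1 and v2."""
    half = len(v1) // 2
    return [x for pair in zip(v1[half:], v2[half:]) for x in pair]


def _levels(stride):
    """log2(stride), rejecting strides that are not powers of 2."""
    if stride < 1 or stride & (stride - 1):
        raise ValueError("stride must be a power of 2")
    return stride.bit_length() - 1


def deinterleave(vectors):
    """Turn `stride` contiguous vectors into one vector per field (load side)."""
    stride = len(vectors)
    for _ in range(_levels(stride)):
        pairs = [(vectors[2 * i], vectors[2 * i + 1]) for i in range(stride // 2)]
        # Evens of every pair first, then odds: after level l, the vectors are
        # grouped by element index modulo 2**l.
        vectors = ([extract_even(a, b) for a, b in pairs]
                   + [extract_odd(a, b) for a, b in pairs])
    return vectors


def interleave(fields):
    """Inverse of deinterleave: field vectors back to memory order (store side)."""
    stride = len(fields)
    half = stride // 2
    for _ in range(_levels(stride)):
        out = []
        for i in range(half):
            # Undo one extraction level: vector i held evens, vector i + half odds.
            out.append(interleave_low(fields[i], fields[i + half]))
            out.append(interleave_high(fields[i], fields[i + half]))
        fields = out
    return fields


if __name__ == "__main__":
    VF, STRIDE = 4, 4
    memory = list(range(VF * STRIDE))  # four fields interleaved with stride 4
    loaded = [memory[i * VF:(i + 1) * VF] for i in range(STRIDE)]
    fields = deinterleave(loaded)
    print(fields)  # [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
    print(interleave(fields) == loaded)  # True
```

In this sketch each level issues one permutation per vector, so de-interleaving one group of `stride` vectors costs stride·log2(stride) permutations; the power-of-2 restriction is what lets every level pair vectors evenly.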

Published In

ACM SIGPLAN Notices, Volume 41, Issue 6
Proceedings of the 2006 PLDI Conference
June 2006
426 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1133255
  • PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2006
    438 pages
    ISBN: 1595933204
    DOI: 10.1145/1133981
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGPLAN Volume 41, Issue 6

Author Tags

  1. SIMD
  2. Viterbi
  3. data reuse
  4. subword parallelism
  5. vectorization

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 138
  • Downloads (last 6 weeks): 14
Reflects downloads up to 18 Jan 2025
