Harnessing horizontal parallelism and vertical instruction packing of programs to improve system overall efficiency

Authors:

Yunsi Fei

DATE '08: Proceedings of the conference on Design, automation and test in Europe

Pages 758 - 763

https://doi.org/10.1145/1403375.1403559

Published: 10 March 2008

Abstract

Multi-issue processors can exploit the instruction-level parallelism (ILP) of programs to greatly improve performance. Reducing energy consumption while maintaining the high performance of programs running on multi-issue processors remains a challenging problem. In this paper, we propose a novel approach that applies the instruction register file (IRF) technique from single-issue processors to the VLIW architecture. Frequently executed instructions are selected and placed in the on-chip IRF for fast access during program execution. Violation of synchronization among VLIW instruction slots is avoided by introducing new instruction formats and microarchitectural support. The enhanced VLIW architecture is thus able to orchestrate horizontal instruction parallelism and vertical instruction packing to improve overall system efficiency. Our experimental results show that the proposed processor architecture achieves both the performance advantage of the VLIW architecture and the high energy efficiency of the IRF-based instruction packing technique (e.g., a 71.1% reduction in fetch energy consumption for a 4-way VLIW architecture with 8-entry IRFs).
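The selection step mentioned in the abstract (placing frequently executed instructions in the IRF) can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a hypothetical dynamic trace given as a list of instruction strings and a single IRF, and it ignores the per-slot synchronization and new instruction formats that the paper addresses. The names `select_irf_entries` and `irf_hit_fraction` are illustrative only.

```python
from collections import Counter


def select_irf_entries(trace, irf_size=8):
    """Pick the irf_size most frequently executed instructions.

    trace is a hypothetical dynamic execution trace (one string per
    executed instruction); this greedy top-N choice is an assumption,
    not the selection method used in the paper.
    """
    counts = Counter(trace)
    return [instruction for instruction, _ in counts.most_common(irf_size)]


def irf_hit_fraction(trace, irf_entries):
    """Fraction of executed instructions that are resident in the IRF."""
    if not trace:
        return 0.0
    resident = set(irf_entries)
    hits = sum(1 for instruction in trace if instruction in resident)
    return hits / len(trace)


if __name__ == "__main__":
    # Toy trace: two hot instructions and two cold ones.
    demo_trace = (
        ["add r1, r2, r3"] * 5
        + ["ld r4, 0(r1)"] * 3
        + ["br loop"]
        + ["mul r5, r4, r4"]
    )
    entries = select_irf_entries(demo_trace, irf_size=2)
    print(entries)
    print(irf_hit_fraction(demo_trace, entries))
```

In this toy model, a higher hit fraction means fewer fetches from the instruction cache, which is the intuition behind the fetch-energy savings the abstract reports; the actual savings depend on the architecture and benchmarks evaluated in the paper.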

Index Terms

Harnessing horizontal parallelism and vertical instruction packing of programs to improve system overall efficiency
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
      2. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Published In

DATE '08: Proceedings of the conference on Design, automation and test in Europe

March 2008

1575 pages

ISBN:9783981080131

DOI:10.1145/1403375

General Chair:
Donatella Sciuto
Politecnico di Milano, Italy

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

The EDA Consortium
EDAA: European Design Automation Association
ECSI
SIGDA: ACM Special Interest Group on Design Automation
The IEEE Computer Society TTTC
IEEE Council on Electronic Design Automation (CEDA)
The Russian Academy of Sciences

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

DATE '08

Sponsor:

EDAA
SIGDA
The Russian Academy of Sciences

DATE '08: Design, Automation and Test in Europe

March 10 - 14, 2008

Munich, Germany

Acceptance Rates

Overall Acceptance Rate 518 of 1,794 submissions, 29%

Cited By

Lee J, Lee J, Lee J, Paek Y (2014). Improving performance of loops on DIAM-based VLIW architectures. ACM SIGPLAN Notices, 49:5 (135-144). DOI: 10.1145/2666357.2597825. Online publication date: 12-Jun-2014
https://dl.acm.org/doi/10.1145/2666357.2597825
Lee J, Lee J, Lee J, Paek Y, Zhang Y, Kulkarni P (2014). Improving performance of loops on DIAM-based VLIW architectures. Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems (135-144). DOI: 10.1145/2597809.2597825. Online publication date: 12-Jun-2014
https://dl.acm.org/doi/10.1145/2597809.2597825
Lee J, Youn J, Cho D, Paek Y (2013). Reducing instruction bit-width for low-power VLIW architectures. ACM Transactions on Design Automation of Electronic Systems, 18:2 (1-32). DOI: 10.1145/2442087.2442096. Online publication date: 11-Apr-2013
https://dl.acm.org/doi/10.1145/2442087.2442096
Lin H, Fei Y, Oklobdzija V, Pangle B, Chang N, Shanbhag N, Kim C (2010). Exploring custom instruction synthesis for application-specific instruction set processors with multiple design objectives. Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design (141-146). DOI: 10.1145/1840845.1840875. Online publication date: 18-Aug-2010
https://dl.acm.org/doi/10.1145/1840845.1840875
Lin H, Fei Y, Bahar R, Lombardi F, Atienza D, Brunvand E (2010). A novel multi-objective instruction synthesis flow for application-specific instruction set processors. Proceedings of the 20th symposium on Great lakes symposium on VLSI (409-412). DOI: 10.1145/1785481.1785576. Online publication date: 16-May-2010
https://dl.acm.org/doi/10.1145/1785481.1785576
