research-article

Generalized Multiway Branch Unit for VLIW Microprocessors

Authors:

Scott D. Carson

IEEE Transactions on Parallel and Distributed Systems, Volume 6, Issue 8

Pages 850 - 862

https://doi.org/10.1109/71.406961

Published: 01 August 1995

Abstract

VLIW processors use multiway branch instructions to achieve high-speed, parallel evaluation of control structures. This paper introduces a new multiway branch mechanism that allows constant-time branch-target resolution based on an arbitrary condition tree. The unique feature of this mechanism is its target selection unit, which yields a branch target from a set of condition bit values and a description of the condition tree. A representation of condition trees that results in a compact target selection unit is described, and the logic diagram of a target selection unit that provides four-way branching is shown. Our experimental results on nontrivial integer benchmarks indicate that the proposed multiway branch unit can improve the performance of VLIW machines substantially (by as much as 35% in geometric mean) compared with conventional two-way branching.
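The abstract does not spell out the paper's condition-tree representation or the gate-level design of its target selection unit, so the following is only a minimal Python sketch under an assumed encoding. It illustrates the general idea of resolving a multiway branch without walking the tree level by level: each leaf (branch target) is flattened into a (mask, value) pair over the condition bits, and a target is selected when the condition bits masked by `mask` equal `value`. The names `Leaf`, `Node`, `path_terms`, and `select_target`, and the mask/value encoding itself, are hypothetical illustrations, not the paper's method.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    target: str  # label of the branch target reached at this leaf


@dataclass(frozen=True)
class Node:
    bit: int            # index of the condition bit tested at this node
    taken: "Tree"       # subtree followed when the bit is 1
    not_taken: "Tree"   # subtree followed when the bit is 0


Tree = Union[Leaf, Node]
Term = Tuple[int, int, str]  # (mask, value, target) -- hypothetical encoding


def path_terms(tree: Tree, mask: int = 0, value: int = 0) -> List[Term]:
    """Flatten a condition tree into one (mask, value, target) term per leaf.

    mask marks the condition bits tested on the root-to-leaf path and
    value holds the bit values that path requires. Assumes no bit is
    tested twice along the same path.
    """
    if isinstance(tree, Leaf):
        return [(mask, value, tree.target)]
    bit = 1 << tree.bit
    return (path_terms(tree.taken, mask | bit, value | bit)
            + path_terms(tree.not_taken, mask | bit, value))


def select_target(terms: List[Term], cond_bits: int) -> str:
    """Return the target whose path condition matches cond_bits.

    Each term is tested independently of the others, standing in for one
    comparator per leaf that hardware could evaluate concurrently.
    """
    matches = [target for (mask, value, target) in terms
               if (cond_bits & mask) == value]
    if len(matches) != 1:
        raise ValueError("tree paths must be mutually exclusive and exhaustive")
    return matches[0]


# Hypothetical four-way branch: test c0, then c1 (if c0=1) or c2 (if c0=0).
tree = Node(0,
            Node(1, Leaf("T0"), Leaf("T1")),
            Node(2, Leaf("T2"), Leaf("T3")))
terms = path_terms(tree)
print(select_target(terms, 0b011))  # c0=1, c1=1 -> prints T0
print(select_target(terms, 0b100))  # c0=0, c2=1 -> prints T2
```

In this model, tree depth only changes how many bits each mask covers, not how many sequential decisions are needed; the Python loop is of course sequential, and a real target selection unit would realize the per-leaf tests as parallel logic.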

Recommendations

Machine-Description Driven Compilers for EPIC and VLIW Processors

In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW ...

An evaluation of speculative instruction execution on simultaneous multithreaded processors

Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...

Inter-cluster communication in VLIW architectures

The traditional VLIW (very long instruction word) architecture with a single register file does not scale up well to address growing performance demands on embedded media processors. However, splitting a VLIW processor in smaller clusters, which are ...

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 6, Issue 8

August 1995

129 pages

ISSN: 1045-9219

Copyright © 1992 IEEE. All Rights Reserved.

Publisher

IEEE Press

Bibliometrics & Citations

Total citations: 6

Total downloads: 0 (last 12 months: 0; last 6 weeks: 0), reflecting downloads up to 26 Dec 2024

Cited By

Malazgirt G, Yurdakul A, Niar S (2015). Customizing VLIW processors from dynamically profiled execution traces. Microprocessors & Microsystems, 39:8, pp. 656-673. DOI: 10.1016/j.micpro.2015.09.005. Online publication date: 1-Nov-2015.

Yun H, Kim J, Moon S (2003). Time optimal software pipelining of loops with control flows. International Journal of Parallel Programming, 31:5, pp. 339-391. DOI: 10.1023/A:1027387028481. Online publication date: 1-Oct-2003.

Yun H, Kim J, Moon S, Ebcioglu K, Pingali K, Nicolau A (2002). Optimal software pipelining of loops with control flows. Proceedings of the 16th International Conference on Supercomputing, pp. 117-128. DOI: 10.1145/514191.514210. Online publication date: 22-Jun-2002.

Chen S, Fuchs W (2001). Compiler-Assisted Multiple Instruction Word Retry for VLIW Architectures. IEEE Transactions on Parallel and Distributed Systems, 12:12, pp. 1293-1304. DOI: 10.1109/71.970564. Online publication date: 1-Dec-2001.

Park S, Shim S, Moon S, Smotherman M, Conte T (1997). Evaluation of scheduling techniques on a SPARC-based VLIW testbed. Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 104-113. DOI: 10.5555/266800.266811. Online publication date: 1-Dec-1997.

Moon S, Ebcioğlu K, Wallach S, Zima H (1997). Performance analysis of tree VLIW architecture for exploiting branch ILP in non-numerical code. Proceedings of the 11th International Conference on Supercomputing, pp. 301-308. DOI: 10.1145/263580.263653. Online publication date: 11-Jul-1997.
