
High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm

Published: 01 December 2005

Abstract

This paper presents a cost-effective, high-performance dual-thread VLIW processor model that is a low-cost subset of the Weld architecture paradigm. The model runs one main thread and one speculative thread simultaneously on a VLIW processor, with a register file and a fetch unit per thread, along with memory disambiguation hardware for speculative load and store operations. The paper analyzes the performance impact of the dual-thread VLIW processor, including the effect of migrating the disambiguation hardware for speculative load operations to the compiler and the sensitivity of the model to variation in branch misprediction, second-level cache miss penalties, and register file copy time. The dual-thread VLIW processor achieves up to 34 percent higher performance than a single-threaded VLIW processor model.
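
To make the idea of memory disambiguation between a main thread and a speculative thread concrete, the following is a minimal Python sketch of one generic scheme. It is not the hardware described in the paper: the class `SpeculativeMemoryBuffer`, its method names, and the squash-on-conflict policy are all assumptions made for illustration. The sketch buffers speculative stores, records which addresses the speculative thread has read, and flags a violation if the main thread (earlier in program order) later stores to one of those addresses.

```python
class SpeculativeMemoryBuffer:
    """Toy model (hypothetical) of disambiguation between two threads."""

    def __init__(self, memory):
        self.memory = memory      # architectural memory: address -> value
        self.spec_stores = {}     # speculative stores, held back until commit
        self.spec_loaded = set()  # addresses the speculative thread read from memory
        self.violated = False     # set when a speculative load saw a stale value

    def spec_load(self, addr):
        # Forward from the speculative thread's own buffered store if one exists.
        if addr in self.spec_stores:
            return self.spec_stores[addr]
        self.spec_loaded.add(addr)
        return self.memory.get(addr, 0)

    def spec_store(self, addr, value):
        # Speculative stores never touch architectural memory directly.
        self.spec_stores[addr] = value

    def main_store(self, addr, value):
        self.memory[addr] = value
        # The speculative thread already read this address, so its value is stale.
        if addr in self.spec_loaded:
            self.violated = True

    def finish(self):
        """Commit buffered stores if no violation occurred, otherwise squash.

        Returns True on commit and False on squash; state is reset either way.
        """
        committed = not self.violated
        if committed:
            self.memory.update(self.spec_stores)
        self.spec_stores.clear()
        self.spec_loaded.clear()
        self.violated = False
        return committed
```

In this sketch a squash simply discards the buffered stores; a real design would also have to restore register state and restart the speculative thread, which is outside the scope of the example.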


Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 16, Issue 12
December 2005
96 pages

Publisher

IEEE Press


Author Tags

  1. Multithreaded processors
  2. VLIW architectures
  3. Modeling of computer architecture

Qualifiers

  • Research-article
