The multicluster architecture: reducing cycle time through partitioning

Authors: Keith I. Farkas, Norman P. Jouppi, Zvonko Vranesic

MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

Pages 149 - 159

Published: 01 December 1997

Abstract

The multicluster architecture that we introduce is a decentralized, dynamically scheduled architecture in which the register files, dispatch queue, and functional units are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm and, using trace-driven simulations of SPEC92 benchmarks, evaluate its effectiveness. This evaluation indicates that, for the configurations considered, the multicluster architecture may have significant performance advantages at feature sizes below 0.35 µm, and warrants further investigation.
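To make the partitioning idea in the abstract concrete, the following is a minimal Python sketch that assigns architectural registers to clusters and counts how many instructions in a short trace touch registers from more than one cluster. It is only an illustration under stated assumptions: the block-wise assignment rule in `cluster_of`, the `Instr` record, the 32-register file, and the sample trace are all hypothetical and are not taken from the paper, which does not specify these details in its abstract.

```python
from typing import NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, zero or more sources."""
    dest: int
    srcs: Tuple[int, ...]


def cluster_of(reg: int, num_regs: int = 32, num_clusters: int = 2) -> int:
    """Assumed assignment rule: contiguous blocks of registers per cluster."""
    return reg * num_clusters // num_regs


def spans_clusters(instr: Instr, num_regs: int = 32, num_clusters: int = 2) -> bool:
    """True if the instruction names registers that live in different clusters."""
    regs = (instr.dest, *instr.srcs)
    return len({cluster_of(r, num_regs, num_clusters) for r in regs}) > 1


def count_cross_cluster(trace: Sequence[Instr], num_regs: int = 32,
                        num_clusters: int = 2) -> int:
    """Count the instructions in the trace that span more than one cluster."""
    return sum(spans_clusters(i, num_regs, num_clusters) for i in trace)


# Illustrative trace (made up for this sketch).
trace = [
    Instr(dest=1, srcs=(2, 3)),      # all registers in cluster 0
    Instr(dest=20, srcs=(17, 18)),   # all registers in cluster 1
    Instr(dest=4, srcs=(1, 20)),     # reads r20 from cluster 1, writes cluster 0
    Instr(dest=30, srcs=(4,)),       # reads cluster 0, writes cluster 1
]

print(count_cross_cluster(trace))
```

In a sketch like this, the count of cluster-spanning instructions is a rough stand-in for the instruction-execution overhead that the abstract attributes to partitioning; a scheduler that chooses register assignments with the clusters in mind would aim to keep that count low while still spreading work across clusters.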

Published In

MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture

December 1997

369 pages

ISBN: 0818679778

Chairmen: Mark Smotherman (Clemson Univ., Clemson, SC), Tom Conte (North Carolina State Univ., Raleigh)

Copyright © 1997 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\TCMM: TC on Microprocessors & Microcomputers

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

MICRO97: 30th Annual International Symposium on Microarchitecture

December 1 - 3, 1997

Research Triangle Park, North Carolina, USA

Acceptance Rates

MICRO 30 Paper Acceptance Rate 35 of 103 submissions, 34%

Overall Acceptance Rate 484 of 2,242 submissions, 22%
