DOI: 10.5555/266800.266815

The multicluster architecture: reducing cycle time through partitioning

Published: 01 December 1997

Abstract

The multicluster architecture that we introduce offers a decentralized, dynamically-scheduled architecture, in which the register files, dispatch queue, and functional units of the architecture are distributed across multiple clusters, and each cluster is assigned a subset of the architectural registers. The motivation for the multicluster architecture is to reduce the clock cycle time, relative to a single-cluster architecture with the same number of hardware resources, by reducing the size and complexity of components on critical timing paths. Resource partitioning, however, introduces instruction-execution overhead and may reduce the number of concurrently executing instructions. To counter these two negative by-products of partitioning, we developed a static instruction scheduling algorithm. We describe this algorithm, and using trace-driven simulations of SPEC92 benchmarks, evaluate its effectiveness. This evaluation indicates that for the configurations considered, the multicluster architecture may have significant performance advantages at feature sizes below 0.35 µm, and warrants further investigation.
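
As a rough illustration of the partitioning idea in the abstract, the sketch below assigns each architectural register to one cluster and counts the instructions whose operands span more than one cluster, which is where partitioning overhead would arise. The abstract does not specify the register-to-cluster mapping or the scheduling algorithm, so the two-cluster setup, the modulo mapping, and the names `cluster_of`, `Instr`, `clusters_touched`, and `count_cross_cluster` are all assumptions made for this example, not the paper's method.

```python
from dataclasses import dataclass

NUM_CLUSTERS = 2  # assumption: the abstract does not fix the cluster count


def cluster_of(reg: int) -> int:
    """Hypothetical mapping: register r belongs to cluster r mod NUM_CLUSTERS."""
    return reg % NUM_CLUSTERS


@dataclass(frozen=True)
class Instr:
    dest: int     # destination architectural register number
    srcs: tuple   # source architectural register numbers


def clusters_touched(instr: Instr) -> set:
    """Return the set of clusters that own any register named by the instruction."""
    return {cluster_of(r) for r in (instr.dest, *instr.srcs)}


def count_cross_cluster(program) -> int:
    """Count instructions whose registers live in more than one cluster."""
    return sum(1 for i in program if len(clusters_touched(i)) > 1)


program = [
    Instr(dest=2, srcs=(0, 4)),  # all even registers: cluster 0 only
    Instr(dest=3, srcs=(1, 5)),  # all odd registers: cluster 1 only
    Instr(dest=6, srcs=(2, 3)),  # mixes even and odd: spans both clusters
]
print(count_cross_cluster(program))
```

A static scheduler or register allocator aiming to limit partitioning overhead would, under these assumptions, try to choose registers so that this count stays low while still keeping both clusters busy.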

Published In

MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
December 1997
369 pages
ISBN: 0818679778

Publisher

IEEE Computer Society

United States

Author Tags

  1. decentralized architecture
  2. partitioned architecture
  3. register allocation
  4. static instruction scheduling

Qualifiers

  • Article

Conference

MICRO97
MICRO97: 30th Annual International Symposium on Microarchitecture
December 1 - 3, 1997
Research Triangle Park, North Carolina, USA

Acceptance Rates

MICRO 30 Paper Acceptance Rate: 35 of 103 submissions, 34%
Overall Acceptance Rate: 484 of 2,242 submissions, 22%
