DOI: 10.5555/774861.774874

Convergent scheduling

Published: 18 November 2002

Abstract

Convergent scheduling is a general framework for cluster assignment and instruction scheduling on spatial architectures. A convergent scheduler is composed of independent passes, each implementing a heuristic that addresses a particular problem or constraint. The passes share a simple, common interface that provides spatial and temporal preference for each instruction. Preferences are not absolute; instead, the interface allows a pass to express the confidence of its preferences, as well as preferences for multiple space and time slots. A pass operates by modifying these preferences. By applying a series of passes that address all the relevant constraints, the convergent scheduler can produce a schedule that satisfies all the important constraints. Because all passes are independent and need to understand only one interface to interact with each other, convergent scheduling simplifies the problem of handling multiple constraints and codeveloping different heuristics. We have applied convergent scheduling to two spatial architectures: the Raw processor and a clustered VLIW machine. It is able to successfully handle traditional constraints such as parallelism, load balancing, and communication minimization, as well as constraints due to preplaced instructions, which are instructions with predetermined cluster assignment. Convergent scheduling is able to obtain an average performance improvement of 21% over the existing space-time scheduler of the Raw processor, and an improvement of 14% over state-of-the-art assignment and scheduling techniques on a clustered VLIW architecture.
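
The pass-and-preference interface described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a preference is a weight for each (cluster, time slot) pair per instruction, that a pass scales those weights in place, and that confidence is the weight of the top-ranked slot. Every name here (`make_weights`, `preplacement_pass`, `dependence_pass`, `converge`) and the `boost` and `penalty` factors are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# weights[instr][cluster][slot] is a preference weight (assumed representation).
Weights = Dict[str, List[List[float]]]
Pass = Callable[[Weights], None]


def make_weights(instrs: List[str], n_clusters: int, n_slots: int) -> Weights:
    """Start every instruction with a uniform preference over space and time."""
    u = 1.0 / (n_clusters * n_slots)
    return {i: [[u] * n_slots for _ in range(n_clusters)] for i in instrs}


def normalize(w: Weights) -> None:
    """Rescale each instruction's weights so they sum to 1."""
    for grid in w.values():
        total = sum(sum(row) for row in grid)
        if total > 0:
            for row in grid:
                for t in range(len(row)):
                    row[t] /= total


def preferred(w: Weights, instr: str) -> Tuple[int, int]:
    """Return the (cluster, slot) with the highest weight; first one wins ties."""
    grid = w[instr]
    cells = ((c, t) for c in range(len(grid)) for t in range(len(grid[c])))
    return max(cells, key=lambda ct: grid[ct[0]][ct[1]])


def confidence(w: Weights, instr: str) -> float:
    """Assumed confidence measure: weight of the top-ranked slot."""
    c, t = preferred(w, instr)
    return w[instr][c][t]


def preplacement_pass(fixed: Dict[str, int], boost: float = 100.0) -> Pass:
    """Hypothetical pass: favor the predetermined cluster of preplaced instructions."""
    def run(w: Weights) -> None:
        for instr, cluster in fixed.items():
            row = w[instr][cluster]
            for t in range(len(row)):
                row[t] *= boost
    return run


def dependence_pass(edges: List[Tuple[str, str]], penalty: float = 0.1) -> Pass:
    """Hypothetical pass: discourage a consumer from slots at or before its producer."""
    def run(w: Weights) -> None:
        for producer, consumer in edges:
            _, p_slot = preferred(w, producer)
            for row in w[consumer]:
                for t in range(min(p_slot + 1, len(row))):
                    row[t] *= penalty
    return run


def converge(w: Weights, passes: List[Pass], rounds: int = 3) -> Dict[str, Tuple[int, int]]:
    """Apply each pass in turn, renormalizing after each, then read off the result."""
    for _ in range(rounds):
        for p in passes:
            p(w)
            normalize(w)
    return {i: preferred(w, i) for i in w}


if __name__ == "__main__":
    w = make_weights(["a", "b"], n_clusters=2, n_slots=3)
    passes = [preplacement_pass({"a": 1}), dependence_pass([("a", "b")])]
    print(converge(w, passes))
    print(confidence(w, "a"))
```

In this sketch the passes never call each other; they communicate only through the shared weight table, which mirrors the independence property the abstract emphasizes. A real scheduler would also need to resolve resource conflicts when several instructions prefer the same slot, which this example omits.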

Published In

MICRO 35: Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
November 2002
442 pages
ISBN: 0769518591

Publisher

IEEE Computer Society Press

Washington, DC, United States

Conference

Micro-35
Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
