Convergent scheduling

Authors:

Saman Amarasinghe

MICRO 35: Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture

Pages 111 - 122

Published: 18 November 2002

Abstract

Convergent scheduling is a general framework for cluster assignment and instruction scheduling on spatial architectures. A convergent scheduler is composed of independent passes, each implementing a heuristic that addresses a particular problem or constraint. The passes share a simple, common interface that provides spatial and temporal preference for each instruction. Preferences are not absolute; instead, the interface allows a pass to express the confidence of its preferences, as well as preferences for multiple space and time slots. A pass operates by modifying these preferences. By applying a series of passes that address all the relevant constraints, the convergent scheduler can produce a schedule that satisfies all the important constraints. Because all passes are independent and need to understand only one interface to interact with each other, convergent scheduling simplifies the problem of handling multiple constraints and codeveloping different heuristics. We have applied convergent scheduling to two spatial architectures: the Raw processor and a clustered VLIW machine. It is able to successfully handle traditional constraints such as parallelism, load balancing, and communication minimization, as well as constraints due to preplaced instructions, which are instructions with predetermined cluster assignment. Convergent scheduling is able to obtain an average performance improvement of 21% over the existing space-time scheduler of the Raw processor, and an improvement of 14% over state-of-the-art assignment and scheduling techniques on a clustered VLIW architecture.
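
The shared preference interface described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names (`Prefs`, `preplaced_pass`, `dependence_pass`, `communication_pass`, `converge`), the weight-per-slot representation, the unit-latency assumption, and the final "pick the heaviest slot" rule are all hypothetical choices made here only to show how independent passes might cooperate through one data structure.

```python
from typing import Callable, Dict, List, Tuple

Slot = Tuple[int, int]                # (cluster, time slot)
Prefs = Dict[str, Dict[Slot, float]]  # instruction -> weight for each slot
Pass = Callable[[Prefs], None]        # a pass edits the weights in place


def uniform_prefs(instrs: List[str], n_clusters: int, n_times: int) -> Prefs:
    """Start with no opinion: every (cluster, time) slot is equally preferred."""
    slots = [(c, t) for c in range(n_clusters) for t in range(n_times)]
    return {i: {s: 1.0 / len(slots) for s in slots} for i in instrs}


def normalize(prefs: Prefs) -> None:
    """Rescale each instruction's weights to sum to 1 so passes stay comparable."""
    for weights in prefs.values():
        total = sum(weights.values())
        if total > 0:
            for slot in weights:
                weights[slot] /= total


def preplaced_pass(fixed: Dict[str, int]) -> Pass:
    """Hard constraint: a preplaced instruction may only use its fixed cluster."""
    def run(prefs: Prefs) -> None:
        for instr, cluster in fixed.items():
            for (c, t) in prefs[instr]:
                if c != cluster:
                    prefs[instr][(c, t)] = 0.0
    return run


def dependence_pass(edges: List[Tuple[str, str]]) -> Pass:
    """A consumer must run after the earliest slot its producer can still use.
    Assumes unit latency and edges listed in topological order."""
    def run(prefs: Prefs) -> None:
        for producer, consumer in edges:
            earliest = min(
                (t for (_, t), w in prefs[producer].items() if w > 0), default=-1
            )
            for (c, t) in prefs[consumer]:
                if t <= earliest:
                    prefs[consumer][(c, t)] = 0.0
    return run


def communication_pass(edges: List[Tuple[str, str]]) -> Pass:
    """Soft preference: pull each consumer toward clusters its producer favors."""
    def run(prefs: Prefs) -> None:
        for producer, consumer in edges:
            for (c, t) in prefs[consumer]:
                pull = sum(w for (pc, _), w in prefs[producer].items() if pc == c)
                prefs[consumer][(c, t)] *= 1.0 + pull
    return run


def converge(prefs: Prefs, passes: List[Pass]) -> Dict[str, Slot]:
    """Apply the passes in order, then pick each instruction's heaviest slot."""
    for run in passes:
        run(prefs)
        normalize(prefs)
    return {instr: max(w, key=w.get) for instr, w in prefs.items()}


# Two instructions, two clusters, two time slots; "a" is preplaced on cluster 1
# and "b" consumes the value produced by "a".
edges = [("a", "b")]
prefs = uniform_prefs(["a", "b"], n_clusters=2, n_times=2)
schedule = converge(
    prefs,
    [preplaced_pass({"a": 1}), dependence_pass(edges), communication_pass(edges)],
)
```

Each pass in this sketch only reads and rescales weights, so passes can be added, removed, or reordered without knowing about one another. The ratio between an instruction's largest weight and the rest gives one possible reading of the "confidence" the abstract mentions.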

Index Terms

Convergent scheduling
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling

Published In

MICRO 35: Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture

November 2002

442 pages

ISBN:0769518591

Conference Chair: Erik Altman (IBM)

General Chair: Kemal Ebcioǧlu (IBM)

Program Chairs: Scott Mahlke (University of Michigan), B. Ramakrishna Rau (Hewlett-Packard Laboratories)

Publications Chair: Sanjay Patel (University of Illinois)

Publisher

IEEE Computer Society Press

Washington, DC, United States

Qualifiers

Article

Conference

Micro-35

Sponsor: SIGMICRO

Micro-35: 35th Annual International Symposium on Microarchitecture

November 18 - 22, 2002

Istanbul, Turkey

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
