A spatial path scheduling algorithm for EDGE architectures

Authors:

Katherine E. Coons,

Kathryn S. McKinley,

Sundeep K. Kushwaha

ACM SIGPLAN Notices, Volume 41, Issue 11

Pages 129 - 140

https://doi.org/10.1145/1168918.1168875

Published: 20 October 2006

Abstract

Growing on-chip wire delays are motivating architectural features that expose on-chip communication to the compiler. EDGE architectures are one example of communication-exposed microarchitectures in which the compiler forms dataflow graphs that specify how the microarchitecture maps instructions onto a distributed execution substrate. This paper describes a compiler scheduling algorithm called spatial path scheduling that factors in previously fixed locations - called anchor points - for each placement. This algorithm extends easily to different spatial topologies. We augment this basic algorithm with three heuristics: (1) local and global ALU and network link contention modeling, (2) global critical path estimates, and (3) dependence chain path reservation. We use simulated annealing to explore possible performance improvements and to motivate the augmented heuristics and their weighting functions. We show that the spatial path scheduling algorithm augmented with these three heuristics achieves a 21% average performance improvement over the best prior algorithm and comes within an average of 5% of the annealed performance for our benchmarks.
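
The idea of placing each instruction relative to already-fixed anchor points can be illustrated with a small greedy placer. This is only a sketch under stated assumptions, not the algorithm from the paper: the function name `place`, the 2D grid, the one-cycle-per-hop Manhattan routing cost, the unit execution latency, and the simple per-tile contention penalty are all hypothetical choices made for illustration. The paper's critical path estimates and path reservation heuristics are not modeled.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two grid tiles, assuming one cycle per hop."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def place(instructions, deps, fixed, rows=4, cols=4, slots_per_tile=8):
    """Greedy anchor-based placement (illustrative sketch only).

    instructions: instruction names in topological order.
    deps: dict mapping an instruction to the names whose values it consumes.
    fixed: dict mapping pre-placed names (e.g. register reads) to (row, col).
    Returns a dict mapping every name to its (row, col) tile.
    """
    placement = dict(fixed)              # anchors known before scheduling
    ready = {name: 0 for name in fixed}  # cycle at which each value exists
    load = {}                            # instructions assigned per tile

    for inst in instructions:
        # Anchor points: producers whose location is already fixed.
        anchors = [p for p in deps.get(inst, ()) if p in placement]
        best = None
        for tile in product(range(rows), range(cols)):
            if load.get(tile, 0) >= slots_per_tile:
                continue                 # tile has no free slot
            # Operand arrival: latest anchor value plus routing distance.
            arrival = max(
                (ready[p] + manhattan(placement[p], tile) for p in anchors),
                default=0,
            )
            # Assumed contention model: one extra cycle per instruction
            # already sharing the tile.
            cost = arrival + load.get(tile, 0)
            if best is None or cost < best[0]:
                best = (cost, tile)
        if best is None:
            raise ValueError("no free tile for " + inst)
        cost, tile = best
        placement[inst] = tile
        ready[inst] = cost + 1           # assumed unit execution latency
        load[tile] = load.get(tile, 0) + 1
    return placement
```

With anchors fixed at tiles (0, 0) and (0, 2), an instruction that consumes both values is placed at (0, 1), the tile that minimizes the later of the two operand arrivals. Because the cost depends only on a distance function between tiles, swapping `manhattan` for another metric is one way such a scheme could be adapted to a different topology.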

Index Terms

A spatial path scheduling algorithm for EDGE architectures
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ACM SIGPLAN Notices Volume 41, Issue 11

Proceedings of the 2006 ASPLOS Conference

November 2006

425 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1168918

ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
October 2006
440 pages
ISBN:1595934510
DOI:10.1145/1168857
General Chair: John Paul Shen (Intel Corp.)
Program Chair: Margaret R. Martonosi (Princeton University)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 47
Total Downloads: 838
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 06 Jan 2025

Citations

Cited By

Nowatzki T, Ardalani N, Sankaralingam K, Weng J, Evripidou S, Stenström P, O'Boyle M (2018) Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1-15. DOI: 10.1145/3243176.3243212. Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1145/3243176.3243212
Miniskar N, Kohli S, Park H, Yoo D, Chatha K, Ernst R, Raghunathan A, Iyer R (2014) Retargetable automatic generation of compound instructions for CGRA based reconfigurable processor applications. Proceedings of the 2014 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, pages 1-9. DOI: 10.1145/2656106.2656125. Online publication date: 12-Oct-2014
https://dl.acm.org/doi/10.1145/2656106.2656125
Yazdanpanah F, Alvarez-Martinez C, Jimenez-Gonzalez D, Etsion Y (2014) Hybrid Dataflow/von-Neumann Architectures. IEEE Transactions on Parallel and Distributed Systems 25:6, pages 1489-1509. DOI: 10.1109/TPDS.2013.125. Online publication date: 1-Jun-2014
https://dl.acm.org/doi/10.1109/TPDS.2013.125
De Sutter B, Raghavan P, Lambrechts A (2010) Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems, pages 449-484. DOI: 10.1007/978-1-4419-6345-1_17. Online publication date: 16-Jul-2010
https://doi.org/10.1007/978-1-4419-6345-1_17
Feng Y, Li D, Tan X, Ye X, Fan D, Li W, Wang D, Zhang H, Tang Z (2022) Accelerating Data Transfer in Dataflow Architectures Through a Look-Ahead Acknowledgment Mechanism. Journal of Computer Science and Technology 37:4, pages 942-959. DOI: 10.1007/s11390-020-0555-6. Online publication date: 30-Jul-2022
https://doi.org/10.1007/s11390-020-0555-6
Zhao Z, Sheng W, Wang Q, Yin W, Ye P, Li J, Mao Z (2020) Towards Higher Performance and Robust Compilation for CGRA Modulo Scheduling. IEEE Transactions on Parallel and Distributed Systems 31:9, pages 2201-2219. DOI: 10.1109/TPDS.2020.2989149. Online publication date: 1-Sep-2020
https://doi.org/10.1109/TPDS.2020.2989149
Feng Y, Xiang T, Ye X, Fan D, Wang D, Wu D, Tang Z (2018) Optimizing the Efficiency of Data Transfer in Dataflow Architectures. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 140-149. DOI: 10.1109/HPCC/SmartCity/DSS.2018.00050. Online publication date: Jun-2018
https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00050
Sutter B, Raghavan P, Lambrechts A (2018) Coarse-Grained Reconfigurable Array Architectures. Handbook of Signal Processing Systems, pages 427-472. DOI: 10.1007/978-3-319-91734-4_12. Online publication date: 14-Oct-2018
https://doi.org/10.1007/978-3-319-91734-4_12
Zhao Z, Sheng W, He W, Mao Z, Li Z (2017) A static-placement, dynamic-issue framework for CGRA loop accelerator. Proceedings of the Conference on Design, Automation & Test in Europe, pages 1348-1353. DOI: 10.5555/3130379.3130697. Online publication date: 27-Mar-2017
https://dl.acm.org/doi/10.5555/3130379.3130697
Zhao Z, Sheng W, He W, Mao Z, Li Z (2017) A static-placement, dynamic-issue framework for CGRA loop accelerator. Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, pages 1348-1353. DOI: 10.23919/DATE.2017.7927202. Online publication date: Mar-2017
https://doi.org/10.23919/DATE.2017.7927202
