research-article

A case for bufferless routing in on-chip networks

Authors:

Thomas Moscibroda,

Onur Mutlu

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

Pages 196 - 207

https://doi.org/10.1145/1555754.1555781

Published: 20 June 2009

Abstract

Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control. We describe new algorithms for routing without using buffers in router input/output ports. We analyze the advantages and disadvantages of bufferless routing and discuss how router latency can be reduced by taking advantage of the fact that input/output buffers do not exist. Our evaluations show that routing without buffers significantly reduces the energy consumption of the on-chip cache/processor-to-cache network, while providing similar performance to that of existing buffered routing algorithms at low network utilization (i.e., on most real applications). We conclude that bufferless routing can be an attractive and energy-efficient design option for on-chip cache/processor-to-cache networks where network utilization is low.
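The abstract does not spell out the routing algorithms, so the following is only a minimal sketch of one well-known way to route without buffers: deflection (hot-potato) routing on a 2D mesh. The oldest-first arbitration, the port names, and the `Flit`, `productive_ports`, and `route_one_cycle` names are all assumptions made for illustration and are not taken from the paper. The point it shows is that every incoming flit must leave on some output port in the same cycle, so a flit that loses arbitration is sent in a non-productive direction instead of being stored.

```python
from dataclasses import dataclass

# Hypothetical port names for a 2D mesh router (illustrative only).
MESH_PORTS = ("E", "W", "N", "S")


@dataclass
class Flit:
    flit_id: int
    dest: tuple  # (x, y) coordinates of the destination router
    age: int     # cycles since injection; larger means older


def productive_ports(pos, dest):
    """Return the ports that move a flit one hop closer to dest."""
    ports = []
    if dest[0] > pos[0]:
        ports.append("E")
    elif dest[0] < pos[0]:
        ports.append("W")
    if dest[1] > pos[1]:
        ports.append("N")
    elif dest[1] < pos[1]:
        ports.append("S")
    return ports


def route_one_cycle(pos, flits, available_ports=MESH_PORTS):
    """Assign each incoming flit a distinct output port within one cycle.

    Assumption: flits addressed to this router were already ejected.
    Oldest flits pick first. A flit with no free productive port is
    deflected to any free port rather than buffered.
    Returns a dict mapping flit_id to the chosen port.
    """
    if len(flits) > len(available_ports):
        raise ValueError("need at least as many output ports as flits")
    free = list(available_ports)
    assignment = {}
    # Oldest first; ties broken by flit_id so the result is deterministic.
    for flit in sorted(flits, key=lambda f: (-f.age, f.flit_id)):
        wanted = [p for p in productive_ports(pos, flit.dest) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if nothing productive is free
        free.remove(port)
        assignment[flit.flit_id] = port
    return assignment
```

For example, if two flits at router (1, 1) both want to travel east to (3, 1), the older one takes the east port and the younger one is deflected to another free port. Because a router in a mesh has as many output links as input links, some port is always free for each arriving flit, which is what lets this style of routing do without input or output buffers.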

Index Terms

A case for bufferless routing in on-chip networks
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
    2. Parallel architectures
      1. Interconnection architectures

Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

June 2009

510 pages

ISBN:9781605585260

DOI:10.1145/1555754

General Chair: Steve Keckler (University of Texas at Austin)

Program Chair: Luiz André Barroso (Google Inc.)

ACM SIGARCH Computer Architecture News Volume 37, Issue 3
June 2009
495 pages
ISSN:0163-5964
DOI:10.1145/1555815

Copyright © 2009 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA '09: The 36th Annual International Symposium on Computer Architecture

June 20 - 24, 2009

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
