DOI: 10.1145/1555754.1555781
research-article

A case for bufferless routing in on-chip networks

Published: 20 June 2009

Abstract

Buffers in on-chip networks consume significant energy, occupy chip area, and increase design complexity. In this paper, we make a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control. We describe new algorithms for routing without using buffers in router input/output ports. We analyze the advantages and disadvantages of bufferless routing and discuss how router latency can be reduced by taking advantage of the fact that input/output buffers do not exist. Our evaluations show that routing without buffers significantly reduces the energy consumption of the on-chip cache/processor-to-cache network, while providing similar performance to that of existing buffered routing algorithms at low network utilization (i.e., on most real applications). We conclude that bufferless routing can be an attractive and energy-efficient design option for on-chip cache/processor-to-cache networks where network utilization is low.
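
As a rough illustration of what routing without input/output buffers can look like, the sketch below shows one cycle of a generic deflection ("hot-potato") router in a 2D mesh. It is not the algorithm proposed in the paper: the mesh topology, the oldest-first ranking, and all names (`Flit`, `productive_ports`, `route_cycle`) are assumptions made for this example. The point it shows is that when a router has at least as many output ports as incoming flits, every flit can leave in the same cycle, so nothing needs to be stored; a flit that loses arbitration is sent out on a free port even if that port moves it away from its destination.

```python
from dataclasses import dataclass

# Output ports of a hypothetical 2D-mesh router (illustrative assumption).
PORTS = ("E", "W", "N", "S")


@dataclass
class Flit:
    dest: tuple  # (x, y) coordinates of the destination router
    age: int     # cycles since injection; used only to rank flits


def productive_ports(pos, dest):
    """Return the ports that move a flit one hop closer to its destination."""
    ports = []
    if dest[0] > pos[0]:
        ports.append("E")
    if dest[0] < pos[0]:
        ports.append("W")
    if dest[1] > pos[1]:
        ports.append("N")
    if dest[1] < pos[1]:
        ports.append("S")
    return ports


def route_cycle(pos, flits, available=PORTS):
    """Assign every incoming flit a distinct output port within one cycle.

    Assumes flits addressed to this router were already ejected, and that
    there are no more incoming flits than output ports.
    """
    assert len(flits) <= len(available)
    free = list(available)
    assignment = {}
    # Oldest flit chooses first (an assumed ranking policy).
    for i in sorted(range(len(flits)), key=lambda i: -flits[i].age):
        wanted = [p for p in productive_ports(pos, flits[i].dest) if p in free]
        # Take a productive port if one is free; otherwise deflect.
        port = wanted[0] if wanted else free[0]
        free.remove(port)
        assignment[i] = port
    return assignment


# Two flits compete for the east port at router (1, 1); the younger one is deflected.
flits = [Flit((3, 1), age=5), Flit((2, 1), age=9), Flit((1, 0), age=1)]
assignment = route_cycle((1, 1), flits)
```

Deflected flits take longer paths, which is one reason such schemes are expected to behave best when network utilization is low, as the abstract notes.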


Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009
510 pages
ISBN:9781605585260
DOI:10.1145/1555754
  • ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
    June 2009
    495 pages
    ISSN:0163-5964
    DOI:10.1145/1555815
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. memory systems
  2. multi-core
  3. on-chip networks
  4. routing

Qualifiers

  • Research-article

Conference

ISCA '09

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
