DOI: 10.1109/ISCA.2006.10
Article

Area-Performance Trade-offs in Tiled Dataflow Architectures

Published: 01 May 2006

Abstract

Tiled architectures, such as RAW, SmartMemories, TRIPS, and WaveScalar, promise to address several issues facing conventional processors, including complexity, wire delay, and performance. The basic premise of these architectures is that larger, higher-performance implementations can be constructed by replicating the basic tile across the chip. This paper explores the area-performance trade-offs when designing one such tiled architecture, WaveScalar. We use a synthesizable RTL model and cycle-level simulator to perform an area/performance Pareto analysis of over 200 WaveScalar processor designs ranging in size from 19 mm² to 378 mm² and having a 22 FO4 cycle time. We demonstrate that, for multi-threaded workloads, WaveScalar performance scales almost ideally from 19 to 101 mm² when optimized for area efficiency and from 44 to 202 mm² when optimized for peak performance. Our analysis reveals that WaveScalar's hierarchical interconnect plays an important role in overall scalability, and that WaveScalar achieves the same (or higher) performance in substantially less area than either an aggressive out-of-order superscalar or Sun's Niagara CMP processor.
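To illustrate what an area/performance Pareto analysis selects, the following is a minimal sketch in Python. It is not the authors' tooling: the `Design` type, the `pareto_front` function, and every number in the sample data are hypothetical placeholders, not results from the paper. A design is kept only if no other design is both no larger and faster.

```python
from typing import NamedTuple


class Design(NamedTuple):
    """Hypothetical record for one candidate processor configuration."""
    name: str
    area_mm2: float      # die area; smaller is better
    performance: float   # any higher-is-better metric (assumed, e.g. normalized throughput)


def pareto_front(designs):
    """Return the designs that are not dominated on (area, performance).

    Designs are scanned in order of increasing area (ties: fastest first).
    A design joins the front only if it is strictly faster than every
    design of equal or smaller area seen so far. Exact duplicates are
    reported once.
    """
    front = []
    best_perf = float("-inf")
    for d in sorted(designs, key=lambda d: (d.area_mm2, -d.performance)):
        if d.performance > best_perf:
            front.append(d)
            best_perf = d.performance
    return front


# Invented sample points for illustration only.
designs = [
    Design("A", 20.0, 1.0),
    Design("B", 45.0, 2.5),
    Design("C", 50.0, 2.0),    # dominated by B: larger and slower
    Design("D", 100.0, 5.0),
    Design("E", 200.0, 9.0),
    Design("F", 380.0, 8.5),   # dominated by E: larger and slower
]

front_names = [d.name for d in pareto_front(designs)]
print(front_names)
```

Sorting once and sweeping keeps the selection at O(n log n), which is more than adequate for a few hundred design points; a pairwise dominance check would give the same front.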


      Published In

      ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture
      June 2006
      383 pages
      ISBN:076952608X
      • ACM SIGARCH Computer Architecture News, Volume 34, Issue 2
        May 2006
        383 pages
        ISSN:0163-5964
        DOI:10.1145/1150019

      Publisher

      IEEE Computer Society

      United States

      Author Tags

      1. ASIC
      2. Dataflow computing
      3. RTL
      4. WaveScalar

      Qualifiers

      • Article

      Conference

      ISCA06
      Acceptance Rates

      ISCA '06 Paper Acceptance Rate 31 of 234 submissions, 13%;
      Overall Acceptance Rate 543 of 3,203 submissions, 17%
