
The WaveScalar architecture

Published: 01 May 2007

Abstract

Silicon technology will continue to provide an exponential increase in the availability of raw transistors. Effectively translating this resource into application performance, however, is an open challenge that conventional superscalar designs will not be able to meet. We present WaveScalar as a scalable alternative to conventional designs. WaveScalar is a dataflow instruction set and execution model designed for scalable, low-complexity/high-performance processors. Unlike previous dataflow machines, WaveScalar can efficiently provide the sequential memory semantics that imperative languages require. To allow programmers to easily express parallelism, WaveScalar supports pthread-style, coarse-grain multithreading and dataflow-style, fine-grain threading. In addition, it permits blending the two styles within an application, or even a single function.
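
To make the execution model concrete, here is a minimal sketch of the dataflow firing rule in Python. It is illustrative only: the class and the instruction names are invented for this example and do not correspond to the actual WaveScalar ISA. The key property it shows is that an instruction executes as soon as all of its operands arrive; no program counter sequences execution.

    class Instruction:
        """One node in a dataflow graph; it fires when all operands arrive."""
        def __init__(self, name, op, num_inputs, consumers):
            self.name = name
            self.op = op                # Python callable standing in for an opcode
            self.num_inputs = num_inputs
            self.consumers = consumers  # list of (instruction, operand slot) pairs
            self.pending = {}           # operand slot -> value received so far

        def receive(self, slot, value, ready):
            self.pending[slot] = value
            if len(self.pending) == self.num_inputs:
                ready.append(self)      # all operands present: ready to fire

    def run(initial_tokens):
        # No program counter: execution is driven entirely by token arrival.
        ready = []
        for inst, slot, value in initial_tokens:
            inst.receive(slot, value, ready)
        while ready:
            inst = ready.pop()
            result = inst.op(*(inst.pending[i] for i in range(inst.num_inputs)))
            inst.pending.clear()
            if not inst.consumers:
                print(inst.name, "->", result)
            for consumer, slot in inst.consumers:
                consumer.receive(slot, result, ready)

    # Dataflow graph for (a + b) * (a - b), with a = 3, b = 4:
    mul = Instruction("mul", lambda x, y: x * y, 2, [])
    add = Instruction("add", lambda x, y: x + y, 2, [(mul, 0)])
    sub = Instruction("sub", lambda x, y: x - y, 2, [(mul, 1)])
    run([(add, 0, 3), (add, 1, 4), (sub, 0, 3), (sub, 1, 4)])   # prints: mul -> -7

Note that add and sub fire independently, in either order, as soon as their inputs are available; mul fires only once both partial results reach it.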
To execute WaveScalar programs, we have designed a scalable, tile-based processor architecture called the WaveCache. As a program executes, the WaveCache maps the program's instructions onto its array of processing elements (PEs). The instructions remain at their processing elements for many invocations, and as the working set of instructions changes, the WaveCache removes unused instructions and maps new ones in their place. The instructions communicate directly with one another over a scalable, hierarchical on-chip interconnect, obviating the need for long wires and broadcast communication.
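
The placement behavior described above can be sketched by modeling each PE as a small cache of resident instructions. The sketch below assumes, purely for illustration, a least-recently-used replacement policy and a per-PE capacity parameter; the WaveCache's actual placement and replacement policies are more sophisticated.

    from collections import OrderedDict

    class ProcessingElement:
        """Models a PE as a small cache of resident instructions: mapped
        instructions stay in place across invocations, and unused ones
        are evicted when the working set changes."""
        def __init__(self, capacity=4):
            self.capacity = capacity
            self.resident = OrderedDict()   # instruction ids, in LRU order

        def map_instruction(self, inst_id):
            """Map an instruction onto this PE; return an evicted id or None."""
            if inst_id in self.resident:
                self.resident.move_to_end(inst_id)   # already mapped: refresh
                return None
            evicted = None
            if len(self.resident) >= self.capacity:
                evicted, _ = self.resident.popitem(last=False)  # drop LRU entry
            self.resident[inst_id] = True
            return evicted

    pe = ProcessingElement(capacity=2)
    pe.map_instruction("add_1")
    pe.map_instruction("mul_2")
    print(pe.map_instruction("ld_3"))   # prints: add_1 (evicted to make room)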
This article presents the WaveScalar instruction set and evaluates a simulated implementation based on current technology. For single-threaded applications, the WaveCache achieves performance on par with conventional processors, but in less area. For coarse-grain threaded applications, the WaveCache achieves nearly linear speedup with up to 64 threads and can sustain 7-14 multiply-accumulates per cycle on fine-grain threaded versions of well-known kernels. Finally, we apply both styles of threading to equake from Spec2000 and speed it up by 9x compared to the serial version.

Reviews

Joseph M. Arul

The WaveScalar architecture aims to exploit the advantages of dataflow to achieve more parallelism in less silicon area, along with better performance and scalability than superscalar architectures. The program counter, a bottleneck in the von Neumann architecture, has been eliminated, and the natural parallelism of dataflow is combined with fine-grain multithreading to achieve linear speedup across many processing elements. Instructions are ordered according to their execution path and stored in the WaveCache, a tiled design built for the WaveScalar architecture, in which deep speculation occurs naturally. Cache coherence is maintained with a directory-based MESI protocol. One novel idea in this architecture is that the WaveCache loads instructions from memory and assigns them to processing elements, rather than having processing elements request instructions to process. In addition, the WaveScalar architecture does not use a register file for intervening accesses; values flow directly between instructions.

In previous dataflow architectures, even dynamic ones, operand matching was a bottleneck that reduced performance. Here, the authors state that the match-and-dispatch logic is the most complex part of the design. What impact will it have on the WaveScalar architecture? Will it be a bottleneck, as it was in the past? These questions need to be addressed clearly. The WaveCache itself is designed rather well, but the compiler for this architecture is not yet finished, and completing it is tedious work. Will compiled code still achieve linear speedup? That remains to be evaluated; a hand-coded version would naturally be better optimized. It would also be worthwhile to compare WaveScalar's performance with TRIPS, which takes a similar approach with slight variations, and with stream processors, whose clusters of processing elements use local and global registers to exploit the von Neumann model without dataflow concepts. Finally, since WaveScalar includes no register file for intervening accesses, will that choice help or hurt overall performance?

The authors bring dataflow concepts closer to the von Neumann model, with less wire delay and less silicon, and show that the result can outperform superscalar architectures in both single-threaded and multithreaded settings. The paper is rather long; the introductory material on the von Neumann model and dataflow could have been omitted, since anyone reading this paper should already know the basics of both.

Online Computing Reviews Service
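
The instruction ordering the reviewer refers to is WaveScalar's wave-ordered memory, the mechanism that gives dataflow execution sequential memory semantics. The sketch below is a heavily simplified, total-order illustration in Python; the real scheme annotates each memory operation with predecessor and successor sequence numbers within a wave and uses wildcards to handle branches. The point it captures is that requests may arrive from the dataflow fabric out of order, yet are applied to memory strictly in program order.

    class OrderedMemoryInterface:
        """Applies memory requests in sequence order, even when the
        dataflow fabric delivers them out of order."""
        def __init__(self):
            self.memory = {}
            self.next_seq = 0    # next sequence number allowed to issue
            self.waiting = {}    # seq -> (op, addr, value) that arrived early

        def request(self, seq, op, addr, value=None):
            self.waiting[seq] = (op, addr, value)
            results = []
            # Drain every request that is now unblocked, in program order.
            while self.next_seq in self.waiting:
                op, addr, value = self.waiting.pop(self.next_seq)
                if op == "store":
                    self.memory[addr] = value
                else:                               # load
                    results.append(self.memory.get(addr))
                self.next_seq += 1
            return results

    mem = OrderedMemoryInterface()
    mem.request(1, "load", 0x10)               # arrives first, waits for seq 0
    print(mem.request(0, "store", 0x10, 42))   # releases the load: prints [42]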

Published In

ACM Transactions on Computer Systems, Volume 25, Issue 2
May 2007
115 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/1233307
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2007
Published in TOCS Volume 25, Issue 2

Author Tags

  1. WaveScalar
  2. dataflow computing
  3. multithreading

Qualifiers

  • Article
