Research Article | Open Access
DOI: 10.1145/3547276.3548523

Accelerating the Task Activation and Data Communication for Dataflow Computing

Published: 13 January 2023

Abstract

Hybrid dataflow/von-Neumann architectures [1] differ in their implementations but follow similar principles: they harness the parallelism and data synchronization inherent to the dataflow model, yet retain the programmability of the von-Neumann model. In this paper, we propose a new hybrid dataflow/von-Neumann architecture built around a TAU (Task Activation Unit) and an SPM (scratchpad memory) [8], which together improve parallel efficiency. We implement a prototype of the design, integrate it with peripheral devices, and verify the complete system on an FPGA. Finally, we deploy an operating system on the hardware and profile its performance. The experimental results show that performance improves by 3.07%–10.32% on randomly generated dataflow graphs [9], inter-core communication performance improves by 4%, and hardware acceleration is achieved.
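To make the mechanism concrete, below is a minimal sketch, in C, of the firing rule a task activation unit enforces: a task waits on a fixed set of input operands and becomes ready the moment the last one arrives. This is our illustration of the general dataflow firing rule [1], not the paper's actual design; the `task_t` layout, the `tau_deliver_token` helper, and the bitmask encoding are all assumptions.

```c
/*
 * Minimal sketch (not the paper's implementation) of the dataflow
 * firing rule a Task Activation Unit (TAU) enforces: a task is
 * activated only once all of its input operands have arrived.
 * task_t, tau_deliver_token, and the bitmask encoding are assumptions.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int      id;           /* task identifier                      */
    int      num_inputs;   /* operands this task waits for         */
    uint32_t arrived;      /* one bit per operand that has arrived */
} task_t;

/* Record the arrival of operand `idx`; return true if the task just
 * became ready, i.e. the moment the TAU would activate it.          */
static bool tau_deliver_token(task_t *t, int idx)
{
    t->arrived |= 1u << idx;
    uint32_t all = (1u << t->num_inputs) - 1u;
    return t->arrived == all;
}

int main(void)
{
    task_t add = { .id = 7, .num_inputs = 2, .arrived = 0 };

    tau_deliver_token(&add, 0);      /* first operand: not ready yet */
    if (tau_deliver_token(&add, 1))  /* second operand: task fires   */
        printf("task %d activated\n", add.id);
    return 0;
}
```

Performing this readiness check in a dedicated hardware unit, rather than in software scheduling queues, is the task-activation overhead the paper targets.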

References

[1]
Fahimeh Yazdanpanah, Carlos Alvarez-Martinez, Daniel Jimenez-Gonzalez, and Yoav Etsion. 2014. Hybrid Dataflow/von-Neumann Architectures. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1489–1509. https://doi.org/10.1109/TPDS.2013.125
[2]
Wesley M. Johnston, J. R. Paul Hanna, and Richard J. Millar. 2004. Advances in dataflow programming languages. ACM Comput. Surv. 36, 1 (March 2004), 1–34. https://doi.org/10.1145/1013208.1013209
[3]
Jack B. Dennis and David P. Misunas. 1998. A preliminary architecture for a basic data-flow processor. In 25 Years of the International Symposia on Computer Architecture (Selected Papers) (ISCA '98). Association for Computing Machinery, New York, NY, USA, 125–131. https://doi.org/10.1145/285930.286058
[4]
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. 2003. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03). Association for Computing Machinery, New York, NY, USA, 422–433. https://doi.org/10.1145/859618.859667
[5]
Steven Swanson, Andrew Schwerin, Martha Mercaldi, Andrew Petersen, Andrew Putnam, Ken Michelson, Mark Oskin, and Susan J. Eggers. 2007. The WaveScalar architecture. ACM Trans. Comput. Syst. 25, 2, Article 4 (May 2007), 54 pages. https://doi.org/10.1145/1233307.1233308
[6]
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107–113. https://doi.org/10.1145/1327452.1327492
[7]
Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avižienis, John Wawrzynek, and Krste Asanović. 2012. Chisel: constructing hardware in a Scala embedded language. In Proceedings of the 49th Annual Design Automation Conference (DAC '12). Association for Computing Machinery, New York, NY, USA, 1216–1225. https://doi.org/10.1145/2228360.2228584
[8]
Vivy Suhendra, Chandrashekar Raghavan, and Tulika Mitra. 2006. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures. In Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '06). Association for Computing Machinery, New York, NY, USA, 401–410. https://doi.org/10.1145/1176760.1176809
[9]
Robert P. Dick, David L. Rhodes, and Wayne Wolf. 1998. TGFF: task graphs for free. In Proceedings of the 6th International Workshop on Hardware/Software Codesign (CODES/CASHE '98). IEEE Computer Society, USA, 97–101.
[10]
Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, and Karthikeyan Sankaralingam. 2017. Stream-Dataflow Acceleration. SIGARCH Comput. Archit. News 45, 2 (2017), 416–429.
[11]
Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE.
[12]
Frank Vahid. 2010. Digital Design with RTL Design, VHDL, and Verilog. John Wiley & Sons.
[13]
Martin Schoeberl. 2020. Digital Design with Chisel. Kindle Direct Publishing.
[14]
Yong Dou, Jialun Wang, Huayou Su, Chen Xu, Xiaoli Gong, Wangdong Yang, Chuliang Weng, Zhanhuai Li, Kenli Li, Ge Yu, and Aoying Zhou. 2020. The idea of dataflow computing as seen from the development of computer architecture. Scientia Sinica Informationis 50, 11 (2020), 1697–1713. (In Chinese.)
[15]
Digilent. Arty A7 Reference Manual. https://digilent.com/reference/programmable-logic/arty-a7/reference-manual
[16]
Krishna M. Kavi and Ali R. Hurson. 1998. Design of cache memories for dataflow architecture. Journal of Systems Architecture 44, 9-10 (1998), 657–674.
[17]
Martín Abadi et al. 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467.
[18]
MIT PDOS. 2022. xv6: a simple Unix-like teaching operating system. https://pdos.csail.mit.edu
[19]
Costas Kyriacou, Paraskevas Evripidou, and Pedro Trancoso. 2006. Data-Driven Multithreading Using Conventional Microprocessors. IEEE Trans. Parallel Distrib. Syst. 17, 10 (2006), 1176–1188.
[20]
Yoav Etsion, Alex Ramirez, Rosa M. Badia, et al. 2010. Task Superscalar: Using Processors as Functional Units. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10).
[21]
J. C. Penha, L. B. Silva, J. M. Silva, et al. 2019. ADD: Accelerator Design and Deploy - A tool for FPGA high-performance dataflow computing. Concurrency and Computation: Practice and Experience (2019).
[22]
Süleyman Savas, Zain Ul-Abdin, and Tomas Nordström. 2020. A framework to generate domain-specific manycore architectures from dataflow programs. Microprocessors and Microsystems 72 (2020).

Published In

ICPP Workshops '22: Workshop Proceedings of the 51st International Conference on Parallel Processing
August 2022
233 pages
ISBN:9781450394451
DOI:10.1145/3547276
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Dataflow/von-Neumann architecture
  2. RISC-V
  3. parallel computation
  4. scratchpad memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
