Research Article | Open Access
DOI: 10.1145/3547276.3548523

Accelerating the Task Activation and Data Communication for Dataflow Computing

Published: 13 January 2023

Abstract

Hybrid dataflow/von-Neumann architectures [1] differ in their implementations but follow similar principles: they harness the parallelism and data synchronization inherent to the dataflow model, yet retain the programmability of the von-Neumann model. In this paper, we propose a new hybrid dataflow/von-Neumann architecture built around a TAU (Task Activation Unit) and an SPM (scratchpad memory) [8], which together improve parallel efficiency. We implement a prototype of the design, integrate it with peripheral devices, and verify the complete system on an FPGA. Finally, we deploy an operating system on the hardware and profile its performance. The experimental results show that performance improves by 3.07%–10.32% on randomly generated dataflow graphs [9], inter-core communication performance improves by 4%, and hardware acceleration is achieved.
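To make the mechanism concrete, below is a minimal sketch, in C, of the firing rule a task activation unit enforces: a task waits on a fixed set of input operands and becomes ready the moment the last one arrives. This is our illustration of the general dataflow firing rule [1], not the paper's actual design; the `task_t` layout, the `tau_deliver_token` helper, and the bitmask encoding are all assumptions.

```c
/*
 * Minimal sketch (not the paper's implementation) of the dataflow
 * firing rule a Task Activation Unit (TAU) enforces: a task is
 * activated only once all of its input operands have arrived.
 * task_t, tau_deliver_token, and the bitmask encoding are assumptions.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int      id;           /* task identifier                      */
    int      num_inputs;   /* operands this task waits for         */
    uint32_t arrived;      /* one bit per operand that has arrived */
} task_t;

/* Record the arrival of operand `idx`; return true if the task just
 * became ready, i.e. the moment the TAU would activate it.          */
static bool tau_deliver_token(task_t *t, int idx)
{
    t->arrived |= 1u << idx;
    uint32_t all = (1u << t->num_inputs) - 1u;
    return t->arrived == all;
}

int main(void)
{
    task_t add = { .id = 7, .num_inputs = 2, .arrived = 0 };

    tau_deliver_token(&add, 0);      /* first operand: not ready yet */
    if (tau_deliver_token(&add, 1))  /* second operand: task fires   */
        printf("task %d activated\n", add.id);
    return 0;
}
```

Performing this readiness check in a dedicated hardware unit, rather than in software scheduling queues, is the task-activation overhead the paper targets.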

References

[1]
Fahimeh Yazdanpanah, Carlos Alvarez-Martinez, Daniel Jimenez-Gonzalez, and Yoav Etsion. 2014. Hybrid Dataflow/von-Neumann Architectures. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1489–1509. https://doi.org/10.1109/TPDS.2013.125
[2]
Wesley M. Johnston, J. R. Paul Hanna, and Richard J. Millar. 2004. Advances in dataflow programming languages. ACM Comput. Surv. 36, 1 (March 2004), 1–34. https://doi.org/10.1145/1013208.1013209
[3]
Jack B. Dennis and David P. Misunas. 1998. A preliminary architecture for a basic data-flow processor. In 25 Years of the International Symposia on Computer Architecture (Selected Papers) (ISCA '98). Association for Computing Machinery, New York, NY, USA, 125–131. https://doi.org/10.1145/285930.286058
[4]
Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. 2003. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In Proceedings of the 30th Annual International Symposium on Computer Architecture (ISCA '03). Association for Computing Machinery, New York, NY, USA, 422–433. https://doi.org/10.1145/859618.859667
[5]
Steven Swanson, Andrew Schwerin, Martha Mercaldi, Andrew Petersen, Andrew Putnam, Ken Michelson, Mark Oskin, and Susan J. Eggers. 2007. The WaveScalar architecture. ACM Trans. Comput. Syst. 25, 2, Article 4 (May 2007), 54 pages. https://doi.org/10.1145/1233307.1233308
[6]
Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107–113. https://doi.org/10.1145/1327452.1327492
[7]
Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avižienis, John Wawrzynek, and Krste Asanović. 2012. Chisel: constructing hardware in a Scala embedded language. In Proceedings of the 49th Annual Design Automation Conference (DAC '12). Association for Computing Machinery, New York, NY, USA, 1216–1225. https://doi.org/10.1145/2228360.2228584
[8]
Vivy Suhendra, Chandrashekar Raghavan, and Tulika Mitra. 2006. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures. In Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES '06). Association for Computing Machinery, New York, NY, USA, 401–410. https://doi.org/10.1145/1176760.1176809
[9]
Robert P. Dick, David L. Rhodes, and Wayne Wolf. 1998. TGFF: task graphs for free. In Proceedings of the 6th International Workshop on Hardware/Software Codesign (CODES/CASHE '98). IEEE Computer Society, USA, 97–101.
[10]
Tony Nowatzki, Vinay Gangadhar, Newsha Ardalani, and Karthikeyan Sankaralingam. 2017. Stream-Dataflow Acceleration. SIGARCH Comput. Archit. News 45, 2 (2017), 416–429.
[11]
Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE.
[12]
Frank Vahid. 2010. Digital Design with RTL Design, VHDL, and Verilog. John Wiley & Sons.
[13]
Martin Schoeberl. 2020. Digital Design with Chisel. Kindle Direct Publishing.
[14]
Yong Dou, Jialun Wang, Huayou Su, Chen Xu, Xiaoli Gong, Wangdong Yang, Chuliang Weng, Zhanhuai Li, Kenli Li, Ge Yu, and Aoying Zhou. 2020. The idea of dataflow computing as seen from the development of computer architecture. Scientia Sinica Informationis 50, 11 (2020), 1697–1713. (In Chinese.)
[15]
Digilent. Arty A7 Reference Manual. https://digilent.com/reference/programmable-logic/arty-a7/reference-manual
[16]
Krishna M. Kavi and Ali R. Hurson. 1998. Design of cache memories for dataflow architecture. Journal of Systems Architecture 44, 9-10 (1998), 657–674.
[17]
Martín Abadi et al. 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467.
[18]
MIT PDOS. 2022. xv6: a simple Unix-like teaching operating system. https://pdos.csail.mit.edu
[19]
Costas Kyriacou, Paraskevas Evripidou, and Pedro Trancoso. 2006. Data-Driven Multithreading Using Conventional Microprocessors. IEEE Trans. Parallel Distrib. Syst. 17, 10 (2006), 1176–1188.
[20]
Yoav Etsion, Alex Ramirez, Rosa M. Badia, et al. 2010. Task Superscalar: Using Processors as Functional Units. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar '10).
[21]
J. C. Penha, L. B. Silva, J. M. Silva, et al. 2019. ADD: Accelerator Design and Deploy - A tool for FPGA high-performance dataflow computing. Concurrency and Computation: Practice and Experience (2019).
[22]
Süleyman Savas, Zain Ul-Abdin, and Tomas Nordström. 2020. A framework to generate domain-specific manycore architectures from dataflow programs. Microprocessors and Microsystems 72 (2020).

Published In

ICPP Workshops '22: Workshop Proceedings of the 51st International Conference on Parallel Processing
August 2022
233 pages
ISBN:9781450394451
DOI:10.1145/3547276
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Dataflow/von-Neumann architecture
  2. RISC-V
  3. parallel computation
  4. scratchpad memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP '22
ICPP '22: 51st International Conference on Parallel Processing
August 29 - September 1, 2022
Bordeaux, France

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
