Automated Deep Neural Network Inference Partitioning for Distributed Embedded Systems

Kreß, Fabian; Annabi, El Mahdi El; Hotfilter, Tim; Hoefer, Julian; Harbaum, Tanja; Becker, Juergen

arXiv:2406.19913 (cs)

[Submitted on 28 Jun 2024 (v1), last revised 11 Oct 2024 (this version, v2)]

Abstract: Distributed systems are used in many applications, e.g., robotics and autonomous driving, to achieve higher flexibility and robustness. In such systems, data-flow-centric applications such as Deep Neural Network (DNN) inference benefit, in terms of performance and energy efficiency, from partitioning the workload across multiple compute nodes. However, mapping large models onto distributed embedded systems is a complex task because low-latency and high-throughput requirements are combined with strict energy and memory constraints. In this paper, we present a novel approach for hardware-aware layer scheduling of DNN inference in distributed embedded systems. Our proposed framework uses a graph-based algorithm to automatically find beneficial partitioning points in a given DNN. Each of these points is evaluated against several essential system metrics, such as accuracy and memory utilization, while taking the respective system constraints into account. We demonstrate our approach by showing the impact of inference partitioning on various performance metrics for six different DNNs. As an example, we achieve a 47.5% throughput increase for EfficientNet-B0 inference partitioned onto two platforms while observing high energy efficiency.
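The abstract does not spell out the graph-based algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a two-platform split of a topologically ordered layer list, treats weight memory as the only hard constraint, and ranks feasible cuts by the amount of data that would have to cross the link. The names `Layer`, `candidate_cuts`, `param_bytes`, `out_bytes`, and the memory limits are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    param_bytes: int    # weight memory needed on the hosting platform (assumed metric)
    out_bytes: int      # size of this layer's output tensor
    inputs: tuple = ()  # names of the layers that produce this layer's inputs


def candidate_cuts(layers, mem_limit_a, mem_limit_b):
    """Return (index, transfer_bytes) for every feasible two-way split.

    A cut at index i places layers[:i + 1] on platform A and the remaining
    layers on platform B. `layers` must be given in topological order.
    """
    position = {layer.name: i for i, layer in enumerate(layers)}
    total_params = sum(layer.param_bytes for layer in layers)
    cuts = []
    params_a = 0
    for i in range(len(layers) - 1):
        params_a += layers[i].param_bytes
        params_b = total_params - params_a
        # Discard cuts that violate either platform's memory constraint.
        if params_a > mem_limit_a or params_b > mem_limit_b:
            continue
        # Producers on A whose outputs are consumed by some layer on B.
        crossing = {
            producer
            for consumer in layers[i + 1:]
            for producer in consumer.inputs
            if position[producer] <= i
        }
        transfer = sum(layers[position[p]].out_bytes for p in crossing)
        cuts.append((i, transfer))
    return cuts


# Toy network with one residual (skip) connection; all sizes are made up.
layers = [
    Layer("conv1", 100, 400),
    Layer("conv2", 200, 400, ("conv1",)),
    Layer("add", 0, 400, ("conv1", "conv2")),
    Layer("pool", 0, 100, ("add",)),
    Layer("fc", 300, 10, ("pool",)),
]

cuts = candidate_cuts(layers, mem_limit_a=350, mem_limit_b=400)
best = min(cuts, key=lambda cut: cut[1])
print(cuts)  # [(1, 800), (2, 400), (3, 100)]
print(best)  # (3, 100)
```

In this toy case the cut inside the residual block (index 1) is the most expensive because two tensors cross the link, while the cut after `pool` transfers the least data. A real evaluation along the lines of the abstract would also weigh metrics such as accuracy, latency, throughput, and energy, which this sketch leaves out.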

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
Cite as:	arXiv:2406.19913 [cs.DC]
	(or arXiv:2406.19913v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2406.19913

Submission history

From: Fabian Kreß
[v1] Fri, 28 Jun 2024 13:36:08 UTC (114 KB)
[v2] Fri, 11 Oct 2024 15:55:00 UTC (133 KB)
