research-article

An Efficient Technique of Application Mapping and Scheduling on Real-Time Multiprocessor Systems for Throughput Optimization

Published: 02 August 2016

Abstract

Multiprocessor systems are becoming ubiquitous in today’s embedded systems design. In this article, we address the problem of mapping an application represented by a Homogeneous Synchronous Dataflow (HSDF) graph onto a real-time multiprocessor platform with the objective of maximizing total throughput. We propose that an optimal solution to this problem comprises three components: actor-to-processor mapping, retiming, and actor ordering on each processor. The entire problem is systematically modeled as a Boolean Satisfiability (SAT) problem, so that the optimality of the solution is theoretically guaranteed. To explore the vast solution space more efficiently, we develop a dedicated HSDF theory solver that exploits the special characteristics of timed HSDF graphs, and we integrate it into the general search framework of the SAT solver. Two alternative integration methods based on branch-and-bound are presented that prune branches early in the search space, which greatly improves scalability. Extensive performance evaluations on synthetic examples and a case study on a realistic H.264 video decoder show that our approach improves throughput by up to 76.9% and scales to industry-sized applications.
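To make the three solution components concrete, the following is a minimal Python sketch of how a single candidate solution could be scored. It assumes the usual self-timed execution model for HSDF graphs: each processor repeatedly fires its actors in a fixed static order, which adds a zero-token edge between consecutive actors and a one-token edge from the last actor back to the first; a retiming r rewrites the initial tokens on each data edge as d(u,v) + r(v) - r(u); and throughput (iterations per time unit) is the reciprocal of the maximum cycle ratio, i.e., the largest total execution time divided by total tokens over all cycles. A cycle that carries no tokens means deadlock. This is not the authors' SAT encoding or HSDF theory solver. It only evaluates one (mapping, retiming, ordering) choice by brute-force cycle enumeration, and the function names, the processor names P0/P1, and the three-actor graph are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

# An HSDF edge: (source actor, destination actor, number of initial tokens).
Edge = Tuple[str, str, int]


def build_schedule_graph(edges: List[Edge], order: Dict[str, List[str]],
                         retiming: Dict[str, int]) -> List[Edge]:
    """Retime the data edges, then add static-order edges for each processor.

    `order` maps each (hypothetical) processor name to the sequence of actors
    it fires once per iteration, so it also fixes the actor-to-processor mapping.
    """
    graph: List[Edge] = []
    for u, v, d in edges:
        # Retiming: d_r(u, v) = d(u, v) + r(v) - r(u); unlisted actors have r = 0.
        d_r = d + retiming.get(v, 0) - retiming.get(u, 0)
        if d_r < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} gets {d_r} tokens")
        graph.append((u, v, d_r))
    for seq in order.values():
        # Consecutive actors on one processor: zero-token ordering edges.
        for a, b in zip(seq, seq[1:]):
            graph.append((a, b, 0))
        # The last actor must finish before the first fires in the next iteration.
        graph.append((seq[-1], seq[0], 1))
    return graph


def max_cycle_ratio(exec_time: Dict[str, int],
                    graph: List[Edge]) -> Optional[Fraction]:
    """Max over simple cycles of (total execution time) / (total tokens).

    Returns None if some cycle carries no tokens (the schedule deadlocks).
    Enumerates every simple cycle, so it is only suitable for small graphs.
    """
    rank = {a: i for i, a in enumerate(sorted(exec_time))}
    succ: Dict[str, List[Tuple[str, int]]] = {a: [] for a in exec_time}
    for u, v, d in graph:
        succ[u].append((v, d))
    best = Fraction(0)
    deadlock = False

    def dfs(start: str, node: str, on_path: set, time: int, tokens: int) -> None:
        nonlocal best, deadlock
        for nxt, d in succ[node]:
            if nxt == start:
                if tokens + d == 0:
                    deadlock = True
                else:
                    best = max(best, Fraction(time, tokens + d))
            elif rank[nxt] > rank[start] and nxt not in on_path:
                # Only extend to actors ranked above `start`, so each cycle is
                # enumerated from its lowest-ranked actor.
                on_path.add(nxt)
                dfs(start, nxt, on_path, time + exec_time[nxt], tokens + d)
                on_path.remove(nxt)

    for start in exec_time:
        dfs(start, start, {start}, exec_time[start], 0)
    return None if deadlock else best


def throughput(exec_time: Dict[str, int], edges: List[Edge],
               order: Dict[str, List[str]], retiming: Dict[str, int]) -> Fraction:
    """Iterations per time unit of the self-timed schedule (0 on deadlock).

    Assumes positive execution times and that every actor appears in some order.
    """
    mcr = max_cycle_ratio(exec_time, build_schedule_graph(edges, order, retiming))
    return Fraction(0) if mcr is None else 1 / mcr


if __name__ == "__main__":
    # Hypothetical graph: A -> B -> C -> A, with 3 initial tokens on C -> A.
    exec_time = {"A": 1, "B": 4, "C": 1}
    edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 3)]
    print(throughput(exec_time, edges, {"P0": ["A", "C"], "P1": ["B"]}, {}))        # 1/6
    print(throughput(exec_time, edges, {"P0": ["A", "C"], "P1": ["B"]}, {"C": 1}))  # 1/4
    print(throughput(exec_time, edges, {"P0": ["C", "A"], "P1": ["B"]}, {}))        # 0 (deadlock)
```

In this toy example, actor B alone needs 4 time units per iteration on P1, which bounds the period from below. Without retiming, the zero-token path A->B->C forces P0 to wait for B within each iteration, so the period is 6. Retiming C by one iteration puts a token on B->C, letting P0 overlap with B and bringing the period down to 4, while reversing the order on P0 to [C, A] without retiming deadlocks. Cycle enumeration is exponential in general; a practical solver would use a dedicated maximum cycle ratio algorithm and, as the abstract describes, prune partial assignments early.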


        Published In

        ACM Transactions on Embedded Computing Systems, Volume 15, Issue 4
        Special Issue on ESWEEK2015 and Regular Papers
        August 2016
        411 pages
        ISSN: 1539-9087
        EISSN: 1558-3465
        DOI: 10.1145/2982215
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 August 2016
        Accepted: 01 May 2016
        Revised: 01 May 2016
        Received: 01 December 2015
        Published in TECS Volume 15, Issue 4

        Author Tags

        1. Multiprocessor
        2. optimization
        3. satisfiability
        4. scheduling

        Qualifiers

        • Research-article
        • Research
        • Refereed

        Funding Sources

        • Chongqing High-Tech Research Programs
        • National 863 Programs
        • NSFC
