DOI: 10.1145/2578948.2560696
tutorial

DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures

Published: 07 February 2014

Abstract

Traditional work-stealing schedulers perform poorly on multi-programmed multi-core architectures: each program tends to use all the cores, which causes serious core contention. To relieve this problem, this paper proposes a Demand-aware Work-Stealing (DWS) task scheduler, under which a work-stealing program uses cores according to its real-time demand for them. When multiple programs scheduled by DWS run concurrently on a multi-core architecture, the cores are first allocated evenly among the co-running programs. At runtime, a program that cannot fully utilize its cores releases some of them; conversely, a program that needs more cores tries to use the free cores released by its co-running programs. Experimental results show that DWS can achieve a performance gain of up to 32.3% for co-running programs compared with traditional work-stealing schedulers that use the ABP yielding mechanism.
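The allocation policy described in the abstract (even initial split, release of unused cores, reuse of cores freed by co-runners) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the DWS implementation from the paper. It assumes each program's instantaneous core demand is known as an integer (`demand`), that a program gives up borrowed cores before its own, and that a program reuses its own released cores before borrowing a co-runner's. Forcibly reclaiming a borrowed core from a co-runner is not modeled. `Program`, `even_split`, and `rebalance` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    """A co-running work-stealing program (hypothetical model, not DWS itself)."""
    name: str
    demand: int                                 # assumed: cores it can keep busy right now
    home: set = field(default_factory=set)      # cores from the initial even split
    using: set = field(default_factory=set)     # cores currently running its workers


def even_split(programs, num_cores):
    """Allocate cores evenly, round-robin; earlier programs get any remainder."""
    for core in range(num_cores):
        programs[core % len(programs)].home.add(core)
    for p in programs:
        p.using = set(p.home)


def rebalance(programs, num_cores):
    """One adjustment round; returns the cores that no program is using."""
    # Step 1: a program using more cores than it can keep busy releases the
    # surplus, giving up borrowed (non-home) cores before its own home cores.
    for p in programs:
        surplus = len(p.using) - p.demand
        if surplus > 0:
            release_order = sorted(p.using, key=lambda c: (c in p.home, -c))
            for core in release_order[:surplus]:
                p.using.remove(core)

    busy = set().union(*(p.using for p in programs))
    free = set(range(num_cores)) - busy

    # Step 2: a program that wants more cores takes free ones, preferring its
    # own released home cores before borrowing a co-runner's.
    # Assumption: a borrowed core is never forcibly reclaimed by its owner here.
    for p in programs:
        while len(p.using) < p.demand and free:
            own_free = sorted(free & p.home)
            core = own_free[0] if own_free else min(free)
            free.remove(core)
            p.using.add(core)
    return free


if __name__ == "__main__":
    a, b = Program("A", demand=2), Program("B", demand=6)
    even_split([a, b], 8)          # A home {0, 2, 4, 6}, B home {1, 3, 5, 7}
    rebalance([a, b], 8)           # A releases 4 and 6; B picks them up
    print(sorted(a.using), sorted(b.using))  # [0, 2] [1, 3, 4, 5, 6, 7]
```

The round-robin split and the "borrowed cores first" release order are choices made for this sketch; the paper's actual release and acquisition rules may differ. In a real runtime, releasing a core would presumably mean idling the worker bound to it, which this model does not represent.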



    Information

    Published In

    PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores
    February 2014
    156 pages
    ISBN:9781450326575
    DOI:10.1145/2578948
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Core allocation
    2. Multi-programmed
    3. Work-stealing

    Qualifiers

    • Tutorial
    • Research
    • Refereed limited

    Conference

    PPoPP '14

    Acceptance Rates

    Overall Acceptance Rate 53 of 97 submissions, 55%

