Hyperqueues: Design and Implementation of Deterministic Concurrent Queues

Authors:

Hans Vandierendonck,

Dimitrios S. Nikolopoulos

ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4

Article No.: 23, Pages 1 - 35

https://doi.org/10.1145/3365660

Published: 19 November 2019

Abstract

The hyperqueue is a programming abstraction for queues that results in deterministic and scale-free parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. While hyperobjects are organized around private local views, hyperqueues provide a shared view on a queue data structure. In this way, hyperqueues guarantee determinism for programs using concurrent queues. We define the programming API and semantics of two instances of the hyperqueue concept. These hyperqueues differ in their API and in the degree of concurrency that is extracted. We describe the implementation of the hyperqueues in a work-stealing scheduler and demonstrate scalable performance on pipeline-parallel benchmarks from PARSEC and StreamIt.
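
The determinism property described in the abstract can be illustrated with a minimal sketch. This is not the hyperqueue API defined in the article: the names `produce`, `run_producers`, and `views` are hypothetical, and the reduction step is an assumption modelled loosely on how reducer-style hyperobjects combine per-task views. Each producer appends to its own private view, and the views are concatenated in spawn order, so the resulting queue contents match a serial execution regardless of how the threads are scheduled. The sketch merges only after all producers finish, so it shows the determinism aspect and not the pipeline concurrency that the abstract refers to.

```python
import threading


def produce(view, start, count):
    # Each producer appends only to its own private view, so no lock is needed.
    for i in range(start, start + count):
        view.append(i)


def run_producers(num_producers, items_each):
    # One private view per producer, indexed in spawn (program) order.
    views = [[] for _ in range(num_producers)]
    threads = [
        threading.Thread(target=produce, args=(views[p], p * items_each, items_each))
        for p in range(num_producers)
    ]
    # Start the threads in reverse order to show that start order is irrelevant.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    # Reduce the views by concatenating them in spawn order.
    queue = []
    for view in views:
        queue.extend(view)
    return queue


result = run_producers(4, 3)
print(result)
```

Because the final order depends only on the spawn order of the producers and not on thread timing, repeated runs return the same list.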

Published In

ACM Transactions on Parallel Computing Volume 6, Issue 4

December 2019

188 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/3372747

Editor:
David A. Bader
New Jersey Institute of Technology, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 November 2019

Accepted: 01 July 2019

Revised: 01 July 2019

Received: 01 June 2016

Published in TOPC Volume 6, Issue 4

Author Tag

Hyperqueue

Qualifiers

Research-article
Research
Refereed

Funding Sources

European Community's Seventh Framework Programme
NovoSoft project Marie Curie Actions
United Kingdom EPSRC GEMSCLAIM project
TEXT project
