
Hyperqueues: Design and Implementation of Deterministic Concurrent Queues

Published: 19 November 2019

Abstract

The hyperqueue is a programming abstraction for queues that yields deterministic and scale-free parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. Whereas hyperobjects are organized around private local views, hyperqueues provide a shared view of a queue data structure. In this way, hyperqueues guarantee determinism for programs that use concurrent queues. We define the programming API and semantics of two instances of the hyperqueue concept, which differ in their API and in the degree of concurrency they extract. We describe the implementation of the hyperqueues in a work-stealing scheduler and demonstrate scalable performance on pipeline-parallel benchmarks from PARSEC and StreamIt.
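To make the idea of deterministic views on a shared queue more concrete, here is a minimal C++ sketch. It is not the authors' API or implementation. It assumes that each producer appends to its own segment (its "view") of the queue, that the spawning thread creates segments in program order, and that the consumer drains segments strictly in that order, waiting on the oldest unfinished segment even when newer segments already hold data. Under these assumptions the sequence of popped values matches a serial execution regardless of how producers are scheduled. The class `OrderedSegmentQueue` and its methods `open_segment`, `push`, `close`, `finish`, and `pop` are hypothetical names, and plain `std::thread` with a single mutex stands in for the work-stealing runtime described in the abstract.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <list>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: a queue whose pop order equals serial program order.
template <typename T>
class OrderedSegmentQueue {
public:
    // One private segment per producer; only that producer appends to it.
    struct Segment {
        std::deque<T> items;
        bool closed = false;  // producer has finished appending
    };

    // Called by the spawning thread in program order, before the producer
    // starts, so segment order matches serial execution order.
    Segment* open_segment() {
        std::lock_guard<std::mutex> lock(m_);
        segments_.emplace_back();
        return &segments_.back();  // std::list keeps this address stable
    }

    void push(Segment* s, T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            s->items.push_back(std::move(value));
        }
        cv_.notify_all();
    }

    void close(Segment* s) {
        {
            std::lock_guard<std::mutex> lock(m_);
            s->closed = true;
        }
        cv_.notify_all();
    }

    // Declares that no further segments will be opened.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(m_);
            finished_ = true;
        }
        cv_.notify_all();
    }

    // Returns the next value in program order, or std::nullopt once all
    // segments are drained and finish() has been called.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // Drop leading segments that are closed and fully drained.
            while (!segments_.empty() && segments_.front().closed &&
                   segments_.front().items.empty()) {
                segments_.pop_front();
            }
            if (!segments_.empty() && !segments_.front().items.empty()) {
                T v = std::move(segments_.front().items.front());
                segments_.front().items.pop_front();
                return v;
            }
            if (segments_.empty() && finished_) return std::nullopt;
            // Oldest segment is open but empty: wait for its producer,
            // even if later segments already hold data.
            cv_.wait(lock);
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::list<Segment> segments_;
    bool finished_ = false;
};

// Three producers are spawned in program order; later ones sleep less, so
// they typically finish first, yet pop() still yields 0..11 in order.
std::vector<int> serial_order_demo() {
    OrderedSegmentQueue<int> q;
    std::vector<std::thread> producers;
    for (int p = 0; p < 3; ++p) {
        auto* seg = q.open_segment();  // program order is fixed here
        producers.emplace_back([&q, seg, p] {
            std::this_thread::sleep_for(std::chrono::milliseconds(10 * (3 - p)));
            for (int i = 0; i < 4; ++i) q.push(seg, p * 4 + i);
            q.close(seg);
        });
    }
    q.finish();
    std::vector<int> out;
    while (auto v = q.pop()) out.push_back(*v);
    for (auto& t : producers) t.join();
    return out;
}
```

A single lock around the whole queue keeps the sketch short and easy to check; a scalable implementation would need finer-grained synchronization so that producers writing to different segments do not contend.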



Published In

ACM Transactions on Parallel Computing  Volume 6, Issue 4
December 2019
188 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3372747

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 November 2019
Accepted: 01 July 2019
Revised: 01 July 2019
Received: 01 June 2016
Published in TOPC Volume 6, Issue 4


Author Tag

  1. Hyperqueue

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • European Community's Seventh Framework Programme
  • NovoSoft project Marie Curie Actions
  • United Kingdom EPSRC GEMSCLAIM project
  • TEXT project

