research-article

Scalable address spaces using RCU balanced trees

Published: 03 March 2012

Abstract

Software developers commonly exploit multicore processors by building multithreaded software in which all threads of an application share a single address space. This shared address space has a cost: kernel virtual memory operations such as handling soft page faults, growing the address space, mapping files, etc. can limit the scalability of these applications. In widely-used operating systems, all of these operations are synchronized by a single per-process lock. This paper contributes a new design for increasing the concurrency of kernel operations on a shared address space by exploiting read-copy-update (RCU) so that soft page faults can both run in parallel with operations that mutate the same address space and avoid contending with other page faults on shared cache lines. To enable such parallelism, this paper also introduces an RCU-based binary balanced tree for storing memory mappings. An experimental evaluation using three multithreaded applications shows performance improvements on 80 cores ranging from 1.7x to 3.4x for an implementation of this design in the Linux 2.6.37 kernel. The RCU-based binary tree enables soft page faults to run at a constant cost with an increasing number of cores, suggesting that the design will scale well beyond 80 cores.
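
The read-copy-update idea in the abstract can be illustrated with a small user-space sketch. This is not the paper's kernel implementation: it assumes a path-copying binary search tree whose root is published by a single reference assignment, it omits rebalancing, it assumes mapped regions never overlap, and it relies on Python's garbage collector where a kernel would wait for an RCU grace period before freeing old nodes. The names `Region`, `AddressSpace`, `map`, and `fault` are hypothetical.

```python
import threading
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """Immutable tree node: one mapping [start, end) plus child links."""
    start: int
    end: int
    left: Optional["Region"] = None
    right: Optional["Region"] = None


def insert(node: Optional[Region], start: int, end: int) -> Region:
    """Return a new tree that contains [start, end).

    Only the nodes on the search path are copied; the old tree is
    left untouched, so readers holding it still see a valid tree.
    Rebalancing is omitted to keep the sketch short.
    """
    if node is None:
        return Region(start, end)
    if start < node.start:
        return node._replace(left=insert(node.left, start, end))
    return node._replace(right=insert(node.right, start, end))


def find(node: Optional[Region], addr: int) -> Optional[Region]:
    """Reader side: walk an immutable snapshot without taking a lock."""
    while node is not None:
        if addr < node.start:
            node = node.left
        elif addr >= node.end:
            node = node.right
        else:
            return node
    return None


class AddressSpace:
    """Hypothetical address space: lock-free readers, serialized writers."""

    def __init__(self) -> None:
        self.root: Optional[Region] = None
        self._write_lock = threading.Lock()  # taken by writers only

    def map(self, start: int, end: int) -> None:
        with self._write_lock:
            # Build the new version off to the side, then publish it
            # with one reference assignment.
            self.root = insert(self.root, start, end)

    def fault(self, addr: int) -> Optional[Region]:
        snapshot = self.root  # a single read of the published root
        return find(snapshot, addr)
```

A reader that captured the root before a `map` call keeps traversing the old version, while later readers see the new one; neither blocks the writer. In this sketch that is what stands in for page faults running in parallel with operations that mutate the address space.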

Published In

ACM SIGARCH Computer Architecture News  Volume 40, Issue 1
ASPLOS '12
March 2012
453 pages
ISSN:0163-5964
DOI:10.1145/2189750
  • ACM Conferences
    ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
    March 2012
    476 pages
    ISBN:9781450307598
    DOI:10.1145/2150976
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 March 2012
Published in SIGARCH Volume 40, Issue 1

Author Tags

  1. RCU
  2. concurrent balanced trees
  3. lock-free algorithms
  4. multicore
  5. scalability
  6. virtual memory

Qualifiers

  • Research-article
