skip to main content
10.1145/2749469.2750379acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management

Published: 13 June 2015 Publication History

Abstract

Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache line) granularity---e.g., fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page size results in an unacceptable increase in page table overhead and TLB pressure.
We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques. In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page. Cache lines that are present in the overlay are accessed from there and all other cache lines are accessed from the regular physical page. Our page-overlay framework enables cache-line-granularity memory management without significantly altering the existing virtual memory framework or introducing high overheads.
We show that our framework can enable simple and efficient implementations of seven memory management techniques, each of which has a wide variety of applications. We quantitatively evaluate the potential benefits of two of these techniques: overlay-on-write and sparse-data-structure computation. Our evaluations show that overlay-on-write, when applied to fork, can improve performance by 15% and reduce memory capacity requirements by 53% on average compared to traditional copy-on-write. For sparse data computation, our framework can outperform a state-of-the-art software-based sparse representation on a number of real-world sparse matrices. Our framework is general, powerful, and effective in enabling fine-grained memory management at low cost.

References

[1]
fork(2) - Linux manual page. http://man7.org/linux/man-pages/man2/fork.2.html.
[2]
Memsim. http://safari.ece.cmu.edu/tools.html, 2012.
[3]
C. S. Ananian, K. Asanovic, B. C. Kuszmaul, C. E. Leiserson, and S. Lie. Unbounded Transactional Memory. In HPCA, 2005.
[4]
P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the Art of Virtualization. In SOSP, 2003.
[5]
A. Basu, J. Gandhi, J. Chang, M. D. Hill, and M. M. Swift. Efficient Virtual Memory for Big Memory Servers. In ISCA, 2013.
[6]
J. Bent, G. Gibson, G. Grider, B. McClelland, P. Nowoczynski, J. Nunez, M. Polte, and M. Wingate. PLFS: A Checkpoint Filesystem for Parallel Applications. In SC, 2009.
[7]
D. L. Black, R. F. Rashid, D. B. Golub, and C. R. Hill. Translation lookaside buffer consistency: A software approach. In ASPLOS, 1989.
[8]
J. C. Brustoloni. Interoperation of copy avoidance in network and file I/O. In INFOCOM, volume 2, 1999.
[9]
J. Carter, W. Hsieh, L. Stoller, M. Swanson, L. Zhang, E. Brunvand, A. Davis, C.-C. Kuo, R. Kuramkote, M. Parker, L. Schaelicke, and T. Tateyama. Impulse: Building a Smarter Memory Controller. In HPCA, 1999.
[10]
M. Cekleov and M. Dubois. Virtual-Address Caches Part 1: Problems and Solutions in Uniprocessors. IEEE Micro, 17(5), 1997.
[11]
F. Chang and G. A. Gibson. Automatic I/O Hint Generation Through Speculative Execution. In OSDI, 1999.
[12]
D. Cheriton, A. Firoozshahian, A. Solomatnikov, J. P. Stevenson, and Omid A. HICAMP: Architectural Support for Efficient Concurrency-safe Shared Structured Data Access. In ASPLOS, 2012.
[13]
K. Constantinides, O. Mutlu, and T. Austin. Online design bug detection: Rtl analysis, flexible mechanisms, and evaluation. In MICRO, 2008.
[14]
K. Constantinides, O. Mutlu, T. Austin, and V. Bertacco. Software-Based Online Detection of Hardware Defects Mechanisms, Architectural Support, and Evaluation. In MICRO, 2007.
[15]
Intel Corporation. Intel Architecture Instruction Set Extensions Programming Reference, chapter 8. Intel Transactional Synchronization Extensions. Sep 2012.
[16]
Standard Performance Evaluation Corporation. SPEC CPU2006 Benchmark Suite. www.spec.org/cpu2006, 2006.
[17]
T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. TOMS, 38(1), 2011.
[18]
P. J. Denning. Virtual Memory. ACM Computer Survey, 2(3), 1970.
[19]
I. P. Egwutuoha, D. Levy, B. Selic, and S. Chen. A Survey of Fault Tolerance Mechanisms and Checkpoint/Restart Implementations for High Performance Computing Systems. Journal of Supercomputing, 2013.
[20]
S. C. Eisenstat, M. C. Gursky, M. H. Schultz, and A. H. Sherman. Yale Sparse Matrix Package I: The Symmetric Codes. IJNME, 18(8), 1982.
[21]
M. Ekman and P. Stenstrom. A Robust Main-Memory Compression Scheme. In ISCA, 2005.
[22]
J. Fotheringham. Dynamic Storage Allocation in the Atlas Computer, Including an Automatic Use of a Backing Store. Commun. ACM, 1961.
[23]
M. Gorman. Understanding the Linux Virtual Memory Manager, chapter 4, page 57. Prentice Hall, 2004.
[24]
D. Gupta, S. Lee, M. Vrable, S. Savage, A. C. Snoeren, G. Varghese, G. M. Voelker, and A. Vahdat. Difference Engine: Harnessing Memory Redundancy in Virtual Machines. In OSDI, 2008.
[25]
M. Herlihy and J. E. B. Moss. Transactional Memory: Architectural Support for Lock-free Data Structures. In ISCA, 1993.
[26]
Intel. Architecture Guide: Intel Active Management Technology. https://software.intel.com/en-us/articles/architecture-guide-intel-active-management-technology/.
[27]
Intel. Sparse Matrix Storage Formats, Intel Math Kernel Library. https://software.intel.com/en-us/node/471374.
[28]
A. Jaleel, K. B. Theobald, S. C. Steely, Jr., and J. Emer. High performance cache replacement using re-reference interval prediction (rrip). In ISCA, 2010.
[29]
JEDEC. DDR3 SDRAM, JESD79-3F, 2012.
[30]
L. Jiang, Y. Zhang, and J. Yang. Mitigating Write Disturbance in Super-Dense Phase Change Memories. In DSN, 2014.
[31]
T. Kilburn, D. B. G. Edwards, M. J. Lanigan, and F. H. Sumner. One-Level Storage System. IRE Transactions on Electronic Computers, 11(2), 1962.
[32]
S. Kumar, H. Zhao, A. Shriraman, E. Matthews, S. Dwarkadas, and L. Shannon. Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy. In MICRO, 2012.
[33]
H. A. Lagar-Cavilla, J. A. Whitney, A. M. Scannell, P. Patchin, S. M. Rumble, E. de Lara, M. Brudno, and M. Satyanarayanan. SnowFlock: Rapid Virtual Machine Cloning for Cloud Computing. In EuroSys, 2009.
[34]
H. Q. Le, W. J. Starke, J. S. Fields, F. P. O'Connell, D. Q. Nguyen, B. J. Ronchetti, W. M. Sauer, E. M. Schwarz, and M. T. Vaden. Ibm power6 microarchitecture. IBM JRD, 51(6), 2007.
[35]
C. J. Lee, V. Narasiman, E. Ebrahimi, O. Mutlu, and Y. N. Patt. DRAM-aware last-level cache writeback: Reducing write-caused interference in memory systems. Technical Report TR-HPS-2010-2, University of Texas at Austin, 2010.
[36]
V. Nagarajan and R. Gupta. Architectural Support for Shadow Memory in Multiprocessors. In VEE, 2009.
[37]
E. B. Nightingale, P. M. Chen, and J. Flinn. Speculative execution in a distributed file system. In SOSP, 2005.
[38]
G. Pekhimenko, V. Seshadri, Y. Kim, H. Xin, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry. Linearly Compressed Pages: A Low-complexity, Low-latency Main Memory Compression Framework. In MICRO, 2013.
[39]
M. Prvulovic, Z. Zhang, and J. Torrellas. Revive: Cost-effective architectural support for rollback recovery in shared-memory multiprocessors. In ISCA, 2002.
[40]
E. D. Reilly. Memory-mapped I/O. In Encyclopedia of Computer Science, page 1152. John Wiley and Sons Ltd., Chichester, UK.
[41]
B. Romanescu, A. R. Lebeck, D. J. Sorin, and A. Bracy. UNified Instruction/Translation/Data (UNITD) Coherence: One Protocol to Rule Them All. In HPCA, 2010.
[42]
R. F. Sauers, C. P. Ruemmler, and P. S. Weygant. HP-UX 11i Tuning and Performance, chapter 8. Memory Bottlenecks. Prentice Hall, 2004.
[43]
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. TOCS, 15(4), November 1997.
[44]
V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry. RowClone: Fast and Energy-efficient in-DRAM Bulk Data Copy and Initialization. In MICRO, 2013.
[45]
V. Seshadri, O. Mutlu, M. A. Kozuch, and T. C. Mowry. The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing. In PACT, 2012.
[46]
T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically Characterizing Large Scale Program Behavior. In ASPLOS, 2002.
[47]
W. Shi, H.-H. S. Lee, L. Falk, and M. Ghosh. An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors. In ISCA, 2006.
[48]
A. Silberschatz, P. B. Galvin, and G. Gagne. Operating System Concepts, chapter 11. File-System Implementation. Wiley, 2012.
[49]
G. S. Sohi, S. E. Breach, and T. N. Vijaykumar. Multiscalar processors. In ISCA, 1995.
[50]
D. J. Sorin, M. M. K. Martin, M. D. Hill, and D. A. Wood. Safetynet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In ISCA, 2002.
[51]
S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt. Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers. In HPCA, 2007.
[52]
S. M. Srinivasan, S. Kandula, C. R. Andrews, and Y. Zhou. Flashback: A Lightweight Extension for Rollback and Deterministic Replay for Software Debugging. In USENIX ATC, 2004.
[53]
M. E. Staknis. Sheaved Memory: Architectural Support for State Saving and Restoration in Pages Systems. In ASPLOS, 1989.
[54]
J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry. A Scalable Approach to Thread-level Speculation. In ISCA, 2000.
[55]
P. J. Teller. Translation-Lookaside Buffer Consistency. IEEE Computer, 23(6), 1990.
[56]
G. Venkataramani, I. Doudalis, D. Solihin, and M. Prvulovic. FlexiTaint: A Programmable Accelerator for Dynamic Taint Propagation. In HPCA, 2008.
[57]
C. Villavieja, V. Karakostas, L. Vilanova, Y. Etsion, A. Ramirez, A. Mendelson, N. Navarro, A. Cristal, and O. S. Unsal. DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory. In PACT, 2011.
[58]
C. A. Waldspurger. Memory Resource Management in VMware ESX Server. OSDI, 2002.
[59]
Y-M. Wang, Y. Huang, K-P. Vo, P-Y. Chung, and C. Kintala. Checkpointing and its applications. In FTCS, 1995.
[60]
B. Wester, P. M. Chen, and J. Flinn. Operating system support for application-specific speculation. In EuroSys, 2011.
[61]
A. Wiggins, S. Winwood, H. Tuch, and G. Heiser. Legba: Fast Hardware Support for Fine-Grained Protection. In Amos Omondi and Stanislav Sedukhin, editors, Advances in Computer Systems Architecture, volume 2823 of Lecture Notes in Computer Science, 2003.
[62]
E. Witchel, J. Cates, and K. Asanović. Mondrian Memory Protection. In ASPLOS, 2002.
[63]
Q. Zhao, D. Bruening, and S. Amarasinghe. Efficient Memory Shadowing for 64-bit Architectures. In ISMM, 2010.

Cited By

View all
  • (2023)Copy-on-Pin: The Missing Piece for Correct Copy-on-WriteProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575716(176-191)Online publication date: 27-Jan-2023
  • (2022)MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer OptimizationsACM Transactions on Architecture and Code Optimization10.1145/350525019:2(1-29)Online publication date: 24-Mar-2022
  • (2021)NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)10.1109/ISCA52012.2021.00046(498-511)Online publication date: Jun-2021
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN:9781450334020
DOI:10.1145/2749469
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2015

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

ISCA '15
Sponsor:

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)83
  • Downloads (Last 6 weeks)12
Reflects downloads up to 15 Sep 2024

Other Metrics

Citations

Cited By

View all

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media