Page overlays: an enhanced virtual memory framework to enable fine-grained memory management

Authors:

Vivek Seshadri,

Gennady Pekhimenko,

Olatunji Ruwase,

Phillip B. Gibbons,

Michael A. Kozuch,

Trishul Chilimbi

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

Pages 79 - 91

https://doi.org/10.1145/2749469.2750379

Published: 13 June 2015

Abstract

Many recent works propose mechanisms demonstrating the potential advantages of managing memory at a fine (e.g., cache line) granularity, such as fine-grained deduplication and fine-grained memory protection. Unfortunately, existing virtual memory systems track memory at a larger granularity (e.g., 4 KB pages), inhibiting efficient implementation of such techniques. Simply reducing the page size results in an unacceptable increase in page table overhead and TLB pressure.

We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques. In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page. Cache lines that are present in the overlay are accessed from there and all other cache lines are accessed from the regular physical page. Our page-overlay framework enables cache-line-granularity memory management without significantly altering the existing virtual memory framework or introducing high overheads.
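
The access rule described above (a cache line is served from the overlay if it is present there, and from the regular physical page otherwise) can be illustrated with a small Python sketch. This is only a toy software model, not the hardware design from the paper: the 4 KB page size, the 64-byte cache line size, the class name `OverlaidPage`, the dictionary-backed overlay, and the bit-vector helper are all assumptions made for illustration.

```python
PAGE_SIZE = 4096                      # assumed page size in bytes
LINE_SIZE = 64                        # assumed cache line size in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


class OverlaidPage:
    """Toy model: one virtual page backed by a physical page plus an overlay."""

    def __init__(self, physical_page: bytes):
        if len(physical_page) != PAGE_SIZE:
            raise ValueError("physical page must be exactly PAGE_SIZE bytes")
        self.physical = physical_page
        # Hypothetical overlay store: cache line index -> line contents.
        self.overlay = {}

    def _check_index(self, line_index: int) -> None:
        if not 0 <= line_index < LINES_PER_PAGE:
            raise IndexError("cache line index out of range")

    def write_line_to_overlay(self, line_index: int, data: bytes) -> None:
        """Place one cache line in the overlay, leaving the physical page untouched."""
        self._check_index(line_index)
        if len(data) != LINE_SIZE:
            raise ValueError("a cache line must be exactly LINE_SIZE bytes")
        self.overlay[line_index] = bytes(data)

    def read_line(self, line_index: int) -> bytes:
        """Return the line from the overlay if present, else from the physical page."""
        self._check_index(line_index)
        if line_index in self.overlay:
            return self.overlay[line_index]
        start = line_index * LINE_SIZE
        return self.physical[start:start + LINE_SIZE]

    def overlay_bitvector(self) -> int:
        """Integer whose bit i is set when cache line i lives in the overlay."""
        bits = 0
        for line_index in self.overlay:
            bits |= 1 << line_index
        return bits
```

In this sketch, a page created from all-zero bytes and given one overlay line at index 3 returns the overlay contents for line 3 and zeros for every other line, which mirrors the "subset of cache lines" behaviour the abstract describes.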

We show that our framework can enable simple and efficient implementations of seven memory management techniques, each of which has a wide variety of applications. We quantitatively evaluate the potential benefits of two of these techniques: overlay-on-write and sparse-data-structure computation. Our evaluations show that overlay-on-write, when applied to fork, can improve performance by 15% and reduce memory capacity requirements by 53% on average compared to traditional copy-on-write. For sparse data computation, our framework can outperform a state-of-the-art software-based sparse representation on a number of real-world sparse matrices. Our framework is general, powerful, and effective in enabling fine-grained memory management at low cost.
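
To give a feel for why overlay-on-write can need less memory than copy-on-write, the following Python sketch counts the extra bytes each scheme would allocate for the same set of writes after a fork. It is a back-of-the-envelope model under stated assumptions: 4 KB pages, 64-byte cache lines, no metadata overhead, and hypothetical function names. It does not attempt to reproduce the 15% and 53% figures reported above.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
LINE_SIZE = 64     # assumed cache line size in bytes


def extra_bytes_copy_on_write(written_addrs):
    """Copy-on-write: the first write to a shared page copies the whole page."""
    touched_pages = {addr // PAGE_SIZE for addr in written_addrs}
    return len(touched_pages) * PAGE_SIZE


def extra_bytes_overlay_on_write(written_addrs):
    """Overlay-on-write (toy model): only the written cache lines get new storage."""
    touched_lines = {addr // LINE_SIZE for addr in written_addrs}
    return len(touched_lines) * LINE_SIZE
```

For example, writes to byte addresses 0, 8, 64, and 5000 touch two pages but only three distinct cache lines, so the toy copy-on-write count is two full pages while the toy overlay-on-write count is three cache lines.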

Index Terms

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management
1. Hardware
  1. Hardware validation
  2. Integrated circuits
    1. Semiconductor memory
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Virtual memory

Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

June 2015

768 pages

ISBN:9781450334020

DOI:10.1145/2749469

General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

ACM SIGARCH Computer Architecture News Volume 43, Issue 3S
ISCA'15
June 2015
745 pages
ISSN:0163-5964
DOI:10.1145/2872887
Editor: Doug DeGroot

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA '15

Sponsor:

IEEE TCCA
SIGARCH

ISCA '15: The 42nd Annual International Symposium on Computer Architecture

June 13 - 17, 2015

Portland, Oregon

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 41
Total Downloads: 915
Downloads (Last 12 months): 83
Downloads (Last 6 weeks): 12

Reflects downloads up to 15 Sep 2024
