Efficient Address Translation for Architectures with Multiple Page Sizes

Published: 04 April 2017

Abstract

Processors and operating systems (OSes) support multiple memory page sizes. Superpages increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained memory protection. Ideally, TLBs should perform well for any distribution of page sizes. In reality, set-associative TLBs, which are widely used because they are more energy-efficient than fully-associative TLBs, cannot easily support multiple page sizes concurrently. Instead, commercial systems typically implement separate set-associative TLBs for different page sizes. As a result, when superpages are allocated aggressively, TLB misses may counterintuitively increase even if entries for small pages remain unused (and vice versa). We invent MIX TLBs, energy-frugal set-associative structures that concurrently support all page sizes by exploiting superpage allocation patterns. MIX TLBs boost the performance of big-memory applications, often by 10-30%, on native CPUs, virtualized CPUs, and GPUs. MIX TLBs are simple and require no OS or program changes.
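
The indexing problem behind this limitation can be illustrated with a short sketch. A set-associative TLB selects a set using the low bits of the virtual page number, but where those bits sit in the address depends on the page size, which is not known before the lookup completes. The Python below is a minimal illustration, not the paper's design: the 16-set geometry and the helper names `set_index` and `candidate_sets` are assumptions made for this example, and the page sizes are the x86-64 ones (4KB, 2MB, 1GB).

```python
# Illustrative sketch only. The TLB geometry and helper names are
# assumptions for this example, not taken from the paper.

# Number of page-offset bits for each x86-64 page size.
PAGE_SHIFT = {"4KB": 12, "2MB": 21, "1GB": 30}

NUM_SETS = 16  # hypothetical number of sets (a power of two)


def set_index(vaddr, page_size, num_sets=NUM_SETS):
    """Pick a set from the low bits of the virtual page number."""
    vpn = vaddr >> PAGE_SHIFT[page_size]
    return vpn % num_sets


def candidate_sets(vaddr):
    """Sets one TLB would have to probe while the page size is unknown."""
    return {size: set_index(vaddr, size) for size in PAGE_SHIFT}


if __name__ == "__main__":
    vaddr = 0x7F12_3456_7000
    for size, idx in candidate_sets(vaddr).items():
        print(f"{size}: set {idx}")
```

For the same virtual address, each assumed page size selects a different set, so a single conventional set-associative structure would either need several probes per lookup or a way to learn the page size in advance. This is one way to see why separate per-page-size TLBs are common.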

Published In

ACM SIGPLAN Notices, Volume 52, Issue 4
ASPLOS '17
April 2017
811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. coalescing
2. superpages
3. TLB
4. virtual memory
