DOI: 10.1145/3132402.3132409
research-article
Public Access

Speculative paging for future NVM storage

Published: 02 October 2017

Abstract

The quest for greater performance and efficiency has driven modern cloud applications towards "in-memory" implementations, such as memcached and Apache Spark. Looking forward, however, the cost of DRAM, due to its low area density and high energy consumption, may make this trend unsustainable. Traditionally, OS paging mechanisms were intended to bridge the gap between expensive, under-provisioned DRAM and inexpensive, dense storage. Over the past twenty years, however, the latency of storage relative to DRAM became too great to overcome without significant performance impact. Recent NVM storage devices, such as Intel Optane drives and aggressive 3D flash SSDs, may dramatically change the picture for OS paging. These new drives are expected to provide much lower latency than existing flash-based SSDs or traditional HDDs. Unfortunately, even these future NVM drives are still much too slow to replace DRAM: the access latency of fast NVM storage is expected to be on the order of tens of microseconds, and the devices often require block-level access. Unlike traditional HDDs, for which the baseline OS paging policies were designed, these new SSDs place no penalty on "random" access, and their access latency promises to be significantly lower than that of traditional SSDs, which argues for rearchitecting the OS paging system.
In this paper, we propose SPAN (Speculative PAging for future NVM storage), a software-only, OS swap-based page management and prefetching scheme designed for emerging NVM storage. Unlike the baseline OS swapping mechanism, which is highly optimized for traditional spinning disks, SPAN leverages the inherent parallelism of NVM devices to proactively fetch a set of pages from NVM storage into the small, fast main DRAM. In doing so, SPAN yields a speedup of ~18% versus swapping into the NVM with the baseline OS, which recovers ~50% of the performance that the baseline OS loses relative to placing the entire working set in DRAM. The proposed technique thus enables the use of such hybrid systems for memory-hungry applications, lowering memory cost while keeping performance comparable to that of a DRAM-only system.
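
The two headline figures can be related with a little arithmetic. The sketch below is only an illustration: it assumes that both the ~18% speedup and the ~50% figure are measured on execution time, which the abstract does not state, and the function name `dram_only_speedup` is hypothetical. Under that assumption it computes how much faster the DRAM-only configuration would be than baseline OS swapping.

```python
def dram_only_speedup(span_speedup: float, recovered_fraction: float) -> float:
    """Implied speedup of a DRAM-only system over baseline OS swapping.

    Assumption (not from the abstract): both inputs refer to execution time.
    span_speedup       -- SPAN's speedup over the baseline, e.g. 1.18 for ~18%
    recovered_fraction -- share of the baseline's lost time that SPAN wins back
    """
    baseline_time = 1.0                        # normalize baseline swapping to 1
    span_time = baseline_time / span_speedup   # time with SPAN
    time_saved = baseline_time - span_time     # what SPAN recovers
    total_gap = time_saved / recovered_fraction  # full baseline-vs-DRAM gap
    dram_time = baseline_time - total_gap      # time with everything in DRAM
    return baseline_time / dram_time


if __name__ == "__main__":
    implied = dram_only_speedup(1.18, 0.50)
    print(f"Implied DRAM-only speedup over baseline: {implied:.2f}x")
```

With the abstract's rounded figures this gives roughly 1.44x. If the figures were instead measured on throughput, the implied gap would differ, so the number should be read as a consistency check, not as a result reported by the paper.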

Published In

MEMSYS '17: Proceedings of the International Symposium on Memory Systems
October 2017
409 pages
ISBN: 9781450353359
DOI: 10.1145/3132402
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. memory system
  2. paging
  3. prefetching

Qualifiers

  • Research-article

Conference

MEMSYS 2017
