DOI: 10.1145/3295500.3356139
research-article

LPCC: hierarchical persistent client caching for Lustre

Published: 17 November 2019

Abstract

Most high-performance computing (HPC) clusters use a global parallel file system to enable high data throughput. The parallel file system is typically centralized, and its storage media are physically separated from the compute cluster. Compute nodes, which act as clients of the parallel file system, are often additionally equipped with SSDs, but these node-internal storage media are rarely well integrated into I/O and compute workflows. How to make full and flexible use of these storage media is therefore a valuable research question.
In this paper, we propose LPCC, a hierarchical Persistent Client Caching mechanism for the Lustre file system. LPCC provides two modes: RW-PCC builds a read-write cache on the local SSD of a single client; RO-PCC distributes a read-only cache over the SSDs of multiple clients. LPCC integrates with the Lustre hierarchical storage management (HSM) solution and the Lustre layout lock mechanism to provide consistent persistent caching services for I/O applications running on client nodes while maintaining a global unified namespace for the entire Lustre file system. The evaluation results presented in this paper show LPCC's advantages for various workloads, including speed-ups that are linear in the number of clients for several real-world scenarios.



Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN: 9781450362290
DOI: 10.1145/3295500

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hierarchical storage management
  2. intelligent prefetching
  3. lustre
  4. persistent caching

Qualifiers

  • Research-article

Conference

SC '19
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
