The Conquest file system: Better performance through a disk/persistent-RAM hybrid design

Published: 01 August 2006

Abstract

Modern file systems assume the use of disk, a system-wide performance bottleneck for over a decade. Current disk caching and RAM file systems either impose high overhead to access memory content or fail to provide mechanisms to achieve data persistence across reboots.

The Conquest file system is based on the observation that memory is becoming inexpensive, which enables all file system services to be delivered from memory, except for providing large storage capacity. Unlike caching, Conquest uses memory with battery backup as persistent storage, and provides specialized and separate data paths to memory and disk. Therefore, the memory data path contains no disk-related complexity. The disk data path consists of optimizations only for the specialized disk usage pattern.

Compared to a memory-based file system, Conquest incurs little performance overhead. Compared to several disk-based file systems, Conquest achieves 1.3x to 19x faster memory performance, and 1.4x to 2.0x faster performance when exercising both memory and disk.

Conquest realizes most of the benefits of persistent RAM at a fraction of the cost of a RAM-only solution. It also demonstrates that disk-related optimizations impose high overheads for accessing memory content in a memory-rich environment.


Reviews

Suma Adabala

As dynamic random access memory (DRAM) gets cheaper, larger memories are typically used as buffers to hide input/output (I/O) latency to disk. The Conquest file system is a novel approach for the more effective use of cheap DRAM. It is designed so that battery-backed DRAM serves as a persistent store for small files and file system services, while the slower disks serve as a store for large files. Rather than adapting existing file system solutions, as in the case of random access memory (RAM) file systems or RAM-based disk emulators, the authors make a case for the need to redesign a file system optimized for persistent RAMs.

The Conquest file system has a simpler data path to small files and metadata in memory that bypasses the I/O buffer and disk management found in conventional disk-based file systems. The performance evaluation of Conquest shows up to a 19-times improvement in memory performance compared to file systems designed for disks, supporting the need for file system redesign to better exploit memory performance. Based on a variety of prior studies of file access patterns and file size distribution, the strategy for delegating files to a storage medium has a file-size threshold. Files with sizes below the threshold are delegated to persistent RAM, while those larger than the threshold are stored on disk. The performance gain achieved with Conquest for workloads that exercise both disk and memory supports this simple design decision. The large file layout on disk is optimized for sequential rather than random access, making Conquest disk access optimal for multimedia files, a significant component of current and future workloads.

An implementation of Conquest as a loadable module in the Linux 2.4.2 kernel is available; however, due to issues such as lower reliability and lack of a garbage collector implementation, persistent DRAM must be cost-effective before it can be deployed. This paper is definitely worth reading for operating system designers. It demonstrates a successful redesign of a system component, the Conquest file system, after re-evaluating underlying assumptions, namely, file system optimizations for disks, in the context of changes to the system organization, namely, a memory-rich storage hierarchy.

Online Computing Reviews Service
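
The size-threshold placement that the review describes can be illustrated with a small sketch. This is not Conquest's implementation (which is a Linux kernel module); it is a toy user-space model in Python. The class name `HybridStore`, the 1 MiB default threshold, and the choice to send files exactly at the threshold to disk are all assumptions made for illustration, since the review does not give those details.

```python
from dataclasses import dataclass, field

# Hypothetical default; the review does not state the threshold Conquest uses.
SIZE_THRESHOLD_BYTES = 1 << 20


@dataclass
class HybridStore:
    """Toy model of size-based placement between persistent RAM and disk."""

    threshold: int = SIZE_THRESHOLD_BYTES
    ram: dict = field(default_factory=dict)   # stands in for battery-backed RAM
    disk: dict = field(default_factory=dict)  # stands in for the disk store

    def medium_for(self, size: int) -> str:
        # Files below the threshold go to RAM. Sending a file exactly at the
        # threshold to disk is an assumption; the review leaves it unspecified.
        return "ram" if size < self.threshold else "disk"

    def write(self, name: str, data: bytes) -> str:
        medium = self.medium_for(len(data))
        # Drop any older copy so a file that crosses the threshold
        # lives on exactly one medium.
        self.ram.pop(name, None)
        self.disk.pop(name, None)
        target = self.ram if medium == "ram" else self.disk
        target[name] = data
        return medium

    def read(self, name: str) -> bytes:
        # Separate lookups model the separate data paths: a RAM hit never
        # touches the disk-side structure.
        if name in self.ram:
            return self.ram[name]
        return self.disk[name]
```

With a small threshold such as `HybridStore(threshold=8)`, a 4-byte file is placed in the RAM dictionary and a 9-byte file in the disk dictionary; rewriting the first file with 16 bytes moves it to the disk side.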


Published In

ACM Transactions on Storage, Volume 2, Issue 3
August 2006
149 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1168910
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Persistent RAM
  2. file systems
  3. performance measurement
  4. storage management

Qualifiers

  • Article
