DOI: 10.1145/3492321.3527539

Jiffy: elastic far-memory for stateful serverless analytics

Published: 28 March 2022

Abstract

Stateful serverless analytics can be enabled using a remote memory system for inter-task communication and for storing and exchanging intermediate data. However, existing systems allocate memory resources at job granularity: jobs specify their memory demands at submission time, and the system allocates memory equal to the job's demand for its entire lifetime. This leads to resource underutilization and/or performance degradation when intermediate data sizes vary during job execution.
This paper presents Jiffy, an elastic far-memory system for stateful serverless analytics that meets the instantaneous memory demand of a job at timescales of seconds. Jiffy efficiently multiplexes memory capacity across concurrently running jobs, reducing the overheads of reads and writes to slower persistent storage, resulting in 1.6 to 2.5× improvements in job execution time over production workloads. The Jiffy implementation currently runs on Amazon EC2, enables a wide variety of distributed programming models including MapReduce, Dryad, StreamScope, and Piccolo, and natively supports a large class of analytics applications on AWS Lambda.
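
The underutilization that job-granularity allocation causes can be illustrated with a minimal sketch. It is not Jiffy's allocator or API: the demand trace, the function names, and the idealized "elastic" policy are all assumptions made for illustration, and none of the numbers come from the paper. The sketch compares reserving a job's peak demand for its whole lifetime against an allocation that tracks demand in every interval.

```python
# Hypothetical per-interval memory demand (GB) of one job's intermediate data.
# Illustrative values only; they are not measurements from the paper.
demand_gb = [2, 8, 16, 4, 1]


def job_granularity_allocation(demand):
    """Reserve the job's peak demand for every interval of its lifetime."""
    peak = max(demand)
    return [peak] * len(demand)


def elastic_allocation(demand):
    """Idealized elastic policy (an assumption): allocate exactly the demand."""
    return list(demand)


def utilization(demand, allocated):
    """Fraction of allocated memory-time that actually holds data."""
    return sum(demand) / sum(allocated)


static = job_granularity_allocation(demand_gb)
elastic = elastic_allocation(demand_gb)

print(f"job-granularity utilization: {utilization(demand_gb, static):.2f}")
print(f"idealized elastic utilization: {utilization(demand_gb, elastic):.2f}")
```

In this hypothetical trace the peak-provisioned job holds 80 GB-intervals of memory to store 31 GB-intervals of data. Provisioning below the peak would avoid that waste but would force the excess onto slower persistent storage, which is the performance degradation the abstract refers to.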

    Published In

    EuroSys '22: Proceedings of the Seventeenth European Conference on Computer Systems
    March 2022
    783 pages
    ISBN:9781450391627
    DOI:10.1145/3492321
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data analytics
    2. far-memory
    3. function-as-a-service
    4. serverless computing

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 241 of 1,308 submissions, 18%
