DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps

Authors:

Tianyu Li,

Badrish Chandramouli,

Sebastian Burckhardt,

Samuel Madden

Proceedings of the ACM on Management of Data, Volume 1, Issue 2

Article No.: 117, Pages 1 - 27

https://doi.org/10.1145/3589262

Published: 20 June 2023

Abstract

Providing strong fault-tolerant guarantees for the modern cloud is difficult, as application developers must coordinate between independent stateful services and ephemeral compute and handle various failure-induced anomalies. We propose Composable Resilient Steps (CReSt), a new abstraction for resilient cloud applications. CReSt uses fault-tolerant steps as its core building block, which allows participants to receive, process, and send messages as a single uninterruptible atomic unit. Composability and reliability are orthogonally achieved by reusable CReSt implementations, for example, leveraging reliable message queues. Thus, CReSt application builders focus solely on translating application logic into steps, and infrastructure builders focus on efficient CReSt implementations. We propose one such implementation called DARQ (for Deduplicated Asynchronously Recoverable Queues). At its core, DARQ is a storage service that encapsulates CReSt participant state and enforces CReSt semantics; developers attach ephemeral compute nodes to DARQ instances to implement stateful distributed components. Services built with DARQ are resilient by construction, and CReSt-compatible services naturally compose without loss of resilience. For performance, we propose a novel speculative execution scheme to execute CReSt steps without waiting for message persistence in DARQ, effectively eliding cloud persistence overheads; our scheme maintains CReSt's fault-tolerance guarantees and automatically restores to a consistent system state upon failure. We showcase the generality of CReSt and DARQ using two applications: cloud streaming and workflow processing. Experiments show that DARQ is able to achieve extremely low latency and high throughput across these use cases, often beating state-of-the-art customized solutions.
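The step abstraction described above can be illustrated with a toy, in-memory sketch. It is not DARQ's API and not the authors' implementation: the class `Participant`, its methods `deliver` and `step`, and the handler `add_to_total` are all hypothetical names chosen for illustration, and a plain Python object stands in for durable storage. The sketch shows only the all-or-nothing shape of a step (consume messages, update state, emit messages) together with deduplication of redelivered messages.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical in-memory stand-in for one participant's durable state."""
    state: dict = field(default_factory=dict)
    inbox: dict = field(default_factory=dict)   # message id -> payload, not yet consumed
    outbox: list = field(default_factory=list)  # (destination, message id, payload)
    seen: set = field(default_factory=set)      # every id ever accepted, for deduplication

    def deliver(self, msg_id, payload):
        """Accept a message at most once; return False for a duplicate."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        self.inbox[msg_id] = payload
        return True

    def step(self, consume_ids, handler):
        """Consume messages, update state, and emit messages as one unit."""
        ids = list(dict.fromkeys(consume_ids))     # drop repeated ids, keep order
        payloads = [self.inbox[i] for i in ids]    # KeyError if an id is not pending
        draft = copy.deepcopy(self.state)          # handler works on a private copy
        outgoing = list(handler(draft, payloads))  # may raise; nothing is committed yet
        # Commit point: from here on, all three effects are applied together.
        for i in ids:
            del self.inbox[i]
        self.state = draft
        self.outbox.extend(outgoing)


def add_to_total(state, payloads):
    """Example handler: fold payloads into a running total and report it."""
    state["total"] = state.get("total", 0) + sum(payloads)
    return [("downstream", f"total-{state['total']}", state["total"])]


p = Participant()
p.deliver("m1", 5)
p.deliver("m1", 5)  # redelivery of the same id is ignored
p.deliver("m2", 7)
p.step(["m1", "m2"], add_to_total)
```

If the handler raises, `step` exits before the commit point, so the participant's state, inbox, and outbox are left exactly as they were and the step can be retried. A real implementation would have to make the commit itself durable and atomic, which this sketch does not attempt.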

Supplemental Material

MP4 File: Video for paper 117: DARQ Matter Binds Everything: Performant and Composable Cloud Programming via Resilient Steps (282.29 MB)

PDF File: Read me (72.04 KB)

ZIP File: Source Code (158.45 MB)
