PrimCast: A Latency-Efficient Atomic Multicast

Authors: Leandro Pacheco, Fernando Pedone

Middleware '23: Proceedings of the 24th International Middleware Conference

Pages 124 - 136

https://doi.org/10.1145/3590140.3629110

Published: 27 November 2023

Abstract

Atomic multicast is a communication abstraction that allows messages to be addressed to and reliably delivered by multiple process groups, while ensuring a partial order on delivered messages. Strong ordering guarantees can greatly simplify the design and implementation of distributed applications. One critical property for the performance and scalability of an atomic multicast protocol is that of genuineness: a protocol is said to be genuine if only the sender and destinations of a message are involved in ordering the message. This paper presents PrimCast, the first genuine atomic multicast protocol able to deliver messages at every destination in three communication steps. PrimCast uses a primary-based consensus protocol for deciding on message timestamps at each group. Unlike previous work, it does not rely on consensus for advancing and maintaining logical clocks. PrimCast introduces a novel approach, relying on simple quorum intersection, to decide when a multicast message can be delivered. We also show how loosely synchronized clocks can be used to reduce the convoy effect that delays messages under high system load. We present the complete algorithm for PrimCast and evaluate its performance under various scenarios. Our results show that PrimCast achieves lower latency than state-of-the-art approaches while providing higher or comparable throughput.
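
The abstract does not spell out the algorithm, so the following is not PrimCast itself. It is a minimal, single-process sketch of the classic timestamp-based ordering idea (commonly attributed to Skeen) that timestamp-deciding multicast protocols refine: each destination group proposes a local timestamp, the final timestamp is the maximum proposal, and a group delivers a message only when no pending message could still be ordered before it. The sketch assumes each group is a single logical clock with no replication, primaries, consensus, or quorums; the names `Group`, `propose`, `decide`, and `run_demo` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One destination group, modeled as a single logical clock (assumption)."""
    name: str
    clock: int = 0
    pending: dict = field(default_factory=dict)   # msg id -> local proposal
    decided: dict = field(default_factory=dict)   # msg id -> final timestamp
    delivered: list = field(default_factory=list)

    def propose(self, msg_id):
        # Advance the logical clock and propose it as this group's timestamp.
        self.clock += 1
        self.pending[msg_id] = self.clock
        return self.clock

    def decide(self, msg_id, final_ts):
        # Record the final timestamp and keep the clock at or above it, so
        # later proposals are always larger than any decided timestamp.
        del self.pending[msg_id]
        self.decided[msg_id] = final_ts
        self.clock = max(self.clock, final_ts)
        self._try_deliver()

    def _try_deliver(self):
        # Deliver in (timestamp, id) order. A pending message blocks delivery
        # if its proposal could still lead to a smaller (timestamp, id) pair.
        while self.decided:
            msg_id, ts = min(self.decided.items(), key=lambda kv: (kv[1], kv[0]))
            if any((p_ts, p_id) < (ts, msg_id)
                   for p_id, p_ts in self.pending.items()):
                return
            del self.decided[msg_id]
            self.delivered.append(msg_id)


def run_demo():
    a, b = Group("a"), Group("b")
    # Two messages addressed to both groups; proposals arrive interleaved.
    p1a = a.propose("m1")
    p2b = b.propose("m2")
    p2a = a.propose("m2")
    p1b = b.propose("m1")
    ts1, ts2 = max(p1a, p1b), max(p2a, p2b)
    # m2 is decided first everywhere, but must wait for the pending m1.
    for g in (a, b):
        g.decide("m2", ts2)
    for g in (a, b):
        g.decide("m1", ts1)
    return a.delivered, b.delivered
```

In `run_demo`, both messages end up with the same final timestamp, so the tie is broken by message id and both groups deliver `m1` before `m2`, even though `m2` was decided first. A fault-tolerant protocol must additionally replicate the proposals and clock within each group, which is where the consensus and quorum mechanisms named in the abstract come in.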

Cited By

Bolina J, Sutra P, Antunes D, Camargos L (2024) Generic Multicast. Proceedings of the 13th Latin-American Symposium on Dependable and Secure Computing, 81-90. DOI: 10.1145/3697090.3697095. Online publication date: 26-Nov-2024
https://dl.acm.org/doi/10.1145/3697090.3697095

Index Terms

1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed algorithms
2. Software and its engineering
  1. Software organization and properties
    1. Extra-functional properties
      1. Software fault tolerance
    2. Software functional properties
      1. Correctness
        1. Consistency

Published In

Middleware '23: Proceedings of the 24th International Middleware Conference

November 2023

334 pages

ISBN:9798400701771

DOI:10.1145/3590140

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

ACM: Association for Computing Machinery

In-Cooperation

IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Swiss National Science Foundation
Conselho Nacional de Desenvolvimento Científico e Tecnológico

Conference

Middleware '23

Sponsor: ACM

Middleware '23: 24th International Middleware Conference

December 11 - 15, 2023

Bologna, Italy

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%

Article Metrics

Total Citations: 1
Total Downloads: 70
Downloads (Last 12 months): 43
Downloads (Last 6 weeks): 2

Reflects downloads up to 20 Jan 2025
