DOI: 10.1145/3590140.3629110

PrimCast: A Latency-Efficient Atomic Multicast

Published: 27 November 2023

Abstract

Atomic multicast is a communication abstraction that allows messages to be addressed to and reliably delivered by multiple process groups, while ensuring a partial order on delivered messages. Strong ordering guarantees can greatly simplify the design and implementation of distributed applications. One critical property for the performance and scalability of an atomic multicast protocol is genuineness: a protocol is said to be genuine if only the sender and destinations of a message are involved in ordering the message. This paper presents PrimCast, the first genuine atomic multicast protocol able to deliver messages at every destination in three communication steps. PrimCast uses a primary-based consensus protocol for deciding on message timestamps at each group. Unlike previous work, it does not rely on consensus for advancing and maintaining logical clocks. PrimCast introduces a novel approach, relying on simple quorum intersection, to decide when a multicast message can be delivered. We also show how loosely synchronized clocks can be used to reduce the convoy effect that delays messages under high system load. We present the complete algorithm for PrimCast and evaluate its performance under various scenarios. Our results show that PrimCast achieves lower latency than state-of-the-art approaches while providing higher or comparable throughput.
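
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of generic timestamp-based ordering, not of PrimCast. It assumes the classic scheme in which each destination group proposes a local timestamp, the final timestamp is the maximum of the proposals, and messages are delivered in final-timestamp order. The names `Group`, `propose`, `decide`, and `interleaved_demo` are hypothetical, and the sketch omits everything that makes PrimCast distinctive: replication within a group, primary-based consensus, quorum intersection, failures, and physical clocks.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One destination group, modeled as a single process (an assumption)."""
    name: str
    clock: int = 0
    pending: dict = field(default_factory=dict)   # msg id -> local proposal
    final: dict = field(default_factory=dict)     # msg id -> final timestamp
    delivered: list = field(default_factory=list)

    def propose(self, msg_id):
        # Advance the logical clock and propose it as the local timestamp.
        self.clock += 1
        self.pending[msg_id] = self.clock
        return self.clock

    def decide(self, msg_id, final_ts):
        # Record the final timestamp and keep the clock ahead of it, so
        # later proposals at this group are strictly larger.
        self.clock = max(self.clock, final_ts)
        del self.pending[msg_id]
        self.final[msg_id] = final_ts
        self._try_deliver()

    def _try_deliver(self):
        # Deliver decided messages in (timestamp, id) order, but only while
        # no pending message could still be ordered before the candidate.
        # A pending message's final timestamp is never below its proposal.
        while self.final:
            msg_id, ts = min(self.final.items(), key=lambda kv: (kv[1], kv[0]))
            if any((p_ts, p_id) < (ts, msg_id)
                   for p_id, p_ts in self.pending.items()):
                return
            self.delivered.append(msg_id)
            del self.final[msg_id]


def interleaved_demo():
    """m1 goes to {A, B}, m2 goes to {B, C}; the steps are interleaved."""
    a, b, c = Group("A"), Group("B"), Group("C")
    m1_proposals = [a.propose("m1")]
    m2_proposals = [b.propose("m2")]
    m1_proposals.append(b.propose("m1"))
    m2_proposals.append(c.propose("m2"))
    ts1, ts2 = max(m1_proposals), max(m2_proposals)
    b.decide("m1", ts1)   # B must wait: m2 is pending with a smaller proposal
    a.decide("m1", ts1)
    b.decide("m2", ts2)
    c.decide("m2", ts2)
    return a, b, c


if __name__ == "__main__":
    for g in interleaved_demo():
        print(g.name, g.delivered)
    # prints:
    # A ['m1']
    # B ['m2', 'm1']
    # C ['m2']
```

In this toy run, group C never takes part in ordering m1 and group A never takes part in ordering m2, which is the genuineness property the abstract defines. Group B, the only common destination, is the only place where the two messages need a relative order.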

Published In

Middleware '23: Proceedings of the 24th International Middleware Conference
November 2023
334 pages
ISBN: 9798400701771
DOI: 10.1145/3590140

Sponsors

In-Cooperation

  • IFIP: International Federation for Information Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. atomic multicast
  2. distributed agreement
  3. fault-tolerant distributed systems

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Swiss National Science Foundation
  • Conselho Nacional de Desenvolvimento Científico e Tecnológico

Conference

Middleware '23

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%
