DOI: 10.1145/3373376.3378496
research-article

Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol

Published: 13 March 2020

Abstract

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency.
This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort) and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6x lower than that of CRAQ and ZAB.
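To make the combination of logical timestamps and invalidations more concrete, the following is a minimal, single-process sketch of an invalidation-based write path. It is an illustration under stated assumptions, not the authors' implementation: the class `Replica`, the message handlers `on_inv` and `on_val`, and the `(version, node_id)` timestamp layout are hypothetical names chosen here, messages are delivered as direct method calls, and failures, retransmission, and write replay are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed timestamp layout: (version, node_id), compared lexicographically,
# so concurrent writes with the same version are ordered by node id.
Timestamp = Tuple[int, int]


@dataclass
class Entry:
    value: Optional[str] = None
    ts: Timestamp = (0, 0)
    valid: bool = True


class Replica:
    """Hypothetical replica holding a key-value store with per-key state."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.store: Dict[str, Entry] = {}
        self.peers: List["Replica"] = []

    def _entry(self, key: str) -> Entry:
        return self.store.setdefault(key, Entry())

    def read(self, key: str) -> Optional[str]:
        # Reads are served locally, but only while the key is valid.
        entry = self._entry(key)
        if not entry.valid:
            raise RuntimeError("key is invalidated; a real replica would stall")
        return entry.value

    def start_write(self, key: str, value: str) -> Timestamp:
        # Any replica may coordinate a write: bump the version, tag it with
        # the local node id, and invalidate the local copy.
        entry = self._entry(key)
        ts = (entry.ts[0] + 1, self.node_id)
        entry.value, entry.ts, entry.valid = value, ts, False
        return ts

    def send_invs(self, key: str, ts: Timestamp, value: str) -> int:
        # Broadcast the invalidation (carrying the new value); count ACKs.
        return sum(peer.on_inv(key, ts, value) for peer in self.peers)

    def on_inv(self, key: str, ts: Timestamp, value: str) -> bool:
        # Conflicts are resolved locally: only a higher timestamp is applied.
        entry = self._entry(key)
        if ts > entry.ts:
            entry.value, entry.ts, entry.valid = value, ts, False
        return True  # always acknowledge, so the write never aborts

    def finish_write(self, key: str, ts: Timestamp, acks: int) -> bool:
        if acks != len(self.peers):
            return False  # a real coordinator would wait or retransmit
        entry = self._entry(key)
        if entry.ts == ts:  # not superseded by a concurrent, higher write
            entry.valid = True
        for peer in self.peers:
            peer.on_val(key, ts)
        return True

    def on_val(self, key: str, ts: Timestamp) -> None:
        # Validations with a stale timestamp are ignored.
        entry = self._entry(key)
        if entry.ts == ts:
            entry.valid = True


def make_cluster(n: int) -> List[Replica]:
    replicas = [Replica(i) for i in range(1, n + 1)]
    for r in replicas:
        r.peers = [p for p in replicas if p is not r]
    return replicas


# Two writes to the same key start concurrently at different replicas.
a, b, c = make_cluster(3)
ts_a = a.start_write("x", "from-a")
ts_b = b.start_write("x", "from-b")
acks_a = a.send_invs("x", ts_a, "from-a")
acks_b = b.send_invs("x", ts_b, "from-b")
a.finish_write("x", ts_a, acks_a)
b.finish_write("x", ts_b, acks_b)
print([r.read("x") for r in (a, b, c)])
```

In this sketch both writes complete without a central ordering point: every replica independently keeps the write with the higher timestamp, and a key becomes readable again only after the validation for that timestamp arrives.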



Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020
1412 pages
ISBN: 9781450371025
DOI: 10.1145/3373376

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. availability
  2. consistency
  3. fault-tolerant
  4. latency
  5. linearizability
  6. rdma
  7. replication
  8. throughput

Qualifiers

  • Research-article

Funding Sources

  • EPSRC

Conference

ASPLOS '20

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
