research-article

Deterministic or probabilistic? - A survey on Byzantine fault tolerant state machine replication

Published: 01 June 2023

Highlights

Network infrastructures and software systems are vulnerable to failures.
Service replication is a solution that guarantees the service’s correct execution even in the presence of faults.
A consensus protocol is necessary for replicas to execute requests in the same order and thus reach the same outcome.
We present a survey with a detailed description of deterministic and probabilistic consensus in BFT-SMR protocols.
We discuss the protocols' trade-offs and how they are reflected in the protocols' efficiency.
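
As a rough illustration of the replication idea in the highlights, the following minimal sketch shows deterministic replicas applying the same ordered request log and a client accepting a reply once f + 1 replicas agree on it. It is not taken from the survey: the key-value service, the class and function names, and the choice of n = 3f + 1 replicas are assumptions made for the example, and the consensus step that would produce the ordered log is assumed to have already run.

```python
from collections import Counter


class Replica:
    """Deterministic key-value state machine (hypothetical example service)."""

    def __init__(self):
        self.state = {}

    def apply(self, request):
        op, key, value = request
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        return self.state.get(key)


class ByzantineReplica(Replica):
    """Faulty replica: updates its state but returns an arbitrary reply."""

    def apply(self, request):
        super().apply(request)
        return "garbage"


def execute(replicas, ordered_log, f):
    """Apply the agreed log on every replica; accept a reply with f + 1 votes."""
    accepted = []
    for request in ordered_log:
        replies = [replica.apply(request) for replica in replicas]
        reply, votes = Counter(replies).most_common(1)[0]
        if votes < f + 1:
            raise RuntimeError("no reply has f + 1 matching votes")
        accepted.append(reply)
    return accepted


f = 1  # assumed number of tolerated Byzantine replicas
replicas = [Replica(), Replica(), Replica(), ByzantineReplica()]  # n = 3f + 1
# The order below stands in for the output of a consensus protocol.
log = [("set", "x", 1), ("add", "x", 4), ("add", "y", 2)]
results = execute(replicas, log, f)
print(results)
```

Because every correct replica starts from the same state and applies the same requests in the same order, the correct replies match, and the single faulty reply is outvoted.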

Abstract

Byzantine fault-tolerant (BFT) protocols are implemented to guarantee correct system/application behavior even in the presence of arbitrary faults (i.e., Byzantine faults). Byzantine fault-tolerant state machine replication (BFT-SMR) is a well-known software solution for masking arbitrary faults and malicious attacks (Liu et al., 2020). In this survey, we present and discuss relevant BFT-SMR protocols, focusing on deterministic and probabilistic approaches. The main purpose of this paper is to discuss the characteristics of the works proposed for each approach and to identify the trade-offs of each.

References

[1]
I. Abraham, D. Malkhi, A. Spiegelman, Asymptotically optimal validated asynchronous byzantine agreement, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 2019, pp. 337–346.
[2]
Avarikioti, Z., Heimbach, L., Schmid, R., Vanbever, L., Wattenhofer, R., Wintermeyer, P., 2020. FNF-BFT: exploring performance limits of BFT protocols. arXiv preprint arXiv:2009.02235.
[3]
A. Avizienis, The n-version approach to fault-tolerant software, IEEE Trans. Softw. Eng. (12) (1985) 1491–1501.
[4]
P. Bachmann, Zahlentheorie: th. Die analytische Zahlentheorie (1894), Vol. 2, BG Teubner, 1894.
[5]
J. Baek, Y. Zheng, Simple and efficient threshold cryptosystem from the Gap Diffie-Hellman group, GLOBECOM ’03. IEEE Global Telecommunications Conference (IEEE Cat. No.03CH37489), Vol. 3, 2003, pp. 1491–1495.
[6]
J. Behl, T. Distler, R. Kapitza, Consensus-oriented parallelization: how to earn your first million, Proceedings of the 16th Annual Middleware Conference, Association for Computing Machinery, New York, NY, USA, 2015, pp. 173–184.
[7]
Behl, J., Distler, T., Kapitza, R., 2017a. A highly parallelizable protocol for hybrid fault-tolerant service replication.
[8]
J. Behl, T. Distler, R. Kapitza, Hybrids on steroids: SGX-based high performance BFT, Proceedings of the Twelfth European Conference on Computer Systems, Association for Computing Machinery, New York, NY, USA, 2017, pp. 222–237.
[9]
M. Ben-Or, B. Kelmer, T. Rabin, Asynchronous secure computations with optimal resilience (extended abstract), Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing, ACM, New York, NY, USA, 1994, pp. 183–192.
[10]
A. Bessani, J. Sousa, E.E. Alchieri, State machine replication for the masses with BFT-SMART, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014, pp. 355–362.
[11]
A.N. Bessani, P. Sousa, M. Correia, N.F. Neves, P. Veríssimo, The critical way of critical infrastructure protection, IEEE Secur. Privacy 6 (6) (2008) 44–51.
[12]
A. Boldyreva, Threshold signatures, multisignatures and blind signatures based on the Gap-Diffie-Hellman-Group signature scheme, Proceedings of the 6th International Workshop on Theory and Practice in Public Key Cryptography: Public Key Cryptography, Springer-Verlag, Berlin, Heidelberg, 2003, pp. 31–46.
[13]
G. Bracha, An asynchronous [(n - 1)/3]-resilient consensus protocol, Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 1984, pp. 154–162.
[14]
G. Bracha, S. Toueg, Asynchronous consensus and broadcast protocols, J. ACM 32 (4) (1985) 824–840.
[15]
C. Cachin, K. Kursawe, F. Petzold, V. Shoup, Secure and efficient asynchronous broadcast protocols, Proceedings of the 21st Annual International Cryptology Conference on Advances in Cryptology, Springer-Verlag, Berlin, Heidelberg, 2001, pp. 524–541.
[16]
C. Cachin, K. Kursawe, V. Shoup, Random oracles in Constantinople: Practical asynchronous byzantine agreement using cryptography (extended abstract), Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 2000, pp. 123–132.
[17]
C. Cachin, J. Poritz, Secure intrusion-tolerant replication on the internet, Proceedings International Conference on Dependable Systems and Networks, 2002, pp. 167–176.
[18]
C. Cachin, S. Tessaro, Asynchronous verifiable information dispersal, 24th IEEE Symposium on Reliable Distributed Systems (SRDS’05), 2005, pp. 191–201.
[19]
R. Canetti, T. Rabin, Fast asynchronous byzantine agreement with optimal resilience, Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, Association for Computing Machinery, New York, NY, USA, 1993, pp. 42–51.
[20]
M. Castro, B. Liskov, Practical byzantine fault tolerance, Proceedings of the Third Symposium on Operating Systems Design and Implementation, USENIX Association, USA, 1999, pp. 173–186.
[21]
M. Castro, B. Liskov, Practical byzantine fault tolerance and proactive recovery, ACM Trans. Comput. Syst. 20 (4) (2002) 398–461.
[22]
I. Chivers, J. Sleightholme, An introduction to algorithms and the big O notation, Introduction to Programming with Fortran: With Coverage of Fortran 90, 95, 2003, 2008 and 77, 2015, pp. 359–364.
[23]
J.C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J.J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, D. Woodford, Spanner: Google’s globally-distributed database, Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, USENIX Association, USA, 2012, pp. 251–264.
[24]
M. Correia, N. Neves, P. Verissimo, How to tolerate half less one byzantine nodes in practical distributed systems, Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004, pp. 174–183.
[25]
M. Correia, N.F. Neves, L.C. Lung, P. Veríssimo, Worm-it–a wormhole-based intrusion-tolerant group communication system, J. Syst. Softw. 80 (2) (2007) 178–197.
[26]
M. Correia, N.F. Neves, P. Veríssimo, From consensus to atomic broadcast: time-free byzantine-resistant protocols without signatures, Comput. J. 49 (1) (2005) 82–96.
[27]
M. Correia, P. Verissimo, N.F. Neves, The design of a cots real-time distributed security kernel, European Dependable Computing Conference, Springer, 2002, pp. 234–252.
[28]
F. Cristian, H. Aghili, R. Strong, D. Volev, Atomic broadcast: from simple message diffusion to byzantine agreement, Twenty-Fifth International Symposium on Fault-Tolerant Computing, ‘Highlights from Twenty-Five Years’, 1995, p. 431.
[29]
G. Danezis, L. Kokoris-Kogias, A. Sonnino, A. Spiegelman, Narwhal and tusk: a DAG-based mempool and efficient BFT consensus, Proceedings of the Seventeenth European Conference on Computer Systems, 2022, pp. 34–50.
[30]
Y.G. Desmedt, Threshold cryptography, Eur. Trans. Telecommun. 5 (4) (1994) 449–458.
[31]
T. Distler, C. Cachin, R. Kapitza, Resource-efficient byzantine fault tolerance, IEEE Trans. Comput. 65 (9) (2015) 2807–2819.
[32]
S. Duan, M.K. Reiter, H. Zhang, BEAT: asynchronous BFT made practical, Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Association for Computing Machinery, New York, NY, USA, 2018, pp. 2028–2041.
[33]
C. Dwork, N. Lynch, L. Stockmeyer, Consensus in the presence of partial synchrony, J. ACM 35 (2) (1988) 288–323.
[34]
M.J. Fischer, The consensus problem in unreliable distributed systems (a brief survey), International conference on fundamentals of computation theory, Springer, 1983, pp. 127–140.
[35]
M.J. Fischer, N.A. Lynch, M.S. Paterson, Impossibility of distributed consensus with one faulty process, J. ACM 32 (2) (1985) 374–382.
[36]
Y. Gao, Y. Lu, Z. Lu, Q. Tang, J. Xu, Z. Zhang, Dumbo-NG: fast asynchronous BFT consensus with throughput-oblivious latency, Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022, pp. 1187–1201.
[37]
J. Garay, A. Kiayias, SoK: a consensus taxonomy in the blockchain era, Topics in Cryptology–CT-RSA 2020: The Cryptographers’ Track at the RSA Conference 2020, San Francisco, CA, USA, February 24–28, 2020, Proceedings, Springer, 2020, pp. 284–318.
[38]
A. Ga̧gol, D. Leśniak, D. Straszak, M. Świȩtek, Aleph: efficient atomic broadcast in asynchronous networks with byzantine nodes, Proceedings of the 1st ACM Conference on Advances in Financial Technologies, Association for Computing Machinery, New York, NY, USA, 2019, pp. 214–228.
[39]
Golan Gueta, G., Abraham, I., Grossman, S., Malkhi, D., Pinkas, B., Reiter, M. K., Seredinschi, D.-A., Tamir, O., Tomescu, A., 2018. SBFT: a scalable and decentralized trust infrastructure. arXiv e-prints, arXiv–1804.
[40]
Gunn, L. J., Liu, J., Vavala, B., Asokan, N., 2019. Making speculative BFT resilient with trusted monotonic counters. https://arxiv.org/abs/1905.10255. doi:10.48550/arXiv.1905.10255.
[41]
B. Guo, Z. Lu, Q. Tang, J. Xu, Z. Zhang, Dumbo: Faster Asynchronous BFT Protocols, Association for Computing Machinery, New York, NY, USA, 2020, pp. 803–818.
[42]
X. Hao, L. Yu, L. Zhiqiang, L. Zhen, G. Dawu, Dynamic practical byzantine fault tolerance, 2018 IEEE Conference on Communications and Network Security (CNS), 2018, pp. 1–8.
[43]
X. Hao, L. Yu, L. Zhiqiang, L. Zhen, G. Dawu, Dynamic practical byzantine fault tolerance, 2018 IEEE Conference on Communications and Network Security (CNS), 2018, pp. 1–8.
[44]
P. Hunt, M. Konar, F.P. Junqueira, B. Reed, ZooKeeper: wait-free coordination for internet-scale systems, Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIX Association, USA, 2010, p. 11.
[45]
B. Jin, Y. Hu, H. Tao, Y. He, An improved practical byzantine fault-tolerant consensus algorithm combined with aggregating signature, 7th International Symposium on Advances in Electrical, Electronics, and Computer Engineering, Vol. 12294, SPIE, 2022, pp. 1175–1182.
[46]
R. Kapitza, J. Behl, C. Cachin, T. Distler, S. Kuhnle, S.V. Mohammadi, W. Schröder-Preikschat, K. Stengel, CheapBFT: resource-efficient byzantine fault tolerance, Proceedings of the 7th ACM European Conference on Computer Systems, Association for Computing Machinery, New York, NY, USA, 2012, pp. 295–308.
[47]
K. Kihlstrom, L. Moser, P. Melliar-Smith, The SecureRing protocols for securing group communication, Proceedings of the Thirty-First Hawaii International Conference on System Sciences, Vol. 3, 1998, pp. 317–326.
[48]
R. Kotla, L. Alvisi, M. Dahlin, A. Clement, E. Wong, Zyzzyva: speculative byzantine fault tolerance, ACM Trans. Comput. Syst. 27 (4) (2010).
[49]
K. Kursawe, V. Shoup, Optimistic asynchronous atomic broadcast, in: Caires L., Italiano G.F., Monteiro L., Palamidessi C., Yung M. (Eds.), Automata, Languages and Programming, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 204–215.
[50]
Kwon, J., 2014. TenderMint: consensus without mining. Draft v. 0.6, fall 1 (11).
[51]
L. Lamport, Time, clocks, and the ordering of events in a distributed system, Commun. ACM 21 (7) (1978) 558–565.
[52]
L. Lamport, Paxos made simple, ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001), 2001, pp. 51–58.
[53]
Levin, D., Douceur, J. R., Lorch, J. R., Moscibroda, T., 2009. TrInc: small trusted hardware for large distributed systems
[54]
B. Li, W. Xu, M.Z. Abid, T. Distler, R. Kapitza, SAREK: optimistic parallel ordering in byzantine fault tolerance, 2016 12th European Dependable Computing Conference (EDCC), IEEE, 2016, pp. 77–88.
[55]
C. Liu, S. Duan, H. Zhang, EPIC: efficient asynchronous BFT with adaptive security, 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2020, pp. 437–451.
[56]
Liu, C., Duan, S., Zhang, H., 2021. MiB: asynchronous BFT with more replicas. arXiv preprint arXiv:2108.04488.
[57]
S. Liu, P. Viotti, C. Cachin, V. Quéma, M. Vukolic, XFT: practical fault tolerance beyond crashes, Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, USENIX Association, USA, 2016, pp. 485–500.
[58]
D. Malkhi, M. Reiter, R. Wright, Probabilistic quorum systems, Proceedings of the Sixteenth Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 1997, pp. 267–273.
[59]
S. Maneas, N. Chondros, P. Diamantopoulos, C. Patsonakis, M. Roussopoulos, On achieving interactive consistency in real-world distributed systems, J. Parallel Distrib. Comput. 147 (2021) 220–235.
[60]
J.-P. Martin, L. Alvisi, Fast byzantine consensus, IEEE Trans. Dependable Secure Comput. 3 (3) (2006) 202–215.
[61]
F. McKeen, I. Alexandrovich, A. Berenzon, C.V. Rozas, H. Shafi, V. Shanbhogue, U.R. Savagaonkar, Innovative instructions and software model for isolated execution, HASP@ISCA, 2013, p. 1.
[62]
A. Miller, Y. Xia, K. Croman, E. Shi, D. Song, The honey badger of BFT protocols, Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Association for Computing Machinery, New York, NY, USA, 2016, pp. 31–42.
[63]
Z. Milosevic, M. Biely, A. Schiper, Bounded delay in byzantine-tolerant state machine replication, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems, 2013, pp. 61–70.
[64]
H. Moniz, N.F. Neves, M. Correia, P. Verissimo, RITAS: services for randomized intrusion tolerance, IEEE Trans. Dependable Secur. Comput. 8 (1) (2011) 122–136.
[65]
A. Mostefaoui, H. Moumen, M. Raynal, Signature-free asynchronous byzantine consensus with t < n/3 and O(n²) messages, Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 2014, pp. 2–9.
[66]
M.T. Özsu, P. Valduriez, et al., Principles of Distributed Database Systems, Vol. 2, Springer, 1999.
[67]
J. Plank, L. Xu, Optimizing cauchy reed-solomon codes for fault-tolerant network storage applications, Fifth IEEE International Symposium on Network Computing and Applications (NCA’06), 2006, pp. 173–180.
[68]
M.O. Rabin, Randomized byzantine generals, Proceedings of the 24th Annual Symposium on Foundations of Computer Science, IEEE Computer Society, USA, 1983, pp. 403–409.
[69]
H.V. Ramasamy, C. Cachin, Parsimonious asynchronous byzantine-fault-tolerant atomic broadcast, Proceedings of the 9th International Conference on Principles of Distributed Systems, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 88–102.
[70]
M. Reiter, Distributing trust with the rampart toolkit, Commun. ACM 39 (4) (1996) 71–74.
[71]
M.K. Reiter, The rampart toolkit for building high-integrity services, Selected Papers from the International Workshop on Theory and Practice in Distributed Systems, Springer-Verlag, Berlin, Heidelberg, 1994, pp. 99–110.
[72]
M.K. Reiter, Secure agreement protocols: reliable and atomic group multicast in rampart, Proceedings of the 2nd ACM Conference on Computer and Communications Security, Association for Computing Machinery, New York, NY, USA, 1994, pp. 68–80.
[73]
M.K. Reiter, K.P. Birman, How to securely replicate services, ACM Trans. Program. Lang. Syst. 16 (3) (1994) 986–1009.
[74]
R.L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Commun. ACM 21 (2) (1978) 120–126.
[75]
F.B. Schneider, Implementing fault-tolerant services using the state machine approach: a tutorial, ACM Comput. Surv. 22 (4) (1990) 299–319.
[76]
V. Shoup, Practical threshold signatures, Proceedings of the 19th International Conference on Theory and Application of Cryptographic Techniques, Springer-Verlag, Berlin, Heidelberg, 2000, pp. 207–220.
[77]
V. Shoup, R. Gennaro, Securing threshold cryptosystems against chosen ciphertext attack, in: Nyberg K. (Ed.), Advances in Cryptology — EUROCRYPT’98, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998, pp. 1–16.
[78]
J. Sousa, A. Bessani, From byzantine consensus to BFT state machine replication: a latency-optimal transformation, 2012 Ninth European Dependable Computing Conference, 2012, pp. 37–48.
[79]
P. Sousa, A.N. Bessani, R.R. Obelheiro, The forever service for fault/intrusion removal, Proceedings of the 2nd Workshop on Recent Advances on Intrusion-Tolerant Systems, ACM, New York, NY, USA, 2008, pp. 5:1–5:6.
[80]
S. Toueg, Randomized byzantine agreements, Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 1984, pp. 163–178.
[81]
G. Tsudik, Message authentication with one-way hash functions, SIGCOMM Comput. Commun. Rev. 22 (5) (1992) 29–38.
[82]
G.S. Veronese, M. Correia, A.N. Bessani, L.C. Lung, P. Verissimo, Efficient byzantine fault-tolerance, IEEE Trans. Comput. 62 (1) (2013) 16–30.
[83]
Y. Wang, Byzantine fault tolerance for distributed ledgers revisited, Distrib. Ledger Technol. Res.Pract. 1 (1) (2022) 1–26.
[84]
T. Wood, R. Singh, A. Venkataramani, P.J. Shenoy, E. Cecchet, ZZ and the art of practical BFT, 2009.
[85]
Y. Xiao, N. Zhang, W. Lou, Y.T. Hou, A survey of distributed consensus protocols for blockchain networks, IEEE Commun. Surv. Tutor. 22 (2) (2020) 1432–1465.
[86]
Yin, M., Malkhi, D., Reiter, M. K., Gueta, G. G., Abraham, I., 2018. HotStuff: BFT consensus in the lens of blockchain. arXiv preprint arXiv:1803.05069.
[87]
M. Yin, D. Malkhi, M.K. Reiter, G.G. Gueta, I. Abraham, HotStuff: BFT consensus with linearity and responsiveness, Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, Association for Computing Machinery, New York, NY, USA, 2019, pp. 347–356.
[88]
P. Zieliński, Paxos at War, Technical Report, University of Cambridge, Computer Laboratory, 2004.

Published In

Computers and Security, Volume 129, Issue C
Jun 2023
606 pages

Publisher

Elsevier Advanced Technology Publications

United Kingdom

Author Tags

  1. Byzantine fault tolerance
  2. State machine replication
  3. Fault tolerance
  4. Deterministic consensus
  5. Probabilistic consensus
  6. Asynchronous distributed systems
  7. Security
