On the Practicality of 'Practical' Byzantine Fault Tolerance
Abstract
Byzantine Fault Tolerant (BFT) systems are considered by the systems research community to be the state of the art with regard to providing reliability in distributed systems. BFT systems provide safety and liveness guarantees under reasonable assumptions among a set of nodes in which at most f nodes exhibit arbitrarily incorrect behavior, known as Byzantine faults. Despite this, BFT systems are still rarely used in practice. In this paper we describe our experience, from an application developer's perspective, of trying to leverage the publicly available and highly tuned PBFT middleware (by Castro and Liskov) to provide provable reliability guarantees for an electronic voting application with high security and robustness needs. We describe several obstacles we encountered and drawbacks we identified in the PBFT approach. Some of these we tackled, such as the lack of support for dynamic client management and the fact that state management is left entirely to the application. Others remain, including the lack of robust handling of non-determinism, the lack of support for web-based applications, and the lack of support for stronger cryptographic primitives. We find that, while many of the obstacles could be overcome with a revised BFT middleware implementation tuned specifically to the needs of the particular application, doing so requires significant engineering effort and time, and the performance implications for the end application are unclear. An application developer is thus unlikely to be willing to invest the time and effort needed to leverage the BFT approach. We conclude that the research community needs to focus on the usability of BFT algorithms for real-world applications, from the end developer's perspective, in addition to continuing to improve the performance, robustness, and deployment layouts of BFT middleware.
- Publication: arXiv e-prints
- Pub Date: October 2011
- arXiv: arXiv:1110.4854
- Bibcode: 2011arXiv1110.4854C
- Keywords: Computer Science - Distributed, Parallel, and Cluster Computing