- research-article, December 2011
Depot: Cloud Storage with Minimal Trust
ACM Transactions on Computer Systems (TOCS), Volume 29, Issue 4, Article No.: 12, Pages 1–38, https://doi.org/10.1145/2063509.2063512
This article describes the design, implementation, and evaluation of Depot, a cloud storage system that minimizes trust assumptions. Depot tolerates buggy or malicious behavior by any number of clients or servers, yet it provides safety and liveness ...
- research-article, December 2011
Efficient Testing of Recovery Code Using Fault Injection
ACM Transactions on Computer Systems (TOCS), Volume 29, Issue 4, Article No.: 11, Pages 1–38, https://doi.org/10.1145/2063509.2063511
A critical part of developing a reliable software system is testing its recovery code. This code is traditionally difficult to test in the lab, and, in the field, it rarely gets to run; yet, when it does run, it must execute flawlessly in order to ...
- research-article, July 2010
Throughput optimal total order broadcast for cluster environments
ACM Transactions on Computer Systems (TOCS), Volume 28, Issue 2, Article No.: 5, Pages 1–32, https://doi.org/10.1145/1813654.1813656
Total order broadcast is a fundamental communication primitive that plays a central role in bringing cheap software-based high availability to a wide range of services. This article studies the practical performance of such a primitive on a cluster of ...
- research-article, July 2010
Proactive obfuscation
ACM Transactions on Computer Systems (TOCS), Volume 28, Issue 2, Article No.: 4, Pages 1–54, https://doi.org/10.1145/1813654.1813655
Proactive obfuscation is a new method for creating server replicas that are likely to have fewer shared vulnerabilities. It uses semantics-preserving code transformations to generate diverse executables, periodically restarting servers with these fresh ...
- research-article, January 2010
Zyzzyva: Speculative Byzantine fault tolerance
ACM Transactions on Computer Systems (TOCS), Volume 27, Issue 4, Article No.: 7, Pages 1–39, https://doi.org/10.1145/1658357.1658358
A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine Fault-Tolerant (BFT) state machine replication, in which a group of servers collectively act as a ...
- research-article, May 2009
Practical and low-overhead masking of failures of TCP-based servers
ACM Transactions on Computer Systems (TOCS), Volume 27, Issue 2, Article No.: 4, Pages 1–39, https://doi.org/10.1145/1534909.1534911
This article describes an architecture that allows a replicated service to survive crashes without breaking its TCP connections. Our approach does not require modifications to the TCP protocol, to the operating system on the server, or to any of the ...
- article, November 2006
Recovering device drivers
ACM Transactions on Computer Systems (TOCS), Volume 24, Issue 4, Pages 333–360, https://doi.org/10.1145/1189256.1189257
This article presents a new mechanism that enables applications to run correctly when device drivers fail. Because device drivers are the principal failing component in most systems, reducing driver-induced failures greatly improves overall reliability. ...
- article, February 2005
Improving the reliability of commodity operating systems
ACM Transactions on Computer Systems (TOCS), Volume 23, Issue 1, Pages 77–110, https://doi.org/10.1145/1047915.1047919
Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This article ...
- article, August 2003
BASE: Using abstraction to improve fault tolerance
ACM Transactions on Computer Systems (TOCS), Volume 21, Issue 3, Pages 236–269, https://doi.org/10.1145/859716.859718
Software errors are a major cause of outages, and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors, but it is expensive to deploy. This paper describes a replication ...
- article, May 2003
Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining
ACM Transactions on Computer Systems (TOCS), Volume 21, Issue 2, Pages 164–206, https://doi.org/10.1145/762483.762485
Scalable management and self-organizational capabilities are emerging as central requirements for a generation of large-scale, highly dynamic, distributed applications. We have developed an entirely new distributed information management system called ...
- article, November 2002
Practical byzantine fault tolerance and proactive recovery
ACM Transactions on Computer Systems (TOCS), Volume 20, Issue 4, Pages 398–461, https://doi.org/10.1145/571637.571640
Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions ...
- article, November 2002
COCA: A secure distributed online certification authority
ACM Transactions on Computer Systems (TOCS), Volume 20, Issue 4, Pages 329–368, https://doi.org/10.1145/571637.571638
COCA is a fault-tolerant and secure online certification authority that has been built and deployed both in a local area network and in the Internet. Extremely weak assumptions characterize environments in which COCA's protocols execute correctly: no ...
- article, May 2002
The evolution of Coda
ACM Transactions on Computer Systems (TOCS), Volume 20, Issue 2, Pages 85–124, https://doi.org/10.1145/507052.507053
Failure-resilient, scalable, and secure read-write access to shared information by mobile and static users over wireless and wired networks is a fundamental computing challenge. In this article, we describe how the Coda file system has evolved to meet ...
- article, May 2001
Specifying and using a partitionable group communication service
ACM Transactions on Computer Systems (TOCS), Volume 19, Issue 2, Pages 171–216, https://doi.org/10.1145/377769.377776
Group communication services are becoming accepted as effective building blocks for the construction of fault-tolerant distributed applications. Many specifications for group communication services have been proposed. However, there is still no agreement ...
- article, August 2000
Manageability, availability, and performance in porcupine: a highly scalable, cluster-based mail service
ACM Transactions on Computer Systems (TOCS), Volume 18, Issue 3, Page 298, https://doi.org/10.1145/354871.354875
This paper describes the motivation, design, and performance of Porcupine, a scalable mail server. The goal of Porcupine is to provide a highly available and scalable electronic mail service using a large cluster of commodity PCs. We designed Porcupine to ...
- article, November 1998
Coyote: a system for constructing fine-grain configurable communication services
ACM Transactions on Computer Systems (TOCS), Volume 16, Issue 4, Pages 321–366, https://doi.org/10.1145/292523.292524
Communication-oriented abstractions such as atomic multicast, group RPC, and protocols for location-independent mobile computing can simplify the development of complex applications built on distributed systems. This article describes Coyote, a system ...
- article, May 1998
The part-time parliament
ACM Transactions on Computer Systems (TOCS), Volume 16, Issue 2, Pages 133–169, https://doi.org/10.1145/279227.279229
Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their ...
- article, May 1997
Strong loss tolerance of electronic coin systems
ACM Transactions on Computer Systems (TOCS), Volume 15, Issue 2, Pages 194–213, https://doi.org/10.1145/253145.253282
Untraceable electronic cash means prepaid digital payment systems, usually with offline payments, that protect user privacy. Such systems have recently been given considerable attention by both theory and development projects. However, in most current ...
- article, August 1996
Recovery in the Calypso file system
ACM Transactions on Computer Systems (TOCS), Volume 14, Issue 3, Pages 287–310, https://doi.org/10.1145/233557.233560
This article presents the design and implementation of the recovery scheme in Calypso. Calypso is a cluster-optimized, distributed file system for UNIX clusters. As in Sprite and AFS, Calypso servers are stateful and scale well to a large number of ...