A regulated transitive reduction (RTR) for longer memory race recording

Published: 20 October 2006

Abstract

Multithreaded deterministic replay has important applications in cyclic debugging, fault tolerance and intrusion analysis. Memory race recording is a key technology for multithreaded deterministic replay. In this paper, we considerably improve our previous always-on Flight Data Recorder (FDR) in four ways:

  • Longer recording, by reducing the log size growth rate to approximately one byte per thousand dynamic instructions.
  • Lower hardware cost, reduced to 24 KB per processor core.
  • Simpler design, by modifying only the cache coherence protocol and not the cache.
  • Broader applicability, by supporting both the Sequential Consistency (SC) and Total Store Order (TSO) memory consistency models (existing recorders support only SC).

These improvements stem from several ideas: (1) a Regulated Transitive Reduction (RTR) recording algorithm that creates stricter and vectorizable dependencies to reduce the log growth rate; (2) a Set/LRU timestamp approximation method that better approximates the timestamps of uncached memory locations to reduce the hardware cost; and (3) an order-value-hybrid recording method that explicitly logs the values of potentially SC-violating load instructions to support multiprocessor systems with TSO.

Author footnote: now at VMware.
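The abstract only names the transitive-reduction idea behind RTR. As a rough illustration of that underlying idea, here is a minimal Python sketch of the simple pairwise form of transitive reduction: a dependency from instruction `src_ic` of thread `src` to instruction `dst_ic` of thread `dst` is not logged when an earlier logged edge between the same two threads, combined with program order, already implies it. The class `PairwiseReducer`, its method names, and the requirement that each destination thread reports its dependencies in program order are assumptions made for this example. The sketch does not implement RTR's stricter, vectorizable dependencies, the Set/LRU timestamp approximation, or TSO value logging.

```python
class PairwiseReducer:
    """Hypothetical pairwise transitive reduction of recorded races.

    For each (destination, source) thread pair, remember the highest
    source instruction count already covered by a logged dependency.
    This is an illustration only, not the paper's RTR algorithm.
    """

    def __init__(self):
        self.covered = {}      # (dst, src) -> highest logged src instruction count
        self.last_dst_ic = {}  # dst -> last destination instruction count seen
        self.log = []          # logged edges: (src, src_ic, dst, dst_ic)

    def observe(self, src, src_ic, dst, dst_ic):
        """Report that instruction src_ic of thread src must precede
        instruction dst_ic of thread dst. Returns True if the edge is logged."""
        # Assumption: each destination thread reports its dependencies in
        # program order; otherwise the implication argument below fails.
        if dst_ic < self.last_dst_ic.get(dst, -1):
            raise ValueError("dependencies must arrive in program order per destination thread")
        self.last_dst_ic[dst] = dst_ic

        # If an earlier logged edge came from a later (or the same) source
        # instruction, then src_ic -> (earlier logged source) -> (earlier
        # destination) -> dst_ic already orders this pair; skip it.
        if src_ic <= self.covered.get((dst, src), -1):
            return False

        self.covered[(dst, src)] = src_ic
        self.log.append((src, src_ic, dst, dst_ic))
        return True


if __name__ == "__main__":
    r = PairwiseReducer()
    r.observe(0, 10, 1, 5)   # logged
    r.observe(0, 8, 1, 7)    # implied by the edge above, not logged
    r.observe(0, 12, 1, 9)   # logged
    print(r.log)
```

Each thread keeps only one counter per other thread, so the state is small and fixed. A full transitive reduction would additionally propagate ordering knowledge through chains of threads (for example with vector clocks), which this sketch omits.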

Published In

ACM SIGPLAN Notices, Volume 41, Issue 11
Proceedings of the 2006 ASPLOS Conference
November 2006
425 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1168918
  • ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
    October 2006
    440 pages
    ISBN:1595934510
    DOI:10.1145/1168857
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGPLAN Volume 41, Issue 11

Author Tags

  1. determinism
  2. multithreading
  3. race recording

Qualifiers

  • Article
