A regulated transitive reduction (RTR) for longer memory race recording

Authors:

Rastislav Bodik

ACM SIGPLAN Notices, Volume 41, Issue 11

Pages 49 - 60

https://doi.org/10.1145/1168918.1168865

Published: 20 October 2006

Abstract

Now at VMware.

Multithreaded deterministic replay has important applications in cyclic debugging, fault tolerance, and intrusion analysis. Memory race recording is a key technology for multithreaded deterministic replay. In this paper, we considerably improve our previous always-on Flight Data Recorder (FDR) in four ways:

• Longer recording, by reducing the log size growth rate to approximately one byte per thousand dynamic instructions.
• Lower hardware cost, by reducing the cost to 24 KB per processor core.
• Simpler design, by modifying only the cache coherence protocol and not the cache.
• Broader applicability, by supporting both the Sequential Consistency (SC) and Total Store Order (TSO) memory consistency models (existing recorders support only SC).

These improvements stem from several ideas: (1) a Regulated Transitive Reduction (RTR) recording algorithm that creates stricter and vectorizable dependencies to reduce the log growth rate; (2) a Set/LRU timestamp approximation method that better approximates the timestamps of uncached memory locations to reduce the hardware cost; and (3) an order-value-hybrid recording method that explicitly logs the values of potentially SC-violating load instructions to support multiprocessor systems with TSO.
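RTR refines transitive reduction (TR), the observation going back to Netzer [21] that a race dependency need not be logged when program order plus dependencies already in the log imply it. As a rough illustration only, the Python sketch below implements a simplified pairwise TR filter; it is not the paper's RTR algorithm, which additionally makes dependencies stricter and vectorizable. The names `Dependency`, `PairwiseTRRecorder`, and `last_logged` are hypothetical, dynamic instruction counts stand in for timestamps, and the sketch assumes that dependencies arriving at each destination thread are observed in that thread's program order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """Hypothetical race edge: (src_thread, src_icount) happened before
    (dst_thread, dst_icount). Instruction counts act as timestamps."""
    src_thread: int
    src_icount: int
    dst_thread: int
    dst_icount: int


class PairwiseTRRecorder:
    """Simplified pairwise transitive-reduction filter (not RTR itself).

    last_logged[q][p] holds the largest source instruction count of thread p
    that an already-logged dependency orders before thread q.
    Assumption: dependencies into each thread q arrive in q's program order.
    """

    def __init__(self, num_threads: int) -> None:
        self.last_logged = [[-1] * num_threads for _ in range(num_threads)]
        self.log = []  # dependencies that must be written to the race log

    def observe(self, dep: Dependency) -> bool:
        """Return True if dep is logged, False if an earlier entry implies it."""
        known = self.last_logged[dep.dst_thread][dep.src_thread]
        if dep.src_icount <= known:
            # A logged edge (p, known) -> (q, j') exists with j' <= dep.dst_icount,
            # so program order on p and q already implies this dependency.
            return False
        self.last_logged[dep.dst_thread][dep.src_thread] = dep.src_icount
        self.log.append(dep)
        return True


if __name__ == "__main__":
    rec = PairwiseTRRecorder(num_threads=2)
    print(rec.observe(Dependency(0, 10, 1, 5)))  # True: first edge 0 -> 1
    print(rec.observe(Dependency(0, 8, 1, 7)))   # False: implied by (0,10)->(1,5)
    print(rec.observe(Dependency(0, 12, 1, 9)))  # True: newer source instruction
    print(len(rec.log))                          # 2
```

Unlike Netzer's full algorithm, this pairwise version ignores orderings implied through a third thread, so it may log some redundant edges; under the stated assumption it only drops edges that a logged edge plus program order already imply.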

References

[1]

A.R. Alameldeen, et al. Simulating a $2M Commercial Server on a $2K PC. IEEE Computer, 36(2):50--57, Feb. 2003.

[2]

Arvind and J.-W. Maessen. Memory Model = Instruction Reordering + Store Atomicity. In Proceedings of the 33rd Annual International Symposium on Computer Architecture, June 2006.

[3]

D.F. Bacon and S.C. Goldstein. Hardware-Assisted Replay of Multiprocessor Programs. Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, published in ACM SIGPLAN Notices, pages 194--206, 1991.

[4]

P. Barford and M. Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. In Proceedings of the 1998 Sigmetrics Conference on Measurement and Modeling of Computer Systems, pages 151--160, June 1998.

[5]

L.A. Barroso, et al. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. In Proceedings of the 27th Annual International Symposium on Computer Architecture, June 2000.

[6]

H.W. Cain and M.H. Lipasti. Memory Ordering: A Value-Based Approach. In Proceedings of the 31st Annual International Symposium on Computer Architecture, June 2004.

[7]

J.-D. Choi and H. Srinivasan. Deterministic Replay of Java Multithread Applications. In Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT-98), Aug. 1998.

[8]

G.W. Dunlap, et al. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 2002 Symposium on Operating Systems Design and Implementation, pages 211--224, Dec. 2002.

[9]

K. Gharachorloo, et al. Two Techniques to Enhance the Performance of Memory Consistency Models. In Proceedings of the International Conference on Parallel Processing, volume I, pages 355--364, Aug. 1991.

[10]

C. Gniady, et al. Is SC + ILP = RC? In Proceedings of the 26th International Symposium on Computer Architecture, May 1999.

[11]

P. Kongetira, et al. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21--29, Mar. 2005.

[12]

L. Lamport. Time, Clocks and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7):558--565, July 1978.

[13]

T.J. Leblanc and J.M. Mellor-Crummey. Debugging Parallel Programs with Instant Replay. IEEE Transactions on Computers, C-36(4):471--482, Apr. 1987.

[14]

K. Lepak. Personal Communication, Mar. 2006.

[15]

D. Lucchetti, et al. ExtraVirt: Detecting and recovering from transient processor faults. In 2005 Symposium on Operating System Principles work-in-progress session, Oct. 2005.

[16]

P.S. Magnusson, et al. Simics: A Full System Simulation Platform. IEEE Computer, 35(2):50--58, Feb. 2002.

[17]

M. Martin, et al. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News, pages 92--99, Sept. 2005.

[18]

M.R. Marty, et al. Improving Multiple-CMP Systems Using Token Coherence. In Proceedings of the Eleventh IEEE Symposium on High-Performance Computer Architecture, Feb. 2005.

[19]

S.L. Min and J.-D. Choi. An Efficient Cache-based Access Anomaly Detection Scheme. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 235--244, Apr. 1991.

[20]

S. Narayanasamy, et al. BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging. In Proceedings of the 32nd International Symposium on Computer Architecture, June 2005.

[21]

R.H.B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In Proceedings of the Workshop on Parallel and Distributed Debugging (PADD), pages 1--11, 1993.

[22]

C. Newburn. Personal Communication, Oct. 2003.

[23]

C.M. Pancake and R.H.B. Netzer. A bibliography of parallel debuggers, 1993 edition. In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging (PADD), pages 169--186, 1993.

[24]

M. Prvulovic. CORD: Cost-effective (and nearly overhead-free) Order Recording and Data race detection. In Proceedings of the 12th Symposium on High-Performance Computer Architecture, Feb. 2006.

[25]

M. Prvulovic and J. Torrellas. ReEnact: Using Thread-Level Speculation Mechanisms to Debug Data Races in Multithreaded Codes. In Proceedings of the 30th Annual International Symposium on Computer Architecture, pages 110--121, June 2003.

[26]

F. Qin, S. Lu, and Y. Zhou. SafeMem: Exploiting ECC-Memory for Detecting Memory Leaks and Memory Corruption During Production Runs. In Proceedings of the Eleventh IEEE Symposium on High-Performance Computer Architecture, Feb. 2005.

[27]

B. Richards and J.R. Larus. Protocol-based Data-race Detection. In SIGMETRICS symposium on Parallel and Distributed Tools, 1998.

[28]

M. Ronsse and K. De Bosschere. Non-intrusive On-the-fly Data Race Detection using Execution Replay. In AADEBUG, Nov. 2000.

[29]

M. Ronsse, et al. Efficient coding of execution-traces of parallel programs. In Proceedings of the ProRISC & IEEE-Benelux workshop on Circuits, Systems and Signal Processing, pages 251--258, Mar. 1995.

[30]

M. Rosenblum. Virtual is Better Than Real. http://www.vmware.com/vmworld/2005/keynote_rosenblum.pdf.

[31]

D.L. Weaver and T. Germond, editors. SPARC Architecture Manual (Version 9). PTR Prentice Hall, 1994.

[32]

M. Xu. Race Recording for Multithreaded Deterministic Replay Using Multiprocessor Hardware. PhD thesis, http://www.cs.wisc.edu/multifacet/theses/min_xu_phd.pdf, University of Wisconsin-Madison, 2006.

[33]

M. Xu, et al. A "Flight Data Recorder" for Enabling Full-system Multiprocessor Deterministic Replay. In Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003.

[34]

K.C. Yeager. The MIPS R10000 Superscalar Microprocessor. IEEE Micro, 16(2):28--40, Apr. 1996.

[35]

P. Zhou, et al. AccMon: Automatically Detecting Memory-related Bugs via Program Counter-based Invariants. In Proceedings of the 37th Annual International Symposium on Microarchitecture, 2004.

[36]

P. Zhou, et al. iWatcher: Efficient Architectural Support for Software Debugging. In Proceedings of the 31st Annual International Symposium on Computer Architecture, page 224, June 2004.

[37]

J. Ziv and A. Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory, 23(3):337--343, May 1977.

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging

Information

Published In

ACM SIGPLAN Notices Volume 41, Issue 11

Proceedings of the 2006 ASPLOS Conference

November 2006

425 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1168918

ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
October 2006
440 pages
ISBN:1595934510
DOI:10.1145/1168857
General Chair: John Paul Shen, Intel Corp.
Program Chair: Margaret R. Martonosi, Princeton University

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 October 2006

Published in SIGPLAN Volume 41, Issue 11

Qualifiers

Article

Bibliometrics

Total Citations: 109
Total Downloads: 747
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 04 Jan 2025.
