skip to main content
10.1145/3192366.3192380acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

iReplayer: in-situ and identical record-and-replay for multithreaded applications

Published: 11 June 2018 Publication History

Abstract

Reproducing executions of multithreaded programs is very challenging due to many intrinsic and external non-deterministic factors. Existing RnR systems achieve significant progress in terms of performance overhead, but none targets the in-situ setting, in which replay occurs within the same process as the recording process. Also, most existing work cannot achieve identical replay, which may prevent the reproduction of some errors.
This paper presents iReplayer, which aims to identically replay multithreaded programs in the original process (under the "in-situ" setting). The novel in-situ and identical replay of iReplayer makes it more likely to reproduce errors, and allows it to directly employ debugging mechanisms (e.g. watchpoints) to aid failure diagnosis. Currently, iReplayer only incurs 3% performance overhead on average, which allows it to be always enabled in the production environment. iReplayer enables a range of possibilities, and this paper presents three examples: two automatic tools for detecting buffer overflows and use-after-free bugs, and one interactive debugging tool that is integrated with GDB.

Supplementary Material

WEBM File (p344-liu.webm)

References

[1]
2017. Pure python memcached client. https://pypi.python.org/pypi/python-memcached.
[2]
ab Developers. 2017. ab - Apache HTTP server benchmarking tool. https://httpd.apache.org/docs/2.4/programs/ab.html.
[3]
Periklis Akritidis, Manuel Costa, Miguel Castro, and Steven Hand. 2009. Baggy bounds checking: an e_cient and backwards-compatible defense against out-of-bounds errors. In Proceedings of the 18th conference on USENIX security symposium (SSYM'09). USENIX Association, Berkeley, CA, USA, 51-66. http://dl.acm.org/citation.cfm?id=1855768.1855772
[4]
Mohammad Mejbah ul Alam, Tongping Liu, Guangming Zeng, and Abdullah Muzahid. 2017. SyncPerf: Categorizing, Detecting, and Diagnosing Synchronization Performance Bugs. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys '17). ACM, New York, NY, USA, 298-313.
[5]
Gautam Altekar and Ion Stoica. 2009. ODR: Output-deterministic Replay for Multicore Debugging. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). ACM, New York, NY, USA, 193-206.
[6]
Joy Arulraj, Guoliang Jin, and Shan Lu. 2014. Leveraging the Short-term Memory of Hardware to Diagnose Production-run Software Failures. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14). ACM, New York, NY, USA, 207-222.
[7]
Amittai Aviram, Shu-Chun Weng, Sen Hu, and Bryan Ford. 2010. Efficient System-enforced Deterministic Parallelism. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI'10). USENIX Association, Berkeley, CA, USA, 1-16. http://dl.acm.org/citation.cfm?id=1924943.1924957
[8]
Thomas Ball and James R. Larus. 1996. Efficient Path Profiling. In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO 29). IEEE Computer Society, Washington, DC, USA, 46-57. http://dl.acm.org/citation.cfm?id=243846.243857
[9]
Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan Grossman. 2010. CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution. In Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). ACM, New York, NY, USA, 53-64.
[10]
Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IX). ACM, New York, NY, USA, 117-128.
[11]
Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2001. Composing High-performance Memory Allocators. In Proceedings of the ACM SIGPLAN 2001 on Programming Language Design and Implementation (PLDI '01). ACM, New York, NY, USA, 114-124.
[12]
Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2002. Reconsidering Custom Memory Allocation. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA '02). ACM, New York, NY, USA, 1-12.
[13]
Sanjay Bhansali, Wen-Ke Chen, Stuart de Jong, Andrew Edwards, Ron Murray, Milenko Drinic, Darek Mihocka, and Joe Chau. 2006. Framework for Instruction-level Tracing and Analysis of Program Executions. In Proceedings of the 2Nd International Conference on Virtual Execution Environments (VEE '06). ACM, New York, NY, USA, 154-163.
[14]
Christian Bienia and Kai Li. 2009. PARSEC 2.0: A NewBenchmark Suite for Chip-Multiprocessors. In Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation.
[15]
Hans-J. Boehm, Alan J. Demers, and Scott Shenker. 1991. Mostly Parallel Garbage Collection. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI '91). ACM, New York, NY, USA, 157-164.
[16]
Michael D. Bond, Milind Kulkarni, Man Cao, Meisam Fathi Salmi, and Jipeng Huang. 2015. Efficient Deterministic Replay of Multithreaded Executions in a Managed Language Virtual Machine. In Proceedings of the Principles and Practices of Programming on The Java Platform (PPPJ '15). ACM, New York, NY, USA, 90-101.
[17]
Brad Spengler. 2003. PaX: The Guaranteed End of Arbitrary Code Execution. https://grsecurity.net/PaX-presentation.ppt.
[18]
Derek Bruening and Qin Zhao. 2011. Practical memory checking with Dr. Memory. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '11). IEEE Computer Society, Washington, DC, USA, 213-223. http://dl.acm.org/citation.cfm?id=2190025.2190067
[19]
Bugzilla. 2013. "libtiff (gif2tiff): possible heapbased buffer overflow in readgifimage()". http://bugzilla.maptools.org/show_bug.cgi?id=2451.
[20]
Crispin Cowan, Calton Pu, Dave Maier, Heather Hinton, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. 1998. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In In Proceedings of the 7th USENIX Security Symposium. 63-78.
[21]
cppreference. 2017. Atomic operations library. http://en.cppreference.com/w/cpp/atomic.
[22]
Heming Cui, Jiri Simsa, Yi-Hong Lin, Hao Li, Ben Blum, Xinan Xu, Junfeng Yang, Garth A. Gibson, and Randal E. Bryant. 2013. Parrot: A Practical Runtime for Deterministic, Stable, and Reliable Threads. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP '13). ACM, New York, NY, USA, 388-405.
[23]
Heming Cui, Jingyue Wu, John Gallagher, Huayang Guo, and Junfeng Yang. 2011. Efficient Deterministic Multithreading Through Schedule Relaxation. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA, 337-351.
[24]
David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, and Peter M. Chen. 2014. Eidetic Systems. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI'14). USENIX Association, Berkeley, CA, USA, 525-540. http://dl.acm.org/citation.cfm?id=2685048.2685090
[25]
Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman. 2011. RCDC: A Relaxed Consistency Deterministic Computer. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). ACM, New York, NY, USA, 67-78.
[26]
George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen. 2002. ReVirt: Enabling Intrusion Analysis Through Virtual-machine Logging and Replay. SIGOPS Oper. Syst. Rev. 36, SI (Dec. 2002), 211-224.
[27]
Laura Effinger-Dean, Brandon Lucia, Luis Ceze, Dan Grossman, and Hans-J. Boehm. 2012. IFRit: Interference-free Regions for Dynamic Data-race Detection. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '12). ACM, New York, NY, USA, 467-484.
[28]
Frank Ch. Eigler. 2003. Mud_ap: pointer use checking for C/C++. Red Hat Inc.
[29]
Weining Gu, Z. Kalbarczyk, Ravishankar, K. Iyer, and Zhenyu Yang. 2003. Characterization of linux kernel behavior under errors. In 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings. 459-468.
[30]
Zhenyu Guo, Xi Wang, Jian Tang, Xuezheng Liu, Zhilei Xu, Ming Wu, M Frans Kaashoek, and Zheng Zhang. 2008. R2: An application-level kernel for record and replay. In Proceedings of the 8th USENIX conference on Operating systems design and implementation. USENIX Association, 193-208.
[31]
Niranjan Hasabnis, Ashish Misra, and R. Sekar. 2012. Light-weight bounds checking. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO '12). ACM, New York, NY, USA, 135-144.
[32]
Reed Hastings and Bob Joyce. 1991. Purify: Fast detection of memory leaks and access errors. In In Proc. of the Winter 1992 USENIX Conference. 125-138.
[33]
Nima Honarmand, Nathan Dautenhahn, Josep Torrellas, Samuel T. King, Gilles Pokam, and Cristiano Pereira. 2013. Cyrus: Unintrusive Application-level Record-replay for Replay Parallelism. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13). ACM, New York, NY, USA, 193-206.
[34]
Nima Honarmand and Josep Torrellas. 2014. Replay Debugging: Leveraging Record and Replay for Program Debugging. In Proceeding of the 41st Annual International Symposium on Computer Architecuture (ISCA '14). IEEE Press, Piscataway, NJ, USA, 445-456. http://dl.acm.org/citation.cfm?id=2665671.2665737
[35]
Derek R. Hower and Mark D. Hill. 2008. Rerun: Exploiting Episodes for Lightweight Memory Race Recording. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society, Washington, DC, USA, 265-276.
[36]
Jeff Huang, Peng Liu, and Charles Zhang. 2010. LEAP: lightweight deterministic multi-processor replay of concurrent java programs. In Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering. ACM, 207-216.
[37]
Jeff Huang, Charles Zhang, and Julian Dolby. 2013. CLAP: Recording Local Executions to Reproduce Concurrency Failures. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13). ACM, New York, NY, USA, 141-152.
[38]
Shiyou Huang, Bowen Cai, and Jeff Huang. 2017. Towards Production-Run Heisenbugs Reproduction on Commercial Hardware. In 2017 USENIX Annual Technical Conference. USENIX Association, 403-415.
[39]
Intel Corporation. 2012. Intel Inspector XE 2013. http://software.intel.com/en-us/intel-inspector-xe.
[40]
Baris Kasikci, Benjamin Schubert, Cristiano Pereira, Gilles Pokam, and George Candea. 2015. Failure Sketching: A Technique for Automated Root Cause Diagnosis of In-production Failures. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP '15). ACM, New York, NY, USA, 344-360.
[41]
Joseph Kulandai. 2011. Java Hashtable. http://javapapers.com/core-java/java-hashtable/.
[42]
Lubomir Kundrak. 2007. Buffer overflow in bzip2's bzip2recover. https://bugzilla.redhat.com/show_bug.cgi?id=226979.
[43]
Dongyoon Lee, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2012. Chimera: Hybrid Program Analysis for Determinism. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 463- 474.
[44]
Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter M. Chen, and Jason Flinn. 2010. Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism. In Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). ACM, New York, NY, USA, 77-90.
[45]
Kyu Hyung Lee, Dohyeong Kim, and Xiangyu Zhang. 2014. Infrastructure-Free Logging and Replay of Concurrent Execution on Multiple Cores. In Proceedings of the 28th European Conference on ECOOP 2014 -- Object-Oriented Programming - Volume 8586. Springer-Verlag New York, Inc., New York, NY, USA, 232-256.
[46]
Peng Liu, Xiangyu Zhang, Omer Tripp, and Yunhui Zheng. 2015. Light: Replay via Tightly Bounded Recording. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015). ACM, New York, NY, USA, 55-64.
[47]
Tongping Liu and Emery D. Berger. 2011. SHERIFF: precise detection and automatic mitigation of false sharing. In Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications (OOPSLA '11). ACM, New York, NY, USA, 3-18.
[48]
Tongping Liu, Charlie Curtsinger, and Emery D. Berger. 2011. Dthreads: efficient deterministic multithreading. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (SOSP '11). ACM, New York, NY, USA, 327-336.
[49]
Tongping Liu, Charlie Curtsinger, and Emery D. Berger. 2016. Double-Take: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis. In Proceedings of 38th International Conference on Software Engineering (ICSE'16). ACM, New York, NY, USA.
[50]
Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou. 2005. Bugbench: Benchmarks for evaluating bug detection tools. In In Workshop on the Evaluation of Software Defect Detection Tools.
[51]
Nuno Machado, Brandon Lucia, and Luís Rodrigues. 2015. Concurrency debugging with differential schedule projections. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 586-595.
[52]
Ali José Mashtizadeh, Tal Gar_nkel, David Terei, David Mazieres, and Mendel Rosenblum. 2017. Towards Practical Default-On Multi-Core Record/Replay. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). ACM, New York, NY, USA, 693-708.
[53]
Microsoft. 2007. What is the Staging Environment? https://msdn.microsoft.com/en-us/library/ms942990(v=cs.70).aspx.
[54]
Pablo Montesinos, Luis Ceze, and Josep Torrellas. 2008. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society, Washington, DC, USA, 289-300.
[55]
Mozilla Corporation. 2017. rr: lightweight recording & deterministic debugging. http://rr-project.org/.
[56]
Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gerard Basler, Piramanayagam Arumuga Nainar, and Iulian Neamtiu. 2008. Finding and Reproducing Heisenbugs in Concurrent Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI'08). USENIX Association, Berkeley, CA, USA, 267-280. http://dl.acm.org/citation.cfm?id=1855741.1855760
[57]
George C. Necula Necula, McPeak Scott, and Weimer Westley. 2002. CCured: Type-Safe Retrofitting of Legacy Code. In Proceedings of the Principles of Programming Languages. 128-139.
[58]
Nicholas Nethercote and Julian Seward. 2007. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation (PLDI '07). ACM, New York, NY, USA, 89-100.
[59]
Oracle Corporation. 2011. Sun Memory Error Discovery Tool (Discover). http://docs.oracle.com/cd/E18659_01/html/821-1784/gentextid-302.html.
[60]
Robert O'Callahan, Chris Jones, Nathan Froyd, Kyle Huey, Albert Noll, and Nimrod Partush. 2017. Engineering Record And Replay For Deployability. In 2017 USENIX Annual Technical Conference. USENIX Association.
[61]
parasoft Company. 2013. Runtime Analysis and Memory Error Detection for C and C++.
[62]
Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, and Shan Lu. 2009. PRES: probabilistic replay with execution sketching on multiprocessors. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles (SOSP '09). ACM, New York, NY, USA, 177-192.
[63]
Harish Patil, Cristiano Pereira, Mack Stallcup, Gregory Lueck, and James Cownie. 2010. PinPlay: A Framework for Deterministic Replay and Reproducible Analysis of Parallel Programs. In Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '10). ACM, New York, NY, USA, 2-11.
[64]
Feng Qin, Joseph Tucek, Jagadeesan Sundaresan, and Yuanyuan Zhou. 2005. Rx: Treating Bugs As Allergies--a Safe Method to Survive Software Failures. In Proceedings of the Twentieth ACM Symposium on Operating Systems Principles (SOSP '05). ACM, New York, NY, USA, 235-248.
[65]
James Reinders. 2013. "Processor Tracing". https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing.
[66]
Michiel Ronsse and Koen De Bosschere. 1999. RecPlay: A Fully Integrated Practical Record/Replay System. ACM Trans. Comput. Syst. 17, 2 (May 1999), 133-152.
[67]
Michiel Ronsse and Koen De Bosschere. 1999. RecPlay: a fully integrated practical record/replay system. ACM Trans. Comput. Syst. 17, 2 (May 1999), 133-152.
[68]
Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitry Vyukov. 2012. AddressSanitizer: a fast address sanity checker. In Proceedings of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC'12). USENIX Association, Berkeley, CA, USA, 28-28. http://dl.acm.org/citation.cfm?id=2342821.2342849
[69]
SQL Developers. 2017. How SQLite Is Tested. https://www.sqlite.org/testing.html.
[70]
Talos. 2016. "Memcached Server SASL Autentication Remote Code Execution Vulnerability". https://www.talosintelligence.com/reports/TALOS-2016-0221/.
[71]
Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanthos, and Yuanyuan Zhou. 2007. Triage: Diagnosing Production Run Failures at the User's Site. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07). ACM, New York, NY, USA, 131-144.
[72]
Kaushik Veeraraghavan, Dongyoon Lee, Benjamin Wester, Jessica Ouyang, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2011. DoublePlay: parallelizing sequential logging and replay. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS XVI). ACM, New York, NY, USA, 15-26.
[73]
Yan Wang, Harish Patil, Cristiano Pereira, Gregory Lueck, Rajiv Gupta, and Iulian Neamtiu. 2014. DrDebug: Deterministic Replay Based Cyclic Debugging with Dynamic Slicing. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14). ACM, New York, NY, USA, Article 98, 11 pages.
[74]
Weiwei Xiong, Soyeon Park, Jiaqi Zhang, Yuanyuan Zhou, and Zhiqiang Ma. 2010. Ad Hoc Synchronization Considered Harmful. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI'10). USENIX Association, Berkeley, CA, USA, 163-176. http://dl.acm.org/citation.cfm?id=1924943.1924955
[75]
Qiang Zeng, Dinghao Wu, and Peng Liu. 2011. Cruiser: concurrent heap buffer overflow monitoring using lock-free data structures. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation (PLDI '11). ACM, New York, NY, USA, 367-377.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2018
825 pages
ISBN:9781450356985
DOI:10.1145/3192366
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Identical Replay
  2. In-situ Replay
  3. Multithreaded Debugging
  4. Record-and-Replay

Qualifiers

  • Research-article

Conference

PLDI '18
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)51
  • Downloads (Last 6 weeks)13
Reflects downloads up to 06 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media