research-article
Open access

HMTT: A hybrid hardware/software tracing system for bridging the DRAM access trace's semantic gap

Published: 01 February 2014

Abstract

DRAM access traces (i.e., off-chip memory references) can be extremely valuable for the design of memory subsystems and performance tuning of software. Hardware snooping on the off-chip memory interface is an effective and nonintrusive approach to monitoring and collecting real-life DRAM accesses. However, compared with software-based approaches, hardware snooping approaches typically lack semantic information, such as process/function/object identifiers, virtual addresses, and lock contexts, that is essential to the complete understanding of the systems and software under investigation.
In this article, we propose a hybrid hardware/software mechanism that is able to collect off-chip memory reference traces with semantic information. We have designed and implemented a prototype system called HMTT (Hybrid Memory Trace Tool), which uses a custom-made DIMM connector to collect off-chip memory references and a high-level event-encoding scheme to correlate semantic information with memory references. In addition to providing complete, undistorted DRAM access traces, the proposed system is also able to perform various types of low-overhead profiling, such as object-relative accesses and multithread lock accesses.
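
As a rough illustration of how high-level events might be correlated with snooped memory references, the sketch below assumes that software signals an event by touching an address inside a reserved window, so that the event appears inline in the hardware trace. The window location, the bit layout, and every name here (`EVENT_BASE`, `encode_event`, `annotate`, and so on) are hypothetical and are not taken from the article; it is a minimal sketch of the general idea, not the HMTT implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# All constants are hypothetical, chosen only for illustration.
EVENT_BASE = 0x1_0000_0000   # start of an assumed reserved address window
EVENT_SIZE = 0x1_0000        # assumed window size in bytes
KIND_SHIFT = 12              # assumed layout: upper offset bits hold the event kind
ID_MASK = 0xFFF              # assumed layout: lower 12 bits hold an identifier

ENTER, EXIT = 1, 2           # hypothetical event kinds (e.g. function entry/exit)


@dataclass(frozen=True)
class Ref:
    """One snooped off-chip reference: physical address and read/write flag."""
    addr: int
    is_write: bool


def encode_event(kind: int, ident: int) -> int:
    """Address that software would touch to signal (kind, ident)."""
    return EVENT_BASE + ((kind << KIND_SHIFT) | (ident & ID_MASK))


def decode_event(addr: int) -> Optional[Tuple[int, int]]:
    """Return (kind, ident) if addr lies in the event window, else None."""
    off = addr - EVENT_BASE
    if 0 <= off < EVENT_SIZE:
        return off >> KIND_SHIFT, off & ID_MASK
    return None


def annotate(trace: Iterable[Ref]) -> Iterator[Tuple[Ref, Optional[int]]]:
    """Yield each ordinary reference with the identifier active when it occurred."""
    stack: List[int] = []
    for ref in trace:
        event = decode_event(ref.addr)
        if event is None:
            # Ordinary DRAM reference: attribute it to the innermost context.
            yield ref, (stack[-1] if stack else None)
        elif event[0] == ENTER:
            stack.append(event[1])
        elif event[0] == EXIT and stack:
            stack.pop()


trace = [
    Ref(0x1000, False),
    Ref(encode_event(ENTER, 7), True),
    Ref(0x2000, True),
    Ref(encode_event(EXIT, 7), True),
    Ref(0x3000, False),
]
annotated = [(ref.addr, ctx) for ref, ctx in annotate(trace)]
```

In this toy trace only the reference to `0x2000` falls between the entry and exit events, so it alone is attributed to identifier 7. The event accesses themselves are consumed by the decoder and do not appear in the annotated output.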

Published In

ACM Transactions on Architecture and Code Optimization  Volume 11, Issue 1
February 2014
373 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2591460

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 February 2014
Accepted: 01 October 2013
Revised: 01 August 2013
Received: 01 May 2013
Published in TACO Volume 11, Issue 1

Author Tags

  1. DRAM access trace
  2. Hybrid tracing mechanism
  3. function
  4. high-level event
  5. lock
  6. object
  7. semantic gap

Qualifiers

  • Research-article
  • Research
  • Refereed
