DOI: 10.1145/2462029.2462036
research-article

Automatically mining program build information via signature matching

Published: 20 June 2013

Abstract

Program build information, such as the compilers and libraries used, is vitally important in an auditing and benchmarking framework for High-Performance Computing (HPC) systems. I have developed a tool that automatically extracts this information using signature-based detection, a strategy commonly employed by anti-virus software, to search the program binaries for known patterns of data. I formulate the patterns from various "features" embedded in the program binaries, and the experiment shows that the tool can successfully identify many different compilers, libraries, and their versions.
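
As a rough illustration of the signature-matching idea, the sketch below scans the raw bytes of a binary for known patterns and reports the versions it finds. It is not the tool described in the abstract and does not use the ClamAV signature format. The `SIGNATURES` table, its labels, and the function names are hypothetical. The two patterns assume that GCC leaves a `GCC: (...) x.y.z` identification string in the binary and that libstdc++ symbol versions appear as `GLIBCXX_x.y.z` strings.

```python
import re
from typing import Dict, List

# Hypothetical signature table: label -> byte-level regular expression.
# Each pattern captures a version number in group 1.
SIGNATURES = {
    # Assumed feature: identification string GCC writes into the binary.
    "compiler:gcc": re.compile(rb"GCC: \([^)]*\) (\d+\.\d+(?:\.\d+)?)"),
    # Assumed feature: libstdc++ symbol-version strings.
    "library:libstdc++-abi": re.compile(rb"GLIBCXX_(\d+\.\d+(?:\.\d+)?)"),
}


def match_signatures(data: bytes) -> Dict[str, List[str]]:
    """Return every signature label found in `data` with its distinct versions."""
    found: Dict[str, List[str]] = {}
    for label, pattern in SIGNATURES.items():
        versions = sorted(
            {m.group(1).decode("ascii") for m in pattern.finditer(data)}
        )
        if versions:
            found[label] = versions
    return found


def scan_file(path: str) -> Dict[str, List[str]]:
    """Read a program binary from disk and match it against all signatures."""
    with open(path, "rb") as fh:
        return match_signatures(fh.read())
```

Calling `scan_file` on an executable returns a dictionary mapping each matched label to the versions seen. A practical detector would need many more signatures, and features that survive stripping and static linking.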

Published In

PASTE '13: Proceedings of the 11th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering
June 2013
54 pages
ISBN:9781450321280
DOI:10.1145/2462029
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ClamAV
  2. program analysis
  3. program provenance
  4. static binary analysis
  5. technology audit

Qualifiers

  • Research-article

Conference

PASTE '13

Acceptance Rates

PASTE '13 Paper Acceptance Rate 7 of 13 submissions, 54%;
Overall Acceptance Rate 57 of 159 submissions, 36%
