DOI: 10.1145/3395363.3397370
research-article

How far we have come: testing decompilation correctness of C decompilers

Published: 18 July 2020

Abstract

A C decompiler converts an executable (the output of a C compiler) into source code. The recovered C source code, once recompiled, will produce an executable with the same functionality as the original executable. With over twenty years of development, C decompilers have been widely used in production to support reverse engineering applications, including legacy software migration, security retrofitting, software comprehension, and, as a first step, the launching of adversarial software exploits. As a paramount component and the trust base of numerous cybersecurity tasks, C decompilers have enabled the analysis of malware and ransomware and have advanced cybersecurity professionals’ understanding of vulnerabilities in real-world systems.
In contrast to this flourishing market, we observe that in academia the outputs of C decompilers (i.e., recovered C source code) are still not extensively used. Instead, intermediate representations are often preferred when developing applications such as binary security retrofitting. We acknowledge that such conservative approaches in academia result from widespread, pessimistic views of decompilation correctness. However, in conventional software engineering and security research, how much of a problem is it, for instance, to reuse a piece of simple legacy code by taking the output of a modern C decompiler?
In this work, we test decompilation correctness to present an up-to-date understanding of modern C decompilers. We detected a total of 1,423 inputs that trigger decompilation errors in four popular decompilers and, with extensive manual effort, identified 13 bugs in two open-source decompilers. Our findings show that the overly pessimistic view of decompilation correctness leads researchers to underestimate the potential of modern decompilers; state-of-the-art decompilers certainly care about functional correctness, and they are making promising progress. However, some tasks that have been studied for years in academia, such as type inference and optimization, still impede C decompilers from generating quality outputs, more so than is reflected in the literature. These issues rarely receive enough attention and can cause considerable confusion that misleads users.
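
The correctness criterion stated in the abstract (the recompiled output should behave like the original executable) can be illustrated with a small differential check. The following is a minimal sketch under stated assumptions, not the authors' tooling: the helper names `run_once` and `find_divergences` are hypothetical, the two programs are given as command lines, and "same functionality" is approximated by comparing exit code and standard output on a fixed set of inputs.

```python
import subprocess
import sys
from typing import Iterable, List, Sequence, Tuple


def run_once(cmd: Sequence[str], stdin_data: bytes, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Run one program on one input; return (exit code, captured stdout)."""
    proc = subprocess.run(
        list(cmd),
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def find_divergences(
    original: Sequence[str],
    recompiled: Sequence[str],
    inputs: Iterable[bytes],
) -> List[bytes]:
    """Return the inputs on which the two programs differ in exit code or stdout.

    Assumption: observable behavior is fully captured by exit code and stdout.
    """
    return [
        data
        for data in inputs
        if run_once(original, data) != run_once(recompiled, data)
    ]


if __name__ == "__main__":
    # Stand-ins for real binaries: two Python one-liners, one of which
    # deliberately misbehaves on longer inputs.
    original = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
    recompiled = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()) % 4)"]
    print(find_divergences(original, recompiled, [b"", b"ab", b"abcde"]))
```

An empty result only means that no divergence was observed on the inputs tried; it does not establish that the two programs are equivalent.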

Published In

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2020
591 pages
ISBN:9781450380089
DOI:10.1145/3395363
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Decompiler
  2. Reverse Engineering
  3. Software Testing

Qualifiers

  • Research-article

Conference

ISSTA '20
Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%
