DIRE: A Neural Approach to Decompiled Identifier Naming

Lacomis, Jeremy; Yin, Pengcheng; Schwartz, Edward J.; Allamanis, Miltiadis; Le Goues, Claire; Neubig, Graham; Vasilescu, Bogdan

arXiv:1909.09029 (cs)

[Submitted on 19 Sep 2019 (v1), last revised 3 Oct 2019 (this version, v2)]

Abstract: The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Decompilers can reconstruct much of the information that is lost during the compilation process (e.g., structure and type information). Unfortunately, they do not reconstruct semantically meaningful variable names, which are known to increase code understandability. We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information recovered by the decompiler. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Our results show that on this corpus DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.

Comments:	2019 International Conference on Automated Software Engineering
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:1909.09029 [cs.SE]
	(or arXiv:1909.09029v2 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.1909.09029

Submission history

From: Jeremy Lacomis
[v1] Thu, 19 Sep 2019 14:57:31 UTC (434 KB)
[v2] Thu, 3 Oct 2019 15:42:43 UTC (434 KB)
