DOI: 10.1007/978-3-031-51476-0_24

FSmell: Recognizing Inline Function in Binary Code

Published: 11 January 2024

Abstract

Function recognition is one of the most critical tasks in binary analysis and reverse engineering, yet recognizing inline functions remains challenging, mainly for two reasons. First, binaries contain no expert patterns, such as prologue/epilogue instructions, that mark inline functions. Second, instruction reordering introduced by compiler optimization makes the address space of instructions from the same inline function discontinuous, so the address space of an inline function is often mingled with that of regular functions. This paper proposes FSmell, a graph-theory-based function recognition framework that specifically targets inline functions. FSmell introduces the Instruction Topology Graph (ITG) to represent the data-flow dependencies among instructions in a basic block. With the ITG, the problem of distinguishing inline instructions from caller instructions is transformed into a graph connectivity problem, which is solved by computing the minimum vertex separator. We applied FSmell to 78 binaries compiled by GCC and Clang at 3 different optimization levels. Of the 205,890 inline functions in the 78 binaries, FSmell reports 76,777, with a precision of 67.5% and a recall of 39.2%. With the help of FSmell, 50% of the vulnerabilities missed by other methods are detected and located.
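
The graph formulation in the abstract can be illustrated with a toy sketch. This is not FSmell's implementation: the instruction tuples, the register read/write sets, and the helper names `build_dependency_graph` and `minimum_vertex_separator` are hypothetical, and the separator is found by brute force, which is only practical for the handful of instructions in a small basic block. The sketch assumes an edge joins each instruction to the most recent earlier instruction that wrote a register it reads.

```python
from itertools import combinations

def build_dependency_graph(instructions):
    """Undirected data-flow graph over one basic block (illustrative only).

    Each instruction is (text, registers_written, registers_read). An edge
    links an instruction to the latest earlier writer of a register it reads.
    """
    graph = {i: set() for i in range(len(instructions))}
    last_writer = {}  # register name -> index of its most recent definition
    for i, (_text, writes, reads) in enumerate(instructions):
        for reg in reads:
            if reg in last_writer:
                j = last_writer[reg]
                graph[i].add(j)
                graph[j].add(i)
        for reg in writes:
            last_writer[reg] = i
    return graph

def is_connected(graph, removed):
    """Check connectivity of the graph after deleting the vertices in `removed`."""
    nodes = [n for n in graph if n not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        n = stack.pop()
        for m in graph[n]:
            if m not in removed and m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)

def minimum_vertex_separator(graph):
    """Smallest vertex set whose removal disconnects the graph (brute force).

    Returns None when no separator exists, as in a complete graph.
    """
    nodes = list(graph)
    for k in range(len(nodes) - 1):  # at least two vertices must remain
        for candidate in combinations(nodes, k):
            if not is_connected(graph, set(candidate)):
                return set(candidate)
    return None

# Hypothetical basic block: instructions 0-1 form one cluster, 3-4 another,
# and instruction 2 is the only data-flow link between them.
block = [
    ("mov rax, rdi",        {"rax"}, {"rdi"}),
    ("mov rdx, rax",        {"rdx"}, {"rax"}),
    ("lea rbx, [rax+rdx]",  {"rbx"}, {"rax", "rdx"}),
    ("mov rcx, rbx",        {"rcx"}, {"rbx"}),
    ("lea rsi, [rbx+rcx]",  {"rsi"}, {"rbx", "rcx"}),
]
graph = build_dependency_graph(block)
separator = minimum_vertex_separator(graph)
print(separator)  # {2}
```

Removing instruction 2 splits the toy graph into the components {0, 1} and {3, 4}, which is the kind of boundary a connectivity-based approach could use to tell one group of instructions from another. How FSmell actually builds the ITG and selects separators is described in the paper itself, not here.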


Published In

Computer Security – ESORICS 2023: 28th European Symposium on Research in Computer Security, The Hague, The Netherlands, September 25–29, 2023, Proceedings, Part II
Sep 2023
538 pages
ISBN: 978-3-031-51475-3
DOI: 10.1007/978-3-031-51476-0
Editors: Gene Tsudik, Mauro Conti, Kaitai Liang, Georgios Smaragdakis

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. binary analysis
2. inline function
3. function recognition
