Research article · DOI: 10.1145/3377811.3380421

Ankou: guiding grey-box fuzzing towards combinatorial difference

Published: 01 October 2020

Abstract

Grey-box fuzzing is an evolutionary process, which maintains and evolves a population of test cases with the help of a fitness function. Fitness functions used by current grey-box fuzzers are not informative in that they cannot distinguish different program executions as long as those executions achieve the same coverage. The problem is that current fitness functions only consider a union of data, but not their combination. As such, fuzzers often get stuck in a local optimum during their search. In this paper, we introduce Ankou, the first grey-box fuzzer that recognizes different combinations of execution information, and present several scalability challenges encountered while designing and implementing Ankou. Our experimental results show that Ankou is 1.94× and 8.0× more effective in finding bugs than AFL and Angora, respectively.
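The difference between a fitness function that considers only the union of coverage and one that also recognizes combinations can be illustrated with a toy sketch. This is not Ankou's implementation: the abstract does not specify its fitness function, so the Euclidean distance over branch hit counts, the `threshold` value, and all names below (`adds_new_coverage`, `is_distinct`, the branch IDs `b1` to `b3`) are assumptions made purely for illustration.

```python
import math

def adds_new_coverage(global_cov, execution):
    """Union-style fitness: an execution is interesting only if it
    hits at least one branch that no earlier execution has hit."""
    return any(hits > 0 and branch not in global_cov
               for branch, hits in execution.items())

def distance(a, b):
    """Euclidean distance between two executions, each represented as a
    mapping from branch ID to hit count (assumed metric, for illustration)."""
    branches = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in branches))

def is_distinct(population, execution, threshold=1.0):
    """Combination-aware fitness: an execution is interesting if it is
    farther than `threshold` from every execution already in the population."""
    return all(distance(execution, p) > threshold for p in population)

# Hypothetical executions: two seeds that together cover b1, b2, and b3.
seed_a = {"b1": 1, "b3": 1}
seed_b = {"b2": 1, "b3": 1}
population = [seed_a, seed_b]

# Union of every branch hit by the population so far.
global_cov = {b for p in population for b, hits in p.items() if hits > 0}

# The candidate hits b1 and b2 in the same run: no new branch,
# but a combination that neither seed exhibits.
candidate = {"b1": 1, "b2": 1}

print(adds_new_coverage(global_cov, candidate))  # union-style verdict
print(is_distinct(population, candidate))        # combination-aware verdict
```

Under these assumptions the union-style check rejects the candidate because every branch it hits is already covered, while the distance-based check keeps it because it differs from both seeds. This is the kind of execution that, according to the abstract, coverage-only fitness functions cannot tell apart.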

Published In

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN: 9781450371216
DOI: 10.1145/3377811
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • KIISE: Korean Institute of Information Scientists and Engineers
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fuzz testing
  2. grey-box fuzzing
  3. guided fuzzing
  4. principal component analysis
  5. software testing

Qualifiers

  • Research-article

Funding Sources

  • Korea government (MSIT)

Conference

ICSE '20
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
