DOI: 10.1145/3314221.3314651

Parser-directed fuzzing

Published: 08 June 2019

Abstract

To be effective, software test generation needs to cover the space of possible inputs well. Traditional fuzzing generates large numbers of random inputs, which, however, are unlikely to contain the keywords and other specific inputs of non-trivial input languages. Constraint-based test generation solves the conditions of paths leading to uncovered code, but fails on programs with complex input conditions because of path explosion. In this paper, we present a test generation technique specifically directed at input parsers. We systematically produce inputs for the parser and track the comparisons it makes; after every rejection, we satisfy the comparisons that led to the rejection. This approach effectively covers the input space: evaluated on five subjects, from CSV files to JavaScript, our pFuzzer prototype covers more tokens than both random-based and constraint-based approaches, while requiring no symbolic analysis and far fewer tests than random fuzzers.
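The loop described in the abstract (feed an input to the parser, track the comparisons it makes, and after a rejection satisfy the comparisons that caused it) can be illustrated with a minimal Python sketch. This is not the pFuzzer implementation: the toy grammar, the `TrackedInput` wrapper, and the `fuzz` function are hypothetical names introduced only for illustration, and the comparison tracking is done by hand inside the parser rather than by instrumenting a real program. The sketch also omits any coverage-based heuristics for choosing among candidates.

```python
import random

DIGITS = "0123456789"


class Rejected(Exception):
    """Raised by the toy parser when it cannot continue."""


class TrackedInput:
    """Wraps the input and records every character comparison the parser makes."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.comparisons = []  # list of (index, characters the parser tested for)

    def at(self, allowed):
        """Record the comparison; return True if the current character is in `allowed`."""
        self.comparisons.append((self.pos, allowed))
        return self.pos < len(self.text) and self.text[self.pos] in allowed

    def advance(self):
        self.pos += 1


def parse_expr(inp):
    """Toy grammar (hypothetical): expr := digit+ | '(' expr ')'."""
    if inp.at("("):
        inp.advance()
        parse_expr(inp)
        if not inp.at(")"):
            raise Rejected
        inp.advance()
    elif inp.at(DIGITS):
        inp.advance()
        while inp.at(DIGITS):
            inp.advance()
    else:
        raise Rejected


def parse(text):
    """Return (accepted, comparisons) for one parser run."""
    inp = TrackedInput(text)
    try:
        parse_expr(inp)
        if inp.pos != len(text):  # unconsumed trailing characters
            raise Rejected
        return True, inp.comparisons
    except Rejected:
        return False, inp.comparisons


def fuzz(rng, max_steps=1000):
    """Grow an input one character at a time until the parser accepts it."""
    text = ""
    for _ in range(max_steps):
        accepted, comparisons = parse(text)
        if accepted:
            return text
        # Index of the last comparison the parser made before rejecting.
        fail_index = comparisons[-1][0]
        # Every character the parser tested for at that index is a candidate.
        candidates = sorted({c for i, allowed in comparisons
                             if i == fail_index for c in allowed})
        # Keep the prefix before that index and place one candidate there.
        text = text[:fail_index] + rng.choice(candidates)
    return None  # step budget exhausted without an accepted input


if __name__ == "__main__":
    for seed in range(5):
        print(repr(fuzz(random.Random(seed))))
```

Because each new character is drawn only from the characters the parser actually compared against, every step in this sketch extends a prefix the parser could still accept, whereas a purely random fuzzer would rarely produce a matching closing parenthesis or keyword by chance.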

Supplementary Material

WEBM File (p548-mathis.webm)

Published In

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019
1162 pages
ISBN: 9781450367127
DOI: 10.1145/3314221

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fuzzing
  2. parsers
  3. security
  4. test generation

Qualifiers

  • Research-article

Conference

PLDI '19

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%