research-article

Parser-directed fuzzing

Authors:

Rahul Gopinath,

Alexander Kampmann,

Matthias Höschele,

Andreas Zeller

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 548 - 560

https://doi.org/10.1145/3314221.3314651

Published: 08 June 2019

Abstract

To be effective, software test generation needs to cover the space of possible inputs well. Traditional fuzzing generates large numbers of random inputs, which, however, are unlikely to contain keywords and other specific inputs of non-trivial input languages. Constraint-based test generation solves the conditions of paths leading to uncovered code, but fails on programs with complex input conditions because of path explosion. In this paper, we present a test generation technique specifically directed at input parsers. We systematically produce inputs for the parser and track the comparisons it makes; after every rejection, we satisfy the comparisons that led to the rejection. This approach effectively covers the input space: evaluated on five subjects, from CSV files to JavaScript, our pFuzzer prototype covers more tokens than both random-based and constraint-based approaches, while requiring no symbolic analysis and far fewer tests than random fuzzers.
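The loop described in the abstract (produce an input, track the comparisons the parser makes, and after a rejection satisfy the comparison that failed) can be illustrated with a small sketch. This is not the pFuzzer implementation: the toy grammar (`true`, `null`, and bracketed comma-separated lists), the `TrackedInput` wrapper, and the `fuzz` driver are all hypothetical names chosen for illustration, and the search heuristics of the real tool are omitted.

```python
import random
import string

class Rejected(Exception):
    """Raised by the toy parser when it refuses the input."""

class TrackedInput:
    """Hypothetical wrapper: holds the text and logs each character comparison."""
    def __init__(self, text):
        self.text = text
        self.comparisons = []   # list of (index, expected characters)
        self.hit_eof = False    # True if the parser asked for more input

    def matches(self, index, expected):
        """Record the comparison, then report whether text[index] is in `expected`."""
        self.comparisons.append((index, expected))
        if index >= len(self.text):
            self.hit_eof = True
            return False
        return self.text[index] in expected

def parse_value(inp, i):
    """Toy recursive-descent parser: value := 'true' | 'null' | '[' value (',' value)* ']'."""
    if inp.matches(i, "["):
        i = parse_value(inp, i + 1)
        while inp.matches(i, ","):
            i = parse_value(inp, i + 1)
        if not inp.matches(i, "]"):
            raise Rejected(i)
        return i + 1
    for keyword in ("true", "null"):
        if inp.matches(i, keyword[0]):
            for offset, ch in enumerate(keyword[1:], start=1):
                if not inp.matches(i + offset, ch):
                    raise Rejected(i + offset)
            return i + len(keyword)
    raise Rejected(i)

def accepts(text):
    """Run the parser; return (accepted?, tracked input with its comparison log)."""
    inp = TrackedInput(text)
    try:
        end = parse_value(inp, 0)
        return end == len(text), inp
    except Rejected:
        return False, inp

ALPHABET = string.ascii_lowercase + "[],"

def fuzz(rng, max_steps=10_000):
    """Grow an input one character at a time, guided by the recorded comparisons."""
    text = ""
    for _ in range(max_steps):
        ok, inp = accepts(text)
        if ok:
            return text
        if inp.hit_eof:
            # The prefix was fine but the parser wanted more: append a random character.
            text += rng.choice(ALPHABET)
            continue
        # The last character was rejected: replace it with a character
        # the parser compared it against at that position.
        last = len(text) - 1
        candidates = [c for idx, exp in inp.comparisons if idx == last for c in exp]
        text = text[:last] + rng.choice(candidates) if candidates else ""
    raise RuntimeError("no valid input found within the step budget")

if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(fuzz(rng))
```

Because every rejected character is replaced by one the parser actually tested for, keywords such as `true` are assembled in a handful of runs, whereas purely random generation would rarely produce them. No symbolic analysis is involved; the only instrumentation assumed here is the comparison log.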

Index Terms

Parser-directed fuzzing
1. Security and privacy
  1. Software and application security
    1. Software security engineering

Published In

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2019, 1162 pages

ISBN: 9781450367127

DOI: 10.1145/3314221

General Chair: Kathryn S. McKinley (Google, USA)

Program Chair: Kathleen Fisher (Tufts University, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PLDI '19: 40th ACM SIGPLAN Conference on Programming Language Design and Implementation

Sponsor: SIGPLAN

June 22 - 26, 2019

Phoenix, AZ, USA

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
