skip to main content
10.1145/3519939.3523716acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

“Synthesizing input grammars”: a replication study

Published: 09 June 2022 Publication History

Abstract

When producing test inputs for a program, test generators ("fuzzers") can greatly profit from grammars that formally describe the language of expected inputs. In recent years, researchers thus have studied means to recover input grammars from programs and their executions. The GLADE algorithm by Bastani et al., published at PLDI 2017, was the first black-box approach to claim context-free approximation of input specification for non-trivial languages such as XML, Lisp, URLs, and more.
Prompted by recent observations that the GLADE algorithm may show lower performance than reported in the original paper, we have reimplemented the GLADE algorithm from scratch. Our evaluation confirms that the effectiveness score (F1) reported in the GLADE paper is overly optimistic, and in some cases, based on the wrong language. Furthermore, GLADE fares poorly in several real-world languages evaluated, producing grammars that spend megabytes to enumerate inputs.

References

[1]
[n. d.]. Artifact Review and Badging – Version 1.1. https://www.acm.org/publications/policies/artifact-review-and-badging-current
[2]
[n. d.]. GLADE Implementation and experiments. https://github.com/obastani/glade-full
[3]
[n. d.]. Grammars written for ANTLR v4. https://github.com/antlr/grammars-v4
[4]
Dana Angluin. 1987. Learning Regular Sets from Queries and Counterexamples. Inf. Comput., 75, 2 (1987), Nov., 87–106. issn:0890-5401 https://doi.org/10.1016/0890-5401(87)90052-6
[5]
Dana Angluin and Michael Kharitonov. 1995. When Won’t Membership Queries Help? J. Comput. System Sci., 50, 2 (1995), 336–355. https://doi.org/10.1145/103418.103420
[6]
Osbert Bastani, Rahul Sharma, Alex Aiken, and Percy Liang. 2017. Synthesizing Program Input Grammars. In ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, NY, USA. 95–110. isbn:978-1-4503-4988-8 https://doi.org/10.1145/3140587.3062349
[7]
Juan Caballero, Heng Yin, Zhenkai Liang, and Dawn Song. 2007. Polyglot: Automatic Extraction of Protocol Message Format Using Dynamic Binary Analysis. In ACM Conference on Computer and Communications Security. ACM, New York, NY, USA. 317–329. isbn:978-1-59593-703-2 https://doi.org/10.1145/1315245.1315286
[8]
Alexander Clark. 2010. Distributional learning of some context-free languages with a minimally adequate teacher. In International Colloquium on Grammatical Inference. 24–37. https://doi.org/10.1007/978-3-642-15488-1_4
[9]
Jean-Christophe Deprez and Arun Lakhotia. 2000. A Formalism to Automate Mapping from Program Features to Code. In IWPC. 69–78. https://doi.org/10.1109/WPC.2000.852481
[10]
E Mark Gold. 1967. Language identification in the limit. Information and control, 10, 5 (1967), 447–474. https://doi.org/10.1016/S0019-9958(67)91165-5
[11]
E Mark Gold. 1978. Complexity of automaton identification from given data. Information and control, 37, 3 (1978), 302–320. https://doi.org/10.1016/S0019-9958(78)90562-4
[12]
Rahul Gopinath, Björn Mathis, and Andreas Zeller. 2020. Mining input grammars from dynamic control flow. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 172–183. https://doi.org/10.1145/3368089.3409679
[13]
Rahul Gopinath and Andreas Zeller. 2019. Building Fast Fuzzers. CoRR, arxiv:1911.07707. arxiv:1911.07707
[14]
Benedikt Hauptmann, Elmar Juergens, and Volkmar Woinke. 2015. Generating refactoring proposals to remove clones from automated system tests. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension. 115–124. https://doi.org/10.1109/ICPC.2015.20
[15]
Renáta Hodován, Ákos Kiss, and Tibor Gyimóthy. 2018. Grammarinator: a grammar-based open source fuzzer. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation. 45–48. https://doi.org/10.1145/3278186.3278193
[16]
Matthias Höschele and Andreas Zeller. 2016. Mining Input Grammars from Dynamic Taints. In IEEE/ACM International Conference on Automated Software Engineering. ACM, New York, NY, USA. 720–725. https://doi.org/10.1145/2970276.2970321
[17]
C Jacobs and D Grune. 1990. Parsing techniques: A practical guide. Springer. isbn:978-0-387-68954-8
[18]
Neil Kulkarni, Caroline Lemieux, and Koushik Sen. 2021. Learning Highly Recursive Input Grammars. In 36th IEEE/ACM International Conference on Automated Software Engineering. ACM New York, NY, USA. https://doi.org/10.1109/ASE51524.2021.9678879
[19]
Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. 2010. Reverse Engineering Input Syntactic Structure from Program Execution and Its Applications. IEEE Transactions on Software Engineering, 36, 5 (2010), Sept., 688–703. issn:0098-5589 https://doi.org/10.1109/TSE.2009.54
[20]
José Oncina and Pedro Garcia. 1992. Identifying regular languages in polynomial time. In Advances in structural and syntactic pattern recognition. World Scientific, 99–108. https://doi.org/10.1142/9789812797919_0007
[21]
Václav Rajlich and Norman Wilde. 2002. The role of concepts in program comprehension. In Proceedings 10th International Workshop on Program Comprehension. 271–278. https://doi.org/10.1109/WPC.2002.1021348
[22]
Andreas Zeller, Rahul Gopinath, Marcel Böhme, Gordon Fraser, and Christian Holler. 2021. Parsing Inputs. In The Fuzzing Book. CISPA Helmholtz Center for Information Security. https://www.fuzzingbook.org/html/Parser.html Retrieved 2021-11-16 14:26:40+01:00

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2022
1038 pages
ISBN:9781450392655
DOI:10.1145/3519939
  • General Chair:
  • Ranjit Jhala,
  • Program Chair:
  • Işil Dillig
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2022

Permissions

Request permissions for this article.

Check for updates

Badges

Author Tags

  1. GLADE
  2. context-free grammar
  3. inference

Qualifiers

  • Research-article

Conference

PLDI '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)90
  • Downloads (Last 6 weeks)8
Reflects downloads up to 26 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media