research-article
Open access

Automated Ambiguity Detection in Layout-Sensitive Grammars

Published: 16 October 2023

Abstract

Layout-sensitive grammars have been adopted in many modern programming languages. In a serious language design phase, the specified syntax, typically a grammar, must be unambiguous. Although checking ambiguity is undecidable for context-free grammars and (trivially also) for layout-sensitive grammars, ambiguity detection is possible and can benefit language designers by exposing potential design flaws. In this paper, we tackle the ambiguity detection problem in layout-sensitive grammars. Inspired by previous work on checking the bounded ambiguity of context-free grammars via SAT solving, we substantially extend that approach to support layout-sensitive grammars, using SMT solving instead to express the ordering and quantitative relations over line/column numbers. Our key novelty lies in a reachability condition, which carefully accounts for the impact of layout constraints on ambiguity. With this condition in hand, we propose an equivalent ambiguity notion called local ambiguity for the convenience of SMT encoding. We translate local ambiguity into an SMT formula and develop a bounded ambiguity checker that automatically finds a shortest nonempty ambiguous sentence (if one exists) for a user-input grammar. The soundness and completeness of our SMT encoding are mechanized in the Coq proof assistant. We conducted an evaluation on both grammar fragments and full grammars extracted from the language manuals of domain-specific languages like YAML as well as general-purpose languages like Python, which demonstrates the effectiveness of our approach.
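
To make the notion of a bounded ambiguity check concrete, the following is a minimal sketch for a plain context-free grammar. It is not the SMT encoding described in the abstract and it ignores layout constraints entirely: it simply enumerates sentences up to a length bound and counts parse trees with a CYK-style table. The function names `count_parses` and `shortest_ambiguous`, and the grammar representation (Chomsky normal form, split into `binary` and `terminal` rule maps), are assumptions made for this illustration only.

```python
from collections import defaultdict
from itertools import product


def count_parses(word, start, binary, terminal):
    """Count parse trees of `word` from `start` (grammar in Chomsky normal form).

    binary:   nonterminal -> list of (B, C) pairs, for rules A -> B C
    terminal: nonterminal -> list of single characters, for rules A -> 'c'
    """
    n = len(word)
    if n == 0:
        return 0  # CNF rules as modelled here cannot derive the empty word
    # table[i][l][A] = number of parse trees deriving word[i:i+l] from A
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for a, chars in terminal.items():
            table[i][1][a] = chars.count(ch)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for a, rules in binary.items():
                total = 0
                for b, c in rules:
                    for split in range(1, length):
                        left = table[i][split][b]
                        right = table[i + split][length - split][c]
                        total += left * right
                table[i][length][a] = total
    return table[0][n][start]


def shortest_ambiguous(start, binary, terminal, bound):
    """Return a shortest sentence of length <= bound with >= 2 parse trees, or None."""
    alphabet = sorted({ch for chars in terminal.values() for ch in chars})
    for length in range(1, bound + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if count_parses(word, start, binary, terminal) >= 2:
                return word
    return None


# Hypothetical example grammar: S -> S S | 'a'
binary_rules = {"S": [("S", "S")]}
terminal_rules = {"S": ["a"]}
print(shortest_ambiguous("S", binary_rules, terminal_rules, bound=5))
```

Brute-force enumeration grows exponentially with the bound, which is one reason solver-based encodings are attractive; handling layout would additionally require tracking line and column positions of tokens, which this sketch does not attempt.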

Published In

Proceedings of the ACM on Programming Languages  Volume 7, Issue OOPSLA2
October 2023
2250 pages
EISSN:2475-1421
DOI:10.1145/3554312
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Coq
  2. SMT
  3. ambiguity
  4. layout-sensitive grammar
