Automated Ambiguity Detection in Layout-Sensitive Grammars

Authors:

Fei He

Proceedings of the ACM on Programming Languages, Volume 7, Issue OOPSLA2

Article No.: 262, Pages 1150 - 1175

https://doi.org/10.1145/3622838

Published: 16 October 2023

Abstract

Layout-sensitive grammars have been adopted in many modern programming languages. In a serious language design phase, the specified syntax (typically a grammar) must be unambiguous. Although checking ambiguity is undecidable for context-free grammars, and therefore trivially also for layout-sensitive grammars, ambiguity detection is possible and can benefit language designers by exposing potential design flaws. In this paper, we tackle the ambiguity detection problem in layout-sensitive grammars. Inspired by previous work on checking the bounded ambiguity of context-free grammars via SAT solving, we substantially extend that approach to support layout-sensitive grammars, using SMT solving instead to express the ordering and quantitative relations over line and column numbers. Our key novelty lies in a reachability condition, which carefully accounts for the impact of layout constraints on ambiguity. With this condition in hand, we propose an equivalent ambiguity notion, called local ambiguity, for the convenience of SMT encoding. We translate local ambiguity into an SMT formula and develop a bounded ambiguity checker that automatically finds a shortest nonempty ambiguous sentence (if one exists) for a user-input grammar. The soundness and completeness of our SMT encoding are mechanized in the Coq proof assistant. We conducted an evaluation on both grammar fragments and full grammars extracted from the language manuals of domain-specific languages such as YAML and general-purpose languages such as Python; the results demonstrate the effectiveness of our approach.
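To make the notion of bounded ambiguity checking concrete, the following is a minimal sketch for plain context-free grammars only. It is not the SMT encoding described in the abstract and it ignores layout constraints entirely: it enumerates sentences up to a length bound by brute force and counts parse trees with a CYK-style table. The grammar representation (a dictionary of productions in Chomsky normal form) and the function names `count_trees` and `shortest_ambiguous` are assumptions made for this illustration.

```python
from itertools import product


def count_trees(grammar, start, word):
    """Count parse trees of `word` from `start` for a grammar in Chomsky normal form.

    `grammar` maps each nonterminal to a list of right-hand sides; a right-hand
    side is either a one-character terminal (str) or a pair of nonterminals (tuple).
    """
    n = len(word)
    if n == 0:
        return 0
    # table[(i, j)][A] = number of parse trees deriving word[i:j] from A
    table = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            for lhs, rhss in grammar.items():
                total = 0
                for rhs in rhss:
                    if isinstance(rhs, str):
                        # Terminal rule: matches only a span of length one.
                        if length == 1 and word[i] == rhs:
                            total += 1
                    else:
                        # Binary rule: try every split point inside the span.
                        left, right = rhs
                        for k in range(i + 1, j):
                            total += (table[(i, k)].get(left, 0)
                                      * table[(k, j)].get(right, 0))
                if total:
                    cell[lhs] = total
            table[(i, j)] = cell
    return table[(0, n)].get(start, 0)


def shortest_ambiguous(grammar, start, alphabet, bound):
    """Return a shortest nonempty sentence with at least two parse trees, or None."""
    for length in range(1, bound + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if count_trees(grammar, start, word) >= 2:
                return word
    return None


# Hypothetical example grammar: E -> E E | 'a'
ambiguous_grammar = {"E": [("E", "E"), "a"]}
witness = shortest_ambiguous(ambiguous_grammar, "E", "a", bound=5)
```

For this example grammar, `witness` is `"aaa"`, which can be grouped as `(aa)a` or `a(aa)`. A result of `None` only means that no ambiguous sentence exists within the bound, not that the grammar is unambiguous. Enumeration grows exponentially with the bound, which is one reason to hand the search to a solver instead.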

Index Terms

Automated Ambiguity Detection in Layout-Sensitive Grammars
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Parsers
    2. Formal language definitions
      1. Syntax
2. Theory of computation
  1. Formal languages and automata theory
    1. Grammars and context-free languages
  2. Logic
    1. Constraint and logic programming

Published In

Proceedings of the ACM on Programming Languages Volume 7, Issue OOPSLA2

October 2023

2250 pages

EISSN:2475-1421

DOI:10.1145/3554312

Editor:
Michael Hicks
Amazon, USA

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States
