research-article
Open access

Narcissus: correct-by-construction derivation of decoders and encoders from binary formats

Published: 26 July 2019

Abstract

It is a neat result from functional programming that libraries of parser combinators can support rapid construction of decoders for quite a range of formats. With a little more work, the same combinator program can denote both a decoder and an encoder. Unfortunately, the real world is full of gnarly formats, as with the packet formats that make up the standard Internet protocol stack. Most past parser-combinator approaches cannot handle these formats, and the few exceptions require redundancy – one part of the natural grammar needs to be hand-translated into hints in multiple parts of a parser program. We show how to recover very natural and nonredundant format specifications, covering all popular network packet formats and generating both decoders and encoders automatically. The catch is that we use the Coq proof assistant to derive both kinds of artifacts using tactics, automatically, in a way that guarantees that they form inverses of each other. We used our approach to reimplement packet processing for a full Internet protocol stack, inserting our replacement into the OCaml-based MirageOS unikernel, resulting in minimal performance degradation.
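The idea that one combinator program can denote both a decoder and an encoder can be illustrated with a small sketch. The Python below is not Narcissus and does not use its API: the names `Format`, `fixed`, `seq`, `length_prefixed`, and the toy `packet` layout are all hypothetical, chosen only for illustration. It also checks the inverse property on a single example at run time, whereas the paper derives both functions in Coq together with a proof that they are inverses.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Format:
    """A paired encoder and decoder (hypothetical type, not Narcissus's API)."""
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Tuple[Any, bytes]]  # (value, remaining bytes)


def fixed(fmt: str) -> Format:
    """Fixed-width integer field described by a `struct` format string."""
    size = struct.calcsize(fmt)

    def enc(value: int) -> bytes:
        return struct.pack(fmt, value)

    def dec(data: bytes) -> Tuple[int, bytes]:
        if len(data) < size:
            raise ValueError("truncated input")
        return struct.unpack(fmt, data[:size])[0], data[size:]

    return Format(enc, dec)


u8 = fixed("!B")    # one unsigned byte
u16 = fixed("!H")   # two bytes, network (big-endian) order


def seq(*fields: Format) -> Format:
    """Concatenate fields; the decoder consumes them in the same order."""
    def enc(values: tuple) -> bytes:
        if len(values) != len(fields):
            raise ValueError("wrong number of fields")
        return b"".join(f.encode(v) for f, v in zip(fields, values))

    def dec(data: bytes) -> Tuple[tuple, bytes]:
        out = []
        for f in fields:
            value, data = f.decode(data)
            out.append(value)
        return tuple(out), data

    return Format(enc, dec)


def length_prefixed(length: Format) -> Format:
    """Byte string preceded by its length; the length is stated only once."""
    def enc(payload: bytes) -> bytes:
        return length.encode(len(payload)) + payload

    def dec(data: bytes) -> Tuple[bytes, bytes]:
        n, rest = length.decode(data)
        if len(rest) < n:
            raise ValueError("truncated payload")
        return rest[:n], rest[n:]

    return Format(enc, dec)


# A toy packet layout (assumed for illustration): version, port, payload.
packet = seq(u8, u16, length_prefixed(u16))

msg = (4, 8080, b"hello")
wire = packet.encode(msg)
decoded, leftover = packet.decode(wire)
```

Because each combinator builds its encoder and decoder from the same description, the length field never has to be restated as a separate hint to the parser. Decoding `wire` gives back `msg` with no bytes left over, but a test like this only samples the round-trip property; it does not establish it for all inputs.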

Supplementary Material

WEBM File (a82-pit-claudel.webm)

    Published In

    Proceedings of the ACM on Programming Languages  Volume 3, Issue ICFP
    August 2019
    1054 pages
    EISSN:2475-1421
    DOI:10.1145/3352468
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Deductive Synthesis
    2. Parser Combinators
    3. Program Synthesis
    4. Serialization and Deserialization

    Qualifiers

    • Research-article
