Fusing effectful comprehensions

Published: 14 June 2017

Abstract

List comprehensions provide a powerful abstraction mechanism for expressing computations over ordered collections of data declaratively without having to use explicit iteration constructs. This paper puts forth effectful comprehensions as an elegant way to describe list comprehensions that incorporate loop-carried state. This is motivated by operations such as compression/decompression and serialization/deserialization that are common in log/data processing pipelines and require loop-carried state when processing an input stream of data.
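As a rough illustration of loop-carried state, the sketch below writes two stream operations as Python generators: a deserializer that turns comma-terminated decimal digits into integers, and a delta decoder that turns differences back into absolute values. The function names and the input format are assumptions made for this example and do not come from the paper, which targets its own comprehension language rather than Python.

```python
from typing import Iterable, Iterator


def parse_ints(chars: Iterable[str]) -> Iterator[int]:
    """Deserialize comma-terminated decimal numbers from a character stream."""
    acc = 0  # loop-carried state: value of the number read so far
    for c in chars:
        if c == ",":
            yield acc  # a number is complete; emit it and reset the state
            acc = 0
        else:
            acc = acc * 10 + int(c)


def delta_decode(deltas: Iterable[int]) -> Iterator[int]:
    """Decompress a delta-encoded stream into absolute values."""
    prev = 0  # loop-carried state: last decoded value
    for d in deltas:
        prev += d
        yield prev


# A two-stage pipeline: each stage keeps its own state between elements.
result = list(delta_decode(parse_ints("10,5,7,")))
```

Neither stage can be written as a plain element-wise comprehension, because the output for one element depends on state accumulated from earlier elements.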
We build on the underlying theory of symbolic transducers to fuse pipelines of effectful comprehensions into a single representation, from which efficient code can be generated. Using background theory reasoning with an SMT solver, our fusion and subsequent reachability-based branch elimination algorithms can significantly reduce the complexity of the fused pipelines. Our implementation shows significant speedups over reasonable hand-written code (3.4×, on average) and a traditionally fused version of the pipeline (2.6×, on average) for a variety of examples, including scenarios for extracting fields with regular expressions, processing XML with XPath, and running queries over encoded data.
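The general idea of fusing stateful stages can be sketched in Python as follows. Each stage is modelled as an initial state plus a step function, and `fuse` builds one stage whose state is the pair of both states, so no intermediate stream is materialized. This is a minimal sketch of product-state composition only: the `Stage` encoding, `fuse`, `run`, and the two example stages are hypothetical names for this illustration, and the code does not implement the paper's symbolic transducer fusion or its SMT-based branch elimination.

```python
from typing import Any, Callable, Iterable, List, Tuple

# A stage is (initial state, step), where step(state, x) -> (new state, outputs).
Step = Callable[[Any, Any], Tuple[Any, List[Any]]]
Stage = Tuple[Any, Step]


def parse_step(acc: int, c: str) -> Tuple[int, List[int]]:
    """Deserialize comma-terminated decimal numbers, one character at a time."""
    if c == ",":
        return 0, [acc]
    return acc * 10 + int(c), []


def delta_step(prev: int, d: int) -> Tuple[int, List[int]]:
    """Delta-decode: add each difference to the previously decoded value."""
    total = prev + d
    return total, [total]


def fuse(first: Stage, second: Stage) -> Stage:
    """Compose two stages into one stage over the product of their states."""
    init_a, step_a = first
    init_b, step_b = second

    def step(state: Tuple[Any, Any], x: Any) -> Tuple[Tuple[Any, Any], List[Any]]:
        sa, sb = state
        sa, mids = step_a(sa, x)
        outs: List[Any] = []
        for m in mids:  # feed each intermediate output straight into stage two
            sb, produced = step_b(sb, m)
            outs.extend(produced)
        return (sa, sb), outs

    return (init_a, init_b), step


def run(stage: Stage, xs: Iterable[Any]) -> List[Any]:
    """Drive a stage over an input stream and collect its outputs."""
    state, step = stage
    out: List[Any] = []
    for x in xs:
        state, produced = step(state, x)
        out.extend(produced)
    return out


parse_stage: Stage = (0, parse_step)
delta_stage: Stage = (0, delta_step)

fused_result = run(fuse(parse_stage, delta_stage), "10,5,7,")
unfused_result = run(delta_stage, run(parse_stage, "10,5,7,"))
```

Both pipelines produce the same output, but the fused one makes a single pass over the input. The speedups reported in the abstract come from the authors' own fusion and branch elimination, which go beyond this kind of direct composition.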

Supplementary Material

Auxiliary Archive (pldi17-main162-s.zip)
This artifact includes the benchmarks described in the paper: Olli Saarikivi, Margus Veanes, Todd Mytkowicz and Madan Musuvathi. Fusing Effectful Comprehensions. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'17). ACM, 2017. The artifact is a modified version of the Automata library available at https://github.com/AutomataDotNet/Automata. See ReadMe.txt in the archive for usage instructions.

Published In

ACM SIGPLAN Notices  Volume 52, Issue 6
PLDI '17
June 2017
708 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3140587
    PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2017
    708 pages
    ISBN:9781450349888
    DOI:10.1145/3062341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2017
Published in SIGPLAN Volume 52, Issue 6

Author Tags

  1. comprehension
  2. deforestation
  3. fusion
  4. reachability analysis
  5. symbolic automaton
  6. symbolic transducer

Qualifiers

  • Article
