Datalog in Wonderland

Published: 29 July 2022

Abstract

Modern data analytics applications, such as knowledge graph reasoning and machine learning, typically involve recursion through aggregation. Such computations pose great challenges to both system builders and theoreticians: first, to derive simple yet powerful abstractions for these computations; second, to define and study the semantics for the abstractions; third, to devise optimization techniques for these computations.
In recent work we presented a generalization of Datalog called Datalog°, which addresses these challenges. Datalog° is a simple abstraction that allows aggregates to be interleaved with recursion while retaining much of the simplicity and elegance of Datalog. We define its formal semantics based on an algebraic structure called Partially Ordered Pre-Semirings (POPS), and illustrate through several examples how Datalog° can be used for a variety of applications. Finally, we describe a new optimization rule for Datalog°, called the FGH-rule, and illustrate it on several examples, including a simple magic-set rewriting, generalized semi-naïve evaluation, and a bill-of-material example. We also briefly discuss the implementation of the FGH-rule and present some experimental validation of its effectiveness.
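To make "recursion through aggregation" concrete, here is a minimal Python sketch of single-source shortest paths written as a recursive rule whose aggregate is `min` and whose "join" is `+`, that is, a rule evaluated over the (min, +) semiring. It is an illustration only, not the system or the FGH-rule described in the article. The function name `sssp_naive`, the toy graph, and the plain naive iteration are all assumptions made for this example, and the loop is only guaranteed to stop when edge weights are non-negative.

```python
import math


def sssp_naive(edges, source, nodes):
    """Naive fixpoint iteration of the recursive rule

        D(x) = min( 0 if x == source else inf,
                    min over z of ( D(z) + E(z, x) ) )

    where min plays the role of the aggregate and + the role of the join.
    `edges` maps (z, x) pairs to non-negative weights (hypothetical input format).
    """
    # Start from the least informative value: every distance is infinite.
    dist = {v: math.inf for v in nodes}
    while True:
        new = {}
        for x in nodes:
            # Base case of the rule.
            best = 0.0 if x == source else math.inf
            # Recursive case: aggregate with min over all incoming edges.
            for (z, y), w in edges.items():
                if y == x:
                    best = min(best, dist[z] + w)
            new[x] = best
        # Stop once applying the rule no longer changes anything.
        if new == dist:
            return dist
        dist = new


# Toy graph: a -> b (1), b -> c (2), a -> c (5).
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
result = sssp_naive(edges, "a", ["a", "b", "c"])
print(result)
```

Each pass recomputes every distance from scratch. Avoiding that repeated work is the kind of redundancy that semi-naïve evaluation, mentioned in the abstract, is meant to address.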

Published In

ACM SIGMOD Record, Volume 51, Issue 2
June 2022
72 pages
ISSN: 0163-5808
DOI: 10.1145/3552490
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
