Convergence of Datalog over (Pre-) Semirings

Published: 10 April 2024

Abstract

Recursive queries have traditionally been studied in the framework of Datalog, a language that restricts recursion to monotone queries over sets and whose evaluation is guaranteed to converge in time polynomial in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this article, we study the convergence of Datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a Datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if it is reached at all. We identify algebraic properties of the semiring that correspond to certain convergence properties of Datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any Datalog program.
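
As an illustration of what "number of steps required to reach that fixpoint, if ever" can mean, the following is a minimal sketch, not code from the article. It assumes a single-source shortest-path rule, T(x) :- [x = source] ⊕ ⊕_z T(z) ⊗ E(z, x), interpreted over the tropical semiring (min as ⊕, + as ⊗, infinity as the additive identity, 0 as the multiplicative identity). The rule, the function name `naive_fixpoint`, the `max_steps` cutoff, and the example graphs are all hypothetical choices made for this sketch.

```python
import math


def naive_fixpoint(edges, source, max_steps=1000):
    """Naive (Kleene-style) iteration of one Datalog-like rule over the
    tropical semiring. `edges` maps (z, x) pairs to edge weights.

    Returns (T, steps) if the iteration stabilizes, where `steps` is the
    number of rule applications after which T stopped changing, or
    (None, max_steps) if it has not stabilized within `max_steps`.
    """
    nodes = {source} | {u for u, _ in edges} | {v for _, v in edges}
    # Start from the bottom element: every fact has value "infinity".
    T = {x: math.inf for x in nodes}
    for step in range(1, max_steps + 1):
        new_T = {}
        for x in nodes:
            # [x = source] contributes the multiplicative identity 0.
            val = 0 if x == source else math.inf
            # "Sum" (min) over z of T(z) "times" (+) E(z, x).
            for (z, y), w in edges.items():
                if y == x:
                    val = min(val, T[z] + w)
            new_T[x] = val
        if new_T == T:
            return T, step - 1
        T = new_T
    return None, max_steps


# Nonnegative weights: the iteration stabilizes after a few steps.
dist, steps = naive_fixpoint({("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}, "a")
print(dist["c"], steps)

# A negative cycle: values keep decreasing, so no fixpoint is reached
# within the cutoff (this sketch cannot prove divergence, only observe it).
print(naive_fixpoint({("a", "b"): 1, ("b", "a"): -2}, "a", max_steps=50))
```

The sketch uses plain naive iteration, which recomputes every value at each step. It does not implement the semi-naïve algorithm mentioned in the abstract, which would propagate only the changes between consecutive iterations.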

Published In

Journal of the ACM, Volume 71, Issue 2
April 2024
627 pages
EISSN: 1557-735X
DOI: 10.1145/3613546

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 April 2024
Online AM: 30 January 2024
Accepted: 09 January 2024
Revised: 22 December 2023
Received: 01 February 2023
Published in JACM Volume 71, Issue 2

Author Tags

  1. Datalog
  2. semirings
  3. fixpoint

Qualifiers

  • Research-article

Funding Sources

  • NSF IIS
  • NSF IIS
  • Austrian Science Fund (FWF)
  • Vienna Science and Technology Fund (WWTF)
