DOI: 10.1109/PACT.2019.00015
research-article

Fast Parallel Equivalence Relations in a Datalog Compiler

Published: 26 November 2024

Abstract

Modern parallelizing Datalog compilers are employed in industrial applications such as networking and static program analysis. These applications regularly reason about equivalences, e.g., computing Bitcoin user groups, fast points-to analyses, and optimal network routes. State-of-the-art Datalog engines represent equivalence relations verbatim by enumerating all possible pairs in an equivalence class. This approach inhibits scalability for large datasets.
In this paper, we introduce EQREL, a specialized parallel union-find data structure for scalable equivalence relations, and its integration into a Datalog compiler. Our data structure provides a quadratic worst-case speed-up and space improvement. We demonstrate the efficacy of our data structure in Soufflé, a Datalog compiler that synthesizes parallel C++ code. We use real-world benchmarks and show that the new data structure scales on shared-memory multi-core architectures, storing up to half a billion pairs for a static program analysis scenario.
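
The quadratic gap mentioned in the abstract can be illustrated with a small sketch. This is not EQREL itself: it is a plain, sequential union-find with path compression and union by rank, with none of the parallelism the paper describes, and every name in it (`UnionFind`, `explicit_pair_count`) is hypothetical. It only contrasts the number of pairs an explicit representation would hold with the number of entries a union-find keeps.

```python
from collections import Counter


class UnionFind:
    """Sequential disjoint-set forest (illustrative only, not EQREL)."""

    def __init__(self):
        self.parent = {}  # element -> parent element
        self.rank = {}    # element -> upper bound on tree height

    def find(self, x):
        # Register unseen elements as singleton classes.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0
            return x
        # Walk up to the representative of x's class.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

    def same(self, a, b):
        return self.find(a) == self.find(b)


def explicit_pair_count(uf):
    """Pairs an explicit relation would store: n * n per class of size n
    (reflexive and symmetric pairs included)."""
    sizes = Counter(uf.find(x) for x in list(uf.parent))
    return sum(n * n for n in sizes.values())


if __name__ == "__main__":
    uf = UnionFind()
    for i in range(999):          # chain 0 ~ 1 ~ ... ~ 999: one class
        uf.union(i, i + 1)
    print(len(uf.parent))           # entries kept by the union-find
    print(explicit_pair_count(uf))  # pairs in the explicit representation
```

For a single class of n elements the explicit relation holds n² pairs, while the union-find keeps one parent entry per element, which is where the quadratic worst-case space difference comes from.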


Published In

PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
September 2019
521 pages
ISBN:9781728136134

Publisher

IEEE Press

Author Tags

  1. Datalog Compiler
  2. Equivalence Relation
  3. Parallel Data Structures
  4. Semi-naïve Evaluation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PACT '19

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
