research-article

Automatic index selection for large-scale datalog computation

Published: 01 October 2018

Abstract

Datalog has been applied to several use cases that require very high performance on large rulesets and factsets. It is common to create indexes for relations to improve search performance. However, existing indexing schemes either require manual index selection or deliver insufficient performance on very large tasks. In this paper, we propose an automatic scheme to select indexes. We automatically create the minimum number of indexes needed to speed up all the searches in a given Datalog program. We have integrated our indexing scheme into the open-source Datalog engine SOUFFLÉ. We obtain performance on a par with what users have accepted from hand-optimized Datalog programs running on state-of-the-art Datalog engines, without requiring the effort of manual index selection. Extensive experiments on large real Datalog programs demonstrate that our indexing scheme results in considerable speedups (up to 2x) and significantly less memory usage (up to 6x) compared with other automated index selection schemes.
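The abstract does not spell out how the minimum set of indexes is computed. As a rough illustration only, the Python sketch below assumes that each search is the set of attributes it binds, and that a single lexicographically ordered index can serve a chain of nested searches (each one a subset of the next). Under those assumptions, finding the fewest indexes becomes a minimum chain cover problem, which the sketch solves with a simple augmenting-path bipartite matching. The names `min_chain_cover` and `chain_to_index` are hypothetical and are not taken from SOUFFLÉ.

```python
from typing import FrozenSet, List, Optional

Search = FrozenSet[str]  # assumption: a search is the set of attributes it binds


def min_chain_cover(searches: List[Search]) -> List[List[Search]]:
    """Partition the searches into the fewest chains under strict set inclusion."""
    nodes = sorted(set(searches), key=lambda s: (len(s), sorted(s)))
    n = len(nodes)
    # Edge u -> v whenever nodes[u] is a strict subset of nodes[v].
    succ = [[v for v in range(n) if nodes[u] < nodes[v]] for u in range(n)]
    # match_right[v] == u means v directly follows u in some chain.
    match_right: List[Optional[int]] = [None] * n

    def try_augment(u: int, seen: set) -> bool:
        # Standard augmenting-path search for bipartite matching.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] is None or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    next_of = {u: v for v, u in enumerate(match_right) if u is not None}
    chains = []
    for start in range(n):
        if match_right[start] is not None:
            continue  # not the head of a chain
        chain, cur = [nodes[start]], start
        while cur in next_of:
            cur = next_of[cur]
            chain.append(nodes[cur])
        chains.append(chain)
    return chains


def chain_to_index(chain: List[Search]) -> List[str]:
    """Build an attribute order in which every search of the chain is a prefix."""
    order: List[str] = []
    for s in chain:
        order.extend(sorted(s - set(order)))
    return order


if __name__ == "__main__":
    demo = [frozenset("x"), frozenset("xy"), frozenset("xz"), frozenset("xyz")]
    for chain in min_chain_cover(demo):
        print(chain_to_index(chain))
```

For the four searches in the demo, the sketch needs two indexes rather than one per search: the searches on {x}, {x, y} and {x, y, z} share one attribute order, and {x, z} gets its own.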

Published In

Proceedings of the VLDB Endowment  Volume 12, Issue 2
October 2018
98 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article
