research-article

MLog: towards declarative in-database machine learning

Authors: Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, Ce Zhang

Proceedings of the VLDB Endowment, Volume 10, Issue 12

Pages 1933 - 1936

https://doi.org/10.14778/3137765.3137812

Published: 01 August 2017

Abstract

We demonstrate MLog, a high-level language that integrates machine learning into data management systems. Unlike existing machine learning frameworks (e.g., TensorFlow, Theano, and Caffe), MLog is declarative, in the sense that the system automatically manages all data movement, data persistence, and machine-learning-related optimizations (such as data batching). Our interactive demonstration will show the audience how this is achieved based on the novel notion of tensoral views (TViews), which are similar to relational views but operate over tensors with linear algebra. With MLog, users can succinctly specify, as a series of cascaded TViews, not only simple models such as SVM (in just two lines) but also sophisticated deep learning models that are not supported by existing in-database analytics systems (e.g., MADlib, PAL, and SciDB). Given the declarative nature of MLog, we further demonstrate how query/program optimization techniques can be leveraged to translate MLog programs into native TensorFlow programs. The performance of the automatically generated TensorFlow programs is comparable to that of hand-optimized ones.
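As a rough illustration of the kind of bookkeeping the abstract says MLog automates, the following pure-Python sketch trains a linear SVM by minibatch subgradient descent on the hinge loss. It is not MLog syntax and is not taken from the paper; the function names (`hinge_loss`, `train_svm`, `predict`), the toy data, and the hyperparameters are all assumptions chosen for illustration. The explicit shuffling and batching loop is the part that a declarative system would be expected to manage on the user's behalf.

```python
import random

def hinge_loss(w, b, xs, ys, lam=0.01):
    """Mean hinge loss plus an L2 penalty for a linear SVM (labels are +1/-1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += max(0.0, 1.0 - margin)
    return total / len(xs) + 0.5 * lam * sum(wi * wi for wi in w)

def train_svm(xs, ys, epochs=200, batch_size=2, lr=0.1, lam=0.01, seed=0):
    """Minibatch subgradient descent; shuffling and batching are done by hand."""
    rng = random.Random(seed)
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Subgradient of the regularizer, then of the hinge terms.
            gw, gb = [lam * wi for wi in w], 0.0
            for i in batch:
                score = sum(wi * xi for wi, xi in zip(w, xs[i])) + b
                if ys[i] * score < 1.0:  # this point violates the margin
                    for d in range(dim):
                        gw[d] -= ys[i] * xs[i][d] / len(batch)
                    gb -= ys[i] / len(batch)
            w = [wi - lr * gi for wi, gi in zip(w, gw)]
            b -= lr * gb
    return w, b

def predict(w, b, x):
    """Sign of the decision function, returned as +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data (made up for this sketch).
XS = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0], [-2.0, -1.0], [-3.0, -2.5], [-1.5, -2.0]]
YS = [1, 1, 1, -1, -1, -1]
w, b = train_svm(XS, YS)
```

Only `hinge_loss` describes the model itself; everything in `train_svm` is data handling and optimization plumbing, which is the portion the abstract attributes to the system rather than the user.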

Index Terms

Information systems > Data management systems

Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 12

August 2017, 427 pages

ISSN: 2150-8097

Editors: Peter Boncz (CWI), Ken Salem (University of Waterloo)

Publisher: VLDB Endowment
