research-article

MLog: towards declarative in-database machine learning

Published: 01 August 2017

Abstract

We demonstrate MLog, a high-level language that integrates machine learning into data management systems. Unlike existing machine learning frameworks (e.g., TensorFlow, Theano, and Caffe), MLog is declarative, in the sense that the system manages all data movement, data persistence, and machine-learning-related optimizations (such as data batching) automatically. Our interactive demonstration will show the audience how this is achieved based on the novel notion of tensoral views (TViews), which are similar to relational views but operate over tensors with linear algebra. With MLog, users can succinctly specify not only simple models such as SVM (in just two lines), but also sophisticated deep learning models that are not supported by existing in-database analytics systems (e.g., MADlib, PAL, and SciDB), as a series of cascaded TViews. Given the declarative nature of MLog, we further demonstrate how query/program optimization techniques can be leveraged to translate MLog programs into native TensorFlow programs. The performance of the automatically generated TensorFlow programs is comparable to that of hand-optimized ones.
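To make concrete what "the system manages data batching automatically" saves the user, the following is a minimal sketch of the bookkeeping an imperative program must do by hand to train a linear SVM. It is plain Python, not MLog syntax and not the authors' implementation; the function names (`train_linear_svm`, `hinge_loss`, `predict`), the toy data, and the hyperparameters are all assumptions made for illustration.

```python
import random

def hinge_loss(w, b, xs, ys):
    """Mean hinge loss max(0, 1 - y * (w.x + b)) over a dataset."""
    total = 0.0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        total += max(0.0, 1.0 - margin)
    return total / len(xs)

def train_linear_svm(xs, ys, epochs=200, batch_size=2, lr=0.1, seed=0):
    """Mini-batch subgradient descent on the (unregularized) hinge loss.

    Hypothetical helper: shuffling, batching, and parameter updates are
    written out explicitly, which is the kind of work a declarative
    system would take over.
    """
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        # Manual batching: slice the shuffled index list.
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * len(w)
            grad_b = 0.0
            for i in batch:
                margin = ys[i] * (sum(wi * xi for wi, xi in zip(w, xs[i])) + b)
                if margin < 1.0:
                    # Subgradient of the hinge loss for a margin violation.
                    for j, xj in enumerate(xs[i]):
                        grad_w[j] -= ys[i] * xj
                    grad_b -= ys[i]
            # Average the batch subgradient and take one step.
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b

def predict(w, b, x):
    """Classify x as +1 or -1 by the sign of the decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data (made up for this sketch).
xs = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0], [-2.0, -1.5], [-3.0, -2.0], [-1.5, -2.5]]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
predictions = [predict(w, b, x) for x in xs]
```

Only the hinge-loss objective in this sketch is model-specific; the shuffling, slicing, and update loops are generic plumbing. That split is presumably what lets a declarative specification of the same model stay as short as the abstract claims.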

Published In

Proceedings of the VLDB Endowment, Volume 10, Issue 12
August 2017
427 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
