DOI: 10.1145/2939502.2939503

Visual exploration of machine learning results using data cube analysis

Published: 26 June 2016

Abstract

As complex machine learning systems become more widely adopted, it becomes increasingly challenging for users to understand the models or interpret the results they generate. We present our ongoing work on interactive, visual approaches for exploring and understanding machine learning results using data cube analysis. We propose MLCube, a data-cube-inspired framework that lets users define instance subsets using feature conditions and computes aggregate statistics and evaluation metrics over those subsets. We also design MLCube Explorer, an interactive visualization tool for comparing models' performance over the subsets. Users can interactively specify operations, such as drilling down to specific instance subsets, to explore in more depth. Through a usage scenario, we demonstrate how MLCube Explorer works with a public advertisement click log data set to help a user build new advertisement click prediction models that improve on an existing model.
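The idea of defining instance subsets by feature conditions and computing per-subset statistics can be illustrated with a minimal sketch. This is not the MLCube implementation: the row layout, the feature names (`device`, `position`), the model names (`baseline`, `new`), the helper functions, and the choice of log loss as the evaluation metric are all assumptions made for illustration.

```python
from collections import defaultdict
from math import log

# Hypothetical click-log rows: feature values, a binary click label,
# and each model's predicted click probability (all values are made up).
instances = [
    {"device": "mobile", "position": 1, "label": 1, "preds": {"baseline": 0.8, "new": 0.9}},
    {"device": "mobile", "position": 2, "label": 0, "preds": {"baseline": 0.6, "new": 0.2}},
    {"device": "desktop", "position": 1, "label": 1, "preds": {"baseline": 0.7, "new": 0.6}},
    {"device": "desktop", "position": 2, "label": 0, "preds": {"baseline": 0.1, "new": 0.3}},
]


def select(rows, conditions):
    """Return the rows whose features match every (feature, value) condition."""
    return [r for r in rows if all(r[f] == v for f, v in conditions.items())]


def log_loss(rows, model):
    """Mean binary log loss of one model over a non-empty list of rows."""
    eps = 1e-15  # clip probabilities so log() never sees 0 or 1
    total = 0.0
    for r in rows:
        p = min(max(r["preds"][model], eps), 1 - eps)
        total -= log(p) if r["label"] == 1 else log(1 - p)
    return total / len(rows)


def summarize(rows, models):
    """Aggregate statistics and a per-model metric for one instance subset."""
    if not rows:
        return {"count": 0, "click_rate": None, "log_loss": {}}
    return {
        "count": len(rows),
        "click_rate": sum(r["label"] for r in rows) / len(rows),
        "log_loss": {m: log_loss(rows, m) for m in models},
    }


def drill_down(rows, conditions, feature, models):
    """Split the subset matching `conditions` by one more feature and summarize each part."""
    groups = defaultdict(list)
    for r in select(rows, conditions):
        groups[r[feature]].append(r)
    return {value: summarize(group, models) for value, group in groups.items()}


models = ["baseline", "new"]
mobile = summarize(select(instances, {"device": "mobile"}), models)
by_position = drill_down(instances, {"device": "mobile"}, "position", models)
```

Here `mobile` summarizes the subset defined by the condition `device == "mobile"`, and `by_position` drills down into that subset by a second feature, so the two hypothetical models can be compared slice by slice rather than only on the whole data set.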

Published In

HILDA '16: Proceedings of the Workshop on Human-In-the-Loop Data Analytics
June 2016
93 pages
ISBN: 9781450342070
DOI: 10.1145/2939502

Sponsors

  • Paxata
  • Tableau Software
  • Trifacta
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data cube
  2. data visualization
  3. interactive data analysis
  4. machine learning

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California

Acceptance Rates

HILDA '16 paper acceptance rate: 16 of 32 submissions (50%)
Overall acceptance rate: 28 of 56 submissions (50%)
