research-article

Public Access

Visual exploration of machine learning results using data cube analysis

Authors:

Duen Horng (Polo) Chau

HILDA '16: Proceedings of the Workshop on Human-In-the-Loop Data Analytics

Article No.: 1, Pages 1 - 6

https://doi.org/10.1145/2939502.2939503

Published: 26 June 2016

Abstract

As complex machine learning systems become more widely adopted, it becomes increasingly challenging for users to understand models or interpret the results they generate. We present our ongoing work on developing interactive and visual approaches for exploring and understanding machine learning results using data cube analysis. We propose MLCube, a data-cube-inspired framework that enables users to define instance subsets using feature conditions and computes aggregate statistics and evaluation metrics over those subsets. We also design MLCube Explorer, an interactive visualization tool for comparing models' performance over the subsets. Users can interactively specify operations, such as drilling down to specific instance subsets, to perform more in-depth exploration. Through a usage scenario, we demonstrate how MLCube Explorer works with a public advertisement click log data set to help a user build new advertisement click prediction models that improve on an existing model.
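The abstract describes MLCube only at a high level. The following is a minimal sketch of the core idea, not the authors' implementation: select an instance subset by feature conditions, split it on one more feature (a drill-down), and compare an evaluation metric for several models in each group. It assumes equality-only conditions, accuracy as the only metric, and a hypothetical row layout (a `label` column plus one prediction column per model). The function names `select`, `accuracy`, and `drill_down`, the feature names, and the toy rows in `TOY_LOG` are all invented for illustration.

```python
from typing import Any, Dict, List

# Hypothetical row layout (an assumption, not MLCube's actual schema): feature
# columns, the ground-truth "label", and one prediction column per model.
Row = Dict[str, Any]

# Made-up toy rows for illustration only; not data from the paper.
TOY_LOG: List[Row] = [
    {"device": "mobile", "position": 1, "label": 1, "model_a": 1, "model_b": 1},
    {"device": "mobile", "position": 2, "label": 0, "model_a": 1, "model_b": 0},
    {"device": "mobile", "position": 2, "label": 0, "model_a": 0, "model_b": 0},
    {"device": "desktop", "position": 1, "label": 1, "model_a": 1, "model_b": 0},
    {"device": "desktop", "position": 2, "label": 0, "model_a": 0, "model_b": 0},
]


def select(rows: List[Row], conditions: Dict[str, Any]) -> List[Row]:
    """Instance subset defined by equality conditions on features.

    An empty `conditions` dict selects every row (the top of the cube).
    """
    return [r for r in rows if all(r[f] == v for f, v in conditions.items())]


def accuracy(rows: List[Row], model: str) -> float:
    """Fraction of rows whose prediction from `model` matches the label."""
    if not rows:
        return float("nan")
    return sum(r[model] == r["label"] for r in rows) / len(rows)


def drill_down(
    rows: List[Row], conditions: Dict[str, Any], feature: str, models: List[str]
) -> Dict[Any, Dict[str, float]]:
    """Split the subset selected by `conditions` on one more `feature` and
    report, for each value of that feature, the instance count and each
    model's accuracy, so models can be compared subset by subset."""
    groups: Dict[Any, List[Row]] = {}
    for r in select(rows, conditions):
        groups.setdefault(r[feature], []).append(r)
    return {
        value: {"count": len(g), **{m: accuracy(g, m) for m in models}}
        for value, g in groups.items()
    }


if __name__ == "__main__":
    models = ["model_a", "model_b"]
    # Top-level view: compare the models on each device type.
    print(drill_down(TOY_LOG, {}, "device", models))
    # Drill down: within the mobile subset, split further by ad position.
    print(drill_down(TOY_LOG, {"device": "mobile"}, "position", models))
```

Grouping by `device` with no conditions gives a top-level comparison; adding `{"device": "mobile"}` as a condition and grouping by `position` corresponds to drilling into the mobile subset. This sketch rescans the rows for every query; a real data-cube system would typically precompute or cache such aggregates to keep exploration interactive.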


Cited By

Dong S, Wang Q, Sahri S, Palpanas T, Srivastava D (2024). Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines. Proceedings of the VLDB Endowment 17:11 (3072-3081). DOI: 10.14778/3681954.3681984. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.14778/3681954.3681984
Zhang H, Yan B, Cao L, Madden S, Rundensteiner E (2024). MetaStore: Analyzing Deep Learning Meta-Data at Scale. Proceedings of the VLDB Endowment 17:6 (1446-1459). DOI: 10.14778/3648160.3648182. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.14778/3648160.3648182
Hohman F, Wang C, Lee J, Görtler J, Moritz D, Bigham J, Ren Z, Foret C, Shan Q, Zhang X (2024). Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-19). DOI: 10.1145/3613904.3642628. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642628


Information

Published In

ACM Other conferences

HILDA '16: Proceedings of the Workshop on Human-In-the-Loop Data Analytics

June 2016

93 pages

ISBN:9781450342070

DOI:10.1145/2939502

Conference Chairs:
Carsten Binnig,
Alan Fekete,
Arnab Nandi

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Paxata
Tableau Software
Trifacta
IBM

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGMOD/PODS'16: International Conference on Management of Data

June 26 - July 1, 2016

San Francisco, California

Acceptance Rates

HILDA '16 Paper Acceptance Rate: 16 of 32 submissions, 50%

Overall Acceptance Rate 28 of 56 submissions, 50%

Bibliometrics & Citations

Article Metrics

Total Citations: 59

Total Downloads: 1,660

Downloads (last 12 months): 172

Downloads (last 6 weeks): 17

Reflects downloads up to 03 Jan 2025

Cited By (continued)
Gero K, Swoopes C, Gu Z, Kummerfeld J, Glassman E (2024). Supporting Sensemaking of Large Language Model Outputs at Scale. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-21). DOI: 10.1145/3613904.3642139. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642139
Prasad V, van Sloun R, Elzen S, Vilanova A, Pezzotti N (2024). The Transform-and-Perform Framework: Explainable Deep Learning Beyond Classification. IEEE Transactions on Visualization and Computer Graphics 30:2 (1502-1515). DOI: 10.1109/TVCG.2022.3219248. Online publication date: Feb-2024.
https://doi.org/10.1109/TVCG.2022.3219248
Sun T, Gao Y, Khaladkar S, Liu S, Zhao L, Kim Y, Hong S (2023). Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations. Proceedings of the ACM on Human-Computer Interaction 7:CSCW2 (1-32). DOI: 10.1145/3610187. Online publication date: 4-Oct-2023.
https://dl.acm.org/doi/10.1145/3610187
Rathore A, Dev S, Phillips J, Srikumar V, Zheng Y, Yeh C, Wang J, Zhang W, Wang B (2023). VERB: Visualizing and Interpreting Bias Mitigation Techniques Geometrically for Word Representations. ACM Transactions on Interactive Intelligent Systems 14:1 (1-34). DOI: 10.1145/3604433. Online publication date: 22-Jun-2023.
https://dl.acm.org/doi/10.1145/3604433
Kerrigan D, Bertini E (2023). SliceLens. Proceedings of the Workshop on Human-In-the-Loop Data Analytics (1-7). DOI: 10.1145/3597465.3605217. Online publication date: 18-Jun-2023.
https://dl.acm.org/doi/10.1145/3597465.3605217
Basil John S, Lindner P, Jiang Z, Koch C, Das S, Pandis I, Selçuk Candan K, Amer-Yahia S (2023). Aggregation and Exploration of High-Dimensional Data Using the Sudokube Data Cube Engine. Companion of the 2023 International Conference on Management of Data (175-178). DOI: 10.1145/3555041.3589729. Online publication date: 4-Jun-2023.
https://dl.acm.org/doi/10.1145/3555041.3589729
Cabrera Á, Fu E, Bertucci D, Holstein K, Talwalkar A, Hong J, Perer A (2023). Zeno: An Interactive Framework for Behavioral Evaluation of Machine Learning. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (1-14). DOI: 10.1145/3544548.3581268. Online publication date: 19-Apr-2023.
https://dl.acm.org/doi/10.1145/3544548.3581268