CATAPULT: Data-driven Selection of Canned Patterns for Efficient Visual Graph Query Formulation

Authors:

Sourav S. Bhowmick,

Shuigeng Zhou

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Pages 900 - 917

https://doi.org/10.1145/3299869.3300072

Published: 25 June 2019

Abstract

Visual graph query interfaces (a.k.a. GUIs) widen the reach of graph querying frameworks across different users by enabling non-programmers to use them. Consequently, several commercial and academic frameworks for querying a large collection of small- or medium-sized data graphs (e.g., chemical compounds) provide such visual interfaces. The majority of these interfaces expose a fixed set of canned patterns (i.e., small subgraph patterns) to expedite query formulation by enabling a pattern-at-a-time construction mode in lieu of an edge-at-a-time one. Canned patterns to be displayed on a GUI are typically selected manually based on domain knowledge. However, manual generation of canned patterns is labour intensive. Furthermore, these patterns may not sufficiently cover the underlying data graphs to expedite visual formulation of a wide range of subgraph queries. In this paper, we present a generic and extensible framework called Catapult to address these limitations. Catapult takes a data-driven approach to automatically select canned patterns, thereby taking a concrete step towards the vision of data-driven construction of visual query interfaces. Specifically, it first clusters the underlying data graphs based on their topological similarities and then summarizes each cluster to create a cluster summary graph (CSG). The canned patterns within a user-specified pattern budget are then generated from these CSGs by maximizing coverage and diversity, and minimizing the cognitive load of the patterns. An experimental study with real-world datasets and visual graph interfaces demonstrates the superiority of Catapult compared to traditional techniques.
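
The selection step described in the abstract (pick patterns within a budget while favouring coverage and diversity and penalising cognitive load) can be illustrated with a small greedy sketch. This is not the Catapult algorithm: the abstract does not give the scoring function, so everything below is an assumption made for illustration. Graphs and patterns are modelled as sets of labelled edges, containment is plain set inclusion rather than subgraph isomorphism, cognitive load is approximated by pattern size, and the three criteria are combined by a simple product. The names `coverage_gain`, `diversity`, `cognitive_load`, and `select_patterns` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]        # toy labelled edge, e.g. ("C", "O")
Pattern = FrozenSet[Edge]     # toy pattern or data graph: a non-empty edge set


def coverage_gain(pattern: Pattern, graphs: List[Pattern], covered: Set[int]) -> int:
    """Number of not-yet-covered graphs that contain every edge of the pattern.

    Assumption: set inclusion stands in for a real subgraph containment test.
    """
    return sum(1 for i, g in enumerate(graphs) if i not in covered and pattern <= g)


def diversity(pattern: Pattern, chosen: List[Pattern]) -> float:
    """One minus the largest Jaccard similarity to any pattern chosen so far."""
    if not chosen:
        return 1.0
    return 1.0 - max(len(pattern & p) / len(pattern | p) for p in chosen)


def cognitive_load(pattern: Pattern) -> int:
    """Assumed proxy: a pattern with more edges is harder to read."""
    return len(pattern)


def select_patterns(candidates: List[Pattern], graphs: List[Pattern], budget: int) -> List[Pattern]:
    """Greedily pick at most `budget` patterns by an assumed multiplicative score."""
    chosen: List[Pattern] = []
    covered: Set[int] = set()
    remaining = list(candidates)

    def score(p: Pattern) -> float:
        gain = coverage_gain(p, graphs, covered) / len(graphs)
        return gain * diversity(p, chosen) / cognitive_load(p)

    while remaining and len(chosen) < budget:
        best = max(remaining, key=score)
        if score(best) <= 0:
            break  # no remaining candidate adds new coverage
        chosen.append(best)
        remaining.remove(best)
        covered |= {i for i, g in enumerate(graphs) if best <= g}
    return chosen
```

In a fuller pipeline the candidate list would come from the cluster summary graphs rather than being supplied by hand, and the product could be replaced by any other way of trading off the three criteria.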

Index Terms

CATAPULT: Data-driven Selection of Canned Patterns for Efficient Visual Graph Query Formulation
1. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Data structures and algorithms for data management

Published In

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

June 2019

2106 pages

ISBN:9781450356435

DOI:10.1145/3299869

General Chairs: Peter Boncz (CWI & Vrije Universiteit Amsterdam, The Netherlands), Stefan Manegold (CWI & Universiteit Leiden, The Netherlands)

Program Chairs: Anastasia Ailamaki (EPFL, Switzerland), Amol Deshpande (University of Maryland, USA), Tim Kraska (MIT, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Ministry of Education - Singapore

Conference

SIGMOD/PODS '19

Sponsor:

SIGMOD

SIGMOD/PODS '19: International Conference on Management of Data

June 30 - July 5, 2019

Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

Huang K, Cui Y, Ye Q, Zhao Y, Zhao X, Tian Y, Zheng K, Hu H, Zhou X (2024). TED+: Towards Discovering Top-k Edge-Diversified Patterns in a Graph Database. IEEE Transactions on Knowledge and Data Engineering, pp. 1-14. Online publication date: 2024.
https://doi.org/10.1109/TKDE.2023.3312566

Huang K, Li Y, Ye Q, Tian Y, Zhao X, Cui Y, Hu H, Zhou X (2024). FRESH: Towards Efficient Graph Queries in an Outsourced Graph. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 4545-4557. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00346

Bhowmick S, Choi B (2023). Pattern Selection for Large Networks. Plug-and-Play Visual Subgraph Query Interfaces, pp. 83-121. Online publication date: 14-Mar-2023.
https://doi.org/10.1007/978-3-031-16162-9_7

Bhowmick S, Choi B (2023). Pattern Selection for Graph Databases. Plug-and-Play Visual Subgraph Query Interfaces, pp. 49-81. Online publication date: 14-Mar-2023.
https://doi.org/10.1007/978-3-031-16162-9_6

Bhowmick S, Choi B (2023). The Building Block of PnP Interfaces: Canned Patterns. Plug-and-Play Visual Subgraph Query Interfaces, pp. 39-47. Online publication date: 14-Mar-2023.
https://doi.org/10.1007/978-3-031-16162-9_5

Bhowmick S, Choi B (2023). The World of Visual Graph Query Interfaces: An Overview. Plug-and-Play Visual Subgraph Query Interfaces, pp. 21-28. Online publication date: 14-Mar-2023.
https://doi.org/10.1007/978-3-031-16162-9_3

Bhowmick S, Choi B (2023). The Future is Democratized Graphs. Plug-and-Play Visual Subgraph Query Interfaces, pp. 1-14. Online publication date: 14-Mar-2023.
https://doi.org/10.1007/978-3-031-16162-9_1

Yi P, Li J, Choi B, Bhowmick S, Xu J (2022). FLAG: Towards Graph Query Autocompletion for Large Graphs. Data Science and Engineering, 7(2), pp. 175-191. Online publication date: 16-Apr-2022.
https://doi.org/10.1007/s41019-022-00182-8

Yuan Z, Chua H, Bhowmick S, Ye Z, Han W, Choi B (2021). Towards plug-and-play visual graph query interfaces. Proceedings of the VLDB Endowment, 14(11), pp. 1979-1991. Online publication date: 27-Oct-2021.
https://dl.acm.org/doi/10.14778/3476249.3476256

Tzanikos M, Krommyda M, Kantere V (2021). A Highly Modular Architecture for Canned Pattern Selection Problem. Database and Expert Systems Applications, pp. 77-83. Online publication date: 1-Sep-2021.
https://doi.org/10.1007/978-3-030-86475-0_8
