research-article

QUAD: Quadratic-Bound-based Kernel Density Visualization

Published: 31 May 2020

Abstract

Kernel density visualization, or KDV, is used to view and understand data points in various domains, including traffic or crime hotspot detection, ecological modeling, chemical geology, and physical modeling. Existing solutions, which are based on computing kernel density estimation (KDE) functions, are computationally expensive. Our goal is to improve the performance of KDV, in order to support large datasets (e.g., one million points) and high screen resolutions (e.g., 1280 x 960 pixels). We examine two widely used variants of KDV, namely approximate kernel density visualization (εKDV) and thresholded kernel density visualization (τKDV). For these two operations, we develop a fast solution, called QUAD, by deriving quadratic bounds of KDE functions for different types of kernel functions, including Gaussian, triangular, etc. We further adopt a progressive visualization framework for KDV, in order to stream partial visualization results to users continuously. Extensive experimental results show that our new KDV techniques can provide at least a one-order-of-magnitude speedup over existing methods, without degrading visualization quality. We further show that, when combined with the progressive visualization framework, QUAD can produce reasonable visualization results in real time (0.5 sec) in a single-machine setting, without using GPUs or parallel computation.
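
To make the role of bounds concrete, the following is a minimal sketch of how lower and upper bounds on a KDE value can decide a thresholded KDV pixel without evaluating every point. It is an illustration under stated assumptions, not the QUAD method: it uses simple bounding-box distance bounds for a Gaussian kernel rather than the quadratic bounds derived in the paper, and the function names (kde, box_bounds, tkdv_pixel) and the parameters gamma, weight, and tau are hypothetical.

```python
import math


def kde(q, points, gamma=1.0, weight=1.0):
    """Exact Gaussian kernel density at pixel q: sum of w * exp(-gamma * dist^2)."""
    return sum(
        weight * math.exp(-gamma * ((q[0] - x) ** 2 + (q[1] - y) ** 2))
        for x, y in points
    )


def box_bounds(q, points, gamma=1.0, weight=1.0):
    """Lower and upper bounds on kde(q, points) from the points' bounding box.

    Assumption: plain min/max-distance bounds, not QUAD's quadratic bounds.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)

    # Smallest squared distance from q to the box (0 if q lies inside it).
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    dmin2 = dx * dx + dy * dy

    # Largest squared distance from q to the box (distance to the farthest corner).
    fx = max(abs(q[0] - xmin), abs(q[0] - xmax))
    fy = max(abs(q[1] - ymin), abs(q[1] - ymax))
    dmax2 = fx * fx + fy * fy

    n = len(points)
    lower = n * weight * math.exp(-gamma * dmax2)
    upper = n * weight * math.exp(-gamma * dmin2)
    return lower, upper


def tkdv_pixel(q, points, tau, gamma=1.0):
    """Return True if the density at q is at least tau, using bounds first."""
    lower, upper = box_bounds(q, points, gamma)
    if lower >= tau:
        return True   # density is certainly at or above the threshold
    if upper < tau:
        return False  # density is certainly below the threshold
    return kde(q, points, gamma) >= tau  # bounds are inconclusive: evaluate exactly


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0)]
    print(tkdv_pixel((5.0, 5.0), pts, tau=0.5))  # far pixel, decided by the upper bound
    print(tkdv_pixel((0.1, 0.0), pts, tau=0.5))  # nearby pixel, decided by the lower bound
```

In a real system the bounding boxes would typically come from the nodes of a spatial index, so that the bounds are computed per node rather than per call. Tighter bounds leave fewer pixels in the inconclusive case, which is the motivation the abstract gives for deriving better bound functions.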

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN: 9781450367356
DOI: 10.1145/3318464

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. KDV
  2. QUAD
  3. kernel density visualization
  4. quadratic bounds

Qualifiers

  • Research-article

Funding Sources

  • Hong Kong ITF
  • Hong Kong GRF
  • RGC of Hong Kong
  • HKU

Conference

SIGMOD/PODS '20

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
