research-article

HippogriffDB: balancing I/O and GPU bandwidth in big data analytics

Published: 01 October 2016

Abstract

As data sets grow and conventional processor performance scaling slows, data analytics is moving toward heterogeneous architectures that incorporate hardware accelerators (notably GPUs) to continue scaling performance. However, existing GPU-based databases handle big data applications inefficiently for two reasons. First, their execution models scale poorly on GPUs, whose memory capacity is limited. Second, they ignore the discrepancy between fast GPUs and slow storage, which can counteract the benefit of GPU acceleration.
In this paper, we propose HippogriffDB, an efficient, scalable GPU-accelerated OLAP system. It tackles the bandwidth discrepancy using compression and an optimized data transfer path. HippogriffDB stores tables in a compressed format and uses the GPU for decompression, trading GPU cycles for improved I/O bandwidth. To improve data transfer efficiency, HippogriffDB introduces a peer-to-peer, multi-threaded data transfer mechanism that moves data directly from the SSD to the GPU. HippogriffDB adopts a query-over-block execution model that provides scalability using a stream-based approach. The model improves kernel efficiency through operator fusion and double buffering.
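The query-over-block idea can be illustrated with a minimal CPU-only sketch. This is not HippogriffDB's implementation: `zlib` stands in for the system's compression schemes, a loader thread stands in for the SSD-to-GPU transfer path, a two-slot queue stands in for double buffering, and all function names (`make_compressed_blocks`, `loader`, `fused_filter_sum`, `run_query`) are hypothetical.

```python
import queue
import struct
import threading
import zlib

BLOCK_ROWS = 4  # rows per block; tiny on purpose for the example


def make_compressed_blocks(values, block_rows=BLOCK_ROWS):
    """Split an integer column into fixed-size blocks and compress each one."""
    blocks = []
    for start in range(0, len(values), block_rows):
        chunk = values[start:start + block_rows]
        raw = struct.pack(f"<{len(chunk)}i", *chunk)
        blocks.append(zlib.compress(raw))
    return blocks


def loader(blocks, buffers):
    """Producer: assumed stand-in for the storage-to-accelerator transfer."""
    for block in blocks:
        buffers.put(block)  # waits while both staging slots are full
    buffers.put(None)       # sentinel: no more blocks


def fused_filter_sum(block, threshold):
    """Decompress, filter, and aggregate one block in a single pass,
    without materializing an intermediate filtered column."""
    raw = zlib.decompress(block)
    rows = struct.unpack(f"<{len(raw) // 4}i", raw)
    return sum(v for v in rows if v > threshold)


def run_query(blocks, threshold):
    """Stream blocks through the fused operator while the loader refills."""
    buffers = queue.Queue(maxsize=2)  # two staging slots = double buffering
    worker = threading.Thread(target=loader, args=(blocks, buffers))
    worker.start()
    total = 0
    while True:
        block = buffers.get()
        if block is None:
            break
        total += fused_filter_sum(block, threshold)
    worker.join()
    return total


blocks = make_compressed_blocks(list(range(10)))
print(run_query(blocks, threshold=4))  # sums the values 5 through 9
```

Because only a bounded number of blocks is resident at any time, the working set stays fixed as the table grows, which is the property a stream-based model relies on when accelerator memory is limited.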
We have implemented HippogriffDB using an NVMe SSD that communicates directly with a commercial GPU. Results on two popular benchmarks demonstrate its scalability and efficiency. HippogriffDB outperforms existing GPU-based databases (YDB) and in-memory data analytics systems (MonetDB) by one to two orders of magnitude.
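The trade of GPU cycles for I/O bandwidth can be reasoned about with a simple bottleneck model. The sketch below is an assumption, not the paper's analysis: it treats transfer, decompression, and query execution as perfectly overlapped stages, so the sustained rate is set by the slowest one. The function name and every number in the usage lines are illustrative and are not measurements from HippogriffDB.

```python
def effective_throughput(io_gbps, compression_ratio, decompress_gbps, kernel_gbps):
    """Sustained rate of uncompressed data (GB/s) through a three-stage pipeline.

    io_gbps:           storage-to-GPU bandwidth for the compressed bytes
    compression_ratio: uncompressed size divided by compressed size
    decompress_gbps:   decompression rate, in uncompressed output bytes
    kernel_gbps:       query kernel rate, in uncompressed input bytes
    """
    if min(io_gbps, compression_ratio, decompress_gbps, kernel_gbps) <= 0:
        raise ValueError("all inputs must be positive")
    # Each compressed byte read from storage expands into
    # compression_ratio bytes of table data.
    delivered = io_gbps * compression_ratio
    # With fully overlapped stages, the slowest stage bounds the pipeline.
    return min(delivered, decompress_gbps, kernel_gbps)


# Hypothetical numbers, chosen only to show the shape of the trade-off.
print(effective_throughput(2.0, 1.0, 50.0, 10.0))   # uncompressed: I/O-bound
print(effective_throughput(2.0, 4.0, 50.0, 10.0))   # 4x compression: still I/O-bound, but faster
print(effective_throughput(2.0, 10.0, 50.0, 10.0))  # 10x compression: kernel-bound
```

Under this model, compression helps only until the delivered rate reaches the kernel or decompression rate; beyond that point, a higher compression ratio no longer raises throughput.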

Published In

Proceedings of the VLDB Endowment, Volume 9, Issue 14
October 2016
96 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
