skip to main content
research-article

Concurrent analytical query processing with GPUs

Published: 01 July 2014 Publication History

Abstract

In current databases, GPUs are used as dedicated accelerators to process each individual query. Sharing GPUs among concurrent queries is not supported, causing serious resource underutilization. Based on the profiling of an open-source GPU query engine running commonly used single-query data warehousing workloads, we observe that the utilization of main GPU resources is only up to 25%. The underutilization leads to low system throughput.
To address the problem, this paper proposes concurrent query execution as an effective solution. To efficiently share GPUs among concurrent queries for high throughput, the major challenge is to provide software support to control and resolve resource contention incurred by the sharing. Our solution relies on GPU query scheduling and device memory swapping policies to address this challenge. We have implemented a prototype system and evaluated it intensively. The experiment results confirm the effectiveness and performance advantage of our approach. By executing multiple GPU queries concurrently, system throughput can be improved by up to 55% compared with dedicated processing.

References

[1]
code.google.com/p/gpudb.
[2]
monetdb.org.
[3]
docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html.
[4]
N. Bandi, C. Sun, D. Agrawal, and A. El Abbadi. Hardware acceleration in commercial databases: A case study of spatial operations. In VLDB, 2004.
[5]
S. Bress. Why it is time for a HyPE: A hybrid query processing engine for efficient GPU coprocessing in DBMS. Proc. VLDB Endow., 6(12), 2013.
[6]
N. Govindaraju, J. Gray, R. Kumar, and D. Manocha. GPUTeraSort: High performance graphics co-processor sorting for large database management. In SIGMOD, pages 325--336, 2006.
[7]
N. K. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. In SIGMOD, 2004.
[8]
B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD, pages 511--524, 2008.
[9]
B. He and J. X. Yu. High-throughput transaction executions on graphics processors. Proc. VLDB Endow., 4(5): 314--325, 2011.
[10]
M. Heimel and V. Markl. A first step towards GPU-assisted query optimization. In ADMS, 2012.
[11]
M. Heimel, M. Saecker, H. Pirk, S. Manegold, and V. Markl. Hardware-oblivious parallelism for in-memory column-stores. Proc. VLDB Endow., 6(9): 709--720, 2013.
[12]
T. Kaldewey, G. Lohman, R. Mueller, and P. Volk. GPU join processing revisited. In DaMoN, pages 55--62, 2012.
[13]
S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU scheduling for real-time multi-tasking environments. In USENIX ATC, pages 2--2, 2011.
[14]
S. Kato, M. McThrow, C. Maltzahn, and S. Brandt. Gdev: First-class GPU resource management in the operating system. In USENIX ATC, 2012.
[15]
Khronos OpenCL Working Group. The OpenCL Specification, version 2.0, 2013.
[16]
R. Lee, T. Luo, Y. Huai, F. Wang, Y. He, and X. Zhang. YSmart: Yet another SQL-to-MapReduce translator. In ICDCS, pages 25--36, 2011.
[17]
T. Mostak. An overview of MapD (massively parallel database). MIT Technical Report, 2013.
[18]
O. Mutlu and T. Moscibroda. Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared dram systems. In ISCA, pages 63--74, 2008.
[19]
T. Ni. DirectCompute: Bring GPU computing to the mainstream. In GTC, 2009.
[20]
NVIDIA. CUDA C programming guide, 2013.
[21]
P. O'Neil, B. O'Neil, and X. Chen. Star schema benchmark. cs.umb.edu/poneil/StarSchemaB.PDF.
[22]
H. Pirk, S. Manegold, and M. Kersten. Waste not... efficient co-processing of relational data. In ICDE, 2014.
[23]
H. Pirk, S. Manegold, and M. L. Kersten. Accelerating foreign-key joins using asymmetric memory channels. In VLDB, pages 27--35, 2011.
[24]
C. J. Rossbach, J. Currey, M. Silberstein, B. Ray, and E. Witchel. PTask: Operating system abstractions to manage GPUs as compute devices. In SOSP, 2011.
[25]
N. Satish, C. Kim, J. Chhugani, A. D. Nguyen, V. W. Lee, D. Kim, and P. Dubey. Fast sort on CPUs and GPUs: A case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351--362, 2010.
[26]
E. A. Sitaridi and K. A. Ross. Ameliorating memory contention of OLAP operators on GPU processors. In DaMoN, 2012.
[27]
A. Snavely and D. M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreaded processor. In ASPLOS, pages 234--244, 2000.
[28]
K. Wang, X. Ding, R. Lee, S. Kato, and X. Zhang. GDM: device memory management for GPGPU computing. In SIGMETRICS, 2014.
[29]
K. Wang, Y. Huai, R. Lee, F. Wang, X. Zhang, and J. H. Saltz. Accelerating pathology image data cross-comparison on CPU-GPU hybrid systems. Proc. VLDB Endow., 5(11): 1543--1554, 2012.
[30]
H. Wu, G. Diamos, S. Cadambi, and S. Yalamanchili. Kernel weaver: Automatically fusing database primitives for efficient GPU computation. In Micro, pages 107--118, 2012.
[31]
S. Yalamanchili. Scaling data warehousing applications using GPUs. In FastPath, 2013.
[32]
Y. Yuan, R. Lee, and X. Zhang. The Yin and Yang of processing data warehousing queries on GPU devices. Proc. VLDB Endow., 6(10): 817--828, 2013.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 7, Issue 11
July 2014
92 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 July 2014
Published in PVLDB Volume 7, Issue 11

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)30
  • Downloads (Last 6 weeks)2
Reflects downloads up to 28 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media