DOI: 10.1145/2578948.2560688
tutorial

A Novel CPU-GPU Cooperative Implementation of A Parallel Two-List Algorithm for the Subset-Sum Problem

Published: 07 February 2014

Abstract

The subset-sum problem is a well-known NP-complete decision problem. Many parallel algorithms have been developed to solve it within a reasonable computation time, and some of them have been implemented on a GPU. However, these GPU implementations may fail to use all the CPU cores and the GPU resources at the same time. While the GPU performs its tasks, only one CPU core is used to control the GPU and the remaining CPU cores sit idle, so a large amount of available CPU capacity is wasted. This paper proposes a novel CPU-GPU cooperative implementation of a parallel two-list algorithm that solves the subset-sum problem efficiently on a heterogeneous CPU-GPU system and makes use of all the available computational resources of both CPUs and GPUs. To find the most appropriate task distribution ratio between CPUs and GPUs, the paper establishes an optimal task distribution model. A series of experiments is conducted on two different hardware platforms. The experimental results show that the CPU-GPU cooperative implementation produces a speedup factor of 9.2 over the best sequential implementation, achieves up to 96.3% performance improvement over the optimized CPU-only implementation, and yields up to 25.7% performance improvement over the optimized GPU-only implementation.
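The abstract names the two-list algorithm without describing it. As a rough illustration only, the following is a minimal sequential sketch of the classic two-list approach for subset-sum, assuming integer weights. It is not the paper's parallel or CPU-GPU cooperative implementation, and the function names `subset_sums` and `two_list_subset_sum` are hypothetical, chosen for this example.

```python
from typing import List, Optional, Tuple


def subset_sums(items: List[int]) -> List[Tuple[int, int]]:
    """Return (sum, bitmask) for every subset of items, sorted by sum."""
    sums = [(0, 0)]  # the empty subset
    for i, w in enumerate(items):
        # Each existing subset either excludes item i (kept) or includes it (added).
        sums += [(s + w, mask | (1 << i)) for s, mask in sums]
    sums.sort()
    return sums


def two_list_subset_sum(weights: List[int], target: int) -> Optional[List[int]]:
    """Return indices of a subset of weights summing to target, or None."""
    half = len(weights) // 2
    left = subset_sums(weights[:half])    # scanned in ascending order
    right = subset_sums(weights[half:])   # scanned in descending order
    i, j = 0, len(right) - 1
    while i < len(left) and j >= 0:
        total = left[i][0] + right[j][0]
        if total == target:
            lmask, rmask = left[i][1], right[j][1]
            chosen = [k for k in range(half) if (lmask >> k) & 1]
            chosen += [half + k for k in range(len(weights) - half)
                       if (rmask >> k) & 1]
            return chosen
        if total < target:
            i += 1   # need a larger sum: advance in the ascending list
        else:
            j -= 1   # need a smaller sum: step back in the other list
    return None


if __name__ == "__main__":
    weights = [3, 34, 4, 12, 5, 2]
    picked = two_list_subset_sum(weights, 9)
    print(picked, sum(weights[k] for k in picked))
```

Each half yields 2^(n/2) subset sums, so the two lists dominate both time and memory. In this sketch, generating and scanning those lists is the part a parallel version would have to partition across processors.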



Published In

PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores
February 2014
156 pages
ISBN: 9781450326575
DOI: 10.1145/2578948

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CPU-GPU cooperative computing
  2. CUDA
  3. knapsack problem
  4. parallel two-list algorithm
  5. subset-sum problem

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

PPoPP '14

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%
