tutorial

A Novel CPU-GPU Cooperative Implementation of A Parallel Two-List Algorithm for the Subset-Sum Problem

Authors:

Keqin Li

PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Pages 70 - 79

https://doi.org/10.1145/2578948.2560688

Published: 07 February 2014

Abstract

The subset-sum problem is a well-known NP-complete decision problem. Many parallel algorithms have been developed to solve the problem within a reasonable computation time, and some of them have been implemented on a GPU. However, the GPU implementations of these parallel algorithms may fail to fully utilize all the CPU cores and the GPU resources at the same time. While the GPU performs a task, only one CPU core is used to control the GPU and the remaining CPU cores sit idle, so a large share of the available CPU resources is wasted. This paper proposes a novel CPU-GPU cooperative implementation of a parallel two-list algorithm that solves the subset-sum problem efficiently in a heterogeneous CPU-GPU system by using all the available computational resources of both CPUs and GPUs. To find the most appropriate task distribution ratio between CPUs and GPUs, the paper establishes an optimal task distribution model. A series of experiments is conducted on two different hardware platforms. The experimental results show that the CPU-GPU cooperative implementation produces a speedup factor of 9.2 over the best sequential implementation, achieves up to 96.3% performance improvement over the optimized CPU-only implementation, and yields up to 25.7% performance improvement over the optimized GPU-only implementation.
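The abstract does not spell out the two-list algorithm itself, so the following is only a minimal sequential sketch of the classic two-list scheme usually attributed to Horowitz and Sahni, which parallel two-list algorithms build on. It is not the paper's CPU-GPU implementation. The function names `subset_sums` and `two_list_subset_sum` are hypothetical, chosen for illustration.

```python
def subset_sums(items):
    """Return all 2**len(items) subset sums of `items`, sorted ascending."""
    sums = [0]  # the empty subset
    for w in items:
        # Every existing sum either excludes or includes the new item.
        sums += [s + w for s in sums]
    return sorted(sums)


def two_list_subset_sum(weights, target):
    """Decide whether some subset of `weights` sums exactly to `target`.

    Illustrative sequential version only (an assumption, not the paper's code):
    split the input in half, generate both lists of subset sums, then search
    the two sorted lists with a two-pointer scan.
    """
    half = len(weights) // 2
    a = subset_sums(weights[:half])          # ascending order
    b = subset_sums(weights[half:])[::-1]    # descending order
    i, j = 0, 0
    while i < len(a) and j < len(b):
        total = a[i] + b[j]
        if total == target:
            return True
        if total < target:
            i += 1   # need a larger sum: advance in the ascending list
        else:
            j += 1   # need a smaller sum: advance in the descending list
    return False


if __name__ == "__main__":
    print(two_list_subset_sum([3, 34, 4, 12, 5, 2], 9))
    print(two_list_subset_sum([3, 34, 4, 12, 5, 2], 30))
```

Each list holds about 2^(n/2) sums, which is what makes the generation and search stages natural candidates for splitting across CPU cores and a GPU. How that split is chosen is the subject of the paper's task distribution model and is not reproduced here.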

Cited By

Wan L, Li K, Li K (2016). A novel cooperative accelerated parallel two-list algorithm for solving the subset-sum problem on a hybrid CPU-GPU cluster. Journal of Parallel and Distributed Computing, 97:C (112-123). DOI: 10.1016/j.jpdc.2016.07.003. Online publication date: 1-Nov-2016
https://dl.acm.org/doi/10.1016/j.jpdc.2016.07.003

Index Terms

A Novel CPU-GPU Cooperative Implementation of A Parallel Two-List Algorithm for the Subset-Sum Problem
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages

Information & Contributors

Information

Published In

PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

February 2014

156 pages

ISBN:9781450326575

DOI:10.1145/2578948

Conference Chairs:
Pavan Balaji, Argonne National Laboratory, USA
Minyi Guo, Shanghai Jiao Tong University, China
Zhiyi Huang, University of Otago, New Zealand

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial
Research
Refereed limited

Conference

PPoPP '14

Sponsor:

SIGPLAN

PPoPP '14: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 15 - 19, 2014

Orlando, FL, USA

Acceptance Rates

Overall Acceptance Rate 53 of 97 submissions, 55%
