Article

GPU-accelerated restricted Boltzmann machine for collaborative filtering

Authors:

Xiaola Lin

ICA3PP'12: Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I

Pages 303 - 316

https://doi.org/10.1007/978-3-642-33078-0_22

Published: 04 September 2012

Abstract

Collaborative Filtering (CF) is an important technique for recommendation systems that models and analyzes customer preferences in order to give reasonable advice. Recently, many applications based on the Restricted Boltzmann Machine (RBM) have been developed for a large variety of learning problems. The RBM-based model for Collaborative Filtering (RBM-CF) can handle large-scale data sets and obtains good recommendation performance. However, the computation of an RBM becomes problematic when a large number of hidden features is used to improve recommendation accuracy. Although the RBM has great potential for parallelism, developing a parallel implementation of RBM-CF on a GPU remains a challenge, since the data sets for CF are always large and sparse. In this paper, we propose a parallel implementation of RBM-CF on GPU using CUDA. We first present how to transform the computation of RBM-CF into matrix-based operations on the GPU, and then propose three CUDA kernels for sparse matrix-matrix multiplication to further improve the computational efficiency of RBM-CF when modeling large-scale, sparse data sets. Experimental results show that our parallel implementation achieves significant speedups on GPU.
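
The matrix-based formulation mentioned in the abstract can be illustrated with a small CPU-side sketch. This is not the paper's implementation and not one of its three CUDA kernels: it assumes the sparse visible data is stored in CSR form, that hidden units use a logistic activation, and all names (`csr_matmul_dense`, `hidden_probabilities`) are hypothetical. It only shows why a sparse-times-dense product is the core operation when most visible entries are missing.

```python
import math


def csr_matmul_dense(indptr, indices, data, dense, n_cols_out):
    """Multiply a CSR sparse matrix (rows x inner) by a dense matrix (inner x n_cols_out).

    indptr, indices, data are the usual CSR arrays. Only stored (non-missing)
    entries are visited, so the cost scales with the number of observed ratings.
    """
    n_rows = len(indptr) - 1
    out = [[0.0] * n_cols_out for _ in range(n_rows)]
    # Assumption: on a GPU, each (row, col) output element could be assigned
    # to its own thread; here the same mapping is written as two plain loops.
    for row in range(n_rows):
        for col in range(n_cols_out):
            acc = 0.0
            for p in range(indptr[row], indptr[row + 1]):
                acc += data[p] * dense[indices[p]][col]
            out[row][col] = acc
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(indptr, indices, data, weights, hidden_bias):
    """Hidden activation probabilities for a batch of sparse visible rows."""
    pre = csr_matmul_dense(indptr, indices, data, weights, len(hidden_bias))
    return [[sigmoid(a + b) for a, b in zip(row, hidden_bias)] for row in pre]


# Toy data: 2 users, 3 visible units, 2 hidden features (values are made up).
indptr = [0, 2, 3]          # user 0 has 2 stored entries, user 1 has 1
indices = [0, 2, 1]         # column index of each stored entry
data = [1.0, 1.0, 1.0]      # stored values
weights = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0]]
hidden_bias = [0.0, 0.0]

probs = hidden_probabilities(indptr, indices, data, weights, hidden_bias)
```

A dense product would touch every user-item pair, whereas the CSR loop skips unobserved entries entirely, which is the property a sparse GPU kernel would aim to exploit.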

Cited By

Xie X, Tan W, Fong L, Liang Y, Huang H, Weissman J, Iamnitchi A, Iosup A (2017). CuMF_SGD. Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, pp. 79-92. DOI: 10.1145/3078597.3078602. Online publication date: 26-Jun-2017.
https://dl.acm.org/doi/10.1145/3078597.3078602

Karydi E, Margaritis K (2016). Parallel and Distributed Collaborative Filtering. ACM Computing Surveys 49:2, pp. 1-41. DOI: 10.1145/2951952. Online publication date: 13-Aug-2016.
https://dl.acm.org/doi/10.1145/2951952

Tan W, Cao L, Fong L, Nakashima H, Taura K, Lange J (2016). Faster and Cheaper. Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, pp. 219-230. DOI: 10.1145/2907294.2907297. Online publication date: 31-May-2016.
https://dl.acm.org/doi/10.1145/2907294.2907297

Recommendations

An efficient parallel collaborative filtering algorithm on multi-GPU platform

Collaborative filtering (CF) is one of the essential algorithms in recommendation system. As the size of the data in real applications is huge, usually at the magnitude of Petabytes, parallel computing technique is required to accelerate the ...

Restricted Boltzmann machines for collaborative filtering
ICML '07: Proceedings of the 24th international conference on Machine learning

Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM's), can be used to model tabular data, ...

Optimizing linpack benchmark on GPU-accelerated petascale supercomputer
Special issue on Community Analysis and Information Recommendation

In this paper we present the programming of the Linpack benchmark on TianHe-1 system, the first petascale supercomputer system of China, and the largest GPU-accelerated heterogeneous system ever attempted before. A hybrid programming model consisting of ...

Published In

ICA3PP'12: Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I

September 2012

562 pages

ISBN:9783642330773

Editors:
Yang Xiang, School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, Burwood, VIC, Australia
Ivan Stojmenovic, SEECS, University of Ottawa, Ottawa, ON, Canada
Bernady O. Apduhan, Department of Intelligent Informatics, Kyushu Sangyo University, Fukuoka, Japan
Guojun Wang, School of Information Science and Engineering, Central South University, Changsha, Hunan Province, P.R. China
Koji Nakano, Department of Information Engineering, Hiroshima University, 1-4-1, Kagamiyama, Higashi-Hiroshima, Japan

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Article Metrics

Total Citations: 3
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 06 Nov 2024
