DOI: 10.1007/978-3-642-33078-0_22
Article

GPU-accelerated restricted Boltzmann machine for collaborative filtering

Published: 04 September 2012

Abstract

Collaborative Filtering (CF) is an important technique for recommendation systems: it models and analyzes customer preferences in order to give reasonable advice. Recently, many applications based on the Restricted Boltzmann Machine (RBM) have been developed for a wide variety of learning problems. The RBM-based model for Collaborative Filtering (RBM-CF) can handle large-scale data sets and achieves good recommendation performance. However, the computation of the RBM becomes problematic when a large number of hidden features is used to improve recommendation accuracy. Although the RBM has great potential for parallelism, developing a parallel implementation of RBM-CF on a GPU remains a challenge, since the data sets for CF are always large and sparse. In this paper, we propose a parallel implementation of RBM-CF on GPU using CUDA. We first show how to transform the computation of RBM-CF into matrix-based operations on the GPU, and then present three CUDA kernels for sparse matrix-matrix multiplication that further improve the computational efficiency of RBM-CF when modeling large-scale, sparse data sets. Experimental results show that our parallel GPU implementation achieves significant speedups.
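
As a rough illustration of the matrix-based formulation mentioned in the abstract, the sketch below computes hidden-unit activation probabilities for a batch of users as a sparse-times-dense matrix product followed by a sigmoid. It is a plain CPU sketch in Python, not the paper's CUDA kernels, which the abstract does not describe. The CSR storage layout, the function names `csr_times_dense` and `hidden_probabilities`, and the toy weights are all assumptions made for illustration.

```python
import math

def csr_times_dense(indptr, indices, data, dense, n_cols):
    """Multiply a CSR sparse matrix by a dense matrix given as a list of rows.

    Only the stored (nonzero) entries of the sparse matrix are visited, which
    is the property that makes sparse rating data cheap to process.
    """
    n_rows = len(indptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):                        # one sparse row per user
        for p in range(indptr[r], indptr[r + 1]):  # nonzeros of that row
            c, v = indices[p], data[p]
            w_row = dense[c]
            for j in range(n_cols):
                out[r][j] += v * w_row[j]
    return out

def hidden_probabilities(indptr, indices, data, weights, hidden_bias):
    """Assumed RBM-style step: sigmoid(V @ W + b) with V stored in CSR form."""
    pre = csr_times_dense(indptr, indices, data, weights, len(hidden_bias))
    return [[1.0 / (1.0 + math.exp(-(x + b))) for x, b in zip(row, hidden_bias)]
            for row in pre]

# Toy data (hypothetical): 2 users, 3 visible units, 2 hidden units.
# Dense view of V is [[1, 0, 1], [0, 1, 0]].
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 1.0, 1.0]
weights = [[0.5, -1.0], [0.25, 0.0], [-0.5, 1.0]]
hidden_bias = [0.0, 0.0]

probs = hidden_probabilities(indptr, indices, data, weights, hidden_bias)
```

On a GPU, the outer loop over rows and the inner loop over hidden units are the natural candidates for parallel execution, but how the paper's three kernels divide that work is not stated in the abstract.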



Published In

ICA3PP'12: Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
September 2012
562 pages
ISBN: 9783642330773
Editors: Yang Xiang, Ivan Stojmenovic, Bernady O. Apduhan, Guojun Wang, Koji Nakano

Publisher

Springer-Verlag

Berlin, Heidelberg
