DOI: 10.1145/3447786.3456251
Research article · Open access
Tahoe: tree structure-aware high performance inference engine for decision tree ensemble on GPU

Published: 21 April 2021

Abstract

Decision trees are widely used and often assembled into forests to boost prediction accuracy. However, using decision trees for inference on GPUs is challenging because of irregular memory access patterns and imbalanced workloads across threads. This paper proposes Tahoe, a tree structure-aware, high-performance inference engine for decision tree ensembles. Tahoe rearranges tree nodes to enable efficient, coalesced memory accesses; it also rearranges trees so that trees with similar structures are grouped together in memory and assigned to threads in a balanced way. Beyond memory access efficiency, we introduce a set of inference strategies, each of which uses shared memory differently and has different implications for reduction overhead. We introduce performance models to guide the selection of an inference strategy for arbitrary forests and data sets. Tahoe consistently outperforms the state-of-the-art industry-quality library FIL by 3.82x, 2.59x, and 2.75x on three generations of NVIDIA GPUs (Kepler, Pascal, and Volta), respectively.
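The node-rearrangement idea from the abstract can be illustrated with a minimal sketch: storing a decision tree's nodes in a flat, breadth-first array so that nodes at the same traversal depth sit in contiguous memory, which is what makes coalesced GPU accesses possible. This is an illustrative CPU-side sketch, not Tahoe's actual layout or API; all names (`Node`, `flatten`, `predict`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    feature: int = -1            # -1 marks a leaf
    threshold: float = 0.0
    value: float = 0.0           # prediction stored at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def flatten(root: Node) -> List[Tuple[int, float, float, int]]:
    """Lay nodes out in breadth-first order as (feature, threshold,
    leaf_value, left_child_index) records. Siblings are adjacent, so a
    node's right child is always at left_child_index + 1, and nodes at
    the same depth are contiguous in memory."""
    order = [root]
    i = 0
    while i < len(order):
        n = order[i]
        if n.feature >= 0:       # internal node: enqueue both children
            order.append(n.left)
            order.append(n.right)
        i += 1
    index = {id(n): j for j, n in enumerate(order)}
    flat = []
    for n in order:
        if n.feature < 0:
            flat.append((-1, 0.0, n.value, -1))
        else:
            flat.append((n.feature, n.threshold, 0.0, index[id(n.left)]))
    return flat

def predict(flat: List[Tuple[int, float, float, int]], x: List[float]) -> float:
    """Traverse the flat array: index arithmetic replaces pointer chasing."""
    i = 0
    while True:
        feat, thr, val, left = flat[i]
        if feat < 0:
            return val
        i = left if x[feat] <= thr else left + 1
```

In a flat layout like this, many GPU threads evaluating different inputs at the same depth read neighbouring array slots instead of chasing scattered pointers; Tahoe's actual scheme additionally groups structurally similar trees so whole warps stay balanced.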




      Published In

      EuroSys '21: Proceedings of the Sixteenth European Conference on Computer Systems
      April 2021, 631 pages
      ISBN: 9781450383349
      DOI: 10.1145/3447786
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. decision tree ensemble
      2. decision tree inference
      3. performance model
      4. tree structure


      Funding Sources

      • CNS
      • CCF

      Conference

      EuroSys '21: Sixteenth European Conference on Computer Systems
      April 26-28, 2021
      Online Event, United Kingdom

      Acceptance Rates

      EuroSys '21 paper acceptance rate: 38 of 181 submissions, 21%
      Overall acceptance rate: 241 of 1,308 submissions, 18%
