DOI: 10.1145/3613424.3623788

ACRE: Accelerating Random Forests for Explainability

Published: 08 December 2023

Abstract

As machine learning models become more widespread, they are increasingly applied in domains that heavily impact people's lives (e.g., medical diagnoses, judicial sentencing). Several communities are thus calling for ML models to be not only accurate, but also explainable. To achieve this, recommendations must be augmented with explanations summarizing how each outcome is derived. Explainable Random Forest (XRF) models are popular choices in this space, as they are both highly accurate and amenable to explainability augmentation, allowing end-users to learn how and why a specific outcome was reached. However, the limitations of XRF models hamper their adoption, foremost among them the high computational demands of training such models to support high-accuracy classification while also annotating them with explainability meta-data.
In response, we present ACRE, a hardware accelerator to support XRF model training. ACRE accelerates key operations that bottleneck performance, while maintaining meta-data critical to support explainability. It leverages a novel Processing-in-Memory hardware unit, co-located with banks of a 3D-stacked High-Bandwidth Memory (HBM). The unit locally accelerates the execution of key training computations, boosting effective data-transfer bandwidth. Our evaluation shows that, when ACRE augments HBM3 memory, it yields an average system-level training performance improvement of 26.6x, compared to a baseline multicore processor solution with DDR4 memory. Further, ACRE yields a 2.5x improvement when compared to an HBM3 architecture baseline, increasing to 5x when not bottlenecked by a 16k-thread limit in the host. Finally, due to much higher performance, we observe that ACRE provides a 16.5x energy reduction overall, over a DDR baseline.
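The abstract does not detail which training computations ACRE offloads, but the dominant kernel in random-forest training is the per-node split search: for each candidate feature threshold, scan the node's samples and score the resulting partitions, e.g., by Gini impurity. The sketch below is a hypothetical, illustrative CPU version of that scan (not ACRE's actual hardware kernel); its streaming, data-heavy access pattern is the kind of work a processing-in-memory unit co-located with HBM banks could run next to the data.

```python
# Hypothetical sketch of the split-search loop that dominates random-forest
# training. All names here are illustrative, not from the paper.
import numpy as np

def gini(labels: np.ndarray) -> float:
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - float(np.sum(p * p))

def best_split(feature: np.ndarray, labels: np.ndarray):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    order = np.argsort(feature)
    f, y = feature[order], labels[order]
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for i in range(1, f.size):
        if f[i] == f[i - 1]:
            continue
        thr = 0.5 * (f[i] + f[i - 1])
        left, right = y[:i], y[i:]
        # Weighted impurity of the two partitions; lower is better.
        score = (left.size * gini(left) + right.size * gini(right)) / y.size
        if score < best[1]:
            best = (thr, score)
    return best
```

Each node of each tree repeats this scan over every sampled feature, so the computation is memory-bandwidth-bound at scale, which is consistent with the abstract's motivation for placing compute near the HBM banks.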


        Published In

        MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023, 1528 pages
ISBN: 9798400703294
DOI: 10.1145/3613424

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Funding Sources

        • SRC and DARPA

        Conference

        MICRO '23
        Sponsor:

        Acceptance Rates

        Overall Acceptance Rate 484 of 2,242 submissions, 22%
