DOI: 10.1145/3613424.3623788

ACRE: Accelerating Random Forests for Explainability

Published: 08 December 2023

Abstract

As machine learning models become more widespread, they are increasingly applied in domains that heavily impact people's lives (e.g., medical diagnoses, judicial sentencing). Several communities are thus calling for ML models to be not only accurate, but also explainable. To achieve this, recommendations must be augmented with explanations summarizing how each outcome is derived. Explainable Random Forest (XRF) models are popular choices in this space, as they are both highly accurate and amenable to explainability augmentation, allowing end-users to learn how and why a specific outcome was reached. However, the limitations of XRF models hamper their adoption, foremost among them the high computational demands of training such models to support high-accuracy classification while also annotating them with explainability meta-data.
In response, we present ACRE, a hardware accelerator to support XRF model training. ACRE accelerates key operations that bottleneck performance, while maintaining meta-data critical to support explainability. It leverages a novel Processing-in-Memory hardware unit, co-located with banks of a 3D-stacked High-Bandwidth Memory (HBM). The unit locally accelerates the execution of key training computations, boosting effective data-transfer bandwidth. Our evaluation shows that, when ACRE augments HBM3 memory, it yields an average system-level training performance improvement of 26.6x, compared to a baseline multicore processor solution with DDR4 memory. Further, ACRE yields a 2.5x improvement when compared to an HBM3 architecture baseline, increasing to 5x when not bottlenecked by a 16k-thread limit in the host. Finally, due to much higher performance, we observe that ACRE provides a 16.5x energy reduction overall, over a DDR baseline.
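The abstract does not detail which training computations ACRE offloads, but the dominant kernel in random-forest training is the per-node split search: for each candidate feature threshold, scan the node's samples and score the resulting partitions, e.g., by Gini impurity. The sketch below is a hypothetical, illustrative CPU version of that scan (not ACRE's actual hardware kernel); its streaming, data-heavy access pattern is the kind of work a processing-in-memory unit co-located with HBM banks could run next to the data.

```python
# Hypothetical sketch of the split-search loop that dominates random-forest
# training. All names here are illustrative, not from the paper.
import numpy as np

def gini(labels: np.ndarray) -> float:
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - float(np.sum(p * p))

def best_split(feature: np.ndarray, labels: np.ndarray):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    order = np.argsort(feature)
    f, y = feature[order], labels[order]
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for i in range(1, f.size):
        if f[i] == f[i - 1]:
            continue
        thr = 0.5 * (f[i] + f[i - 1])
        left, right = y[:i], y[i:]
        # Weighted impurity of the two partitions; lower is better.
        score = (left.size * gini(left) + right.size * gini(right)) / y.size
        if score < best[1]:
            best = (thr, score)
    return best
```

Each node of each tree repeats this scan over every sampled feature, so the computation is memory-bandwidth-bound at scale, which is consistent with the abstract's motivation for placing compute near the HBM banks.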


        Published In

        MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023, 1528 pages
ISBN: 9798400703294
DOI: 10.1145/3613424

        Sponsors

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Funding Sources

        • SRC and DARPA

        Conference

        MICRO '23
        Sponsor:

        Acceptance Rates

        Overall Acceptance Rate 484 of 2,242 submissions, 22%
