AxOTreeS: A Tree Search Approach to Synthesizing FPGA-based Approximate Operators

Published: 09 September 2023

Abstract

Approximate computing (AxC) offers scope for disproportionate gains in a system’s power, performance, and area (PPA) metrics by leveraging an application’s inherent error-resilient behavior (BEHAV). Trading computational accuracy for performance gains makes AxC an attractive proposition for implementing computationally complex AI/ML-based applications on resource-constrained embedded systems. The growing diversity of application domains using AI/ML has also led to the increasing use of FPGA-based embedded systems. However, implementing AxC for FPGAs has primarily been limited to the post-processing of ASIC-optimized approximate operators (AxOs). This approach usually involves selecting from a set of AxOs that have been optimized for a gate-based implementation in an ASIC. While such an approach allows existing knowledge of ASIC-based AxO design to be reused, it limits the scope for considering the challenges and opportunities associated with an FPGA’s LUT-based computation structures. Similarly, the few works that consider LUT-based computing for AxO design use generic optimization approaches that do not allow the integration of problem-specific prior knowledge, whether empirical or statistical. To this end, we propose a novel tree search-based approach to AxO synthesis for FPGAs. Specifically, we present a design methodology using Monte Carlo Tree Search (MCTS)-based search tree traversal that allows the designer to integrate statistical data, such as correlation, into AxO optimization. With the proposed methods, we report improvements over results from the standard MCTS algorithm, as well as improved hypervolume for both operator-level and application-specific design space exploration (DSE), compared to state-of-the-art design methodologies.
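
The abstract compares methodologies by the hypervolume of the design points they find. As a rough illustration of what that indicator measures, the following minimal sketch computes the two-objective hypervolume for a set of candidate designs when both objectives are minimized. The function names, the choice of (error, cost) as the objective pair, and the reference point are assumptions made for this example and are not taken from the article.

```python
from typing import List, Tuple

# Hypothetical objective pair: (error, cost), both to be minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    for p in sorted(set(points)):
        # A point is kept only if it improves the second objective over
        # every point that is at least as good in the first objective.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    # Points that do not strictly improve on the reference contribute nothing.
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    volume = 0.0
    prev_second = ref[1]
    for first, second in pareto_front(inside):
        # Add the horizontal slab between the previous and current
        # second-objective values, extending to the reference point.
        volume += (ref[0] - first) * (prev_second - second)
        prev_second = second
    return volume


if __name__ == "__main__":
    designs = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0)]
    print(hypervolume_2d(designs, ref=(4.0, 4.0)))
```

A larger value means the set of designs covers more of the objective space relative to the chosen reference point, so a dominated design such as (3.0, 3.0) above adds nothing to the result.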

Published In

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s
Special Issue ESWEEK 2023
October 2023
1394 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3614235
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 September 2023
Accepted: 13 July 2023
Revised: 02 June 2023
Received: 23 March 2023
Published in TECS Volume 22, Issue 5s

Author Tags

  1. Approximate computing
  2. arithmetic operator design
  3. circuit synthesis
  4. AI-based exploration
  5. computer arithmetic
  6. automated hardware design
  7. Monte Carlo Tree Search

Qualifiers

  • Research-article

Funding Sources

  • Deutsche Forschungsgemeinschaft (DFG)
