AxOTreeS: A Tree Search Approach to Synthesizing FPGA-based Approximate Operators

Authors:

Siva Satyendra Sahoo,

Akash Kumar

ACM Transactions on Embedded Computing Systems, Volume 22, Issue 5s

Article No.: 101, Pages 1 - 26

https://doi.org/10.1145/3609096

Published: 09 September 2023

Abstract

Approximate computing (AxC) provides the scope for achieving disproportionate gains in a system’s power, performance, and area (PPA) metrics by leveraging an application’s inherent error-resilient behavior (BEHAV). Trading computational accuracy for performance gains makes AxC an attractive proposition for implementing computationally complex AI/ML-based applications on resource-constrained embedded systems. The growing diversity of application domains using AI/ML has also led to the increasing usage of FPGA-based embedded systems. However, implementing AxC for FPGAs has primarily been limited to the post-processing of ASIC-optimized approximate operators (AxOs). This approach usually involves selecting from a set of AxOs that have been optimized for a gate-based implementation in an ASIC. While such an approach does allow leveraging existing knowledge of ASIC-based AxO design, it limits the scope for considering the challenges and opportunities associated with the FPGA’s LUT-based computation structures. Similarly, the few works that consider LUT-based computing for AxO design use generic optimization approaches that do not allow integrating problem-specific prior knowledge, whether empirical or statistical. To this end, we propose a novel tree search-based approach to AxO synthesis for FPGAs. Specifically, we present a design methodology using Monte Carlo Tree Search (MCTS)-based search tree traversal that allows the designer to integrate statistical data, such as correlation, into the AxO optimization. With the proposed methods, we report improvements over results obtained with the standard MCTS algorithm, as well as improved hypervolume for both operator-level and application-specific design space exploration (DSE), compared to state-of-the-art design methodologies.
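The hypervolume mentioned in the abstract is a standard quality indicator for multi-objective search: it measures the region of objective space dominated by a set of designs, bounded by a reference point. The sketch below is only an illustration of that indicator for two minimized objectives, not the authors' evaluation code. The function names, the choice of objectives (an error metric and a LUT count), and all numeric values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # After sorting, a point is non-dominated only if it strictly
    # improves the second objective over every point seen so far.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    # Points that do not strictly improve on the reference contribute nothing.
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_y = ref[1]
    # Sum the horizontal slabs of the staircase formed by the front.
    for x, y in pareto_front(inside):
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


# Hypothetical (error, LUT count) pairs for two sets of candidate designs.
designs_a = [(1, 3), (2, 2), (3, 1), (3, 3)]  # (3, 3) is dominated
designs_b = [(2, 3), (3, 2)]
reference = (4, 4)

hv_a = hypervolume_2d(designs_a, reference)
hv_b = hypervolume_2d(designs_b, reference)
```

Under these assumed values, `designs_a` covers a larger dominated area than `designs_b`, which is the sense in which a larger hypervolume indicates a better set of trade-off points for a fixed reference point.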

Cited By

Sahoo S, Ullah S, Kumar A (2024) AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming. ACM Transactions on Reconfigurable Technology and Systems 17:2 (1-28). DOI: 10.1145/3648694. Online publication date: 30-Apr-2024
https://dl.acm.org/doi/10.1145/3648694

Index Terms

AxOTreeS: A Tree Search Approach to Synthesizing FPGA-based Approximate Operators
1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Randomized search
2. Hardware
  1. Electronic design automation
    1. Logic synthesis
      1. Circuit optimization
    2. Methodologies for EDA
      1. Software tools for EDA
  2. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits
    2. Reconfigurable logic and FPGAs
      1. Hardware accelerators

Published In

ACM Transactions on Embedded Computing Systems Volume 22, Issue 5s

Special Issue ESWEEK 2023

October 2023

1394 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/3614235

Editor:
Tulika Mitra
National University of Singapore, Singapore

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 09 September 2023

Accepted: 13 July 2023

Revised: 02 June 2023

Received: 23 March 2023

Published in TECS Volume 22, Issue 5s

Qualifiers

Research-article

Funding Sources

Deutsche Forschungsgemeinschaft (DFG)
