AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming

Published: 30 April 2024

Abstract

With the increasing use of machine learning (ML) algorithms in embedded systems, there is a growing need for low-cost computer arithmetic suited to these resource-constrained systems. Consequently, emerging models of computation that leverage the inherent error resilience of such algorithms, such as approximate and stochastic computing, are being actively explored for implementing ML inference on such platforms. Approximate computing (AxC) aims to provide disproportionate gains in the power, performance, and area (PPA) of an application by allowing some reduction in its behavioral accuracy (BEHAV). Using approximate operators (AxOs) for computer arithmetic is one of the more prevalent methods of implementing AxC. Compared to precision scaling alone, AxOs offer a finer granularity of optimization. The design of platform-specific and cost-efficient approximate operators is therefore an important research goal. Recently, multiple works have reported the use of AI/ML-based approaches for synthesizing novel FPGA-based AxOs. However, most such works limit the use of AI/ML to designing ML-based surrogate functions that are used during iterative optimization. To address this limitation, we propose a novel data analysis-driven, mathematical programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data and use their solutions to enable a more directed search in evolutionary optimization algorithms. Compared to traditional evolutionary algorithm-based optimization, we report up to 21% improvement in the hypervolume for joint optimization of PPA and BEHAV in the design of signed 8-bit multipliers. Further, we report up to 27% better hypervolume than other state-of-the-art approaches to design space exploration (DSE) for FPGA-based application-specific AxOs.
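The hypervolume figures quoted in the abstract can be made concrete with a small illustration. The following is a minimal sketch, not the authors' evaluation code: it assumes two objectives that are both minimized (for instance, a PPA cost and a BEHAV error) and a user-chosen reference point. The function names, the sample points, and the reference point are hypothetical and do not come from the article.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def pareto_front(points: Iterable[Point]) -> List[Point]:
    """Return the non-dominated points (both objectives minimized),
    sorted by increasing first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # Sorting by (first, second) means a point is non-dominated exactly
    # when its second objective beats every point seen so far.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by the reference point `ref`."""
    # Points that do not strictly improve on the reference contribute nothing.
    inside = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    area = 0.0
    prev_second = ref[1]
    # Sweep the front left to right, adding one horizontal strip per point.
    for first, second in pareto_front(inside):
        area += (ref[0] - first) * (prev_second - second)
        prev_second = second
    return area


def relative_improvement(baseline: float, candidate: float) -> float:
    """Percentage gain of `candidate` hypervolume over `baseline`."""
    return 100.0 * (candidate - baseline) / baseline


# Hypothetical (cost, error) pairs for two search methods.
baseline_designs = [(2.0, 3.0), (3.0, 2.0)]
candidate_designs = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]
reference = (4.0, 4.0)

hv_base = hypervolume_2d(baseline_designs, reference)
hv_cand = hypervolume_2d(candidate_designs, reference)
gain = relative_improvement(hv_base, hv_cand)
```

A larger hypervolume means the set of designs found by a search covers more of the objective space between the Pareto front and the reference point, which is why it is a common single-number summary when comparing multi-objective optimizers. The choice of reference point affects the absolute values, so comparisons are only meaningful when every method is scored against the same reference.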

Published In

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 2
June 2024
464 pages
EISSN: 1936-7414
DOI: 10.1145/3613550
  • Editor: Deming Chen

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 April 2024
Online AM: 19 February 2024
Accepted: 15 February 2024
Revised: 14 December 2023
Received: 05 July 2023
Published in TRETS Volume 17, Issue 2

Author Tags

  1. Approximate computing
  2. arithmetic operator design
  3. circuit synthesis
  4. AI-based exploration

Qualifiers

  • Research-article

Funding Sources

  • Deutsche Forschungsgemeinschaft (DFG)
