research-article

SimBU: Self-Similarity-Based Hybrid Binary-Unary Computing for Nonlinear Functions

Authors: Alireza Khataei, Kia Bazargan

IEEE Transactions on Computers, Volume 73, Issue 9

Pages 2192 - 2205

https://doi.org/10.1109/TC.2024.3398512

Published: 09 May 2024

Abstract

Unary computing is a relatively new method for implementing arbitrary nonlinear functions that uses unpacked thermometer number encoding, enabling much lower hardware costs. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method to optimize the previous hybrid binary-unary work and provide it with the trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Looking for self-similarity between different parts of a function allows us to implement a very small subset of core unique subfunctions and derive the rest of the subfunctions from this core using simple linear transformations. We compare our method to previous works such as FloPoCo-LUT (lookup table), HBU (hybrid binary-unary) and FloPoCo-PPA (piecewise polynomial approximation) on several 8–12-bit nonlinear functions including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are frequently used in neural networks and image processing applications. The area × delay hardware cost of our method is on average 32%–60% better than previous methods in both exact and approximate implementations. We also extend our method to multivariate nonlinear functions and show on average 78%–92% improvement over previous work.
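The core idea in the abstract can be illustrated with a small software sketch. This is not the authors' algorithm or hardware flow. It assumes a function stored as an integer lookup table, split into equal-width segments, and it assumes that a "simple linear transformation" means an optional sign flip plus a constant offset. The names `thermometer`, `match`, and `find_cores`, and the error budget `max_err`, are hypothetical and chosen only for illustration.

```python
def thermometer(value, width):
    """Unary (thermometer) code: `value` ones followed by zeros."""
    if not 0 <= value <= width:
        raise ValueError("value must lie in [0, width]")
    return [1] * value + [0] * (width - value)


def split_subfunctions(table, num_segments):
    """Cut a lookup table into equal-width subfunctions."""
    size = len(table) // num_segments
    return [table[i * size:(i + 1) * size] for i in range(num_segments)]


def match(core, target, max_err):
    """Try to express target[i] as sign * core[i] + offset.

    Assumption: only a sign flip and a constant offset are allowed.
    Returns (sign, offset) when the worst-case error is within
    max_err, otherwise None.
    """
    for sign in (1, -1):
        offset = target[0] - sign * core[0]
        err = max(abs(sign * c + offset - t) for c, t in zip(core, target))
        if err <= max_err:
            return sign, offset
    return None


def find_cores(table, num_segments, max_err):
    """Greedy search: keep a segment as a core only if no earlier
    core reproduces it within max_err."""
    segments = split_subfunctions(table, num_segments)
    cores = []      # indices of segments implemented directly
    derived = {}    # segment index -> (core index, sign, offset)
    for idx, seg in enumerate(segments):
        for c in cores:
            found = match(segments[c], seg, max_err)
            if found is not None:
                derived[idx] = (c, *found)
                break
        else:
            cores.append(idx)
    return cores, derived
```

With `max_err = 0` the search only accepts exact matches. Raising `max_err` lets more segments share a core, which is one way to picture the accuracy versus cost trade-off the abstract describes. For a straight-line table such as `[2 * x for x in range(64)]` cut into four segments, every segment after the first is the first one plus a constant, so a single core suffices.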

Published In

IEEE Transactions on Computers Volume 73, Issue 9

Sept. 2024

241 pages

0018-9340 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher

IEEE Computer Society

United States
