research-article

SimBU: Self-Similarity-Based Hybrid Binary-Unary Computing for Nonlinear Functions

Published: 09 May 2024

Abstract

Unary computing is a relatively new method for implementing arbitrary nonlinear functions that uses unpacked thermometer number encoding, enabling much lower hardware costs. In its original form, unary computing provides no trade-off between accuracy and hardware cost. In this work, we propose a novel self-similarity-based method to optimize the previous hybrid binary-unary work and provide it with the trade-off between accuracy and hardware cost by introducing controlled levels of approximation. Looking for self-similarity between different parts of a function allows us to implement a very small subset of core unique subfunctions and derive the rest of the subfunctions from this core using simple linear transformations. We compare our method to previous works such as FloPoCo-LUT (lookup table), HBU (hybrid binary-unary) and FloPoCo-PPA (piecewise polynomial approximation) on several 8–12-bit nonlinear functions including Log, Exp, Sigmoid, GELU, Sin, and Sqr, which are frequently used in neural networks and image processing applications. The area × delay hardware cost of our method is on average 32%–60% better than previous methods in both exact and approximate implementations. We also extend our method to multivariate nonlinear functions and show on average 78%–92% improvement over previous work.
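To make the self-similarity idea concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a quantized function table split into equal segments, and it assumes that the "simple linear transformations" are an input flip, a sign change, a power-of-two down-scaling, and a constant offset. All function names (`thermometer_encode`, `split_segments`, `match_to_core`, `select_cores`) and the greedy search order are hypothetical choices made for illustration only.

```python
import math
from itertools import product


def thermometer_encode(value, width):
    """Thermometer (unary) code: `value` ones followed by zeros."""
    if not 0 <= value <= width:
        raise ValueError("value must lie in [0, width]")
    return [1] * value + [0] * (width - value)


def split_segments(table, num_segments):
    """Split a function table into equal-length subfunctions."""
    size = len(table) // num_segments
    return [table[k * size:(k + 1) * size] for k in range(num_segments)]


def match_to_core(core, target, max_error, shifts=(0, 1, 2)):
    """Look for an assumed transform (flip, sign, shift, offset) that maps
    `core` onto `target` within `max_error`; return None if none fits."""
    for flip, sign, shift in product((False, True), (1, -1), shifts):
        source = core[::-1] if flip else core
        shaped = [sign * (v >> shift) for v in source]
        # Align the first sample, then measure the worst-case deviation.
        offset = target[0] - shaped[0]
        error = max(abs(s + offset - t) for s, t in zip(shaped, target))
        if error <= max_error:
            return {"flip": flip, "sign": sign, "shift": shift, "offset": offset}
    return None


def select_cores(table, num_segments, max_error):
    """Greedily keep a segment as a new core only when no existing core
    can reproduce it; `plan` records how each segment is derived."""
    identity = {"flip": False, "sign": 1, "shift": 0, "offset": 0}
    cores, plan = [], []
    for segment in split_segments(table, num_segments):
        for index, core in enumerate(cores):
            transform = match_to_core(core, segment, max_error)
            if transform is not None:
                plan.append((index, transform))
                break
        else:
            cores.append(segment)
            plan.append((len(cores) - 1, dict(identity)))
    return cores, plan


if __name__ == "__main__":
    # Hypothetical 8-bit sigmoid table; exact (0) versus approximate (2 LSB).
    sigmoid = [round(255 / (1 + math.exp(-(i - 128) / 16))) for i in range(256)]
    for tolerance in (0, 2):
        cores, _ = select_cores(sigmoid, 16, tolerance)
        print(f"max_error={tolerance}: {len(cores)} core segments out of 16")
```

In this sketch, raising `max_error` lets more segments reuse an existing core, which loosely mirrors the accuracy versus hardware-cost trade-off described in the abstract; the hardware cost model itself is not represented here.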


Published In

IEEE Transactions on Computers, Volume 73, Issue 9
Sept. 2024
241 pages

Publisher

IEEE Computer Society, United States
