Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications

Published: 27 September 2020

Abstract

Field Programmable Gate Arrays (FPGAs) are becoming an appealing technology in datacenters and High Performance Computing. High-Level Synthesis (HLS) of multi-threaded parallel programs is increasingly used to extract parallelism. Despite great leaps forward in HLS and related debugging methodologies, there is a lack of contributions in automated bug identification for HLS of multi-threaded programs. This work defines a methodology to automatically detect and isolate bugs in parallel circuits generated with HLS. The technique relies on hardware/software Discrepancy Analysis and exploits a pattern-matching algorithm based on Finite State Automata to compare multiple hardware and software threads. Overhead, advantages, and limitations are evaluated on designs generated with an open-source HLS compiler supporting OpenMP.

Cited By

  • (2024) DeLoSo: Detecting Logic Synthesis Optimization Faults Based on Configuration Diversity. ACM Transactions on Design Automation of Electronic Systems 30:1 (1-26). DOI: 10.1145/3701232. Online publication date: 26-Oct-2024
  • (2021) Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications. 2021 58th ACM/IEEE Design Automation Conference (DAC) (1327-1330). DOI: 10.1109/DAC18074.2021.9586110. Online publication date: 5-Dec-2021

Published In

ACM Transactions on Parallel Computing, Volume 7, Issue 4
Special Issue on Innovations in Systems for Irregular Applications, Part 2
December 2020
179 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3426879
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 September 2020
Accepted: 01 June 2020
Revised: 01 November 2019
Received: 01 November 2018
Published in TOPC Volume 7, Issue 4

Author Tags

  1. Debugging
  2. FPGA
  3. HLS
  4. irregular
  5. multi-threading

Qualifiers

  • Research-article
  • Research
  • Refereed
