Search Results (1,023)

Search Parameters:
Keywords = field-programmable gate array (FPGA)

Order results
Result details
Results per page
Select all
Export citation of selected articles as:
21 pages, 5845 KiB  
Article
FPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs
by Mustafa Tasci, Ayhan Istanbullu, Vedat Tumen and Selahattin Kosunalp
Appl. Sci. 2025, 15(2), 688; https://doi.org/10.3390/app15020688 - 12 Jan 2025
Viewed by 322
Abstract
Recently, convolutional neural networks (CNNs) have received a massive amount of interest due to their ability to achieve high accuracy in various artificial intelligence tasks. With the development of complex CNN models, a significant drawback is their high computational burden and memory requirements. The performance of a typical CNN model can be enhanced by the improvement of hardware accelerators. Practical implementations on field-programmable gate arrays (FPGAs) have the potential to reduce resource utilization while maintaining low power consumption. Nevertheless, when implementing complex CNN models on FPGAs, these may require further computational and memory capacities, exceeding the available capacity provided by many current FPGAs. An effective solution to this issue is to use quantized neural network (QNN) models to remove the burden of full-precision weights and activations. This article proposes an accelerator design framework for FPGAs, called FPGA-QNN, with particular value in reducing the high computational burden and memory requirements of implementing CNNs. To approach this goal, FPGA-QNN exploits the basics of QNN models by converting the high burden of full-precision weights and activations into integer operations. The FPGA-QNN framework provides 12 accelerators based on multi-layer perceptron (MLP) and LeNet CNN models, each of which is associated with a specific combination of quantization and folding. Performance evaluations on the Xilinx PYNQ Z1 development board demonstrated the superiority of FPGA-QNN in terms of resource utilization and energy efficiency in comparison to several recent approaches. The proposed MLP model classified the FashionMNIST dataset at a speed of 953 kFPS with 1019 GOPs while consuming 2.05 W. Full article
(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)
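The core idea behind QNN acceleration, replacing full-precision weights and activations with scaled integers so that multiply-accumulates run on integer DSP slices, can be sketched in a few lines. This is an illustrative symmetric-quantization sketch in NumPy, not the FPGA-QNN framework itself; the layer shapes and the 8-bit width are assumptions for the example.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Symmetric linear quantization of a float tensor to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    scale = max(np.max(np.abs(x)) / qmax, 1e-12)   # guard against all-zero input
    return np.round(x / scale).astype(np.int32), scale

def int_linear(x_q, sx, w_q, sw):
    """Integer matrix multiply (the kind of MAC an FPGA DSP slice performs),
    followed by a single dequantization rescale at the output."""
    acc = x_q.astype(np.int64) @ w_q.T.astype(np.int64)
    return acc * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16))       # hypothetical activation vector
w = rng.standard_normal((4, 16))       # hypothetical layer weights
x_q, sx = quantize(x)
w_q, sw = quantize(w)
err = np.max(np.abs(int_linear(x_q, sx, w_q, sw) - x @ w.T))
```

The integer accumulation is exact; the only error comes from the two rounding steps, which is why aggressive quantization trades a small accuracy loss for much cheaper arithmetic.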

35 pages, 11367 KiB  
Article
A Novel Field-Programmable Gate Array-Based Self-Sustaining Current Balancing Approach for Silicon Carbide MOSFETs
by Nektarios Giannopoulos, Georgios Ioannidis, Georgios Vokas and Constantinos S. Psomopoulos
Electronics 2025, 14(2), 268; https://doi.org/10.3390/electronics14020268 - 10 Jan 2025
Viewed by 294
Abstract
In medium- and high-power-density applications, silicon carbide (SiC) metal-oxide semiconductor field-effect transistors (MOSFETs) are often connected in parallel to increase the current capability. However, the current sharing of paralleled SiC MOSFETs is affected by mismatched technical parameters of the devices and deviations in power-circuit parasitic inductances, even if the power devices are controlled by a single gate driver. This leads to unevenly distributed power losses, causing different stress between SiC MOSFETs. As a result, unbalanced current sharing increases the probability of severe power-switch and system failures. For over a decade, the current imbalance between parallel-connected SiC MOSFETs has concerned the scientific community, and many methods and techniques have been proposed. However, most of these solutions cannot be realized without screening power devices to measure their technical parameters. Consequently, system costs significantly increase due to the expensive equipment required for screening SiC MOSFETs. Also, transient current imbalance is the main concern of most papers, without addressing static imbalance. In this paper, an innovative approach is proposed, capable of suppressing both static and transient current imbalance between paralleled SiC MOSFETs, under both symmetrical and asymmetrical layouts, through an improved active gate driver and without the requirement for any power-device screening process. Additionally, the proposed solution employs a self-sustaining algorithmic approach utilizing current sensors and a field-programmable gate array (FPGA). The functionality of the proposed solution is verified through experimental tests, achieving current imbalance suppression between two paralleled SiC MOSFETs, actively and autonomously. Full article
(This article belongs to the Special Issue Innovative Technologies in Power Converters, 2nd Edition)

20 pages, 7167 KiB  
Article
Accelerating Deep Learning-Based Morphological Biometric Recognition with Field-Programmable Gate Arrays
by Nourhan Zayed, Nahed Tawfik, Mervat M. A. Mahmoud, Ahmed Fawzy, Young-Im Cho and Mohamed S. Abdallah
Viewed by 413
Abstract
Convolutional neural networks (CNNs) are increasingly recognized as an important and potent artificial intelligence approach, widely employed in many computer vision applications, such as facial recognition. Their importance resides in their capacity to acquire hierarchical features, which is essential for recognizing complex patterns. Nevertheless, the intricate architectural design of CNNs leads to significant computing requirements. To tackle these issues, it is essential to construct a system based on field-programmable gate arrays (FPGAs) to speed up CNNs. FPGAs provide fast development capabilities, energy efficiency, decreased latency, and advanced reconfigurability. A facial recognition solution leveraging deep learning and subsequently deployed on an FPGA platform is proposed. The system detects whether a person has the necessary authorization to enter/access a place. The FPGA is responsible for processing this system with utmost security and without any internet connectivity. Various facial recognition networks are evaluated, including the AlexNet, ResNet, and VGG-16 networks. The findings of the proposed method show that the GoogLeNet network is the best fit due to its lower computational resource requirements, speed, and accuracy. The system was deployed on three hardware kits to appraise the performance of different programming approaches in terms of accuracy, latency, cost, and power consumption. The software programming on the Raspberry Pi-3B kit had a recognition accuracy of around 70–75% and relied on a stable internet connection for processing. This dependency on internet connectivity increases bandwidth consumption and fails to meet the required security criteria, contrary to ZYBO-Z7 board hardware programming. Nevertheless, the hardware/software co-design on the PYNQ-Z2 board achieved an accuracy rate of 85% to 87%. It operates independently of an internet connection, making it a standalone system and saving costs. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)

26 pages, 4448 KiB  
Article
Leveraging Neural Trojan Side-Channels for Output Exfiltration
by Vincent Meyers, Michael Hefenbrock, Dennis Gnad and Mehdi Tahoori
Viewed by 316
Abstract
Neural networks have become pivotal in advancing applications across various domains, including healthcare, finance, surveillance, and autonomous systems. To achieve low latency and high efficiency, field-programmable gate arrays (FPGAs) are increasingly being employed as accelerators for neural network inference in cloud and edge devices. However, the rising costs and complexity of neural network training have led to the widespread use of outsourced training, pre-trained models, and machine learning services, raising significant concerns about security and trust. Specifically, malicious actors may embed neural Trojans within NNs, exploiting them to leak sensitive data through side-channel analysis. This paper builds upon our prior work, where we demonstrated the feasibility of embedding Trojan side-channels in neural network weights, enabling the extraction of classification results via remote power side-channel attacks. In this expanded study, we introduce a broader range of experiments to evaluate the robustness and effectiveness of this attack vector. We detail a novel training methodology that enhances the correlation between power consumption and network output, achieving up to a 33% improvement in reconstruction accuracy over benign models. Our approach eliminates the need for additional hardware, making it stealthier and more resistant to conventional hardware Trojan detection methods. We provide comprehensive analyses of attack scenarios in both controlled and variable environmental conditions, demonstrating the scalability and adaptability of our technique across diverse neural network architectures, such as MLPs and CNNs. Additionally, we explore countermeasures and discuss their implications for the design of secure neural network accelerators. To the best of our knowledge, this work is the first to present a passive output recovery attack on neural network accelerators without explicit trigger mechanisms. The findings emphasize the urgent need to integrate hardware-aware security protocols in the development and deployment of neural network accelerators. Full article
(This article belongs to the Special Issue Emerging Topics in Hardware Security)

16 pages, 3372 KiB  
Article
Design of High-Speed Signal Simulation and Acquisition System for Power Machinery Virtual Testing
by Hongyu Liu, Wei Cui, He Li, Xiuyun Shuai, Qingxin Wang, Jingyao Zhang, Feiyang Zhao and Wenbin Yu
Viewed by 318
Abstract
The rapid advancement of model-based simulation has driven the increased adoption of virtual testing in power machinery, raising demands for high accuracy and real-time signal processing. This study introduces a real-time signal simulation and acquisition system leveraging field-programmable gate array (FPGA) technology, designed with flexible scalability and seamless integration with NI hardware-based test systems. The system supports various dynamic signals, including position, injection, and ignition signals, providing robust support for virtual testing and calibration. Comprehensive testing across scenarios involving oscilloscopes, signal generators, and the rapid control prototyping (RCP) platform confirms its high accuracy, stability, and adaptability in multi-signal processing and real-time response. The result is a state-of-the-art platform, extensively tested in virtual field conditions, for both power systems and power electronics. Full article
(This article belongs to the Topic Digital Manufacturing Technology)

13 pages, 1853 KiB  
Article
Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
by Ahmad Mouri Zadeh Khaki and Ahyoung Choi
Appl. Sci. 2025, 15(1), 422; https://doi.org/10.3390/app15010422 - 5 Jan 2025
Viewed by 545
Abstract
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices. Full article
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)

25 pages, 13514 KiB  
Article
Parallelized Field-Programmable Gate Array Data Processing for High-Throughput Pulsed-Radar Systems
by Aaron D. Pitcher, Mihail Georgiev, Natalia K. Nikolova and Nicola Nicolici
Sensors 2025, 25(1), 239; https://doi.org/10.3390/s25010239 - 3 Jan 2025
Viewed by 364
Abstract
A parallelized field-programmable gate array (FPGA) architecture is proposed to realize an ultra-fast, compact, and low-cost dual-channel ultra-wideband (UWB) pulsed-radar system. This approach resolves the main shortcoming of current FPGA-based radars, namely their low processing throughput, which leads to a significant loss of data provided by the radar receiver. The architecture is integrated with an in-house UWB pulsed radar operating at a sampling rate of 20 gigasamples per second (GSa/s). It is demonstrated that the FPGA data-processing speed matches that of the radar output, thus eliminating data loss. The radar system achieves a remarkable speed of over 9000 waveforms per second on each channel. The proposed architecture is scalable to accommodate higher sampling rates and various waveform periods. It is also multi-functional since the FPGA controls and synchronizes two transmitters and a dual-channel receiver, performs signal reconstruction on both channels simultaneously, and carries out user-defined averaging, trace windowing, and interference suppression for improving the receiver’s signal-to-noise ratio. We also investigate the throughput rate while offloading radar data onto an external device through an Ethernet link. Since the radar data rate significantly exceeds the Ethernet link capacity, we show how the FPGA-based averaging and windowing functions are leveraged to reduce the amount of offloaded data while fully utilizing the radar output. Full article
(This article belongs to the Special Issue Recent Advances in Radar Imaging Techniques and Applications)
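The user-defined averaging mentioned in the abstract pays off because coherent averaging of N repeated waveforms reduces uncorrelated noise power by N (amplitude by the square root of N), which is also why it shrinks the amount of data offloaded over Ethernet. A minimal NumPy illustration with an assumed Gaussian pulse and unit-variance noise (not the radar's actual waveform):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 500)
pulse = np.exp(-((t - 0.5) ** 2) / 1e-3)      # stand-in for one received waveform

n = 256
traces = pulse + rng.standard_normal((n, t.size))   # unit-variance noise per trace
avg = traces.mean(axis=0)                           # coherent average, as on the FPGA
noise_after = (avg - pulse).std()                   # expect about 1/sqrt(256) = 0.0625
```

Averaging 256 traces thus buys roughly 24 dB of SNR while emitting one trace instead of 256.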

18 pages, 779 KiB  
Article
A Pipelined Hardware Design of FNTT and INTT of CRYSTALS-Kyber PQC Algorithm
by Muhammad Rashid, Omar S. Sonbul, Sajjad Shaukat Jamal, Amar Y. Jaffar and Azamat Kakhorov
Information 2025, 16(1), 17; https://doi.org/10.3390/info16010017 - 31 Dec 2024
Viewed by 323
Abstract
Lattice-based post-quantum cryptography (PQC) algorithms demand number theoretic transform (NTT)-based polynomial multiplications. NTT-based polynomial multiplication relies on the computation of the forward number theoretic transform (FNTT) and the inverse number theoretic transform (INTT). Therefore, this work presents a unified NTT hardware accelerator architecture to facilitate the polynomial multiplications of the CRYSTALS-Kyber PQC algorithm. Moreover, a unified butterfly unit design of Cooley–Tukey and Gentleman–Sande configurations is proposed to implement the FNTT and INTT operations using one adder, one multiplier, and one subtractor, sharing four routing multiplexers and one Barrett-based modular reduction unit. The critical path of the proposed butterfly unit is minimized using pipelining. An efficient controller is implemented for control functionalities. The simulation results after the post-place-and-route step are provided for Xilinx Virtex-6 and Virtex-7 field-programmable gate array devices. Also, the proposed design is physically implemented for validation on a Virtex-7 FPGA. The number of slices utilized on the Virtex-6 and Virtex-7 devices is 398 and 312, the required number of clock cycles for one set of FNTT and INTT computations is 1410 and 1540, and the maximum operating frequency is 256 and 290 MHz, respectively. The average figure of merit (FoM), where FoM is the ratio of throughput to slices, illustrates 62% better performance than the most relevant NTT design from the literature. Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)
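The two butterfly configurations and the Barrett reduction unit combine as follows. This is an illustrative Python sketch using the Kyber modulus q = 3329, not the authors' pipelined RTL; the shift width K = 26 is an assumed Barrett parameter chosen so that inputs up to q squared reduce correctly.

```python
Q = 3329                 # CRYSTALS-Kyber modulus (prime)
K = 26
M = (1 << K) // Q        # precomputed Barrett constant

def barrett_reduce(x):
    """Reduce 0 <= x < Q*Q modulo Q using only shifts and multiplies, no divider."""
    r = x - ((x * M) >> K) * Q    # approximate quotient, then remainder
    while r >= Q:                 # a bounded number of corrective subtractions
        r -= Q
    return r

def ct_butterfly(a, b, zeta):
    """Cooley-Tukey butterfly (forward NTT): (a, b) -> (a + z*b, a - z*b) mod Q."""
    t = barrett_reduce(zeta * b)
    return (a + t) % Q, (a - t) % Q

def gs_butterfly(a, b, zeta):
    """Gentleman-Sande butterfly (inverse NTT): (a, b) -> (a + b, z*(a - b)) mod Q."""
    return (a + b) % Q, barrett_reduce(zeta * ((a - b) % Q))
```

Both butterflies use one multiply, one add, and one subtract plus the shared reduction, which is what makes a unified, multiplexed datapath natural. Applying a CT butterfly and then a GS butterfly with the inverse twiddle returns the inputs scaled by 2, which is why INTT implementations fold in a final scaling step.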

25 pages, 13126 KiB  
Article
Optimal Implementation of d-q Frame Finite Control Set Model Predictive Control with LabVIEW
by Mohamad Esmaeil Iranian, Elyas Zamiri and Angel de Castro
Electronics 2025, 14(1), 100; https://doi.org/10.3390/electronics14010100 - 29 Dec 2024
Viewed by 604
Abstract
Finite Control Set Model Predictive Control (FCS-MPC) emerges as a promising method for controlling power electronics inverters, outperforming traditional linear techniques. However, implementing FCS-MPC on conventional processors faces a significant computational burden due to its repetitive nature. This paper presents a novel approach that utilizes LabVIEW and field-programmable gate arrays (FPGAs) to address this computational bottleneck. By capitalizing on the inherent parallelism of FPGAs and their suitability for discrete control problems, substantial computational advantages are achieved for FCS-MPC. The use of LabVIEW, a well-established platform in industrial and commercial solutions, ensures that this work is relevant not only academically but also for real-world industrial applications of FCS-MPC in power electronics and motor drives. This research successfully demonstrates the application of FCS-MPC for controlling the current of a motor-like load for a three-phase Voltage Source Inverter system in LabVIEW. To simplify the traditionally complex FPGA programming process, user-friendly toolkits such as LabVIEW Control Design & Simulation, LabVIEW Real-Time, and the LabVIEW FPGA Module are employed. This LabVIEW-based integration facilitates the execution of both concurrent and sequential FPGA algorithms, leading to efficient FPGA resource management and user-defined restrictions on the maximum switching frequency, obviating the need for resource-intensive control methods for fast switches such as SiC and GaN devices. The proposed controller is validated not only on an off-the-shelf computer turned into a real-time system but also on an FPGA for comparison purposes. Full article
(This article belongs to the Special Issue Innovative Technologies in Power Converters, 2nd Edition)
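In outline, FCS-MPC enumerates the finite set of inverter switch states each sampling period, predicts the load current for each candidate, and applies the state that minimizes a cost function, which is why the computation is so repetitive and parallelizes so well. A minimal sketch for a two-level three-phase VSI with an RL load in the stationary frame; every parameter value (R, L, Ts, VDC) is a placeholder, and the paper's d-q formulation and LabVIEW implementation are not reproduced here.

```python
import numpy as np

# Hypothetical RL load and timing parameters (illustration only)
R, L, Ts, VDC = 1.0, 5e-3, 50e-6, 400.0

# The 8 switch states of a two-level, three-phase inverter
SWITCH_STATES = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

def clarke(abc):
    """Transform phase quantities to the stationary alpha-beta frame."""
    a, b, c = abc
    return np.array([(2 * a - b - c) / 3.0, (b - c) / np.sqrt(3.0)])

def predict(i_ab, v_ab):
    """One-step forward-Euler prediction of the RL-load current."""
    return i_ab + Ts / L * (v_ab - R * i_ab)

def fcs_mpc_step(i_ab, i_ref_ab):
    """Try every switch state, keep the one with the smallest tracking cost."""
    best_state, best_cost = None, np.inf
    for s in SWITCH_STATES:
        v_ab = clarke(tuple(VDC * x for x in s))
        cost = np.sum((i_ref_ab - predict(i_ab, v_ab)) ** 2)
        if cost < best_cost:
            best_state, best_cost = s, cost
    return best_state
```

On an FPGA, the eight cost evaluations can run fully in parallel, collapsing the inner loop into one pipeline stage.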

22 pages, 5903 KiB  
Article
FPGA-Based Manchester Decoder for IEEE 802.15.7 Visible Light Communications
by Stefano Ricci, Stefano Caputo and Lorenzo Mucchi
Viewed by 434
Abstract
Visible Light Communication (VLC) is a cutting-edge transmission technique where data are sent by modulating light intensity. Manchester On–Off Keying (OOK) is among the most used modulation techniques in VLC and is specified in the IEEE 802.15.7 standard for wireless networks. Various Manchester decoder schemes are documented in the literature, often leveraging minimal two-level analog-to-digital converters followed by straightforward digital logic. These methods often compromise performance for simplicity. However, VLC applications in fields like automotive and aerospace require maximum performance in terms of bit error rate (BER) with respect to Signal-to-Noise Ratio (SNR), together with a real-time, low-latency implementation. In this work, we introduce a high-performance Manchester decoder and detail its implementation in a Field Programmable Gate Array (FPGA). The decoder operates by acquiring a fully resolved signal (12-bit resolution) and calculating the phase of the transmitted bit. Additionally, the proposed decoder achieves and maintains synchronization with the incoming signal, tolerating frequency shifts and jitter up to 1%. The Manchester decoder was tested in a VLC system with automotive-certified headlamps, realizing an IEEE 802.15.7-compliant link at 100 kb/s. The proposed decoder ensures a BER below 10−2 for SNR > −12 dB and, compared to a standard decoder, achieves the same BER with an input SNR 10 dB lower. Full article
(This article belongs to the Special Issue System-on-Chip (SoC) and Field-Programmable Gate Array (FPGA) Design)
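A correlation-based Manchester decoder of the kind described, deciding each bit from the sign of its correlation with a half-low/half-high template over fully resolved samples, can be sketched as follows. The oversampling ratio, the noise level, and the bit-to-edge convention are assumptions for illustration, not the paper's exact parameters.

```python
import numpy as np

SPB = 12  # samples per bit (hypothetical oversampling ratio)

def encode(bits):
    """Manchester OOK: bit 1 -> low then high, bit 0 -> high then low (one common convention)."""
    out = []
    for b in bits:
        first, second = (0.0, 1.0) if b else (1.0, 0.0)
        out += [first] * (SPB // 2) + [second] * (SPB // 2)
    return np.array(out)

def decode(signal):
    """Correlate each bit period with a [-1, +1] template; the sign decides the bit."""
    template = np.concatenate([-np.ones(SPB // 2), np.ones(SPB // 2)])
    bits = []
    for k in range(signal.size // SPB):
        corr = signal[k * SPB:(k + 1) * SPB] @ template
        bits.append(1 if corr > 0 else 0)
    return bits

rng = np.random.default_rng(2)
tx = [1, 0, 1, 1, 0, 0, 1, 0]
rx = encode(tx) + 0.3 * rng.standard_normal(len(tx) * SPB)  # additive channel noise
```

Using all 12-bit samples in the correlation, rather than a single hard threshold per half-bit, is what buys the SNR margin over two-level decoders.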

19 pages, 553 KiB  
Article
ORNIC: A High-Performance RDMA NIC with Out-of-Order Packet Direct Write Method for Multipath Transmission
by Jiandong Ma, Zhichuan Guo, Yipeng Pan, Mengting Zhang, Zhixiang Zhao, Zezheng Sun and Yiwei Chang
Viewed by 437
Abstract
Remote Direct Memory Access (RDMA) technology provides a low-latency, high-bandwidth, and CPU-bypassed method for data transmission between servers. Recent works have proved that multipath transmission, especially packet spraying, can avoid network congestion, achieve load balancing, and improve overall performance in data center networks (DCNs). Multipath transmission can result in out-of-order (OOO) packet delivery. However, existing RDMA transport protocols, such as RDMA over Converged Ethernet version 2 (RoCEv2), are designed for handling sequential packets, limiting their ability to support multipath transmission. To address this issue, in this study, we propose ORNIC, a high-performance RDMA Network Interface Card (NIC) with out-of-order packet direct write method for multipath transmission. ORNIC supports both in-order and out-of-order packet reception. The payload of OOO packets is written directly to user memory without reordering. The write address is embedded in the packets only when necessary. A bitmap is used to check data integrity and detect packet loss. We redesign the bitmap structure into an array of bitmap blocks that support dynamic allocation. Once a bitmap block is full, it is marked and can be freed in advance. We implement ORNIC on a Xilinx U200 FPGA (Field-Programmable Gate Array), which consumes less than 15% of hardware resources. ORNIC can achieve 95 Gbps RDMA throughput, which is nearly 2.5 times that of MP-RDMA. When handling OOO packets, ORNIC’s performance is virtually unaffected, while the performance of Xilinx ERNIC and Mellanox CX-5 drops below 1 Gbps. Moreover, compared with MELO and LEFT, our bitmap has higher performance and lower bitmap block usage. Full article
(This article belongs to the Topic Advanced Integrated Circuit Design and Application)
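The dynamically allocated bitmap-block scheme can be sketched in a few lines: a block is allocated when its first packet arrives, a set bit records each arrival, and a fully set block is freed early. The block size and data structures here are illustrative assumptions, not ORNIC's hardware layout.

```python
BLOCK_BITS = 64   # hypothetical block size; ORNIC's actual sizing may differ

class OooBitmap:
    """Bitmap blocks, allocated on demand, that record out-of-order arrivals."""

    def __init__(self):
        self.blocks = {}         # block index -> bitmask of packets seen
        self.complete = set()    # blocks already full and freed early

    def mark(self, seq):
        """Record arrival of packet `seq`; free its block once every bit is set."""
        idx, bit = divmod(seq, BLOCK_BITS)
        if idx in self.complete:
            return
        mask = self.blocks.get(idx, 0) | (1 << bit)
        if mask == (1 << BLOCK_BITS) - 1:
            self.blocks.pop(idx, None)   # block full: release it in advance
            self.complete.add(idx)
        else:
            self.blocks[idx] = mask

    def missing(self, idx):
        """Packets of block `idx` not yet seen, the basis for loss detection."""
        if idx in self.complete:
            return []
        mask = self.blocks.get(idx, 0)
        return [idx * BLOCK_BITS + b for b in range(BLOCK_BITS) if not mask >> b & 1]
```

Freeing full blocks early bounds the tracker's memory by the reorder window rather than the whole message, which matches the low resource usage the paper reports.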

16 pages, 3804 KiB  
Article
Ring Oscillators with Additional Phase Detectors as a Random Source in a Random Number Generator
by Łukasz Matuszewski, Mieczysław Jessa and Jakub Nikonowicz
Entropy 2025, 27(1), 15; https://doi.org/10.3390/e27010015 - 28 Dec 2024
Viewed by 385
Abstract
In this paper, we propose a method to enhance the performance of a random number generator (RNG) that exploits ring oscillators (ROs). Our approach employs additional phase detectors to extract more entropy; thus, the RNG uses fewer resources to produce bit sequences that pass all statistical tests proposed by the National Institute of Standards and Technology (NIST). A specified number of bits can be generated on demand, eliminating the need for continuous RNG operation. This feature enhances the security of the produced sequences, as eavesdroppers are unable to observe a continuous random bit generation process, such as through monitoring power lines. Furthermore, our research demonstrates that the proposed RNG's properties remain unaffected by the manufacturer of the field-programmable gate arrays (FPGAs) used for implementation. This independence ensures the RNG's reliability and consistency across various FPGA manufacturers. Additionally, we highlight that the tests recommended by the NIST may prove insufficient in assessing the randomness of the output bit streams produced by RO-based RNGs. Full article
(This article belongs to the Section Signal and Data Analysis)
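The entropy mechanism behind RO-based RNGs, a free-running oscillator whose accumulated period jitter is sampled by an unrelated clock, can be simulated roughly as follows, with a phase detector modeled as the XOR of two independent oscillators. All timing constants are invented for illustration; the paper's actual detector circuit is not modeled.

```python
import numpy as np

rng = np.random.default_rng(3)

def ro_bits(n, half_period=0.5, jitter=0.05, sample_period=7.3):
    """Square-wave state of a jittery ring oscillator at the sampling instants."""
    n_edges = int(2.5 * n * sample_period / half_period)
    edges = np.cumsum(half_period + jitter * rng.standard_normal(n_edges))
    t = np.arange(1, n + 1) * sample_period       # slower, unrelated sampling clock
    return (np.searchsorted(edges, t) & 1).astype(np.uint8)  # parity of edge count

# Phase detector modeled as XOR of two independent oscillators
bits = ro_bits(2000) ^ ro_bits(2000)
```

Because jitter accumulates as a random walk between samples, the sampled phase decorrelates over time, and XOR-combining independent oscillators further flattens any residual bias.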

31 pages, 3152 KiB  
Article
Research on Spaceborne Neural Network Accelerator and Its Fault Tolerance Design
by Yingzhao Shao, Junyi Wang, Xiaodong Han, Yunsong Li, Yaolin Li and Zhanpeng Tao
Remote Sens. 2025, 17(1), 69; https://doi.org/10.3390/rs17010069 - 28 Dec 2024
Viewed by 349
Abstract
To meet the high-reliability requirements of real-time on-orbit tasks, this paper proposes a fault-tolerant reinforcement design method for spaceborne intelligent processing algorithms based on convolutional neural networks (CNNs). This method is built on a CNN accelerator using Field-Programmable Gate Array (FPGA) technology, analyzing the impact of Single-Event Upsets (SEUs) on neural network computation. The accelerator design integrates data validation, Triple Modular Redundancy (TMR), and other techniques, optimizing a partial fault-tolerant architecture based on SEU sensitivity. This fault-tolerant architecture analyzes the hardware accelerator, parameter storage, and actual computation, employing data validation to reinforce model parameters and spatial and temporal TMR to reinforce accelerator computations. Using the ResNet18 model, fault tolerance performance tests were conducted by simulating SEUs. Compared to the prototype network, this fault-tolerant design method increases tolerance to SEU error accumulation by five times while increasing resource consumption by less than 15%, making it more suitable for spaceborne on-orbit applications than traditional fault-tolerant design approaches. Full article
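Triple Modular Redundancy reduces, at the bit level, to a majority vote over three replicated results. A one-line sketch of the voter (the paper's spatial and temporal TMR scheduling is not modeled here):

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant results: each output bit follows
    whichever value at least two replicas agree on."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011_0010
seu_one = golden ^ (1 << 4)              # single-event upset in one replica
recovered = tmr_vote(golden, seu_one, golden)
```

The vote masks any single-bit upset per bit position, which is why TMR tolerates accumulated SEUs as long as no two replicas are corrupted in the same position.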

25 pages, 5363 KiB  
Article
Power-Optimized Field-Programmable Gate Array Implementation of Neural Activation Functions Using Continued Fractions for AI/ML Workloads
by Chanakya Hingu, Xingang Fu, Taofiki Saliyu, Rui Hu and Ramkrishna Mishan
Electronics 2024, 13(24), 5026; https://doi.org/10.3390/electronics13245026 - 20 Dec 2024
Viewed by 379
Abstract
The increasing demand for energy-efficient hardware platforms to support artificial intelligence (AI) and machine learning (ML) algorithms in edge computing has driven the adoption of system-on-chip (SoC) architectures. Implementing neural network (NN) activation functions, such as the hyperbolic tangent (tanh), on hardware presents challenges due to computational complexity, high resource requirements, and power consumption. This paper aims to optimize the hardware implementation of the tanh function using continued fraction and polynomial approximations to minimize resource consumption and power usage while preserving computational accuracy. Five models of the tanh function, including continued fraction and quadratic approximations, were implemented on Intel field-programmable gate arrays (FPGAs) using VHDL and Intel’s ALTFP toolbox, with 32-bit floating-point outputs validated against MATLAB’s 64-bit floating-point results. Detailed analyses of resource utilization, power optimization, clock latency, and bit-level accuracy were conducted, focusing on minimizing logic elements and digital signal processing (DSP) blocks while achieving high precision and low power consumption. The most optimized model was further integrated into a four-input, two-output recurrent neural network (RNN) structure to assess real-time performance. Experimental results demonstrate that the continued fraction-based models significantly reduce resource usage, computation time, and power consumption, enhancing FPGA performance for AI/ML applications in resource-constrained and power-sensitive environments. Full article
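One classical continued fraction for tanh (Lambert's) truncates to a small rational function, which is what makes the approach attractive on an FPGA: a handful of multipliers and one divider replace an exponential. A sketch of the truncated evaluation follows; the depth of 5 is an arbitrary choice for illustration, not necessarily one of the paper's five models.

```python
import math

def tanh_cf(x, depth=5):
    """Truncated Lambert continued fraction:
    tanh x = x / (1 + x^2 / (3 + x^2 / (5 + ...)))."""
    acc = 2.0 * depth + 1.0              # innermost partial denominator
    for k in range(depth - 1, 0, -1):
        acc = (2 * k + 1) + x * x / acc  # fold outward through 9, 7, 5, 3
    return x / (1.0 + x * x / acc)

# Worst-case error against the math library over [-2, 2]
worst = max(abs(tanh_cf(x / 10) - math.tanh(x / 10)) for x in range(-20, 21))
```

Even at depth 5 the approximation is accurate to a few parts in a million on this range, so the hardware cost is dominated by the final division.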

22 pages, 1314 KiB  
Article
Area and Performance Estimates of Finite State Machines in Reconfigurable Systems
by Valery Salauyou
Appl. Sci. 2024, 14(24), 11833; https://doi.org/10.3390/app142411833 - 18 Dec 2024
Viewed by 365
Abstract
Modern reconfigurable systems are typically implemented in field-programmable gate arrays (FPGAs) based on look-up tables (LUTs). Finite state machines (FSMs) perform the functions of control devices and are integral to reconfigurable systems. When designing reconfigurable systems, the problem of optimizing the area and performance of FSMs often arises. The FSM synthesis and state encoding methods generally use only one estimate of the FSM area or performance. However, regardless of the computational complexity of the FSM synthesis or state encoding method, if the estimate incorrectly reflects the optimization aim, the result is far from the optimal solution. This paper proposes several estimates of the area and performance of FSMs implemented in LUT-based FPGAs. The effectiveness of the proposed estimates was investigated using the sequential method for FSM state encoding. Experimental studies on benchmarks showed that the FSM area decreases on average from 3.8% to 6.5%, compared to known approaches (for some cases by 33.3%), while the performance increases on average from 3.5% to 7.3% (for some cases by 27.6%). Recommendations for the practical use of the proposed estimates are also provided. The Conclusions section highlights promising directions for future research. Full article
