Search Results (360)

Search Parameters:
Keywords = multi-frequency fusion

26 pages, 393 KiB  
Review
Monitoring Yield and Quality of Forages and Grassland in the View of Precision Agriculture Applications—A Review
by Abid Ali and Hans-Peter Kaul
Remote Sens. 2025, 17(2), 279; https://doi.org/10.3390/rs17020279 - 15 Jan 2025
Viewed by 412
Abstract
The potential of precision agriculture (PA) in forage and grassland management should be more extensively exploited to meet the increasing global food demand on a sustainable basis. Monitoring biomass yield and quality traits directly impacts fertilization and irrigation practices and the frequency of utilization (cuts) in grasslands. Therefore, the main goal of the review is to examine techniques for using PA applications to monitor productivity and quality in forages and grasslands. To achieve this, the authors discuss several monitoring technologies for biomass and plant stand characteristics (including quality) that make it possible to adopt digital farming in forage and grassland management. The review provides an overview of mass flow and impact sensors, moisture sensors, remote sensing-based approaches, near-infrared (NIR) spectroscopy, and mapping of field heterogeneity, and promotes decision support systems (DSSs) in this field. At a small scale, advanced sensors such as optical, thermal, and radar sensors mountable on drones; LiDAR (Light Detection and Ranging); and hyperspectral imaging techniques can be used for assessing plant and soil characteristics. At a larger scale, we discuss the coupling of remote sensing with weather data (synergistic grassland yield modelling), Sentinel-2 data with radiative transfer modelling (RTM), Sentinel-1 backscatter, and CatBoost machine learning methods for digital mapping in terms of precision harvesting and site-specific farming decisions. Delineating sward heterogeneity is more difficult in mixed grasslands due to spectral similarity among species. Diversity-Interactions models make it possible to jointly assess the interactions of various species in mixed grasslands. Further, understanding such complex sward heterogeneity might be feasible by integrating spectral unmixing techniques such as super-pixel segmentation, multi-level fusion procedures, and NIR spectroscopy combined with neural network models. This review offers a digital option for enhancing yield monitoring systems and implementing PA applications in forage and grassland management. The authors recommend, as a future research direction, the inclusion of the costs and economic returns of digital technologies for precision grasslands and fodder production.

15 pages, 2843 KiB  
Article
MSEANet: Multi-Scale Selective Edge Aware Network for Polyp Segmentation
by Botao Liu, Changqi Shi and Ming Zhao
Algorithms 2025, 18(1), 42; https://doi.org/10.3390/a18010042 - 12 Jan 2025
Viewed by 481
Abstract
The colonoscopy procedure relies heavily on the operator’s expertise, underscoring the importance of automated polyp segmentation techniques for enhancing the efficiency and accuracy of colorectal cancer diagnosis. Nevertheless, achieving precise segmentation remains a significant challenge due to the high visual similarity between polyps and their backgrounds, blurred boundaries, and complex localization. To address these challenges, a Multi-scale Selective Edge-Aware Network is proposed for polyp segmentation. The model consists of three key components: (1) an Edge Feature Extractor (EFE) that captures polyp edge features with precision during the initial encoding phase, (2) a Cross-layer Context Fusion (CCF) block designed to extract and integrate multi-scale contextual information from diverse receptive fields, and (3) a Selective Edge Aware (SEA) module that enhances sensitivity to high-frequency edge details during the decoding phase, thereby improving edge preservation and segmentation accuracy. The effectiveness of the model is validated on the Kvasir-SEG, Kvasir-Sessile, and BKAI datasets, achieving mean Dice scores of 91.92%, 82.10%, and 92.24%, respectively, on the test sets.
(This article belongs to the Special Issue Artificial Intelligence Algorithms for Medicine (2nd Edition))
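The mean Dice score reported above is the standard overlap metric for segmentation benchmarks. As a point of reference, here is a minimal sketch of how it is typically computed over a test set; the function names and the smoothing term are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks (assumed {0, 1} arrays)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def mean_dice(preds, targets) -> float:
    """Average Dice over a test set, as in the per-dataset scores above."""
    return float(np.mean([dice_score(p, t) for p, t in zip(preds, targets)]))
```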

17 pages, 18059 KiB  
Article
Robust Multi-Subtype Identification of Breast Cancer Pathological Images Based on a Dual-Branch Frequency Domain Fusion Network
by Jianjun Li, Kaiyue Wang and Xiaozhe Jiang
Sensors 2025, 25(1), 240; https://doi.org/10.3390/s25010240 - 3 Jan 2025
Viewed by 460
Abstract
Breast cancer (BC) is one of the most lethal cancers worldwide, and its early diagnosis is critical for improving patient survival rates. However, the extraction of key information from complex medical images and the attainment of high-precision classification present a significant challenge. In the field of signal processing, texture-rich images typically exhibit periodic patterns and structures, which manifest as significant energy concentrations at specific frequencies in the frequency domain. Given these considerations, this study explores the application of frequency domain analysis in BC histopathological classification. We propose the dual-branch adaptive frequency domain fusion network (AFFNet), designed to enable each branch to specialize in distinct frequency domain features of pathological images. Additionally, two different frequency domain approaches, namely Multi-Spectral Channel Attention (MSCA) and the Fourier Filtering Enhancement Operator (FFEO), are employed to enhance the texture features of pathological images and minimize information loss. Moreover, the contributions of the two branches at different stages are dynamically adjusted by a frequency-domain-adaptive fusion strategy to accommodate the complexity and multi-scale features of pathological images. Experimental results on two public BC histopathological image datasets demonstrate that AFFNet outperforms 10 state-of-the-art image classification methods, underscoring its effectiveness and superiority in this domain.
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)
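The paper’s FFEO is not specified here, but the general frequency-domain mechanism the abstract relies on, transforming an image, reweighting frequency bands, and transforming back, can be sketched as follows; the cutoff radius and circular mask shape are illustrative assumptions:

```python
import numpy as np

def frequency_filter(image: np.ndarray, radius: int = 16, keep_low: bool = True) -> np.ndarray:
    """Generic frequency-domain filtering: FFT -> band mask -> inverse FFT.

    Texture-rich pathology tiles concentrate energy at specific frequencies,
    so masking bands isolates texture (high-pass) or structure (low-pass).
    """
    f = np.fft.fftshift(np.fft.fft2(image))          # centre the spectrum
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= radius if keep_low else dist > radius
    filtered = np.fft.ifft2(np.fft.ifftshift(f * mask))
    return np.real(filtered)
```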

18 pages, 7697 KiB  
Article
GNSS/IMU/ODO Integrated Navigation Method Based on Adaptive Sliding Window Factor Graph
by Xinchun Ji, Chenjun Long, Liuyin Ju, Hang Zhao and Dongyan Wei
Electronics 2025, 14(1), 124; https://doi.org/10.3390/electronics14010124 - 31 Dec 2024
Viewed by 391
Abstract
One of the predominant technologies for multi-source vehicle navigation is the fusion of GNSS/IMU/ODO measurements through a factor graph. To address issues such as the asynchronous sampling frequencies of the IMU and ODO, as well as diminished accuracy during GNSS signal loss, we propose a GNSS/IMU/ODO integrated navigation method based on an adaptive sliding window factor graph. The ODO measurements are utilized as observation factors to mitigate the prediction interpolation errors associated with traditional ODO pre-integration methods. Additionally, online estimation and compensation of both the installation angle deviations and the scale factors of the ODO further enhance its ability to constrain pose errors during GNSS signal loss. A multi-state marginalization algorithm is proposed and utilized to adaptively adjust the sliding window size based on the quality of the GNSS observations, enhancing pose optimization accuracy in multi-source fusion while prioritizing computational efficiency. Tests conducted in typical urban environments and mountainous regions demonstrate that the proposed method significantly enhances fusion navigation accuracy under complex GNSS conditions. In a complex city environment, it improves position and velocity accuracy by 55.3% and 29.8% and pitch and heading accuracy by 32.0% and 61.6%, respectively. These results match the precision of long sliding windows, with a 75.8% gain in computational efficiency. In mountainous regions, it improves position accuracy in the three dimensions by 89.5%, 83.7%, and 43.4% and velocity accuracy by 65.4%, 32.6%, and 53.1%, and it reduces the attitude errors in roll, pitch, and yaw by 70.5%, 60.8%, and 26.0%, respectively, demonstrating strong engineering applicability through an optimal balance of precision and efficiency.
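The core idea of the adaptive window, shrink it when GNSS quality is good (to save computation) and grow it when GNSS degrades (so IMU/ODO factors bridge the outage), can be sketched as below. The quality measure, bounds, and class layout are illustrative assumptions, not the paper’s algorithm:

```python
from collections import deque

class AdaptiveSlidingWindow:
    """Keeps recent navigation states; older ones would be marginalized out."""

    def __init__(self, min_size: int = 5, max_size: int = 30):
        self.min_size, self.max_size = min_size, max_size
        self.states = deque()

    def target_size(self, gnss_quality: float) -> int:
        """Map GNSS quality in [0, 1] (1 = open sky) to a window length:
        the poorer the quality, the longer the window of retained states."""
        span = self.max_size - self.min_size
        return self.min_size + round((1.0 - gnss_quality) * span)

    def push(self, state, gnss_quality: float):
        self.states.append(state)
        while len(self.states) > self.target_size(gnss_quality):
            self.states.popleft()  # stand-in for multi-state marginalization
```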

29 pages, 17674 KiB  
Article
Noise-Perception Multi-Frame Collaborative Network for Enhanced Polyp Detection in Endoscopic Videos
by Haoran Li, Guoyong Zhen, Chengqun Chu, Yuting Ma and Yongnan Zhao
Viewed by 443
Abstract
The accurate detection and localization of polyps during endoscopic examinations are critical for early disease diagnosis and cancer prevention. However, the presence of artifacts and noise, along with the high similarity between polyps and surrounding tissues in color, shape, and texture, complicates polyp detection in video frames. To tackle these challenges, we deployed multivariate regression analysis to refine the model and introduced a Noise-Suppressing Perception Network (NSPNet) designed for enhanced performance. NSPNet leverages the wavelet transform to enhance the model’s resistance to noise and artifacts while improving a multi-frame collaborative detection strategy for dynamic polyp detection in endoscopic videos, efficiently utilizing temporal information to strengthen features across frames. Specifically, we designed a High-Low Frequency Feature Fusion (HFLF) framework, which allows the model to capture high-frequency details more effectively. Additionally, we introduced an improved STFT-LSTM Polyp Detection (SLPD) module that utilizes temporal information from video sequences to enhance feature fusion in dynamic environments. Lastly, we integrated an Image Augmentation Polyp Detection (IAPD) module to improve performance on unseen data through preprocessing enhancement strategies. Extensive experiments demonstrate that NSPNet outperforms nine SOTA methods across four datasets on key performance metrics, including F1-score and recall.
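The high/low-frequency split that such wavelet-based designs build on comes from a single-level 2-D discrete wavelet transform: one approximation (low-frequency) subband plus three detail (high-frequency) subbands. A minimal sketch with PyWavelets, where the Haar wavelet is an illustrative choice:

```python
import numpy as np
import pywt

def split_frequency_bands(frame: np.ndarray):
    """Single-level 2-D DWT of a grayscale frame.

    Returns the low-frequency approximation (overall anatomy) and the
    high-frequency details (edges/texture, where noise also lives).
    """
    cA, (cH, cV, cD) = pywt.dwt2(frame.astype(float), "haar")
    low = cA                                    # approximation subband
    high = np.stack([cH, cV, cD])               # horizontal/vertical/diagonal
    return low, high

frame = np.random.rand(256, 256)                # stand-in endoscopic frame
low, high = split_frequency_bands(frame)
print(low.shape, high.shape)                    # (128, 128) (3, 128, 128)
```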

18 pages, 14931 KiB  
Article
Wavelet-Driven Multi-Band Feature Fusion for RGB-T Salient Object Detection
by Jianxun Zhao, Xin Wen, Yu He, Xiaowei Yang and Kechen Song
Sensors 2024, 24(24), 8159; https://doi.org/10.3390/s24248159 - 20 Dec 2024
Viewed by 554
Abstract
RGB-T salient object detection (SOD) has received considerable attention in the field of computer vision. Although existing methods have achieved notable detection performance in certain scenarios, challenges remain: many methods fail to fully utilize high-frequency and low-frequency features during information interaction among features at different scales, limiting detection performance. To address this issue, we propose a method for RGB-T salient object detection that enhances performance through wavelet transform and channel-wise attention fusion. Through feature differentiation, we effectively extract the spatial characteristics of the target, enhancing the detection capability for global context and fine-grained details. First, input features are passed through the channel-wise criss-cross module (CCM) for cross-modal information fusion, adaptively adjusting the importance of features to generate rich fused information. Subsequently, the multi-scale fused information is input into the feature selection wavelet transform module (FSW), which selects beneficial low-frequency and high-frequency features to improve feature aggregation performance and achieves higher segmentation accuracy through long-distance connections. Extensive experiments demonstrate that our method outperforms 22 state-of-the-art methods.
(This article belongs to the Special Issue Multi-Modal Image Processing Methods, Systems, and Applications)
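The channel-wise attention fusion described above follows the general squeeze-and-excitation pattern: pool the modalities’ feature maps, derive per-channel weights, and reweight before merging. A hedged PyTorch sketch of that pattern; the module layout and reduction ratio are illustrative, not the paper’s CCM:

```python
import torch
import torch.nn as nn

class ChannelFusion(nn.Module):
    """Fuse RGB and thermal feature maps with per-channel attention weights."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: B x 2C x 1 x 1
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weight in (0, 1)
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb, thermal], dim=1))
        return w * rgb + (1.0 - w) * thermal               # convex combination

fused = ChannelFusion(64)(torch.randn(2, 64, 56, 56), torch.randn(2, 64, 56, 56))
print(fused.shape)  # torch.Size([2, 64, 56, 56])
```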

30 pages, 13159 KiB  
Article
GLMAFuse: A Dual-Stream Infrared and Visible Image Fusion Framework Integrating Local and Global Features with Multi-Scale Attention
by Fu Li, Yanghai Gu, Ming Zhao, Deji Chen and Quan Wang
Electronics 2024, 13(24), 5002; https://doi.org/10.3390/electronics13245002 - 19 Dec 2024
Viewed by 513
Abstract
Integrating infrared and visible-light images facilitates a more comprehensive understanding of scenes by amalgamating dual-sensor data derived from identical environments. Traditional CNN-based fusion techniques are predominantly confined to local feature emphasis due to their inherently limited receptive fields. Conversely, Transformer-based models tend to prioritize global information, which can lead to a deficiency in feature diversity and detail retention. Furthermore, methods reliant on single-scale feature extraction are inadequate for capturing extensive scene information. To address these limitations, this study presents GLMAFuse, an innovative dual-stream encoder–decoder network that utilizes a multi-scale attention mechanism to harmoniously integrate global and local features. This framework is designed to maximize the extraction of multi-scale features from source images while effectively synthesizing local and global information across all layers. We introduce the global-aware and local embedding (GALE) module to adeptly capture and merge global structural attributes and localized details from infrared and visible imagery via a parallel dual-branch architecture. Additionally, the multi-scale attention fusion (MSAF) module is engineered to optimize attention weights at the channel level, facilitating an enhanced synergy between high-frequency edge details and global backgrounds and promoting effective interaction and fusion of dual-modal features. Extensive evaluations on standard datasets demonstrate that GLMAFuse surpasses existing leading methods in both qualitative and quantitative assessments. On the TNO and MSRS datasets, it achieves outstanding performance across multiple metrics, including EN (7.15, 6.75), SD (46.72, 47.55), SF (12.79, 12.56), MI (2.21, 3.22), SCD (1.75, 1.80), VIF (0.79, 1.08), Qabf (0.58, 0.71), and SSIM (0.99, 1.00), underscoring its proficiency in infrared and visible image fusion.
(This article belongs to the Special Issue Artificial Intelligence Innovations in Image Processing)
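Two of the reported metrics are straightforward to reproduce: entropy (EN) measures the information content of the fused image’s gray-level histogram, and SD measures its contrast. A small sketch of both; the 8-bit quantization is an assumption about how the image is stored:

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy of the gray-level histogram of an 8-bit image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins; 0 * log(0) := 0
    return float(-(p * np.log2(p)).sum())

def standard_deviation(img: np.ndarray) -> float:
    """SD: spread of intensities around the mean, a proxy for contrast."""
    return float(img.std())

fused = np.random.randint(0, 256, (256, 256))  # stand-in fused image
print(entropy(fused), standard_deviation(fused))
```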

15 pages, 3905 KiB  
Article
Conditional Skipping Mamba Network for Pan-Sharpening
by Yunxuan Tang, Huaguang Li, Peng Liu and Tong Li
Symmetry 2024, 16(12), 1681; https://doi.org/10.3390/sym16121681 - 19 Dec 2024
Viewed by 491
Abstract
Pan-sharpening aims to generate high-resolution multispectral (HRMS) images by combining high-resolution panchromatic (PAN) images with low-resolution multispectral (LRMS) data, while maintaining the symmetry of spatial and spectral characteristics. Traditional convolutional neural networks (CNNs) struggle with global dependency modeling due to their local receptive fields, and Transformer-based models are computationally expensive. Recent Mamba models offer linear complexity and effective global modeling. However, existing Mamba-based methods lack sensitivity to local feature variations, leading to suboptimal fine-detail preservation. To address this, we propose a Conditional Skipping Mamba Network (CSMN), which enhances global-local feature fusion symmetrically through two modules: (1) the Adaptive Mamba Module (AMM), which improves global perception using adaptive spatial-frequency integration; and (2) the Cross-domain Mamba Module (CDMM), which optimizes cross-domain spectral-spatial representation. Experimental results on the IKONOS and WorldView-2 datasets demonstrate that CSMN surpasses existing state-of-the-art methods in achieving superior spectral consistency and preserving spatial details, with performance that is more symmetric in fine-detail preservation.
(This article belongs to the Section Computer)
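For context on the task itself (not on CSMN), the classical Brovey transform is about the simplest component-substitution pan-sharpening baseline: upsample the LRMS bands, then rescale each band by the ratio of the PAN image to the bands’ mean intensity. A hedged sketch:

```python
import numpy as np

def brovey_pansharpen(lrms_up: np.ndarray, pan: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Classical Brovey pan-sharpening (a baseline, unrelated to CSMN).

    lrms_up: (H, W, B) multispectral bands already upsampled to PAN size.
    pan:     (H, W) panchromatic image.
    Each band is modulated so the spatial detail of PAN is injected while
    band ratios (the spectral character) are approximately preserved.
    """
    intensity = lrms_up.mean(axis=2) + eps
    ratio = pan / intensity                      # (H, W) detail-injection gain
    return lrms_up * ratio[..., None]            # broadcast over bands

hrms = brovey_pansharpen(np.random.rand(128, 128, 4), np.random.rand(128, 128))
print(hrms.shape)  # (128, 128, 4)
```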

24 pages, 5004 KiB  
Article
SymSwin: Multi-Scale-Aware Super-Resolution of Remote Sensing Images Based on Swin Transformers
by Dian Jiao, Nan Su, Yiming Yan, Ying Liang, Shou Feng, Chunhui Zhao and Guangjun He
Remote Sens. 2024, 16(24), 4734; https://doi.org/10.3390/rs16244734 - 18 Dec 2024
Viewed by 555
Abstract
Despite the successful application of remote sensing images in agriculture, meteorology, and geography, their relatively low spatial resolution hinders further applications. Super-resolution technology is introduced to overcome this dilemma. It is a challenging task due to the variations in object size and texture in remote sensing images. To address this problem, we present SymSwin, a super-resolution model based on the Swin transformer that aims to capture multi-scale context. The symmetric multi-scale window (SyMW) mechanism is proposed and integrated into the backbone, enabling the model to perceive features of various sizes. First, the SyMW mechanism captures discriminative contextual features from multi-scale representations using correspondingly sized attentive windows. Subsequently, a cross-receptive field-adaptive attention (CRAA) module is introduced to model the relations among multi-scale contexts and to realize adaptive fusion. Furthermore, RS data exhibit poor spatial resolution, providing insufficient visual information when merely spatial supervision is applied. Therefore, a U-shape wavelet transform (UWT) loss is proposed to facilitate the training process from the frequency domain. Extensive experiments demonstrate that our method achieves superior performance in both quantitative metrics and visual quality compared with existing algorithms.
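A wavelet-domain supervision term of the general family the UWT loss belongs to can be sketched as an L1 penalty between the DWT coefficients of the super-resolved and ground-truth images; the wavelet choice and equal subband weighting are illustrative assumptions, not the paper’s exact formulation:

```python
import numpy as np
import pywt

def wavelet_l1_loss(sr: np.ndarray, hr: np.ndarray, wavelet: str = "haar") -> float:
    """L1 distance between single-level DWT subbands of SR and HR images,
    supervising low-frequency structure and high-frequency detail separately."""
    sr_cA, sr_details = pywt.dwt2(sr.astype(float), wavelet)
    hr_cA, hr_details = pywt.dwt2(hr.astype(float), wavelet)
    loss = np.abs(sr_cA - hr_cA).mean()
    for sd, hd in zip(sr_details, hr_details):   # cH, cV, cD subbands
        loss += np.abs(sd - hd).mean()
    return float(loss)

print(wavelet_l1_loss(np.random.rand(64, 64), np.random.rand(64, 64)))
```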

17 pages, 2730 KiB  
Article
Redefining Contextual and Boundary Synergy: A Boundary-Guided Fusion Network for Medical Image Segmentation
by Yu Chen, Yun Wu, Jiahua Wu, Xinxin Zhang, Dahan Wang and Shunzhi Zhu
Electronics 2024, 13(24), 4986; https://doi.org/10.3390/electronics13244986 - 18 Dec 2024
Viewed by 488
Abstract
Medical image segmentation plays a crucial role in medical image processing, focusing on the automated extraction of regions of interest (such as organs and lesions) from medical images. This process supports various clinical applications, including diagnosis, surgical planning, and treatment. In this paper, we introduce the Boundary-guided Context Fusion U-Net (BCF-UNet), a novel approach designed to tackle a critical shortcoming of current methods: the inability to effectively integrate boundary information with semantic context. BCF-UNet introduces an Adaptive Multi-Frequency Encoder (AMFE), which uses multi-frequency analysis inspired by the wavelet transform (WT) to capture both local and global features efficiently: it decomposes images into different frequency components and adapts to boundary texture information through a learnable activation function. Additionally, we introduce a new multi-scale feature fusion module, the Atten-kernel Adaptive Fusion Module (AKAFM), designed to integrate deep semantic information with shallow texture details, significantly bridging the gap between features at different scales. Furthermore, each layer of the encoder sub-network integrates a Boundary-aware Pyramid Module (BAPM), which combines a simple and effective method with a priori knowledge to extract multi-scale edge features and improve the accuracy of boundary segmentation. In BCF-UNet, semantic context is used to guide edge information extraction, enabling the model to more effectively comprehend and identify relationships among various organizational structures. Comprehensive experimental evaluations on two datasets demonstrate that the proposed BCF-UNet achieves superior performance compared to existing state-of-the-art methods.
(This article belongs to the Section Artificial Intelligence)
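One classical way to build the kind of multi-scale edge prior the BAPM draws on is to smooth the image at several scales and apply a gradient operator at each; the scale set and the Sobel choice below are illustrative assumptions, not the paper’s exact recipe:

```python
import numpy as np
from scipy import ndimage

def multiscale_edges(img: np.ndarray, sigmas=(1.0, 2.0, 4.0)) -> np.ndarray:
    """Stack of edge-magnitude maps: Gaussian smoothing at each scale,
    then Sobel gradients, giving coarse-to-fine boundary evidence."""
    maps = []
    for sigma in sigmas:
        smoothed = ndimage.gaussian_filter(img.astype(float), sigma)
        gx = ndimage.sobel(smoothed, axis=1)
        gy = ndimage.sobel(smoothed, axis=0)
        maps.append(np.hypot(gx, gy))
    return np.stack(maps)                        # (len(sigmas), H, W)

edges = multiscale_edges(np.random.rand(128, 128))
print(edges.shape)  # (3, 128, 128)
```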

25 pages, 9994 KiB  
Article
A Triple-Channel Network for Maritime Radar Targets Detection Based on Multi-Modal Features
by Kaiqi Wang and Zeyu Wang
Remote Sens. 2024, 16(24), 4662; https://doi.org/10.3390/rs16244662 - 13 Dec 2024
Viewed by 476
Abstract
Sea surface target detectors are often disturbed by various complex sea surface factors such as sea clutter. Especially when the signal-to-clutter ratio (SCR) is low, it is difficult to achieve high-performance detection. This paper proposes a triple-channel network model for maritime target detection based on multi-modal data fusion. The method improves on traditional multi-channel inputs by extracting highly complementary multi-modal features from radar echoes, namely, the time-frequency image, the phase sequence, and the correlation coefficient sequence. Appropriate networks are selected to construct the triple-channel network according to the internal data structure of each feature, and the three features serve as the inputs of the respective network channels. To reduce the coupling between multi-channel data, the SE block is introduced to optimize the feature vectors along the channel dimension and improve the data fusion strategy. The detection results are output by the false alarm control unit according to the given probability of false alarm (PFA). Experiments on the IPIX datasets verify that the proposed detector outperforms existing detectors in dealing with complex ocean scenes.
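Two of the three input features named above are standard signal-processing quantities and can be sketched directly: the time-frequency image via a short-time Fourier transform, and the correlation coefficient sequence via lagged autocorrelation of the echo. The window length and lag count are illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft

def radar_features(echo: np.ndarray, fs: float = 1000.0, max_lag: int = 32):
    """Time-frequency image and lagged correlation coefficients of an echo."""
    # Time-frequency image: magnitude of the STFT (spectrogram-like map).
    _, _, Z = stft(echo, fs=fs, nperseg=64)
    tf_image = np.abs(Z)

    # Correlation coefficient sequence: normalized autocorrelation per lag.
    x = echo - echo.mean()
    corr = np.array([
        np.corrcoef(x[:-lag], x[lag:])[0, 1] for lag in range(1, max_lag + 1)
    ])
    return tf_image, corr

tf_image, corr = radar_features(np.random.randn(4096))
print(tf_image.shape, corr.shape)
```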

24 pages, 3395 KiB  
Article
Drone-Based Wildfire Detection with Multi-Sensor Integration
by Akmalbek Abdusalomov, Sabina Umirzakova, Makhkamov Bakhtiyor Shukhratovich, Mukhriddin Mukhiddinov, Azamat Kakhorov, Abror Buriboev and Heung Seok Jeon
Remote Sens. 2024, 16(24), 4651; https://doi.org/10.3390/rs16244651 - 12 Dec 2024
Viewed by 807
Abstract
Wildfires pose a severe threat to ecological systems, human life, and infrastructure, making early detection critical for timely intervention. Traditional fire detection systems rely heavily on single-sensor approaches and are often hindered by environmental conditions such as smoke, fog, or nighttime scenarios. This paper proposes Adaptive Multi-Sensor Oriented Object Detection with Space–Frequency Selective Convolution (AMSO-SFS), a novel deep learning-based model optimized for drone-based wildfire and smoke detection. AMSO-SFS combines optical, infrared, and Synthetic Aperture Radar (SAR) data to detect fire and smoke under varied visibility conditions. The model introduces a Space–Frequency Selective Convolution (SFS-Conv) module to enhance the discriminative capacity of features in both the spatial and frequency domains. Furthermore, AMSO-SFS utilizes weakly supervised learning and adaptive scale and angle detection to identify fire and smoke regions with minimal labeled data. Extensive experiments show that the proposed model outperforms current state-of-the-art (SoTA) models, achieving robust detection performance while maintaining computational efficiency, making it suitable for real-time drone deployment.

17 pages, 2272 KiB  
Article
Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification
by Okpala Chibuike and Xiaopeng Yang
Diagnostics 2024, 14(24), 2790; https://doi.org/10.3390/diagnostics14242790 - 12 Dec 2024
Viewed by 776
Abstract
Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performance in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture the high-frequency components of images, which are critical for identifying fine-grained patterns, while CNNs have difficulty capturing long-range dependencies due to their local receptive fields, which makes it hard to fully capture the spatial relationships across lung regions. Methods: In this paper, we propose a hybrid architecture that integrates ViTs and CNNs within modular component blocks to leverage both local feature extraction and global context capture. In each component block, the CNN extracts local features, which are then passed through the ViT to capture global dependencies. We implemented a gated attention mechanism that combines channel-, spatial-, and element-wise attention to selectively emphasize important features, thereby enhancing the overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) in the proposed framework to fuse features at different scales for a more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving medical image classification performance while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
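The gated attention mechanism described, blending channel-, spatial-, and element-wise attention through learnable gates, can be sketched roughly as follows in PyTorch; the gate parameterization and layer sizes are illustrative assumptions, not the authors’ implementation:

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Combine channel, spatial, and element-wise attention via softmax gates."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = nn.Sequential(          # per-channel weights
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(          # per-location weights
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())
        self.element_att = nn.Sequential(          # per-element weights
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())
        self.gates = nn.Parameter(torch.zeros(3))  # learnable mixing gates

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.softmax(self.gates, dim=0)       # convex weights over branches
        return (g[0] * x * self.channel_att(x)
                + g[1] * x * self.spatial_att(x)
                + g[2] * x * self.element_att(x))

y = GatedAttention(32)(torch.randn(1, 32, 28, 28))
print(y.shape)  # torch.Size([1, 32, 28, 28])
```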

14 pages, 793 KiB  
Article
MFF-Net: A Lightweight Multi-Frequency Network for Measuring Heart Rhythm from Facial Videos
by Wenqin Yan, Jialiang Zhuang, Yuheng Chen, Yun Zhang and Xiujuan Zheng
Sensors 2024, 24(24), 7937; https://doi.org/10.3390/s24247937 - 12 Dec 2024
Viewed by 487
Abstract
Remote photo-plethysmography (rPPG) is a useful camera-based health monitoring method that can measure the heart rhythm from facial videos. Many well-established deep learning models can provide highly accurate and robust results in measuring heart rate (HR) and heart rate variability (HRV). However, these methods are unable to effectively eliminate illumination variation and motion artifact disturbances, and their substantial computational resource requirements significantly limit their applicability in real-world scenarios. Hence, we propose a lightweight multi-frequency network named MFF-Net to measure heart rhythm from facial videos in a short time. Firstly, we propose a multi-frequency mode signal fusion (MFF) mechanism, which can separate the characteristics of different modes of the original rPPG signals and send them to processors with independent parameters, helping the network recover blood volume pulse (BVP) signals accurately in a complex noise environment. In addition, to help the network extract the characteristics of different modal signals effectively, we designed a temporal multiscale convolution module (TMSC-module) and a spectrum self-attention module (SSA-module). The TMSC-module can expand the receptive field of the signal-refining network, obtain more abundant multiscale information, and transmit it to the signal reconstruction network. The SSA-module can help the signal reconstruction network locate the obviously inferior parts in the reconstruction process so as to make better decisions when merging multi-dimensional signals. Finally, to address the over-fitting that easily occurs in the network, we propose an over-fitting sampling training scheme to further improve the fitting ability of the network. Comprehensive experiments were conducted on three benchmark datasets, and we estimated HR and HRV based on the BVP signals derived by MFF-Net. Compared with state-of-the-art methods, our approach achieves better performance in both HR and HRV estimation with a lower computational burden. We conclude that the proposed MFF-Net has the potential to be applied in many real-world scenarios.
(This article belongs to the Section Sensor Networks)
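The final HR estimate from a recovered BVP signal is conventionally taken from the dominant spectral peak in the physiological band (roughly 0.7–4 Hz, i.e., 42–240 bpm). A small sketch of that last step, with the sampling rate and band limits as illustrative assumptions:

```python
import numpy as np
from scipy.signal import welch

def estimate_hr(bvp: np.ndarray, fs: float = 30.0) -> float:
    """Heart rate in bpm from the dominant peak of the BVP power spectrum."""
    freqs, psd = welch(bvp, fs=fs, nperseg=min(len(bvp), 256))
    band = (freqs >= 0.7) & (freqs <= 4.0)       # plausible heart-rate band
    peak_freq = freqs[band][np.argmax(psd[band])]
    return 60.0 * peak_freq                      # Hz -> beats per minute

t = np.arange(0, 20, 1 / 30.0)                   # 20 s of 30 fps video
bvp = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)
print(round(estimate_hr(bvp)))                   # near 72 bpm (1.2 Hz), within
                                                 # the ~0.12 Hz bin resolution
```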

15 pages, 1791 KiB  
Article
A Neural Network Based on Supervised Multi-View Contrastive Learning and Two-Stage Feature Fusion for Face Anti-Spoofing
by Jin Li and Wenyun Sun
Electronics 2024, 13(24), 4865; https://doi.org/10.3390/electronics13244865 - 10 Dec 2024
Viewed by 483
Abstract
As one of the most crucial parts of face detection, the accuracy and generalization of face anti-spoofing are particularly important. Therefore, it is necessary to propose a multi-branch network to improve the accuracy and generalization of the detection of unknown spoofing attacks. The branches consist of several frequency map encoders and one depth map encoder, which are trained together. The network leverages multiple frequency features and generates depth map features: high-frequency edge texture is beneficial for capturing moiré patterns, while low-frequency features are sensitive to color distortion, and depth maps are more discriminative than RGB images at the visual level and serve as useful auxiliary information. Supervised Multi-view Contrastive Learning enhances multi-view feature learning, and a two-stage feature fusion method effectively integrates the multi-branch features. Experiments on four public datasets, namely CASIA-FASD, Replay-Attack, MSU-MFSD, and OULU-NPU, demonstrate the model’s effectiveness. The average Half Total Error Rate (HTER) of our model is 4 percentage points (25% vs. 21%) lower than that of the Adversarial Domain Adaptation method in inter-set evaluations.
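HTER, the metric quoted above, is the mean of the false acceptance rate and the false rejection rate at a chosen decision threshold. A minimal sketch of its computation (the threshold handling and score convention are illustrative assumptions):

```python
import numpy as np

def hter(scores: np.ndarray, labels: np.ndarray, threshold: float) -> float:
    """Half Total Error Rate: mean of FAR and FRR at a given threshold.

    scores: higher means 'more likely live'; labels: 1 = live, 0 = spoof.
    """
    accept = scores >= threshold
    far = np.mean(accept[labels == 0])           # spoofs wrongly accepted
    frr = np.mean(~accept[labels == 1])          # live faces wrongly rejected
    return float((far + frr) / 2.0)

scores = np.array([0.9, 0.8, 0.4, 0.3, 0.7, 0.2])
labels = np.array([1, 1, 0, 0, 1, 0])
print(hter(scores, labels, threshold=0.5))       # 0.0 on this toy split
```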
