Search Results (156)

Search Parameters:
Keywords = nonnegative matrix factorization (NMF)

14 pages, 6164 KiB  
Article
Probabilistic Noise Detection and Weighted Non-Negative Matrix Factorization-Based Noise Reduction Methods for Snapping Shrimp Noise
by Suhyeon Park, Jongwon Seok and Jungpyo Hong
J. Mar. Sci. Eng. 2025, 13(1), 96; https://doi.org/10.3390/jmse13010096 - 7 Jan 2025
Viewed by 422
Abstract
Snapping Shrimps (SSs) live in warm marine areas. Snapping Shrimp Noise (SSN), the loud sound generated by these underwater creatures, is a major source of performance degradation in underwater acoustic communication and target detection because it lowers the Signal-to-Noise Ratio (SNR). We therefore propose a unified solution for SSN detection and reduction. First, the Signal Presence Probability (SPP) is calculated for SSN detection; the SPP is then provided to Non-negative Matrix Factorization (NMF) as a weight for SSN reduction. In the proposed method, SPP is thus the key factor for both SSN detection and reduction. To verify the effectiveness of the proposed method, the SAVEX-15 dataset, real ocean data containing SSN, is used. For SSN detection, SPP showed the best Receiver Operating Characteristic (ROC) curve, with an Area Under the Curve (AUC) 0.014 higher than competing methods. In addition, Continuous Wave and Linear Frequency Modulation signals were set as target signals and combined with the SAVEX-15 data to evaluate noise reduction performance. The SPP-weighted NMF (WNMF) achieved at least 2 dB higher SNR and Signal-to-Distortion Ratio (SDR), with lower Log-Spectral Distance (LSD), than the Optimally Modified Log-Spectral Amplitude estimator and NMF.
(This article belongs to the Section Ocean Engineering)
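The abstract states only that the SPP is supplied to NMF as a weight. As a rough illustration of what a weighted NMF looks like, the Python sketch below minimizes a weighted Frobenius error with multiplicative updates. The weight matrix `M`, the helper names, and the update rules are assumptions chosen for illustration (a standard weighted-NMF formulation), not the authors' SPP-weighted algorithm; how SPP values would be mapped to weights is left open.

```python
import random

# Illustrative sketch (assumption): a generic weighted NMF, not the paper's WNMF.


def matmul(A, B):
    """Plain-Python product of A (m x k) and B (k x n)."""
    k, n = len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def hadamard(A, B):
    """Element-wise product of two equally shaped matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def weighted_nmf(V, M, rank, iters=200, eps=1e-9, seed=0):
    """Approximate V ~ W H (all entries >= 0) by minimizing
    sum_ij M_ij * (V_ij - (W H)_ij)^2 with multiplicative updates.

    V: non-negative data, e.g. a magnitude spectrogram (freq x time).
    M: non-negative weights of the same shape; a hypothetical stand-in for
       per-bin weights derived from a presence probability. Bins with
       weight 0 do not influence the fit at all.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    MV = hadamard(M, V)
    for _ in range(iters):
        # H <- H * W^T (M*V) / W^T (M*(WH))
        Wt = transpose(W)
        num, den = matmul(Wt, MV), matmul(Wt, hadamard(M, matmul(W, H)))
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(n)] for r in range(rank)]
        # W <- W * (M*V) H^T / (M*(WH)) H^T
        Ht = transpose(H)
        num, den = matmul(MV, Ht), matmul(hadamard(M, matmul(W, H)), Ht)
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)] for i in range(m)]
    return W, H


def weighted_error(V, M, W, H):
    """Weighted squared reconstruction error (the objective being minimized)."""
    WH = matmul(W, H)
    return sum(M[i][j] * (V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))
```

With `M` set to all ones, these updates reduce to the familiar Euclidean multiplicative updates of Lee and Seung; lowering a bin's weight tells the factorization to care less about reconstructing it.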

18 pages, 2135 KiB  
Article
Named Entity Recognition Method Based on Multi-Feature Fusion
by Weidong Huang and Xinhang Yu
Appl. Sci. 2025, 15(1), 388; https://doi.org/10.3390/app15010388 - 3 Jan 2025
Viewed by 470
Abstract
Nowadays, user-generated content has become a crucial channel for obtaining information and authentic feedback. However, because online users vary in cultural and educational background, online reviews are often inconsistently formatted and contain arbitrary information. Consequently, extracting key information from online reviews has become a prominent area of research. This paper proposes a combined entity recognition model for online reviews, aiming to improve the accuracy of Named Entity Recognition (NER). First, a Non-negative Matrix Factorization (NMF) model performs thematic clustering on the review texts, and entity types are extracted from the clustering results. Next, we introduce an entity recognition model that uses the pre-trained BERT model as the embedding layer, with BiLSTM and a DGCNN with residual connections and gating mechanisms as feature extraction layers. The model also uses multi-head attention for feature fusion, and the final results are decoded by a Conditional Random Field (CRF) layer. The model achieves an F1 score of 86.8383% on a collected dataset of online reviews containing eight entity categories. Experimental results demonstrate that the proposed model outperforms other mainstream NER models and effectively identifies key entities in online reviews.
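To illustrate the step where NMF topic clustering yields candidate entity types, the sketch below takes already-fitted NMF factors and (a) assigns each review to its dominant topic and (b) lists the top-weighted terms per topic, which an analyst could inspect when defining entity types. The toy factor values, vocabulary, and function names are hypothetical, and the paper's actual procedure for turning clusters into entity types is not described in the abstract.

```python
# Illustrative sketch (assumption): reading clusters off hypothetical NMF factors.


def dominant_topics(doc_topic):
    """Index of the largest NMF topic weight for each document (row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def top_terms(topic_term, vocab, k=3):
    """The k highest-weighted vocabulary terms for each topic (row)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:k]]
            for row in topic_term]


# Hypothetical fitted factors: 3 reviews x 2 topics, and 2 topics x 4 terms.
doc_topic = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.5]]
topic_term = [[0.8, 0.1, 0.5, 0.0], [0.0, 0.9, 0.1, 0.6]]
vocab = ["battery", "screen", "charge", "display"]

print(dominant_topics(doc_topic))         # [0, 1, 0]
print(top_terms(topic_term, vocab, k=2))  # [['battery', 'charge'], ['screen', 'display']]
```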

13 pages, 270 KiB  
Article
Exploring the Effects of Pre-Processing Techniques on Topic Modeling of an Arabic News Article Data Set
by Haya Alangari and Nahlah Algethami
Appl. Sci. 2024, 14(23), 11350; https://doi.org/10.3390/app142311350 - 5 Dec 2024
Viewed by 655
Abstract
This research investigates the impact of pre-processing techniques on the effectiveness of topic modeling algorithms for Arabic texts, focusing on a comparison between BERTopic, Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF). Using the Single-label Arabic News Article Data set (SANAD), which includes 195,174 Arabic news articles, this study explores pre-processing methods such as cleaning, stemming, normalization, and stop word removal, which are crucial given the complex morphology of Arabic. Additionally, the influence of six different embedding models on topic modeling performance was assessed. The originality of this work lies in addressing the lack of previous studies that optimize BERTopic by adjusting the n-gram range parameter and combining it with different embedding models for effective Arabic topic modeling. Pre-processing techniques were fine-tuned to improve data quality before applying BERTopic, LDA, and NMF, and performance was assessed using metrics such as topic coherence and diversity. Coherence was measured using Normalized Pointwise Mutual Information (NPMI). The results show that the Tashaphyne stemmer significantly enhanced the performance of LDA and NMF. BERTopic, optimized with pre-processing and bi-grams, outperformed LDA and NMF in both coherence and diversity. The CAMeL-Lab/bert-base-arabic-camelbert-da embedding yielded the best results, and the findings emphasize the importance of pre-processing in Arabic topic modeling.
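NPMI, used above as the coherence measure, normalizes pointwise mutual information by the joint probability: NPMI(w_i, w_j) = log(P(w_i, w_j) / (P(w_i) P(w_j))) / (-log P(w_i, w_j)), giving scores in [-1, 1]. The sketch below estimates the probabilities from document-level co-occurrence and averages NPMI over all word pairs of a topic. This estimator (coherence tools often use sliding windows instead) and the function name are assumptions, not the study's exact evaluation setup.

```python
import math
from itertools import combinations

# Illustrative sketch (assumption): document-level co-occurrence estimates.


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic.

    Probabilities are document frequencies in `documents` (lists of tokens).
    Pairs that never co-occur score -1, the lower bound of NPMI.
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*words):
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0:
            scores.append(-1.0)
        elif p_ij == 1:
            scores.append(1.0)  # both words in every document: limit value 1
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


docs = [["sport", "match", "goal"], ["sport", "match"], ["economy", "market"], ["economy", "bank"]]
print(npmi_coherence(["sport", "match"], docs))
```

Words that always appear together score close to 1, unrelated words that never co-occur score -1, so a higher average indicates a more coherent topic.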

32 pages, 6565 KiB  
Article
Sparse Feature-Weighted Double Laplacian Rank Constraint Non-Negative Matrix Factorization for Image Clustering
by Hu Ma, Ziping Ma, Huirong Li and Jingyu Wang
Mathematics 2024, 12(23), 3656; https://doi.org/10.3390/math12233656 - 22 Nov 2024
Viewed by 536
Abstract
As an extension of non-negative matrix factorization (NMF), graph-regularized non-negative matrix factorization (GNMF) has been widely applied in data mining and machine learning, particularly for tasks such as clustering and feature selection. Traditional GNMF methods typically rely on predefined graph structures to guide the decomposition, using fixed data graphs and feature graphs to capture relationships between data points and between features. However, these fixed graphs may limit the model’s expressiveness. Additionally, many NMF variants struggle with complex data distributions and are vulnerable to noise and outliers. To overcome these challenges, we propose a novel method called sparse feature-weighted double Laplacian rank constraint non-negative matrix factorization (SFLRNMF), along with its extended version, SFLRNMTF. These methods adaptively construct more accurate data similarity and feature similarity graphs while imposing rank constraints on the Laplacian matrices of these graphs. The rank constraints ensure that the number of connected components in each graph matches the true number of clusters, thereby improving clustering performance. Moreover, we introduce a feature weighting matrix into the original data matrix to reduce the influence of irrelevant features and apply an L2,1/2-norm sparsity constraint to the basis matrix to encourage sparse representations. An orthogonality constraint is also enforced on the coefficient matrix to ensure the interpretability of the dimensionality reduction results. In the extended model (SFLRNMTF), we introduce a double orthogonality constraint on the basis matrix and coefficient matrix to enhance the uniqueness and interpretability of the decomposition, thereby facilitating clearer clustering of both rows and columns. However, enforcing double orthogonality constraints can reduce approximation accuracy, especially with low-rank matrices, as it restricts the model’s flexibility.
To address this limitation, we introduce an additional factor matrix R, which acts as an adaptive component that balances the trade-off between constraint enforcement and approximation accuracy. This adjustment allows the model to achieve greater representational flexibility, improving reconstruction accuracy while preserving the interpretability and clustering clarity provided by the double orthogonality constraints. Consequently, the SFLRNMTF approach becomes more robust in capturing data patterns and achieving high-quality clustering results in complex datasets. We also propose an efficient alternating iterative update algorithm to optimize the proposed model and provide a theoretical analysis of its performance. Clustering results on four benchmark datasets demonstrate that our method outperforms competing approaches.
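The Laplacian rank constraint rests on a standard spectral graph fact: for a similarity graph with non-negative weights, the multiplicity of the eigenvalue 0 of its Laplacian L = D − S equals the number of connected components c, so rank(L) = n − c. Constraining rank(L) = n − k therefore forces the learned graph to split into exactly k components (clusters). The Python sketch below only checks this fact on a toy graph using exact rational arithmetic; it is not an implementation of SFLRNMF or SFLRNMTF, and all function names are illustrative.

```python
from fractions import Fraction

# Illustrative sketch (assumption): verifies rank(L) = n - c on a toy graph only.


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S of a symmetric non-negative S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0) - S[i][j] for j in range(n)] for i in range(n)]


def exact_rank(A):
    """Rank by Gaussian elimination over the rationals (no round-off)."""
    M = [[Fraction(x) for x in row] for row in A]
    rank = 0
    for c in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][c] != 0:
                f = M[r][c] / M[rank][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank


def count_components(S):
    """Connected components of the graph with an edge wherever S_ij > 0."""
    n, seen, comps = len(S), set(), 0
    for start in range(n):
        if start in seen:
            continue
        comps += 1
        seen.add(start)
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if S[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return comps


# Two clusters: nodes {0, 1, 2} and {3, 4}, with no edges between them.
S = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 0, 0, 2],
     [0, 0, 0, 2, 0]]
c = count_components(S)
print(c, exact_rank(laplacian(S)), len(S) - c)  # 2 3 3
```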

24 pages, 3462 KiB  
Article
Underutilized Feature Extraction Methods for Burn Severity Mapping: A Comprehensive Evaluation
by Linh Nguyen Van and Giha Lee
Remote Sens. 2024, 16(22), 4339; https://doi.org/10.3390/rs16224339 - 20 Nov 2024
Viewed by 891
Abstract
Wildfires increasingly threaten ecosystems and infrastructure, making accurate burn severity mapping (BSM) essential for effective disaster response and environmental management. Machine learning (ML) models that use satellite-derived vegetation indices are crucial for assessing wildfire damage; however, incorporating many indices can introduce multicollinearity, which reduces classification accuracy. While principal component analysis (PCA) is commonly used to address this issue, its effectiveness relative to other feature extraction (FE) methods in BSM remains underexplored. This study aims to enhance ML classifier accuracy in BSM by evaluating various FE techniques that mitigate multicollinearity among vegetation indices. Using composite burn index (CBI) data from the 2014 Carlton Complex fire in the United States as a case study, we extracted 118 vegetation indices from seven Landsat-8 spectral bands. We applied and compared 13 FE techniques, both linear and nonlinear: PCA, t-distributed stochastic neighbor embedding (t-SNE), linear discriminant analysis (LDA), Isomap, uniform manifold approximation and projection (UMAP), factor analysis (FA), independent component analysis (ICA), multidimensional scaling (MDS), truncated singular value decomposition (TSVD), non-negative matrix factorization (NMF), locally linear embedding (LLE), spectral embedding (SE), and neighborhood components analysis (NCA). Each technique was evaluated with six ML classifiers to determine its effectiveness in improving BSM accuracy. Our results show that alternative FE techniques can outperform PCA, improving both classification accuracy and computational efficiency. Techniques such as LDA and NCA effectively capture nonlinear relationships critical for accurate BSM. The study contributes to the existing literature by providing a comprehensive comparison of FE methods, highlighting the potential benefits of underutilized techniques in BSM.

19 pages, 7749 KiB  
Article
Generative Simplex Mapping: Non-Linear Endmember Extraction and Spectral Unmixing for Hyperspectral Imagery
by John Waczak and David J. Lary
Remote Sens. 2024, 16(22), 4316; https://doi.org/10.3390/rs16224316 - 19 Nov 2024
Viewed by 877
Abstract
We introduce a new model for non-linear endmember extraction and spectral unmixing of hyperspectral imagery called Generative Simplex Mapping (GSM). The model represents endmember mixing using a latent space of points sampled within an (n-1)-simplex corresponding to n unique sources. Barycentric coordinates within this simplex are naturally interpreted as relative endmember abundances satisfying both the abundance sum-to-one and abundance non-negativity constraints. Points in this latent space are mapped to reflectance spectra via a flexible function combining linear and non-linear mixing. Owing to the probabilistic formulation of the GSM, spectral variability is also estimated through a precision parameter describing the distribution of observed spectra. Model parameters are determined using a generalized expectation-maximization algorithm, which guarantees non-negativity for extracted endmembers. We first compare the GSM against three varieties of non-negative matrix factorization (NMF) on a synthetic data set of linearly mixed spectra from the USGS spectral database. Here, the GSM performed favorably for both endmember accuracy and abundance estimation, with all non-linear contributions driven to zero by the fitting procedure. In a second experiment, we apply the GSM to model non-linear mixing in real hyperspectral imagery captured over a pond in North Texas. The model accurately identified spectral signatures corresponding to near-shore algae, water, and rhodamine tracer dye introduced into the pond to simulate water contamination by a localized source. Abundance maps generated using the GSM accurately track the evolution of the dye plume as it mixes into the surrounding water.
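The simplex latent space can be illustrated with a short sketch: points drawn uniformly from the (n-1)-simplex (here via normalized exponential draws, i.e. a flat Dirichlet) have non-negative barycentric coordinates that sum to one, so they can be read directly as abundances. The sketch covers only the linear-mixing part of the mapping; the GSM's non-linear term, its precision parameter, and its EM fitting are omitted, and the sampling scheme, toy spectra, and function names are assumptions rather than the authors' implementation.

```python
import random

# Illustrative sketch (assumption): uniform simplex sampling + linear mixing only.


def sample_simplex(n, rng):
    """Uniform point in the (n-1)-simplex: normalized i.i.d. Exp(1) draws
    (equivalently, a flat Dirichlet sample)."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [x / total for x in e]


def linear_mix(abundances, endmembers):
    """Linear part of the mapping only: x_b = sum_i a_i * e_i[b]."""
    bands = len(endmembers[0])
    return [sum(a * em[b] for a, em in zip(abundances, endmembers)) for b in range(bands)]


rng = random.Random(42)
endmembers = [[0.9, 0.1, 0.0],   # toy 3-band spectra, one per source
              [0.1, 0.8, 0.2],
              [0.0, 0.2, 0.9]]
a = sample_simplex(len(endmembers), rng)
x = linear_mix(a, endmembers)
print([round(v, 3) for v in a], [round(v, 3) for v in x])
```

A vertex of the simplex (one abundance equal to 1) reproduces a pure endmember spectrum exactly, which is why barycentric coordinates map so naturally onto abundances.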

18 pages, 12884 KiB  
Article
Data-Driven Analysis of High-Temperature Fluorocarbon Plasma for Semiconductor Processing
by Sung Kyu Jang, Woosung Lee, Ga In Choi, Jihun Kim, Minji Kang, Seongho Kim, Jong Hyun Choi, Seul-Gi Kim, Seoung-Ki Lee, Hyeong-U Kim and Hyeongkeun Kim
Sensors 2024, 24(22), 7307; https://doi.org/10.3390/s24227307 - 15 Nov 2024
Viewed by 798
Abstract
The semiconductor industry increasingly relies on high aspect ratio etching facilitated by Amorphous Carbon Layer (ACL) masks for advanced 3D-NAND and DRAM technologies. However, carbon contamination in ACL deposition chambers necessitates effective fluorine-based plasma cleaning. This study employs a high-temperature inductively coupled plasma (ICP) system and Time-of-Flight Mass Spectrometry (ToF-MS) to analyze gas species variations under different process conditions. We applied Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF) to identify key gas species, and used the First-Order Plus Dead Time (FOPDT) model to quantify dynamic changes in gas signals. Our analysis revealed the formation of COF3 at high gas temperatures and plasma power levels, indicating the presence of additional reaction pathways under these conditions. This study provides a comprehensive understanding of high-temperature plasma interactions and suggests new strategies for optimizing ACL processes in semiconductor manufacturing.
(This article belongs to the Section Physical Sensors)
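For reference, the FOPDT step response of a signal with gain K, time constant τ, and dead time θ is y(t) = K Δu (1 − exp(−(t − θ)/τ)) for t ≥ θ and 0 before the dead time. The sketch below evaluates this response and fits K, τ, and θ to a measured trace by a brute-force grid search with a closed-form least-squares gain. The zero baseline, the grid-search fitting strategy, and the function names are assumptions; the abstract does not specify how the model was fitted to the ToF-MS gas signals.

```python
import math

# Illustrative sketch (assumption): zero baseline, grid-search fit of tau/theta.


def fopdt_step(t, gain, tau, dead_time, du=1.0, y0=0.0):
    """Step response of a First-Order Plus Dead Time model:
    y(t) = y0 for t < dead_time, otherwise
    y(t) = y0 + gain * du * (1 - exp(-(t - dead_time) / tau))."""
    if t < dead_time:
        return y0
    return y0 + gain * du * (1.0 - math.exp(-(t - dead_time) / tau))


def fit_fopdt(times, values, taus, dead_times, du=1.0):
    """Try every (tau, dead_time) pair on the given grids; for each, the
    least-squares gain has a closed form. Returns (gain, tau, dead_time)."""
    best = None
    for tau in taus:
        for theta in dead_times:
            basis = [fopdt_step(t, 1.0, tau, theta, du) for t in times]
            bb = sum(b * b for b in basis)
            if bb == 0:
                continue  # dead time beyond the observed window
            gain = sum(b * v for b, v in zip(basis, values)) / bb
            sse = sum((v - gain * b) ** 2 for b, v in zip(basis, values))
            if best is None or sse < best[0]:
                best = (sse, gain, tau, theta)
    return best[1:]


times = list(range(31))
trace = [fopdt_step(t, 2.0, 5.0, 2.0) for t in times]  # synthetic noiseless signal
print(fit_fopdt(times, trace, taus=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], dead_times=[0.0, 1.0, 2.0, 3.0]))
```

A grid search keeps the sketch dependency-free; with real, noisy signals a continuous optimizer over τ and θ would usually be preferred.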

22 pages, 5228 KiB  
Article
Hydrogeochemical Characteristics and Formation Mechanisms of High-Arsenic Groundwater in the North China Plain: Insights from Hydrogeochemical Analysis and Unsupervised Machine Learning
by Xiaofang Wu, Weijiang Liu, Yi Liu, Ganghui Zhu and Qiaochu Han
Water 2024, 16(22), 3215; https://doi.org/10.3390/w16223215 - 8 Nov 2024
Viewed by 747
Abstract
Hydrochemical data were utilized in this study to elucidate the hydrogeochemical characteristics and genesis of high-arsenic groundwater in the North China Plain, employing both traditional hydrogeochemical analysis and unsupervised machine learning techniques. The findings indicate that the predominant hydrochemical types of groundwater in the study area are HCO3-Ca·Na and SO4·Cl-Na·Ca. The primary mechanism influencing groundwater chemistry has been identified as rock weathering. The unsupervised machine learning framework incorporates various methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), machine learning models (gradient boosting trees and random forests), and cluster analysis to explore the characteristics and genesis of groundwater hydrochemical types within the study area. This study demonstrated that the formation mechanism of high-arsenic groundwater results from multiple interacting factors.

18 pages, 4421 KiB  
Article
Assessing Scientific Text Similarity: A Novel Approach Utilizing Non-Negative Matrix Factorization and Bidirectional Encoder Representations from Transformer
by Zhixuan Jia, Wenfang Tian, Wang Li, Kai Song, Fuxin Wang and Congjing Ran
Mathematics 2024, 12(21), 3328; https://doi.org/10.3390/math12213328 - 23 Oct 2024
Viewed by 935
Abstract
Patents are a vital component of scientific text, and escalating competition has generated substantial demand for patent analysis in areas such as company strategy and legal services, which calls for fast, accurate, and easily applicable similarity estimators. At present, natural language processing (NLP) on patent content, including titles and abstracts, can serve as an effective method for estimating similarity. However, traditional NLP approaches have drawbacks, such as requiring a large amount of labeled data and offering poor explanations of deep-learning model internals, a problem exacerbated by the highly compressed nature of patent content. On the other hand, most knowledge-based deep learning models require a vast amount of additional analysis results as training variables for similarity estimation, and these are limited because the analysis involves human participation. To address these challenges, we introduce a novel estimator that enhances the transparency of similarity estimation. This approach integrates a patent’s content with the international patent classification (IPC), leveraging bidirectional encoder representations from transformers (BERT) and non-negative matrix factorization (NMF). By integrating these techniques, we aim to improve the transparency of knowledge discovery in NLP across various IPC dimensions and to incorporate more background knowledge into context similarity estimation. The experimental results demonstrate that our model is reliable, explainable, highly accurate, and practically usable.
(This article belongs to the Special Issue Probability, Stochastic Processes and Machine Learning)

21 pages, 40325 KiB  
Article
Non-Negative Matrix Factorization with Averaged Kurtosis and Manifold Constraints for Blind Hyperspectral Unmixing
by Chunli Song, Linzhang Lu and Chengbin Zeng
Symmetry 2024, 16(11), 1414; https://doi.org/10.3390/sym16111414 - 23 Oct 2024
Cited by 1 | Viewed by 1255
Abstract
The Nonnegative Matrix Factorization (NMF) algorithm and its variants have gained widespread popularity across various domains, including neural networks, text clustering, image processing, and signal analysis. In hyperspectral unmixing (HU), an important task involving the accurate extraction of endmembers from mixed spectra, researchers have been actively exploring different regularization techniques within the traditional NMF framework to improve the precision and reliability of endmember extraction. In this study, we propose a novel HU algorithm called KMBNMF, which introduces an average kurtosis regularization term based on endmember spectra to enhance endmember extraction; additionally, it integrates a manifold regularization term into the average kurtosis-constrained NMF by constructing a symmetric weight matrix. Combining these two regularization techniques not only optimizes the extraction of independent endmembers but also improves the part-based representation capability for hyperspectral data. Experimental results on simulated and real-world hyperspectral datasets demonstrate the competitive performance of the proposed KMBNMF algorithm compared with state-of-the-art algorithms.
(This article belongs to the Section Mathematics)
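Kurtosis measures how peaked or heavy-tailed a distribution is: for a spectrum x with mean μ, kurtosis is E[(x − μ)^4] / (E[(x − μ)^2])^2. The sketch below computes this (non-excess, population) kurtosis for each endmember spectrum and averages it. How KMBNMF weights, signs, or differentiates this term inside the NMF objective is not given in the abstract, so the sketch covers only the statistic itself; the function names are illustrative.

```python
# Illustrative sketch (assumption): the averaged statistic only, not the KMBNMF regularizer.


def kurtosis(x):
    """Population (non-excess) kurtosis m4 / m2**2 of one spectrum."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant spectrum")
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / (m2 ** 2)


def average_kurtosis(endmembers):
    """Mean kurtosis over the endmember spectra (one list of band values each)."""
    return sum(kurtosis(e) for e in endmembers) / len(endmembers)


spectra = [[1, 2, 3, 4, 5],    # flat-ish toy spectrum
           [0, 0, 0, 10]]      # peaked toy spectrum
print(kurtosis(spectra[0]), kurtosis(spectra[1]), average_kurtosis(spectra))
```

The peaked toy spectrum has the larger kurtosis, which is the property a kurtosis-based regularizer can exploit to favor distinctive, independent endmembers.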

22 pages, 4759 KiB  
Article
An Improved Nonnegative Matrix Factorization Algorithm Combined with K-Means for Audio Noise Reduction
by Yan Liu, Haozhen Zhu, Yongtuo Cui, Xiaoyu Yu, Haibin Wu and Aili Wang
Electronics 2024, 13(20), 4132; https://doi.org/10.3390/electronics13204132 - 21 Oct 2024
Viewed by 903
Abstract
Clustering algorithms are simple and efficient and can complete calculations without large datasets, making them suitable for noise reduction processing in audio module mass-production testing. To address the tendency of the NMF algorithm to get stuck in local optima and its difficulty in extracting feature signals, an improved NMF audio denoising algorithm with K-means initialization was designed. First, the Euclidean distance formula of K-means was improved to extract audio signal features from multiple dimensions. Combined with the K-means decomposition initialization strategy, the initial dictionary matrix of the NMF algorithm was optimized to avoid local optima and effectively improve the algorithm’s robustness. Second, feature extraction expressions were added to the sparse coding part of the NMF algorithm to address noise residue and partial loss of spectral components in audio signals during processing. At the same time, the size of the coefficient matrix was limited to reduce computation time and improve the accuracy of feature extraction for high-precision audio signals. Comparative experiments were then conducted on the NOIZEUS and NOISEX-92 datasets, as well as on audio signals with random noise. The algorithm improved the signal-to-noise ratio by 10–20 dB and reduced harmonic distortion by approximately 10 dB. Finally, a high-precision FPGA-based audio acquisition unit was designed, and practical applications have shown that it can effectively improve the signal-to-noise ratio of audio signals and reduce harmonic distortion.
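The K-means initialization idea can be sketched as follows: cluster the non-negative spectral frames (for example, magnitude-spectrogram columns) and use the centroids as the initial NMF dictionary, so that the basis starts near representative spectra instead of at random values. The sketch uses plain squared Euclidean distance with deterministic farthest-point seeding; the paper's modified distance formula and its sparse-coding changes are not reproduced, and the function names and toy frames are hypothetical.

```python
# Illustrative sketch (assumption): standard k-means, not the paper's modified distance.


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's k-means (squared Euclidean) with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            d = [sq_dist(p, c) for c in centroids]
            groups[d.index(min(d))].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def nmf_dictionary_init(frames, k, floor=1e-6):
    """Initial NMF dictionary W (n_bins x k) whose columns are k-means centroids
    of non-negative spectral frames. Centroids of non-negative data are
    non-negative; the small floor keeps multiplicative updates from freezing
    entries at exactly zero."""
    centroids = kmeans(frames, k)
    return [[max(centroids[j][f], floor) for j in range(k)] for f in range(len(frames[0]))]


# Toy 3-bin frames: three dominated by bin 0, three by bins 1 and 2.
frames = [[1.0, 0.0, 0.0], [1.2, 0.1, 0.0], [0.9, 0.0, 0.1],
          [0.0, 1.0, 1.0], [0.1, 1.1, 0.9], [0.0, 0.9, 1.0]]
W = nmf_dictionary_init(frames, 2)
print([[round(v, 3) for v in row] for row in W])
```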

20 pages, 2643 KiB  
Article
A Tour Recommendation System Considering Implicit and Dynamic Information
by Chieh-Yuan Tsai, Kai-Wen Chuang, Hen-Yi Jen and Hao Huang
Appl. Sci. 2024, 14(20), 9271; https://doi.org/10.3390/app14209271 - 11 Oct 2024
Viewed by 1242
Abstract
Tourism has become one of the world’s largest service industries. With the rapid development of social media, more people prefer self-guided tours to package itineraries planned by travel agencies. Therefore, developing itinerary recommendation systems that provide practical tour suggestions has become an important research topic. This study proposes a novel tour recommendation system that considers the implicit and dynamic information of Points of Interest (POIs). Our approach is based on photos that users upload to social media at various tourist attractions. For each check-in record, we find the POI closest to the user’s check-in Global Positioning System (GPS) location and treat it as the POI the user wants to visit. Instead of representing POIs with explicit information such as categories, this research uses implicit features extracted from the textual descriptions of POIs. A POI’s textual description contains rich latent information about its type, facilities, or activities, which makes it better suited to representing the POI. In addition, this study considers visiting sequences when evaluating user similarity during clustering, so that tourists in each sub-group show higher behavioral similarity. Next, Non-negative Matrix Factorization (NMF) dynamically derives the staying time for different users, time slots, and POIs. Finally, a personalized itinerary algorithm is developed that considers user preferences and dynamic staying time; the system recommends the itinerary with the highest score and the longest remaining time. A set of experiments indicates that the proposed recommendation system outperforms state-of-the-art next-POI recommendation methods on four commonly used evaluation metrics.

16 pages, 2463 KiB  
Article
Binning Metagenomic Contigs Using Contig Embedding and Decomposed Tetranucleotide Frequency
by Long Fu, Jiabin Shi and Baohua Huang
Biology 2024, 13(10), 755; https://doi.org/10.3390/biology13100755 - 24 Sep 2024
Viewed by 1240
Abstract
Metagenomic binning is a crucial step in metagenomic research: it aggregates genome sequences belonging to the same microbial species into independent bins. Most existing methods ignore the semantic information of contigs and lack effective processing of tetranucleotide frequency, so the features extracted for binning are insufficient and overly complex, leading to poor binning results. To address these problems, we propose CedtBin, a metagenomic binning method based on contig embedding and decomposed tetranucleotide frequency. First, an improved BERT model learns embedding representations of the contigs. Second, the tetranucleotide frequencies are decomposed using a non-negative matrix factorization (NMF) algorithm. The two features are then concatenated and input into a clustering algorithm for binning. Because the DBSCAN clustering algorithm is sensitive to its input parameters, and to avoid setting them manually, we also propose an Annoy-DBSCAN algorithm that adaptively determines the DBSCAN parameters. This algorithm combines Approximate Nearest Neighbors Oh Yeah (Annoy) with a grid search strategy to find the optimal DBSCAN parameters. On simulated and real datasets, CedtBin achieves better binning results than mainstream methods and reconstructs more genomes, indicating that the proposed method is effective.
(This article belongs to the Special Issue 2nd Edition of Computational Methods in Biology)
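Tetranucleotide frequency (TNF) is the normalized count of each of the 4^4 = 256 possible 4-mers in a contig; stacking one TNF vector per contig gives the non-negative matrix that an NMF step would decompose. The sketch below computes a TNF vector. Skipping ambiguous bases and not merging reverse complements (many binners collapse the 256 4-mers into 136 canonical ones) are assumptions, since the abstract does not state how CedtBin builds its TNF features; the names `KMERS`, `INDEX`, and `tetranucleotide_frequency` are illustrative.

```python
from itertools import product

# Illustrative sketch (assumption): plain 256-dim TNF, no reverse-complement merging.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequency(seq):
    """Normalized 4-mer frequency vector (length 256) of a DNA contig.

    Windows containing non-ACGT symbols (e.g. N) are skipped; a sequence with
    no valid window yields an all-zero vector.
    """
    seq = seq.upper()
    counts = [0] * len(KMERS)
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


v = tetranucleotide_frequency("ACGTACGT")
print(len(v), v[INDEX["ACGT"]])  # 256 0.4
```

"ACGTACGT" has five 4-mer windows (ACGT, CGTA, GTAC, TACG, ACGT), so ACGT accounts for 2/5 of them.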

18 pages, 10325 KiB  
Article
Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer
by Yongqiang Zou, Guanghui Zhang and Yugang Fan
Mathematics 2024, 12(17), 2782; https://doi.org/10.3390/math12172782 - 8 Sep 2024
Cited by 1 | Viewed by 1286
Abstract
Pulsed eddy current thermography can detect surface or subsurface defects in steel, but when it is combined with deep learning, building a complete sample set of defects is expensive and inefficient owing to the complexity of real industrial environments. Consequently, this study proposes a transfer learning method based on Twin-NMF and combines it with the SimAM attention mechanism to improve detection accuracy on the target domain task. First, to address the domain differences between the target domain task and the source domain samples, this study introduces a Twin-NMF transfer method. This approach reconstructs the feature spaces of both the source and target domains using twin non-negative matrix factorization and uses cosine similarity to measure the correlation between the features of the two domains. Second, this study integrates the parameter-free SimAM module into the neck of the YOLOv8 model to enhance its ability to extract and classify steel surface defects and to alleviate the precision collapse associated with multi-scale defect recognition. The experimental results show that the proposed Twin-NMF model with SimAM improves the detection accuracy of steel surface defects. With NEU-DET and GC10-DET as the source domains, respectively, mAP@0.5 on the ECTI dataset reaches 99.3% and 99.2%, and detection accuracy reaches 98% and 98.5%.
(This article belongs to the Section E2: Control Theory and Mechanics)
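The Twin-NMF step above compares NMF feature spaces across two domains with cosine similarity. The sketch below illustrates only that general idea under simplifying assumptions: it runs two independent NMFs (Lee–Seung multiplicative updates) on hypothetical toy source and target matrices that share the same feature rows, then computes the cosine similarity between every source basis vector and every target basis vector. It is not the authors' Twin-NMF formulation; the function names (`nmf`, `basis_similarity`, etc.) and the data are illustrative assumptions.

```python
import math
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(V, k, iters=300, seed=0, eps=1e-12):
    """Approximate nonnegative V (m x n) as W (m x k) times H (k x n) with
    Lee-Seung multiplicative updates, which do not increase the Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return math.sqrt(sum((v - r) ** 2
                         for v_row, r_row in zip(V, WH)
                         for v, r in zip(v_row, r_row)))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def basis_similarity(W_src, W_tgt):
    """Cosine similarity between each source basis column and each target basis column."""
    return [[cosine(s, t) for t in transpose(W_tgt)] for s in transpose(W_src)]


# Hypothetical toy data: rows are 4 shared features, columns are samples.
V_src = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.1], [0.0, 0.1, 3.0], [0.1, 0.0, 1.5]]
V_tgt = [[0.9, 0.0, 1.8], [1.9, 0.2, 3.7], [0.0, 2.8, 0.1], [0.2, 1.4, 0.0]]
W_s, H_s = nmf(V_src, k=2)
W_t, H_t = nmf(V_tgt, k=2)
S = basis_similarity(W_s, W_t)  # 2 x 2 matrix; nonnegative bases give values in [0, 1]
```

Because all factors are nonnegative, every similarity lies in [0, 1], so a high entry in `S` flags a source feature direction that has a close counterpart in the target domain.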
32 pages, 14893 KiB  
Article
Mapping of Clay Montmorillonite Abundance in Agricultural Fields Using Unmixing Methods at Centimeter Scale Hyperspectral Images
by Etienne Ducasse, Karine Adeline, Audrey Hohmann, Véronique Achard, Anne Bourguignon, Gilles Grandjean and Xavier Briottet
Remote Sens. 2024, 16(17), 3211; https://doi.org/10.3390/rs16173211 - 30 Aug 2024
Viewed by 1346
Abstract
The composition of clay minerals in soils, and in particular the presence of montmorillonite (a member of the smectite family), is a key factor in soil shrink–swell behavior as well as in off-road vehicle mobility. Detecting these topsoil clay minerals and quantifying montmorillonite abundance is challenging because they are usually intimately mixed with other minerals, soil organic carbon and soil moisture. Imaging spectroscopy coupled with unmixing methods can address these issues, but estimation quality degrades as spatial resolution becomes coarser because of pixel heterogeneity. With the advent of UAV-borne and proximal hyperspectral acquisitions, images can now be acquired at centimeter scale. The objective of this paper is therefore to evaluate the accuracy and limitations of unmixing methods for retrieving montmorillonite abundance from very-high-resolution hyperspectral images (1.5 cm) acquired by a camera mounted on top of a bucket truck over three agricultural fields in the Loiret department, France. Two automatic endmember detection methods that assume materials are linearly mixed, Simplex Identification via Split Augmented Lagrangian (SISAL) and Minimum Volume Constrained Non-negative Matrix Factorization (MVC-NMF), were tested prior to unmixing. Then two linear unmixing methods, the fully constrained least squares method (FCLS) and multiple endmember spectral mixture analysis (MESMA), and two nonlinear ones, the generalized bilinear model (GBM) and the multi-linear model (MLM), were applied to the images. Several spectral preprocessing techniques were also combined with these unmixing methods to improve performance. Results showed that the selected automatic endmember detection methods were not suitable in this context. However, unmixing methods using endmembers taken from available spectral libraries performed successfully.
The nonlinear method MLM, applied either without spectral preprocessing or with the first Savitzky–Golay derivative, gave the best accuracies for montmorillonite abundance estimation using the USGS library (RMSE ranges of 2.2–13.3% and 1.4–19.7%). Furthermore, at this scale the abundance estimates were mainly affected by (i) the high variability of soil composition, (ii) soil roughness, which induces large variations in illumination conditions and multiple surface scattering, and (iii) multiple volume scattering arising from the intimate mixture. Finally, these results offer a new opportunity for mapping expansive soils with imaging spectroscopy at very high spatial resolution.
(This article belongs to the Special Issue Remote Sensing for Geology and Mapping)
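FCLS, one of the linear unmixing methods named above, estimates per-pixel abundances under the linear mixing model: it minimizes the squared error between the pixel spectrum and a weighted sum of endmember spectra, subject to nonnegative abundances that sum to one. The sketch below solves that same constrained least-squares problem by projected gradient descent onto the probability simplex, which is a swapped-in solver rather than the classic FCLS algorithm, and it is not the paper's processing chain. The endmember spectra and true abundances are hypothetical toy values.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:  # the last index satisfying this condition fixes the threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fcls(E, x, iters=2000):
    """Minimize ||sum_i a_i E[i] - x||^2 subject to a >= 0 and sum(a) = 1.

    E is a list of p endmember spectra, each with the same number of bands as x.
    Solved by projected gradient descent (not the classic FCLS solver)."""
    p = len(E)
    # Gram matrix G = E^T E and vector b = E^T x, with endmembers as columns of E.
    G = [[sum(ei * ej for ei, ej in zip(E[i], E[j])) for j in range(p)] for i in range(p)]
    b = [sum(ei * xi for ei, xi in zip(E[i], x)) for i in range(p)]
    # trace(G) bounds the largest eigenvalue of the PSD matrix G, so 1/trace is a safe step.
    step = 1.0 / max(sum(G[i][i] for i in range(p)), 1e-12)
    a = [1.0 / p] * p  # start from equal abundances
    for _ in range(iters):
        grad = [sum(G[i][j] * a[j] for j in range(p)) - b[i] for i in range(p)]
        a = project_to_simplex([a[i] - step * grad[i] for i in range(p)])
    return a


# Hypothetical 4-band endmember spectra and a noise-free linearly mixed pixel.
E = [[1.0, 0.1, 0.0, 0.2],
     [0.0, 0.9, 0.1, 0.1],
     [0.1, 0.0, 1.0, 0.3]]
a_true = [0.5, 0.3, 0.2]
x = [sum(a_true[i] * E[i][band] for i in range(3)) for band in range(4)]
a_hat = fcls(E, x)  # close to a_true for this well-conditioned toy case
```

Projecting after every gradient step keeps each iterate a valid abundance vector, which is why the nonnegativity and sum-to-one constraints hold exactly at every iteration rather than only at convergence.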