Search Results (53)

Search Parameters:
Keywords = keyframe extraction

25 pages, 11107 KiB  
Article
Joint Optimization of the 3D Model and 6D Pose for Monocular Pose Estimation
by Liangchao Guo, Lin Chen, Qiufu Wang, Zhuo Zhang and Xiaoliang Sun
Viewed by 290
Abstract
The autonomous landing of unmanned aerial vehicles (UAVs) relies on a precise relative 6D pose between platforms. Existing model-based monocular pose estimation methods require an accurate 3D model of the target and cannot cope when such a model is unavailable. This paper exploits the multi-view geometry constraints within a monocular image sequence to solve the problem and introduces a novel approach to monocular pose estimation that jointly optimizes the target’s 3D model and the relative 6D pose. The target’s 3D model is represented by a set of sparse 3D landmarks, and the corresponding 2D landmarks are detected in the input image by a trained neural network. From these 2D–3D correspondences, an initial pose estimate is obtained by solving the PnP problem. For the joint optimization, the objective function is built on the minimization of the reprojection error, with the correction values of the 3D landmarks and the 6D pose as the parameters to be solved. Solving this optimization problem realizes the joint optimization of the target’s 3D model and the 6D pose. In addition, a sliding window combined with a keyframe extraction strategy is adopted to speed up processing. Experimental results on synthetic and real image sequences show that the proposed method achieves real-time, online, high-precision monocular pose estimation in the absence of an accurate 3D model via the joint optimization of the target’s 3D model and pose. Full article
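
A minimal sketch of the pipeline this abstract describes, under stated assumptions: the pose is initialized with OpenCV's PnP solver, and the sparse landmarks and pose are then refined together by minimizing reprojection error with SciPy. The function names are invented for illustration, and a single frame is used here, whereas the paper optimizes over a sliding window of keyframes.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def reproj_residuals(params, pts2d, K, n_pts):
    # params = [rvec (3), tvec (3), flattened 3D landmarks (3 * n_pts)]
    rvec, tvec = params[:3].reshape(3, 1), params[3:6].reshape(3, 1)
    pts3d = params[6:].reshape(n_pts, 3)
    proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
    return (proj.reshape(-1, 2) - pts2d).ravel()

def joint_refine(pts3d_init, pts2d, K):
    """pts3d_init: (N, 3) float landmarks, pts2d: (N, 2) float detections, K: 3x3 intrinsics."""
    # Initial 6D pose from 2D-3D correspondences (PnP), then joint landmark/pose refinement.
    _, rvec, tvec = cv2.solvePnP(pts3d_init, pts2d, K, None)
    x0 = np.hstack([rvec.ravel(), tvec.ravel(), pts3d_init.ravel()])
    sol = least_squares(reproj_residuals, x0,
                        args=(pts2d.astype(np.float64), K, len(pts3d_init)))
    return sol.x[:3], sol.x[3:6], sol.x[6:].reshape(-1, 3)
```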

18 pages, 7989 KiB  
Article
Intelligent Dance Motion Evaluation: An Evaluation Method Based on Keyframe Acquisition According to Musical Beat Features
by Hengzi Li and Xingli Huang
Sensors 2024, 24(19), 6278; https://doi.org/10.3390/s24196278 - 28 Sep 2024
Viewed by 674
Abstract
Motion perception is crucial in competitive sports like dance, basketball, and diving. However, evaluations in these sports heavily rely on professionals, which poses two main challenges: subjective assessments are uncertain and can be influenced by experience, making it hard to guarantee timeliness and accuracy, and multi-expert voting increases labor costs. While video analysis methods have alleviated some of this pressure, challenges remain in extracting key points/frames from videos and in constructing a suitable, quantifiable evaluation method that aligns with the static–dynamic nature of movements for accurate assessment. Therefore, this study proposes an intelligent evaluation method aimed at enhancing the accuracy and processing speed of complex video analysis tasks. Firstly, a keyframe extraction method based on musical beat detection is constructed: coupled with prior knowledge, beat detection is optimized through a perceptually weighted window to accurately extract keyframes that are highly correlated with dance movement changes. Secondly, OpenPose is employed to detect human joint points in the keyframes, quantifying human movements into a series of numerically expressed nodes and their relationships (i.e., pose descriptions). Combined with the positions of the keyframes in the time sequence, a standard pose description sequence is formed, serving as the foundational data for subsequent quantitative evaluations. Lastly, an Action Sequence Evaluation method (ASCS) is established based on all action features within a single action frame to precisely assess the overall performance of individual actions. Furthermore, drawing inspiration from the Rouge-L evaluation method in natural language processing, a Similarity Measure Approach based on Contextual Relationships (SMACR) is constructed, focusing on evaluating the coherence of actions. By integrating ASCS and SMACR, dancers are evaluated comprehensively from both the static and dynamic dimensions. During the method validation phase, the research team selected 12 representative samples from the popular dance game Just Dance and classified them according to the complexity of dance moves and physical exertion levels. The experimental results demonstrate the strong performance of the automated evaluation method. Specifically, the method not only achieves precise assessment of dance movements at the individual keyframe level but also significantly enhances the evaluation of action coherence and completeness through SMACR. Across all 12 test samples, the method selects 2 to 5 keyframes per second from the videos, reducing the computational load to 4.1–10.3% of that of traditional full-frame matching, while the overall evaluation accuracy decreases by only 3%, demonstrating the method’s combination of efficiency and precision. Through precise musical beat alignment, efficient keyframe extraction, and intelligent dance motion analysis, this study addresses the subjectivity and inefficiency of traditional manual evaluations and improves the rigor and accuracy of assessments. It provides robust tool support for fields such as dance education and competition evaluation, with broad application prospects. Full article
(This article belongs to the Collection Sensors and AI for Movement Analysis)
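
As a rough illustration of the beat-driven keyframe step (not the authors' code), the sketch below maps beats detected by librosa onto video frame indices. The paper additionally refines beat detection with a perceptually weighted window, which is omitted here; `audio_path` and `fps` are assumed inputs.

```python
import librosa

def beat_keyframe_indices(audio_path, fps):
    """Return video frame indices aligned with detected musical beats."""
    y, sr = librosa.load(audio_path)                       # audio track of the video
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)   # beat positions (audio frames)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return [int(round(t * fps)) for t in beat_times]       # keyframe candidates
```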

21 pages, 7746 KiB  
Article
Multi-Robot Collaborative Mapping with Integrated Point-Line Features for Visual SLAM
by Yu Xia, Xiao Wu, Tao Ma, Liucun Zhu, Jingdi Cheng and Junwu Zhu
Sensors 2024, 24(17), 5743; https://doi.org/10.3390/s24175743 - 4 Sep 2024
Viewed by 805
Abstract
Simultaneous Localization and Mapping (SLAM) enables mobile robots to autonomously perform localization and mapping tasks in unknown environments. Despite significant progress achieved by visual SLAM systems in ideal conditions, relying solely on a single robot and point features for mapping in large-scale indoor environments with weak-texture structures can affect mapping efficiency and accuracy. Therefore, this paper proposes a multi-robot collaborative mapping method based on point-line fusion to address this issue. This method is designed for indoor environments with weak-texture structures for localization and mapping. The feature-extraction algorithm, which combines point and line features, supplements the existing environment point feature-extraction method by introducing a line feature-extraction step. This integration ensures the accuracy of visual odometry estimation in scenes with pronounced weak-texture structure features. For relatively large indoor scenes, a scene-recognition-based map-fusion method is proposed in this paper to enhance mapping efficiency. This method relies on visual bag of words to determine overlapping areas in the scene, while also proposing a keyframe-extraction method based on photogrammetry to improve the algorithm’s robustness. By combining the Perspective-3-Point (P3P) algorithm and Bundle Adjustment (BA) algorithm, the relative pose-transformation relationships of multi-robots in overlapping scenes are resolved, and map fusion is performed based on these relative pose relationships. We evaluated our algorithm on public datasets and a mobile robot platform. The experimental results demonstrate that the proposed algorithm exhibits higher robustness and mapping accuracy. It shows significant effectiveness in handling mapping in scenarios with weak texture and structure, as well as in small-scale map fusion. Full article
(This article belongs to the Section Navigation and Positioning)
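
A generic sketch of the bag-of-visual-words overlap test mentioned in the abstract, with ORB descriptors and a scikit-learn k-means vocabulary standing in for the paper's point-line features and vocabulary; the similarity threshold is illustrative.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create()

def build_vocabulary(images, k=64):
    """Cluster ORB descriptors from a set of grayscale images into k visual words."""
    descs = [orb.detectAndCompute(img, None)[1] for img in images]
    descs = np.vstack([d for d in descs if d is not None]).astype(np.float32)
    return KMeans(n_clusters=k, n_init=10).fit(descs)

def bow_histogram(image, vocab):
    # Normalized histogram of visual-word occurrences for one image.
    _, desc = orb.detectAndCompute(image, None)
    words = vocab.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return hist / (np.linalg.norm(hist) + 1e-9)

def likely_overlap(img_a, img_b, vocab, thresh=0.7):
    # Cosine similarity of the normalized word histograms as an overlap score.
    return float(bow_histogram(img_a, vocab) @ bow_histogram(img_b, vocab)) >= thresh
```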

23 pages, 9336 KiB  
Article
MFO-Fusion: A Multi-Frame Residual-Based Factor Graph Optimization for GNSS/INS/LiDAR Fusion in Challenging GNSS Environments
by Zixuan Zou, Guoshuai Wang, Zhenshuo Li, Rui Zhai and Yonghua Li
Remote Sens. 2024, 16(17), 3114; https://doi.org/10.3390/rs16173114 - 23 Aug 2024
Viewed by 925
Abstract
In various practical applications, such as autonomous vehicle and unmanned aerial vehicle navigation, Global Navigation Satellite Systems (GNSSs) are commonly used for positioning. However, traditional GNSS positioning methods are often affected by disturbances due to external observational conditions. For instance, in areas with dense buildings, tree cover, or tunnels, GNSS signals may be obstructed, resulting in positioning failures or decreased accuracy. Therefore, improving the accuracy and stability of GNSS positioning in these complex environments is a critical concern. In this paper, we propose a novel multi-sensor fusion framework based on multi-frame residual optimization for GNSS/INS/LiDAR to address the challenges posed by complex satellite environments. Our system employs a novel residual detection and optimization method for continuous-time GNSS within keyframes. Specifically, we use rough pose measurements from LiDAR to extract keyframes for the global system. Within these keyframes, the multi-frame residuals of GNSS and IMU are estimated using the Median Absolute Deviation (MAD) and subsequently employed for the degradation detection and sliding window optimization of the GNSS. Building on this, we employ a two-stage factor graph optimization strategy, significantly improving positioning accuracy, especially in environments with limited GNSS signals. To validate the effectiveness of our approach, we assess the system’s performance on the publicly available UrbanLoco dataset and conduct experiments in real-world environments. The results demonstrate that our system can achieve continuous decimeter-level positioning accuracy in these complex environments, outperforming other related frameworks. Full article
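
A minimal sketch of the MAD-based degradation check described above: GNSS epochs inside a keyframe window whose residuals against the LiDAR-derived trajectory deviate strongly from the window median are flagged. The cutoff value and array layout are assumptions for illustration.

```python
import numpy as np

def flag_degraded_gnss(gnss_xyz, lidar_xyz, cutoff=3.5):
    """gnss_xyz, lidar_xyz: (N, 3) positions for the same keyframe window."""
    residuals = np.linalg.norm(gnss_xyz - lidar_xyz, axis=1)
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med)) + 1e-9
    robust_z = 0.6745 * (residuals - med) / mad   # MAD-based robust z-score
    return robust_z > cutoff                      # True = likely degraded epoch
```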

21 pages, 5709 KiB  
Article
A Robust and Lightweight Loop Closure Detection Approach for Challenging Environments
by Yuan Shi, Rui Li, Yingjing Shi and Shaofeng Liang
Viewed by 1005
Abstract
Loop closure detection is crucial for simultaneous localization and mapping (SLAM), as it can effectively correct the accumulated errors. Complex scenarios put forward high requirements on the robustness of loop closure detection. Traditional feature-based loop closure detection methods often fail to meet these challenges. To solve this problem, this paper proposes a robust and efficient deep-learning-based loop closure detection approach. We employ MixVPR to extract global descriptors from keyframes and construct a global descriptor database. For local feature extraction, SuperPoint is utilized. Then, the constructed global descriptor database is used to find the loop frame candidates, and LightGlue is subsequently used to match the most similar loop frame and current keyframe with the local features. After matching, the relative pose can be computed. Our approach is first evaluated on several public datasets, and the results prove that our approach is highly robust to complex environments. The proposed approach is further validated on a real-world dataset collected by a drone and achieves accurate performance and shows good robustness in challenging conditions. Additionally, an analysis of time and memory costs is also conducted and proves that our approach can maintain accuracy and have satisfactory real-time performance as well. Full article
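
A sketch of the retrieval step only: nearest neighbours in a database of global descriptors (such as MixVPR outputs) are taken as loop-closure candidates; SuperPoint/LightGlue matching and pose computation are not shown, and `top_k` and `min_sim` are illustrative.

```python
import numpy as np

def loop_candidates(query_desc, db_descs, top_k=3, min_sim=0.8):
    """query_desc: (D,) global descriptor; db_descs: (N, D) keyframe descriptors."""
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    q = query_desc / np.linalg.norm(query_desc)
    sims = db @ q                                   # cosine similarities
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order if sims[i] >= min_sim]
```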

15 pages, 4680 KiB  
Article
A Visual Navigation Algorithm for UAV Based on Visual-Geography Optimization
by Weibo Xu, Dongfang Yang, Jieyu Liu, Yongfei Li and Maoan Zhou
Viewed by 1143
Abstract
Estimating Unmanned Aerial Vehicle (UAV) poses from visual information is essential in Global Navigation Satellite System (GNSS)-denied environments. In this paper, we propose a UAV visual navigation algorithm based on visual-geography Bundle Adjustment (BA) to address the challenge of missing geolocation information in monocular visual navigation, offering an effective approach to UAV navigation and positioning. First, Visual Odometry (VO) is employed to track the UAV’s motion and extract keyframes. Subsequently, a geolocation method based on heterogeneous image matching is used to calculate the geographic pose of the UAV. Additionally, we introduce a tightly coupled information fusion method based on visual-geography optimization, which provides a geographic initializer and enables real-time estimation of the UAV’s geographic pose. Finally, the algorithm dynamically adjusts the weight of the geographic information to improve optimization accuracy. The proposed method is extensively evaluated in both simulated and real-world environments, and the results demonstrate that it estimates the geographic pose of the UAV accurately and in real time in GNSS-denied environments, achieving a root-mean-square error (RMSE) and a mean positioning error both below 13 m. Full article
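
A toy one-dimensional sketch of how a geographic weight can enter the optimization described above: odometry supplies relative displacements, geo-registration supplies absolute but noisier positions, and the weight w trades them off in a linear least-squares problem. The paper solves the full 6-DoF problem inside bundle adjustment; this only illustrates the weighting idea.

```python
import numpy as np

def fuse_positions(odo_deltas, geo_positions, w=0.5):
    """odo_deltas: n-1 relative displacements; geo_positions: n absolute fixes."""
    n = len(geo_positions)
    rows, rhs = [], []
    for i, d in enumerate(odo_deltas):          # odometry constraint: x[i+1] - x[i] = d
        r = np.zeros(n); r[i + 1], r[i] = 1.0, -1.0
        rows.append(r); rhs.append(d)
    for i, g in enumerate(geo_positions):       # geographic constraint: w * x[i] = w * g
        r = np.zeros(n); r[i] = w
        rows.append(r); rhs.append(w * g)
    x, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    return x
```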

29 pages, 815 KiB  
Review
Literature Review of Deep-Learning-Based Detection of Violence in Video
by Pablo Negre, Ricardo S. Alonso, Alfonso González-Briones, Javier Prieto and Sara Rodríguez-González
Sensors 2024, 24(12), 4016; https://doi.org/10.3390/s24124016 - 20 Jun 2024
Cited by 1 | Viewed by 2465
Abstract
Physical aggression is a serious and widespread problem in society, affecting people worldwide. It impacts nearly every aspect of life. While some studies explore the root causes of violent behavior, others focus on urban planning in high-crime areas. Real-time violence detection, powered by artificial intelligence, offers a direct and efficient solution, reducing the need for extensive human supervision and saving lives. This paper is a continuation of a systematic mapping study and its objective is to provide a comprehensive and up-to-date review of AI-based video violence detection, specifically in physical assaults. Regarding violence detection, the following have been grouped and categorized from the review of the selected papers: 21 challenges that remain to be solved, 28 datasets that have been created in recent years, 21 keyframe extraction methods, 16 types of algorithm inputs, as well as a wide variety of algorithm combinations and their corresponding accuracy results. Given the lack of recent reviews dealing with the detection of violence in video, this study is considered necessary and relevant. Full article
(This article belongs to the Special Issue Edge Computing in IoT Networks Based on Artificial Intelligence)

22 pages, 7124 KiB  
Article
ADM-SLAM: Accurate and Fast Dynamic Visual SLAM with Adaptive Feature Point Extraction, Deeplabv3pro, and Multi-View Geometry
by Xiaotao Huang, Xingbin Chen, Ning Zhang, Hongjie He and Sang Feng
Sensors 2024, 24(11), 3578; https://doi.org/10.3390/s24113578 - 2 Jun 2024
Viewed by 1455
Abstract
Visual Simultaneous Localization and Mapping (V-SLAM) plays a crucial role in the development of intelligent robotics and autonomous navigation systems. However, it still faces significant challenges in handling highly dynamic environments. The prevalent method currently used for dynamic object recognition in the environment is deep learning. However, models such as Yolov5 and Mask R-CNN require significant computational resources, which limits their potential in real-time applications due to hardware and time constraints. To overcome this limitation, this paper proposes ADM-SLAM, a visual SLAM system designed for dynamic environments that builds upon the ORB-SLAM2. This system integrates efficient adaptive feature point homogenization extraction, lightweight deep learning semantic segmentation based on an improved DeepLabv3, and multi-view geometric segmentation. It optimizes keyframe extraction, segments potential dynamic objects using contextual information with the semantic segmentation network, and detects the motion states of dynamic objects using multi-view geometric methods, thereby eliminating dynamic interference points. The results indicate that ADM-SLAM outperforms ORB-SLAM2 in dynamic environments, especially in high-dynamic scenes, where it achieves up to a 97% reduction in Absolute Trajectory Error (ATE). In various highly dynamic test sequences, ADM-SLAM outperforms DS-SLAM and DynaSLAM in terms of real-time performance and accuracy, proving its excellent adaptability. Full article
(This article belongs to the Section Navigation and Positioning)
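
A hedged sketch of the multi-view geometric test mentioned above: matched points whose distance to their epipolar line in the second view is large are treated as dynamic and discarded. The pixel threshold and RANSAC settings are illustrative, not the paper's values.

```python
import cv2
import numpy as np

def dynamic_point_mask(pts1, pts2, thresh_px=2.0):
    """pts1, pts2: (N, 2) float32 matched points from two views."""
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    lines = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
    x2 = np.hstack([pts2, np.ones((len(pts2), 1), dtype=np.float32)])
    dist = np.abs(np.sum(lines * x2, axis=1)) / np.linalg.norm(lines[:, :2], axis=1)
    return dist > thresh_px                     # True = likely dynamic point
```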

14 pages, 11587 KiB  
Article
Efficient Structure from Motion for Large-Size Videos from an Open Outdoor UAV Dataset
by Ruilin Xiang, Jiagang Chen and Shunping Ji
Sensors 2024, 24(10), 3039; https://doi.org/10.3390/s24103039 - 10 May 2024
Viewed by 868
Abstract
Modern UAVs (unmanned aerial vehicles) equipped with video cameras can provide large-scale high-resolution video data. This poses significant challenges for structure from motion (SfM) and simultaneous localization and mapping (SLAM) algorithms, as most of them are developed for relatively small-scale and low-resolution scenes. In this paper, we present a video-based SfM method specifically designed for high-resolution large-size UAV videos. Despite the wide range of applications for SfM, performing mainstream SfM methods on such videos poses challenges due to their high computational cost. Our method consists of three main steps. Firstly, we employ a visual SLAM (VSLAM) system to efficiently extract keyframes, keypoints, initial camera poses, and sparse structures from downsampled videos. Next, we propose a novel two-step keypoint adjustment method. Instead of matching new points in the original videos, our method effectively and efficiently adjusts the existing keypoints at the original scale. Finally, we refine the poses and structures using a rotation-averaging constrained global bundle adjustment (BA) technique, incorporating the adjusted keypoints. To enrich the resources available for SLAM or SfM studies, we provide a large-size (3840 × 2160) outdoor video dataset with millimeter-level-accuracy ground control points, which supplements the current relatively low-resolution video datasets. Experiments demonstrate that, compared with other SLAM or SfM methods, our method achieves an average efficiency improvement of 100% on our collected dataset and 45% on the EuRoc dataset. Our method also demonstrates superior localization accuracy when compared with state-of-the-art SLAM or SfM methods. Full article
(This article belongs to the Section Navigation and Positioning)
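
One plausible reading of the keypoint-adjustment idea, sketched under assumptions: keypoints found on the downsampled frames are scaled to the original resolution and then locally refined, here with cv2.cornerSubPix standing in for the paper's two-step adjustment.

```python
import cv2
import numpy as np

def adjust_keypoints(full_res_gray, pts_downsampled, scale):
    """Scale keypoints from the downsampled frame and refine them at full resolution."""
    pts = (np.asarray(pts_downsampled, dtype=np.float32) * scale).reshape(-1, 1, 2)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.01)
    refined = cv2.cornerSubPix(full_res_gray, pts, (5, 5), (-1, -1), criteria)
    return refined.reshape(-1, 2)
```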

19 pages, 8006 KiB  
Article
An Underwater Localization Method Based on Visual SLAM for the Near-Bottom Environment
by Zonglin Liu, Meng Wang, Hanwen Hu, Tong Ge and Rui Miao
J. Mar. Sci. Eng. 2024, 12(5), 716; https://doi.org/10.3390/jmse12050716 - 26 Apr 2024
Viewed by 978
Abstract
Feature matching in near-bottom visual SLAM is disturbed by raised underwater sediments, resulting in tracking loss. In this paper, a novel visual SLAM system is proposed for environments with raised underwater sediments. The underwater images are first classified with a color recognition method that weights pixel locations to reduce the interference of similar colors on the seabed. An improved adaptive median filter is then proposed to filter the classified images, using the mean value of the filter window border as the discriminant condition so that the original image features are retained. The filtered images are finally processed by the tracking module to obtain the trajectory of the underwater vehicle and the seafloor map. Datasets of seamount areas captured in the western Pacific Ocean were processed with the improved visual SLAM system. The numbers of keyframes, mapping points, and feature point matching pairs extracted by the improved system are higher by 5.2%, 11.2%, and 4.5%, respectively, than those of the ORB-SLAM3 system. The improved visual SLAM system is robust to such dynamic disturbances, making it practical for underwater vehicles operating in near-bottom areas such as seamount and nodule fields. Full article
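
An interpretive sketch of the filtering step, assuming the "window border mean" discriminant means a pixel is replaced by the window median only when it deviates strongly from the mean of the window's border; the window size and threshold are illustrative and the loop is unoptimized.

```python
import numpy as np

def adaptive_median(gray, win=5, thresh=30.0):
    """gray: 2D uint8 image; returns a filtered uint8 image."""
    r = win // 2
    padded = np.pad(gray.astype(np.float32), r, mode="edge")
    out = gray.astype(np.float32).copy()
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            w = padded[y:y + win, x:x + win]
            border = np.concatenate([w[0, :], w[-1, :], w[1:-1, 0], w[1:-1, -1]])
            if abs(float(gray[y, x]) - border.mean()) > thresh:
                out[y, x] = np.median(w)        # treat as raised-sediment noise
    return out.astype(np.uint8)
```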

20 pages, 18424 KiB  
Article
Accurate Recognition of Jujube Tree Trunks Based on Contrast Limited Adaptive Histogram Equalization Image Enhancement and Improved YOLOv8
by Shunkang Ling, Nianyi Wang, Jingbin Li and Longpeng Ding
Forests 2024, 15(4), 625; https://doi.org/10.3390/f15040625 - 29 Mar 2024
Cited by 3 | Viewed by 1213
Abstract
The accurate recognition of tree trunks is a prerequisite for precision orchard yield estimation. Facing the practical problems of complex orchard environments and large data flows, existing object detection schemes suffer from key issues such as poor data quality, low timeliness and accuracy, and weak generalization ability. In this paper, an improved YOLOv8 is designed on the basis of data flow screening and enhancement for lightweight, accurate jujube tree trunk detection. Firstly, a key frame extraction algorithm is proposed and utilized to efficiently screen the effective data. Secondly, a CLAHE image enhancement method is proposed and used to improve data quality. Finally, the backbone of the YOLOv8 model is replaced with a GhostNetv2 structure for lightweight transformation, and an improved CA_H attention mechanism is introduced. Extensive comparison and ablation results show that the average precision on the quality-enhanced dataset increases from 81.2% to 90.1% over the original dataset, and the proposed YOLOv8s-GhostNetv2-CA_H model reduces the model size by 19.5% compared with the YOLOv8s base model, with precision increasing by 2.4% to 92.3%, recall increasing by 1.4%, mAP@0.5 increasing by 1.8%, and FPS being 17.1% faster. Full article
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
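
A rough sketch pairing the two preprocessing steps named above: frames are kept only when they differ enough from the last kept frame, and kept frames are enhanced with CLAHE on the luminance channel. The difference threshold and CLAHE parameters are illustrative, not the paper's.

```python
import cv2
import numpy as np

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

def screen_and_enhance(frames, diff_thresh=12.0):
    """frames: iterable of BGR images; returns the screened, CLAHE-enhanced key frames."""
    kept, last_gray = [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if last_gray is None or np.mean(cv2.absdiff(gray, last_gray)) > diff_thresh:
            lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
            lab[:, :, 0] = clahe.apply(lab[:, :, 0])      # enhance luminance only
            kept.append(cv2.cvtColor(lab, cv2.COLOR_LAB2BGR))
            last_gray = gray
    return kept
```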

21 pages, 4867 KiB  
Article
MultiFusedNet: A Multi-Feature Fused Network of Pretrained Vision Models via Keyframes for Student Behavior Classification
by Somsawut Nindam, Seung-Hoon Na and Hyo Jong Lee
Appl. Sci. 2024, 14(1), 230; https://doi.org/10.3390/app14010230 - 26 Dec 2023
Viewed by 1277
Abstract
This research proposes a deep learning method for classifying student behavior in classrooms that follow the professional learning community teaching approach. We collected data on five student activities: hand-raising, interacting, sitting, turning around, and writing. We used the sum of absolute differences (SAD) in the LUV color space to detect scene changes. The K-means algorithm was then applied to select keyframes using the computed SAD. Next, we extracted features using multiple pretrained deep learning models from the convolutional neural network family; the pretrained models considered were InceptionV3, ResNet50V2, VGG16, and EfficientNetB7. We leveraged feature fusion, incorporating optical flow features and data augmentation techniques, to enrich the spatial features of the selected keyframes. Finally, we classified the students’ behavior using a deep sequence model based on the bidirectional long short-term memory network with an attention mechanism (BiLSTM-AT). The proposed method with the BiLSTM-AT model recognizes behaviors from our dataset with high accuracy, achieving precision, recall, and F1-scores of 0.97, 0.97, and 0.97, respectively; the overall accuracy was 96.67%. This high efficiency demonstrates the potential of the proposed method for classifying student behavior in classrooms. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
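
A sketch of the keyframe step described above, under the assumption that the frames nearest each k-means centroid of the inter-frame SAD signal are taken as keyframes; the paper's exact selection rule and cluster count may differ.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sad_keyframes(frames, n_keyframes=5):
    """frames: list of BGR images; returns sorted indices of the selected keyframes."""
    luv = [cv2.cvtColor(f, cv2.COLOR_BGR2LUV).astype(np.float32) for f in frames]
    sad = np.array([np.abs(luv[i] - luv[i - 1]).sum() for i in range(1, len(luv))])
    km = KMeans(n_clusters=n_keyframes, n_init=10).fit(sad.reshape(-1, 1))
    ids = [int(np.argmin(np.abs(sad - c))) + 1 for c in km.cluster_centers_.ravel()]
    return sorted(set(ids))                      # frame indices nearest each centroid
```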

21 pages, 7565 KiB  
Article
Early Fire Detection Using Long Short-Term Memory-Based Instance Segmentation and Internet of Things for Disaster Management
by Sharaf J. Malebary
Sensors 2023, 23(22), 9043; https://doi.org/10.3390/s23229043 - 8 Nov 2023
Cited by 2 | Viewed by 1412
Abstract
Fire outbreaks continue to cause damage despite the improvements in fire-detection tools and algorithms. As the human population and global warming continue to rise, fires have emerged as a significant worldwide issue. These factors may contribute to the greenhouse effect and climatic changes, among other detrimental consequences. It remains challenging to implement a well-performing, optimized approach that is sufficiently accurate, has tractable complexity, and keeps the false alarm rate low. Detecting small fires and identifying fires from long distances are further challenges for previously proposed techniques. In this study, we propose a novel hybrid model, called IS-CNN-LSTM, based on convolutional neural networks (CNNs) to detect and analyze fire intensity. A total of 21 convolutional layers, 24 rectified linear unit (ReLU) layers, 6 pooling layers, 3 fully connected layers, 2 dropout layers, and a softmax layer are included in the proposed 57-layer CNN model. The proposed model performs instance segmentation to distinguish between fire and non-fire events. To reduce the complexity of the proposed model, we also propose a key-frame extraction algorithm. The proposed model uses Internet of Things (IoT) devices to alert the relevant personnel by calculating the severity of the fire. The model is tested on a publicly available dataset containing fire and normal videos. The achieved 95.25% classification accuracy, 0.09% false positive rate (FPR), 0.65% false negative rate (FNR), and prediction time of 0.08 s validate the proposed system. Full article

27 pages, 8113 KiB  
Article
A Robust Semi-Direct 3D SLAM for Mobile Robot Based on Dense Optical Flow in Dynamic Scenes
by Bo Hu and Jingwen Luo
Biomimetics 2023, 8(4), 371; https://doi.org/10.3390/biomimetics8040371 - 16 Aug 2023
Cited by 1 | Viewed by 1490
Abstract
Dynamic objects bring about a large number of error accumulations in pose estimation of mobile robots in dynamic scenes, and result in the failure to build a map that is consistent with the surrounding environment. Along these lines, this paper presents a robust semi-direct 3D simultaneous localization and mapping (SLAM) algorithm for mobile robots based on dense optical flow. First, a preliminary estimation of the robot’s pose is conducted using the sparse direct method and the homography matrix is utilized to compensate for the current frame image to reduce the image deformation caused by rotation during the robot’s motion. Then, by calculating the dense optical flow field of two adjacent frames and segmenting the dynamic region in the scene based on the dynamic threshold, the local map points projected within the dynamic regions are eliminated. On this basis, the robot’s pose is optimized by minimizing the reprojection error. Moreover, a high-performance keyframe selection strategy is developed, and keyframes are inserted when the robot’s pose is successfully tracked. Meanwhile, feature points are extracted and matched to the keyframes for subsequent optimization and mapping. Considering that the direct method is subject to tracking failure in practical application scenarios, the feature points and map points of keyframes are employed in robot relocation. Finally, all keyframes and map points are used as optimization variables for global bundle adjustment (BA) optimization, so as to construct a globally consistent 3D dense octree map. A series of simulations and experiments demonstrate the superior performance of the proposed algorithm. Full article
(This article belongs to the Section Locomotion and Bioinspired Robotics)
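
A minimal sketch of the dense-optical-flow segmentation step described above: pixels whose flow magnitude is well above the frame's typical motion are marked dynamic. The multiplier stands in for the paper's dynamic threshold, and the Farneback parameters are defaults chosen for illustration.

```python
import cv2
import numpy as np

def dynamic_region_mask(prev_gray, curr_gray, k=3.0):
    """prev_gray, curr_gray: consecutive grayscale frames; returns a boolean mask."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)           # per-pixel flow magnitude
    return mag > k * (np.median(mag) + 1e-6)     # True = likely dynamic region
```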

16 pages, 33350 KiB  
Article
Research on Multi-Sensor Simultaneous Localization and Mapping Technology for Complex Environment of Construction Machinery
by Haoling Ren, Yaping Zhao, Tianliang Lin and Jiangdong Wu
Appl. Sci. 2023, 13(14), 8496; https://doi.org/10.3390/app13148496 - 23 Jul 2023
Viewed by 1411
Abstract
Simultaneous localization and mapping (SLAM), as a key task of unmanned vehicles for construction machinery, is of great significance for later path planning and control. Construction tasks in the engineering field are mostly carried out on bridges, in tunnels, in open fields, etc. The prominent features of these environments are high scene similarity, few geometric features, and large-scale repetitive texture information, which make sensor detection prone to degradation. This leads to positioning drift and map-building failure. The traditional approach of motion estimation and 3D reconstruction uses a single sensor, which lacks sufficient information, adapts poorly to the environment, and cannot guarantee good positioning accuracy and robustness in complex environments. Currently, multi-sensor fusion is proven to be an effective solution and is widely studied. This paper proposes a SLAM framework that integrates LiDAR, IMU, and camera. It tightly couples the texture information observed by the camera, the geometric information scanned by the LiDAR, and the measurements of the IMU, allowing visual-inertial odometry (VIO) and LiDAR-inertial odometry (LIO) to run within a common implementation. The LIO subsystem extracts point cloud features and matches them with the global map, and the obtained pose estimate can be used to initialize the VIO subsystem. The VIO subsystem uses a direct method to minimize the photometric error between images and the IMU measurement error to estimate the pose of the robot and the geometric structure of the scene. The two subsystems assist each other in pose estimation and can operate normally even if either subsystem fails. A factor graph is used to combine all constraints and achieve global pose optimization, and keyframe and sliding window strategies are used to ensure real-time performance. Real-vehicle testing shows that the system can perform incremental, real-time state estimation and reconstruct a dense 3D point cloud map, effectively solving the problems of positioning drift and mapping failure in construction environments that lack geometric features or are otherwise challenging. Full article
(This article belongs to the Special Issue AI Applications in the Industrial Technologies)
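
A generic sketch of a keyframe insertion test of the kind the abstract refers to: a new keyframe is inserted once translation or rotation relative to the last keyframe exceeds a threshold. The thresholds are illustrative, and the paper's actual keyframe and sliding-window strategy is not detailed in the abstract.

```python
import numpy as np

def should_insert_keyframe(T_last_kf, T_curr, t_thresh=0.5, r_thresh_deg=10.0):
    """T_last_kf, T_curr: 4x4 homogeneous poses expressed in the same frame."""
    T_rel = np.linalg.inv(T_last_kf) @ T_curr          # relative transform since last keyframe
    trans = np.linalg.norm(T_rel[:3, 3])
    cos_angle = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    return trans > t_thresh or angle_deg > r_thresh_deg
```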
