Search Results (156)

Search Parameters:
Keywords = sequential decision problems

20 pages, 351 KiB  
Article
Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
by Ali Baheri
Mathematics 2025, 13(1), 149; https://doi.org/10.3390/math13010149 - 3 Jan 2025
Viewed by 151
Abstract
The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction. Full article
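As a companion to this abstract, the toy sketch below conveys the general flavor of hierarchical arm selection under a per-step cost constraint; the rewards, costs, and budget are made up, and it does not reproduce the paper's HC-UCB algorithm or its guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-level bandit: 3 high-level options, each with 4 low-level arms.
# Rewards and per-pull costs are made up; the cost budget acts as the constraint.
n_high, n_low = 3, 4
true_reward = rng.uniform(0.2, 0.9, size=(n_high, n_low))
true_cost = rng.uniform(0.0, 1.0, size=(n_high, n_low))
cost_budget = 0.7                                     # per-step cost limit (assumed known)

counts = np.zeros((n_high, n_low))
means = np.zeros((n_high, n_low))

def ucb_scores(mean, count, t, c=1.0):
    bonus = c * np.sqrt(np.log(t + 1) / np.maximum(count, 1))
    return np.where(count > 0, mean + bonus, np.inf)  # untried arms get priority

T = 2000
for t in range(T):
    scores = ucb_scores(means, counts, t)
    scores[true_cost > cost_budget] = -np.inf         # filter out infeasible arms
    h = int(np.argmax(scores.max(axis=1)))            # high level: most promising option
    a = int(np.argmax(scores[h]))                     # low level: best feasible arm in it
    r = rng.binomial(1, true_reward[h, a])            # Bernoulli reward
    counts[h, a] += 1
    means[h, a] += (r - means[h, a]) / counts[h, a]

print("pull counts per (option, arm):\n", counts.astype(int))
```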

26 pages, 1149 KiB  
Article
A Massively Parallel SMC Sampler for Decision Trees
by Efthyvoulos Drousiotis, Alessandro Varsi, Alexander M. Phillips, Simon Maskell and Paul G. Spirakis
Algorithms 2025, 18(1), 14; https://doi.org/10.3390/a18010014 - 2 Jan 2025
Viewed by 131
Abstract
Bayesian approaches to decision trees (DTs) using Markov Chain Monte Carlo (MCMC) samplers have recently demonstrated state-of-the-art accuracy performance when it comes to training DTs to solve classification problems. Despite the competitive classification accuracy, MCMC requires a potentially long runtime to converge. A widely used approach to reducing an algorithm’s runtime is to employ modern multi-core computer architectures, either with shared memory (SM) or distributed memory (DM), and use parallel computing to accelerate the algorithm. However, the inherent sequential nature of MCMC makes it unsuitable for parallel implementation unless the accuracy is sacrificed. This issue is particularly evident in DM architectures, which normally provide access to larger numbers of cores than SM. Sequential Monte Carlo (SMC) samplers are a parallel alternative to MCMC, which do not trade off accuracy for parallelism. However, the performance of SMC samplers in the context of DTs is underexplored, and the parallelization is complicated by the challenges in parallelizing its bottleneck, namely redistribution, especially on variable-size data types such as DTs. In this work, we study the problem of parallelizing SMC in the context of DTs both on SM and DM. On both memory architectures, we show that the proposed parallelization strategies achieve asymptotically optimal O(log₂ N) time complexity. Numerical results are presented for a 32-core SM machine and a 256-core DM cluster. For both computer architectures, the experimental results show that our approach has comparable or better accuracy than MCMC but runs up to 51 times faster on SM and 640 times faster on DM. In this paper, we share the GitHub link to the source code. Full article
(This article belongs to the Collection Parallel and Distributed Computing: Algorithms and Applications)
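For readers unfamiliar with SMC samplers, the minimal single-core sketch below shows the generic weight/resample/move loop on a toy one-dimensional Gaussian target; the decision-tree proposals, the parallel redistribution step, and the O(log₂ N) strategies of the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Toy target: a standard normal log-density (a stand-in for the DT posterior).
    return -0.5 * x**2

N, n_steps, step = 1000, 30, 0.5
x = rng.normal(0.0, 3.0, size=N)                    # particles from a broad proposal
logw = log_target(x) - (-0.5 * (x / 3.0) ** 2)      # initial importance weights

for _ in range(n_steps):
    # (With a fixed target and an invariant MH move, the incremental weights are
    # trivial; a full SMC sampler would also reweight between intermediate targets.)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)
    if ess < N / 2:                                 # resample when weights degenerate
        idx = rng.choice(N, size=N, p=w)            # the redistribution bottleneck
        x, logw = x[idx], np.zeros(N)
    prop = x + step * rng.normal(size=N)            # Metropolis-Hastings move step
    accept = np.log(rng.uniform(size=N)) < log_target(prop) - log_target(x)
    x = np.where(accept, prop, x)

w = np.exp(logw - logw.max())
w /= w.sum()
print("weighted posterior mean estimate:", np.sum(w * x))
```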

14 pages, 1424 KiB  
Article
Rice Disease Classification Using a Stacked Ensemble of Deep Convolutional Neural Networks
by Zhibin Wang, Yana Wei, Cuixia Mu, Yunhe Zhang and Xiaojun Qiao
Sustainability 2025, 17(1), 124; https://doi.org/10.3390/su17010124 - 27 Dec 2024
Viewed by 373
Abstract
Rice is a staple food for almost half of the world’s population, and the stability and sustainability of rice production play a decisive role in food security. Diseases are a major cause of loss in rice crops. The timely discovery and control of diseases are important in reducing the use of pesticides, protecting the agricultural eco-environment, and improving the yield and quality of rice crops. Deep convolutional neural networks (DCNNs) have achieved great success in disease image classification. However, most models have complex network structures that frequently cause problems, such as redundant network parameters, low training efficiency, and high computational costs. To address this issue and improve the accuracy of rice disease classification, a lightweight deep convolutional neural network (DCNN) ensemble method for rice disease classification is proposed. First, a new lightweight DCNN model (called CG-EfficientNet), which is based on an attention mechanism and EfficientNet, was designed as the base learner. Second, CG-EfficientNet models with different optimization algorithms and network parameters were trained on rice disease datasets to generate seven different CG-EfficientNets, and a resampling strategy was used to enhance the diversity of the individual models. Then, the sequential least squares programming algorithm was used to calculate the weight of each base model. Finally, logistic regression was used as the meta-classifier for stacking. To verify the effectiveness, classification experiments were performed on five classes of rice tissue images: rice bacterial blight, rice kernel smut, rice false smut, rice brown spot, and healthy leaves. The accuracy of the proposed method was 96.10%, which is higher than the results of the classic CNN models VGG16, InceptionV3, ResNet101, and DenseNet201 and four integration methods. The experimental results show that the proposed method is not only capable of accurately identifying rice diseases but is also computationally efficient. Full article
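The stacking recipe described in the abstract (SLSQP weights over base-model probabilities plus a logistic-regression meta-classifier) can be sketched as follows; the base-model outputs below are synthetic stand-ins, since the trained CG-EfficientNets are not available here.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(2)

# Hypothetical stand-in data: out-of-fold class probabilities from 7 base CNNs
# on 500 validation images across 5 disease classes.
n_models, n_samples, n_classes = 7, 500, 5
y = rng.integers(0, n_classes, size=n_samples)
base_probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))
for m in range(n_models):                     # make the fake predictions weakly informative
    base_probs[m, np.arange(n_samples), y] += 0.5
    base_probs[m] /= base_probs[m].sum(axis=1, keepdims=True)

# Step 1: convex combination weights for the base models via SLSQP.
def neg_log_lik(w):
    blend = np.tensordot(w, base_probs, axes=1)        # (n_samples, n_classes)
    blend = np.clip(blend, 1e-12, None)
    blend /= blend.sum(axis=1, keepdims=True)
    return log_loss(y, blend)

w0 = np.full(n_models, 1.0 / n_models)
res = minimize(neg_log_lik, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * n_models,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print("SLSQP base-model weights:", np.round(res.x, 3))

# Step 2: logistic regression meta-classifier stacked on the base predictions.
meta_features = base_probs.transpose(1, 0, 2).reshape(n_samples, -1)
meta = LogisticRegression(max_iter=1000).fit(meta_features, y)
print("meta-classifier training accuracy:", meta.score(meta_features, y))
```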

22 pages, 9786 KiB  
Article
Determination of Sequential Well Placements Using a Multi-Modal Convolutional Neural Network Integrated with Evolutionary Optimization
by Seoyoon Kwon, Minsoo Ji, Min Kim, Juliana Y. Leung and Baehyun Min
Mathematics 2025, 13(1), 36; https://doi.org/10.3390/math13010036 - 26 Dec 2024
Viewed by 365
Abstract
In geoenergy science and engineering, well placement optimization is the process of determining optimal well locations and configurations to maximize economic value while considering geological, engineering, economic, and environmental constraints. This complex multi-million-dollar problem involves optimizing multiple parameters using computationally intensive reservoir simulations, often employing advanced algorithms such as optimization algorithms and machine/deep learning techniques to find near-optimal solutions efficiently while accounting for uncertainties and risks. This study proposes a hybrid workflow for determining the locations of production wells during primary oil recovery using a multi-modal convolutional neural network (M-CNN) integrated with an evolutionary optimization algorithm. The particle swarm optimization algorithm provides the M-CNN with full-physics reservoir simulation results as learning data correlating an arbitrary well location and its cumulative oil production. The M-CNN learns the correlation between near-wellbore spatial properties (e.g., porosity, permeability, pressure, and saturation) and cumulative oil production as inputs and output, respectively. The learned M-CNN predicts oil productivity at every candidate well location and selects qualified well placement scenarios. The prediction performance of the M-CNN for hydrocarbon-prolific regions is improved by adding qualified scenarios to the learning data and re-training the M-CNN. This iterative learning scheme enhances the suitability of the proxy for solving the problem of maximizing oil productivity. The validity of the proxy is tested with a benchmark model, UNISIM-I-D, in which four oil production wells are sequentially drilled. The M-CNN approach demonstrates remarkable consistency and alignment with full-physics reservoir simulation results. It achieves prediction accuracy within a 3% relative error margin, while significantly reducing computational costs to just 11.18% of those associated with full-physics reservoir simulations. Moreover, the M-CNN-optimized well placement strategy yields a substantial 47.40% improvement in field cumulative oil production compared to the original configuration. These findings underscore the M-CNN’s effectiveness in sequential well placement optimization, striking an optimal balance between predictive accuracy and computational efficiency. The method’s ability to dramatically reduce processing time while maintaining high accuracy makes it a valuable tool for enhancing oil field productivity and streamlining reservoir management decisions. Full article
(This article belongs to the Special Issue Evolutionary Multi-Criteria Optimization: Methods and Applications)
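A bare-bones particle swarm optimization loop over candidate well coordinates is sketched below; the synthetic productivity surface stands in for the M-CNN proxy's predictions, and the coupling with full-physics simulation and the iterative retraining described in the abstract are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)

def predicted_cum_oil(xy):
    # Hypothetical stand-in for the M-CNN proxy: a smooth synthetic
    # "cumulative oil production" surface over grid coordinates in [0, 100]^2.
    x, y = xy[..., 0], xy[..., 1]
    return (np.exp(-((x - 30) ** 2 + (y - 70) ** 2) / 400.0)
            + 0.8 * np.exp(-((x - 75) ** 2 + (y - 20) ** 2) / 300.0))

# Particle swarm optimization over candidate well locations.
n_particles, n_iters = 30, 100
pos = rng.uniform(0, 100, size=(n_particles, 2))
vel = rng.normal(0, 1, size=(n_particles, 2))
pbest = pos.copy()
pbest_val = predicted_cum_oil(pos)
gbest = pbest[np.argmax(pbest_val)].copy()

w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration coefficients
for _ in range(n_iters):
    r1, r2 = rng.uniform(size=(2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 100)
    val = predicted_cum_oil(pos)
    improved = val > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmax(pbest_val)].copy()

print("best well location:", np.round(gbest, 1), "proxy value:", round(float(pbest_val.max()), 3))
```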

17 pages, 2438 KiB  
Article
Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning
by Yana Yang, Meng Xi, Huiao Dai, Jiabao Wen and Jiachen Yang
Sensors 2024, 24(23), 7746; https://doi.org/10.3390/s24237746 - 4 Dec 2024
Viewed by 399
Abstract
Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning possesses powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences, making it easier to find global optimal solutions. Understanding how to utilize experiences is crucial for improving the efficiency of off-policy reinforcement learning algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which enhances the utilization of experiences and improves the performance and convergence speed of the algorithm. A series of ablation experiments demonstrate that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms. Full article
(This article belongs to the Section Intelligent Sensors)
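One plausible reading of a z-score-based priority rule is sketched below: sampling probabilities derived from the z-scores of absolute TD errors. The class name and the softmax weighting are assumptions made for illustration, not the paper's exact formulation.

```python
import random
from collections import deque

import numpy as np

class ZScoreReplayBuffer:
    """Illustrative replay buffer that samples transitions with probability
    based on the z-score of their |TD error| (one plausible reading of
    z-score prioritized experience replay; not the paper's exact rule)."""

    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)
        self.td = deque(maxlen=capacity)

    def push(self, transition, td_error):
        self.data.append(transition)
        self.td.append(abs(td_error))

    def sample(self, batch_size):
        td = np.asarray(self.td, dtype=float)
        z = (td - td.mean()) / (td.std() + 1e-8)      # z-score of |TD error|
        probs = np.exp(z - z.max())                   # softmax over z-scores
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update(self, idx, new_td_errors):
        # Refresh priorities after the learner recomputes TD errors for a batch.
        for i, e in zip(idx, new_td_errors):
            self.td[i] = abs(e)

# Usage sketch with dummy transitions.
buf = ZScoreReplayBuffer()
for _ in range(1000):
    buf.push(("s", "a", random.random(), "s2"), td_error=random.gauss(0, 1))
batch, idx = buf.sample(32)
print("sampled", len(batch), "transitions")
```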

20 pages, 3221 KiB  
Article
A VIKOR-Based Sequential Three-Way Classification Ranking Method
by Wentao Xu, Jin Qian, Yueyang Wu, Shaowei Yan, Yongting Ni and Guangjin Yang
Algorithms 2024, 17(11), 530; https://doi.org/10.3390/a17110530 - 19 Nov 2024
Cited by 1 | Viewed by 523
Abstract
VIKOR uses the idea of overall utility maximization and individual regret minimization to afford a compromise result for multi-attribute decision-making problems with conflicting attributes. Many researchers have proposed corresponding improvements and expansions to make it more suitable for sorting optimization in their respective research fields. However, these improvements and extensions only rank the alternatives without classifying them. For this purpose, this paper introduces the sequential three-way decisions method and combines it with the VIKOR method to design a three-way VIKOR method that can deal with both ranking and classification. By using the final negative ideal solution (NIS) and the final positive ideal solution (PIS) for all alternatives, the individual regret value and group utility value of each alternative were calculated. Different three-way VIKOR models were obtained from four different combinations of individual regret value and group utility value. In the ranking process, the characteristics of the VIKOR method are incorporated, and the subjective preference of decision makers is considered by using individual regret, group utility, and decision index values. In the classification process, the alternatives are divided into their corresponding decision domains by sequential three-way decisions, and the risk of direct acceptance or rejection is avoided by putting the uncertain alternatives into the boundary region to delay the decision. Alternatives assigned to the same decision domain are sorted according to the collation rules within that domain, and the final ranking is obtained according to the collation rules across different decision domains. Finally, the effectiveness and correctness of the proposed method are verified by a project investment example, and the results are compared and evaluated. The experimental results show that the proposed method correlates significantly with the results of other methods, is effective and feasible, and is simpler and more effective in dealing with some problems. Errors caused by misclassification are reduced by the sequential three-way decisions. Full article
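The classical VIKOR quantities (group utility S, individual regret R, compromise index Q) and a three-way split of the alternatives can be sketched as follows; the decision matrix, weights, and thresholds are illustrative and not taken from the paper.

```python
import numpy as np

# Toy decision matrix: 6 alternatives x 4 benefit criteria, with criterion weights.
X = np.array([[7, 8, 6, 5],
              [9, 6, 7, 8],
              [5, 5, 9, 6],
              [8, 7, 5, 7],
              [6, 9, 8, 4],
              [4, 6, 6, 9]], dtype=float)
w = np.array([0.3, 0.3, 0.2, 0.2])

pis = X.max(axis=0)                      # positive ideal solution per criterion
nis = X.min(axis=0)                      # negative ideal solution per criterion
d = w * (pis - X) / (pis - nis)          # weighted normalized distance to the PIS
S = d.sum(axis=1)                        # group utility
R = d.max(axis=1)                        # individual regret
v = 0.5                                  # decision mechanism index
Q = (v * (S - S.min()) / (S.max() - S.min())
     + (1 - v) * (R - R.min()) / (R.max() - R.min()))

# Three-way split on Q (thresholds are illustrative, not from the paper):
alpha, beta = 0.3, 0.7
regions = np.where(Q <= alpha, "accept",
                   np.where(Q >= beta, "reject", "boundary"))
for i in np.argsort(Q):
    print(f"alternative {i}: Q={Q[i]:.3f} -> {regions[i]}")
```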

15 pages, 874 KiB  
Article
Deep Reinforcement Learning-Driven Jamming-Enhanced Secure Unmanned Aerial Vehicle Communications
by Zhifang Xing, Yunhui Qin, Changhao Du, Wenzhang Wang and Zhongshan Zhang
Sensors 2024, 24(22), 7328; https://doi.org/10.3390/s24227328 - 16 Nov 2024
Cited by 1 | Viewed by 592
Abstract
Despite their flexibility, unmanned aerial vehicle (UAV) communications are susceptible to eavesdropping due to the open nature of wireless channels and the broadcasting nature of wireless signals. This paper studies secure UAV communications and proposes a method that uses interference technology to enhance the minimum secrecy rate of the system. To this end, the system not only deploys multiple UAV base stations (BSs) to provide services to legitimate users but also assigns dedicated UAV jammers to send interference signals to active or potential eavesdroppers to disrupt their eavesdropping effectiveness. Based on this configuration, we formulate the optimization process of parameters such as the user association variables, UAV trajectory, and output power as a sequential decision-making problem and use the single-agent soft actor-critic (SAC) algorithm and twin delayed deep deterministic policy gradient (TD3) algorithm to achieve joint optimization of the core parameters. In addition, for specific scenarios, we also use the multi-agent soft actor-critic (MASAC) algorithm to solve the joint optimization problem mentioned above. The numerical results show that the normalized average secrecy rate of the MASAC algorithm increased by more than 6.6% and 14.2% compared with that of the SAC and TD3 algorithms, respectively. Full article
(This article belongs to the Special Issue Novel Signal Processing Techniques for Wireless Communications)

29 pages, 3537 KiB  
Article
Dynamic Integrated Scheduling of Production Equipment and Automated Guided Vehicles in a Flexible Job Shop Based on Deep Reinforcement Learning
by Jingrui Wang, Yi Li, Zhongwei Zhang, Zhaoyun Wu, Lihui Wu, Shun Jia and Tao Peng
Processes 2024, 12(11), 2423; https://doi.org/10.3390/pr12112423 - 2 Nov 2024
Viewed by 1405
Abstract
The high-quality development of the manufacturing industry necessitates accelerating its transformation towards high-end, intelligent, and green development. Considering logistics resource constraints, the impact of dynamic disturbance events on production, and the need for energy-efficient production, the integrated scheduling of production equipment and automated guided vehicles (AGVs) in a flexible job shop environment is investigated in this study. Firstly, a static model for the integrated scheduling of production equipment and AGVs (ISPEA) is developed based on mixed-integer programming, which aims to optimize the maximum completion time and total production energy consumption (EC). In recent years, reinforcement learning, including deep reinforcement learning (DRL), has demonstrated significant advantages in handling workshop scheduling issues with sequential decision-making characteristics, which can fully utilize the vast quantity of historical data accumulated in the workshop and adjust production plans in a timely manner based on changes in production conditions and demand. Accordingly, a DRL-based approach is introduced to address the common production disturbances in emergency order insertions. Combined with the characteristics of the ISPEA problem and an event-driven strategy for handling dynamic events, four types of agents, namely workpiece selection, machine selection, AGV selection, and target selection agents, are set up, which refine workshop production status characteristics as observation inputs and generate rules for selecting workpieces, machines, AGVs, and targets. These agents are trained offline using the QMIX multi-agent reinforcement learning framework, and the trained agents are utilized to solve the dynamic ISPEA problem. Finally, the effectiveness of the proposed model and method is validated through a comparison of the solution performance with other typical optimization algorithms for various cases. Full article
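For reference, the monotonic mixing network at the heart of QMIX can be sketched as below: a generic PyTorch mixer combining per-agent Q-values under state-conditioned non-negative weights. The paper's four scheduling agents, their observations, and the reward design are not reproduced.

```python
import torch
import torch.nn as nn

class QMixMixer(nn.Module):
    """Monotonic mixing network: combines per-agent Q-values into Q_tot using
    state-conditioned, non-negative weights produced by hypernetworks."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)

# Four agents (e.g., workpiece, machine, AGV, and target selection) feeding one mixer.
mixer = QMixMixer(n_agents=4, state_dim=20)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 20))
print(q_tot.shape)  # torch.Size([8, 1])
```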

21 pages, 1284 KiB  
Article
Context-Dependent Criteria for Dirichlet Process in Sequential Decision-Making Problems
by Ksenia Kasianova and Mark Kelbert
Mathematics 2024, 12(21), 3321; https://doi.org/10.3390/math12213321 - 23 Oct 2024
Viewed by 570
Abstract
In models with insufficient initial information, parameter estimation can be subject to statistical uncertainty, potentially resulting in suboptimal decision-making; however, delaying implementation to gather more information can also incur costs. This paper examines an extension of information-theoretic approaches designed to address this classical dilemma, focusing on balancing the expected profits against the information to be obtained about all of the possible outcomes. Initially utilized in binary outcome scenarios, these methods leverage information measures to harmonize competing objectives efficiently. Building upon the foundations laid by existing research, this methodology is expanded to encompass experiments with multiple outcome categories using Dirichlet processes. The core of our approach is centered around weighted entropy measures, particularly in scenarios dictated by Dirichlet distributions, which have not been extensively explored previously. We adapt the technique, initially applied to the binary case, to Dirichlet distributions/processes. The primary contribution of our work is the formulation of a sequential minimization strategy for the main term of an asymptotic expansion of differential entropy, which scales with sample size, for non-binary outcomes. This paper provides a theoretical grounding, extended empirical applications, and comprehensive proofs, setting a robust framework for further interdisciplinary applications of information-theoretic paradigms in sequential decision-making. Full article
(This article belongs to the Special Issue Advances in Statistical Methods with Applications)
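The shrinking-uncertainty term that a sequential criterion acts on can be illustrated by tracking the differential entropy of a Dirichlet posterior as multinomial observations accumulate; note that the paper's criterion uses a weighted entropy, whereas the sketch below shows only plain differential entropy on made-up data.

```python
import numpy as np
from scipy.stats import dirichlet

rng = np.random.default_rng(4)

# Three outcome categories with unknown probabilities and a Dirichlet(1,1,1) prior.
true_p = np.array([0.5, 0.3, 0.2])
alpha = np.ones(3)

# Track the differential entropy of the Dirichlet posterior as multinomial
# observations accumulate (plain Shannon differential entropy, not the paper's
# weighted entropy criterion).
for n in [0, 10, 100, 1000]:
    counts = rng.multinomial(n, true_p) if n > 0 else np.zeros(3, dtype=int)
    post = alpha + counts
    h = float(dirichlet(post).entropy())
    print(f"n={n:5d}  posterior alpha={post.astype(int)}  differential entropy={h:.3f}")
```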

20 pages, 5181 KiB  
Article
Resource Allocation Approach of Avionics System in SPO Mode Based on Proximal Policy Optimization
by Lei Dong, Jiachen Liu, Zijing Sun, Xi Chen and Peng Wang
Aerospace 2024, 11(10), 812; https://doi.org/10.3390/aerospace11100812 - 4 Oct 2024
Viewed by 1201
Abstract
Single-Pilot Operations (SPO) mode is set to reshape the decision-making process between human-machine and air-ground operations. However, the limited on-board computing resources impose greater demands on the organization of performance parameters and the optimization of process efficiency in SPO mode. To address this challenge, this paper first investigates the flexible requirements of avionics systems arising from changes in SPO operational scenarios, then analyzes the architecture of Reconfigurable Integrated Modular Avionics (RIMA) and its resource allocation framework in the context of scarcity and configurability. A “mission-function-resource” mapping relationship is established between the reconfiguration service elements of SPO mode and avionics resources. Subsequently, the Proximal Policy Optimization (PPO) algorithm is introduced to simulate the resource allocation process of IMA reconfiguration in SPO mode. The objective optimization process is transformed into a sequential decision-making problem by considering constraints and optimization criteria such as load, latency, and power consumption within the feasible domain of avionics system resources. Finally, the resource allocation scheme for avionics system reconfiguration is determined by controlling the probability of action selection during the interaction between the agent and the environment. The experimental results show that the resource allocation scheme based on the PPO algorithm can effectively reduce power consumption and latency, and that the DRL model exhibits strong interference resistance and generalization ability. This enables avionics resources to respond dynamically to the capabilities required in SPO mode and enhances their ability to support the aircraft mission at all stages. Full article
(This article belongs to the Collection Avionic Systems)
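The core of PPO is the clipped surrogate objective, sketched below with dummy tensors; the avionics-specific state, action, and reward definitions (load, latency, power consumption) are only referenced in the comments and are not modeled here.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized); all tensors share shape (batch,)."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Dummy batch: in the resource-allocation setting the actions would be discrete
# mapping choices and the advantages would come from a critic scoring load,
# latency, and power consumption (these names are illustrative assumptions).
logp_old = torch.log(torch.full((64,), 0.25))
logp_new = logp_old + 0.1 * torch.randn(64)
advantages = torch.randn(64)
print("clipped surrogate loss:", ppo_clip_loss(logp_new, logp_old, advantages).item())
```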

22 pages, 1283 KiB  
Article
Dynamic Approach to Update Utility and Choice by Emerging Technologies to Reduce Risk in Urban Road Transportation Systems
by Francesco Russo, Antonio Comi and Giovanna Chilà
Future Transp. 2024, 4(3), 1078-1099; https://doi.org/10.3390/futuretransp4030052 - 20 Sep 2024
Viewed by 909
Abstract
International research attention on evacuation issues has increased significantly following the human and natural disasters at the turn of the century, such as 9/11, Hurricane Katrina, Cyclones Idai and Kenneth, the Black Saturday forest fires and tsunamis in Japan. When a disaster can occur, the main problem to study is risk reduction. Risk, following the theoretical and experimental studies, is determined by the product of three components: occurrence, vulnerability and exposure. Vulnerability can be improved over time through major infrastructure actions, but absolute security cannot be achieved. When the event is certain to occur, only exposure remains available for reducing the risk to people before the effect reaches them. Exposure can be improved, under fixed conditions of occurrence and vulnerability, by improving evacuation. The main problem in evacuating the population from an area is the available transport system, which must be used to its fullest. So, if the system is well managed, the evacuation improves (shorter times), meaning the exposure is reduced, and therefore, the risk is reduced. A key factor in the analysis of transport systems under emergency conditions is the behavior of the user, and therefore, the study of demand. This work identifies the main research lines that are useful for studying demand under exposure-related risk conditions. The classification of demand models that simulate evacuation conditions in relation to the effect on the transportation system is summarized. The contribution proposes a model for updating choice in relation to emergency conditions and utility. The contribution of emerging ICTs to this updating is formally introduced into the models. Intelligent technologies make it possible to improve user decisions, reducing exposure and therefore risk. The proposed model draws on two approaches from the literature: it is an inter-period dynamic model with the probability expressed within discrete choice theory; furthermore, it is a sequential dynamic model with the probability dependent on the previous choices. The contribution presents an example application of the model, developing a transition matrix for the case of choice updating under two extreme conditions. Full article

23 pages, 1626 KiB  
Article
Is Reinforcement Learning Good at American Option Valuation?
by Peyman Kor, Reidar B. Bratvold and Aojie Hong
Algorithms 2024, 17(9), 400; https://doi.org/10.3390/a17090400 - 7 Sep 2024
Viewed by 1073
Abstract
This paper investigates algorithms for identifying the optimal policy for pricing American Options. American Option pricing is reformulated as a Sequential Decision-Making problem with two actions (Exercise or Continue), transforming it into an optimal stopping time problem. Both the least-squares Monte Carlo simulation method (LSM) and Reinforcement Learning (RL)-based methods were utilized to find the optimal policy and, hence, the fair value of the American Put Option. Both Classical Geometric Brownian Motion (GBM) and calibrated Stochastic Volatility models served as models of the underlying uncertain asset. The novelty of this work lies in two aspects: (1) Applying LSM- and RL-based methods to determine option prices, with a specific focus on analyzing the dynamics of “Decisions” made by each method and comparing the final decisions chosen by the LSM and RL methods. (2) Assessing how the RL method updates “Decisions” at each batch, revealing the evolution of the decisions during the learning process toward the optimal policy. Full article
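For context, a standard least-squares Monte Carlo (Longstaff-Schwartz-style) pricer for an American put under GBM is sketched below; the basis functions, parameters, and the RL counterpart studied in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

# Least-squares Monte Carlo (LSM) price of an American put under GBM.
S0, K, r, sigma, T = 36.0, 40.0, 0.06, 0.2, 1.0
n_paths, n_steps = 100_000, 50
dt = T / n_steps
disc = np.exp(-r * dt)

# Simulate GBM paths at times dt, 2*dt, ..., T.
z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

payoff = np.maximum(K - S[:, -1], 0.0)          # cash flow if held to maturity
for t in range(n_steps - 2, -1, -1):
    payoff *= disc                              # discount one step back
    itm = K - S[:, t] > 0                       # regress only on in-the-money paths
    if itm.sum() == 0:
        continue
    x = S[itm, t]
    # Continuation value approximated by a quadratic polynomial in the spot price.
    coeffs = np.polyfit(x, payoff[itm], deg=2)
    continuation = np.polyval(coeffs, x)
    exercise = K - x
    ex_now = exercise > continuation            # optimal stopping decision
    payoff[itm] = np.where(ex_now, exercise, payoff[itm])

price = disc * payoff.mean()
print(f"LSM American put price: {price:.3f}")   # roughly 4.4-4.5 for this classic parameter set
```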

27 pages, 11040 KiB  
Article
PolyDexFrame: Deep Reinforcement Learning-Based Pick-and-Place of Objects in Clutter
by Muhammad Babar Imtiaz, Yuansong Qiao and Brian Lee
Viewed by 1212
Abstract
This research study presents a polydexterous deep reinforcement learning-based pick-and-place framework for industrial clutter scenarios. In the proposed framework, the agent learns the pick-and-place of regularly and irregularly shaped objects in clutter by using the sequential combination of prehensile and non-prehensile robotic manipulations involving different robotic grippers in a completely self-supervised manner. The problem was tackled as a reinforcement learning problem; after the Markov decision process (MDP) was designed, the off-policy model-free Q-learning algorithm was deployed using deep Q-networks as a Q-function approximator. Four distinct robotic manipulations, i.e., grasp from the prehensile manipulation category and inward slide, outward slide, and suction grip from the non-prehensile manipulation category, were considered as actions. The Q-function comprised four fully convolutional networks (FCNs), one per action, based on memory-efficient DenseNet-121 variants that output pixel-wise maps of action-values and are jointly trained via the pixel-wise parametrization technique. Rewards were awarded according to the status of the action performed, and backpropagation was conducted accordingly for the FCN generating the maximum Q-value. The results showed that the agent learned the sequential combination of the polydexterous prehensile and non-prehensile manipulations, where the non-prehensile manipulations increased the possibility of prehensile manipulations. We achieved promising results in comparison to the baselines and differently designed variants, as well as on density-based testing clutter. Full article
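The pixel-wise action-value parametrization can be illustrated with a toy stand-in: one small fully convolutional head per manipulation primitive producing a Q-map, with the (primitive, pixel) pair chosen by a joint argmax. The DenseNet-121 backbones, rewards, and training loop of the paper are omitted.

```python
import torch
import torch.nn as nn

class PixelWiseQ(nn.Module):
    """Toy pixel-wise Q-function: one small fully convolutional head per
    manipulation primitive (grasp, inward slide, outward slide, suction)."""
    def __init__(self, in_ch=4, n_actions=4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 1),
            )
            for _ in range(n_actions)
        ])

    def forward(self, obs):                                 # obs: (B, C, H, W) heightmap
        return torch.cat([h(obs) for h in self.heads], dim=1)  # (B, A, H, W) Q-maps

q_net = PixelWiseQ()
obs = torch.randn(1, 4, 64, 64)                             # synthetic scene observation
q_maps = q_net(obs)
flat_idx = q_maps.view(1, -1).argmax(dim=1).item()          # joint argmax over (action, pixel)
action, rest = divmod(flat_idx, 64 * 64)
row, col = divmod(rest, 64)
print(f"best primitive {action} at pixel ({row}, {col})")
```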

18 pages, 2451 KiB  
Article
HRP-OG: Online Learning with Generative Feature Replay for Hypertension Risk Prediction in a Nonstationary Environment
by Shaofu Lin, Haokang Yan, Shiwei Zhou, Ziqian Qiao and Jianhui Chen
Sensors 2024, 24(15), 5033; https://doi.org/10.3390/s24155033 - 3 Aug 2024
Viewed by 1187
Abstract
Hypertension is a major risk factor for many serious diseases. With the aging population and lifestyle changes, the incidence of hypertension continues to rise, imposing a significant medical cost burden on patients and severely affecting their quality of life. Early intervention can greatly reduce the prevalence of hypertension. Early warning models based on electronic health records (EHRs) are an important and effective means of achieving early hypertension warning. However, limited by the scarcity and imbalance of multivisit records, and the nonstationary characteristics of hypertension features, it is difficult to predict the probability of hypertension prevalence in a patient effectively. Therefore, this study proposes an online hypertension monitoring model (HRP-OG) based on reinforcement learning and generative feature replay. It transforms the hypertension prediction problem into a sequential decision problem, achieving risk prediction of hypertension for patients using multivisit records. Sensors embedded in medical devices and wearables continuously capture real-time physiological data such as blood pressure, heart rate, and activity levels, which are integrated into the EHR. The fit between the samples generated by the generator and the real visit data is evaluated using maximum likelihood estimation, which can reduce the adversarial discrepancy between the feature space of hypertension and incoming incremental data, and the model is updated online based on real-time data using generative feature replay. The incorporation of sensor data ensures that the model adapts dynamically to changes in the condition of patients, facilitating timely interventions. In this study, the publicly available MIMIC-III data are used for validation, and the experimental results demonstrate that compared to existing advanced methods, HRP-OG can effectively improve the accuracy of hypertension risk prediction for few-shot multivisit records in nonstationary environments. Full article
(This article belongs to the Special Issue Artificial Intelligence for Medical Sensing)

13 pages, 3654 KiB  
Article
Online Unmanned Aerial Vehicles Search Planning in an Unknown Search Environment
by Haopeng Duan, Kaiming Xiao, Lihua Liu, Haiwen Chen and Hongbin Huang
Viewed by 765
Abstract
Unmanned Aerial Vehicles (UAVs) have been widely used in localized data collection and information search. However, there are still many practical challenges in real-world operations of UAV search, such as unknown search environments. Specifically, the payoff and cost at each search point are unknown to the planner in advance, which poses a great challenge to decision making. That is, UAV search decisions should be made sequentially in an online manner, thereby adapting to the unknown search environment. To this end, this paper introduces the problem of online decision making in UAV search planning, where the drone has a limited energy supply as a constraint and must make an irrevocable decision to search the current area or route to the next one in an online manner. To overcome the challenge of the unknown search environment, a joint-planning approach is proposed, where both the route selection and the search decision are made in an integrated online manner. The integrated online decision is made through online linear programming, which is proved to be near-optimal, resulting in high information search revenue. Furthermore, this joint-planning approach can be favorably applied to multi-round online UAV search planning scenarios, showing a clear first-mover advantage in gathering information. The effectiveness of the proposed approach is validated on a widely used dataset, and experimental results show the superior performance of online search decision making. Full article
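A generic online-LP-style dual-price heuristic for accept/skip decisions under an energy budget is sketched below; it is only meant to convey the flavor of online, irrevocable decision making and is not the near-optimal algorithm analyzed in the paper (payoffs, costs, and the budget are synthetic).

```python
import numpy as np

rng = np.random.default_rng(6)

# Online search decisions under an energy budget: at each visited point the payoff
# and cost are only revealed on arrival, and the accept/skip decision is irrevocable.
T, budget = 200, 50.0
payoffs = rng.uniform(0, 1, size=T)      # information revenue at each search point
costs = rng.uniform(0.1, 1.0, size=T)    # energy needed to search that point

remaining, collected = budget, 0.0
hist_p, hist_c = [], []
for t in range(T):
    hist_p.append(payoffs[t])
    hist_c.append(costs[t])
    # Fraction of the remaining points we can still afford at the average observed cost.
    afford_frac = min(1.0, (remaining / max(T - t, 1)) / np.mean(hist_c))
    # Dual "price" of energy: accept only points whose payoff/cost ratio is in the
    # top afford_frac of what has been seen so far.
    price = np.quantile(np.array(hist_p) / np.array(hist_c), 1.0 - afford_frac)
    if costs[t] <= remaining and payoffs[t] >= price * costs[t]:
        remaining -= costs[t]
        collected += payoffs[t]

print(f"collected payoff {collected:.1f}, energy left {remaining:.1f}")
```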
