Search Results (156)

Search Parameters:
Keywords = sequential decision problems

20 pages, 351 KiB  
Article
Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
by Ali Baheri
Mathematics 2025, 13(1), 149; https://doi.org/10.3390/math13010149 - 3 Jan 2025
Viewed by 151
Abstract
The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction. Full article
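As a companion to this abstract, the toy sketch below conveys the general flavor of hierarchical arm selection under a per-step cost constraint; the rewards, costs, and budget are made up, and it does not reproduce the paper's HC-UCB algorithm or its guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-level bandit: 3 high-level options, each with 4 low-level arms.
# Rewards and per-pull costs are made up; the cost budget acts as the constraint.
n_high, n_low = 3, 4
true_reward = rng.uniform(0.2, 0.9, size=(n_high, n_low))
true_cost = rng.uniform(0.0, 1.0, size=(n_high, n_low))
cost_budget = 0.7                                     # per-step cost limit (assumed known)

counts = np.zeros((n_high, n_low))
means = np.zeros((n_high, n_low))

def ucb_scores(mean, count, t, c=1.0):
    bonus = c * np.sqrt(np.log(t + 1) / np.maximum(count, 1))
    return np.where(count > 0, mean + bonus, np.inf)  # untried arms get priority

T = 2000
for t in range(T):
    scores = ucb_scores(means, counts, t)
    scores[true_cost > cost_budget] = -np.inf         # filter out infeasible arms
    h = int(np.argmax(scores.max(axis=1)))            # high level: most promising option
    a = int(np.argmax(scores[h]))                     # low level: best feasible arm in it
    r = rng.binomial(1, true_reward[h, a])            # Bernoulli reward
    counts[h, a] += 1
    means[h, a] += (r - means[h, a]) / counts[h, a]

print("pull counts per (option, arm):\n", counts.astype(int))
```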

26 pages, 1149 KiB  
Article
A Massively Parallel SMC Sampler for Decision Trees
by Efthyvoulos Drousiotis, Alessandro Varsi, Alexander M. Phillips, Simon Maskell and Paul G. Spirakis
Algorithms 2025, 18(1), 14; https://doi.org/10.3390/a18010014 - 2 Jan 2025
Viewed by 131
Abstract
Bayesian approaches to decision trees (DTs) using Markov Chain Monte Carlo (MCMC) samplers have recently demonstrated state-of-the-art accuracy performance when it comes to training DTs to solve classification problems. Despite the competitive classification accuracy, MCMC requires a potentially long runtime to converge. A widely used approach to reducing an algorithm’s runtime is to employ modern multi-core computer architectures, either with shared memory (SM) or distributed memory (DM), and use parallel computing to accelerate the algorithm. However, the inherent sequential nature of MCMC makes it unsuitable for parallel implementation unless the accuracy is sacrificed. This issue is particularly evident in DM architectures, which normally provide access to larger numbers of cores than SM. Sequential Monte Carlo (SMC) samplers are a parallel alternative to MCMC, which do not trade off accuracy for parallelism. However, the performance of SMC samplers in the context of DTs is underexplored, and the parallelization is complicated by the challenges in parallelizing its bottleneck, namely redistribution, especially on variable-size data types such as DTs. In this work, we study the problem of parallelizing SMC in the context of DTs both on SM and DM. On both memory architectures, we show that the proposed parallelization strategies achieve asymptotically optimal O(log₂ N) time complexity. Numerical results are presented for a 32-core SM machine and a 256-core DM cluster. For both computer architectures, the experimental results show that our approach has comparable or better accuracy than MCMC but runs up to 51 times faster on SM and 640 times faster on DM. In this paper, we share the GitHub link to the source code. Full article
(This article belongs to the Collection Parallel and Distributed Computing: Algorithms and Applications)
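For readers unfamiliar with SMC samplers, the minimal single-core sketch below shows the generic weight/resample/move loop on a toy one-dimensional Gaussian target; the decision-tree proposals, the parallel redistribution step, and the O(log₂ N) strategies of the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Toy target: a standard normal log-density (a stand-in for the DT posterior).
    return -0.5 * x**2

N, n_steps, step = 1000, 30, 0.5
x = rng.normal(0.0, 3.0, size=N)                    # particles from a broad proposal
logw = log_target(x) - (-0.5 * (x / 3.0) ** 2)      # initial importance weights

for _ in range(n_steps):
    # (With a fixed target and an invariant MH move, the incremental weights are
    # trivial; a full SMC sampler would also reweight between intermediate targets.)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)
    if ess < N / 2:                                 # resample when weights degenerate
        idx = rng.choice(N, size=N, p=w)            # the redistribution bottleneck
        x, logw = x[idx], np.zeros(N)
    prop = x + step * rng.normal(size=N)            # Metropolis-Hastings move step
    accept = np.log(rng.uniform(size=N)) < log_target(prop) - log_target(x)
    x = np.where(accept, prop, x)

w = np.exp(logw - logw.max())
w /= w.sum()
print("weighted posterior mean estimate:", np.sum(w * x))
```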

14 pages, 1424 KiB  
Article
Rice Disease Classification Using a Stacked Ensemble of Deep Convolutional Neural Networks
by Zhibin Wang, Yana Wei, Cuixia Mu, Yunhe Zhang and Xiaojun Qiao
Sustainability 2025, 17(1), 124; https://doi.org/10.3390/su17010124 - 27 Dec 2024
Viewed by 373
Abstract
Rice is a staple food for almost half of the world’s population, and the stability and sustainability of rice production play a decisive role in food security. Diseases are a major cause of loss in rice crops. The timely discovery and control of diseases are important in reducing the use of pesticides, protecting the agricultural eco-environment, and improving the yield and quality of rice crops. Deep convolutional neural networks (DCNNs) have achieved great success in disease image classification. However, most models have complex network structures that frequently cause problems, such as redundant network parameters, low training efficiency, and high computational costs. To address this issue and improve the accuracy of rice disease classification, a lightweight deep convolutional neural network (DCNN) ensemble method for rice disease classification is proposed. First, a new lightweight DCNN model (called CG-EfficientNet), which is based on an attention mechanism and EfficientNet, was designed as the base learner. Second, CG-EfficientNet models with different optimization algorithms and network parameters were trained on rice disease datasets to generate seven different CG-EfficientNets, and a resampling strategy was used to enhance the diversity of the individual models. Then, the sequential least squares programming algorithm was used to calculate the weight of each base model. Finally, logistic regression was used as the meta-classifier for stacking. To verify the effectiveness, classification experiments were performed on five classes of rice tissue images: rice bacterial blight, rice kernel smut, rice false smut, rice brown spot, and healthy leaves. The accuracy of the proposed method was 96.10%, which is higher than the results of the classic CNN models VGG16, InceptionV3, ResNet101, and DenseNet201 and four integration methods. The experimental results show that the proposed method is not only capable of accurately identifying rice diseases but is also computationally efficient. Full article
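The stacking recipe described in the abstract (SLSQP weights over base-model probabilities plus a logistic-regression meta-classifier) can be sketched as follows; the base-model outputs below are synthetic stand-ins, since the trained CG-EfficientNets are not available here.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(2)

# Hypothetical stand-in data: out-of-fold class probabilities from 7 base CNNs
# on 500 validation images across 5 disease classes.
n_models, n_samples, n_classes = 7, 500, 5
y = rng.integers(0, n_classes, size=n_samples)
base_probs = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))
for m in range(n_models):                     # make the fake predictions weakly informative
    base_probs[m, np.arange(n_samples), y] += 0.5
    base_probs[m] /= base_probs[m].sum(axis=1, keepdims=True)

# Step 1: convex combination weights for the base models via SLSQP.
def neg_log_lik(w):
    blend = np.tensordot(w, base_probs, axes=1)        # (n_samples, n_classes)
    blend = np.clip(blend, 1e-12, None)
    blend /= blend.sum(axis=1, keepdims=True)
    return log_loss(y, blend)

w0 = np.full(n_models, 1.0 / n_models)
res = minimize(neg_log_lik, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * n_models,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print("SLSQP base-model weights:", np.round(res.x, 3))

# Step 2: logistic regression meta-classifier stacked on the base predictions.
meta_features = base_probs.transpose(1, 0, 2).reshape(n_samples, -1)
meta = LogisticRegression(max_iter=1000).fit(meta_features, y)
print("meta-classifier training accuracy:", meta.score(meta_features, y))
```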

22 pages, 9786 KiB  
Article
Determination of Sequential Well Placements Using a Multi-Modal Convolutional Neural Network Integrated with Evolutionary Optimization
by Seoyoon Kwon, Minsoo Ji, Min Kim, Juliana Y. Leung and Baehyun Min
Mathematics 2025, 13(1), 36; https://doi.org/10.3390/math13010036 - 26 Dec 2024
Viewed by 365
Abstract
In geoenergy science and engineering, well placement optimization is the process of determining optimal well locations and configurations to maximize economic value while considering geological, engineering, economic, and environmental constraints. This complex multi-million-dollar problem involves optimizing multiple parameters using computationally intensive reservoir simulations, often employing advanced algorithms such as optimization algorithms and machine/deep learning techniques to find near-optimal solutions efficiently while accounting for uncertainties and risks. This study proposes a hybrid workflow for determining the locations of production wells during primary oil recovery using a multi-modal convolutional neural network (M-CNN) integrated with an evolutionary optimization algorithm. The particle swarm optimization algorithm provides the M-CNN with full-physics reservoir simulation results as learning data correlating an arbitrary well location and its cumulative oil production. The M-CNN learns the correlation between near-wellbore spatial properties (e.g., porosity, permeability, pressure, and saturation) and cumulative oil production as inputs and output, respectively. The learned M-CNN predicts oil productivity at every candidate well location and selects qualified well placement scenarios. The prediction performance of the M-CNN for hydrocarbon-prolific regions is improved by adding qualified scenarios to the learning data and re-training the M-CNN. This iterative learning scheme enhances the suitability of the proxy for solving the problem of maximizing oil productivity. The validity of the proxy is tested with a benchmark model, UNISIM-I-D, in which four oil production wells are sequentially drilled. The M-CNN approach demonstrates remarkable consistency and alignment with full-physics reservoir simulation results. It achieves prediction accuracy within a 3% relative error margin, while significantly reducing computational costs to just 11.18% of those associated with full-physics reservoir simulations. Moreover, the M-CNN-optimized well placement strategy yields a substantial 47.40% improvement in field cumulative oil production compared to the original configuration. These findings underscore the M-CNN’s effectiveness in sequential well placement optimization, striking an optimal balance between predictive accuracy and computational efficiency. The method’s ability to dramatically reduce processing time while maintaining high accuracy makes it a valuable tool for enhancing oil field productivity and streamlining reservoir management decisions. Full article
(This article belongs to the Special Issue Evolutionary Multi-Criteria Optimization: Methods and Applications)
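A bare-bones particle swarm optimization loop over candidate well coordinates is sketched below; the synthetic productivity surface stands in for the M-CNN proxy's predictions, and the coupling with full-physics simulation and the iterative retraining described in the abstract are omitted.

```python
import numpy as np

rng = np.random.default_rng(3)

def predicted_cum_oil(xy):
    # Hypothetical stand-in for the M-CNN proxy: a smooth synthetic
    # "cumulative oil production" surface over grid coordinates in [0, 100]^2.
    x, y = xy[..., 0], xy[..., 1]
    return (np.exp(-((x - 30) ** 2 + (y - 70) ** 2) / 400.0)
            + 0.8 * np.exp(-((x - 75) ** 2 + (y - 20) ** 2) / 300.0))

# Particle swarm optimization over candidate well locations.
n_particles, n_iters = 30, 100
pos = rng.uniform(0, 100, size=(n_particles, 2))
vel = rng.normal(0, 1, size=(n_particles, 2))
pbest = pos.copy()
pbest_val = predicted_cum_oil(pos)
gbest = pbest[np.argmax(pbest_val)].copy()

w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration coefficients
for _ in range(n_iters):
    r1, r2 = rng.uniform(size=(2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 100)
    val = predicted_cum_oil(pos)
    improved = val > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmax(pbest_val)].copy()

print("best well location:", np.round(gbest, 1), "proxy value:", round(float(pbest_val.max()), 3))
```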

17 pages, 2438 KiB  
Article
Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning
by Yana Yang, Meng Xi, Huiao Dai, Jiabao Wen and Jiachen Yang
Sensors 2024, 24(23), 7746; https://doi.org/10.3390/s24237746 - 4 Dec 2024
Viewed by 399
Abstract
Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through the continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning possesses powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences, making it easier to find global optimal solutions. Understanding how to utilize experiences is crucial for improving the efficiency of off-policy reinforcement learning algorithms. To address this problem, this paper proposes Z-Score Prioritized Experience Replay, which enhances the utilization of experiences and improves the performance and convergence speed of the algorithm. A series of ablation experiments demonstrate that the proposed method significantly improves the effectiveness of deep reinforcement learning algorithms. Full article
(This article belongs to the Section Intelligent Sensors)
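One plausible reading of a z-score-based priority rule is sketched below: sampling probabilities derived from the z-scores of absolute TD errors. The class name and the softmax weighting are assumptions made for illustration, not the paper's exact formulation.

```python
import random
from collections import deque

import numpy as np

class ZScoreReplayBuffer:
    """Illustrative replay buffer that samples transitions with probability
    based on the z-score of their |TD error| (one plausible reading of
    z-score prioritized experience replay; not the paper's exact rule)."""

    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)
        self.td = deque(maxlen=capacity)

    def push(self, transition, td_error):
        self.data.append(transition)
        self.td.append(abs(td_error))

    def sample(self, batch_size):
        td = np.asarray(self.td, dtype=float)
        z = (td - td.mean()) / (td.std() + 1e-8)      # z-score of |TD error|
        probs = np.exp(z - z.max())                   # softmax over z-scores
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update(self, idx, new_td_errors):
        # Refresh priorities after the learner recomputes TD errors for a batch.
        for i, e in zip(idx, new_td_errors):
            self.td[i] = abs(e)

# Usage sketch with dummy transitions.
buf = ZScoreReplayBuffer()
for _ in range(1000):
    buf.push(("s", "a", random.random(), "s2"), td_error=random.gauss(0, 1))
batch, idx = buf.sample(32)
print("sampled", len(batch), "transitions")
```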

20 pages, 3221 KiB  
Article
A VIKOR-Based Sequential Three-Way Classification Ranking Method
by Wentao Xu, Jin Qian, Yueyang Wu, Shaowei Yan, Yongting Ni and Guangjin Yang
Algorithms 2024, 17(11), 530; https://doi.org/10.3390/a17110530 - 19 Nov 2024
Cited by 1 | Viewed by 523
Abstract
VIKOR uses the idea of overall utility maximization and individual regret minimization to afford a compromise result for multi-attribute decision-making problems with conflicting attributes. Many researchers have proposed corresponding improvements and expansions to make it more suitable for sorting optimization in their respective research fields. However, these improvements and extensions only rank the alternatives without classifying them. For this purpose, this paper introduces the sequential three-way decisions method and combines it with the VIKOR method to design a three-way VIKOR method that can deal with both ranking and classification. By using the final negative ideal solution (NIS) and the final positive ideal solution (PIS) for all alternatives, the individual regret value and group utility value of each alternative were calculated. Different three-way VIKOR models were obtained from four different combinations of individual regret value and group utility value. In the ranking process, the characteristics of the VIKOR method are incorporated, and the subjective preference of decision makers is considered by using individual regret, group utility, and decision index values. In the classification process, the alternatives are divided into their corresponding decision domains by sequential three-way decisions, and the risk of direct acceptance or rejection is avoided by putting the uncertain alternatives into the boundary region to delay the decision. Alternatives assigned to the same decision domain are sorted according to the collation rules within that domain, and the final ranking is obtained according to the collation rules across different decision domains. Finally, the effectiveness and correctness of the proposed method are verified by a project investment example, and the results are compared and evaluated. The experimental results show that the proposed method correlates significantly with the results of other methods, is effective and feasible, and is simpler and more effective in dealing with some problems. Errors caused by misclassification are reduced by the sequential three-way decisions. Full article
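The classical VIKOR quantities (group utility S, individual regret R, compromise index Q) and a three-way split of the alternatives can be sketched as follows; the decision matrix, weights, and thresholds are illustrative and not taken from the paper.

```python
import numpy as np

# Toy decision matrix: 6 alternatives x 4 benefit criteria, with criterion weights.
X = np.array([[7, 8, 6, 5],
              [9, 6, 7, 8],
              [5, 5, 9, 6],
              [8, 7, 5, 7],
              [6, 9, 8, 4],
              [4, 6, 6, 9]], dtype=float)
w = np.array([0.3, 0.3, 0.2, 0.2])

pis = X.max(axis=0)                      # positive ideal solution per criterion
nis = X.min(axis=0)                      # negative ideal solution per criterion
d = w * (pis - X) / (pis - nis)          # weighted normalized distance to the PIS
S = d.sum(axis=1)                        # group utility
R = d.max(axis=1)                        # individual regret
v = 0.5                                  # decision mechanism index
Q = (v * (S - S.min()) / (S.max() - S.min())
     + (1 - v) * (R - R.min()) / (R.max() - R.min()))

# Three-way split on Q (thresholds are illustrative, not from the paper):
alpha, beta = 0.3, 0.7
regions = np.where(Q <= alpha, "accept",
                   np.where(Q >= beta, "reject", "boundary"))
for i in np.argsort(Q):
    print(f"alternative {i}: Q={Q[i]:.3f} -> {regions[i]}")
```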

15 pages, 874 KiB  
Article
Deep Reinforcement Learning-Driven Jamming-Enhanced Secure Unmanned Aerial Vehicle Communications
by Zhifang Xing, Yunhui Qin, Changhao Du, Wenzhang Wang and Zhongshan Zhang
Sensors 2024, 24(22), 7328; https://doi.org/10.3390/s24227328 - 16 Nov 2024
Cited by 1 | Viewed by 592
Abstract
Despite their flexibility, unmanned aerial vehicle (UAV) communications are susceptible to eavesdropping due to the open nature of wireless channels and the broadcasting nature of wireless signals. This paper studies secure UAV communications and proposes a method that uses interference technology to enhance the minimum secrecy rate of the system. To this end, the system not only deploys multiple UAV base stations (BSs) to provide services to legitimate users but also assigns dedicated UAV jammers to send interference signals to active or potential eavesdroppers to disrupt their eavesdropping effectiveness. Based on this configuration, we formulate the optimization process of parameters such as the user association variables, UAV trajectory, and output power as a sequential decision-making problem and use the single-agent soft actor-critic (SAC) algorithm and twin delayed deep deterministic policy gradient (TD3) algorithm to achieve joint optimization of the core parameters. In addition, for specific scenarios, we also use the multi-agent soft actor-critic (MASAC) algorithm to solve the joint optimization problem mentioned above. The numerical results show that the normalized average secrecy rate of the MASAC algorithm increased by more than 6.6% and 14.2% compared with that of the SAC and TD3 algorithms, respectively. Full article
(This article belongs to the Special Issue Novel Signal Processing Techniques for Wireless Communications)

29 pages, 3537 KiB  
Article
Dynamic Integrated Scheduling of Production Equipment and Automated Guided Vehicles in a Flexible Job Shop Based on Deep Reinforcement Learning
by Jingrui Wang, Yi Li, Zhongwei Zhang, Zhaoyun Wu, Lihui Wu, Shun Jia and Tao Peng
Processes 2024, 12(11), 2423; https://doi.org/10.3390/pr12112423 - 2 Nov 2024
Viewed by 1405
Abstract
The high-quality development of the manufacturing industry necessitates accelerating its transformation towards high-end, intelligent, and green development. Considering logistics resource constraints, the impact of dynamic disturbance events on production, and the need for energy-efficient production, the integrated scheduling of production equipment and automated guided vehicles (AGVs) in a flexible job shop environment is investigated in this study. Firstly, a static model for the integrated scheduling of production equipment and AGVs (ISPEA) is developed based on mixed-integer programming, which aims to optimize the maximum completion time and total production energy consumption (EC). In recent years, reinforcement learning, including deep reinforcement learning (DRL), has demonstrated significant advantages in handling workshop scheduling issues with sequential decision-making characteristics, which can fully utilize the vast quantity of historical data accumulated in the workshop and adjust production plans in a timely manner based on changes in production conditions and demand. Accordingly, a DRL-based approach is introduced to address the common production disturbances in emergency order insertions. Combined with the characteristics of the ISPEA problem and an event-driven strategy for handling dynamic events, four types of agents, namely workpiece selection, machine selection, AGV selection, and target selection agents, are set up, which refine workshop production status characteristics as observation inputs and generate rules for selecting workpieces, machines, AGVs, and targets. These agents are trained offline using the QMIX multi-agent reinforcement learning framework, and the trained agents are utilized to solve the dynamic ISPEA problem. Finally, the effectiveness of the proposed model and method is validated through a comparison of the solution performance with other typical optimization algorithms for various cases. Full article
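For reference, the monotonic mixing network at the heart of QMIX can be sketched as below: a generic PyTorch mixer combining per-agent Q-values under state-conditioned non-negative weights. The paper's four scheduling agents, their observations, and the reward design are not reproduced.

```python
import torch
import torch.nn as nn

class QMixMixer(nn.Module):
    """Monotonic mixing network: combines per-agent Q-values into Q_tot using
    state-conditioned, non-negative weights produced by hypernetworks."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)

# Four agents (e.g., workpiece, machine, AGV, and target selection) feeding one mixer.
mixer = QMixMixer(n_agents=4, state_dim=20)
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 20))
print(q_tot.shape)  # torch.Size([8, 1])
```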

21 pages, 1284 KiB  
Article
Context-Dependent Criteria for Dirichlet Process in Sequential Decision-Making Problems
by Ksenia Kasianova and Mark Kelbert
Mathematics 2024, 12(21), 3321; https://doi.org/10.3390/math12213321 - 23 Oct 2024
Viewed by 570
Abstract
In models with insufficient initial information, parameter estimation can be subject to statistical uncertainty, potentially resulting in suboptimal decision-making; however, delaying implementation to gather more information can also incur costs. This paper examines an extension of information-theoretic approaches designed to address this classical dilemma, focusing on balancing the expected profits against the information to be obtained about all of the possible outcomes. Initially utilized in binary outcome scenarios, these methods leverage information measures to harmonize competing objectives efficiently. Building upon the foundations laid by existing research, this methodology is expanded to encompass experiments with multiple outcome categories using Dirichlet processes. The core of our approach is centered around weighted entropy measures, particularly in scenarios dictated by Dirichlet distributions, which have not been extensively explored previously. We adapt the technique, initially applied to the binary case, to Dirichlet distributions/processes. The primary contribution of our work is the formulation of a sequential minimization strategy for the main term of an asymptotic expansion of differential entropy, which scales with sample size, for non-binary outcomes. This paper provides a theoretical grounding, extended empirical applications, and comprehensive proofs, setting a robust framework for further interdisciplinary applications of information-theoretic paradigms in sequential decision-making. Full article
(This article belongs to the Special Issue Advances in Statistical Methods with Applications)
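The shrinking-uncertainty term that a sequential criterion acts on can be illustrated by tracking the differential entropy of a Dirichlet posterior as multinomial observations accumulate; note that the paper's criterion uses a weighted entropy, whereas the sketch below shows only plain differential entropy on made-up data.

```python
import numpy as np
from scipy.stats import dirichlet

rng = np.random.default_rng(4)

# Three outcome categories with unknown probabilities and a Dirichlet(1,1,1) prior.
true_p = np.array([0.5, 0.3, 0.2])
alpha = np.ones(3)

# Track the differential entropy of the Dirichlet posterior as multinomial
# observations accumulate (plain Shannon differential entropy, not the paper's
# weighted entropy criterion).
for n in [0, 10, 100, 1000]:
    counts = rng.multinomial(n, true_p) if n > 0 else np.zeros(3, dtype=int)
    post = alpha + counts
    h = float(dirichlet(post).entropy())
    print(f"n={n:5d}  posterior alpha={post.astype(int)}  differential entropy={h:.3f}")
```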

20 pages, 5181 KiB  
Article
Resource Allocation Approach of Avionics System in SPO Mode Based on Proximal Policy Optimization
by Lei Dong, Jiachen Liu, Zijing Sun, Xi Chen and Peng Wang
Aerospace 2024, 11(10), 812; https://doi.org/10.3390/aerospace11100812 - 4 Oct 2024
Viewed by 1201
Abstract
Single-Pilot Operations (SPO) mode is set to reshape the decision-making process between human-machine and air-ground operations. However, the limited on-board computing resources impose greater demands on the organization of performance parameters and the optimization of process efficiency in SPO mode. To address this challenge, this paper first investigates the flexible requirements of avionics systems arising from changes in SPO operational scenarios, then analyzes the architecture of Reconfigurable Integrated Modular Avionics (RIMA) and its resource allocation framework in the context of scarcity and configurability. A “mission-function-resource” mapping relationship is established between the reconfiguration service elements of SPO mode and avionics resources. Subsequently, the Proximal Policy Optimization (PPO) algorithm is introduced to simulate the resource allocation process of IMA reconfiguration in SPO mode. The objective optimization process is transformed into a sequential decision-making problem by considering constraints and optimization criteria such as load, latency, and power consumption within the feasible domain of avionics system resources. Finally, the resource allocation scheme for avionics system reconfiguration is determined by controlling the probability of action selection during the interaction between the agent and the environment. The experimental results show that the resource allocation scheme based on the PPO algorithm can effectively reduce power consumption and latency, and that the DRL model exhibits strong interference resistance and generalization ability. This enables avionics resources to respond dynamically to the capabilities required in SPO mode and enhances their ability to support the aircraft mission at all stages. Full article
(This article belongs to the Collection Avionic Systems)
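The core of PPO is the clipped surrogate objective, sketched below with dummy tensors; the avionics-specific state, action, and reward definitions (load, latency, power consumption) are only referenced in the comments and are not modeled here.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized); all tensors share shape (batch,)."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Dummy batch: in the resource-allocation setting the actions would be discrete
# mapping choices and the advantages would come from a critic scoring load,
# latency, and power consumption (these names are illustrative assumptions).
logp_old = torch.log(torch.full((64,), 0.25))
logp_new = logp_old + 0.1 * torch.randn(64)
advantages = torch.randn(64)
print("clipped surrogate loss:", ppo_clip_loss(logp_new, logp_old, advantages).item())
```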

22 pages, 1283 KiB  
Article
Dynamic Approach to Update Utility and Choice by Emerging Technologies to Reduce Risk in Urban Road Transportation Systems
by Francesco Russo, Antonio Comi and Giovanna Chilà
Future Transp. 2024, 4(3), 1078-1099; https://doi.org/10.3390/futuretransp4030052 - 20 Sep 2024
Viewed by 909
Abstract
International research attention on evacuation issues has increased significantly following the human and natural disasters at the turn of the century, such as 9/11, Hurricane Katrina, Cyclones Idai and Kenneth, the Black Saturday forest fires and tsunamis in Japan. When a disaster can occur, the main problem to study is risk reduction. Risk, following the theoretical and experimental studies, is determined by the product of three components: occurrence, vulnerability and exposure. Vulnerability can be improved over time through major infrastructure actions, but absolute security cannot be achieved. When the event is certain to occur, only exposure remains available for reducing the risk to people before the effect reaches them. Exposure can be improved, under fixed conditions of occurrence and vulnerability, by improving evacuation. The main problem in evacuating the population from an area is the available transport system, which must be used to its fullest. So, if the system is well managed, the evacuation improves (shorter times), meaning the exposure is reduced, and therefore, the risk is reduced. A key factor in the analysis of transport systems under emergency conditions is the behavior of the user, and therefore, the study of demand. This work identifies the main research lines that are useful for studying demand under exposure-related risk conditions. The classification of demand models that simulate evacuation conditions in relation to the effect on the transportation system is summarized. The contribution proposes a model for updating choice in relation to emergency conditions and utility. The contribution of emerging ICTs to this updating is formally introduced into the models. Intelligent technologies make it possible to improve user decisions, reducing exposure and therefore risk. The proposed model draws on two approaches from the literature: it is an inter-period dynamic model with the probability expressed within discrete choice theory; furthermore, it is a sequential dynamic model with the probability dependent on the previous choices. The contribution presents an example application of the model, developing a transition matrix for the case of choice updating under two extreme conditions. Full article

23 pages, 1626 KiB  
Article
Is Reinforcement Learning Good at American Option Valuation?
by Peyman Kor, Reidar B. Bratvold and Aojie Hong
Algorithms 2024, 17(9), 400; https://doi.org/10.3390/a17090400 - 7 Sep 2024
Viewed by 1073
Abstract
This paper investigates algorithms for identifying the optimal policy for pricing American Options. American Option pricing is reformulated as a Sequential Decision-Making problem with two actions (Exercise or Continue), transforming it into an optimal stopping time problem. Both the least-squares Monte Carlo simulation method (LSM) and Reinforcement Learning (RL)-based methods were utilized to find the optimal policy and, hence, the fair value of the American Put Option. Both Classical Geometric Brownian Motion (GBM) and calibrated Stochastic Volatility models served as models of the underlying uncertain asset. The novelty of this work lies in two aspects: (1) Applying LSM- and RL-based methods to determine option prices, with a specific focus on analyzing the dynamics of “Decisions” made by each method and comparing the final decisions chosen by the LSM and RL methods. (2) Assessing how the RL method updates “Decisions” at each batch, revealing the evolution of the decisions during the learning process toward the optimal policy. Full article
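For context, a standard least-squares Monte Carlo (Longstaff-Schwartz-style) pricer for an American put under GBM is sketched below; the basis functions, parameters, and the RL counterpart studied in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

# Least-squares Monte Carlo (LSM) price of an American put under GBM.
S0, K, r, sigma, T = 36.0, 40.0, 0.06, 0.2, 1.0
n_paths, n_steps = 100_000, 50
dt = T / n_steps
disc = np.exp(-r * dt)

# Simulate GBM paths at times dt, 2*dt, ..., T.
z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))

payoff = np.maximum(K - S[:, -1], 0.0)          # cash flow if held to maturity
for t in range(n_steps - 2, -1, -1):
    payoff *= disc                              # discount one step back
    itm = K - S[:, t] > 0                       # regress only on in-the-money paths
    if itm.sum() == 0:
        continue
    x = S[itm, t]
    # Continuation value approximated by a quadratic polynomial in the spot price.
    coeffs = np.polyfit(x, payoff[itm], deg=2)
    continuation = np.polyval(coeffs, x)
    exercise = K - x
    ex_now = exercise > continuation            # optimal stopping decision
    payoff[itm] = np.where(ex_now, exercise, payoff[itm])

price = disc * payoff.mean()
print(f"LSM American put price: {price:.3f}")   # roughly 4.4-4.5 for this classic parameter set
```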

27 pages, 11040 KiB  
Article
PolyDexFrame: Deep Reinforcement Learning-Based Pick-and-Place of Objects in Clutter
by Muhammad Babar Imtiaz, Yuansong Qiao and Brian Lee
Viewed by 1212
Abstract
This research study presents a polydexterous deep reinforcement learning-based pick-and-place framework for industrial clutter scenarios. In the proposed framework, the agent learns the pick-and-place of regularly and irregularly shaped objects in clutter by using the sequential combination of prehensile and non-prehensile robotic manipulations involving different robotic grippers in a completely self-supervised manner. The problem was tackled as a reinforcement learning problem; after the Markov decision process (MDP) was designed, the off-policy model-free Q-learning algorithm was deployed using deep Q-networks as a Q-function approximator. Four distinct robotic manipulations, i.e., grasp from the prehensile manipulation category and inward slide, outward slide, and suction grip from the non-prehensile manipulation category, were considered as actions. The Q-function comprised four fully convolutional networks (FCNs), one per action, based on memory-efficient DenseNet-121 variants that output pixel-wise maps of action-values and are jointly trained via the pixel-wise parametrization technique. Rewards were awarded according to the status of the action performed, and backpropagation was conducted accordingly for the FCN generating the maximum Q-value. The results showed that the agent learned the sequential combination of the polydexterous prehensile and non-prehensile manipulations, where the non-prehensile manipulations increased the possibility of prehensile manipulations. We achieved promising results in comparison to the baselines and differently designed variants, as well as on density-based testing clutter. Full article
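The pixel-wise action-value parametrization can be illustrated with a toy stand-in: one small fully convolutional head per manipulation primitive producing a Q-map, with the (primitive, pixel) pair chosen by a joint argmax. The DenseNet-121 backbones, rewards, and training loop of the paper are omitted.

```python
import torch
import torch.nn as nn

class PixelWiseQ(nn.Module):
    """Toy pixel-wise Q-function: one small fully convolutional head per
    manipulation primitive (grasp, inward slide, outward slide, suction)."""
    def __init__(self, in_ch=4, n_actions=4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 1),
            )
            for _ in range(n_actions)
        ])

    def forward(self, obs):                                 # obs: (B, C, H, W) heightmap
        return torch.cat([h(obs) for h in self.heads], dim=1)  # (B, A, H, W) Q-maps

q_net = PixelWiseQ()
obs = torch.randn(1, 4, 64, 64)                             # synthetic scene observation
q_maps = q_net(obs)
flat_idx = q_maps.view(1, -1).argmax(dim=1).item()          # joint argmax over (action, pixel)
action, rest = divmod(flat_idx, 64 * 64)
row, col = divmod(rest, 64)
print(f"best primitive {action} at pixel ({row}, {col})")
```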

18 pages, 2451 KiB  
Article
HRP-OG: Online Learning with Generative Feature Replay for Hypertension Risk Prediction in a Nonstationary Environment
by Shaofu Lin, Haokang Yan, Shiwei Zhou, Ziqian Qiao and Jianhui Chen
Sensors 2024, 24(15), 5033; https://doi.org/10.3390/s24155033 - 3 Aug 2024
Viewed by 1187
Abstract
Hypertension is a major risk factor for many serious diseases. With the aging population and lifestyle changes, the incidence of hypertension continues to rise, imposing a significant medical cost burden on patients and severely affecting their quality of life. Early intervention can greatly reduce the prevalence of hypertension. Early warning models based on electronic health records (EHRs) are an important and effective means of achieving early hypertension warning. However, limited by the scarcity and imbalance of multivisit records, and the nonstationary characteristics of hypertension features, it is difficult to predict the probability of hypertension prevalence in a patient effectively. Therefore, this study proposes an online hypertension monitoring model (HRP-OG) based on reinforcement learning and generative feature replay. It transforms the hypertension prediction problem into a sequential decision problem, achieving risk prediction of hypertension for patients using multivisit records. Sensors embedded in medical devices and wearables continuously capture real-time physiological data such as blood pressure, heart rate, and activity levels, which are integrated into the EHR. The fit between the samples generated by the generator and the real visit data is evaluated using maximum likelihood estimation, which can reduce the adversarial discrepancy between the feature space of hypertension and incoming incremental data, and the model is updated online based on real-time data using generative feature replay. The incorporation of sensor data ensures that the model adapts dynamically to changes in the condition of patients, facilitating timely interventions. In this study, the publicly available MIMIC-III data are used for validation, and the experimental results demonstrate that compared to existing advanced methods, HRP-OG can effectively improve the accuracy of hypertension risk prediction for few-shot multivisit records in nonstationary environments. Full article
(This article belongs to the Special Issue Artificial Intelligence for Medical Sensing)

13 pages, 3654 KiB  
Article
Online Unmanned Aerial Vehicles Search Planning in an Unknown Search Environment
by Haopeng Duan, Kaiming Xiao, Lihua Liu, Haiwen Chen and Hongbin Huang
Viewed by 765
Abstract
Unmanned Aerial Vehicles (UAVs) have been widely used in localized data collection and information search. However, there are still many practical challenges in real-world operations of UAV search, such as unknown search environments. Specifically, the payoff and cost at each search point are unknown to the planner in advance, which poses a great challenge to decision making. That is, UAV search decisions should be made sequentially in an online manner, thereby adapting to the unknown search environment. To this end, this paper introduces the problem of online decision making in UAV search planning, where the drone has a limited energy supply as a constraint and must make an irrevocable decision to search the current area or route to the next one in an online manner. To overcome the challenge of the unknown search environment, a joint-planning approach is proposed, where both the route selection and the search decision are made in an integrated online manner. The integrated online decision is made through online linear programming, which is proved to be near-optimal, resulting in high information search revenue. Furthermore, this joint-planning approach can be favorably applied to multi-round online UAV search planning scenarios, showing a clear first-mover advantage in gathering information. The effectiveness of the proposed approach is validated on a widely used dataset, and experimental results show the superior performance of online search decision making. Full article
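A generic online-LP-style dual-price heuristic for accept/skip decisions under an energy budget is sketched below; it is only meant to convey the flavor of online, irrevocable decision making and is not the near-optimal algorithm analyzed in the paper (payoffs, costs, and the budget are synthetic).

```python
import numpy as np

rng = np.random.default_rng(6)

# Online search decisions under an energy budget: at each visited point the payoff
# and cost are only revealed on arrival, and the accept/skip decision is irrevocable.
T, budget = 200, 50.0
payoffs = rng.uniform(0, 1, size=T)      # information revenue at each search point
costs = rng.uniform(0.1, 1.0, size=T)    # energy needed to search that point

remaining, collected = budget, 0.0
hist_p, hist_c = [], []
for t in range(T):
    hist_p.append(payoffs[t])
    hist_c.append(costs[t])
    # Fraction of the remaining points we can still afford at the average observed cost.
    afford_frac = min(1.0, (remaining / max(T - t, 1)) / np.mean(hist_c))
    # Dual "price" of energy: accept only points whose payoff/cost ratio is in the
    # top afford_frac of what has been seen so far.
    price = np.quantile(np.array(hist_p) / np.array(hist_c), 1.0 - afford_frac)
    if costs[t] <= remaining and payoffs[t] >= price * costs[t]:
        remaining -= costs[t]
        collected += payoffs[t]

print(f"collected payoff {collected:.1f}, energy left {remaining:.1f}")
```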
