1. Introduction
In recent years, with the rapid growth of China's maritime activities and facilities, including ships, offshore platforms, and buoys, the demand for low-latency, high-reliability communications at sea has become increasingly strong [1,2,3]. Satellite networks, as an important means of communication, can provide wide-area network connectivity for marine users, which to a certain extent meets the needs of maritime communication. However, satellite signals are affected by weather, ocean reflections, and multipath interference, which can easily lead to unstable signal quality and thus degrade the communication experience. Meanwhile, because of the long signal transmission distance of satellite communication and the limited satellite on-board payload (e.g., antenna, power), satellite communication in the marine environment faces significant challenges in transmission rate, latency, and reliability [4,5]. On the other hand, although terrestrial base stations can provide high-speed networks for some sea areas, their coverage is extremely limited. In open sea areas and areas far from land in particular, the coastline is long and variable, and terrestrial base stations cannot provide continuous coverage. Moreover, for fast-moving targets such as ships and offshore platforms, fixed base-station deployment cannot meet their constantly changing communication needs [6]. In this case, relying only on terrestrial and satellite networks to provide maritime services has clear limitations in coverage, communication capacity, and computational resources, making it difficult to meet large-scale, highly reliable, low-latency maritime communication needs.
In this context, UAV-assisted communication has gradually become an effective solution. With their flexibility and high mobility, UAVs can dynamically adjust their flight altitude and position according to real-time demand in the marine environment, thereby covering blind spots that traditional communication networks struggle to reach. Compared with satellites and terrestrial base stations, UAVs can be rapidly deployed to specific areas to provide high-quality localized communication coverage, especially in hot spots where communication needs are concentrated, such as sea areas far from land and near offshore platforms [7,8]. UAVs are also able to adjust their flight routes and altitudes according to real-time communication needs, reducing signal attenuation and ensuring communication quality. In addition, UAVs can extend the line-of-sight (LOS) transmission range of the airborne base station so that it can communicate more effectively with ships, buoys, and other equipment on the sea surface, further improving communication quality and network reliability [9,10]. On this basis, there is an urgent need to develop more efficient network architectures and deployment schemes for better performance in massive maritime data transmission and task processing.
In the maritime network environment, due to the mobility of ships and the limited communication coverage of UAVs, UAVs need to adjust their flight position and altitude in real time to maintain a stable communication connection with ships. In this case, accurately predicting a ship's route is particularly important. By predicting the ship's trajectory, UAVs can plan their flight paths and adjust their positions in advance, thus optimizing UAV deployment and scheduling. This not only helps to improve the response speed of UAVs in complex marine environments but also ensures the continuity and stability of communication coverage [11]. The Automatic Identification System (AIS) monitors ship dynamics and provides real-time position reports from hundreds of thousands of ships around the world [12]. These data can be used to effectively predict a ship's trajectory and provide UAVs with its position over some future horizon. As a result, the UAV can make real-time adjustments more flexibly, significantly improving the response speed and stability of maritime communication links and ensuring the UAV's communication coverage capability under complex sea conditions.
This paper explores the potential benefits of UAVs in enhancing maritime coverage, with a focus on coordinating UAVs with existing satellites and ground base stations through route prediction to enable dynamic deployment and improve the overall performance of maritime communication networks. The main contributions of this paper are as follows:
We propose a deep learning algorithm-based voyage trajectory-prediction model for predicting the voyage trajectory of the user’s vessels, which enables communication nodes to sense the user position more accurately, thereby optimizing the deployment scheme and ultimately achieving dynamic and precise coverage.
We design a RUDD algorithm based on route prediction. The algorithm takes into account the computational resources, communication coverage, and latency of the communication nodes, the limitations of the UAVs' battery capacity, and the mobility of the user, aiming to maximize network coverage, reduce the total latency of processing communication tasks, and reduce UAV energy consumption to enable dynamic deployment.
We design simulation experiments to demonstrate that the RUDD algorithm significantly outperforms other basic algorithms in reducing total system cost, improving communication coverage, and reducing system delay. We also test the algorithm under different system model parameters to evaluate its performance. Simulation results show that the proposed algorithm has better stability and confirms its applicability in marine IoT scenarios.
The rest of the paper is structured as follows. Section 2 discusses related work. Section 3 describes the system model and problem formulation. Section 4 presents the design of the RUDD algorithm. Section 5 evaluates the experimental performance. Section 6 concludes the paper.
2. Related Work
Low-Earth-orbit (LEO) satellites play an important role in maritime communications, and many studies have addressed the satellite wide-area coverage problem. For example, ref. [13] analyzes the average backhaul capacity of terrestrial satellite terminals using stochastic geometry and queuing theory and proposes a multi-layer LEO satellite constellation deployment scheme that takes satellite mobility into account and supports seamless global coverage. To address the saturation of available LEO orbital space and meet the ultra-low latency requirements of future 6G, ref. [14] adopts a decomposition-aggregation approach combined with an elite-strategy genetic algorithm to minimize constellation size and maximize average coverage, ensuring high robustness and reliability. To address issues such as the limitations of single-satellite deployment models, insufficient channel-model accuracy, and limited communication coverage, ref. [15] proposes a multi-layer satellite deployment strategy. By deploying satellites on multi-layer concentric spheres, this strategy significantly enhances both the model's generality and the coverage capability of the satellite communication system. Meanwhile, the Shadowed Rician (SR) fading model is introduced to characterize the channel between satellites and terrestrial gateway stations, which improves the accuracy of the channel model. Although some progress has been made with existing satellite-deployment strategies, these methods usually do not consider deploying UAVs to supplement coverage. Deploying satellites on a large scale to extend coverage is not only costly but also incurs higher signal-transmission losses and lower transmission rates, making it difficult to meet the demand for low latency and high reliability in maritime communications.
The mobility and flexibility of UAVs enable them to provide essential communication services to users at cell edges as well as to users with higher communication-quality requirements. Several studies have focused on coverage provided by UAVs alone. For example, ref. [16] investigates 3D path planning for UAVs with cellular network connectivity and proposes a multi-step dueling-DDQN-based algorithm for coverage maximization. Ref. [17] explores the system performance of UAV-assisted networks in urban environments, focusing on the radiation gain of directional antennas; by deriving the network coverage probability based on stochastic geometry, the signal transmission efficiency and coverage range were significantly improved. Ref. [18] proposed a distributed algorithm based on virtual Coulomb forces and Voronoi diagrams, with two mobility schemes and a redundant-UAV dormancy strategy, to minimize the number of UAVs, improve communication coverage, and save energy. Ref. [19] investigates the multi-objective optimization of coverage utility and energy in multi-UAV communication scenarios and proposes an improved multi-objective Grey Wolf optimization algorithm, which optimizes the number, positions, and speeds of UAVs to maximize coverage utility and minimize energy consumption through role determination and hybrid solution-initialization strategies. Ref. [20] investigates the multi-UAV base-station deployment problem under constraints such as movement speed, energy consumption, and communication coverage radius and proposes a dense multi-agent reinforcement learning algorithm that aims to maximize the communication coverage of the vehicular network. For UAV swarms performing full coverage of an area, ref. [21] proposes a path-planning algorithm realized through information exchange between swarm members and adopts a parallel-line full-coverage path, providing an effective solution for full-coverage path planning in simple areas. To solve the problem of wireless emergency communication in maritime emergency networks, ref. [22] proposed a wireless emergency communication relay system based on a tethered UAV platform, which offers significant advantages in rapid, flexible deployment and long-duration spatial coverage. Ref. [23] investigated UAV deployment to optimize coverage quality after a disaster or during episodic events, proposing a decentralized deployment algorithm based on weighted Voronoi cells that minimizes the average distance between UAVs and users while maintaining connectivity between UAVs and fixed base stations to improve coverage performance.
Although UAVs play an increasingly important role in communication coverage, a UAV-only coverage solution may be unable to provide stable, long-lasting service in some scenarios due to energy constraints, coverage limitations, and other issues. For this reason, combining UAVs with satellite and traditional base-station communications to build a synergistic, complementary multi-level communication network can address the limitations of existing research. Much of the literature examines how UAVs can be integrated into existing satellite–ground networks to improve maritime coverage. For example, considering the impact of the distance between oceanic surface stations and the coastline on coverage performance, ref. [24] investigates a wide-area maritime communication architecture based on SAGSIN, which analyzes the random distribution of oceanic surface stations on the ocean surface and improves their coverage probability. Ref. [25] proposed a hybrid satellite–UAV–terrestrial network based on NOMA technology with a joint power-allocation scheme to maximize the rate and coverage of the offshore network. Ref. [26] combines the wide coverage of satellites with the high capacity of shore-based systems and deploys UAVs to enhance the coverage of a hybrid satellite–terrestrial offshore communication network, while jointly optimizing the UAV trajectory and transmit power. Ref. [27] describes a hybrid satellite–UAV–terrestrial network for maritime communications that achieves extensive coverage and energy efficiency at sea by coordinating different communication links. While these studies consider factors such as energy consumption, communication latency, and coverage, they mostly ignore the computational resource constraints of the UAVs themselves and the dynamic mobility of maritime users.
Considering the impact of maritime user mobility on the deployment of communication nodes, prediction algorithms have been introduced to forecast ship trajectories, enabling more effective dynamic deployment. Many studies have focused on ship trajectory-prediction algorithms. For example, ref. [28] investigated how to predict ship trajectories in the inner harbor of Busan port using AIS data and deep learning techniques, solving the problem of irregular AIS data intervals through linear interpolation and improving the accuracy of route prediction in a complex port environment. Ref. [29] proposed a bi-directional data-driven trajectory-prediction method based on AIS spatio-temporal data, constructing an encoder–decoder network driven by forward and reverse integrated historical trajectories and predicting ship trajectories by fusing the characteristics of the sub-networks. Ref. [30] proposed a multi-gated attention encoder–decoder network that significantly improves the accuracy of ship trajectory prediction; the scheme combines an LSTM network with Gated Recurrent Units and an attention mechanism and enhances the generalization ability and robustness of the model by introducing a soft-threshold residual structure to handle sparse features. Ref. [31] proposes a deep learning-based framework for ship trajectory prediction consisting of two models, Differential Long Short-Term Memory (DLSTM) and Enhanced DLSTM with Reference Trajectory Correction (Ref-DLSTM), used for cases without and with reference trajectories, respectively, effectively reducing prediction errors. Ref. [32] proposes a ship trajectory-prediction algorithm called Deep Bidirectional Information Empowerment, which utilizes an integrated network and an attention mechanism; it combines the strengths of bidirectional long short-term memory and bidirectional gated recurrent unit networks, optimizing the weights of both network units through the attention mechanism to enhance prediction accuracy and efficiency. Considering the multi-density distribution characteristics of trajectory data, a multi-density adaptive trajectory-clustering algorithm is proposed in ref. [33], which determines the input parameters adaptively and introduces a trajectory-direction identification mechanism, making it perform better on complex trajectory-clustering problems. To address the heterogeneity of vessel motion patterns, ref. [34] proposes a generalized clustering-based vessel trajectory-prediction method that uses historical AIS data to cluster route patterns for each vessel type, considering spatial and heading attributes as well as environmental factors, thereby improving the accuracy and computational efficiency of trajectory prediction. To overcome the reliance on historical position data alone, which ignores key factors such as speed and heading, ref. [35] proposes a novel ship trajectory-prediction model based on a sequence-to-sequence structure; this model integrally considers multifaceted ship information and improves prediction accuracy through a multi-semantic encoder and a type-oriented decoder.
User ships are mobile when operating at sea, and route prediction for mobile users can effectively anticipate their future locations. This aids the efficient deployment of communication nodes and enables dynamic coverage for less computationally demanding or non-urgent tasks. The lack of existing research in this area limits the effectiveness of dynamic coverage to a certain extent. Therefore, this paper focuses on a model that predicts ship trajectories while improving communication coverage, optimizing the UAV flight trajectory, reducing communication delay, and improving the quality of service for users.
3. System Modeling
3.1. Network Model
We consider a network architecture consisting of mobile users, UAVs, terrestrial base stations (TBSs), and satellites. TBSs are deployed in coastal areas to provide communication services to users in coastal waters. The broadband coverage of TBSs is usually limited due to high non-line-of-sight path loss. Outside the TBS coverage area, satellites deployed in space orbit provide communication links. Ships equipped with expensive high-gain antennas can be guaranteed broadband service. However, for low-end ships without high-gain antennas, it is still difficult to enjoy broadband service even within the satellite coverage area. To fill this gap, we use UAVs to provide broadband services. More specifically, if a mobile user requires a high-rate communication service (e.g., videoconferencing) from time $t_1$ to time $t_2$, the communication request is sent from the mobile user to the nearest TBS and then forwarded to the central processor. The central processor selects an idle UAV and prepares it to serve the mobile user. The UAV flies along the optimized trajectory to serve the user from $t_1$ to $t_2$. After completing the high-speed communication service, the mobile user re-associates with the nearest TBS at moment $t_2$, and the UAV returns to the coast. Consider a target coverage area served by an LEO satellite, a terrestrial base station, and $m$ UAVs, with $n$ user ships in the area, as shown in Figure 1. The satellite serves as the main base station, which mainly handles computing and communication tasks for the more concentrated users at sea. Each UAV is equipped with a computational processor that can handle simple computational tasks. Transmission is time-slotted, and it is assumed that one information-transmission cycle is divided into $T$ time slots, each denoted by $t$, where $t \in \{1, 2, \ldots, T\}$. The user moves at a speed of $v_n(t)$, which satisfies $v_n(t) \le v_{\max}$, where $v_{\max}$ denotes the maximum speed of vessel movement.
3.2. Network Communication Model
At the $t$-th time slot, the position of the $m$-th UAV can be expressed as $\mathbf{q}_m(t) = \left( x_m(t), y_m(t), h \right)$, where $x_m(t)$ and $y_m(t)$ denote the horizontal and vertical coordinates of the $m$-th UAV at time slot $t$, and $h$ denotes the height of the $m$-th UAV above the sea surface. In this paper, the flight altitude varies between $h_{\min}$ and $h_{\max}$. The position at the next time slot is given by
$$x_m(t+1) = x_m(t) + d_m(t) \cos \theta_m(t), \qquad y_m(t+1) = y_m(t) + d_m(t) \sin \theta_m(t),$$
where $x_m(t+1)$ and $y_m(t+1)$ denote the horizontal and vertical coordinates of the $m$-th UAV at time slot $t+1$, $d_m(t)$ denotes the distance flown by the UAV in one time slot, subject to $0 \le d_m(t) \le d_{\max}$, where $d_{\max}$ denotes the maximum flight distance of the UAV in one time slot, and $\theta_m(t)$ denotes the horizontal flight direction.
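As an illustration, the per-slot position update and the maximum-distance constraint described above can be sketched as follows (the function name, constant name, and the 500 m limit are illustrative assumptions, not values from this paper):

```python
import math

D_MAX = 500.0  # assumed maximum flight distance per time slot, in metres

def update_uav_position(x, y, distance, theta):
    """Advance a UAV one time slot: fly `distance` metres in the horizontal
    direction `theta` (radians), clamped to the per-slot limit D_MAX."""
    d = min(max(distance, 0.0), D_MAX)
    return x + d * math.cos(theta), y + d * math.sin(theta)
```

A requested displacement beyond the per-slot limit is simply truncated to $d_{\max}$, which mirrors the constraint $0 \le d_m(t) \le d_{\max}$.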
The straight-line distance between UAV $m$ and user $n$ is
$$d_{m,n}(t) = \sqrt{ \left( x_m(t) - x_n(t) \right)^2 + \left( y_m(t) - y_n(t) \right)^2 + h^2 },$$
where $x_n(t)$ and $y_n(t)$ denote the horizontal and vertical coordinates of user $n$ at time slot $t$. In our model, it is assumed that the satellite and the terrestrial base station (TBS) are relatively stationary within the short time slot $\Delta t$. At moment $t$, the position of the satellite is denoted as $\left( x_s, y_s, h_s \right)$, and the position of the ground base station is denoted as $\left( x_b, y_b, h_b \right)$. The straight-line distances between the satellite, the ground base station, and user $n$ are, respectively,
$$d_{s,n}(t) = \sqrt{ \left( x_s - x_n(t) \right)^2 + \left( y_s - y_n(t) \right)^2 + h_s^2 }, \qquad d_{b,n}(t) = \sqrt{ \left( x_b - x_n(t) \right)^2 + \left( y_b - y_n(t) \right)^2 + h_b^2 }.$$
Assuming that there is no task migration between UAVs, satellites, and base stations, and that the minimum channel capacity $C$ and bandwidth $B$ of each device are known, we can find the minimum signal-to-interference-plus-noise ratio (SINR) according to Shannon's formula:
$$\gamma_{\min} = 2^{C/B} - 1.$$
Assume that all UAVs share the same frequency band, so the UAVs interfere with each other in downlink transmissions [36]. Meanwhile, it is assumed that the transmit power $P$ from all UAVs to the user and the channel-gain model are identical across UAVs, so the SINR $\gamma_{m,n}(t)$ can also be expressed as
$$\gamma_{m,n}(t) = \frac{ P\, g_{m,n}(t) }{ \sum_{m' \neq m} P\, g_{m',n}(t) + \sigma^2 },$$
where $g_{m,n}(t)$ is the channel gain between UAV $m$ and user $n$ and $\sigma^2$ denotes the noise power at the receiving end. Therefore, the maximum coverage radius $R_u$ of the UAV can be derived as the largest distance $d_{m,n}(t)$ at which $\gamma_{m,n}(t) \ge \gamma_{\min}$ still holds. Since this system assumes that there is only one satellite and one base station and that they operate in separate frequency bands, the interference between the UAVs, the satellite, and the base station is negligible when calculating the SINR. Therefore, we can find the SINRs $\gamma_{s,n}(t)$ and $\gamma_{b,n}(t)$, and the maximum coverage radii of the satellite and the base station, $R_s$ and $R_b$, respectively.
In each time slot, if the distance between user n and UAV m is less than or equal to the communication coverage radius of UAV, a communication link can be established. Similarly, a communication link can be established if the distance between user n and the satellite is less than or equal to the communication coverage radius of the satellite. A communication link can be established if the distance between user n and the TBS is less than or equal to the communication coverage radius of the terrestrial base station.
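The link-feasibility check above can be sketched in a few lines; the minimum SINR follows directly from inverting Shannon's formula $C = B \log_2(1 + \mathrm{SINR})$ (the function names are illustrative):

```python
def min_sinr(capacity_bps, bandwidth_hz):
    """Minimum SINR required to sustain `capacity_bps` over `bandwidth_hz`,
    obtained by inverting Shannon's formula C = B * log2(1 + SINR)."""
    return 2.0 ** (capacity_bps / bandwidth_hz) - 1.0

def link_possible(distance_m, coverage_radius_m):
    """A communication link can be established when the user lies within
    the node's maximum coverage radius."""
    return distance_m <= coverage_radius_m
```

For example, sustaining 2 Mbit/s over 1 MHz of bandwidth requires an SINR of at least 3 (about 4.8 dB).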
3.3. Time-Delay Model
Due to the limited computational resources of each device, when the computational resources of satellites and base stations are insufficient to meet the computational needs of users, UAVs are required to perform auxiliary computations. However, a non-negligible delay is generated in the process of providing communication and computation services to users, so this section focuses on the delay model in the maritime communication scenario.
Assume that at the $t$-th time slot, maritime user $n$ is within the communication coverage of both the satellite and the base station. If both have sufficient computational resources, the user equipment prioritizes establishing a communication connection with the satellite; if satellite resources are insufficient, it chooses the base station; if the base station also fails to satisfy the demand, the equipment tries to communicate with the nearest UAV. When the task ends, the serving device recovers the amount of computation reserved for that user's task. In marine environments, the flight delay and transmission delay are affected by environmental factors such as wind speed, humidity, and airflow, which are assumed to be captured by a discount factor $\varepsilon$ ($0 < \varepsilon \le 1$). Additionally, the complexity of the computational task is represented by a factor $k$ ($k \ge 1$).
When the amount of computation $D_n$ required for a user task is less than the computational power $F_s$ available to the satellite, the user device establishes a communication connection with the satellite. Assume that the channel transmission rate between the satellite and user $n$ is $r_{s,n}$, the amount of data to be transmitted by user device $n$ is $S_n$, and the amount of task data that can be processed by the satellite per second is $f_s$. Then the total delay of the satellite can be expressed as Equation (14):
$$T_{s,n} = T_{s,n}^{\mathrm{tr}} + T_{s,n}^{\mathrm{comp}}, \tag{14}$$
where the task transmission time $T_{s,n}^{\mathrm{tr}}$ and computational delay $T_{s,n}^{\mathrm{comp}}$ can be expressed as Equations (15) and (16), respectively:
$$T_{s,n}^{\mathrm{tr}} = \frac{S_n}{\varepsilon\, r_{s,n}}, \tag{15}$$
$$T_{s,n}^{\mathrm{comp}} = \frac{k\, D_n}{f_s}. \tag{16}$$
Similarly, when the amount of computation $D_n$ required for a user task is less than the computational power $F_b$ available at the base station, the user device establishes a communication connection with the base station. The total delay between the base station and the associated user $n$ can be expressed as Equation (17), where the task transmission time $T_{b,n}^{\mathrm{tr}}$ and computational delay $T_{b,n}^{\mathrm{comp}}$ are given by Equations (18) and (19), respectively:
$$T_{b,n} = T_{b,n}^{\mathrm{tr}} + T_{b,n}^{\mathrm{comp}}, \tag{17}$$
$$T_{b,n}^{\mathrm{tr}} = \frac{S_n}{\varepsilon\, r_{b,n}}, \tag{18}$$
$$T_{b,n}^{\mathrm{comp}} = \frac{k\, D_n}{f_b}, \tag{19}$$
where $r_{b,n}$ is the channel transmission rate between the base station and user $n$, and $f_b$ is the amount of task data that can be processed by the base station per second.
When the amount of computation $D_n$ required for a user task is less than the computational power $F_m$ available to UAV $m$, the user device establishes a communication connection with UAV $m$. The flight delay $T_{m,n}^{\mathrm{fly}}$ between UAV $m$ and user device $n$, which is affected by the maritime environment, can be expressed as Equation (20):
$$T_{m,n}^{\mathrm{fly}} = \frac{d_m(t)}{\varepsilon\, v_m}, \tag{20}$$
where $v_m$ is the flight speed of UAV $m$. Therefore, the total delay between UAV $m$ and the associated user $n$ can be expressed as Equation (21):
$$T_{m,n} = T_{m,n}^{\mathrm{fly}} + T_{m,n}^{\mathrm{tr}} + T_{m,n}^{\mathrm{comp}}, \tag{21}$$
where the task transmission time $T_{m,n}^{\mathrm{tr}}$ and computational delay $T_{m,n}^{\mathrm{comp}}$ can be expressed as Equations (22) and (23), respectively:
$$T_{m,n}^{\mathrm{tr}} = \frac{S_n}{\varepsilon\, r_{m,n}}, \tag{22}$$
$$T_{m,n}^{\mathrm{comp}} = \frac{k\, D_n}{f_m}, \tag{23}$$
where $r_{m,n}$ is the channel transmission rate between UAV $m$ and user $n$, and $f_m$ is the amount of task data that can be processed by UAV $m$ per second.
Finally, we define the total system delay $T_{\mathrm{total}}$ as Equation (24):
$$T_{\mathrm{total}} = \sum_{n=1}^{N} T_n, \tag{24}$$
where $T_n$ is the delay experienced by user $n$ on its serving device. Assuming that the quality-of-service (QoS) delay metric for user $n$ is $T_n$, it needs to satisfy the threshold $T_n \le T_{\max}$ to meet the demand of user $n$; otherwise, the current device cannot satisfy the demand of user $n$.
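Putting the delay components together, a minimal sketch of the association priority (satellite, then base station, then UAV) and the resulting per-task delay might look as follows (the function names and parameter names are illustrative assumptions of this sketch):

```python
def task_delay(data_bits, rate_bps, demand, proc_rate, k=1.0,
               env_factor=1.0, flight_delay=0.0):
    """Total service delay = optional UAV flight delay + transmission time
    (discounted by the environmental factor) + computation time
    (scaled by the task-complexity factor k)."""
    return flight_delay + data_bits / (env_factor * rate_bps) + k * demand / proc_rate

def choose_server(sat_ok, tbs_ok, uav_ok):
    """Association priority from the model: satellite first, then the
    terrestrial base station, then the nearest UAV. Each flag means the
    user is inside that node's coverage and the node has spare compute."""
    if sat_ok:
        return "satellite"
    if tbs_ok:
        return "base station"
    if uav_ok:
        return "uav"
    return None  # no node can serve the user in this slot
```

Note that only the UAV branch would pass a non-zero `flight_delay`, since the satellite and base station do not move to meet the user.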
3.4. Energy-Consumption Model
Since UAVs are limited by battery capacity, in the case of high energy consumption, UAVs may find it difficult to complete their tasks. Therefore, this subsection describes the energy-consumption model of the UAVs. The total energy consumption $E_m$ of UAV $m$ providing services to user $n$ can be expressed as the sum of the flight energy consumption and the hover energy consumption, as shown in Equation (25):
$$E_m = E_m^{\mathrm{fly}} + E_m^{\mathrm{hov}}. \tag{25}$$
Considering that the UAV is in a hovering state when providing computational resources for user requests, the hovering energy consumption $E_m^{\mathrm{hov}}$ of the UAV can be calculated as follows, where $P^{\mathrm{hov}}$ is the power consumption of the UAV when hovering, $M$ is the mass of the UAV, $g$ is the gravitational acceleration, $\rho$ is the air density in the marine environment, $A$ is the total swept area of the rotor blades of the UAV, and $t^{\mathrm{hov}}$ is the hovering duration:
$$P^{\mathrm{hov}} = \sqrt{\frac{(Mg)^3}{2 \rho A}}, \qquad E_m^{\mathrm{hov}} = P^{\mathrm{hov}}\, t^{\mathrm{hov}}.$$
According to the user position obtained from the user trajectory-prediction model in Section 4.1, after solving for the optimal position of the UAV, the flight energy consumption of the UAV moving to that position can be calculated by the following equation, where $P^{\mathrm{fly}}$ is the UAV flight power consumption, $F_d$ is the air resistance experienced by the UAV in the marine environment, and $t^{\mathrm{fly}}$ is the flight duration:
$$E_m^{\mathrm{fly}} = \left( P^{\mathrm{fly}} + F_d\, v_m \right) t^{\mathrm{fly}}.$$
3.5. Formulation of the Problem
A user can establish a connection with a device when the user is located within the communication range of the respective device and the required computation does not exceed the available computational capacity of the device, while satisfying the delay and energy-consumption constraints (i.e., the maximum delay limit $T_{\max}$ and the UAV battery capacity limit $E_{\max}$). The establishment of a connection between a device and a user is represented by a binary indicator $a_n(t) \in \{0, 1\}$, which equals 1 if user $n$ is connected to some device at time slot $t$ and 0 otherwise. To evaluate the coverage of the overall network deployment, a parameter $Q$ is defined to quantify the coverage performance of the entire system as follows:
$$Q = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{N} \sum_{n=1}^{N} a_n(t).$$
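The coverage metric amounts to the fraction of served users averaged over the time slots; a minimal sketch (with illustrative names):

```python
def coverage_q(connected, num_users):
    """Coverage metric Q: `connected` lists, per time slot, how many users
    have an active link; Q averages the covered fraction over all slots."""
    return sum(c / num_users for c in connected) / len(connected)
```

For instance, with 10 users, covering 8 users in one slot and all 10 in the next yields Q = 0.9.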
In order to optimize mobile communication network coverage and ensure efficient collaboration between satellites and terrestrial base stations for comprehensive communication support for the maritime user community, this study designs a dynamic deployment strategy for UAVs. This strategy aims to provide reliable and immediate communication support for users in emergency situations or for those who cannot be covered by conventional networks. Based on a comprehensive consideration of the computational resources of each communication node, the coverage connectivity and communication delay for mobile users, and the battery capacity of the UAV, the objective is to minimize the total system cost and maximize user communication coverage. The optimization problem is modeled as follows:
4. Proposed Algorithm
In this section, we formulate the dynamic UAV-deployment problem based on route prediction. The goal is to predict the sailing trajectory of a user ship with a deep learning algorithm and to dynamically deploy UAVs to maximize user coverage along the predicted trajectory under energy-consumption, latency, and computation constraints. We propose a dynamic UAV-deployment scheme for changing user dynamics that combines a deep learning algorithm for route prediction with a deep reinforcement learning algorithm for UAV deployment.
4.1. Ship Trajectory-Prediction Algorithm Based on Improved LSTM
To improve the accuracy of ship route prediction, this section proposes an LSTM model optimized with the Sparrow Search Algorithm (SSA) for prediction on AIS datasets. The LSTM performs well on time-series prediction tasks (e.g., ship route prediction), while the SSA, by mimicking the foraging behavior of sparrows, can efficiently optimize the hyperparameters of the LSTM in a complex, high-dimensional search space. Therefore, combining the SSA with the LSTM automatically optimizes the hyperparameters and improves the prediction performance of the model.
The AIS dataset contains the latitude, longitude, heading, and speed of differently numbered vessels at each moment over a past period, so we preprocess the AIS dataset. First, we perform feature selection and extract four features, latitude, longitude, speed over ground (SOG), and course over ground (COG), from the ship trajectory data as the input feature matrix $x$; the target value $y$ is set to the latitude and longitude of the next time step. Then, the input and output data are arranged into time series, and the data of the past $N$ time steps are used to predict the target value of the next time step, so as to capture the temporal features and dynamics. To avoid the impact of magnitude differences between feature values on model training, we use min-max normalization to scale the input features and target values into the interval [0, 1].
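The preprocessing steps above (min-max scaling and sliding-window construction) can be sketched as follows (the function names and window shape are illustrative):

```python
def minmax_scale(values):
    """Scale a list of values into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def make_windows(features, targets, n_steps):
    """Slide a window of n_steps over the series: each sample holds the
    past n_steps feature rows; its label is the target at the next step."""
    X, y = [], []
    for i in range(len(features) - n_steps):
        X.append(features[i:i + n_steps])
        y.append(targets[i + n_steps])
    return X, y
```

Applied to the AIS features, each sample in `X` would be an $N \times 4$ block (latitude, longitude, SOG, COG over $N$ steps) and each label in `y` the next-step latitude and longitude.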
When feeding the input features into the input layer of the LSTM, we convert them into a 3D tensor of shape (number of samples, $N$ time steps, 4 features), where the number of samples is the number of time-series windows that can be created from the entire dataset, and the four features are latitude, longitude, speed, and heading at each time point. In the hidden layer, the LSTM unit processes the input data through a memory and forgetting mechanism to capture the long-term dependencies in the time series. Considering model performance, training efficiency, and the risk of overfitting, two LSTM hidden layers are used in the model, and the numbers of neurons in the first and second layers are set to $u_1$ and $u_2$, respectively, which will be optimized by the SSA. The LSTM updates the hidden state through its gating structure, and the hidden-layer output at the last time step, $h_N$, is given by Equation (33):
$$h_N = f\left( W_h \left[ h_{N-1}; x_N \right] + b_h \right), \tag{33}$$
where $W_h$ is the weight matrix of the hidden layer, $b_h$ is the bias term, and $f$ is the nonlinear activation function of the LSTM. In the output layer, the model maps $h_N$ to the predicted longitude and latitude, which is calculated as in Equation (34):
$$\hat{y} = W_o h_N + b_o, \tag{34}$$
where $W_o$ is the weight matrix of the output layer and $b_o$ is the bias term.
Finally, we use the mean square error (MSE) to measure the difference between the predicted and true values of the model, continuously minimize the loss function, and update the weights and biases through the Adam optimizer to improve the prediction accuracy. To prevent overfitting, we introduce L2 regularization into the training of the LSTM network. The complexity of the model is limited by adding to the loss function a penalty term proportional to the sum of squares of the weights, which improves the generalization performance of the model and reduces the risk of overfitting on the training data. The loss function is specified as follows:
$$L = \frac{1}{K} \sum_{i=1}^{K} \left( \hat{y}_i - y_i \right)^2 + \lambda \sum_{j} w_j^2, \tag{35}$$
where $K$ is the number of training samples, $\lambda$ is the regularization coefficient, and $w_j$ are the network weights.
Here, we optimize the hyperparameters of the LSTM, i.e., the learning rate, the number of epochs, and the number of hidden-layer neurons. First, the SSA requires the initialization of a population in which each individual represents a combination of hyperparameters to be optimized; the positions of these individuals are randomly generated within the upper and lower bounds $[lb, ub]$. Each individual's performance is evaluated by a fitness function (i.e., the mean square error), where a smaller fitness value indicates a more effective hyperparameter combination. We divide the position-update strategies into two categories. In each iteration, the top 50% of individuals update their positions using strategy 1, i.e., they move toward the current optimal individual, following update Equation (36), where $X_{best}$ is the position of the current optimal individual and $r$ is a uniformly distributed random number in the interval $[-1, 1]$, used to increase search diversity and avoid falling into local optima. The remaining 50% of individuals use update strategy 2, generating new positions through a global random search with update Equation (37), where $ub_j$ and $lb_j$ are the upper and lower limits of the $j$-th dimension and $r$ is a random number in the interval $[0, 1]$. After each position update, the algorithm performs boundary processing to ensure that the position in every dimension stays within the limits $[lb_j, ub_j]$, as handled by Equation (38).
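The two update strategies and the boundary handling can be sketched as one population update step. The exact algebraic forms of Equations (36) and (37) are not recoverable from the extracted text, so the expressions below are assumptions that preserve the stated behavior: the best half moves toward the current optimum, the rest resample globally, and all positions are clamped to the bounds.

```python
import numpy as np

def ssa_update(pop, fitness, lb, ub, rng):
    # pop: (N, d) array of candidate hyperparameter vectors;
    # lower fitness (MSE) is better.
    order = np.argsort(fitness)
    best = pop[order[0]]
    new_pop = pop.copy()
    half = len(pop) // 2
    # Strategy 1 (Eq. 36, assumed form): top 50% move toward the best.
    for i in order[:half]:
        r = rng.uniform(-1.0, 1.0, size=pop.shape[1])
        new_pop[i] = best + r * np.abs(pop[i] - best)
    # Strategy 2 (Eq. 37, assumed form): bottom 50% global random search.
    for i in order[half:]:
        r = rng.uniform(0.0, 1.0, size=pop.shape[1])
        new_pop[i] = lb + r * (ub - lb)
    # Boundary processing (Eq. 38): clamp every dimension into [lb, ub].
    return np.clip(new_pop, lb, ub)
```

Iterating this step and re-evaluating fitness after each update yields the outer loop of the SSA search over learning rate, epochs, and hidden-layer size.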
After each position update, the algorithm recalculates the fitness value of each individual and records the fitness of the current optimal individual. After a number of iterations, the algorithm outputs the individual with the smallest fitness value, i.e., the optimal hyperparameter combination. The pseudo-code is given in Algorithm 1:
Algorithm 1 Optimization of the LSTM model for ship route prediction using the SSA algorithm
1: Input ship data (latitude, longitude, SOG, COG);
2: Initialize the LSTM model hyperparameter search space and its bounds;
3: Initialize the population size, number of iterations, and individual positions;
4: Calculate the fitness (MSE) of each individual in the population;
5: for each episode do
6:   Find the current optimal individual;
7:   For the top 50% of individuals, adjust positions using the update strategy of Equation (36);
8:   For the latter 50% of individuals, update positions via global random search using Equation (37);
9:   Perform boundary processing on the updated positions by Equation (38);
10:  Reassess the fitness of each individual in the population;
11: end for
12: Output the optimal individual, i.e., the optimal LSTM hyperparameter combination;
13: Construct the model: define an LSTM with 2 hidden layers and compile it using the Adam optimizer and the MSE loss function;
14: Train the LSTM model using the optimal hyperparameter combination;
15: Predict the next trajectory of a vessel using the trained LSTM model;
16: Evaluate the prediction accuracy and calculate the mean square error.
4.2. UAV Dynamic Deployment Algorithm
Traditional UAV-deployment methods consider only the UAV's energy consumption and latency constraints, without taking into account the mobility of user nodes or the UAV's computational resources. Therefore, in this section, a deep reinforcement learning algorithm is used to deploy UAVs dynamically. Trajectory prediction provides the user distribution in advance, so the decision-maker can maximize coverage with less delay and energy consumption. At each time slot, the agent collects ship-prediction data, dynamically assigns UAVs, and develops deployment strategies based on predicted device locations as well as task volumes.
We convert the UAV-deployment problem into a Markov Decision Process (MDP), defined by the tuple $(S, A, P, R)$, where the state space $S$ and action space $A$ are continuous. Specifically, the Markov chain is denoted by $(s_t, a_t, s_{t+1}, r_t)$, consisting of the state $s_t$ at moment $t$, an action $a_t$, the state $s_{t+1}$ at moment $t+1$, and a reward $r_t$. The state $s_t$ is passed from the environment to the agent at moment $t$. The agent uses an internal policy function to compute the probability of each action and selects an action $a_t$ based on these probabilities. Applying action $a_t$ to the environment yields state $s_{t+1}$. Finally, the environment combines the action $a_t$, the state transition, and the reward function to compute the reward value $r_t$ under action $a_t$ and returns it to the agent, so that the agent can optimize its future action strategy based on the rewards received. The complete trajectory of the interaction with the environment can be represented by Equation (39):
The state space $S$ consists of $6 + 3M$ states, where $M$ represents the number of available UAVs. Based on the trajectory-prediction algorithm described in Section 4.1, we can predict the movement trajectories of uncovered users over a period of time. At moment $t$, by performing K-means clustering on the user coordinates, we determine the location of the users' aggregation center point at that moment, whose 3D coordinates correspond to three states in the state space. Three further states describe the decision effects at moment $t$: the total number of covered users, the accumulated delay, and the overall energy consumption. The remaining $3M$ states represent, for each of the $M$ UAVs, the remaining battery capacity, the available computing resources, and the distance of the UAV from the center point of the user distribution at moment $t$. The action space $A$ consists of $3M$ continuous action variables, each representing the adjustment of the coordinates of the $M$ UAVs in 3D space at moment $t$. Together, these action variables determine the change in each UAV's position at the next moment, thus enabling efficient optimization of user coverage.
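The clustering step and the state-vector layout described above can be sketched as follows. The K-means routine is a plain numpy implementation, and `build_state` assumes the ordering (center, decision effects, per-UAV features); both are illustrative, not the paper's exact code.

```python
import numpy as np

def kmeans_centers(points, k, iters=20, seed=0):
    # Plain K-means on user coordinates to find aggregation centers.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute means.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def build_state(center3d, covered, delay, energy, uav_feats):
    # State vector: cluster center (3) + decision effects (3) +
    # 3 features per UAV (battery, compute, distance), i.e., 6 + 3M entries.
    return np.concatenate([center3d, [covered, delay, energy], np.ravel(uav_feats)])
```

For $M = 3$ UAVs this yields a 15-dimensional state, matching the $6 + 3M$ count above.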
The reward function is designed as in Equation (40):
$$R_t = \omega_1 N_{cov} + \omega_2 \frac{1}{\bar{E}} + \omega_3 \frac{1}{\bar{T}}, \tag{40}$$
where $\omega_1$, $\omega_2$, and $\omega_3$ are moderating factors used to regulate the reward weight of each component, $N_{cov}$ is the number of covered users, and $\bar{E}$ and $\bar{T}$ are the average energy consumption and average delay. The inverses of the average energy consumption and average delay reward reductions in those quantities, so that delay and energy consumption are reduced while the number of covered users is maximized; the larger this reward function, the better the performance of the algorithm.
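A direct transcription of this weighted reward is given below; the default weights are placeholder values, since the paper leaves $\omega_1$, $\omega_2$, and $\omega_3$ as tunable moderating factors.

```python
def reward(n_covered, avg_energy, avg_delay, w1=1.0, w2=1.0, w3=1.0):
    # Weighted sum: more covered users and lower average energy/delay
    # (entering via their inverses) produce a larger reward.
    # w1, w2, w3 are the moderating factors; defaults are placeholders.
    return w1 * n_covered + w2 / avg_energy + w3 / avg_delay
```

In practice the weights are tuned so that no single term dominates; e.g., scaling $\omega_1$ down when the user count is large keeps the energy and delay terms influential.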
The traditional policy-gradient approach leads to unstable training because policy updates can be excessively large, while Trust Region Policy Optimization (TRPO) introduces constraints but incurs high computational complexity. PPO simplifies policy updates by introducing a clip mechanism that effectively restricts the magnitude of each update, avoiding policy collapse during training. PPO optimizes the policy by maximizing the expectation of the advantage function, guiding the agent to take actions that improve coverage and reduce delay. Specifically, it calculates the ratio of the probabilities of the new and old policies and applies the CLIP function to ensure that the new policy improves without deviating too far from the old one, while limiting the update magnitude. The policy objective function is computed as in Equation (41), and the ratio of new to old policy probabilities $r_t(\theta)$ is given by Equation (42), where $\pi_\theta(a_t|s_t)$ is the output probability of the policy under parameter $\theta$ and $\pi_{\theta_{old}}(a_t|s_t)$ is the output probability of the old policy.
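The clipped surrogate of Equations (41) and (42) can be sketched as below, working in log-probabilities for numerical stability; the clip range `eps = 0.2` is the common default, assumed here rather than taken from the paper.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    # Probability ratio of new to old policy (Eq. 42).
    ratio = np.exp(logp_new - logp_old)
    # Clipped surrogate objective (Eq. 41): elementwise minimum of the
    # unclipped and clipped terms, averaged over the batch.
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))
```

When the ratio drifts outside $[1-\epsilon, 1+\epsilon]$, the clipped term caps the objective, so gradient ascent gains nothing from pushing the policy further away from the old one.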
The input of the value network is the current state $s_t$ and the output is the value of that state. The value loss function is used to train the value network to minimize the difference between the value function and the actual return: the closer its predicted state value is to the actual return, via the advantage estimate $\hat{A}_t$, the more accurate it is. The value loss function is given in Equation (43), and the advantage function $\hat{A}_t$ in Equation (44), where $V(s_t)$ is the output of the value network, $V^{target}$ is the target value, and $r_t$ denotes the instantaneous reward at moment $t$. To encourage the policy to explore different actions and prevent it from converging prematurely to a deterministic policy, thereby losing the ability to explore the environment, we employ entropy regularization, as shown in Equation (45).
The total loss function of the whole algorithm is a weighted sum of three parts: the policy objective function, the value loss, and the entropy regularization term. By appropriately adjusting the weights of these three parts, the total loss function guides the algorithm toward the optimal balance between convergence, accuracy, and exploration; its formula is shown in Equation (46).
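The three components of Equations (43), (45), and (46) can be sketched as follows; the weights `c1` and `c2` and the sign convention (minimize the negated policy objective) are standard PPO choices assumed for illustration.

```python
import numpy as np

def value_loss(v_pred, v_target):
    # Eq. (43): squared error between predicted value and target return.
    return np.mean((np.asarray(v_pred) - np.asarray(v_target)) ** 2)

def entropy(probs):
    # Eq. (45): mean policy entropy, rewarding exploratory action choices.
    p = np.clip(probs, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=-1)))

def total_loss(policy_obj, v_loss, ent, c1=0.5, c2=0.01):
    # Eq. (46): weighted sum; the policy objective and entropy are
    # maximized (negated here) while the value loss is minimized.
    return -policy_obj + c1 * v_loss - c2 * ent
```

Minimizing `total_loss` by gradient descent therefore ascends the clipped surrogate and the entropy while descending the value error, which is the balance described above.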
The pseudo-code of the algorithm is given in Algorithm 2:
Algorithm 2 Reinforcement learning model for maritime communication service coverage using the RUDD algorithm
1: Initialize the policy network $\pi_\theta$ and the value network $V$;
2: Initialize the experience cache pool $D$;
3: Initialize the optimized LSTM model to predict vessel trajectories;
4: Initialize the communication-service-coverage optimization algorithm PPO;
5: for each episode do
6:   Initialize the environment state $s$, containing information about the satellite, the UAVs, and the ground base station, as well as the relative positions of the UAVs and the target vessels;
7:   for each step do
8:     Using the current policy $\pi_\theta$, select action $a$ according to state $s$;
9:     Perform action $a$; the environment returns a new state $s'$ and reward $r$;
10:    Store the experience ($s$, $a$, $r$, $s'$, relative positions of ships and UAVs, number of covered ships, time delay) in $D$;
11:    Update the state $s \leftarrow s'$;
12:  end for
13:  Update the policy network and the value network $V$ using the data in the cache pool $D$;
14:  Sample a batch of data from $D$;
15:  Calculate the advantage function for each action;
16:  Update the policy network to maximize the expected reward and entropy and reduce action bias;
17:  Update the value network to minimize the value-function error;
18:  Empty the cache pool $D$;
19:  Predict the next position of the target vessel using the LSTM model;
20:  Feed the predicted locations into the PPO algorithm to optimize communication-service coverage;
21:  Update the policy network to incorporate the optimization results of the PPO algorithm;
22: end for
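The inner rollout loop of Algorithm 2 (steps 7 to 12, followed by the LSTM prediction of step 19) can be sketched as a single function. `env_step`, `policy`, and `predict_next` are hypothetical callables standing in for the environment, the policy network, and the trained LSTM predictor, respectively.

```python
import numpy as np

def run_episode(env_step, policy, predict_next, s0, steps=8):
    # Collect one episode of transitions into the experience buffer D,
    # then propagate the LSTM-predicted vessel position forward.
    buffer, s = [], s0
    for _ in range(steps):
        a = policy(s)                      # select action from current policy
        s_next, r = env_step(s, a)         # environment returns s' and r
        buffer.append((s, a, r, s_next))   # store experience in D
        s = s_next                         # update state: s <- s'
    s = predict_next(s)                    # step 19: LSTM position prediction
    return buffer, s
```

The returned buffer is what the PPO update (steps 13 to 18) would consume: advantages are computed per transition, the policy network ascends the clipped objective, and the buffer is then emptied before the next episode.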
5. Simulation Results and Discussion
In this section, we show the results of evaluating the performance of the proposed algorithm through simulation experiments. We provide a detailed evaluation and analysis of the convergence of the algorithm under different hyperparameter settings. We also compare the performance of the proposed algorithm with other Deep Reinforcement Learning (DRL) algorithms in terms of system cost, communication coverage, and delay performance. The experimental results show that our algorithm enables more accurate route prediction and can achieve about 95% user coverage in marine environments while ensuring that the average latency of user response is around 1 s. The algorithm is implemented in the Visual Studio Code (VSCode) environment, using the Python programming language and integrating TensorFlow and PyTorch, two popular deep learning frameworks, to improve the execution efficiency and performance of the algorithm.
5.1. Assessment of Route-Prediction Performance
Figure 2 shows the performance of different algorithms in route prediction. From the figure, we can see the difference between the actual trajectory and the predicted trajectory of each model, as well as the prediction accuracy of each algorithm.
Figure 2a compares the performance of the Multilayer Perceptron (MLP), LSTM, and SSA-LSTM in actual trajectory prediction. The MLP model has the largest prediction error, LSTM is closer, and SSA-LSTM produces predictions closest to the actual trajectory, demonstrating significant accuracy and stability. Since MLP cannot process time-series data, it is unable to capture the time dependence and dynamic changes in the trajectories; as a result, the deviation between its predicted and actual trajectories is large, as shown in Figure 2b. As the trajectory changes, the error of MLP gradually increases, showing its limitations in complex trajectory prediction. Compared with MLP, LSTM can capture the long- and short-term dependencies in the time series through its memory cells and gating mechanism, adapting better to the trajectory data, but it still deviates in rapidly changing regions. As shown in Figure 2c, LSTM without hyperparameter optimization improves on MLP but is not stable enough. SSA optimizes the hyperparameters of the LSTM so that it operates in its best configuration, significantly improving prediction accuracy. As shown in Figure 2d, the predictions of SSA-LSTM almost completely overlap with the actual trajectories, with small errors, showing excellent performance. Overall, the SSA-optimized LSTM outperforms both the MLP and the unoptimized LSTM in prediction accuracy and stability, and can better adapt to complex trajectory dynamics.
As shown in Figure 3, the mean square error (MSE) of the different algorithms is plotted against the number of training epochs. In the initial phase, the MSE of all algorithms decreases rapidly. LSTM has a higher initial error but converges quickly after a few epochs. SSA-LSTM shows the fastest MSE decrease in the starting phase and quickly stabilizes at a low level, indicating that it finds better model configurations early and converges quickly. In the intermediate stage, the MSE of MLP changes relatively smoothly, but with small fluctuations and slower accuracy improvement. LSTM is affected by noise in the training data when capturing nonlinear variations, which results in larger fluctuations in its MSE. In contrast, SSA-LSTM adapts better to the data after hyperparameter optimization, significantly reduces the error, and remains stable; thus it has the lowest and smoothest MSE at this stage. After 100 epochs, all the algorithms stabilize. SSA-LSTM has the smallest MSE, almost reaching zero, and the highest prediction accuracy after training. The MSE of MLP and LSTM remains higher than that of SSA-LSTM in the later stage, with LSTM slightly higher than MLP. Overall, the SSA-LSTM algorithm has the lowest error in all stages and is therefore suitable for the subsequent prediction of ship routes.
5.2. Performance Evaluation of UAV Dynamic Deployment Algorithms
In this section, we evaluate the performance of the proposed RUDD algorithm through simulation tests. In the simulations, we compare the proposed RUDD algorithm, PPO alone, the Deep Deterministic Policy Gradient (DDPG) algorithm, and a random policy (Random) in terms of network coverage, the latency with which users receive service, and the average energy consumption of the UAVs in completing the service. The main simulation parameters of the RUDD algorithm are shown in Table 1.
In this experiment, the chosen parameters include the number of mobile devices, the number of UAVs, the discount factor $\gamma$, the decay factor $\lambda$, the number of training rounds, the Critic learning rate, and the Actor learning rate. The number of mobile devices determines their initial distribution and location, and hence the complexity of the task and the system load; varying it from 10 to 30 tests the adaptability and robustness of the algorithm under different user densities. The number of UAVs is fixed at 3, which determines the service capacity and coverage and thus constrains the computational capacity, energy consumption, and coverage of the whole algorithm. The discount factor determines how much importance the algorithm attaches to future rewards, while the decay factor controls the balance between the variance and bias of the advantage estimate; setting them to 0.95 and 0.9, respectively, makes the algorithm pay more attention to long-term rewards. The algorithm tends to converge at around 200 training rounds, so this value is used. The Critic learning rate allows the value function to be updated quickly, while the Actor learning rate avoids the model instability caused by overly fast policy updates; setting them to 0.02 and 0.01, respectively, helps smooth policy optimization and improves the robustness of the algorithm.
As shown by the comparison of the total reward curves of the four algorithms in Figure 4, the PPO algorithm performs more stably and achieves relatively higher rewards than the Random and DDPG algorithms. PPO uses a restrained update mechanism that ensures smoothness during training by limiting the magnitude of each policy update, thus avoiding excessive policy fluctuations. The Random strategy, lacking a learning mechanism, does not improve significantly as training progresses and thus performs relatively poorly. In this experimental setting, DDPG's reliance on experience replay and unconstrained policy updates makes it vulnerable to noise and outdated data during training; as a result, its reward values fluctuate and struggle to converge as training progresses. Compared with these three, RUDD adds a prediction algorithm on top of PPO, enabling the UAVs to predict the distribution of users in advance, thereby optimizing policy updates and making appropriate action choices. The UAVs can thus formulate the best deployment before task execution, which not only ensures coverage but also effectively reduces flight delay and energy consumption. This gives RUDD strong adaptability and stability during training, and thus the largest reward value after final convergence. The RUDD algorithm can anticipate environmental changes more accurately in complex and dynamic marine environments, realizing dynamic UAV deployment and improving the stability and effectiveness of mission execution under uncertain, dynamic conditions.
As shown in
Figure 5, in the same sea area, as user density increases, although the UAVs can satisfy the coverage condition more easily, the user coverage rate nevertheless shows a decreasing trend, because relatively more users are distributed at the edge. The PPO algorithm shows relatively strong stability in coverage across different numbers of users: its coverage decreases from 95% with 10 users to about 80% with 30 users. Despite the decrease, its performance is better than that of the Random and DDPG algorithms; PPO maintains high coverage more stably as the number of users increases and has relatively good adaptive ability. In contrast, the Random algorithm, lacking policy optimization, cannot respond effectively to more users, so its performance drops significantly: its coverage gradually decreases to about 72% as users increase. Owing to the interference of noise, DDPG's policy updates are more unstable, and its coverage deteriorates as the number of users increases, dropping from 90% to less than 70%. RUDD adds a prediction algorithm to PPO, allowing it to optimize the policy in advance, so it maintains higher coverage in the dynamically changing marine environment, showing great stability and adaptability. As a result, the RUDD algorithm is highly robust: its coverage drops only from about 98% to 85% even as the number of users increases.
As shown in
Figure 6, within the same sea area, as the user density increases, the UAVs can meet the coverage requirements with relative ease, which in turn reduces the total flight delay; this results in a decreasing trend in the average user delay. At different user scales, both the DDPG and Random algorithms show a significant decreasing trend as the number of users increases. However, under the constraints of UAV computing power, battery capacity, and coverage area, although the latency of these two algorithms decreases rapidly, their own limitations keep their latency higher than that of the RUDD and PPO algorithms when the number of users reaches 30. Meanwhile, the PPO algorithm achieves its best effect more stably through its effective control of the magnitude of policy updates. Building on PPO, we introduce a prediction algorithm, which allows the UAVs to quickly form deployment strategies based on the distribution of users. As a result, the RUDD algorithm performs best among all the algorithms, with an average response delay of about 1 s and the smallest curve variation, significantly improving overall execution efficiency and stability. The low-latency performance of the RUDD algorithm ensures that the UAVs achieve fast response and stable coverage in maritime communication, greatly improving the safety and efficiency of maritime operations.
As shown in Figure 7, as the number of users increases, the flight energy consumption and hover energy consumption decrease along with the average user delay; thus, the average energy consumption per user also shows a decreasing trend. Comparing the average energy consumption under different numbers of users, the RUDD algorithm can adjust its strategy in advance and optimize energy allocation thanks to its prediction mechanism, effectively reducing energy consumption while ensuring coverage. It therefore performs best among the four algorithms, with the average user energy consumption remaining stable at around 710 mAh. In contrast, the PPO algorithm is relatively stable in energy consumption, but because it cannot adjust its strategy in advance, its energy consumption is slightly higher than RUDD's, averaging about 800 mAh with about 30 users. The DDPG and Random algorithms find it difficult to control energy consumption effectively in dynamic environments, since their policy updates are not reasonably constrained. Both show a clearly decreasing trend, but the uncertainty of their strategy selection keeps their energy consumption higher: with 30 users, the average user energy consumption of DDPG and Random falls to 850 mAh and 825 mAh, respectively. Therefore, when applied to offshore scenarios, the RUDD algorithm can efficiently optimize energy utilization and significantly improve the economy and endurance of UAV operations.