1. Introduction
The use of unmanned aerial vehicles (UAVs), commonly known as drones, has attracted considerable attention in recent years from both academia and industry [1, 2]. UAVs are heavily used in military and commercial applications, and leveraging them for future applications looks promising. This is due to their unique properties, such as flying ability, usability, survivability, functionality, and maneuverability [1, 2]. These applications include data collection, delivery services, environmental monitoring, rescue operations, disaster management, aerial photography, traffic and control monitoring, and wireless communications [1, 2]. In this paper, we focus on the wireless communication applications of UAVs, particularly a post-disaster area coverage scenario. Mainly, UAVs can be cost-effective flying aerial base stations (BSs) that provide coverage to users in remote and post-disaster areas [3]. They can also be used as on-demand airborne relays connecting a remote user and a cellular BS separated by significant obstacles [4]. For wireless sensor networks (WSNs), UAVs can be utilized to disseminate/collect control and data information to/from ground-deployed wireless sensors [5, 6]. For mobile ad-hoc networks (MANETs), such as vehicular ad-hoc networks (VANETs), UAVs can assist the management and control of VANETs and extend their scalability and coverage [7]. Cache-enabled UAVs can significantly enhance network caching functionality, empowered by the ability of UAVs to track users' mobility and predict their content requests [8]. Wireless backhaul can be backed up using cost-effective flying UAVs when the wired backhaul link is damaged or needs maintenance. For future fifth-generation (5G) and beyond 5G (B5G) wireless networks, UAVs can play a significant role in enabling and boosting performance [9]. UAVs can be part of 5G/B5G heterogeneous networks by distributing on-demand UAV BSs to cover hotspot areas or highly populated events [9]. Moreover, UAVs can densify 5G/B5G networks through the deployment of multiple UAV base stations.
On the other hand, millimeter wave (mmWave) communications, i.e., 30–300 GHz, have gained a lot of attention due to their large swath of available spectrum, enabling multi-gigabit per second (Gbps) connectivity [10, 11, 12]. However, mmWave is susceptible to harsh propagation losses due to its high operating frequency, in addition to the influence of path blockage [10]. This can be overcome by using directional communication through antenna beamforming, thanks to the high number of packed antenna elements [11]. Even so, mmWave coverage is limited to within a few meters around the mmWave transmitter, which mandates the use of relaying to extend its coverage range [13]. Integrating mmWave communications with UAV BSs can sustain 5G/B5G requirements due to the large available bandwidth [14, 15]. Moreover, UAVs can address many mmWave challenges, for example by serving as autonomous mmWave relays [16].
In this paper, we consider a post-disaster area in which the terrestrial network has completely malfunctioned or been destroyed, with the aim of helping the surviving people inside it. In this catastrophic situation, several UAVs are distributed to adequately cover the post-disaster area for rescue services. MmWave is employed for the communication links among the UAVs to provide ultra-high-speed Gbps backhaul connections that support critical rescue services, such as taking high-resolution videos/photos of the catastrophic area. These help in conducting the principal analysis and in precisely locating the victims. Frequency bands with low data rates may not support these crucial functionalities. Due to the short transmission range of the mmWave signal, some of the UAVs operate as access UAVs, providing data connections to the victims/rescue workers, collecting essential information about the post-disaster area such as photographs, and disseminating critical instructions to the victims/rescue workers. The other UAVs act as gateways, relaying information between the access UAVs and the nearest surviving cellular networks.
In this paper, access UAVs select and then fly towards the gateway UAVs that maximize their achievable data rates while considering the battery cost of their flights. Although the access UAVs could fly directly towards the surviving cellular networks, the use of gateway UAVs reduces the flight budget of the access UAVs, particularly because mmWave is characterized by small coverage. This contributes substantially to saving the access UAVs' energy for more rescue operations. The challenge of this gateway UAV selection problem, which to the best of our knowledge is introduced for the first time in this paper, comes from its adversarial setting. This is because an access UAV has no prior experience of the data rate gained from connecting with a specific gateway UAV unless it flies to and connects with it.
Additionally, this available data rate is influenced by the selections of the other access UAVs due to mutual interference and the time-sharing schedule. In these fully decentralized settings, no prior information is either available or exchanged among UAVs. Despite its realism, this problem is unique and utterly different from the existing UAV gateway/relay selection problems [17, 18, 19, 20, 21, 22], where UAVs can easily exchange information among themselves through the fully connected UAV network. Data rates among UAVs can also be anticipated beforehand via prior channel measurements and estimations. Moreover, the considered UAV gateway selection problem aims not only to maximize the achievable data rates of the access UAVs, but also to minimize the battery cost of their flights towards the selected gateways.
In this paper, a machine learning (ML) tool, specifically online learning, is used to address this optimization problem efficiently [23, 24, 25]. The motivation behind using online learning comes from its ability to deal effectively with both complex and dynamic environments [26] without any prior information, where an agent learns to enhance its future actions based only on its past actions/observations. Towards this end, the gateway UAV selection problem is formulated as a budget-constrained multi-player multi-armed bandit (MAB) problem [27, 28, 29]. MAB is a particular type of online learning, where an agent wants to maximize its long-term rewards (minimize regrets) by either utilizing its previous best arm selection or investigating new choices, known as the exploitation–exploration tradeoff [27, 28, 29]. Since MAB techniques work online without any prior knowledge about the environment other than the player's observations while playing, they are considered the most appropriate solutions for the considered problem. From the MAB perspective, an access UAV acts as the player aiming to maximize its long-term average data rate, i.e., reward, constrained by its limited budget of battery capacity. The gateway UAVs, in turn, act as the arms of the bandit. Due to the fully decentralized setting, access UAVs interact selfishly and concurrently with the environment, select their appropriate gateway UAVs, and then fly towards them to establish the mmWave communication links. Based only on their previously observed rewards, access UAVs try to balance the exploitation–exploration tradeoff, i.e., either exploiting their best-selected gateway UAVs so far or exploring new ones. In this paper, three MAB algorithms, namely upper confidence bound (UCB) [29], Thompson sampling (TS) [30], and the exponential weight algorithm for exploration and exploitation (EXP3) [31], are modified to address this gateway UAV selection problem. Despite the adversarial setting of the problem and the selfish behavior of the access UAVs, the modified MAB algorithms learn to play actions that enhance the overall system performance, as demonstrated in [24, 30] and further discussed in this paper. To the best of our knowledge, this is the first time that gateway UAV selection in a fully decentralized mmWave UAV network is formulated as a budget-constrained multi-player MAB problem and efficiently addressed using modified BA-MAB algorithms. The main contributions of this paper can be summarized as follows:
The problem of gateway UAV selection in post-disaster area coverage is formulated as an optimization problem aiming to maximize the achievable data rates of the access-gateway-cellular relays subject to the limited remaining battery capacity of the access UAVs. This is done in a fully decentralized setting, where no information is either pre-available or exchanged among UAVs;
A budget-constrained multi-player MAB model is formulated and introduced. In this model, the access UAVs act as the agents, the gateway UAVs act as the arms of the bandit, and the rewards are the long-term achievable data rates constrained by the limited budget of the battery capacity of the access UAVs;
Three BA-MAB algorithms, i.e., BA-UCB, BA-TS, and BA-EXP3, are proposed to be exploited by each access UAV to selfishly interact with the environment and select the proper gateway UAVs in this adversarial setting. All access UAVs will select their associated gateway UAVs concurrently, and the MAB algorithms implemented in the access UAVs will learn from their previous observations to proactively enhance the overall performance;
Extensive numerical analysis is conducted to measure the performance of the proposed MAB-based algorithms under different scenarios and compare their performances with two benchmark approaches based on near and random gateway UAV selections.
The rest of this paper is organized as follows. Section 2 summarizes the related work. Section 3 discusses the UAV system model, including the mmWave link model, and presents the gateway UAV optimization problem. In Section 4, the proposed BA-MAB algorithms are explained, followed by numerical analysis in Section 5. Finally, Section 6 delivers the concluding remarks.
2. Literature Review
An efficient gateway-selection algorithm and management technique is required for flying multi-UAV systems to connect with the global network. In [17], the authors surveyed the structure and protocol architecture of multi-UAV-based heterogeneous flying ad-hoc networks (FANETs). Then, a mixture of distributed gateway-selection algorithms and cloud-based stability-control mechanisms was discussed, supplemented by a range of open challenges. The authors in [18] defined the stability of UAV networks, constructed a network partition model, and designed a distributed gateway selection algorithm with dynamic network partition while considering the practical features of UAV networks. Moreover, the number of gateways is managed according to the system requirements. In [19], an energy-efficient method for gateway selection of UAVs involved in relaying information to the heterogeneous cloud was proposed. The authors also made use of queuing theory and Lyapunov optimization to solve the power-delay tradeoff. In [20], UAV-enabled two-way relaying communication between two robot swarms, in the absence of communication infrastructure in remote areas or post-disaster rescues, was handled. A UAV is employed as the relay to expand the communication range between two disconnected ground robot swarms due to its several advantages. In addition, the UAV's trajectory and power allocation were jointly optimized to maximize the sum-rate of the uplink and downlink, where the joint optimization problem is decoupled into two sub-problems to address the non-convexity. In [21], a new UAV node placement technique for multi-UAV relay communication was developed based on a non-linear constrained optimization problem. The authors in [22] introduced downlink non-orthogonal multiple access (NOMA) to a UAV-enabled mobile relaying system.
All of the above works on UAV gateway/relay selection assume that UAVs have full knowledge at the time of selection and that the network is fully connected, which is not the case in this paper. Here, no prior information is available to the access UAVs at the time of gateway selection, and the network is fully disconnected. Moreover, the existing works did not consider the cost of access UAV flights towards their selected gateway UAVs, which is addressed throughout this paper.
ML is a promising technology for efficiently solving the severe problems that UAVs face when using wireless communication. A full survey of related research in which ML methods have been applied to UAV-based communications to improve practical aspects like channel modeling, resource management, security, and positioning is provided in [32]. Moreover, a review of deep reinforcement learning (DRL) algorithms that address emergency applications in wireless communications, such as mmWave, intelligent caching, and UAV scenarios, is given in [25]. In [33], a distributed sense-and-send protocol was proposed to manage the UAVs for sensing and transmission. Moreover, the authors applied RL to solve main problems like trajectory control and resource management. A DRL-based channel and power allocation framework was suggested in [34] for the UAV-enabled IoT system. In this scheme, the UAV-BS can intelligently allocate uplink channels and the transmit power of IoT nodes to maximize the energy performance of all IoT nodes. Another DRL-based UAV control policy, built on the deep deterministic policy gradient and called UC-DDPG, was proposed in [35]. UC-DDPG addressed the combined problem of the 3D mobility of multiple UAVs and energy recharging arrangements to ensure energy-efficient and fair coverage of each user over a wide region while maintaining the service. The authors in [36] proposed two efficient path planning algorithms based on extended MAB to make a rotary-wing UAV act as a wireless BS in a post-disaster area with unknown user distribution. Their proposed algorithms outperform the helical path, which scans the whole post-disaster area in circles of increasing radius.
Despite the existing applications of ML in UAV wireless networks, none of the related works considered the problem of gateway UAV selection in a fully decentralized mmWave UAV network using budget-constrained multi-player MAB techniques.
3. System Model
In this section, we will discuss the network architecture of the mmWave UAV wireless networks in addition to the utilized mmWave link model.
3.1. UAV Network Architecture
Figure 1 shows the considered mmWave UAV network architecture. In this model, there is a post-disaster area, e.g., a flood or earthquake area, in which the cellular macro-BS system has malfunctioned or been wholly destroyed. For rescue services, this area is covered using a group of UAVs. Some of these UAVs provide the access functionalities inside the catastrophic area, and others work as gateways for relaying the collected information to the nearest functional cellular macro-BS. To avoid frequent network reconfiguration, the gateway UAVs should be those with the highest energy among all UAVs, taking into account their flights to the points closest to the surviving cellular macro-BS. Moreover, the network should have alternative gateway UAVs to keep the network available. The efficient design of the UAV network topology, i.e., deciding which UAVs should act as access UAVs and which as gateways under UAV energy and mobility constraints, is beyond the scope of this paper.
The access UAVs provide data connectivity to the victims for essential messaging and collect valuable information about the post-disaster area using photography. They also collect crucial details about the victims, such as names, ages, genders, photos, and locations. Moreover, they disseminate essential instructions to the victims as well as the rescue workers inside the area. High-speed mmWave links are used to connect the access UAVs with the gateway UAVs, and the gateway UAVs with the cellular macro-BSs. The gateway UAVs are directly connected with their associated surviving cellular macro-BS without relaying. In this paper, we focus on the backhaul relay links between access UAVs, gateway UAVs, and cellular macro-BSs. After collecting/disseminating the essential information, each access UAV should select and then fly towards one of the gateway UAVs to relay its data to/from the cellular macro-BS through it. It is assumed that the UAV network is not fully connected, i.e., no information can be exchanged among the UAVs unless they fly towards and connect with each other. We do not consider a fully connected UAV network because this greatly decreases the number of deployed UAVs and removes the need to design an efficient multi-hop routing protocol that copes with the dynamics of the flying UAV network. Moreover, a fully connected UAV network requires highly complicated route management and maintenance algorithms to adapt the network configuration when one of the relaying UAVs is out of service, malfunctioning, or needs to be recharged. The design of such a fully connected UAV network using a cooperative MAB game, including the required routing protocol in addition to the route management and maintenance algorithms, is left for our future investigations.
During the flight lifetime of the access UAVs, i.e., during one charging period of their battery, they should collect/deliver as much data as possible. This means that access UAVs should select the gateways that maximize their achievable data rates within the limited battery capacity of their flights. The gateway UAVs are assumed to only hover near their associated cellular macro-BSs without frequently flying back and forth from them. Thus, the gateway-cellular macro-BS re-association problem due to gateway mobility is not considered in this paper.
3.2. MmWave Link Model
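In air-to-air communications, the links are predominantly line-of-sight (LoS). Thus, we follow the air-to-air mmWave channel model presented in [37] for UAV-to-UAV communication, where the received power at UAV $j$ from UAV $i$ is expressed as:
$$P_{r}^{ij} = P_{t}^{i}\, G_{t}(\theta_{t}, \theta_{-3\mathrm{dB}})\, G_{r}(\theta_{r}, \theta_{-3\mathrm{dB}}) \left(\frac{\lambda}{4\pi d_{ij}}\right)^{\alpha} \tag{1}$$
where $P_{t}^{i}$ is the transmit power of UAV $i$, $\lambda$ is the wavelength, $d_{ij}$ is the separation distance between UAV $i$ and UAV $j$, and $\alpha$ is the path loss exponent. $G_{t}(\theta_{t}, \theta_{-3\mathrm{dB}})$ and $G_{r}(\theta_{r}, \theta_{-3\mathrm{dB}})$ refer to the transmitter (TX) and receiver (RX) mmWave beam-forming gains, respectively. $\theta_{t}$ is the beam offset angle of the TX beam direction to the location of the RX, while $\theta_{r}$ defines the beam offset angle of the RX beam direction to the location of the TX.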
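$\theta_{-3\mathrm{dB}}$ is the −3 dB beam-width. Additionally, in [37], a flat-top antenna model is utilized, in which the beam-forming gain $G(\theta, \theta_{-3\mathrm{dB}})$ can be expressed as:
$$G(\theta, \theta_{-3\mathrm{dB}}) = \begin{cases} \dfrac{2\pi - (2\pi - \theta_{-3\mathrm{dB}})\,\epsilon}{\theta_{-3\mathrm{dB}}}, & |\theta| \le \dfrac{\theta_{-3\mathrm{dB}}}{2} \\ \epsilon, & \text{otherwise} \end{cases} \tag{2}$$
where $\epsilon$ is the sidelobe gain, $0 \le \epsilon \ll 1$. However, any other mmWave beam-forming strategy can be applied in the proposed scheme.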
Figure 2 shows the schematic diagram of the considered flat-top mmWave antenna model for both mmWave TX and RX. The angles of the TX/RX communication beams, i.e., $\theta_{t}$ and $\theta_{r}$, are tuned by means of beam-forming training using steerable antenna arrays in both TX and RX UAVs.
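As a minimal sketch of the link model described above, the following Python snippet evaluates a Friis-type received power with a path loss exponent and a flat-top beam pattern. The specific main-lobe expression (a normalized sector pattern), the function names, and all parameter values are assumptions made for illustration; they are not taken from [37] or from Table 1.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def flat_top_gain(offset_rad, beamwidth_rad, sidelobe=0.0):
    """Flat-top pattern (assumed form): constant gain inside the beam, sidelobe gain outside."""
    if abs(offset_rad) <= beamwidth_rad / 2:
        # Normalized main-lobe gain; reduces to 2*pi/beamwidth when sidelobe == 0.
        return (2 * math.pi - (2 * math.pi - beamwidth_rad) * sidelobe) / beamwidth_rad
    return sidelobe


def received_power(p_tx_w, freq_hz, dist_m, tx_offset_rad, rx_offset_rad,
                   beamwidth_rad, alpha=2.0, sidelobe=0.0):
    """Received power in watts: P_t * G_t * G_r * (lambda / (4*pi*d))**alpha."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    g_tx = flat_top_gain(tx_offset_rad, beamwidth_rad, sidelobe)
    g_rx = flat_top_gain(rx_offset_rad, beamwidth_rad, sidelobe)
    return p_tx_w * g_tx * g_rx * (wavelength / (4 * math.pi * dist_m)) ** alpha


if __name__ == "__main__":
    bw = math.radians(60)  # hypothetical 60-degree beam-width
    aligned = received_power(1.0, 60e9, 100.0, 0.0, 0.0, bw)
    misaligned = received_power(1.0, 60e9, 100.0, bw, 0.0, bw)
    print(f"aligned: {aligned:.3e} W, misaligned: {misaligned:.3e} W")
```

With a zero sidelobe gain, a TX beam pointing outside the receiver's direction yields no received power, which is why beam-forming training is needed before the link can be used.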
3.3. Problem Formulation
In this section, we formulate the optimization problem of the decentralized gateway UAV selection. Suppose that there are $N$ access UAVs and $M$ gateway UAVs distributed in the post-disaster area, where $1 \le i \le N$ and $1 \le j \le M$. Each access UAV $i$ should select one of the gateway UAVs, i.e., gateway UAV $j$, then fly towards it for relaying its collected information. This is done at every time $t$, $1 \le t \le T$, where $T$ indicates the total lifetime before the battery of the access UAV needs recharging. In this paper, the selected gateway UAVs should maximize the long-term average data rates of the access-gateway-cellular relays while satisfying the battery capacity constraint of the access UAVs during their flight periods. This maximization problem can be formulated as follows:
$$\max_{\mathbf{X}_{t}} \; \frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{M} x_{ij,t}\, R_{ij,t} \tag{3}$$
s.t.
- (1) $\sum_{j=1}^{M} x_{ij,t} = 1$, $x_{ij,t} \in \{0, 1\}$, $\forall i, t$
- (2) $\sum_{t=1}^{T}\sum_{j=1}^{M} x_{ij,t}\left(P_{h} T_{h} + P_{f}\,\dfrac{d^{f}_{ij,t}}{v} + E_{c}(D)\right) \le B$, $\forall i$
where $R_{ij,t}$ is the data rate of the relay link between access UAV $i$, gateway UAV $j$, and the cellular macro-BS $j$ at time $t$. For each time $t$, $\mathbf{X}_{t} = [x_{ij,t}]$ is an $N \times M$ matrix where $x_{ij,t}$ refers to a linkage indicator function that is equal to 1 if access UAV $i$ is linked with gateway UAV $j$ and 0 otherwise, where each access UAV $i$ should select only one gateway UAV $j$ at a time $t$, as given in the first constraint of (3). The goal of the optimization problem in (3) is to maximize the long-term average total system rate by optimizing the selection of the linkage matrix $\mathbf{X}_{t}$. In the second constraint of (3), the simple UAV energy model introduced by the authors in [36] is utilized. However, more sophisticated UAV energy models like that presented in [38] can be adopted in (3) without affecting its generality. In the second constraint in (3), $P_{h}$ and $P_{f}$ are the hovering and flying engine powers in Watts, respectively. $T_{h}$ describes the hovering time needed for an access UAV to gather essential information from its dedicated coverage section. $d^{f}_{ij,t}$ is the minimum distance that should be flown by access UAV $i$ to establish the mmWave communication link with the gateway UAV $j$ chosen by access UAV $i$ at time $t$, and $v$ reflects the flying speed of the access UAV in m/sec. The term $E_{c}(D)$ indicates the energy consumed due to data communication in Joule, where $D$ indicates the size of the transmitted information data in bits, and $B$ is the access UAV total battery capacity in Joule. Herein, we assume that all access UAVs have the same specifications of $P_{h}$, $P_{f}$, $T_{h}$, $v$, and $B$. In (3), we give high priority to the access UAVs' battery consumption when selecting the gateway UAVs that maximize the achievable data rate. This is because UAV battery consumption is one of the main concerns when designing an efficient UAV network due to its limited capacity, considering the high energy consumed during access UAV flights. However, we did not consider the constraint of the gateway UAV battery capacity for two main reasons: (1) typically, gateway UAVs have the highest remaining battery capacity among the UAVs; (2) in the network setting, gateway UAVs do not fly as frequently as access UAVs. Instead, they hover beside their associated cellular macro-BS most of the time to provide relaying functionalities. It is stated in [36, 38] that the power consumed in UAV hovering is much lower than that consumed during UAV flying.
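The battery constraint above can be sketched as a per-round energy account that is summed and compared with the battery capacity. In the following Python sketch, the communication energy is assumed to be transmit power multiplied by the transmission time (data size divided by rate); this form, the function names, and the numbers in the example are illustrative assumptions and not values from the paper.

```python
def round_energy(hover_power_w, hover_time_s, fly_power_w, fly_dist_m,
                 speed_mps, tx_power_w, data_bits, rate_bps):
    """Energy in joules spent by one access UAV in one round."""
    hover = hover_power_w * hover_time_s          # hovering while gathering data
    fly = fly_power_w * fly_dist_m / speed_mps    # flying towards the chosen gateway
    comm = tx_power_w * data_bits / rate_bps      # assumed model: power x airtime
    return hover + fly + comm


def within_budget(round_energies_j, battery_capacity_j):
    """True if the accumulated consumption does not exceed the battery capacity."""
    return sum(round_energies_j) <= battery_capacity_j


if __name__ == "__main__":
    # Hypothetical values chosen only to make the arithmetic easy to follow.
    e = round_energy(100.0, 10.0, 200.0, 50.0, 10.0, 1.0, 1e6, 1e6)
    print(e)  # 1000 J hovering + 1000 J flying + 1 J communication
    print(within_budget([e, e], 5000.0))
```

The example also illustrates why the flight term tends to dominate the budget: shortening the flown distance saves far more energy than shortening the transmission.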
Without loss of generality, we assume a half-duplex decode-and-forward (DF) relay strategy where time resources are equally divided between the access-to-gateway link and the gateway-to-cellular macro-BS link. Moreover, the uplink scenario is considered with round-robin time-sharing scheduling among access UAVs attached to the same gateway UAV. Thus, $R_{ij,t}$ in (3) can be expressed as:
$$R_{ij,t} = \frac{1}{2 N_{j,t}} \min\left(R^{ag}_{ij,t},\, R^{gc}_{j,t}\right) \tag{4}$$
where $N_{j,t}$ indicates the number of time-scheduled access UAVs connected with the same gateway UAV $j$ at time $t$. $R^{ag}_{ij,t}$ is the achievable data rate of the mmWave link between access UAV $i$ and gateway UAV $j$, and $R^{gc}_{j,t}$ reflects the achievable data rate between gateway UAV $j$ and its corresponding cellular macro-BS $j$. In this paper, we will focus on the value of $R^{ag}_{ij,t}$ as it mainly results from the interference inside the UAV wireless network coming from other access-gateway selections. $R^{ag}_{ij,t}$ can be expressed as:
$$R^{ag}_{ij,t} = W \log_{2}\left(1 + \gamma_{ij,t}\right) \tag{5}$$
where $W$ is the allocated bandwidth, and $\gamma_{ij,t}$ refers to the signal-to-interference-plus-noise power ratio (SINR) of the link between access UAV $i$ and gateway UAV $j$ at time $t$. Based on (1) and considering the uplink scenario, $\gamma_{ij,t}$ can be represented as:
$$\gamma_{ij,t} = \frac{P_{r}^{ij}}{\sigma^{2} + \sum_{k=1}^{K_{t}} P_{r}^{kj}} \tag{6}$$
where $\sigma^{2}$ is the noise power, and $K_{t}$ is the number of access UAVs attached to the other gateway UAVs and scheduled within the same time slot assigned by gateway UAV $j$ to access UAV $i$. $R^{gc}_{j,t}$ in (4) can be evaluated using (5), except that the SINR from gateway UAV $j$ to its corresponding macro-BS $j$ should be applied, i.e., $\gamma_{j,t}$, which can be expressed as:
$$\gamma_{j,t} = \frac{\hat{P}_{r}^{jj}}{\sigma^{2} + \sum_{l=1,\, l \ne j}^{L_{t}} \hat{P}_{r}^{lj}} \tag{7}$$
where $\hat{P}_{r}^{lj}$ is the received power at macro-BS $j$ from gateway UAV $l$ evaluated using (1), and $L_{t}$ is the number of gateway UAVs transmitting simultaneously at time $t$.
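The rate model described above (half-duplex DF relaying, round-robin time sharing, and a Shannon-type rate driven by the SINR) can be sketched as follows. The function names and example values are hypothetical, and the exact normalization used in the paper may differ; this is only an illustration under the stated assumptions.

```python
import math


def sinr(signal_w, interference_w, noise_w):
    """Signal-to-interference-plus-noise ratio from linear powers in watts."""
    return signal_w / (sum(interference_w) + noise_w)


def shannon_rate(bandwidth_hz, sinr_linear):
    """Achievable rate in bit/s: W * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


def relay_rate(rate_access_gateway, rate_gateway_bs, n_sharing):
    """End-to-end rate of a half-duplex DF relay shared by n_sharing access UAVs.

    The bottleneck hop limits the rate, the factor 1/2 accounts for the equal
    time split between the two hops, and 1/n_sharing for round-robin scheduling.
    """
    return min(rate_access_gateway, rate_gateway_bs) / (2 * n_sharing)


if __name__ == "__main__":
    gamma = sinr(1.0, [0.5, 0.25], 0.25)  # hypothetical powers
    r_ag = shannon_rate(1e9, gamma)       # hypothetical 1 GHz bandwidth
    print(relay_rate(r_ag, 2e9, n_sharing=2))
```

The sketch makes the coupling between players visible: another access UAV choosing the same gateway increases `n_sharing`, while one choosing a different gateway adds a term to the interference list.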
Although the problem in (3) can be considered as a binary linear programming (BLP) problem, the conventional solutions of combinatorial optimization, such as the highly complicated exhaustive search approach, the graph-based approach, the branch-and-bound approach, etc., are not feasible solutions to (3). This is because the objective values $R_{ij,t}$ corresponding to a candidate linkage matrix $\mathbf{X}_{t}$ are not known beforehand unless access UAVs fly and connect with their corresponding gateway UAVs in $\mathbf{X}_{t}$. Thus, in the exhaustive search solution, for example, all access UAV flights and their corresponding linkage matrices should be evaluated before selecting the optimal configuration. This is infeasible considering the battery capacity constraint of the access UAVs along with the time-sensitive rescue service. This highly complex and dynamic problem motivates us to use online learning by means of the multi-player MAB approach to address it. In this approach, access UAVs learn round by round, from their previous gateway selections and data rate observations, how to enhance their future gateway selections, maximizing their achievable data rates within their limited budget of battery capacity. This is done without any prior knowledge about $R_{ij,t}$.
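To make the online-learning idea concrete, the following is a minimal sketch of a single player running the textbook UCB1 index under a cost budget. It is not the BA-UCB algorithm proposed in this paper; the function name, the callback interface, the stopping rule, and the assumption that rewards are normalized to [0, 1] are all illustrative choices.

```python
import math


def ucb1_with_budget(reward_fn, cost_fn, n_arms, budget, horizon):
    """Play UCB1 until the horizon ends or the next pull would exceed the budget.

    reward_fn(arm) returns the observed reward (assumed to lie in [0, 1]);
    cost_fn(arm) returns the cost, e.g., energy, of playing that arm once.
    """
    counts = [0] * n_arms     # number of times each arm was played
    means = [0.0] * n_arms    # running mean reward of each arm
    spent = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # play every arm once to initialize the estimates
        else:
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        cost = cost_fn(arm)
        if spent + cost > budget:
            break             # battery-like budget exhausted
        spent += cost
        reward = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means, spent


if __name__ == "__main__":
    # Two hypothetical gateways with fixed normalized rates and unit cost.
    counts, means, spent = ucb1_with_budget(
        lambda a: [0.2, 0.8][a], lambda a: 1.0, n_arms=2, budget=50.0, horizon=200
    )
    print(counts, means, spent)
```

In the multi-player setting considered here, each access UAV would run its own learner concurrently, and the reward returned for a gateway would change with the other players' choices.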
5. Numerical Analysis
In this section, extensive numerical simulations are conducted to compare the performances of the proposed BA-UCB, BA-TS, and BA-EXP3 MAB algorithms for gateway UAV selection. Moreover, their performances are compared with two benchmark approaches based on near and random gateway selections. In the first approach, an access UAV always selects the gateway UAV nearest to it, while in the second one, a random gateway UAV is selected by the access UAV at every time. These two approaches are chosen as benchmarks because they require no prior information about the achievable data rates of the access–gateway–cellular links, making them practical solutions to the considered gateway selection problem. Other solutions based on exhaustive search, graph-based, or branch-and-bound approaches are impractical from the perspective of access UAV battery consumption and gateway selection time. This is because, in these schemes, the achievable data rates of candidate access UAV–gateway UAV configurations should be known before choosing the optimal setting. However, these values are unknown unless access UAVs fly and connect with the gateway UAVs in a particular candidate configuration, making them infeasible solutions.
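The two benchmark policies described above need only positions or a random draw, which the following sketch illustrates. The function names and the 2D coordinate representation are assumptions for illustration.

```python
import math
import random


def nearest_gateway(access_xy, gateways_xy):
    """Index of the gateway UAV closest to the access UAV (Euclidean distance)."""
    return min(range(len(gateways_xy)),
               key=lambda j: math.dist(access_xy, gateways_xy[j]))


def random_gateway(n_gateways, rng=random):
    """Index of a gateway UAV drawn uniformly at random."""
    return rng.randrange(n_gateways)


if __name__ == "__main__":
    gateways = [(10.0, 0.0), (3.0, 4.0), (-1.0, -1.0)]  # hypothetical positions
    print(nearest_gateway((0.0, 0.0), gateways))
    print(random_gateway(len(gateways), random.Random(0)))
```

Neither policy uses any rate observation, which is what makes them applicable without prior information but also unable to adapt to interference or time sharing.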
A post-disaster area of dimension 750 × 750 m² is assumed, where access UAVs are uniformly distributed inside this area for rescue services. Gateway UAVs are uniformly distributed around this area in a circle of 1250 m diameter. Based on (1), the minimum distance for establishing a mmWave communication link between an access UAV and a gateway UAV is equal to:
$$d_{\min} = \frac{\lambda}{4\pi}\left(\frac{P_{t}\, G_{\max}^{2}(\theta_{-3\mathrm{dB}})}{P_{th}}\right)^{1/\alpha}$$
where $G_{\max}(\theta_{-3\mathrm{dB}})$ indicates the maximum antenna gain for a particular value of $\theta_{-3\mathrm{dB}}$. $P_{th}$ is the threshold received power corresponding to modulation and coding scheme index 0 (MCS0) of the IEEE 802.11ad standard [44]. Thus, for example, when $\theta_{-3\mathrm{dB}}$ is equal to 10°, 20°, 30°, 40°, 50° and 60°, $d_{\min}$ becomes 357, 179, 120, 90, 72 and 60 m, respectively. At low values of $\theta_{-3\mathrm{dB}}$, a long minimum distance, $d_{\min}$, can be sustained under free-space propagation. Thus, the minimum flying distance by access UAV $i$ towards gateway UAV $j$ at time $t$ is equal to:
$$d^{f}_{ij,t} = \max\left(d_{ij,t} - d_{\min},\, 0\right)$$
where $d_{ij,t}$ indicates the radial separation distance between access UAV $i$ and gateway UAV $j$.
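The reported distances scale almost inversely with the beam-width (their products with the beam-width in degrees all lie between about 3570 and 3600), which is what a free-space exponent of 2 and a main-lobe gain proportional to the inverse beam-width would produce. The sketch below inverts a Friis-type link budget under exactly those assumptions; the gain expression, the clipping of the flying distance at zero, the function names, and the example power values are illustrative and not taken from Table 1.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def main_lobe_gain(beamwidth_rad, sidelobe=0.0):
    """Assumed flat-top main-lobe gain; equals 2*pi/beamwidth when sidelobe == 0."""
    return (2 * math.pi - (2 * math.pi - beamwidth_rad) * sidelobe) / beamwidth_rad


def min_link_distance(p_tx_w, p_th_w, freq_hz, beamwidth_deg, alpha=2.0, sidelobe=0.0):
    """Largest separation at which the received power still reaches the threshold."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    gain = main_lobe_gain(math.radians(beamwidth_deg), sidelobe)
    return wavelength / (4 * math.pi) * (p_tx_w * gain * gain / p_th_w) ** (1.0 / alpha)


def flying_distance(separation_m, d_min_m):
    """Distance the access UAV must fly before the link can be established."""
    return max(separation_m - d_min_m, 0.0)


if __name__ == "__main__":
    # Hypothetical transmit power and threshold, used only to show the scaling.
    for bw in (10, 20, 30, 40, 50, 60):
        print(bw, round(min_link_distance(1.0, 1e-9, 60e9, bw), 1))
    print(flying_distance(500.0, 120.0))
```

Under these assumptions, halving the beam-width doubles the link range, so narrower beams shorten the flights needed to reach a gateway.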
Table 1 summarizes the simulation parameters used throughout the numerical simulations unless otherwise stated. It is assumed that all access UAVs are fully charged at the beginning of the game with a total battery capacity of $B$ given in Table 1.
5.1. Performance Metrics
We used the following metrics to assess the performances of the compared gateway selection schemes:
Average total system rate: It is defined as the average sum rate of all UAV relays over the time horizon. This can be expressed mathematically as:
$$\bar{R} = \frac{1}{T}\sum_{t=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{M} x_{ij,t}\, R_{ij,t}$$
where $x_{ij,t}$ and $R_{ij,t}$ are defined in Section 3.3.
Average energy efficiency (bps/J) per access UAV: It is defined as the average data rate of the access UAV divided by its total energy consumption. The total energy consumption of an access UAV $i$ at time $t$ is the sum of the energy consumptions of data communication, hovering, and flying. Thus, the average energy efficiency of an access UAV can be expressed as:
$$\bar{\eta} = \frac{1}{N T}\sum_{t=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{M} \frac{x_{ij,t}\, R_{ij,t}}{E^{h}_{ij,t} + E^{f}_{ij,t} + E^{c}_{ij,t}}$$
where $E^{h}_{ij,t}$, $E^{f}_{ij,t}$ and $E^{c}_{ij,t}$ represent the hovering, flying, and data communication energies consumed by access UAV $i$ when linked with gateway UAV $j$ at a time $t$.
Convergence rate of the proposed MAB algorithms: This measures the speed of convergence of the different proposed MAB algorithms despite the adversarial setting and the selfish behavior of the access UAVs. Towards this end, the system rate of the proposed MAB algorithms is evaluated against the time horizon.
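The first two metrics can be computed directly from per-round logs, as in the following sketch. It assumes the logs are stored as nested lists indexed by round and access UAV, and that energy efficiency is averaged as a per-round ratio; both the data layout and that averaging convention are assumptions for illustration.

```python
def average_total_system_rate(rates):
    """Mean over rounds of the sum rate; rates[t][i] is the rate of access UAV i in round t."""
    return sum(sum(round_rates) for round_rates in rates) / len(rates)


def average_energy_efficiency(rates, energies):
    """Mean of rate/energy over all rounds and access UAVs (bit/s per joule)."""
    ratios = [
        r / e
        for round_rates, round_energies in zip(rates, energies)
        for r, e in zip(round_rates, round_energies)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical log: 2 rounds, 2 access UAVs.
    rates = [[1.0, 3.0], [2.0, 2.0]]
    energies = [[1.0, 1.0], [2.0, 4.0]]
    print(average_total_system_rate(rates))            # (4 + 4) / 2
    print(average_energy_efficiency(rates, energies))  # (1 + 3 + 1 + 0.5) / 4
```

Plotting the per-round sum rate (the inner sum above) against the round index gives the convergence curve used for the third metric.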
5.2. Simulation Results
In the following section, the performances of the proposed BA-MAB-based gateway selection schemes are assessed under different system settings based on the performance metrics mentioned above.
5.2.1. Average Total System Rate
In this part of the simulation results, we give the average total system rate performances in Gbps against different values of gateway UAVs, access UAVs, and beam-widths.
Figure 4 shows the average system rate of the compared schemes against the number of access UAVs using 20 gateway UAVs and a beam-width of 60°. As shown in this figure, the BA-TS has the best performance due to its integrated Bayesian strategy based on constructing posterior distributions for the obtained data rates. On the other side, random gateway selection has the worst performance due to the randomness in the selected gateway UAV at each round. Consequently, access UAVs will experience random interference as well as a random number of time slots at each time
. The near gateway selection has better performance than random selection due to the fixed pattern of interference and the number of assigned time slots experienced by access UAVs at each time
. It is interesting to note that the average system rate of the MAB algorithms is increasing when using few numbers of access UAVs until reaching a certain point, then slightly decreasing as the number of access UAVs is increased. This comes from the low interference and time-sharing scheduling experienced by the small number of access UAVs. However, as the number of access UAVs is increased beyond the number of gateway UAVs, i.e., 20 UAVs, high interference, and low number of time slots are experienced by access UAVs. Although all MAB algorithms are highly affected by interference at a higher number of distributed access UAVs, BA-TS and BA-UCB still have the best average system rate performances. From
Figure 4, BA-EXP3 shows poor performance compared to the other MAB schemes and tends to reach the performance of the near selection at a high number of access UAVs. This comes from the nearly equal weights assigned to the gateway UAVs by the BA-EXP3 algorithm at each time step, which produces a poor gateway UAV selection policy. However, the BA-EXP3 algorithm still performs better than near and random selections. Using 25 access UAVs, BA-TS, BA-UCB, and BA-EXP3 have 60% (81%), 59.5% (80.5%), and 19% (37%) enhancement in the average system rate over the near (random) gateway selection, respectively.
Figure 5 shows the average system rate against increasing the number of gateway UAVs using 20 access UAVs and a beam-width of 60°. For all compared schemes, as the number of gateway UAVs is increased, the average system rate is increased due to the decrease in the interference experienced by the access UAVs. Moreover, as a low number of access UAVs are linked with the same gateway UAV, more time slots are assigned to them, contributing to increasing the total system rate as well. Yet, BA-TS and BA-UCB have the best performances over the other schemes. It is also interesting to note that at interfering environments that are too harsh and at a low number of assigned time slots, e.g., when the number of gateway UAVs is equal to 5, MAB-based algorithms still have some improvements over near and random selections. However, as the number of gateway UAVs reaches 40, about 88% (108%), 86% (105%), and 54% (70%) increases in average system rates are obtained using BA-TS, BA-UCB, and BA-EXP3 overusing near (random) selection, respectively.
Figure 6 shows the average system rate against the beam-width using 20 gateway UAVs and 40 access UAVs. Generally, at lower values of beam-width, e.g., 10°, the beam-forming gain is higher and the mutual interference is lower, which greatly increases the average system rate of all compared schemes. At higher values of beam-width, however, the beam-forming gain decreases while the mutual interference increases, resulting in a lower average system rate. From
Figure 6, BA-TS has the best performance over all compared schemes, while random selection has the worst performance over all values of beam-width, for the reasons mentioned above. At a beam-width of 10°, BA-TS, BA-UCB, and BA-EXP3 achieve 30% (34%), 25% (30%), and 8% (13%) increases in the average system rate over near (random) selection, respectively. At a beam-width of 60°, the corresponding improvements are 43% (66%), 38% (61%), and 5% (23%). This emphasizes the superior performance of the proposed MAB algorithms, even in a high-interference environment.
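The beam-width trade-off can be illustrated with an idealized flat-top sector beam, in which all radiated power is spread uniformly over the beam-width and there are no side lobes. This model is an assumption made here for illustration only and is not necessarily the antenna model used in the simulations; both function names are hypothetical.

```python
import math


def flat_top_gain_db(beamwidth_deg):
    """Main-lobe gain (dB) of an idealized 2-D sector beam without side lobes."""
    return 10.0 * math.log10(360.0 / beamwidth_deg)


def main_lobe_hit_probability(beamwidth_deg):
    """Probability that a uniformly random direction lies inside the main lobe,
    used here as a rough proxy for exposure to mutual interference."""
    return beamwidth_deg / 360.0


for beamwidth in (10.0, 60.0):
    print(
        f"beam-width {beamwidth:>4.0f} deg: "
        f"gain = {flat_top_gain_db(beamwidth):.2f} dB, "
        f"hit probability = {main_lobe_hit_probability(beamwidth):.3f}"
    )
```

Under this simplified model, narrowing the beam from 60° to 10° raises the main-lobe gain by a factor of six and reduces the chance that a randomly placed interferer falls inside the main lobe by the same factor, which matches the qualitative trend described above.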
5.2.2. Average Energy Efficiency
In this part of the simulation results, we study the average energy efficiency, in bps/mJ, of the compared gateway selection schemes against different numbers of gateway UAVs, different numbers of access UAVs, and different beam-widths.
Figure 7 shows the average energy efficiency against the number of access UAVs using 40 gateway UAVs and a beam-width of 60°. As given in
Figure 7, the proposed MAB-based gateway selection algorithms achieve better energy efficiency than near and random selections for all tested numbers of access UAVs. This comes from the design of the BA-MAB algorithms, in which the battery cost of the access UAV flight is taken into account when selecting the gateway UAV that maximizes its achievable data rate. For a low number of access UAVs, the data rate per access UAV is high due to the low mutual interference and the large number of assigned time slots. This results in high energy efficiency for all compared schemes, with the proposed BA-MAB algorithms performing best. At a high number of access UAVs, however, the achievable data rate of each access UAV decreases because the mutual interference increases while the number of assigned time slots decreases. This results in a large decrease in the average energy efficiency, as shown in
Figure 7. Yet, the BA-MAB algorithms show better performances than the other schemes. Using 5 access UAVs, about 60% (70%), 50% (62%), and 32% (42%) improvements in average energy efficiency are obtained using the proposed BA-TS-, BA-UCB-, and BA-EXP3-based gateway selections over near (random) selection, respectively.
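The link between time sharing and energy efficiency can be sketched as follows. The sketch assumes that a gateway UAV splits its time slots equally among the access UAVs it serves and that energy efficiency is the achieved rate divided by the energy spent by the access UAV; the function name and all numerical values are hypothetical and are not taken from the simulation setup.

```python
def energy_efficiency_bps_per_mj(link_rate_bps, uavs_sharing_gateway, energy_mj):
    """Energy efficiency in bps/mJ under an assumed equal time-slot split."""
    time_share = 1.0 / uavs_sharing_gateway      # fraction of slots per access UAV
    achieved_rate_bps = link_rate_bps * time_share
    return achieved_rate_bps / energy_mj


# Hypothetical link rate and energy cost (illustrative values only).
link_rate_bps = 1e9
energy_mj = 500.0

ee_alone = energy_efficiency_bps_per_mj(link_rate_bps, 1, energy_mj)
ee_shared = energy_efficiency_bps_per_mj(link_rate_bps, 4, energy_mj)

print(f"one access UAV on the gateway:   {ee_alone:.0f} bps/mJ")
print(f"four access UAVs on the gateway: {ee_shared:.0f} bps/mJ")
```

For a fixed energy cost, sharing the gateway among more access UAVs reduces the energy efficiency of each one in proportion to its time share, before any additional loss from mutual interference is considered.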
Figure 8 shows the average energy efficiency against the number of gateway UAVs using 20 access UAVs and a beam-width of 60°. Due to the low achievable data rate per access UAV when using a small number of gateway UAVs, e.g., 5, the average energy efficiencies of all compared schemes are very low, as shown in
Figure 8. However, as the number of gateway UAVs is increased, the average energy efficiencies of all compared schemes increase, driven by the increase in the achievable data rate per access UAV. At 40 gateway UAVs, 117% (143%), 114% (140%), and 68% (88%) enhancements in average energy efficiency are obtained using the proposed BA-TS, BA-UCB, and BA-EXP3 over using near (random) selection, respectively.
Figure 9 shows the average energy efficiency against the beam-width using 20 gateway UAVs and 40 access UAVs. Driven by the increase in the achievable data rate, the average energy efficiency of the access UAV is high at low values of beam-width, e.g., 10°. Conversely, it decreases at high values of beam-width because the achievable data rate decreases, as previously explained. Nevertheless, the proposed BA-MAB algorithms outperform the other compared schemes at all tested values of beam-width. At a beam-width of 10°, about 33% (39%), 27% (33%), and 6% (11%) improvements in average energy efficiency are obtained using the proposed BA-TS, BA-UCB, and BA-EXP3 over using near (random) selection, respectively. These values become 43% (50%), 37% (44%), and 2% (8%) at a beam-width of 60°. This confirms that the proposed BA-MAB algorithms perform better even in high-interference environments.
5.2.3. Convergence Rate
Convergence is one of the primary metrics for MAB applications; the MAB algorithms should reach a sub-optimal solution within a few attempts. Thus, in this section, we study the convergence of the total system rate of the proposed BA-MAB algorithms in different settings.
Figure 10,
Figure 11 and
Figure 12 show the convergence of the overall system rate using 20 gateway UAVs and a beam-width of 60° with the number of access UAVs set to 20, 30, and 40, respectively. This emulates different interference and time-sharing environments. In these figures,
t indicates the round of gateway UAV selection rather than an absolute time in seconds, because the duration of a round differs from round to round with the flight time towards the selected gateway UAVs. From these figures, all proposed BA-MAB algorithms converge after relatively few trials; specifically, they start to converge after 400 rounds. These results demonstrate that the proposed BA-MAB algorithms can converge rapidly regardless of the adversarial setting of the problem and the selfish behaviors of the access UAVs. This means that the access UAVs learn to play actions that enhance the overall system performance at every attempt.
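One simple way to read a convergence round off such a curve is to find the first round after which every moving average of the system rate stays within a relative tolerance of the final moving average. The sketch below implements this criterion; the criterion itself, the window length, the tolerance, and the synthetic saturating curve are all assumptions made for illustration and do not reproduce the simulation data.

```python
import math


def convergence_round(rates, window=50, tolerance=0.02):
    """Return the 1-based round at which the first moving-average window ends
    such that it and all later windows stay within `tolerance` (relative)
    of the final window average. Returns None if the series is too short."""
    if len(rates) < window:
        return None
    # Prefix sums give each window average in constant time.
    prefix = [0.0]
    for rate in rates:
        prefix.append(prefix[-1] + rate)
    averages = [
        (prefix[start + window] - prefix[start]) / window
        for start in range(len(rates) - window + 1)
    ]
    final = averages[-1]
    # Scan backwards for the last window that violates the tolerance.
    for start in range(len(averages) - 1, -1, -1):
        if abs(averages[start] - final) > tolerance * abs(final):
            return start + 1 + window
    return window


# Synthetic, normalized system-rate curve that saturates over the rounds
# (illustrative only, not the data behind the figures).
synthetic_rates = [1.0 - math.exp(-t / 100.0) for t in range(1000)]

print(convergence_round(synthetic_rates))
```

A tighter tolerance or a shorter window reports a later convergence round, so the chosen values should be stated alongside any convergence round obtained this way.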