Trajectory Design for UAV-Based Internet-of-Things Data Collection: A Deep Reinforcement Learning Approach

Wang, Yang; Gao, Zhen; Zhang, Jun; Cao, Xianbin; Zheng, Dezhi; Gao, Yue; Ng, Derrick Wing Kwan; Di Renzo, Marco

arXiv:2107.11015 (cs)

[Submitted on 23 Jul 2021]

Abstract: In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted Internet-of-Things (IoT) system in a sophisticated three-dimensional (3D) environment, where the UAV's trajectory is optimized to efficiently collect data from multiple IoT ground nodes. Unlike existing approaches, which consider only a simplified two-dimensional scenario and assume perfect channel state information (CSI), this paper considers a practical 3D urban environment with imperfect CSI, where the UAV's trajectory is designed to minimize the data collection completion time subject to practical throughput and flight movement constraints. Specifically, inspired by state-of-the-art deep reinforcement learning approaches, we leverage the twin-delayed deep deterministic policy gradient (TD3) to design the UAV's trajectory and present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm. In particular, we introduce an additional piece of information, the merged pheromone, which represents the state of the UAV and the environment and serves as a reference for the reward, thereby facilitating the algorithm design. By taking the service statuses of the IoT nodes, the UAV's position, and the merged pheromone as input, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can achieve a near-optimal navigation strategy. Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional non-learning-based baseline methods.
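
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) how the three stated inputs (service statuses, UAV position, merged pheromone) could be concatenated into one state vector, and (b) the generic TD3 target value, which combines target policy smoothing with the minimum of two target critics. The function names `build_state` and `td3_target`, the flat state layout, the scalar pheromone, the callable actor and critics, and all default hyperparameters are assumptions made for illustration; the abstract does not specify any of them.

```python
import random
from typing import Callable, List, Optional, Sequence

State = List[float]
Action = List[float]


def build_state(served: Sequence[bool], uav_pos: Sequence[float], pheromone: float) -> State:
    """Concatenate service statuses, UAV position and merged pheromone.

    The flat layout is an assumption; the abstract only names the three inputs.
    """
    return [1.0 if s else 0.0 for s in served] + [float(p) for p in uav_pos] + [float(pheromone)]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], Action],
    target_q1: Callable[[State, Action], float],
    target_q2: Callable[[State, Action], float],
    gamma: float = 0.99,       # illustrative default, not taken from the paper
    noise_std: float = 0.2,    # illustrative default, not taken from the paper
    noise_clip: float = 0.5,   # illustrative default, not taken from the paper
    max_action: float = 1.0,   # assumed symmetric action bound
    rng: Optional[random.Random] = None,
) -> float:
    """Generic TD3 target: r + gamma * (1 - done) * min(Q1', Q2')(s', a~)."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then clip the result back into the valid action range.
    next_action = [
        clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
             -max_action, max_action)
        for a in target_actor(next_state)
    ]
    # Clipped double-Q learning: use the smaller of the two target critics.
    q_min = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


if __name__ == "__main__":
    # Hypothetical toy values: three IoT nodes, a 3D position, one pheromone value.
    s_next = build_state([True, False, False], (10.0, 20.0, 50.0), pheromone=-3.5)
    y = td3_target(
        reward=-1.0,
        next_state=s_next,
        done=False,
        target_actor=lambda s: [0.0, 0.0],       # stand-in for a target actor network
        target_q1=lambda s, a: -12.0,            # stand-in for target critic 1
        target_q2=lambda s, a: -10.0,            # stand-in for target critic 2
        rng=random.Random(0),
    )
    print(len(s_next), y)
```

In a full agent, both critics would be regressed toward this target, and the actor and target networks would be updated less frequently than the critics, which is the "delayed" part of TD3. Those update loops are omitted here.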

Comments:	Accepted by IEEE Internet of Things Journal. The codes and some other materials about this work may be available at this https URL
Subjects:	Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as:	arXiv:2107.11015 [cs.IT]
	(or arXiv:2107.11015v1 [cs.IT] for this version)
	https://doi.org/10.48550/arXiv.2107.11015

Submission history

From: Zhen Gao
[v1] Fri, 23 Jul 2021 03:33:29 UTC (1,746 KB)
