TY - GEN
T1 - UAV Path Planning for Wireless Data Harvesting
T2 - 2020 IEEE Global Communications Conference, GLOBECOM 2020
AU - Bayerlein, Harald
AU - Theile, Mirco
AU - Caccamo, Marco
AU - Gesbert, David
N1 - Funding Information:
H. Bayerlein and D. Gesbert are supported by the PERFUME project funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 670896). M. Caccamo was supported by an Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
AB - Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning and non-learning based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time, change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV's position over a non-centered map are also illustrated.
UR - http://www.scopus.com/inward/record.url?scp=85098421737&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098421737&partnerID=8YFLogxK
U2 - 10.1109/GLOBECOM42002.2020.9322234
DO - 10.1109/GLOBECOM42002.2020.9322234
M3 - Conference contribution
AN - SCOPUS:85098421737
T3 - 2020 IEEE Global Communications Conference, GLOBECOM 2020 - Proceedings
BT - 2020 IEEE Global Communications Conference, GLOBECOM 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 December 2020 through 11 December 2020
ER -