COMPARATIVE ANALYSIS OF POLICY-BASED AND VALUE-BASED RL METHODS FOR LOCAL MOTION PLANNING OF UAVS
DOI: https://doi.org/10.28925/2663-4023.2026.32.1099

Keywords: autonomous navigation, UAV, reinforcement learning, PPO, SAC, stochastic environment, robust control

Abstract
The paper presents a comparative study of the effectiveness of two fundamental Deep Reinforcement Learning architectures, policy-based (PPO) and value-based (SAC), for UAV local navigation under stochastic uncertainty. Experimental modeling in the PyFlyt environment, which accounted for turbulent wind, variable drone mass, and target motion, revealed critical discrepancies in algorithm performance. Despite its high training speed and sample efficiency, the SAC algorithm was found to form vulnerable policies with a low mission success rate (63.2%) due to value function estimation errors in dynamic states. In contrast, the PPO method produced a robust control strategy with a 97.1% success rate and generated optimal trajectories without oscillations. The results confirm that for continuous control tasks in unpredictable environments, direct policy optimization methods are more effective: they avoid the "Reward Hacking" effect and generalize the physical laws of flight better.
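As a minimal illustration of the experimental setup described in the abstract, the sketch below trains PPO and SAC agents on a PyFlyt quadrotor task with Stable-Baselines3 and compares their evaluation returns. The environment ID, timestep budget, and hyperparameters are illustrative assumptions rather than the authors' exact configuration, and the paper's stochastic conditions (turbulent wind, variable mass, moving target) are not reproduced here.

```python
# Hedged sketch: PPO vs. SAC on a PyFlyt quadrotor task using Stable-Baselines3.
# The environment ID and training budget are assumptions for illustration only;
# the study's stochastic conditions (turbulent wind, variable drone mass, target
# motion) are not modeled in this minimal example.
import gymnasium as gym
import PyFlyt.gym_envs  # registers the PyFlyt/* environments with Gymnasium

from stable_baselines3 import PPO, SAC
from stable_baselines3.common.evaluation import evaluate_policy

ENV_ID = "PyFlyt/QuadX-Hover-v2"  # assumption: verify the IDs your PyFlyt version registers
TOTAL_TIMESTEPS = 200_000         # illustrative budget, far below a full study


def train_and_evaluate(algo_cls, name: str) -> float:
    """Train one agent on the task and report its mean evaluation return."""
    env = gym.make(ENV_ID)
    model = algo_cls("MlpPolicy", env, verbose=0, seed=0)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
    print(f"{name}: mean return {mean_reward:.1f} +/- {std_reward:.1f}")
    env.close()
    return mean_reward


if __name__ == "__main__":
    train_and_evaluate(PPO, "PPO (policy-based)")
    train_and_evaluate(SAC, "value-based baseline (SAC)")
```

A full comparison along the lines of the paper would additionally wrap the environment to inject wind disturbances and mass perturbations, track mission success rate rather than raw return, and average results over multiple random seeds.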
References
AlMahamid, F., & Grolinger, K. (2022). Autonomous unmanned aerial vehicle navigation using reinforcement learning: A systematic review. Engineering Applications of Artificial Intelligence, 115, 105321. https://doi.org/10.1016/j.engappai.2022.105321
Zhou, Y., Shu, J., Zheng, X., Hao, H., & Song, H. (2022). Real-time route planning of unmanned aerial vehicles based on improved soft actor-critic algorithm. Frontiers in Neurorobotics, 16, 1025817. https://doi.org/10.3389/fnbot.2022.1025817
Tang, C., Abbatematteo, B., Hu, J., Chandra, R., Martín-Martín, R., & Stone, P. (2025). Deep reinforcement learning for robotics: A survey of real-world successes. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 39). https://doi.org/10.1609/aaai.v39i27.35095
Oyinlola, S., Subedi, N., & Sarkar, S. (2025). Reinforcement learning for autonomous point-to-point UAV navigation. arXiv. https://doi.org/10.48550/arXiv.2509.13943
Bălaşa, R.-I., Bîlu, M., & Iordache, C. (2022). A proximal policy optimization reinforcement learning approach to unmanned aerial vehicles attitude control. Land Forces Academy Review, 27, 400–410. https://doi.org/10.2478/raft-2022-0049
Chen, S., Mo, Y., Wu, X., Xiao, J., & Liu, Q. (2024). Reinforcement learning-based energy-saving path planning for UAVs in turbulent wind. Electronics, 13(16), Article 3190. https://doi.org/10.3390/electronics13163190
Tai, J. J., Wong, J., Innocente, M., Horri, N., Brusey, J., & Phang, S. K. (2023). PyFlyt—UAV simulation environments for reinforcement learning research. arXiv. https://doi.org/10.48550/arXiv.2304.01305
Geronel, R. S., Botez, R. M., & Bueno, D. D. (2023). Dynamic responses due to the Dryden gust of an autonomous quadrotor UAV carrying a payload. The Aeronautical Journal, 127(1307), 116–138. https://doi.org/10.1017/aer.2022.35
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268), 1–8.
Skalse, J., Howe, N. H. R., Krasheninnikov, D., & Krueger, D. (2022). Defining and characterizing reward hacking. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS 2022).
Liu, H., Shen, Y., Zhou, C., Zou, Y., Gao, Z., & Wang, Q. (2024). TD3-based collision-free motion planning for robot navigation. In Proceedings of the 6th International Conference on Communications, Information System and Computer Engineering (CISCE 2024) (pp. 247–250). https://doi.org/10.1109/CISCE62493.2024.10653233
License
Copyright (c) 2026 Роман Трембовецький, Інна Розломій

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.