COMPARATIVE ANALYSIS OF POLICY-BASED AND VALUE-BASED RL METHODS FOR LOCAL MOTION PLANNING OF UAVS

Authors

Trembovetskyi, R., Rozlomii, I.

DOI:

https://doi.org/10.28925/2663-4023.2026.32.1099

Keywords:

autonomous navigation, UAV, reinforcement learning, PPO, SAC, stochastic environment, robust control

Abstract

The paper presents a comparative study of two fundamental deep reinforcement learning approaches, policy-based (PPO) and value-based (SAC), for UAV local navigation under stochastic uncertainty. Simulation experiments in the PyFlyt environment, with turbulent wind, variable drone mass, and a moving target, revealed critical differences in algorithm performance. Despite its high training speed and sample efficiency, the SAC algorithm formed fragile policies with a low mission success rate (63.2%), which is attributed to value function estimation errors in dynamic states. In contrast, the PPO method produced a robust control strategy with a 97.1% success rate and optimal trajectories without oscillations. The results indicate that for continuous control tasks in unpredictable environments, direct policy optimization methods are more effective, as they avoid the reward hacking effect and generalize better to the physical laws of flight.
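The mission success rate reported above can be illustrated with a minimal sketch of an evaluation loop over randomized episodes. This is not the authors' evaluation code: the names `EpisodeConfig`, `sample_config`, `success_rate`, and `toy_episode`, as well as all parameter ranges, are hypothetical and chosen only for illustration. In a real experiment, the `run_episode` callback would roll out a trained PPO or SAC policy in the simulator and report whether the target was reached.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class EpisodeConfig:
    """Per-episode disturbance parameters (hypothetical ranges, for illustration)."""
    wind_speed: float    # mean wind speed, m/s
    mass_scale: float    # multiplier applied to the nominal drone mass
    target_speed: float  # speed of the moving target, m/s


def sample_config(rng: random.Random) -> EpisodeConfig:
    """Draw one random set of disturbances for an evaluation episode."""
    return EpisodeConfig(
        wind_speed=rng.uniform(0.0, 5.0),
        mass_scale=rng.uniform(0.8, 1.2),
        target_speed=rng.uniform(0.0, 1.0),
    )


def success_rate(
    run_episode: Callable[[EpisodeConfig], bool],
    n_episodes: int,
    seed: int = 0,
) -> float:
    """Fraction of episodes in which run_episode reports mission success."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    rng = random.Random(seed)  # fixed seed so both policies see the same episodes
    successes = sum(
        bool(run_episode(sample_config(rng))) for _ in range(n_episodes)
    )
    return successes / n_episodes


def toy_episode(cfg: EpisodeConfig) -> bool:
    """Stand-in for a policy rollout: 'succeeds' only in moderate wind."""
    return cfg.wind_speed < 4.0


if __name__ == "__main__":
    rate = success_rate(toy_episode, n_episodes=1000, seed=42)
    print(f"success rate: {rate:.1%}")
```

Reusing the same seed for both algorithms means each policy is evaluated on an identical sequence of wind, mass, and target conditions, so a difference in success rate reflects the policies rather than the sampled disturbances.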

Published

2026-03-26

How to Cite

Trembovetskyi, R., & Rozlomii, I. (2026). COMPARATIVE ANALYSIS OF POLICY-BASED AND VALUE-BASED RL METHODS FOR LOCAL MOTION PLANNING OF UAVS. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 4(32), 767–774. https://doi.org/10.28925/2663-4023.2026.32.1099