Mobile robots are increasingly deployed in applications ranging from household cleaning to hazardous industrial inspection, where autonomous navigation through complex, dynamic environments is essential. Traditional navigation methods such as the Dynamic Window Approach (DWA), Timed Elastic Band (TEB), and Model Predictive Control (MPC) perform reliably in static settings but falter when obstacles move unpredictably. Because these algorithms rely on pre-defined safety criteria, they often stagnate when confronted with fast-moving obstacles.

Deep reinforcement learning (DRL) has emerged as a promising alternative, mapping raw sensor inputs directly to control actions and enabling robots to learn complex behaviors. DRL offers robustness and generalization in varied environments, yet suffers from long training times and poor long-term memory. Without retaining early state information, DRL agents can fall into local minima during extended navigation tasks.
A novel framework addresses these limitations by integrating traditional global planning with DRL-based local navigation, connected via a dynamic sub-goal generator. The global planner, using algorithms such as A* or Dijkstra, charts an initial path through a static map built via Rapidly-exploring Random Trees (RRT). Along this path, the sub-goal generator identifies high-value landmarks in real time, considering both static and dynamic obstacles. The DRL local planner then navigates between these landmarks, reducing computational burden and training complexity.
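Global planners like A* search a discretized map for a shortest path; the framework's sub-goal generator then samples landmarks along that path. As an illustrative sketch (not the paper's implementation), here is a minimal A* over a 4-connected occupancy grid, with a Manhattan-distance heuristic; the grid encoding and cost model are assumptions:

```python
import heapq

def a_star(grid, start, goal):
    """A* over a 4-connected occupancy grid (0 = free, 1 = occupied).

    Returns the list of cells from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible on a 4-connected grid
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start)]   # entries are (f = g + h, g, cell)
    came_from = {}
    g_score = {start: 0}
    while open_set:
        _, g, current = heapq.heappop(open_set)
        if current == goal:
            path = [current]            # walk parent links back to start
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if g > g_score.get(current, float("inf")):
            continue                    # stale queue entry; a cheaper route was found
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1              # unit cost per move
                if ng < g_score.get((nr, nc), float("inf")):
                    g_score[(nr, nc)] = ng
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

In the framework described above, the resulting global path is only a coarse guide: the sub-goal generator picks intermediate landmarks along it, and the DRL local planner handles dynamic obstacles between landmarks.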
To overcome DRL’s memory deficit, the local planner employs a feature extraction network with a Long Short-Term Memory (LSTM) module. This architecture preserves long-term dependencies in sensor data and reward history, improving decision-making over extended distances. The reinforcement learning component uses an Asynchronous Advantage Actor-Critic (A3C) model, where multiple agents train in parallel on varied environments, sharing experiences to refine global policy and value networks.
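The key idea behind the LSTM module is that a cell state carries information across time steps, so observations from early in an episode can still influence later decisions. The following toy single-unit LSTM cell (pure Python, random placeholder weights; not the paper's network) shows the gating mechanics that preserve that long-term state:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ScalarLSTM:
    """Single-unit LSTM illustrating how the cell state c carries memory.

    Weights are random placeholders; a real local planner would learn them
    end-to-end together with the actor and critic heads.
    """
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # one (w_x, w_h, b) triple per gate: input, forget, output, candidate
        self.w = {g: (rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0)
                  for g in ("i", "f", "o", "g")}

    def step(self, x, h, c):
        def pre(gate):
            wx, wh, b = self.w[gate]
            return wx * x + wh * h + b
        i = sigmoid(pre("i"))      # input gate: how much new info to write
        f = sigmoid(pre("f"))      # forget gate: how much old memory to keep
        o = sigmoid(pre("o"))      # output gate: how much memory to expose
        g = math.tanh(pre("g"))    # candidate value from the current input
        c = f * c + i * g          # updated long-term cell state
        h = o * math.tanh(c)       # new hidden state (short-term output)
        return h, c

# Feed a sequence of compressed sensor features; h and c persist across steps,
# which is what lets the planner remember states from earlier in the episode.
lstm = ScalarLSTM()
h = c = 0.0
for x in [0.9, 0.1, -0.5, 0.3]:
    h, c = lstm.step(x, h, c)
```

In the A3C setup the source describes, each parallel worker would run such a recurrent feature extractor over its own environment rollout and push gradients to shared policy and value networks.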
The sub-goal generator processes raw LiDAR data to detect two frontier types: Type A, defined by maximum sensor range, and Type B, defined by obstacle boundaries. Geometric rules select sub-goal positions ensuring safe passage and efficient progress. Once a sub-goal is reached, the global path is updated, maintaining alignment with the final destination.
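The two frontier types can be sketched directly from a range scan: Type A beams read at maximum sensor range (open space), while Type B frontiers appear as sharp discontinuities between neighbouring beams (obstacle edges). The thresholds, sensor range, and safety margin below are assumptions for illustration, not values from the source:

```python
import math

MAX_RANGE = 3.5   # assumed LiDAR max range in metres
GAP = 0.5         # assumed range discontinuity marking an obstacle boundary

def find_frontiers(ranges):
    """Classify frontier beams in a circular scan.

    Type A: beams returning the maximum range (nothing hit -> open space).
    Type B: sharp jumps between adjacent beams (obstacle boundaries).
    Returns two lists of beam indices.
    """
    n = len(ranges)
    type_a = [i for i, r in enumerate(ranges) if r >= MAX_RANGE]
    type_b = [i for i in range(n)
              if abs(ranges[i] - ranges[(i + 1) % n]) > GAP]
    return type_a, type_b

def subgoal_from_beam(ranges, i, angle_inc, margin=0.3):
    """Place a candidate sub-goal along beam i, pulled back by a safety margin
    so the landmark itself leaves clearance for safe passage."""
    d = min(ranges[i], MAX_RANGE) - margin
    theta = i * angle_inc
    return (d * math.cos(theta), d * math.sin(theta))
```

A geometric selection rule, as described above, would then score these candidates by progress toward the global path and clearance from dynamic obstacles before committing to one.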
Local navigation is modeled as a partially observable Markov decision process, with the actor network outputting motion strategies and the critic network evaluating them. Training involves randomized environments with varying obstacle counts and speeds, using a reward system that incentivizes reaching sub-goals and penalizes collisions or unsafe proximity. Positive rewards are assigned for successful landmark arrivals, while negative rewards discourage stagnation and risky maneuvers.
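The reward structure described above can be sketched as a shaped function: a terminal bonus for reaching the sub-goal, a terminal penalty for collision, a proximity penalty near obstacles, and a small per-step cost that discourages stagnation. All constants here are hypothetical placeholders, not the paper's tuned values:

```python
import math

# Assumed reward constants; the source's actual values may differ.
R_GOAL = 20.0          # terminal bonus for arriving at the sub-goal
R_COLLISION = -20.0    # terminal penalty for hitting an obstacle
SAFE_DIST = 0.4        # metres; closer than this incurs a proximity penalty
GOAL_TOL = 0.2         # metres; counts as having reached the sub-goal
STEP_PENALTY = -0.05   # small per-step cost that discourages stagnation

def reward(pos, subgoal, min_obstacle_dist, collided):
    """Shaped reward for one step of the local navigation POMDP."""
    if collided:
        return R_COLLISION
    dist = math.hypot(pos[0] - subgoal[0], pos[1] - subgoal[1])
    if dist < GOAL_TOL:
        return R_GOAL
    r = STEP_PENALTY
    if min_obstacle_dist < SAFE_DIST:
        # penalty grows linearly as the robot nears an obstacle
        r += -2.0 * (SAFE_DIST - min_obstacle_dist)
    return r
```

During training, the critic learns to predict the discounted sum of these rewards, while the actor adjusts its motion strategy to maximize it across the randomized environments.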
Experiments in ROS 2D simulation with Turtlebot3 compared the proposed method against DWA, TEB, and a DRL-based end-to-end approach (CADRL). Four test environments ranged from static obstacles to 20 dynamic obstacles moving at up to 0.3 m/s. Metrics included navigation time, path length, collision count, and success rate.
Results showed that while TEB excelled in navigation efficiency with few slow obstacles, it slowed significantly as obstacle speed increased, due to reliance on global paths and conservative avoidance. CADRL achieved competitive speeds at low obstacle velocities but lagged in efficiency and safety under more challenging conditions. The proposed framework consistently outperformed others in scenarios with numerous fast-moving obstacles, achieving up to 26.6% higher success rates, 20% better navigation efficiency, and 34% fewer collisions than the second-best method.
Obstacle avoidance capability was particularly strong: at 0.3 m/s obstacle speed, collision counts for the new method were more than 50% lower than competitors. The inclusion of the LSTM memory module further improved efficiency, shortening paths and reducing unnecessary rotations when surrounded by multiple moving obstacles.
Training convergence was also faster. Reward curves indicated the proposed method stabilized around episode 470, compared to CADRL’s 600, demonstrating reduced training difficulty by decomposing navigation into global and local tasks.
By combining numerical optimization for global planning, dynamic landmark generation, and memory-enhanced DRL for local control, this framework delivers robust, efficient, and safe navigation in dynamic environments. Its modular design supports integration with existing robotic platforms and offers a pathway toward more adaptive, real-world-ready autonomous systems.
