Human-Like Navigation in Deep Reinforcement Learning Agents

Deep Reinforcement Learning (DRL) has achieved remarkable milestones, producing artificial agents capable of matching or surpassing human performance in diverse tasks, from mastering Atari games to controlling robotic manipulators. Yet, much of the literature focuses on aggregate performance metrics, often overlooking the fine-grained spatiotemporal dynamics of agent behavior. This omission leaves open questions about how closely DRL agents’ movement patterns align with those of humans, especially in navigation tasks involving obstacles.


A recent study addressed this gap by comparing human participants and DRL agents navigating simple virtual environments populated with obstacles. The goal was twofold: to assess similarities and differences in route selection, and to determine whether a task-dynamical model of human navigation, the Fajen and Warren Dynamical Perceptual-Motor Primitive (FW-DPMP) model, could capture both human and DRL trajectories. In this nonlinear model, the direction to the goal acts as an attractor of the agent's heading while each obstacle direction acts as a repulsor, with both influences decaying with distance.
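The attractor/repulsor scheme can be sketched as a second-order dynamical system on the heading angle: the goal direction pulls the heading in, each obstacle direction pushes it away, and a damping term smooths turning. The following is a minimal illustrative simulation, not the study's implementation; the parameter values and fall-off constants `c1`..`c4` are illustrative defaults, not the study's fitted values.

```python
import numpy as np

def wrap(a):
    """Wrap an angle to [-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def fw_dpmp_trajectory(start, goal, obstacles, dt=0.05, steps=500, speed=1.0,
                       beta=3.25, gamma=7.5, eps=198.0,
                       c1=0.4, c2=0.4, c3=6.6, c4=0.8):
    """Integrate a simplified FW-DPMP steering model in the plane.
    beta damps the turning rate, gamma scales goal attraction, eps scales
    obstacle repulsion; c1..c4 shape how the terms fall off with distance
    and angle (all values here are illustrative)."""
    pos = np.asarray(start, float)
    goal = np.asarray(goal, float)
    phi = np.arctan2(goal[1] - pos[1], goal[0] - pos[0])  # start heading at the goal
    dphi = 0.0
    path = [pos.copy()]
    for _ in range(steps):
        d_g = np.linalg.norm(goal - pos)
        if d_g < 0.2:  # close enough to the goal: stop
            break
        psi_g = np.arctan2(goal[1] - pos[1], goal[0] - pos[0])
        # attractor: turn the heading toward the goal direction
        acc = -beta * dphi - gamma * wrap(phi - psi_g) * (np.exp(-c1 * d_g) + c2)
        for ob in obstacles:
            ob = np.asarray(ob, float)
            d_o = np.linalg.norm(ob - pos)
            ang = wrap(phi - np.arctan2(ob[1] - pos[1], ob[0] - pos[0]))
            # repulsor: turn away from the obstacle, decaying with angle and distance
            acc += eps * ang * np.exp(-c3 * abs(ang)) * np.exp(-c4 * d_o)
        dphi += acc * dt
        phi += dphi * dt
        pos = pos + speed * dt * np.array([np.cos(phi), np.sin(phi)])
        path.append(pos.copy())
    return np.array(path)
```

With no obstacles the simulated walker heads straight to the goal; placing an obstacle near the straight line bends the path around it, which is the qualitative behavior the model is meant to capture.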

Human participants, recruited both online via Amazon Mechanical Turk and in person, navigated a 40×40 m virtual arena using keyboard or joystick controls. Targets were represented by red cylinders and obstacles by blue cylinders. DRL agents were trained in the same Unity-based environment using the ML-Agents framework with Proximal Policy Optimization (PPO). Two agent types were developed: one using raycast sensor data together with the target's position, and another using visual pixel input from a first-person view. Both were trained for five million steps.

The study examined 108 unique scenarios derived from combinations of obstacle layouts, start positions, and target locations. Analysis involved three measures: the number of preferred routes, confidence interval (CI) overlap between DRL and human trajectories, and average distance between trajectories. Preferred routes were defined as distinct paths taken by at least 10% of participants in a scenario. Humans exhibited more route diversity, with 41.67% of scenarios showing multiple preferred paths, compared to 27.78% for DRL agents.
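The preferred-route criterion is straightforward to operationalize once each trajectory in a scenario has been assigned a discrete route label (for example, which side of each obstacle it passes). The sketch below assumes such labels are already available; how trajectories are discretized into labels is a modeling choice, not something the criterion itself specifies.

```python
from collections import Counter

def preferred_routes(route_labels, threshold=0.10):
    """Return the distinct routes taken by at least `threshold` (10% by
    default) of the participants in one scenario.
    route_labels: one discrete route label per participant, e.g. 'left'
    or 'right' of an obstacle (label scheme is a hypothetical example)."""
    counts = Counter(route_labels)
    n = len(route_labels)
    return sorted(r for r, c in counts.items() if c / n >= threshold)

# Example: 12 of 100 participants take the minority route, so both count.
labels = ['left'] * 88 + ['right'] * 12
# preferred_routes(labels) -> ['left', 'right']
```

A scenario then "shows multiple preferred paths" exactly when this function returns more than one label, which is the quantity reported above (41.67% of scenarios for humans versus 27.78% for DRL agents).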

CI analysis revealed striking similarity: averaged across scenarios, 98.45% of the DRL agents' mean trajectory fell within the human 95% CI. Only a handful of scenarios showed notable deviations, typically where humans favored straighter paths while DRL agents curved around obstacles. Distance measures confirmed that far-side routes, those requiring traversal across the field, produced greater divergence between groups, reflecting increased opportunities for variation.
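One simple way to compute such an overlap score, assuming all trajectories have been resampled to a common parameterization, is to build a pointwise 95% confidence band from the human trajectories and count how many points of the DRL mean trajectory fall inside it. The band construction below is an assumption for illustration, not necessarily the study's exact method.

```python
import numpy as np

def ci_overlap(human_trajs, drl_mean, z=1.96):
    """Fraction of points on a DRL mean trajectory that fall inside the
    pointwise 95% CI of the human trajectories.
    human_trajs: array of shape (n_humans, n_points), e.g. lateral
    positions resampled to a common parameterization.
    drl_mean: array of shape (n_points,)."""
    mean = human_trajs.mean(axis=0)
    sem = human_trajs.std(axis=0, ddof=1) / np.sqrt(human_trajs.shape[0])
    lo, hi = mean - z * sem, mean + z * sem
    inside = (drl_mean >= lo) & (drl_mean <= hi)
    return inside.mean()
```

The returned fraction is 1.0 when the DRL mean trajectory never leaves the human band, matching the interpretation of the 98.45% figure above.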

Within-group variability was higher for humans across all route types. DRL agents demonstrated more consistent trajectories, especially in near-side routes, while humans showed their greatest consistency in middle routes. Both groups exhibited the most variation in far-side routes.
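A within-group variability score of the kind compared here can be sketched as the mean deviation of each trajectory from its group's mean trajectory; the study's exact statistic may differ, but any such spread measure supports the same comparison across route types.

```python
import numpy as np

def within_group_variability(trajs):
    """Mean Euclidean deviation of each trajectory from the group mean
    trajectory. trajs: array of shape (n_trajectories, n_points, 2),
    with all trajectories resampled to the same number of points."""
    mean_traj = trajs.mean(axis=0)                       # (n_points, 2)
    return float(np.mean(np.linalg.norm(trajs - mean_traj, axis=-1)))
```

Applied per group and per route type, lower scores for the DRL agents would reflect the tighter trajectory consistency described above.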

Fitting the FW-DPMP model to observed trajectories allowed comparison of three key parameters: β (damping of turning rate), γ (goal attraction), and ε (obstacle repulsion). DRL agents’ parameters were more tightly tuned, with less variability across scenarios. For both humans and DRL agents, damping was lowest in middle routes, where shorter distances necessitated sharper obstacle avoidance. Repulsion was lowest in far-side routes, reflecting smoother adjustments over longer travel distances. However, only DRL agents showed significant variation in goal attraction across route types, with highest attraction in middle routes and lowest in far-side routes.
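Per-scenario parameter fitting can be illustrated on a goal-only reduction of the model (a single heading angle, no obstacles): simulate the damped attractor dynamics, then recover β and γ by least squares against an observed heading series. Nelder-Mead via scipy is one simple optimizer choice here; the study's fitting procedure may differ. The example recovers parameters from synthetic data generated with known values.

```python
import numpy as np
from scipy.optimize import minimize

def simulate_heading(beta, gamma, phi0=1.0, dt=0.01, steps=300):
    """Heading angle relaxing toward a goal direction at 0 rad:
    a goal-only reduction of the FW-DPMP dynamics (no obstacles)."""
    phi, dphi = phi0, 0.0
    out = []
    for _ in range(steps):
        acc = -beta * dphi - gamma * phi   # damping + goal attraction
        dphi += acc * dt
        phi += dphi * dt
        out.append(phi)
    return np.array(out)

def fit_params(observed, x0=(2.0, 5.0)):
    """Recover (beta, gamma) by least squares against an observed series."""
    def loss(p):
        return np.mean((simulate_heading(*p) - observed) ** 2)
    return minimize(loss, x0, method="Nelder-Mead").x

# Synthetic "observed" data from known parameters, then refit.
observed = simulate_heading(3.25, 7.5)
beta_hat, gamma_hat = fit_params(observed)
```

The same least-squares idea extends to the full model: simulate candidate trajectories with trial (β, γ, ε) and minimize the distance to the observed path for each scenario.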

These findings suggest that DRL agents, when trained in simple navigation tasks with clear goals, can produce trajectories closely matching human patterns, likely because their learned policies approximate the low-dimensional dynamics captured by the FW-DPMP model. Yet, the reduced variability in DRL behavior underscores a limitation: over-tuning to task specifics can impair flexibility and human-like adaptability.

The study points to potential in hybrid approaches, combining DRL’s capacity for high-level decision-making with DPMP models’ transparent control of movement dynamics. Such integration could yield agents that not only perform optimally but also exhibit the variability and nuance characteristic of human navigation, enhancing compatibility in human–AI collaboration. For engineers and roboticists, these insights reinforce the importance of evaluating not just whether an agent reaches its goal, but how it moves through space to get there.
