Stepwise Explainability for DRL in Air Traffic Control

Deep Reinforcement Learning (DRL) has emerged as a powerful tool for maintaining safe separation among aircraft in structured, high-density airspace. DRL models such as Proximal Policy Optimization (PPO) and Deep Q-Networks have demonstrated strong performance in collision avoidance, but their opaque decision-making hinders trust in safety-critical applications. To address this challenge, a novel framework called the Stepwise Explainable Separation Assurance MEthod (SESAME) integrates online and offline explainability to make DRL-driven aircraft separation assurance more transparent.

The framework targets two distinct user groups: human operators, including pilots and air traffic controllers, and certification agencies such as the FAA. For real-time operational insight, SESAME employs a Soft Decision Tree (SDT) to distill DRL policies into interpretable decision paths, showing critical state features that influence each advisory. For post-event or certification analysis, SESAME uses the Linearly Estimated Gradient (LEG) saliency method to quantify the importance of input features—such as aircraft location and speed—on DRL outputs.

In the BlueSky air traffic simulator, SESAME was evaluated in complex scenarios with multiple intersections and stochastic aircraft entry. Each aircraft acts as an agent in a multi-agent reinforcement learning setup, coordinating with others to avoid conflicts. The action space is limited to acceleration, deceleration, or maintaining speed, with rewards penalizing collisions and unnecessary speed changes. State information includes ownship and intruder positions, speeds, and distances to intersections.
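The action space and reward structure described above can be illustrated with a minimal sketch. The names `Action` and `step_reward` and both penalty constants are hypothetical, chosen only for illustration; the article does not give the actual reward values, and this sketch penalizes every speed change rather than trying to judge which ones are unnecessary.

```python
from enum import Enum


class Action(Enum):
    """The three speed advisories available to each aircraft agent."""
    DECELERATE = -1
    HOLD = 0
    ACCELERATE = 1


# Hypothetical magnitudes; the real values are not stated in the article.
COLLISION_PENALTY = -1.0
SPEED_CHANGE_PENALTY = -0.01


def step_reward(action: Action, collided: bool) -> float:
    """Per-step reward: penalize collisions and any change of speed."""
    reward = 0.0
    if collided:
        reward += COLLISION_PENALTY
    if action is not Action.HOLD:
        reward += SPEED_CHANGE_PENALTY
    return reward


if __name__ == "__main__":
    print(step_reward(Action.HOLD, collided=False))
    print(step_reward(Action.ACCELERATE, collided=True))
```

Keeping the speed-change penalty much smaller than the collision penalty reflects the usual design intent that safety dominates efficiency, though the actual ratio used in SESAME's experiments is an open assumption here.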

The SDT module is trained on state-action pairs generated by the DRL model, using supervised learning to replicate policy behavior. Batch normalization is applied to improve interpretability by normalizing input feature scales. Online visualizations include tree plots, which display feature weights as heatmaps along the decision path, and trajectory plots, which highlight the most influential features with symbols and text annotations in the airspace context.
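To make the decision-path idea concrete, here is a minimal sketch of a forward pass through a depth-2 soft decision tree. It assumes the common formulation in which each inner node applies a sigmoid gate to a weighted sum of the input features and each leaf holds a distribution over actions. The function `sdt_forward`, the two-feature state, and all weights are hypothetical and untrained; SESAME's actual tree depth, features, and training procedure are not reproduced here.

```python
import math

ACTIONS = ("decelerate", "hold", "accelerate")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sdt_forward(x, inner_nodes, leaves):
    """Forward pass of a depth-2 soft decision tree.

    inner_nodes: three (weights, bias) pairs ordered root, left child, right child.
    leaves: four action-probability vectors, ordered left to right.
    Returns (path_probs, best_leaf, action_index).
    """
    def p_right(node):
        weights, bias = inner_nodes[node]
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

    r0, r1, r2 = p_right(0), p_right(1), p_right(2)
    # Probability of reaching each leaf is the product of gate outputs on its path.
    path_probs = [(1 - r0) * (1 - r1), (1 - r0) * r1, r0 * (1 - r2), r0 * r2]
    best_leaf = max(range(4), key=lambda i: path_probs[i])
    leaf_dist = leaves[best_leaf]
    action = max(range(len(leaf_dist)), key=lambda a: leaf_dist[a])
    return path_probs, best_leaf, action


# Hypothetical normalized state: [distance to goal, distance to nearest intruder]
state = [0.2, -1.0]
inner_nodes = [
    ([0.0, -4.0], 0.0),  # root gate attends to intruder distance
    ([1.0, 1.0], 0.0),   # left child
    ([3.0, 0.0], 0.0),   # right child gate attends to distance to goal
]
leaves = [
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.1, 0.8],
]

if __name__ == "__main__":
    probs, leaf, action = sdt_forward(state, inner_nodes, leaves)
    print(probs, leaf, ACTIONS[action])
```

Because every gate is a linear function of the inputs, the per-node weights can be read directly as feature importances, which is what the heatmaps in the tree plots display along the most probable path.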

Tree plots reveal how decision nodes prioritize different features. For example, in one case study, the root node focused on the ownship’s distance to its goal, while subsequent nodes considered distances to specific intruders and potential collision risks. Trajectory plots distill this into a concise visual, showing only the most critical factors—such as distances to goal and nearest intruder—paired with the chosen action.

For offline analysis, LEG recovers DRL gradients by perturbing selected features under constraints that preserve route integrity and intersection ordering. Saliency maps visualize the importance of each feature for possible actions, using color to indicate influence direction and magnitude. Position maps complement saliency maps by showing all aircraft in the sector, highlighting ownship, intruders, and others.
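The perturbation idea behind LEG can be sketched as follows, assuming isotropic Gaussian perturbations with variance sigma squared, for which the linear estimate has the closed form g = E[(f(x + z) - f(x)) z] / sigma^2. The function `leg_saliency` and the toy linear scorer `toy_accel_score` are hypothetical stand-ins for illustration only; the route-integrity and intersection-ordering constraints mentioned above are not implemented in this sketch.

```python
import random


def leg_saliency(f, x, feature_idx, sigma=0.1, n_samples=20000, seed=0):
    """Estimate a linear gradient of f at x over the selected features.

    Perturbs only the features listed in feature_idx with Gaussian noise
    and averages (change in output) * (perturbation), scaled by 1 / sigma^2.
    """
    rng = random.Random(seed)
    base = f(x)
    acc = [0.0] * len(feature_idx)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, sigma) for _ in feature_idx]
        perturbed = list(x)
        for j, idx in enumerate(feature_idx):
            perturbed[idx] += z[j]
        delta = f(perturbed) - base
        for j, zj in enumerate(z):
            acc[j] += delta * zj
    return [a / (n_samples * sigma ** 2) for a in acc]


def toy_accel_score(state):
    """Hypothetical stand-in for a DRL output (score for 'accelerate').

    state = [distance to goal, distance to trailing aircraft, ownship speed]
    """
    return 0.5 * state[0] - 1.0 * state[1] + 0.25 * state[2]


if __name__ == "__main__":
    saliency = leg_saliency(toy_accel_score, [1.0, 0.4, 0.8], feature_idx=[1, 2])
    print(saliency)  # close to the true partial derivatives, up to sampling noise
```

For this linear toy scorer the estimate approaches the true partial derivatives as the sample count grows. The negative sign on the trailing-aircraft distance means that a shrinking gap raises the acceleration score, which is the kind of directional influence the saliency maps encode with color.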

These tools allow certification analysts to identify decision patterns. One observed pattern involved three adjacent aircraft on the same route, with the ownship in the middle: decreasing distance to the trailing aircraft increased the likelihood of acceleration, while decreasing distance to the leading aircraft favored deceleration. Saliency maps confirmed that DRL models assign greater importance to intruders behind when considering acceleration, and to those ahead when considering deceleration.

Another pattern emerged at intersections where two aircraft from different routes were about to cross. Typically, the closer aircraft accelerated to pass first, while the other maintained speed or decelerated, reflecting cooperative behavior learned by the DRL model. Exceptions occurred in more complex traffic situations, where additional nearby aircraft influenced decisions to deviate from this pattern.

SESAME’s evaluation included fidelity scoring to measure how closely SDT outputs matched DRL decisions. Results showed SDTs outperforming traditional hard decision trees and random baselines, with batch normalization improving fidelity further. The framework generalized well across different DRL models, indicating robustness.
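One simple way to compute such a score, assuming fidelity is defined as the fraction of states on which the surrogate picks the same action as the DRL policy, is sketched below. The article does not spell out the exact metric, so this definition and the function name `fidelity` are assumptions.

```python
def fidelity(drl_actions, surrogate_actions):
    """Fraction of states where the surrogate's action matches the DRL policy's."""
    if len(drl_actions) != len(surrogate_actions):
        raise ValueError("action sequences must have the same length")
    if not drl_actions:
        raise ValueError("at least one state is required")
    matches = sum(a == b for a, b in zip(drl_actions, surrogate_actions))
    return matches / len(drl_actions)


if __name__ == "__main__":
    # Illustrative action indices only, not results from the evaluation.
    print(fidelity([0, 1, 2, 2], [0, 1, 1, 2]))  # 3 of 4 match -> 0.75
```

The same function can score a hard decision tree or a random baseline against the DRL policy, which makes the comparison described above straightforward to reproduce on any set of logged state-action pairs.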

By combining interpretable surrogate models for real-time use with gradient-based saliency analysis for deeper offline investigation, SESAME bridges the gap between DRL’s performance and the transparency required in aviation safety systems. Its dual-module approach not only clarifies individual decisions but also uncovers consistent behavioral patterns, contributing to more predictable and trustworthy autonomous separation assurance.
