Reinforcement Learning for Smarter Ocean Sampling

Autonomous underwater and surface vehicles have become indispensable tools for monitoring dynamic aquatic environments. These systems enable studies of ocean fronts, salinity profiles, harmful algal blooms, and other phenomena that evolve across vast spatial and temporal scales. Traditional sampling methods, such as manned vessels or fixed buoys, often yield sparse data, so predictive models are essential for guiding robotic assets to the most informative sampling locations.


A recent framework integrates reinforcement learning with classical partial differential equation (PDE) modeling to address limitations in conventional sampling strategies. By formulating ocean features such as temperature, turbidity, or chlorophyll-a concentration as spatio-temporal fields governed by PDEs, the method allows an agent to actively select informative sampling points. The approach reduces the risk of overfitting by balancing fitting error against cross-validation error, so that the model fitted to the collected samples not only matches the observed data but also generalizes well across the domain.

The agent operates in a modeled 2-D aquatic environment, navigating obstacle-free regions while leveraging ocean currents. Its motion is described by a nonlinear kinematic model, with control inputs for forward speed and angular velocity. The action space is discretized into combinations of speed and heading changes, enabling efficient policy learning. Observations come from onboard sensors such as IMUs and GPS, with Gaussian noise incorporated to reflect real-world uncertainties.
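As a rough illustration of the motion model described above, the following Python sketch advances a unicycle-style vehicle by one Euler step and adds the ambient current to its velocity. The speed and turn-rate values, the time step, the noise level, and the names `step` and `observe` are all assumptions made for this example; the article does not state the actual discretization or noise parameters.

```python
import math
import random
from itertools import product

# Hypothetical discretization: the article does not give the actual values.
SPEEDS = (0.5, 1.0)              # forward speed options, m/s
TURN_RATES = (-0.5, 0.0, 0.5)    # angular velocity options, rad/s
ACTIONS = list(product(SPEEDS, TURN_RATES))  # every (speed, turn rate) pair


def step(state, action, current=(0.0, 0.0), dt=1.0):
    """Advance a unicycle-style vehicle by one explicit Euler step.

    state   -- (x, y, heading) in metres and radians
    action  -- (forward speed, angular velocity)
    current -- ambient flow velocity (u, v), added to the vehicle's own motion
    """
    x, y, theta = state
    speed, omega = action
    u, v = current
    x_new = x + (speed * math.cos(theta) + u) * dt
    y_new = y + (speed * math.sin(theta) + v) * dt
    # Wrap the heading into [-pi, pi) so the state space stays bounded.
    theta_new = (theta + omega * dt + math.pi) % (2 * math.pi) - math.pi
    return (x_new, y_new, theta_new)


def observe(state, sigma=0.05, rng=random):
    """Return the state corrupted by zero-mean Gaussian sensor noise."""
    return tuple(value + rng.gauss(0.0, sigma) for value in state)
```

With these assumed values the agent chooses among six discrete actions at each step, and a favourable current simply adds to the distance covered.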

To solve the estimation problem, the framework employs the SARSA(λ) reinforcement learning algorithm with linear function approximation via stochastic semi-gradient descent. Tile coding is used to construct feature vectors, partitioning the continuous state space into overlapping grids for computationally efficient representation. Eligibility traces enhance learning efficiency by tracking recently active features and updating their associated weights based on the trace-decay parameter λ.
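The learning rule can be sketched in a few dozen lines. The Python code below is a minimal textbook-style version of semi-gradient SARSA(λ) with tile coding and accumulating eligibility traces, not the framework's actual implementation. The class names, grid sizes, and default hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


class TileCoder:
    """Map a point in a box to one active tile per offset grid (tiling)."""

    def __init__(self, low, high, tiles_per_dim=8, n_tilings=4):
        self.low = np.asarray(low, dtype=float)
        self.span = np.asarray(high, dtype=float) - self.low
        self.tiles = tiles_per_dim
        self.n_tilings = n_tilings
        # Each tiling gets one extra cell per dimension to absorb its offset.
        self.per_tiling = (tiles_per_dim + 1) ** len(self.low)
        self.size = n_tilings * self.per_tiling

    def active(self, state):
        """Return the index of the active binary feature in each tiling."""
        scaled = (np.asarray(state, dtype=float) - self.low) / self.span * self.tiles
        indices = []
        for k in range(self.n_tilings):
            # Tiling k is shifted by a fraction k / n_tilings of one tile width.
            cell = np.clip(np.floor(scaled + k / self.n_tilings).astype(int), 0, self.tiles)
            flat = 0
            for c in cell:
                flat = flat * (self.tiles + 1) + int(c)
            indices.append(k * self.per_tiling + flat)
        return np.array(indices)


class SarsaLambda:
    """Linear semi-gradient SARSA(lambda) with accumulating traces."""

    def __init__(self, coder, n_actions, alpha=0.1, gamma=1.0, lam=0.9,
                 epsilon=0.1, seed=0):
        self.coder, self.n_actions = coder, n_actions
        self.alpha = alpha / coder.n_tilings  # step size shared across tilings
        self.gamma, self.lam, self.epsilon = gamma, lam, epsilon
        self.w = np.zeros((n_actions, coder.size))  # weights
        self.z = np.zeros_like(self.w)              # eligibility traces
        self.rng = np.random.default_rng(seed)

    def q(self, state, action):
        """Action value: sum of the weights of the active features."""
        return self.w[action, self.coder.active(state)].sum()

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax([self.q(state, a) for a in range(self.n_actions)]))

    def start_episode(self):
        self.z[:] = 0.0

    def update(self, state, action, reward, next_state=None, next_action=None):
        """One learning step; leave next_state as None on the terminal step."""
        delta = reward - self.q(state, action)
        if next_state is not None:
            delta += self.gamma * self.q(next_state, next_action)
        self.z *= self.gamma * self.lam                   # decay all traces
        self.z[action, self.coder.active(state)] += 1.0   # mark active features
        self.w += self.alpha * delta * self.z
```

Dividing the step size by the number of tilings is a common convention with tile coding: each state activates one feature per tiling, so the effective step on the value estimate stays near `alpha`.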

The reward function is designed to encourage minimization of both fitting and cross-validation errors within a fixed number of sampling steps. Episodes terminate after a set number of observations, with rewards inversely proportional to the total error. This structure promotes targeted exploration while avoiding unproductive wandering.

Simulation experiments tested the framework under varying step sizes, trace-decay rates, and exploration probabilities (the ε of the ε-greedy policy). Results showed that smaller step sizes and larger λ values improved rewards and reduced estimation errors. Exploration rates had less pronounced effects, though higher ε encouraged broader sampling. Optimal agent paths tended to follow gradients of the ocean feature, crossing multiple level sets to capture diverse data.

In one scenario, the framework estimated parameters of a constant flow field with high accuracy, yielding an estimated vector b̂ close to the true b. The total error converged near twice the variance of the observation noise, consistent with theoretical expectations. A more complex test involved a double-gyre flow field, a common oceanic structure characterized by oscillating vortices. Here, the PDE model was extended to include diffusion and non-constant velocity fields, requiring the solution of an advection-diffusion equation with Neumann boundary conditions.
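For readers unfamiliar with the double gyre, the Python sketch below evaluates the velocity field in its commonly used analytic form on the domain [0, 2] × [0, 1], derived from the stream function ψ = A sin(π f(x, t)) sin(π y). This standard form and the default values of `amplitude`, `epsilon`, and `omega` are assumptions made for illustration. The article does not specify which parameterization or values the framework used.

```python
import math


def double_gyre_velocity(x, y, t, amplitude=0.1, epsilon=0.25,
                         omega=2 * math.pi / 10):
    """Velocity (u, v) of the standard time-periodic double gyre.

    Stream function: psi = A * sin(pi * f(x, t)) * sin(pi * y), where
    f(x, t) = a(t) * x**2 + b(t) * x,
    a(t) = epsilon * sin(omega * t) and b(t) = 1 - 2 * a(t).
    The velocity is u = -d(psi)/dy and v = d(psi)/dx.
    """
    a = epsilon * math.sin(omega * t)
    b = 1.0 - 2.0 * a
    f = a * x * x + b * x
    dfdx = 2.0 * a * x + b
    u = -math.pi * amplitude * math.sin(math.pi * f) * math.cos(math.pi * y)
    v = math.pi * amplitude * math.cos(math.pi * f) * math.sin(math.pi * y) * dfdx
    return u, v
```

In this form `epsilon` sets how far the line separating the two gyres oscillates left and right, and the normal velocity vanishes on all four walls, so nothing is advected out of the domain.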

Simulating the double-gyre system demanded significant computational resources due to repeated PDE solves within optimization loops. Using Bayesian optimization and a high-performance computing server, the agent learned gyre parameters with minimal error over just ten episodes. Path analysis revealed that while the agent generally tracked the contaminant effectively, care was needed near the gyre separation line to avoid misalignment with the feature.

The framework's novelty lies in merging classical PDE-based estimation with sample selection driven by reinforcement learning. This hybrid approach enables adaptive, data-efficient mapping of dynamic ocean features without relying on predefined sampling grids. Potential extensions include scaling to three-dimensional environments, deploying cooperative multi-agent systems, and applying the method to real-world datasets such as those from the Regional Ocean Modeling System. Tracking more complex features, such as Lagrangian coherent structures, could further broaden its applications in ocean exploration and environmental monitoring.
