Generalizing Safety Beyond Collision-Avoidance via Latent-Space Reachability Analysis

Abstract

Hamilton-Jacobi (HJ) reachability is a rigorous mathematical framework that enables robots to simultaneously detect unsafe states and generate actions that prevent future failures. While in theory HJ reachability can synthesize safe controllers for nonlinear systems and nonconvex constraints, in practice it has been limited to hand-engineered collision-avoidance constraints modeled via low-dimensional state-space representations and first-principles dynamics. In this work, our goal is to generalize safe robot controllers to prevent failures that are hard, if not impossible, to write down by hand, but can be intuitively identified from high-dimensional observations: for example, spilling the contents of a bag. We propose Latent Safety Filters, a latent-space generalization of HJ reachability that tractably operates directly on raw observation data (e.g., RGB images) by performing safety analysis in the latent embedding space of a generative world model. This transforms nuanced constraint specification into a classification problem in latent space and enables reasoning about dynamical consequences that are hard to simulate. In simulation and hardware experiments, we use Latent Safety Filters to safeguard arbitrary policies (from generative policies to direct teleoperation) from complex safety hazards, like preventing a Franka Research 3 manipulator from spilling the contents of a bag or toppling cluttered objects.

Method

Our Latent Safety Filter can detect, predict, and mitigate failures that are hard to model (e.g., spilling the contents of a bag), such as those encountered in vision-based manipulation. Our idea is to perform approximate reachability analysis in the latent space of a world model (light grey region). The latent failure set is shown as a black region, with an example of an imagined failure observation shown in the upper right. Our method identifies latent states from which the robot is doomed to enter visually observable failures no matter what actions it takes (larger red set shown above), and automatically overrides the base policy with safety-preserving actions from our safety policy to prevent spilling the contents of the bag.
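The override step described above can be sketched as a least-restrictive filter. This is a minimal illustration, not the authors' implementation: the names `filter_action`, `safety_q`, and `safe_policy`, the sign convention (positive value means safe), and the zero threshold are all assumptions made for the example, and the toy 1-D functions stand in for a learned latent-space critic and safety policy.

```python
from typing import Any, Callable, Tuple


def filter_action(
    z: Any,
    base_action: Any,
    safety_q: Callable[[Any, Any], float],
    safe_policy: Callable[[Any], Any],
    threshold: float = 0.0,
) -> Tuple[Any, bool]:
    """Return (action, overridden).

    The base action is kept whenever the (hypothetical) safety critic
    scores it above the threshold; otherwise the safety policy takes over.
    """
    if safety_q(z, base_action) > threshold:
        return base_action, False
    return safe_policy(z), True


# Toy stand-ins, purely illustrative: a scalar "latent" z moving toward
# a hazard, with a hand-written critic instead of a learned one.
def toy_safety_q(z: float, a: float) -> float:
    return 0.8 - (z + 0.1 * a)  # positive while the next state stays clear


def toy_safe_policy(z: float) -> float:
    return -1.0  # always back away from the hazard


far = filter_action(0.0, 1.0, toy_safety_q, toy_safe_policy)
near = filter_action(0.75, 1.0, toy_safety_q, toy_safe_policy)
print(far, near)
```

Because the filter only inspects the proposed action, the same function can wrap a learned policy or a human teleoperator's command.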

Case Study: Dubins Car

We start by studying a canonical safe-control benchmark: a vehicle that must avoid colliding with a static obstacle. Because this is a standard, low-dimensional benchmark, we can rigorously compare the quality of the safety filter to an exact grid-based solution and a privileged-state RL-based safety filter.
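For readers unfamiliar with the benchmark, the standard Dubins car model (constant forward speed, controlled turn rate) and a signed-distance failure margin can be sketched as follows. The forward-Euler discretization and every numeric value (speed, time step, obstacle position and radius) are assumptions for illustration, not the settings used in the experiments.

```python
import math
from typing import Tuple

State = Tuple[float, float, float]  # (x, y, heading)


def dubins_step(state: State, turn_rate: float,
                speed: float = 1.0, dt: float = 0.05) -> State:
    """One forward-Euler step of the Dubins car: the vehicle moves at a
    fixed speed along its heading and steers by changing the heading."""
    x, y, theta = state
    return (
        x + dt * speed * math.cos(theta),
        y + dt * speed * math.sin(theta),
        theta + dt * turn_rate,
    )


def failure_margin(state: State,
                   obstacle: Tuple[float, float] = (0.0, 0.0),
                   radius: float = 0.5) -> float:
    """Signed distance to a circular obstacle: negative inside the
    obstacle (failure), positive outside."""
    x, y, _ = state
    return math.hypot(x - obstacle[0], y - obstacle[1]) - radius


next_state = dubins_step((0.0, 0.0, 0.0), turn_rate=0.0)
print(next_state, failure_margin((1.0, 0.0, 0.0)))
```

In the latent-space setting, the hand-written `failure_margin` is what gets replaced by a learned classifier over latent states.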

We train an RSSM using 2000 offline trajectories consisting of randomly sampled actions and conduct reachability analysis in the learned latent space of the world model. Our latent safety filter is comparable to the privileged-state RL baseline (PrivilegedSafe) and achieves an F1 score of 0.982.
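As a reminder of what the F1 score measures here, the sketch below compares predicted safe/unsafe labels against ground-truth labels (for instance, from a grid-based solution). Which class counts as "positive" and how states are sampled are not stated in this section, so the labels below are hypothetical placeholders.

```python
from typing import Sequence


def f1_score(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: True marks a state classified as unsafe.
predicted = [True, True, False, False]
ground_truth = [True, False, True, False]
score = f1_score(predicted, ground_truth)
print(score)
```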

In the main manuscript, we also investigate how biased world-model training data affects the downstream safety filter.

Simulation Experiments

We study the ability of our latent safety filter to prevent a 7-DoF Franka Research 3 manipulator from knocking over the red blocks while allowing it to pick up the green blocks. We apply our latent Hamilton-Jacobi reachability analysis in the latent space of a recurrent state-space model (RSSM). We show successful filtered rollouts below.
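To make the reachability computation concrete, the sketch below runs a time-discounted safety Bellman backup, V(s) = (1 - γ) l(s) + γ min(l(s), max_a V(f(s, a))), which is a common form in learning-based reachability. Whether this exact form is used here is an assumption, as is everything in the toy problem: a cart on a line with discrete position and velocity approaching a wall, where the margin l is negative only at the wall. A tabular model replaces the RSSM and the learned value function.

```python
import itertools

WALL = 6             # position of the wall (the failure set)
VELS = (0, 1, 2)     # allowed forward velocities
ACTIONS = (-1, 0, 1) # brake, coast, accelerate
GAMMA = 0.95         # discount factor of the backup


def margin(state):
    """Failure margin l(s): negative only when the cart is at the wall."""
    pos, _ = state
    return (WALL - pos) - 0.5


def step(state, action):
    """Update the velocity first, then move; the wall is absorbing."""
    pos, vel = state
    new_vel = min(max(vel + action, 0), 2)
    return (min(pos + new_vel, WALL), new_vel)


def safety_value_iteration(n_iters=300):
    states = list(itertools.product(range(WALL + 1), VELS))
    V = {s: margin(s) for s in states}
    for _ in range(n_iters):
        V = {
            s: (1 - GAMMA) * margin(s)
            + GAMMA * min(margin(s), max(V[step(s, a)] for a in ACTIONS))
            for s in states
        }
    return V


V = safety_value_iteration()
unsafe = sorted(s for s, v in V.items() if v < 0)
print(unsafe)
```

In this toy problem the state (position 5, velocity 2) has not yet failed, but even full braking carries it into the wall, so its value is negative. That is the tabular analogue of a latent state from which failure is unavoidable.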

We compare our method (LatentSafe) to an unfiltered task policy trained with DreamerV3 using negative rewards for constraint violations (Dreamer), and a constrained MDP-based method (SQRL). Our latent Hamilton-Jacobi safety filter exhibits the safest behavior while still allowing the robot to achieve its task.

Hardware Experiments

We test our latent safety filter on hardware in a contact-rich manipulation task: picking up an opened bag of Skittles. Our safety specification is that no Skittles may fall out of the bag.

Direct Teleoperation (no filter)

This task is challenging due to the partial observability and uncertain dynamics of the overall environment. We conduct reachability analysis in the latent space of DINO-WM, a ViT-based world model that uses the patch tokens of DINOv2 as a latent state.

Teleoperation with Latent Safety Filter

Notably, our safety filter is entirely agnostic to the base policy used to perform the task. This means we can even shield a human teleoperating the robot.

We also stress-test our latent safety filter on out-of-distribution bags. We find that it generalizes zero-shot to Skittles bags of different colors and to background changes.

We hypothesize that this generalization capability is due to the strong pretrained visual representation in DINOv2.

Failure Modes

Our latent safety filter is not without limitations. Despite being grounded in rigorous theory, our method does not ensure safety in all cases. This is a natural consequence of using learned world models and learning-based approximations of the Hamilton-Jacobi value function in our experiments. Quantifying the inaccuracies and understanding how these latent safety filters fail is an exciting direction for future work.
In simulation, we find that our value function can occasionally allow the robot to take an unsafe action. We attribute this to the world model incorrectly predicting the consequences of a given action. We show a failure mode below.

Our hardware experiments also exhibit non-zero failure rates. We believe a major reason is the partial observability of the Skittles bag pick-up task: from the perspective of the robot, it is unclear where the Skittles are located inside the bag. As a result, the filter occasionally allows 1-2 Skittles to fall out of the bag. We show a video of this failure mode below.
We also demonstrate a more severe failure mode that arises when the opened end of the bag is unobservable from the wrist or front-facing camera.

BibTeX

@article{nakamura2025generalizing,
          title={Generalizing Safety Beyond Collision-Avoidance via Latent-Space Reachability Analysis},
          author={Nakamura, Kensuke and Peters, Lasse and Bajcsy, Andrea},
          year={2025}
        }