We present SonoGym, a scalable simulation platform for robotic ultrasound that enables parallel simulation across tens to hundreds of environments. Our framework supports realistic, real-time simulation of ultrasound data from CT-derived 3D anatomy models through both a physics-based and a Generative Adversarial Network (GAN) approach. It enables the training of deep reinforcement learning (DRL) agents and recent imitation learning (IL) agents (vision transformers and diffusion policies) for ultrasound-guided navigation, anatomy reconstruction, and surgery. We believe our simulation can facilitate research in robot learning for such challenging robotic surgery applications. Future research directions include improving the quality and diversity of the ultrasound simulation, modeling soft-tissue deformation, scaling to larger patient populations, improving generalization across patients, and validation with real systems in clinical settings.
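Conceptually, training in SonoGym amounts to stepping a batch of environments in lockstep. The sketch below illustrates this pattern with a toy gymnasium-style batched interface; the class, observation shapes, and 6-DoF action layout are illustrative assumptions, not SonoGym's actual API.

```python
import numpy as np

class BatchedUltrasoundEnv:
    """Toy stand-in for N parallel ultrasound environments (placeholder,
    not SonoGym's real interface)."""

    def __init__(self, num_envs: int, img_shape=(64, 64)):
        self.num_envs = num_envs
        self.img_shape = img_shape

    def reset(self):
        # One simulated B-mode image per environment (here: random noise).
        return np.random.rand(self.num_envs, *self.img_shape).astype(np.float32)

    def step(self, actions: np.ndarray):
        assert actions.shape[0] == self.num_envs
        obs = np.random.rand(self.num_envs, *self.img_shape).astype(np.float32)
        rewards = np.zeros(self.num_envs, dtype=np.float32)
        dones = np.zeros(self.num_envs, dtype=bool)
        return obs, rewards, dones, {}

env = BatchedUltrasoundEnv(num_envs=128)  # tens to hundreds of environments
obs = env.reset()
for _ in range(10):
    # Hypothetical 6-DoF probe-motion deltas for every environment at once.
    actions = np.random.uniform(-1.0, 1.0, size=(env.num_envs, 6))
    obs, rewards, dones, info = env.step(actions)
```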
We demonstrate high-performing Proximal Policy Optimization (PPO), Action Chunking Transformer (ACT), and Diffusion Policy agents for the navigation task. The bottom of each video shows the learning-based ultrasound simulation from the first 8 environments. The goal plane for navigation is the transverse plane through the center of the L4 lumbar vertebra.
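As a rough illustration of how such a plane-reaching objective could be scored, the sketch below computes a dense reward from the probe's perpendicular distance to the goal plane and its angular misalignment with it. This is one plausible formulation, not necessarily the reward used in our framework; the function name and weights are hypothetical.

```python
import numpy as np

def plane_alignment_reward(probe_pos, probe_normal, plane_point, plane_normal,
                           w_dist=1.0, w_angle=0.5):
    """Toy dense reward for steering the probe toward a goal imaging plane.

    probe_pos:    (3,) probe position
    probe_normal: (3,) unit normal of the current imaging plane
    plane_point:  (3,) a point on the goal plane (e.g. the vertebra center)
    plane_normal: (3,) unit normal of the goal plane
    """
    # Perpendicular distance from the probe to the goal plane.
    dist = abs(np.dot(probe_pos - plane_point, plane_normal))
    # Angular misalignment between the current and goal imaging planes.
    cos_angle = np.clip(abs(np.dot(probe_normal, plane_normal)), 0.0, 1.0)
    angle = np.arccos(cos_angle)
    return -(w_dist * dist + w_angle * angle)

# Example call with made-up geometry (units: meters, radians).
r = plane_alignment_reward(
    probe_pos=np.array([0.01, 0.00, 0.12]),
    probe_normal=np.array([0.0, 0.0, 1.0]),
    plane_point=np.array([0.0, 0.0, 0.10]),
    plane_normal=np.array([0.0, 0.0, 1.0]),
)
```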
We demonstrate high-performing submodular Proximal Policy Optimization agents for the anatomy reconstruction task, alongside the reconstruction from a heuristic trajectory for comparison. The top-right corner shows the agent's real-time observation: the current surface reconstruction transformed into the ultrasound probe frame. The red vertebra model is shown only for visualization and is not part of the observation. The bottom-right corner shows the reconstruction status, with covered and uncovered surface points colored yellow and blue, respectively.
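The marginal-coverage structure behind such a submodular objective can be sketched as follows: each step is rewarded by the number of previously uncovered surface points that the new sweep covers, so every point pays off at most once (coverage is monotone submodular). The function, coverage radius, and point-cloud representation below are illustrative assumptions, not our exact implementation.

```python
import numpy as np

def coverage_gain_reward(surface_pts, covered_mask, scan_pts, radius=0.003):
    """Marginal-coverage reward: count previously uncovered surface points
    that fall within `radius` of any point imaged in the new sweep.

    surface_pts:  (N, 3) target surface point cloud
    covered_mask: (N,) bool, points covered so far
    scan_pts:     (M, 3) surface points seen in the current step
    """
    newly_covered = np.zeros(len(surface_pts), dtype=bool)
    for p in scan_pts:
        d = np.linalg.norm(surface_pts - p, axis=1)
        newly_covered |= d < radius
    gained = newly_covered & ~covered_mask      # marginal gain of this step
    covered_mask |= newly_covered               # update coverage in place
    return float(gained.sum()), covered_mask

# Toy usage: a random surface, then one sweep over part of it.
surface = np.random.rand(500, 3) * 0.05        # toy vertebra surface (m)
covered = np.zeros(500, dtype=bool)
sweep = surface[:20] + 1e-4                    # points imaged this step
reward, covered = coverage_gain_reward(surface, covered, sweep)
```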
We demonstrate high-performing safe Proximal Policy Optimization and Action Chunking Transformer agents for the ultrasound-guided surgery task. The right half of each video shows the trajectories (green) from 50 environments toward the target L4 vertebra (blue); the goal frame, i.e., the end point of the trajectory, is annotated in red. With PPO + safety filter, actions predicted to be unsafe are stopped before the instrument enters the target vertebra.
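A minimal sketch of such an action-level safety filter follows, assuming a point-cloud vertebra model and a simple one-step kinematic prediction; both are illustrative placeholders, not our actual filter.

```python
import numpy as np

def safety_filter(tool_tip, action, vertebra_pts, min_dist=0.002, dt=0.05):
    """Toy safety filter: predict the next tool-tip position under `action`
    and stop the motion if it would come closer than `min_dist` to the
    vertebra surface (represented here as a point cloud).

    tool_tip:     (3,) current tool-tip position
    action:       (6,) velocity command; first three entries are translation
    vertebra_pts: (K, 3) vertebra surface points
    """
    next_tip = tool_tip + dt * action[:3]       # one-step prediction (translation only)
    clearance = np.linalg.norm(vertebra_pts - next_tip, axis=1).min()
    if clearance < min_dist:
        return np.zeros_like(action)            # unsafe: stop the motion
    return action

# Toy usage with made-up geometry (units: meters).
filtered = safety_filter(
    tool_tip=np.array([0.0, 0.0, 0.05]),
    action=np.array([0.0, 0.0, -1.0, 0.0, 0.0, 0.0]),
    vertebra_pts=np.random.rand(1000, 3) * 0.04,
)
```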