World Models for General Surgical Grasping

The Chinese University of Hong Kong¹, Lipscomb University²
2024

Abstract

Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while remaining robust to system disturbances. Previous methods do not meet these requirements because they rely mainly on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework, "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor policy for surgical robots, improving both generality and robustness. In particular, we propose a novel method that estimates the values and uncertainties of depth pixels in the inaccurate regions of a rigid-link object based on an empirical prior of the object's size; depth and mask images of the task objects are then encoded into a single compact 3-channel image (size: 64x64x3) by dynamically zooming in on the mask regions, minimizing information loss. The effectiveness of the learned controller is extensively evaluated in simulation and on a real robot. Our learned visuomotor policy handles: i) unseen objects, including five types of target grasping objects and a robot gripper, in unstructured real-world surgical environments, and ii) disturbances in perception and control. To our knowledge, ours is the first work to achieve a unified surgical control system that grasps diverse surgical objects using different robot grippers on real robots in complex surgical scenes (average success rate: 69%). Our system also demonstrates significant robustness under six conditions: background variation, target disturbance, camera pose variation, kinematic control error, image noise, and re-grasping after the grasped target object drops from the gripper.
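The compact observation encoding described above can be illustrated with a minimal sketch: the depth and object masks are cropped to the mask region ("dynamic zoom-in") and packed into a single 64x64x3 image. This is not the authors' code; the channel layout (depth, gripper mask, target mask) and the padding margin are assumptions made for illustration.

```python
# Minimal sketch (not the authors' implementation) of packing depth + masks
# into one compact 64x64x3 observation by zooming in on the mask region.
# Channel layout and pad_ratio are illustrative assumptions.
import numpy as np
import cv2


def encode_observation(depth, gripper_mask, target_mask,
                       out_size=64, pad_ratio=0.15):
    """Return a (out_size, out_size, 3) uint8 image centered on the task objects."""
    union = (gripper_mask > 0) | (target_mask > 0)
    ys, xs = np.nonzero(union)
    if len(xs) == 0:                      # nothing segmented: return an empty frame
        return np.zeros((out_size, out_size, 3), dtype=np.uint8)

    # Bounding box of the task objects, enlarged by a small margin.
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    pad = int(pad_ratio * max(x1 - x0, y1 - y0))
    h, w = depth.shape
    x0, x1 = max(0, x0 - pad), min(w, x1 + pad + 1)
    y0, y1 = max(0, y0 - pad), min(h, y1 + pad + 1)

    def crop_resize(img):
        # Crop to the enlarged box, then resize to the network input size.
        return cv2.resize(img[y0:y1, x0:x1].astype(np.float32),
                          (out_size, out_size),
                          interpolation=cv2.INTER_NEAREST)

    # Normalize depth within the crop to [0, 255].
    d = crop_resize(depth)
    d = 255.0 * (d - d.min()) / max(float(d.max() - d.min()), 1e-6)

    obs = np.stack([d,
                    255.0 * crop_resize(gripper_mask > 0),
                    255.0 * crop_resize(target_mask > 0)], axis=-1)
    return obs.astype(np.uint8)
```

Cropping to the mask bounding box before downsampling keeps the task objects at a usable resolution in the 64x64 input, which is the stated motivation for the dynamic zoom-in step.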

Grasping in Liver Phantom

Generality Study

1. Grasp Diverse Surgical Objects

2. Grasp a Needle with an Unseen Gripper

Robustness Study

1. Background Variation

2. Shaking Camera

3. Camera Occlusion and Pose Variation

4. Light Variation and Lens Fogging

5. Target Pose Disturbance

6. Regrasping

Video Object Segmentation

BibTeX

@article{lin2024world,
  title={World Models for General Surgical Grasping},
  author={Lin, Hongbin and Li, Bin and Wong, Chun Wai and Rojas, Juan and Chu, Xiangyu and Au, Kwok Wai Samuel},
  journal={Robotics: Science and Systems},
  year={2024}
}