VMSIS: A Pre-trained Vision Transformer with Mamba Decoder for Surgical Instrument Segmentation

  • Yuechen Tao*
  • Xiaobo Zhu
  • Shiwei Wu
  • He Sun
  • Jiangang Liu
  • Yu An
  • Jie Tian
  • Zhenyu Liu
  • *Corresponding author for this work
  • CAS - Institute of Automation
  • University of Chinese Academy of Sciences
  • Nanjing University of Science and Technology
  • Beihang University
  • National Key Laboratory of Kidney Diseases

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Accurate surgical instrument segmentation plays a vital role in robot-assisted surgery. We present VMSIS, a hybrid architecture that combines the visual representation capabilities of self-supervised DINOv2 with the efficient sequence modeling of Mamba for surgical instrument segmentation. Our approach trains a DINOv2 backbone on over 900,000 frames of RGB surgical video and introduces a Mamba-based decoder that, with the backbone frozen, captures temporal dependencies in surgical video sequences. By processing 10 consecutive frames, our model achieves accurate instrument segmentation while maintaining temporal consistency. Experiments on four reorganized public datasets demonstrate the effectiveness of our approach, which achieves competitive results with fewer trainable parameters than traditional methods.

Original language: English
Title of host publication: 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331586188
State: Published - 2025
Event: 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2025 - Copenhagen, Denmark
Duration: 14 Jul 2025 to 18 Jul 2025

Publication series

Name: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
ISSN (Print): 1557-170X

Conference

Conference: 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2025
Country/Territory: Denmark
City: Copenhagen
Period: 14/07/25 to 18/07/25
