DefMamba: Deformable Visual State Space Model

  • Leiye Liu
  • Miao Zhang*
  • Jihao Yin
  • Tingwei Liu
  • Wei Ji
  • Yongri Piao*
  • Huchuan Lu
  • *Corresponding author for this work
  • Dalian University of Technology
  • Yale University

Research output: Contribution to journal › Conference article › peer-review

Abstract

Recently, state space models (SSMs), particularly Mamba, have attracted significant attention from scholars due to their ability to effectively balance computational efficiency and performance. However, most existing visual Mamba methods flatten images into 1D sequences using predefined scan orders, which leaves the model less capable of utilizing the spatial structural information of the image during feature extraction. To address this issue, we propose a novel visual foundation model called DefMamba. This model includes a multi-scale backbone structure and deformable Mamba (DM) blocks, which dynamically adjust the scanning path to prioritize important information, thus enhancing the capture and processing of relevant input features. By incorporating a deformable scanning (DS) strategy, the model significantly improves its ability to learn image structures and detect changes in object details. Extensive experiments show that DefMamba achieves state-of-the-art performance in various visual tasks, including image classification, object detection, instance segmentation, and semantic segmentation. The code is open source on DefMamba.

Original language: English
Pages (from-to): 8838-8847
Number of pages: 10
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOI
Publication status: Published - 2025
Published externally
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: 11 Jun 2025 to 15 Jun 2025
