TY - JOUR
T1 - Patch2Space
T2 - a registration-free segmentation method for misaligned multimodal medical images
AU - Tang, Zhenyu
AU - Li, Shuaishuai
AU - Ding, Chaowei
AU - Wang, Jinda
AU - Pan, Junjun
AU - Zang, Jie
N1 - Publisher Copyright: © 2026 Institute of Physics and Engineering in Medicine. All rights, including for text and data mining, AI training, and similar technologies, are reserved.
PY - 2026/2/28
Y1 - 2026/2/28
N2 - Objective. Multimodal images contain complementary information that is valuable for deep learning (DL)-based image segmentation. To enable effective multimodal feature learning and fusion for accurate segmentation, multimodal images usually need to be registered to achieve anatomical alignment. However, in clinical settings, multimodal image registration is often challenging. For instance, to reduce radiation exposure, CT scans usually have a smaller field of view than MR scans, so CT and MR images contain inconsistent anatomical content, which hinders accurate registration. Using such misaligned multimodal images can significantly degrade segmentation performance. This study aims to develop a DL-based multimodal image segmentation method that is capable of learning high-quality and strongly related image features from misaligned multimodal images without registration and of producing accurate segmentation results comparable to those obtained with well-aligned multimodal images. Approach. In our method, a unified body space (UBS) module is presented, where image patches cropped from misaligned modalities are encoded to positions and projected into a UBS, thereby largely mitigating the misalignment among multimodal images. Built upon the UBS module, a new spatial attention mechanism is proposed and integrated into a multilevel feature fusion (MFF) module, where features learned from misaligned multimodal images are effectively fused at the internal, spatial, and modal levels, yielding highly accurate segmentation of misaligned multimodal images. Main results. We validate our method on both public and in-house multimodal image datasets containing 1472 patients. Experimental results demonstrate that our method outperforms state-of-the-art methods. The ablation study further confirms that the UBS module can accurately project image patches from different modalities into the UBS. Moreover, the internal-, spatial-, and modal-level feature fusion in the MFF module substantially enhances segmentation accuracy for misaligned multimodal images. Significance. Our method presents a new registration-free multimodal segmentation framework that explicitly models the correspondence between image patches and anatomical positions, enabling effective fusion of misaligned modalities and improved segmentation performance in realistic clinical scenarios. The code for our method is available at https://github.com/BH-MICom/Patch2Space.
KW - image segmentation
KW - misaligned modalities
KW - multilevel feature fusion
KW - spatial coding
KW - unified body space
UR - https://www.scopus.com/pages/publications/105030585715
U2 - 10.1088/1361-6560/ae4286
DO - 10.1088/1361-6560/ae4286
M3 - Article
C2 - 41643314
AN - SCOPUS:105030585715
SN - 0031-9155
VL - 71
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 4
M1 - 045008
ER -