Abstract
Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. However, existing methods struggle with segmentation accuracy, interpretability, and generalization. This letter proposes CLIP-TNseg, a novel framework that integrates a multimodal large model with a neural network architecture to address these challenges. We innovatively divide visual features into coarse-grained and fine-grained components, leveraging textual integration with coarse-grained features for enhanced semantic understanding. Specifically, the Coarse-grained Branch extracts high-level semantic features from a frozen CLIP model, while the Fine-grained Branch refines spatial details using U-Net-style residual blocks. Extensive experiments on the newly collected PKTN dataset and other public datasets demonstrate the competitive performance of CLIP-TNseg. Additional ablation experiments confirm the critical contribution of textual inputs, particularly highlighting the effectiveness of our carefully designed textual prompts compared to fixed or absent textual information.
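The two-branch design described above (frozen CLIP semantics conditioned on text, fused with a U-Net-style spatial branch) can be sketched at the tensor level. The sketch below is an illustrative assumption, not the paper's implementation: the fusion operator (elementwise text conditioning, nearest-neighbour upsampling, channel concatenation) and all shapes are placeholders for the letter's actual modules.

```python
import numpy as np

def fuse_coarse_fine(coarse, text, fine):
    """Toy fusion of coarse semantic features with fine spatial features.

    coarse: (C, h, w) low-resolution semantic map (stand-in for frozen CLIP features)
    text:   (C,) text embedding (stand-in for the prompt encoding)
    fine:   (F, H, W) high-resolution spatial map (stand-in for the U-Net-style branch)
    Returns a (H, W) per-pixel probability map.
    """
    # Condition coarse features on the text embedding (simple elementwise product here;
    # the actual conditioning in the paper may differ).
    cond = coarse * text[:, None, None]          # (C, h, w)
    # Nearest-neighbour upsample the coarse map to the fine resolution.
    scale = fine.shape[1] // cond.shape[1]
    up = cond.repeat(scale, axis=1).repeat(scale, axis=2)  # (C, H, W)
    # Concatenate along channels and collapse to a single-channel logit map.
    fused = np.concatenate([up, fine], axis=0)   # (C + F, H, W)
    logits = fused.mean(axis=0)                  # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid -> pixelwise probability

# Toy shapes: 8-channel coarse features at 8x8, text embedding of dim 8,
# 4-channel fine features at 32x32.
rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 8, 8))
text = rng.standard_normal(8)
fine = rng.standard_normal((4, 32, 32))
mask = fuse_coarse_fine(coarse, text, fine)
print(mask.shape)  # (32, 32)
```

The design choice this illustrates is the one the abstract emphasises: text only touches the coarse-grained semantic path, while the fine-grained path contributes spatial detail unchanged until the final fusion.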
| Original language | English |
|---|---|
| Pages (from-to) | 1625-1629 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 32 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Thyroid nodule segmentation
- deep learning
- multimodal models
- ultrasound images
Title
CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images