Abstract
Objective: Accurate identification of intracranial hemorrhage (ICH) subtypes on non-contrast CT is crucial for prognosis and treatment but remains challenging due to low contrast and blurred boundaries. This study evaluates the zero-shot performance of multi-modal large language models (MLLMs) versus traditional deep learning in ICH detection and subtyping. Methods: Using 192 NCCT volumes from the RSNA dataset, we compared MLLMs (GPT-4o, Gemini 2.0 Flash, Claude 3.5 Sonnet V2) with deep learning models (ResNet50, Vision Transformer). MLLMs were prompted for ICH presence, subtype, localization, and volume estimation. Results: Traditional deep learning models outperformed MLLMs in both ICH detection and subtyping. For subtyping, MLLMs showed lower accuracy, with Gemini 2.0 Flash achieving a macro-averaged precision of 0.41 and F1 score of 0.31. Conclusion: While MLLMs offer enhanced interpretability through language-based interaction, their accuracy in ICH subtyping remains inferior to deep learning networks. Further optimization is needed to improve their utility in three-dimensional medical imaging.
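The macro-averaged precision and F1 score quoted above give every hemorrhage subtype equal weight, regardless of how many cases it has. The sketch below shows one way such scores could be computed. It is not the authors' evaluation code: the function name `macro_precision_f1` and the toy labels are assumptions made for illustration, and classes with no predictions are scored as 0 by convention.

```python
from collections import Counter


def macro_precision_f1(y_true, y_pred, labels):
    """Return (macro_precision, macro_f1) over the given class labels.

    Hypothetical helper for illustration only. Each class contributes
    equally to the average, so rare subtypes count as much as common ones.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, f1_scores = [], []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(f1_scores) / n


# Toy example with made-up labels (not data from the study).
labels = ["subdural", "epidural"]
y_true = ["subdural", "subdural", "epidural", "epidural"]
y_pred = ["subdural", "epidural", "epidural", "epidural"]
macro_p, macro_f1 = macro_precision_f1(y_true, y_pred, labels)
```

Because the average is unweighted, a model that misses a rare subtype entirely is penalized as heavily as one that misses a common subtype, which is one reason macro-averaged scores can be much lower than overall accuracy.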
| Original language | English |
|---|---|
| Pages (from-to) | 323-330 |
| Number of pages | 8 |
| Journal | Brain Hemorrhages |
| Volume | 6 |
| Issue number | 6 |
| State | Published - Dec 2025 |
Keywords
- Intracranial hemorrhage subtyping
- Medical image classification
- Multi-modal large language models
- Validation
Title
Zero-shot multi-modal large language models v.s. supervised deep learning: A comparative analysis on CT-based intracranial hemorrhage subtyping