TY - GEN
T1 - Zero-Shot Nuclei Detection via Visual-Language Pre-trained Models
AU - Wu, Yongjian
AU - Zhou, Yang
AU - Saiyin, Jiya
AU - Wei, Bingzheng
AU - Lai, Maode
AU - Shou, Jianzhong
AU - Fan, Yubo
AU - Xu, Yan
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
PY - 2023
Y1 - 2023
N2 - Large-scale visual-language pre-trained models (VLPM) have proven their excellent performance in downstream object detection for natural scenes. However, zero-shot nuclei detection on H&E images via VLPMs remains underexplored. The large gap between medical images and the web-originated text-image pairs used for pre-training makes it a challenging task. In this paper, we attempt to explore the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP) model, for zero-shot nuclei detection. Concretely, an automatic prompts design pipeline is devised based on the association binding trait of VLPM and the image-to-text VLPM BLIP, avoiding empirical manual prompts engineering. We further establish a self-training framework, using the automatically designed prompts to generate the preliminary results as pseudo labels from GLIP and refine the predicted boxes in an iterative manner. Our method achieves a remarkable performance for label-free nuclei detection, surpassing other comparison methods. Foremost, our work demonstrates that the VLPM pre-trained on natural image-text pairs exhibits astonishing potential for downstream tasks in the medical field as well. Code will be released at github.com/VLPMNuD.
KW - Nuclei Detection
KW - Prompt Designing
KW - Unsupervised Learning
KW - Visual-Language Pre-trained Models
KW - Zero-shot Learning
UR - https://www.scopus.com/pages/publications/85174687674
U2 - 10.1007/978-3-031-43987-2_67
DO - 10.1007/978-3-031-43987-2_67
M3 - Conference contribution
AN - SCOPUS:85174687674
SN - 9783031439865
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 693
EP - 703
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 - 26th International Conference, Proceedings
A2 - Greenspan, Hayit
A2 - Madabhushi, Anant
A2 - Mousavi, Parvin
A2 - Salcudean, Septimiu
A2 - Duncan, James
A2 - Syeda-Mahmood, Tanveer
A2 - Taylor, Russell
PB - Springer Science and Business Media Deutschland GmbH
T2 - 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023
Y2 - 8 October 2023 through 12 October 2023
ER -