TY - GEN
T1 - OEMLLM
T2 - 37th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2025
AU - Sun, Junfeng
AU - Gu, Yunchao
AU - Wang, Xinliang
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - While Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance on general-domain tasks, their application in specialized medical fields, such as the assisted diagnosis of fundus diseases, remains limited by a lack of domain-specific knowledge. To bridge this gap, we introduce FUNDUS-BENCH, a multi-task benchmark dataset tailored to fundus images. Based on FUNDUS-BENCH, we design Ophthalmology Expert MLLM (OEMLLM), a multimodal medical auxiliary diagnosis system that uses a hierarchical feature extraction method based on the Vision Transformer to exploit both low-level lesion features and high-level semantic features of fundus images. OEMLLM further integrates a Large Language Model (LLM) to perform multi-task learning for comprehensive fundus disease diagnosis. Extensive experiments show that OEMLLM outperforms state-of-the-art MLLMs of comparable parameter scale (approximately 2B parameters) and remains competitive with larger-scale models. The dataset and code associated with this system will be open-sourced shortly to facilitate research and development of practical AI-assisted diagnostic tools for medical applications.
KW - Fundus Disease Assisted Diagnosis
KW - Multi-Task Learning
KW - Multimodal Large Language Model
UR - https://www.scopus.com/pages/publications/105031910648
U2 - 10.1109/ICTAI66417.2025.00179
DO - 10.1109/ICTAI66417.2025.00179
M3 - Conference contribution
AN - SCOPUS:105031910648
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 1229
EP - 1236
BT - Proceedings - 2025 IEEE 37th International Conference on Tools with Artificial Intelligence, ICTAI 2025
PB - IEEE Computer Society
Y2 - 3 November 2025 through 5 November 2025
ER -