CMedBench: A Comprehensive Benchmark for Efficient Medical Large Language Models

  • Beihang University
  • Western University
  • SenseTime Group Limited
  • Zhejiang University Ningbo Institute of Technology

Research output: Contribution to journal > Conference article > peer-review

Abstract

Large Language Models (LLMs) hold significant potential for enhancing healthcare applications, yet their deployment is hindered by high computational and memory demands. Model compression techniques offer solutions to reduce these demands, but their impact on medical LLMs remains underexplored. In this paper, we introduce CMedBench, the first comprehensive benchmark for evaluating compressed LLMs in medical contexts. CMedBench assesses five core dimensions: Medical Knowledge Ability, Medical Application Ability, Trustworthiness Maintenance, Compression Cross Combination, and Computational Efficiency. Through extensive empirical studies, we analyze the trade-offs between model efficiency and clinical performance across diverse models, datasets, and compression strategies. Our findings highlight critical limitations in current evaluation practices and provide a robust framework for aligning compression strategies with medical requirements. CMedBench serves as a vital resource for researchers and practitioners, guiding the development of efficient, trustworthy, and clinically effective LLMs for healthcare applications.

Original language: English
Pages (from-to): 21198-21206
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 40
Issue number: 25
State: Published - 2026
Event: 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
