
MBQ: Modality-Balanced Quantization for Large Vision-Language Models

  • Shiyao Li
  • Yingchun Hu
  • Xuefei Ning*
  • Xihui Liu
  • Ke Hong
  • Xiaotao Jia*
  • Xiuhong Li
  • Yaqi Yan
  • Pei Ran
  • Guohao Dai
  • Shengen Yan
  • Huazhong Yang
  • Yu Wang*
  • *Corresponding author for this work

Research output: Contribution to journal › Conference article › Peer-review

Abstract

Vision-Language Models (VLMs) have enabled a variety of real-world applications. The large parameter size of VLMs brings large memory and computation overhead, which poses significant challenges for deployment. Post-Training Quantization (PTQ) is an effective technique for reducing this memory and computation overhead. Existing PTQ methods mainly focus on large language models (LLMs) and do not consider the differences between modalities. In this paper, we discover that language and vision tokens differ significantly in sensitivity in large VLMs. Therefore, treating tokens from different modalities equally, as existing PTQ methods do, may over-emphasize the insensitive modalities and lead to significant accuracy loss. To address this issue, we propose a simple yet effective method, Modality-Balanced Quantization (MBQ), for large VLMs. Specifically, MBQ incorporates the different sensitivities across modalities into the calibration process, which minimizes the reconstruction loss to find better quantization parameters. Extensive experiments show that, compared to SOTA baselines, MBQ improves task accuracy by up to 4.4% and 11.6% under W3A16 and W4A8 quantization, respectively, for 7B to 70B VLMs. Additionally, we implement a W3A16 GPU kernel that fuses the dequantization and GEMV operators, achieving a 1.4× speedup on LLaVA-onevision-7B on the RTX 4090. The code is available at https://github.com/thu-nics/MBQ.
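To make the idea of a modality-balanced calibration objective concrete, here is a minimal sketch for a single linear layer under weight-only fake quantization (in the spirit of W3A16). It is not the authors' implementation (see the repository above). Assumptions: symmetric per-row quantization, a clipping-ratio grid search standing in for "the quantization parameters", and scalar modality weights `w_vision` / `w_text` supplied by the caller rather than derived as in the paper. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def quantize_row(row: Sequence[float], bits: int, clip: float) -> Vector:
    """Symmetric uniform fake-quantization (quantize, then dequantize) of one weight row."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 3 for 3-bit, giving integer codes in [-4, 3]
    max_abs = max(abs(w) for w in row) * clip
    if max_abs == 0.0:
        return [0.0 for _ in row]
    scale = max_abs / qmax
    # Clamp each integer code to [-qmax - 1, qmax], then map it back to floating point.
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in row]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Plain matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def recon_error(W: Matrix, Wq: Matrix, tokens: Sequence[Sequence[float]]) -> float:
    """Squared output error ||W x - Wq x||^2 averaged over the given token activations."""
    if not tokens:
        return 0.0
    total = 0.0
    for x in tokens:
        y, yq = matvec(W, x), matvec(Wq, x)
        total += sum((a - b) ** 2 for a, b in zip(y, yq))
    return total / len(tokens)


def modality_balanced_loss(W, Wq, vision_tokens, text_tokens, w_vision, w_text) -> float:
    """Hypothetical per-modality weighted reconstruction loss (weights given by the caller)."""
    return (w_vision * recon_error(W, Wq, vision_tokens)
            + w_text * recon_error(W, Wq, text_tokens))


def search_clip(W, vision_tokens, text_tokens, w_vision, w_text, bits=3,
                grid=(1.0, 0.95, 0.9, 0.85, 0.8, 0.75)) -> Tuple[float, float]:
    """Grid-search the clipping ratio that minimizes the modality-weighted loss."""
    best_clip, best_loss = grid[0], float("inf")
    for clip in grid:
        Wq = [quantize_row(row, bits, clip) for row in W]
        loss = modality_balanced_loss(W, Wq, vision_tokens, text_tokens, w_vision, w_text)
        if loss < best_loss:
            best_clip, best_loss = clip, loss
    return best_clip, best_loss


# Toy usage: down-weight the (assumed less sensitive) vision tokens.
W = [[0.9, -0.2, 0.05, 0.4], [-0.7, 0.3, 0.1, -0.05]]
vision_tokens = [[1.0, 0.1, 0.0, 0.2], [0.8, -0.1, 0.1, 0.0]]
text_tokens = [[0.0, 2.0, 1.5, -1.0]]
clip, loss = search_clip(W, vision_tokens, text_tokens, w_vision=0.2, w_text=1.0)
print(clip, loss)
```

Setting `w_vision = w_text` recovers the modality-agnostic objective that the abstract attributes to existing PTQ methods; unequal weights let the less sensitive modality contribute less to the parameter choice. The clipping-ratio search is used only because it is the simplest calibrated parameter to illustrate; this abstract does not describe MBQ's actual search space.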

Original language: English
Pages (from-to): 4167-4177
Number of pages: 11
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOI
Publication status: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: 11 Jun 2025 → 15 Jun 2025
