
Amove: Accelerating LLMs through Mitigating Outliers and Salient Points via Fine-Grained Grouped Vectorized Data Type

  • Xilong Xie
  • Liang Wang*
  • Limin Xiao*
  • Meng Han
  • Lei Liu
  • Xiangrong Xu
  • Jinquan Wang
  • Zhen Song
  • Xiaojian Liao
  • * Corresponding author for this work
  • Beihang University
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

The quantization of Large Language Models (LLMs) poses significant challenges due to the heterogeneous nature of feature point distributions in low-bit quantization scenarios, including salient points, normal outliers, and massive outliers. These challenges are particularly pronounced in supporting both weight-only and weight-activation quantization modes, as existing methods often focus on a single mode and fail to address the diverse feature characteristics holistically, resulting in suboptimal model accuracy and hardware efficiency trade-offs. To tackle these limitations, we introduce Amove, a novel co-design framework that synergistically integrates data type and hardware architecture design for efficient LLM quantization. Our approach is threefold: First, we conduct a comprehensive analysis of quantization granularity and propose a residual approximation mechanism that balances model accuracy and memory overhead under fine-grained quantization. Second, we design a flexible fine-grained grouped vectorized data type, enabling seamless support for both weight-activation and low-bit weight-only quantization modes within a unified framework. Third, we implement the hardware architecture of Amove on both GPU tensor core and systolic array-based architectures. The Amove-enhanced tensor core achieves an average speedup of 2.13× and a 1.70× reduction in energy consumption over the state-of-the-art OliVe design. Furthermore, an Amove-based accelerator achieves up to 2.67× speedup and 1.68× energy reduction over the state-of-the-art accelerator.
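The abstract's point about quantization granularity can be illustrated with a generic group-wise quantizer. The sketch below is not Amove's fine-grained grouped vectorized data type or its residual approximation mechanism, neither of which the abstract specifies. It assumes plain symmetric signed 4-bit quantization with one scale per group; the function names `fake_quantize` and `count_flushed` and the sample values are hypothetical. It shows how a single massive outlier forces a coarse scale on every value that shares its group, so smaller groups confine the damage.

```python
def fake_quantize(values, group_size, bits=4):
    """Quantize, then dequantize, `values` using one symmetric scale per group.

    Generic group-wise integer quantization for illustration only; this is
    NOT the Amove data type, whose encoding is not given in the abstract.
    """
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("len(values) must be a multiple of a positive group_size")
    qmax = 2 ** (bits - 1) - 1  # largest positive code, e.g. 7 for signed 4-bit
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # The group's largest magnitude sets its scale; fall back to 1.0
        # for an all-zero group to avoid dividing by zero.
        scale = max(abs(v) for v in group) / qmax or 1.0
        for v in group:
            code = max(-qmax - 1, min(qmax, round(v / scale)))  # clamp to the integer range
            out.append(code * scale)
    return out


def count_flushed(original, dequantized):
    """Count nonzero inputs that the quantizer rounded to exactly zero."""
    return sum(1 for o, d in zip(original, dequantized) if o != 0 and d == 0)


# Hypothetical data: 15 small values plus one massive outlier at index 8.
values = [0.1, -0.2, 0.3, 0.15, 0.05, -0.1, 0.25, 0.2,
          100.0, 0.1, -0.3, 0.2, 0.12, -0.05, 0.18, 0.22]

coarse = fake_quantize(values, group_size=16)  # one scale shared by all 16 values
fine = fake_quantize(values, group_size=4)     # one scale per 4 values
print(count_flushed(values, coarse))  # 15: every small value collapses to 0
print(count_flushed(values, fine))    # 3: only the outlier's own group is affected
```

With a single scale, the outlier sets the scale to 100/7 ≈ 14.3, which is more than twice every small value, so all of them round to zero. With groups of four, only the three values sharing the outlier's group lose their information. Finer groups cost one extra stored scale per group, which is the kind of accuracy/memory trade-off that the abstract's granularity analysis addresses.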

Original language: English
Host publication title: MICRO 2025 - 58th IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Pages: 854-868
Number of pages: 15
ISBN (Electronic): 9798400715730
DOI
Publication status: Published - 17 Oct 2025
Event: 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025 - Seoul, South Korea
Duration: 18 Oct 2025 → 22 Oct 2025

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Part of 213862
ISSN (Print): 1072-4451

Conference

Conference: 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025
Country/Territory: South Korea
City: Seoul
Period: 18/10/25 → 22/10/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 7 - Affordable and Clean Energy
