Abstract
The quantization of Large Language Models (LLMs) poses significant challenges due to the heterogeneous nature of feature point distributions in low-bit quantization scenarios, including salient points, normal outliers, and massive outliers. These challenges are particularly pronounced in supporting both weight-only and weight-activation quantization modes, as existing methods often focus on a single mode and fail to address the diverse feature characteristics holistically, resulting in suboptimal model accuracy and hardware efficiency trade-offs. To tackle these limitations, we introduce Amove, a novel co-design framework that synergistically integrates data type and hardware architecture design for efficient LLM quantization. Our approach is threefold: First, we conduct a comprehensive analysis of quantization granularity and propose a residual approximation mechanism that balances model accuracy and memory overhead under fine-grained quantization. Second, we design a flexible fine-grained grouped vectorized data type, enabling seamless support for both weight-activation and low-bit weight-only quantization modes within a unified framework. Third, we implement the hardware architecture of Amove on both GPU tensor core and systolic array-based architectures. The Amove-enhanced tensor core achieves an average speedup of 2.13× and a 1.70× reduction in energy consumption over the state-of-the-art OliVe design. Furthermore, an Amove-based accelerator achieves up to 2.67× speedup and 1.68× energy reduction over the state-of-the-art accelerator.
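The granularity point in the abstract can be illustrated with a toy example. The sketch below is not Amove's data type or its residual approximation mechanism. It is plain symmetric round-to-nearest quantization, and the function names, the 4-bit width, the group size, and the sample values are all assumptions chosen for illustration. It only shows why a single outlier hurts less when each small group has its own scale.

```python
def quantize_groups(values, bits=4, group_size=4):
    """Symmetric round-to-nearest quantization with one scale per group.

    Returns the dequantized values so the error can be inspected directly.
    Illustrative only; not the scheme proposed in the paper.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    dequantized = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for v in group:
            q = max(-qmax, min(qmax, round(v / scale)))  # clamp to the integer range
            dequantized.append(q * scale)
    return dequantized


def mean_abs_error(original, approx):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


# Hypothetical vector: seven ordinary values and one large outlier.
weights = [0.11, -0.08, 0.05, 0.13, -0.12, 0.07, -0.03, 8.0]

# One scale for the whole vector (coarse) vs. one scale per 4 values (fine).
coarse = quantize_groups(weights, bits=4, group_size=len(weights))
fine = quantize_groups(weights, bits=4, group_size=4)

coarse_err = mean_abs_error(weights, coarse)
fine_err = mean_abs_error(weights, fine)
print(f"coarse error: {coarse_err:.4f}, fine error: {fine_err:.4f}")
```

In this toy case the outlier still flattens the other values in its own group to zero, but the first group is no longer affected by it. Finer groups also mean more scales to store, which is the accuracy versus memory overhead trade-off the abstract refers to.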
| Original language | English |
|---|---|
| Title of host publication | MICRO 2025 - 58th IEEE/ACM International Symposium on Microarchitecture |
| Publisher | IEEE Computer Society |
| Pages | 854-868 |
| Number of pages | 15 |
| ISBN (electronic) | 9798400715730 |
| DOI | |
| Publication status | Published - 17 Oct 2025 |
| Event | 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025 - Seoul, South Korea. Duration: 18 Oct 2025 → 22 Oct 2025 |
Publication series
| Name | Proceedings of the Annual International Symposium on Microarchitecture, MICRO |
|---|---|
| Volume | Part of 213862 |
| ISSN (print) | 1072-4451 |
Conference
| Conference | 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025 |
|---|---|
| Country/Territory | South Korea |
| City | Seoul |
| Period | 18/10/25 → 22/10/25 |
UN Sustainable Development Goals
This output contributes to the following Sustainable Development Goals:
- SDG 7: Affordable and Clean Energy
Fingerprint
Research topics of 'Amove: Accelerating LLMs through Mitigating Outliers and Salient Points via Fine-Grained Grouped Vectorized Data Type'. Together they form a unique fingerprint.