Abstract
The quantization of Large Language Models (LLMs) poses significant challenges in low-bit scenarios because feature points are heterogeneously distributed: they include salient points, normal outliers, and massive outliers. These challenges are particularly pronounced when supporting both weight-only and weight-activation quantization modes, as existing methods often focus on a single mode and fail to address the diverse feature characteristics holistically, resulting in suboptimal trade-offs between model accuracy and hardware efficiency. To tackle these limitations, we introduce Amove, a novel co-design framework that synergistically integrates data type and hardware architecture design for efficient LLM quantization. Our approach is threefold. First, we conduct a comprehensive analysis of quantization granularity and propose a residual approximation mechanism that balances model accuracy and memory overhead under fine-grained quantization. Second, we design a flexible fine-grained grouped vectorized data type, enabling seamless support for both weight-activation and low-bit weight-only quantization modes within a unified framework. Third, we implement the hardware architecture of Amove on both GPU tensor core and systolic array-based architectures. The Amove-enhanced tensor core achieves an average speedup of 2.13× and a 1.70× reduction in energy consumption over the state-of-the-art OliVe design. Furthermore, an Amove-based accelerator achieves up to 2.67× speedup and 1.68× energy reduction over the state-of-the-art accelerator.
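The effect of quantization granularity mentioned in the abstract can be illustrated with a minimal sketch. This is not Amove's data type or its residual approximation mechanism, neither of which is specified here; it is plain symmetric 4-bit uniform quantization, applied once with a single scale for the whole tensor and once with one scale per group. The function names, the group size of 32, and the synthetic outlier value are all assumptions chosen for illustration.

```python
import random


def quantize_dequantize(values, bits=4):
    """Symmetric uniform quantization with ONE shared scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp to the signed range, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def grouped_quantize_dequantize(values, group_size, bits=4):
    """Fine-grained variant: each group of `group_size` values gets its own scale."""
    out = []
    for start in range(0, len(values), group_size):
        out.extend(quantize_dequantize(values[start:start + group_size], bits))
    return out


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic data (assumption): roughly Gaussian values plus one massive outlier.
random.seed(0)
tensor = [random.gauss(0.0, 1.0) for _ in range(256)]
tensor[17] = 60.0

per_tensor_mse = mean_squared_error(tensor, quantize_dequantize(tensor))
per_group_mse = mean_squared_error(tensor, grouped_quantize_dequantize(tensor, 32))
print("per-tensor MSE:", per_tensor_mse)
print("per-group  MSE:", per_group_mse)
```

With a single scale, the outlier stretches the quantization step so far that most ordinary values collapse to zero; with per-group scales, only the group containing the outlier is affected. The cost is metadata: under the illustrative assumption of one 16-bit scale per 32 values, grouping adds 0.5 bits per value, which is the kind of accuracy versus memory-overhead trade-off the abstract refers to.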
| Original language | English |
|---|---|
| Title of host publication | MICRO 2025 - 58th IEEE/ACM International Symposium on Microarchitecture |
| Publisher | IEEE Computer Society |
| Pages | 854-868 |
| Number of pages | 15 |
| ISBN (Electronic) | 9798400715730 |
| State | Published - 17 Oct 2025 |
| Event | 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025 - Seoul, Korea, Republic of; Duration: 18 Oct 2025 → 22 Oct 2025 |
Publication series
| Name | Proceedings of the Annual International Symposium on Microarchitecture, MICRO |
|---|---|
| Volume | Part of 213862 |
| ISSN (Print) | 1072-4451 |
Conference
| Conference | 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Seoul |
| Period | 18/10/25 → 22/10/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Fine-Grained Grouped Vectorized Data Type
- Large Language Models
- Quantization
Fingerprint
Dive into the research topics of 'Amove: Accelerating LLMs through Mitigating Outliers and Salient Points via Fine-Grained Grouped Vectorized Data Type'. Together they form a unique fingerprint.