Amove: Accelerating LLMs through Mitigating Outliers and Salient Points via Fine-Grained Grouped Vectorized Data Type

  • Xilong Xie
  • Liang Wang*
  • Limin Xiao*
  • Meng Han
  • Lei Liu
  • Xiangrong Xu
  • Jinquan Wang
  • Zhen Song
  • Xiaojian Liao
  • *Corresponding author for this work
  • Beihang University
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The quantization of Large Language Models (LLMs) poses significant challenges due to the heterogeneous nature of feature point distributions in low-bit quantization scenarios, including salient points, normal outliers, and massive outliers. These challenges are particularly pronounced in supporting both weight-only and weight-activation quantization modes, as existing methods often focus on a single mode and fail to address the diverse feature characteristics holistically, resulting in suboptimal model accuracy and hardware efficiency trade-offs. To tackle these limitations, we introduce Amove, a novel co-design framework that synergistically integrates data type and hardware architecture design for efficient LLM quantization. Our approach is threefold: First, we conduct a comprehensive analysis of quantization granularity and propose a residual approximation mechanism that balances model accuracy and memory overhead under fine-grained quantization. Second, we design a flexible fine-grained grouped vectorized data type, enabling seamless support for both weight-activation and low-bit weight-only quantization modes within a unified framework. Third, we implement the hardware architecture of Amove on both GPU tensor core and systolic array-based architectures. The Amove-enhanced tensor core achieves an average speedup of 2.13× and a 1.70× reduction in energy consumption over the state-of-the-art OliVe design. Furthermore, an Amove-based accelerator achieves up to 2.67× speedup and 1.68× energy reduction over the state-of-the-art accelerator.
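To make the granularity trade-off mentioned in the abstract concrete, the following is a minimal sketch of generic group-wise symmetric quantization. It is not the Amove data type or its residual approximation mechanism, whose details the abstract does not give. The function names, the 4-bit width, the group size of 32, and the synthetic weights with one injected outlier are all assumptions chosen for illustration.

```python
import random


def quantize_groups(values, group_size, bits=4):
    """Symmetric round-to-nearest quantization with one scale per group.

    Hypothetical helper for illustration only; returns the dequantized
    values so the reconstruction error can be measured directly.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    dequantized = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # The largest magnitude in the group sets the scale, so a single
        # outlier stretches the quantization step for every value in its group.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for v in group:
            q = max(-qmax, min(qmax, round(v / scale)))
            dequantized.append(q * scale)
    return dequantized


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic data (assumed): Gaussian weights plus one large outlier.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(256)]
weights[10] = 40.0

# Coarse: one scale for the whole tensor. Fine: one scale per 32 values.
coarse = quantize_groups(weights, group_size=len(weights))
fine = quantize_groups(weights, group_size=32)

mse_coarse = mean_squared_error(weights, coarse)
mse_fine = mean_squared_error(weights, fine)
print("per-tensor MSE:", mse_coarse)
print("per-group MSE: ", mse_fine)
```

With a single scale, the outlier forces a step size so large that most ordinary values round to zero. With per-group scales, only the group containing the outlier is affected, at the cost of storing one scale per group. That extra storage is the memory overhead that fine-grained quantization has to balance against accuracy.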

Original language: English
Title of host publication: MICRO 2025 - 58th IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
Pages: 854-868
Number of pages: 15
ISBN (Electronic): 9798400715730
State: Published - 17 Oct 2025
Event: 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025 - Seoul, Korea, Republic of
Duration: 18 Oct 2025 to 22 Oct 2025

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: Part of 213862
ISSN (Print): 1072-4451

Conference

Conference: 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025
Country/Territory: Korea, Republic of
City: Seoul
Period: 18/10/25 to 22/10/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Fine-Grained Grouped Vectorized Data Type
  • Large Language Models
  • Quantization
