HRAMTran: A Hybrid-RAM Transformer Accelerator With Dynamic Sparsity Floating-Point CIM and Written-Back Transpose Array

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Transformer models perform outstandingly across a wide range of artificial intelligence tasks. In this work, we propose a hybrid-RAM Transformer accelerator (HRAMTran) that uses computing-in-memory (CIM) based on spin-orbit torque magnetic random access memory (SOT-MRAM) and static RAM (SRAM). It supports dynamic sparsity in floating-point (FP) matrix multiplication (MM) and written-back transpose, thereby realizing an efficient attention mechanism. First, a dynamic sparsity-based MM scheme is proposed, which dynamically ignores low-impact elements during vector multiplication, thereby effectively reducing the latency and energy consumption of MM. Second, a data-reuse multiply-and-accumulate (MAC) scheme for the mantissa is designed to further optimize MM; it shares partial operation results to reduce redundant computation. SOT-MRAM-based and SRAM-based CIM architectures with the dynamic sparsity and data-reuse schemes are constructed to perform weight MM (for Query (Q), Key (K), and Value (V)) and dynamic MM, respectively. This hybrid-RAM CIM method optimizes energy and latency during attention computation. Moreover, a written-back transpose SRAM array that can write multiple bits into a column simultaneously is designed to significantly reduce the write-back cycles for K^T. Finally, the HRAMTran accelerator is built to evaluate the performance of a Transformer implementation by performing machine translation on the WMT14 dataset. Results show that this accelerator achieves 3.6 μJ/Token and 68.77 TFLOPS/W, improvements of 4.33× and 2.39× over the state-of-the-art Transformer accelerator.
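The abstract does not say how an element is judged "low-impact", so the following is only an illustrative software sketch of the general idea, not the HRAMTran scheme itself. It assumes a hypothetical criterion: a product is skipped when its exponent, estimated from the operand exponents alone, falls more than `drop_bits` below the largest product exponent in the vector. The names `product_exponent`, `sparse_dot`, and `drop_bits` are invented for this example.

```python
import math


def product_exponent(a, b):
    """Estimate the binary exponent of a*b from the operand exponents only.

    Returns None when either operand is zero (the product contributes nothing).
    """
    if a == 0.0 or b == 0.0:
        return None
    return math.frexp(a)[1] + math.frexp(b)[1]


def sparse_dot(x, w, drop_bits=8):
    """Dot product that skips products far below the dominant one.

    ASSUMPTION: the skip rule (exponent gap larger than drop_bits) is a
    hypothetical stand-in for the paper's unspecified sparsity criterion.
    Returns (approximate sum, number of skipped products).
    """
    exps = [product_exponent(a, b) for a, b in zip(x, w)]
    live = [e for e in exps if e is not None]
    if not live:
        return 0.0, len(exps)

    cutoff = max(live) - drop_bits
    total = 0.0
    skipped = 0
    for a, b, e in zip(x, w, exps):
        if e is None or e < cutoff:
            skipped += 1  # low-impact element: no multiply, no accumulate
            continue
        total += a * b
    return total, skipped


x = [1.0, 2.0, 1e-6, 0.0]
w = [1.0, 0.5, 1e-6, 3.0]
approx, skipped = sparse_dot(x, w, drop_bits=8)
```

In this sketch the decision to skip uses only exponent additions, so the multiply is avoided entirely for dropped elements; a larger `drop_bits` trades fewer skips for a result closer to the exact dot product.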

Original language: English
Title of host publication: 2025 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2025 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331515607
State: Published - 2025
Event: 44th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2025 - Munich, Germany
Duration: 26 Oct 2025 → 30 Oct 2025

Publication series

Name: IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
ISSN (Print): 1092-3152

Conference

Conference: 44th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2025
Country/Territory: Germany
City: Munich
Period: 26/10/25 → 30/10/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Computing-in-memory (CIM)
  • dynamic sparsity
  • SOT-MRAM
  • SRAM
  • transformer model
  • written-back transpose
