Abstract
Transformer models perform outstandingly in a wide range of artificial intelligence tasks. In this work, we propose a hybrid-RAM Transformer accelerator (HRAMTran) that uses computing-in-memory (CIM) based on spin-orbit torque magnetic random access memory (SOT-MRAM) and static RAM (SRAM). It supports dynamic sparsity in floating-point (FP) matrix multiplication (MM) and written-back transpose, thereby realizing an efficient attention mechanism. First, a dynamic sparsity-based MM scheme is proposed, which dynamically ignores low-impact elements during vector multiplication, thereby reducing the latency and energy consumption of MM. Second, a data-reuse multiply-and-accumulate (MAC) scheme for the mantissa is designed to further optimize MM; it shares partial operation results to reduce redundant computation. SOT-MRAM-based and SRAM-based CIM architectures with the dynamic sparsity and data-reuse schemes are constructed to perform weight MM (for Query (Q), Key (K), and Value (V)) and dynamic MM, respectively. This hybrid-RAM CIM method optimizes energy and latency during attention computation. Moreover, a written-back transpose SRAM array that can write multiple bits into a column simultaneously is designed to significantly reduce the write-back cycles for K^T. Finally, the HRAMTran accelerator is built to evaluate Transformer performance on machine translation with the WMT14 dataset. Results show that the accelerator achieves 3.6 μJ/Token and 68.77 TFLOPS/W, improvements of 4.33× and 2.39×, respectively, over the state-of-the-art Transformer accelerator.
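To make the idea of ignoring low-impact elements during vector multiplication concrete, here is a minimal software sketch. It is not the paper's circuit or algorithm: the abstract does not state how low-impact elements are identified, so the selection rule here (drop any product whose estimated exponent lies more than `exp_window` below the largest one) and the names `sparse_dot` and `exp_window` are assumptions made purely for illustration.

```python
import math


def sparse_dot(a, b, exp_window=8):
    """Dot product that skips products far smaller than the largest one.

    Hypothetical rule (an assumption, not taken from the paper): a product
    is dropped when its estimated exponent, the sum of the two operand
    exponents, is more than `exp_window` below the maximum estimate.
    Returns (result, number_of_products_actually_computed).
    """
    # Estimate each product's exponent from the operand exponents alone,
    # so no mantissa multiplication is needed to decide what to skip.
    exps = []
    for x, y in zip(a, b):
        if x == 0.0 or y == 0.0:
            exps.append(None)  # an exact zero contributes nothing
        else:
            exps.append(math.frexp(x)[1] + math.frexp(y)[1])

    live = [e for e in exps if e is not None]
    if not live:
        return 0.0, 0
    max_exp = max(live)

    total, used = 0.0, 0
    for x, y, e in zip(a, b, exps):
        if e is not None and max_exp - e <= exp_window:
            total += x * y  # only "high-impact" products are multiplied
            used += 1
    return total, used


a = [1.0, 2.0, 1e-6, 0.0]
b = [1.0, 1.0, 1.0, 5.0]
print(sparse_dot(a, b))                 # tiny product and zero are skipped
print(sparse_dot(a, b, exp_window=64))  # wide window keeps the tiny product
```

In this sketch the window size trades accuracy for work: a narrower window skips more multiplications, while a wider one approaches the exact dot product.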
| Original language | English |
|---|---|
| Title of host publication | 2025 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2025 - Conference Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9798331515607 |
| State | Published - 2025 |
| Event | 44th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2025 - Munich, Germany. Duration: 26 Oct 2025 → 30 Oct 2025 |
Publication series
| Name | IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD |
|---|---|
| ISSN (Print) | 1092-3152 |
Conference
| Conference | 44th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2025 |
|---|---|
| Country/Territory | Germany |
| City | Munich |
| Period | 26/10/25 → 30/10/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Computing-in-memory (CIM)
- dynamic sparsity
- SOT-MRAM
- SRAM
- transformer model
- written-back transpose
Title
HRAMTran: A Hybrid-RAM Transformer Accelerator With Dynamic Sparsity Floating-Point CIM and Written-Back Transpose Array