An RRAM-Based Computing-in-Memory Architecture and Its Application in Accelerating Transformer Inference

  • Zhaojun Lu
  • Xueyan Wang*
  • Md Tanvir Arafin
  • Haoxiang Yang
  • Zhenglin Liu
  • Jiliang Zhang
  • Gang Qu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Deep neural network (DNN)-based transformer models have demonstrated remarkable performance in natural language processing (NLP) applications. Unfortunately, the unique scaled dot-product attention mechanism and intensive memory access pose a significant challenge during inference on power-constrained edge devices. One emerging solution to this challenge is computing-in-memory (CIM), which uses memory cells for logic computation to reduce data movement and overcome the memory wall. However, existing CIM designs do not support high-precision computations, such as floating-point operations, which are essential for NLP applications. Furthermore, CIM architectures require complex control modules and costly peripheral circuits to harness the full potential of in-memory computation. Hence, this article proposes a scalable RRAM-based in-memory floating-point computation architecture (RIME) that uses single-cycle NOR, NAND, and minority logic to implement in-memory floating-point operations. RIME features efficient parallel and pipeline capabilities with a centralized control module and a simplified peripheral circuit to eliminate data movement during computation. Furthermore, the article proposes pipelined implementations of matrix-matrix multiplication (MatMul) and softmax functions, enabling the construction of a transformer accelerator based on RIME. Extensive experimental results show that compared with a GPU-based implementation, the RIME-based transformer accelerator improves timing efficiency by 2.3× and energy efficiency by 1.7× without compromising inference accuracy.
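For reference, the scaled dot-product attention that the abstract identifies as the inference bottleneck consists of exactly the two operations the accelerator pipelines, MatMul and softmax. A minimal NumPy sketch of that computation follows; the function names and shapes here are illustrative and are not drawn from the paper's RIME implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # first MatMul stage
    weights = softmax(scores)         # softmax over the key dimension
    return weights @ V                # second MatMul stage
```

Each query row attends over all keys, so the two MatMuls dominate both the arithmetic and the memory traffic, which is why a CIM design that keeps these operands in memory can cut data movement.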

Original language: English
Pages (from-to): 485-496
Number of pages: 12
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 32
Issue number: 3
DOIs
State: Published - 1 Mar 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Accelerator
  • computing-in-memory (CIM)
  • energy efficiency
  • resistive random access memory (RRAM)
  • scalability
  • transformer
