
HHA: Hybrid Control-Data Flow Heterogeneous Accelerator for Conv-transformer Neural Networks

  • Yongxiang Cao
  • Hongxu Jiang*
  • Wei Wang
  • Yonghua Zhang
  • Yixiang Zhang
  • Yanfei Song
  • Xinyi Wang

*Corresponding author for this work

  • Beihang University
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The conv-transformer neural network (CTNN) has demonstrated exceptional performance in computer vision. When implemented on resource-constrained edge devices, CTNNs exhibit two critical operational challenges. The first is the limited acceleration of nonlinear operators; the second is the complexity of hardware scheduling caused by the variety of CTNN operators. These challenges significantly reduce the inference performance of CTNNs on edge devices. We propose HHA, a hybrid control-data flow heterogeneous CTNN accelerator. To address the computational inefficiency of nonlinear operators, we propose a novel nonlinear acceleration engine (NAE) based on lookup tables (LUTs). Building on this foundation, we propose an accelerator architecture that integrates control flow and data flow in a hybrid manner, alongside a batch-by-batch heterogeneous-core scheduling algorithm (BH) that enhances the parallel computing capability of the heterogeneous cores. Experimental results indicate that the NAE significantly improves the efficiency of nonlinear operations. Compared with state-of-the-art acceleration architectures for nonlinear operators, NAE achieves 1.44x, 12.52x, and 18.56x speedups in computing the Softmax, Layer Normalization (LN), and GELU operators, respectively. The BH method increases CTNN inference speed by a factor of 3.944 by optimizing the computing and storage structure of the operators. We implemented HHA on the Xilinx UltraScale+ MPSoC ZCU102. Compared with a state-of-the-art FPGA accelerator and the NVIDIA V100, HHA achieves 1.78x and 3.74x throughput improvements and 1.11x and 47.76x energy-efficiency improvements, respectively.
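
The abstract does not describe how the NAE organizes its lookup tables, so the following is only a generic illustration of the LUT idea for one nonlinear operator (GELU), not the authors' design. It is a minimal sketch that assumes a uniformly spaced table with linear interpolation between entries; the table range, the entry count, and the names `gelu_lut`, `X_MIN`, `X_MAX`, and `ENTRIES` are hypothetical choices made for this example.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU, used only to fill the table and as a reference."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical table parameters (assumptions, not taken from the paper).
X_MIN, X_MAX, ENTRIES = -8.0, 8.0, 257
STEP = (X_MAX - X_MIN) / (ENTRIES - 1)

# Precompute GELU at uniformly spaced sample points.
TABLE = [gelu(X_MIN + i * STEP) for i in range(ENTRIES)]


def gelu_lut(x: float) -> float:
    """Approximate GELU with a table lookup plus linear interpolation."""
    if x <= X_MIN:
        return 0.0  # GELU(x) approaches 0 for very negative x
    if x >= X_MAX:
        return x    # GELU(x) approaches x for large positive x
    pos = (x - X_MIN) / STEP   # fractional index into the table
    i = int(pos)               # index of the entry at or below x
    frac = pos - i             # distance from that entry, in [0, 1)
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])
```

The sketch replaces the `erf` evaluation with one table read, one subtraction, and one multiply-add per input, which is the general reason LUT approaches suit hardware. A fixed-point hardware version would also need to choose table precision and index width, which this floating-point example does not model.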

Original language: English
Title of host publication: Proceedings - 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 213-216
Number of pages: 4
ISBN (Electronic): 9798331509385
State: Published - 2025
Event: 25th IEEE International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025 - Tromso, Norway
Duration: 19 May 2025 to 22 May 2025

Publication series

Name: Proceedings - 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025

Conference

Conference: 25th IEEE International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025
Country/Territory: Norway
City: Tromso
Period: 19/05/25 to 22/05/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Conv-transformer neural network
  • FPGA
  • heterogeneous accelerator
  • nonlinear function acceleration
  • scheduling optimization
