Abstract
The conv-transformer neural network (CTNN) has demonstrated exceptional performance in computer vision. When implemented on resource-constrained edge devices, CTNNs exhibit two critical operational challenges. The first is the limited acceleration of nonlinear operators; the second is the complexity of hardware scheduling caused by the variety of CTNN operators. These challenges significantly reduce the inference performance of CTNNs on edge devices. We propose HHA, a hybrid control-data flow heterogeneous CTNN accelerator. To address the computational inefficiency of nonlinear operators, we propose a novel nonlinear acceleration engine (NAE) based on lookup tables (LUTs). Building on this foundation, we propose an accelerator architecture that integrates control flow and data flow in a hybrid manner, alongside a batch-by-batch heterogeneous-core scheduling algorithm (BH) to enhance the parallel computing capability of the heterogeneous cores. Experimental results indicate that the NAE significantly improves the efficiency of nonlinear operations. Compared with state-of-the-art acceleration architectures for nonlinear operators, NAE achieves 1.44x, 12.52x, and 18.56x speedups in computing the Softmax, Layer Normalization (LN), and GELU operators, respectively. The BH method increases CTNN inference speed by a factor of 3.944 by optimizing the computing and storage structure of the operators. We implemented HHA on the Xilinx UltraScale+ MPSoC ZCU102. Compared with the state-of-the-art FPGA accelerator and the NVIDIA V100, HHA achieves 1.78x and 3.74x throughput improvements and 1.11x and 47.76x energy-efficiency improvements, respectively.
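The abstract does not describe how the NAE organizes its lookup tables, so the following is only a minimal software sketch of the general idea of LUT-based nonlinear approximation, using GELU as the example. The table range (`X_MIN`, `X_MAX`), the table size (`N_ENTRIES`), the use of linear interpolation, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def gelu_exact(x: float) -> float:
    """Reference GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical table parameters, chosen for illustration only.
X_MIN, X_MAX, N_ENTRIES = -8.0, 8.0, 257
STEP = (X_MAX - X_MIN) / (N_ENTRIES - 1)

# Precompute the table once; a hardware design would keep it in on-chip memory.
GELU_LUT = [gelu_exact(X_MIN + i * STEP) for i in range(N_ENTRIES)]


def gelu_lut(x: float) -> float:
    """Approximate GELU with a table lookup plus linear interpolation."""
    # Outside the table range GELU is close to 0 (left) or to x (right).
    if x <= X_MIN:
        return 0.0
    if x >= X_MAX:
        return x
    pos = (x - X_MIN) / STEP               # fractional table position
    idx = min(int(pos), N_ENTRIES - 2)     # index of the left neighbour
    frac = pos - idx                       # distance to the left neighbour
    return GELU_LUT[idx] + frac * (GELU_LUT[idx + 1] - GELU_LUT[idx])


def max_abs_error(samples: int = 10001) -> float:
    """Largest absolute error of gelu_lut over evenly spaced points in [-10, 10]."""
    worst = 0.0
    for i in range(samples):
        x = -10.0 + 20.0 * i / (samples - 1)
        worst = max(worst, abs(gelu_lut(x) - gelu_exact(x)))
    return worst
```

In this sketch, each evaluation costs one table read pair, one multiply, and a few additions, in place of an `erf` evaluation. That trade-off is what makes table-based schemes attractive on hardware, at the price of a bounded approximation error that shrinks as the table grows.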
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 213-216 |
| Number of pages | 4 |
| ISBN (Electronic) | 9798331509385 |
| DOIs | |
| State | Published - 2025 |
| Event | 25th IEEE International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025 - Tromso, Norway. Duration: 19 May 2025 → 22 May 2025 |
Publication series
| Name | Proceedings - 2025 IEEE 25th International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025 |
|---|
Conference
| Conference | 25th IEEE International Symposium on Cluster, Cloud and Internet Computing Workshops, CCGridW 2025 |
|---|---|
| Country/Territory | Norway |
| City | Tromso |
| Period | 19/05/25 → 22/05/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Conv-transformer neural network
- FPGA
- heterogeneous accelerator
- nonlinear function acceleration
- scheduling optimization
Fingerprint
Dive into the research topics of 'HHA: Hybrid Control-Data Flow Heterogeneous Accelerator for Conv-transformer Neural Networks'. Together they form a unique fingerprint.