JOVS: Joint Optimization of Vectorization and Scheduling for DNN on AI DSPs

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent embedded devices have integrated digital signal processors (DSPs) to balance performance and power when executing complex Deep Neural Network (DNN) workloads. Because modern AI DSPs provide specialized vector instructions for tensor computation but only limited on-chip memory, fully exploiting their potential remains a significant challenge. The performance of AI DSPs relies heavily on vendor-provided libraries and compilers. In practice, vendor-provided libraries are inflexible and prevent further optimization. State-of-the-art compilers usually focus on a single optimization (vectorization or scheduling), which is insufficient to address this challenge. In this paper, we propose JOVS, a Joint Optimization of Vectorization and Scheduling to accelerate DNN inference on AI DSPs. The key is to prioritize the selection of novel specialized instructions to implement operators, which reduces the joint optimization space. For vectorization, we design a mapping and layout scheme for the specialized instructions and perform global instruction selection. For scheduling, we propose a two-step scheduling strategy for fine-grained optimization. Finally, we evaluate JOVS on five popular DNNs. The experimental results show that JOVS achieves a 1.44× speedup over the vendor-provided library. Compared to state-of-the-art compilers, JOVS also achieves a 1.83× speedup.

Original language: English
Title of host publication: SPAA 2025 - Proceedings of the 2025 37th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: Association for Computing Machinery
Pages: 487-498
Number of pages: 12
ISBN (Electronic): 9798400712586
DOIs
State: Published - 16 Jul 2025
Event: 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025 - Portland, United States
Duration: 28 Jul 2025 → 1 Aug 2025

Publication series

Name: Annual ACM Symposium on Parallelism in Algorithms and Architectures
ISSN (Print): 1548-6109

Conference

Conference: 37th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2025
Country/Territory: United States
City: Portland
Period: 28/07/25 → 1/08/25

Keywords

  • Compiler Optimization
  • DSPs
  • Scheduling
  • Vectorization
