
T4Di: A Hybrid TTT-Transformer Backbone for Scalable and Efficient Diffusion Model

  • Xirui Wu
  • Haixia Pan
  • Ruijun Liu*
  • Biao Dong
  • Ying Zheng
  • Huolong Ye

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Diffusion models have achieved significant progress in image generation, with backbone architectures evolving from U-Net to Transformers. However, the quadratic complexity of Transformer-based diffusion models limits their scalability and efficiency, and this limitation becomes more prominent with increasing resolution. Linear complexity models such as Mamba partially address this issue but struggle with spatial continuity when applied to two-dimensional image data. To tackle these challenges, we propose T4Di, a hybrid backbone architecture combining the efficiency of Test-Time Training (TTT) with the global modeling capability of Transformers. By introducing multidirectional scanning and lightweight local feature enhancement modules, T4Di adapts TTT to 2D image signals, improving spatial continuity and local coherence. Moreover, we explore adaptive block composition, adjusting the ratio between Transformer and TTT components to achieve a favorable balance between generation quality and computational cost. We evaluate T4Di on both unconditional and class-conditional image generation tasks across CIFAR-10, CelebA, and ImageNet benchmarks. Experimental results demonstrate that T4Di consistently outperforms existing diffusion models in terms of both generation quality and computational efficiency, establishing it as a scalable and effective solution for image synthesis.
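
Two ideas in the abstract, multidirectional scanning and an adjustable ratio of Transformer to TTT blocks, can be illustrated with a minimal Python sketch. Nothing below comes from the paper itself: the four scan directions, the function names `scan_orders` and `block_schedule`, and the rule of placing one Transformer block every `transformer_every` layers are all assumptions chosen for illustration.

```python
def scan_orders(height, width):
    """Return four flattening orders for a height x width grid of image tokens.

    Each order is a list of row-major token indices. The choice of these four
    directions is an illustrative assumption, not the paper's stated design.
    """
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def block_schedule(depth, transformer_every):
    """Hypothetical layout: one Transformer block every `transformer_every`
    layers, TTT blocks elsewhere. Lowering `transformer_every` raises the
    share of quadratic-cost attention blocks in the backbone.
    """
    if transformer_every < 1:
        raise ValueError("transformer_every must be at least 1")
    return [
        "transformer" if (i + 1) % transformer_every == 0 else "ttt"
        for i in range(depth)
    ]


if __name__ == "__main__":
    for name, order in scan_orders(2, 3).items():
        print(name, order)
    print(block_schedule(depth=8, transformer_every=4))
```

For a 2 x 3 grid, the column-forward order visits tokens 0, 3, 1, 4, 2, 5, so vertically adjacent patches become neighbours in the sequence, whereas a single row-major scan separates them by a full row width. Combining several such orders is one common way to give a sequence model a view of 2D neighbourhoods.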

Original language: English
Title of host publication: Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
Editors: De-Shuang Huang, Qinhu Zhang, Chuanlei Zhang, Wei Chen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 162-173
Number of pages: 12
ISBN (Print): 9789819698110
State: Published - 2025
Event: 21st International Conference on Intelligent Computing, ICIC 2025 - Ningbo, China
Duration: 26 Jul 2025 to 29 Jul 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15859 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Intelligent Computing, ICIC 2025
Country/Territory: China
City: Ningbo
Period: 26/07/25 to 29/07/25

Keywords

  • Diffusion
  • Hybrid
  • Image Generator
  • Test-Time Training
  • Transformer
