Abstract
Fine-grained visual classification (FGVC) involves distinguishing multiple subcategories within a single broader category, a task characterized by large intra-class variability and small inter-class differences. Previous methods often rely on pre-trained visual models augmented with specialized modules; these are typically large-scale models that are difficult to deploy in industrial settings. Moreover, image data often come with auxiliary information (e.g., spatiotemporal priors, attributes, and text descriptions) that can be exploited to improve FGVC accuracy. Here we propose a novel lightweight Transformer-based approach that incorporates such auxiliary information to enhance classification accuracy. Our method introduces a simplified pixel-focused aggregation attention to fuse local and global features, and improves on it with a separable aggregation attention (SAA) that reduces model complexity. We also present an extra inside padding (EIP) method that integrates auxiliary information with minimal additional parameters. Without pre-training, our model outperforms other lightweight neural networks on fine-grained datasets (e.g., a 5.3% accuracy improvement on CUB-200-2011). Our approach offers a promising direction for FGVC tasks and highlights the effectiveness of integrating multimodal data. Our source code is available at https://github.com/yang-zzy/SAA-EIP.
| Original language | English |
|---|---|
| Pages (from-to) | 11691-11704 |
| Number of pages | 14 |
| Journal | Visual Computer |
| Volume | 41 |
| Issue number | 13 |
| DOIs | |
| State | Published - Oct 2025 |
Keywords
- Extra information
- Fine-grained visual classification
- Light-weight vision transformer
- Separable aggregated attention
Fingerprint
Dive into the research topics of 'Enhanced fine-grained visual classification through lightweight Transformer integration and auxiliary information fusion'. Together they form a unique fingerprint.