
Saliency Based Data Augmentation for Few-Shot Video Action Recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Despite the progress made in few-shot video action recognition, existing methods still struggle to achieve satisfactory performance when support samples are limited (e.g., in the 1-shot task). This paper proposes to augment training samples without relying on additional supervision or labor costs, with the aim of improving the generalizability of the learned representations. We introduce a novel self-supervised salient object detection model that produces frame-level saliency and background features for each video. A shared encoder is employed to fuse saliency and background information from different videos. Both intra- and inter-class fusion are performed, with the latter controlled by a prior probability to avoid semantic ambiguity. This effectively augments the training data in feature space. The saliency-background representations formed from query and support videos are used to construct class prototypes through Temporal-Relational CrossTransformers. Experimental results on four standard benchmarks demonstrate that the proposed method outperforms state-of-the-art methods under various few-shot settings, particularly in the 1-shot case.
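
To make the fusion idea in the abstract concrete, here is a minimal sketch, assuming NumPy and precomputed per-frame saliency and background features of shape (T, D) for each video. Everything in it is hypothetical and is not the authors' implementation. `fuse` (concatenation followed by a fixed projection) stands in for the shared encoder. `p_inter` stands in for the prior probability that gates inter-class fusion. `prototypes` uses plain mean pooling as a simple substitute for Temporal-Relational CrossTransformers. The sketch also assumes that each augmented sample keeps the label of the video that contributes the saliency features.

```python
import numpy as np


def fuse(saliency, background, weight):
    """Hypothetical stand-in for the shared fusion encoder.

    saliency, background: (T, D) per-frame features; weight: (2*D, D).
    Concatenates the two streams frame by frame and projects back to D dims.
    """
    joint = np.concatenate([saliency, background], axis=-1)  # (T, 2D)
    return np.tanh(joint @ weight)  # (T, D)


def augment(saliency_feats, background_feats, labels, weight, p_inter, rng):
    """Pair each video's saliency with another video's background.

    With probability p_inter (hypothetical prior) the background comes from a
    different class (inter-class fusion); otherwise it comes from another video
    of the same class (intra-class fusion). The label of the saliency video is kept.
    """
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    out_feats, out_labels = [], []
    for i, y in enumerate(labels):
        same = np.flatnonzero((labels == y) & (idx != i))
        diff = np.flatnonzero(labels != y)
        if diff.size > 0 and rng.random() < p_inter:
            j = rng.choice(diff)  # inter-class partner
        elif same.size > 0:
            j = rng.choice(same)  # intra-class partner
        else:
            j = i  # no same-class partner: reuse the video's own background
        out_feats.append(fuse(saliency_feats[i], background_feats[j], weight))
        out_labels.append(y)
    return np.stack(out_feats), np.array(out_labels)


def prototypes(feats, labels):
    """Mean-pooled class prototypes (a simplification, not TRX)."""
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(axis=(0, 1)) for c in classes])
    return classes, protos


# Usage with random placeholder features for 5 videos from 3 classes.
rng = np.random.default_rng(0)
T, D = 8, 16
labels = np.array([0, 0, 1, 1, 2])
sal = rng.normal(size=(5, T, D))
bg = rng.normal(size=(5, T, D))
W = rng.normal(scale=0.1, size=(2 * D, D))
aug, aug_labels = augment(sal, bg, labels, W, p_inter=0.2, rng=rng)
classes, protos = prototypes(aug, aug_labels)
print(aug.shape, protos.shape)  # (5, 8, 16) (3, 16)
```

Gating the inter-class swap with a probability, rather than always mixing across classes, mirrors the abstract's stated motivation: occasional cross-class backgrounds add diversity, while keeping most pairs within a class limits the semantic ambiguity of the fused samples.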

Original language: English
Title of host publication: MultiMedia Modeling - 31st International Conference on Multimedia Modeling, MMM 2025, Proceedings
Editors: Ichiro Ide, Ioannis Kompatsiaris, Changsheng Xu, Keiji Yanai, Wei-Ta Chu, Naoko Nitta, Michael Riegler, Toshihiko Yamasaki
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 367-380
Number of pages: 14
ISBN (Print): 9789819620630
State: Published - 2025
Event: 31st International Conference on Multimedia Modeling, MMM 2025 - Nara, Japan
Duration: 8 Jan 2025 to 10 Jan 2025

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15522 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 31st International Conference on Multimedia Modeling, MMM 2025
Country/Territory: Japan
City: Nara
Period: 8/01/25 to 10/01/25

Keywords

  • Action recognition
  • Data augmentation
  • Few-shot learning
  • Saliency
