MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition

  • Beihang University

Research output: Contribution to journal, Article, peer-review

Abstract

This paper addresses object recognition from RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from the lack of large-scale, manually labeled RGB-D data. Labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such datasets often have a long tail, and the hard positive examples in the tail are rarely recognized correctly. To address these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is used to generate class-agnostic examples for each instance, which supports the training of MSANet. At the second level, an adversarial network generates class-specific hard positive examples while learning to classify them correctly, further improving the performance of MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; for example, our experiments show a 1.5% accuracy improvement on the benchmark Washington RGB-D Object Dataset over the current state of the art.
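
As an illustration only, the first augmentation level (label-preserving transformations of each instance) could be sketched as below. The abstract does not list the transformations MSANet actually uses, so the random crop and horizontal flip, the function name `augment_rgbd`, and the `crop` parameter are all assumptions made for this sketch. The point it shows is that the same geometric transformation is applied to the aligned RGB and depth images, so the annotation carries over unchanged.

```python
import numpy as np

def augment_rgbd(rgb, depth, label, rng, crop=0.9):
    """Hypothetical label-preserving augmentation for an aligned RGB-D pair.

    The same random crop and horizontal flip are applied to both
    modalities so they stay pixel-aligned; the label is returned as-is.
    """
    h, w = depth.shape
    ch, cw = int(h * crop), int(w * crop)
    # Pick one crop window and reuse it for both modalities.
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    rgb_out = rgb[top:top + ch, left:left + cw]
    depth_out = depth[top:top + ch, left:left + cw]
    # Flip both modalities together with probability 0.5.
    if rng.random() < 0.5:
        rgb_out = rgb_out[:, ::-1]
        depth_out = depth_out[:, ::-1]
    return rgb_out.copy(), depth_out.copy(), label

# Synthetic stand-in data (not from any real dataset).
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
depth = rng.random((64, 64)).astype(np.float32)
samples = [augment_rgbd(rgb, depth, label=7, rng=rng) for _ in range(4)]
```

Each call yields a new training example for the same instance and the same label. The second level (adversarially generated hard positives) requires a trained network and is not sketched here.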

Original language: English
Pages (from-to): 1583-1594
Number of pages: 12
Journal: Visual Computer
Volume: 35
Issue number: 11
State: Published - 1 Nov 2019

Keywords

  • Adversarial network
  • Deep learning
  • Multimodal
  • Object recognition
