Abstract
Indoor semantic segmentation plays a critical role in many applications, such as intelligent robots. However, multi-class recognition remains challenging, especially for pixel-level indoor semantic labeling. In this paper, we propose a novel deep structured model that combines the strengths of the widely used convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We first present a multi-information fusion model that uses scene category information to fine-tune the fully convolutional network. Then, to refine the coarse outputs of the CNN, an RNN is applied on top of the final CNN layer, so that the whole system is end-to-end trainable. This Graph-RNN is derived from a conditional random field (CRF) defined over a superpixel-based graph, which lets it exploit flexible contextual information from different neighboring regions. Experimental results on the recent, large-scale SUN RGB-D dataset demonstrate that the proposed model outperforms existing state-of-the-art methods on the challenging task of labeling the 40 dominant classes (40.8% mean IU and 69.1% pixel accuracy). We also evaluate our model on the public NYU Depth V2 dataset and achieve remarkable performance.
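The two figures reported above, mean IU and pixel accuracy, are usually computed from a per-class confusion matrix. The abstract does not spell out the exact evaluation protocol, so the sketch below assumes the common convention: rows are ground-truth classes, columns are predicted classes, unlabeled pixels (marked with a hypothetical `ignore_label` of -1) are skipped, and classes whose union is empty are left out of the mean. The helper names `confusion_matrix`, `pixel_accuracy` and `mean_iu` are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_label: int = -1) -> List[List[int]]:
    """Count pixels per (ground-truth, predicted) class pair.

    Rows index the ground-truth class, columns the predicted class.
    `ignore_label` is an assumed marker for unlabeled pixels.
    """
    if len(gt) != len(pred):
        raise ValueError("gt and pred must contain the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_label:
            continue  # unlabeled pixels count toward neither metric
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    """Fraction of labeled pixels whose predicted class is correct."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iu(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN), skipping empty classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels labeled as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels labeled as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 3-class example with one unlabeled pixel (-1) at the end.
    gt = [0, 0, 1, 1, 2, 2, -1]
    pred = [0, 1, 1, 1, 2, 0, 2]
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(round(pixel_accuracy(cm), 4), mean_iu(cm))  # prints: 0.6667 0.5
```

Mean IU penalizes both missed pixels and false alarms for every class equally, so it is typically much lower than pixel accuracy on datasets dominated by a few large classes such as walls and floors. Some benchmarks instead average over all classes, including empty ones. Which variant the paper uses is not stated here.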
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 735-747 |
| Number of pages | 13 |
| Journal | Visual Computer |
| Volume | 34 |
| Issue number | 5 |
| State | Published - 1 May 2018 |
Keywords
- Conditional random field
- Convolutional neural network
- Graph-RNN
- Scene classification
- Semantic segmentation
Fingerprint
Dive into the research topics of 'Multi-class indoor semantic segmentation with deep structured model'. Together they form a unique fingerprint.