Human parsing with contextualized convolutional neural network

  • Xiaodan Liang
  • Chunyan Xu
  • Xiaohui Shen
  • Jianchao Yang
  • Si Liu
  • Jinhui Tang
  • Liang Lin*
  • Shuicheng Yan
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this work, we address the human parsing task with a novel Contextualized Convolutional Neural Network (Co-CNN) architecture, which integrates cross-layer context, global image-level context, within-super-pixel context and cross-super-pixel neighborhood context into a unified network. Given an input human image, Co-CNN produces a pixel-wise categorization in an end-to-end way. First, the cross-layer context is captured by our basic local-to-global-to-local structure, which hierarchically combines the global semantic structure and the local fine details across layers. Second, the global image-level label prediction is used as an auxiliary objective in an intermediate layer of the Co-CNN, and its outputs further guide the feature learning in subsequent convolutional layers to leverage the global image-level context. Finally, to further utilize the local super-pixel contexts, within-super-pixel smoothing and cross-super-pixel neighborhood voting are formulated as natural sub-components of the Co-CNN to achieve local label consistency in both training and testing. Comprehensive evaluations on two public datasets demonstrate the significant superiority of our Co-CNN architecture over other state-of-the-art methods for human parsing. In particular, Co-CNN reaches an F-1 score of 76.95% on the large dataset [15], significantly higher than the 62.81% and 64.38% achieved by the state-of-the-art algorithms M-CNN [21] and ATR [15], respectively.
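The two super-pixel steps described in the abstract can be illustrated outside a network as a post-process on per-pixel class confidences. The following is a minimal NumPy sketch under stated assumptions, not the authors' Co-CNN layers. It assumes that within-super-pixel smoothing means averaging confidences over each super-pixel. It also replaces the paper's neighborhood voting weights with a uniform average over 4-connected adjacent super-pixels, mixed in by a hypothetical factor `alpha`. All function names are illustrative.

```python
import numpy as np


def within_superpixel_smoothing(probs, sp):
    """Average per-pixel confidences inside each super-pixel.

    probs: (H, W, C) class confidences; sp: (H, W) super-pixel ids 0..K-1
    (assumed contiguous, so every id has at least one pixel).
    Returns the smoothed (H, W, C) map and the (K, C) per-super-pixel means.
    """
    h, w, c = probs.shape
    ids = sp.reshape(-1)
    k = ids.max() + 1
    sums = np.zeros((k, c))
    np.add.at(sums, ids, probs.reshape(-1, c))  # unbuffered scatter-add per id
    counts = np.bincount(ids, minlength=k)[:, None]
    means = sums / counts
    return means[ids].reshape(h, w, c), means


def superpixel_adjacency(sp):
    """Boolean (K, K) matrix: True where two super-pixels share a 4-connected border."""
    k = sp.max() + 1
    adj = np.zeros((k, k), dtype=bool)
    for a, b in ((sp[:, :-1], sp[:, 1:]), (sp[:-1, :], sp[1:, :])):
        adj[a.ravel(), b.ravel()] = True
        adj[b.ravel(), a.ravel()] = True
    np.fill_diagonal(adj, False)  # a super-pixel is not its own neighbor
    return adj


def neighborhood_voting(means, adj, alpha=0.3):
    """Mix each super-pixel's mean with the uniform mean of its neighbors.

    ASSUMPTION: uniform neighbor weights and a fixed alpha, chosen for illustration.
    """
    out = means.copy()
    for i in range(len(means)):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size:
            out[i] = (1 - alpha) * means[i] + alpha * means[nbrs].mean(axis=0)
    return out


def smooth_and_vote(probs, sp, alpha=0.3):
    """Return the (H, W) label map after smoothing and neighborhood voting."""
    _, means = within_superpixel_smoothing(probs, sp)
    voted = neighborhood_voting(means, superpixel_adjacency(sp), alpha)
    return voted[sp].argmax(axis=-1)
```

For example, with a 2x2 image split into a left and a right super-pixel, the lower-left pixel's own confidences favor class 1. Smoothing with its super-pixel still assigns it class 0, giving the label map `[[0, 1], [0, 1]]`. Setting `alpha=0` reduces the sketch to within-super-pixel smoothing alone. In Co-CNN these operations are network sub-components used in both training and testing, rather than a fixed post-process.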

Original language: English
Title of host publication: 2015 International Conference on Computer Vision, ICCV 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1386-1394
Number of pages: 9
ISBN (Electronic): 9781467383912
DOIs:
State: Published - 17 Feb 2015
Externally published: Yes
Event: 15th IEEE International Conference on Computer Vision, ICCV 2015 - Santiago, Chile
Duration: 11 Dec 2015 to 18 Dec 2015

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2015 International Conference on Computer Vision, ICCV 2015
ISSN (Print): 1550-5499

Conference

Conference: 15th IEEE International Conference on Computer Vision, ICCV 2015
Country/Territory: Chile
City: Santiago
Period: 11/12/15 to 18/12/15
