TY - GEN
T1 - Enhanced feature pyramid network for semantic segmentation
AU - Ye, Mucong
AU - Ouyang, Jingpeng
AU - Chen, Ge
AU - Zhang, Jing
AU - Yu, Xiaogang
N1 - Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - Multi-scale feature fusion has been an effective way of improving the performance of semantic segmentation. However, current methods generally fail to consider the semantic gaps between shallow (low-level) and deep (high-level) features, so the fusion may not be optimal. In this paper, to address the semantic gap between features from different layers, we propose a unified framework based on the U-shaped encoder-decoder architecture, named Enhanced Feature Pyramid Network (EFPN). Specifically, a semantic enhancement module (SEM), an edge extraction module (EEM), and a context aggregation module (CAM) are incorporated into the decoder network to improve the robustness of multi-level feature aggregation. In addition, a global fusion module (GFM) is proposed in the encoder branch to capture more semantic information in the deep layers and effectively transmit the high-level semantic features to each layer. Extensive experiments show that the proposed framework achieves state-of-the-art results on three public datasets, namely PASCAL VOC 2012, Cityscapes, and PASCAL Context. Furthermore, we also demonstrate that the proposed method is effective for other visual tasks that require frequent feature fusion and upsampling.
AB - Multi-scale feature fusion has been an effective way of improving the performance of semantic segmentation. However, current methods generally fail to consider the semantic gaps between shallow (low-level) and deep (high-level) features, so the fusion may not be optimal. In this paper, to address the semantic gap between features from different layers, we propose a unified framework based on the U-shaped encoder-decoder architecture, named Enhanced Feature Pyramid Network (EFPN). Specifically, a semantic enhancement module (SEM), an edge extraction module (EEM), and a context aggregation module (CAM) are incorporated into the decoder network to improve the robustness of multi-level feature aggregation. In addition, a global fusion module (GFM) is proposed in the encoder branch to capture more semantic information in the deep layers and effectively transmit the high-level semantic features to each layer. Extensive experiments show that the proposed framework achieves state-of-the-art results on three public datasets, namely PASCAL VOC 2012, Cityscapes, and PASCAL Context. Furthermore, we also demonstrate that the proposed method is effective for other visual tasks that require frequent feature fusion and upsampling.
UR - https://www.scopus.com/pages/publications/85110449703
U2 - 10.1109/ICPR48806.2021.9413224
DO - 10.1109/ICPR48806.2021.9413224
M3 - Conference contribution
AN - SCOPUS:85110449703
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3209
EP - 3216
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -