TY - GEN
T1 - Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
AU - Xie, Chenxi
AU - Xia, Changqun
AU - Ma, Mingcan
AU - Zhao, Zhirui
AU - Chen, Xiaowu
AU - Li, Jia
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Recent salient object detection (SOD) methods based on deep neural networks have achieved remarkable performance. However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between the sampling depth and the receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), using transformer and CNN backbones to extract features from images of different resolutions independently and then grafting the features from the transformer branch to the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by features from different sources during the decoding process. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM to help the network better interact with the attention from different models. We contribute a new Ultra-High-Resolution Saliency Detection dataset, UHRSD, containing 5,920 images at 4K-8K resolutions. To our knowledge, it is the largest dataset in both quantity and resolution for the high-resolution SOD task, which can be used for training and testing in future research. Extensive experiments on UHRSD and widely used SOD datasets demonstrate that our method achieves superior performance compared to state-of-the-art methods.
AB - Recent salient object detection (SOD) methods based on deep neural networks have achieved remarkable performance. However, most existing SOD models designed for low-resolution input perform poorly on high-resolution images due to the contradiction between the sampling depth and the receptive field size. Aiming at resolving this contradiction, we propose a novel one-stage framework called Pyramid Grafting Network (PGNet), using transformer and CNN backbones to extract features from images of different resolutions independently and then grafting the features from the transformer branch to the CNN branch. An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically, guided by features from different sources during the decoding process. Moreover, we design an Attention Guided Loss (AGL) to explicitly supervise the attention matrix generated by CMGM to help the network better interact with the attention from different models. We contribute a new Ultra-High-Resolution Saliency Detection dataset, UHRSD, containing 5,920 images at 4K-8K resolutions. To our knowledge, it is the largest dataset in both quantity and resolution for the high-resolution SOD task, which can be used for training and testing in future research. Extensive experiments on UHRSD and widely used SOD datasets demonstrate that our method achieves superior performance compared to state-of-the-art methods.
KW - Segmentation
KW - grouping and shape analysis
UR - https://www.scopus.com/pages/publications/85133308171
U2 - 10.1109/CVPR52688.2022.01142
DO - 10.1109/CVPR52688.2022.01142
M3 - Conference contribution
AN - SCOPUS:85133308171
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 11707
EP - 11716
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Y2 - 19 June 2022 through 24 June 2022
ER -