TY - JOUR
T1 - Learning Discriminative and Generative Shape Embeddings for Three-Dimensional Shape Retrieval
AU - Xu, Cheng
AU - Leng, Biao
AU - Chen, Bo
AU - Zhang, Cheng
AU - Zhou, Xiaochen
N1 - Publisher Copyright: © 1999-2012 IEEE.
PY - 2020/9
Y1 - 2020/9
AB - Multi-view shape descriptors are an important approach to 3D shape retrieval and have achieved impressive performance. A crucial part of any view-based shape descriptor is interpreting 3D structure from multiple 2D observations. Most existing methods, such as MVCNN, assume that a strong classification model trained with deep learning can often provide an efficient shape embedding for 3D shape retrieval. However, these methods focus on discriminative models and do not necessarily incorporate the underlying 3D properties of objects from 2D images. In this paper, we present a novel encoder-decoder recurrent feature aggregation network (ERFA-Net) to address this problem. To emphasize the 3D properties of shapes when fusing multiple view features, we introduce 3D property prediction tasks into 3D shape retrieval. Specifically, an image sequence of the shape is recurrently aggregated by an LSTM network into a discriminative shape embedding, and this latent embedding is then trained to predict the original voxel grid and to estimate images from unseen viewpoints. This generation task provides effective supervision that drives the network to exploit the 3D properties of shapes through multiple 2D images. Our method achieves state-of-the-art performance for 3D shape retrieval on two large-scale 3D shape datasets, ModelNet and ShapeNetCore55. Extensive experiments show that the proposed 3D representation discriminates robustly under view occlusion and has strong generation ability for various 3D shape tasks.
KW - 3D shape retrieval
KW - feature aggregation
KW - recurrent neural network
UR - https://www.scopus.com/pages/publications/85090164827
U2 - 10.1109/TMM.2019.2957933
DO - 10.1109/TMM.2019.2957933
M3 - Article
AN - SCOPUS:85090164827
SN - 1520-9210
VL - 22
SP - 2234
EP - 2245
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
IS - 9
M1 - 8931662
ER -