TY - GEN
T1 - ConvNets-based action recognition from depth maps through virtual cameras and pseudocoloring
AU - Wang, Pichao
AU - Li, Wanqing
AU - Gao, Zhimin
AU - Tang, Chang
AU - Zhang, Jing
AU - Ogunbona, Philip
N1 - Publisher Copyright:
© 2015 ACM.
PY - 2015/10/13
Y1 - 2015/10/13
AB - In this paper, we propose to adopt ConvNets to recognize human actions from depth maps on relatively small datasets based on Depth Motion Maps (DMMs). In particular, three strategies are developed to effectively leverage the capability of ConvNets in mining discriminative features for recognition. Firstly, different viewpoints are mimicked by rotating virtual cameras around the subject, represented by the 3D points of the captured depth maps. This not only synthesizes more data from the captured depth maps but also makes the trained ConvNets view-tolerant. Secondly, DMMs are constructed and further enhanced for recognition by encoding them into pseudo-RGB images, turning the spatial-temporal motion patterns into textures and edges. Lastly, through transfer learning from models originally trained on ImageNet for image classification, three ConvNets are trained independently on the color-coded DMMs constructed in three orthogonal planes. The proposed algorithm was extensively evaluated on the MSRAction3D, MSRAction3DExt and UTKinect-Action datasets and achieved state-of-the-art results on these datasets.
KW - Action Recognition
KW - ConvNets
KW - Pseudocoloring
KW - Virtual Cameras
UR - https://www.scopus.com/pages/publications/84962878607
U2 - 10.1145/2733373.2806296
DO - 10.1145/2733373.2806296
M3 - Conference contribution
AN - SCOPUS:84962878607
T3 - MM 2015 - Proceedings of the 2015 ACM Multimedia Conference
SP - 1119
EP - 1122
BT - MM 2015 - Proceedings of the 2015 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 23rd ACM International Conference on Multimedia, MM 2015
Y2 - 26 October 2015 through 30 October 2015
ER -