TY - GEN
T1 - Sense beauty via face, dressing, and/or voice
AU - Nguyen, Tam V.
AU - Liu, Si
AU - Ni, Bingbing
AU - Tan, Jun
AU - Rui, Yong
AU - Yan, Shuicheng
PY - 2012
Y1 - 2012
N2 - Discovering the secret of beauty has been the pursuit of artists and philosophers for centuries. Nowadays, the computational model for beauty estimation has been actively explored in the computer science community, yet with the focus mainly on facial features. In this work, we perform a comprehensive study of female attractiveness conveyed by single/multiple modalities of cues, i.e., face, dressing and/or voice, and aim to uncover how different modalities individually and collectively affect the human sense of beauty. To this end, we collect the first Multi-Modality Beauty (M2B) dataset in the world for female attractiveness study, which is thoroughly annotated with attractiveness levels converted from manual k-wise ratings and semantic attributes of different modalities. A novel Dual-supervised Feature-Attribute-Task (DFAT) network is proposed to jointly learn the beauty estimation models of single/multiple modalities as well as the attribute estimation models. The DFAT network differentiates itself by its supervision in both the attribute and task layers. Several interesting beauty-sense observations over single/multiple modalities are reported, and the extensive experimental evaluations on the collected M2B dataset well demonstrate the effectiveness of the proposed DFAT network for female attractiveness estimation.
KW - attributes
KW - dual-supervised feature-attribute-task network
KW - {face, dressing, voice} attractiveness
UR - https://www.scopus.com/pages/publications/84871382660
U2 - 10.1145/2393347.2393385
DO - 10.1145/2393347.2393385
M3 - Conference contribution
AN - SCOPUS:84871382660
SN - 9781450310895
T3 - MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia
SP - 239
EP - 248
BT - MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia
T2 - 20th ACM International Conference on Multimedia, MM 2012
Y2 - 29 October 2012 through 2 November 2012
ER -