TY - JOUR
T1 - Porn2Vec
T2 - A robust framework for detecting pornographic websites based on contrastive learning
AU - Zhao, Jun
AU - Shao, Minglai
AU - Peng, Hao
AU - Wang, Hong
AU - Li, Bo
AU - Liu, Xudong
N1 - Publisher Copyright: © 2021
PY - 2021/9/27
Y1 - 2021/9/27
AB - Pornographic websites have become one of the largest sources of vulgar content and seriously threaten the mental and physical health of juveniles. Unfortunately, existing pornography detection approaches are ineffective against pornographic websites that are armed with adversarial attack examples. In this paper, we propose Porn2Vec, a robust end-to-end framework for detecting pornographic websites using contrastive learning. Specifically, we first model pornographic websites as a heterogeneous graph consisting of websites, webpages, images, texts, and their interactive relationships, and formalize pornographic website detection as a node classification task on the graph. Subsequently, we present a novel contrastive-learning-based heterogeneous graph embedding method that learns high-level representations of websites by jointly aggregating image-based, text-based, and structure-based features. Finally, the learned website features are fed into a neural network to train an automatic model for pornographic website detection. Experimental results show that Porn2Vec outperforms existing state-of-the-art methods, demonstrating more promising and robust performance in detecting well-disguised pornographic websites equipped with adversarial attack examples.
KW - Contrastive Learning
KW - Heterogeneous graph
KW - Pornography detection
KW - Robustness
UR - https://www.scopus.com/pages/publications/85110514993
U2 - 10.1016/j.knosys.2021.107296
DO - 10.1016/j.knosys.2021.107296
M3 - Article
AN - SCOPUS:85110514993
SN - 0950-7051
VL - 228
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 107296
ER -