TY - GEN
T1 - Graph Regularized Non-negative Matrix Factorization with Long-tail Constraint
AU - You, Lu
AU - Liu, Rui
AU - Zhang, He
AU - Shan, Z. M.
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
AB - Mining long-tail topics is a major challenge in text mining. In previous research, most non-hierarchical topic models assumed that the topics in documents follow a multinomial distribution, and they ignored the topics at the tail of the distribution curve. Hierarchical topic models can mine long-tail topics by introducing hierarchical relationships among topics, but at the cost of higher computational complexity. In this article, we propose a new method for mining long-tail topics, called graph regularized non-negative matrix factorization with long-tail constraint. It uses KL divergence to measure the difference between matrices and uses a neighbor graph to preserve the intrinsic geometric and discriminative structure of the original samples in the low-dimensional space. Experiments show that the proposed algorithm mines more long-tail topic information from documents and improves performance on data mining tasks compared with other methods, such as classical Dirichlet distribution, non-negative matrix, hierarchical matrix, and hierarchical latent Dirichlet distribution.
KW - Data Mining
KW - Long tail
KW - Matrix Factorization
UR - https://www.scopus.com/pages/publications/85084355271
U2 - 10.1109/PACRIM47961.2019.8985119
DO - 10.1109/PACRIM47961.2019.8985119
M3 - Conference contribution
AN - SCOPUS:85084355271
T3 - 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM 2019 - Proceedings
BT - 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM 2019
Y2 - 21 August 2019 through 23 August 2019
ER -