TY - GEN
T1 - Online Detection of Domain-Specific New Words in Text Streams
AU - Luo, Yanlin
AU - Zuo, Yuan
AU - Wu, Junjie
AU - Li, Hong
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/13
Y1 - 2018/9/13
N2 - With the rapid development of the Internet, many domain-specific new words appear in media text streams such as forums, Sina Weibo, and WeChat. These new words are typically important within their specific domains and are significant for NLP tasks. Most existing models are time-consuming or cannot handle out-of-vocabulary (OOV) words in streaming and online settings. In this paper, we propose an unsupervised method, D-TopWords with Gaussian LDA, to perform online detection of domain-specific new words effectively. Unlike traditional new-word detection models, our method is a joint statistical model based on a finite word dictionary and requires no handcrafted features. By further introducing Gaussian LDA into our model, we properly address the problem of OOV words in new text streams. Experimental results show that our method can successfully extract domain-specific new words and performs better on the online detection task than some state-of-the-art methods.
KW - Gaussian LDA
KW - new words detection
KW - text streams
KW - word dictionary model
UR - https://www.scopus.com/pages/publications/85054408603
U2 - 10.1109/ICSSSM.2018.8465088
DO - 10.1109/ICSSSM.2018.8465088
M3 - Conference contribution
AN - SCOPUS:85054408603
SN - 9781538651780
T3 - 2018 15th International Conference on Service Systems and Service Management, ICSSSM 2018
BT - 2018 15th International Conference on Service Systems and Service Management, ICSSSM 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th International Conference on Service Systems and Service Management, ICSSSM 2018
Y2 - 21 July 2018 through 22 July 2018
ER -