
TopicOcean: An ever-increasing topic model with meta-learning

  • Yuanfeng Song
  • Yongxin Tong
  • Siqi Bao
  • Di Jiang
  • Hua Wu
  • Raymond Chi Wing Wong

  • Beihang University
  • Baidu Inc
  • Hong Kong University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Topic modeling has been intensively studied and widely applied in both academia and industry in the last decade. In the literature, topic models usually need to be trained from scratch for each individual corpus. Hence, the wisdom of the crowd (i.e., topic models previously trained on other corpora) is abandoned. Since a massive amount of in-domain data, considerable computational cost, and human labour are involved in obtaining a high-quality topic model, training from scratch for each new corpus is a huge waste of resources. In this paper, we propose the novel TopicOcean framework, which aims to integrate well-trained topic models and transfer the knowledge of accumulated topics to new corpora in order to improve the quality of their topic models. We first propose a method of constructing the ever-increasing TopicOcean, and then propose a meta-learning mechanism that transfers the meta-level knowledge (i.e., topics) in TopicOcean to the scenario of topic modeling on new corpora. Comprehensive experiments validate that the TopicOcean framework can significantly outperform the state-of-the-art (53.77% perplexity improvement on a temporal-shift corpus and 29.24% improvement on a domain-shift corpus). The well-trained high-quality topic models used to construct TopicOcean have been open-sourced to promote further research.

Footnote 1: The well-trained topic models can be accessed on GitHub (https://github.com/baidu/Familia/blob/master/model/download-model.sh).

Original language: English
Title of host publication: Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
Editors: Claudia Plant, Haixun Wang, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1262-1267
Number of pages: 6
ISBN (Electronic): 9781728183169
DOI
Publication status: Published - Nov 2020
Externally published
Event: 20th IEEE International Conference on Data Mining, ICDM 2020 - Virtual, Sorrento, Italy
Duration: 17 Nov 2020 to 20 Nov 2020

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2020-November
ISSN (Print): 1550-4786

Conference

Conference: 20th IEEE International Conference on Data Mining, ICDM 2020
Country/Territory: Italy
City: Virtual, Sorrento
Period: 17/11/20 to 20/11/20
