A semi-supervised Bayesian network model for microblog topic classification

  • Yan Chen*
  • Zhoujun Li
  • Liqiang Nie
  • Xia Hu
  • Xiangyu Wang
  • Tat Seng Chua
  • Xiaoming Zhang
  • *Corresponding author for this work

Research output: Contribution to conference · Paper · peer-review

Abstract

Microblogging services have brought users to a new era of knowledge dissemination and information seeking. However, the large volume and multi-aspect nature of messages hinder users' ability to conveniently locate the specific messages they are interested in. While many researchers wish to employ traditional text classification approaches to understand messages on microblogging services, the limited length of the messages prevents these approaches from being employed to their full potential. To tackle this problem, we propose a novel semi-supervised learning scheme that seamlessly integrates external web resources to compensate for the limited message length. Our approach first trains a classifier based on the available labeled data as well as some auxiliary cues mined from the web, and probabilistically predicts the categories for all unlabeled data. It then trains a new classifier using the labels for all messages and the auxiliary cues, and iterates the process to convergence. Our approach not only greatly reduces the time-consuming and labor-intensive labeling process, but also deeply exploits the hidden information in unlabeled data and related text resources. We conducted extensive experiments on two real-world microblogging datasets. The results demonstrate the effectiveness of the proposed approach, which produces promising performance compared with state-of-the-art methods.
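
The train, predict, retrain loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's Bayesian network model: it assumes a plain multinomial naive Bayes classifier trained with EM-style soft labels, and it leaves out the auxiliary web cues entirely. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_nb(docs, soft_labels, classes, vocab, alpha=1.0):
    """Fit multinomial naive Bayes from (possibly fractional) class weights."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, weights in zip(docs, soft_labels):
        for c in classes:
            w = weights.get(c, 0.0)
            prior[c] += w
            for t in tokens:
                counts[c][t] += w
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like


def predict_proba(model, tokens, classes):
    """Return a dict of class posteriors for one tokenized message."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def semi_supervised_em(labeled, unlabeled, classes, max_iter=20, tol=1e-4):
    """Iterate: soft-label the unlabeled data, then retrain on everything."""
    vocab = {t for tokens, _ in labeled for t in tokens}
    vocab |= {t for tokens in unlabeled for t in tokens}
    lab_docs = [tokens for tokens, _ in labeled]
    lab_soft = [{y: 1.0} for _, y in labeled]
    # Initial classifier uses the labeled data only.
    model = train_nb(lab_docs, lab_soft, classes, vocab)
    prev = None
    for _ in range(max_iter):
        # Probabilistically predict categories for all unlabeled messages.
        soft = [predict_proba(model, tokens, classes) for tokens in unlabeled]
        # Retrain on labeled data plus soft-labeled unlabeled data.
        model = train_nb(lab_docs + list(unlabeled), lab_soft + soft, classes, vocab)
        if prev is not None:
            delta = max(
                (abs(p[c] - q[c]) for p, q in zip(soft, prev) for c in classes),
                default=0.0,
            )
            if delta < tol:  # posteriors stopped changing: treat as converged
                break
        prev = soft
    return model


# Toy data (hypothetical): two labeled and four unlabeled messages.
CLASSES = ["sports", "politics"]
LABELED = [(["goal", "match"], "sports"), (["election", "vote"], "politics")]
UNLABELED = [
    ["match", "team"],
    ["team", "goal"],
    ["vote", "senate"],
    ["senate", "election"],
]
MODEL = semi_supervised_em(LABELED, UNLABELED, CLASSES)
```

In this toy setup the words "team" and "senate" never occur in the labeled messages, so any class signal the final model attaches to them comes from the unlabeled data, which is the effect the abstract attributes to exploiting unlabeled messages.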

Original language: English
Pages: 561-576
Number of pages: 16
State: Published - 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012

Conference

Conference: 24th International Conference on Computational Linguistics, COLING 2012
Country/Territory: India
City: Mumbai
Period: 8/12/12 to 15/12/12

Keywords

  • Microblog classification
  • Probabilistic graph model
  • Semi-supervised algorithm
