Discovering compatible top-k theme patterns from text based on users' preferences

  • Yongxin Tong*
  • Shilong Ma
  • Dan Yu
  • Yuanyuan Zhang
  • Li Zhao
  • Ke Xu
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Discovering a representative set of theme patterns from a large body of text in order to interpret its meaning has long been a concern of researchers in both data mining and information retrieval. Recent studies of theme pattern mining have focused on discovering a set of compatible top-k theme patterns with both high interestingness and low redundancy. Since different users have different preferences regarding interestingness and redundancy, measuring the attributes of users' preferences, and thereby discovering "preferred compatible top-k theme patterns" (PCTTP), is a pressing problem in text mining. This paper proposes a novel strategy for discovering PCTTP based on users' preferences. First, an evaluation function for the preferred compatibility between every pair of theme patterns is presented. The preferred compatibilities are then stored in a data structure called the theme compatibility graph, and a problem called MWSP, defined on this graph, is proposed to formulate the discovery of PCTTP. Second, since MWSP is proved to be NP-hard, a greedy algorithm, DPCTG, is designed to approximate the optimal solution of MWSP. Third, a quality evaluation model is introduced to measure the compatibility of the discovered theme patterns. Empirical studies on four different text subsets drawn from DBLP indicate that a high-quality set of PCTTP can be obtained.
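
The abstract does not give the evaluation function, the formal definition of MWSP, or the steps of DPCTG, so the following is only a minimal sketch of the general idea of greedy selection on a weighted compatibility graph. It is not the authors' algorithm. The function name `greedy_top_k`, the edge-weight dictionary, and the selection rule (seed with the heaviest edge, then repeatedly add the pattern with the largest total weight to those already chosen) are all assumptions made for illustration.

```python
from itertools import combinations


def greedy_top_k(weights, k):
    """Greedily pick k patterns from a weighted compatibility graph.

    weights: dict mapping frozenset({a, b}) to a pairwise score.
             The scores are hypothetical; the paper's evaluation
             function is not specified in the abstract.
    """
    nodes = sorted({n for pair in weights for n in pair})
    if not 2 <= k <= len(nodes):
        raise ValueError("k must be between 2 and the number of patterns")

    def w(a, b):
        # Missing edges are treated as weight 0 (an assumption).
        return weights.get(frozenset((a, b)), 0.0)

    # Seed with the heaviest edge; ties go to the first pair in sorted order.
    chosen = list(max(combinations(nodes, 2), key=lambda p: w(*p)))

    # Add the pattern with the largest total weight to the chosen set.
    while len(chosen) < k:
        rest = [n for n in nodes if n not in chosen]
        chosen.append(max(rest, key=lambda n: sum(w(n, c) for c in chosen)))
    return chosen


# Toy scores for four hypothetical patterns.
demo_weights = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("a", "d")): 0.7,
    frozenset(("b", "c")): 0.1,
    frozenset(("b", "d")): 0.6,
    frozenset(("c", "d")): 0.3,
}
print(greedy_top_k(demo_weights, 3))
```

A greedy rule of this kind runs in polynomial time but offers no optimality guarantee in general, which is the usual trade-off when the exact problem is NP-hard.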

Original language: English
Title of host publication: Intelligence and Security Informatics - Pacific Asia Workshop, PAISI 2009, Proceedings
Pages: 130-142
Number of pages: 13
State: Published - 2009
Event: Pacific Asia Workshop on Intelligence and Security Informatics, PAISI 2009 - Bangkok, Thailand
Duration: 27 Apr 2009 → 27 Apr 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5477
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Pacific Asia Workshop on Intelligence and Security Informatics, PAISI 2009
Country/Territory: Thailand
City: Bangkok
Period: 27/04/09 → 27/04/09
