PPLSA: Parallel probabilistic latent semantic analysis based on MapReduce

  • Ning Li*
  • Fuzhen Zhuang
  • Qing He
  • Zhongzhi Shi
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. Because large datasets are increasingly common, the computation in PLSA needs to scale better. In this paper, we propose a parallel PLSA algorithm, called PPLSA, that accommodates large corpus collections in the MapReduce framework. Our solution distributes computation efficiently and is relatively simple to implement.
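
The abstract only names the approach, so the following is an illustrative sketch rather than the authors' PPLSA implementation. It assumes the common way of splitting one EM iteration of PLSA across map and reduce steps: each mapper holds a shard of documents plus a broadcast copy of P(w|z), runs the E-step (P(z|d,w) proportional to P(z|d) P(w|z)), re-estimates the document-local P(z|d), and emits expected counts n(d,w) P(z|d,w) keyed by (topic, word); the reducer sums those counts and normalizes them into the new P(w|z). The function names (`map_shard`, `reduce_counts`, `pplsa_sketch`), the round-robin sharding, and the toy corpus are hypothetical, and plain Python loops stand in for a real MapReduce runtime.

```python
import math
import random
from collections import defaultdict


def map_shard(shard, p_z_d, p_w_z, k):
    """Map phase (E-step) for one shard of (doc_id, {word: count}) pairs.

    Returns ((z, w), expected_count) pairs for the global P(w|z) update,
    plus the re-estimated P(z|d) for this shard's documents, which depends
    only on each document and can stay local to the mapper.
    """
    emitted = []
    local_p_z_d = {}
    for doc_id, counts in shard:
        topic_mass = [0.0] * k
        for w, n in counts.items():
            # Posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
            joint = [p_z_d[doc_id][z] * p_w_z[z].get(w, 0.0) for z in range(k)]
            total = sum(joint)
            if total == 0.0:
                continue
            for z in range(k):
                c = n * joint[z] / total  # n(d,w) * P(z|d,w)
                emitted.append(((z, w), c))
                topic_mass[z] += c
        s = sum(topic_mass)
        local_p_z_d[doc_id] = [m / s for m in topic_mass] if s > 0 else p_z_d[doc_id]
    return emitted, local_p_z_d


def reduce_counts(pairs, k):
    """Reduce phase (M-step for P(w|z)): sum counts per (z, w), normalize per topic."""
    per_topic = [defaultdict(float) for _ in range(k)]
    for (z, w), c in pairs:
        per_topic[z][w] += c
    p_w_z = []
    for z in range(k):
        total = sum(per_topic[z].values())
        p_w_z.append({w: c / total for w, c in per_topic[z].items()} if total > 0 else {})
    return p_w_z


def log_likelihood(docs, p_z_d, p_w_z, k):
    """Sum over (d, w) of n(d,w) * log sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for doc_id, counts in docs:
        for w, n in counts.items():
            ll += n * math.log(sum(p_z_d[doc_id][z] * p_w_z[z].get(w, 0.0) for z in range(k)))
    return ll


def pplsa_sketch(docs, k, n_shards=2, iters=20, seed=0):
    """Run EM, simulating each mapper as one call on a shard of documents."""
    rng = random.Random(seed)
    vocab = sorted({w for _, counts in docs for w in counts})

    def normalized(values):
        s = sum(values)
        return [v / s for v in values]

    # Random strictly positive initialization of P(z|d) and P(w|z).
    p_z_d = {doc_id: normalized([rng.random() + 0.1 for _ in range(k)]) for doc_id, _ in docs}
    p_w_z = [dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab]))) for _ in range(k)]

    shards = [docs[i::n_shards] for i in range(n_shards)]
    history = []
    for _ in range(iters):
        all_pairs = []
        new_p_z_d = {}
        for shard in shards:  # in a real job these mapper tasks run in parallel
            pairs, local = map_shard(shard, p_z_d, p_w_z, k)
            all_pairs.extend(pairs)  # the shuffle would group these by key
            new_p_z_d.update(local)
        p_z_d = new_p_z_d
        p_w_z = reduce_counts(all_pairs, k)
        history.append(log_likelihood(docs, p_z_d, p_w_z, k))
    return p_z_d, p_w_z, history


if __name__ == "__main__":
    corpus = [
        ("d1", {"map": 3, "reduce": 2, "shuffle": 1}),
        ("d2", {"topic": 4, "model": 2}),
        ("d3", {"map": 1, "reduce": 1, "topic": 1}),
    ]
    _, _, ll = pplsa_sketch(corpus, k=2, iters=10)
    print(f"log-likelihood: {ll[0]:.3f} -> {ll[-1]:.3f}")
```

Because every mapper reads only its own documents' P(z|d) and the shared P(w|z) from the previous iteration, each loop is one full EM step, so the log-likelihood should not decrease. A production job would likely pre-aggregate the emitted pairs in a combiner, since one pair per (topic, word, document) inflates the shuffle.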

Original language: English
Title of host publication: Intelligent Information Processing VI - 7th IFIP TC 12 International Conference, IIP 2012, Proceedings
Pages: 40-49
Number of pages: 10
State: Published - 2012
Externally published: Yes
Event: 7th IFIP International Conference on Intelligent Information Processing, IIP 2012 - Guilin, China
Duration: 12 Oct 2012 to 15 Oct 2012

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 385 AICT
ISSN (Print): 1868-4238

Conference

Conference: 7th IFIP International Conference on Intelligent Information Processing, IIP 2012
Country/Territory: China
City: Guilin
Period: 12/10/12 to 15/10/12

Keywords

  • EM
  • MapReduce
  • Parallel
  • Probabilistic Latent Semantic Analysis

