Bigram Chinese word segmentation by Viterbi algorithm

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Chinese word segmentation is an important foundation for Chinese information processing. This paper proposes a new Chinese word segmentation model based on a Bayesian network. In this model, we suggest combining a character alignment Viterbi algorithm, which treats the preceding word of each Chinese character as its state and the N-gram probability as its state transition probability, with the Viterbi algorithm to achieve better performance. The proposed model also performs word sense disambiguation and automatic recognition of foreign and domestic person names at the same time. It is shown to segment words more efficiently while also achieving better precision and recall.
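
As an illustration of the general idea in the abstract (the preceding word as the state, a bigram probability as the transition), here is a minimal Python sketch of bigram word segmentation by Viterbi search. It is not the paper's character alignment algorithm, and it leaves out the Bayesian network, name recognition, and disambiguation components. The probability table, the FLOOR smoothing constant, and the names BIGRAM, log_p, and segment are all hypothetical and chosen only for this example.

```python
import math

# Toy bigram table P(word | previous word). These values are invented for
# illustration; they are NOT taken from the paper or from any corpus.
BIGRAM = {
    ("<s>", "研究"): 0.5,
    ("<s>", "研究生"): 0.5,
    ("研究", "生命"): 0.6,
    ("研究生", "命"): 0.1,
    ("生命", "起源"): 0.7,
    ("命", "起源"): 0.2,
}
VOCAB = {word for _, word in BIGRAM}
MAX_LEN = max(len(word) for word in VOCAB)
FLOOR = 1e-6  # crude stand-in for smoothing of unseen bigrams (an assumption)


def log_p(prev, word):
    """Log transition probability from the previous word to this word."""
    return math.log(BIGRAM.get((prev, word), FLOOR))


def segment(text):
    """Return the highest-scoring segmentation of text under the toy model."""
    if not text:
        return []
    # best[i][w] = (log score, previous word) of the best path that ends at
    # character offset i with w as its last word. The state is the last word.
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_LEN), end):
            word = text[start:end]
            # Single characters are always allowed as a fallback so that
            # every position stays reachable; longer strings must be known.
            if len(word) > 1 and word not in VOCAB:
                continue
            for prev, (score, _) in best[start].items():
                cand = score + log_p(prev, word)
                if word not in best[end] or cand > best[end][word][0]:
                    best[end][word] = (cand, prev)
    # Backtrace from the best final state to the start marker.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    pos, words = len(text), []
    while pos > 0:
        words.append(word)
        prev = best[pos][word][1]
        pos -= len(word)
        word = prev
    return words[::-1]


print(segment("研究生命起源"))
```

With this toy table, the path 研究 / 生命 / 起源 scores higher than 研究生 / 命 / 起源, because the bigram transitions into 生命 and 起源 outweigh those into 命. A unigram longest-match segmenter could pick the second reading, which is the kind of ambiguity a word-conditioned transition is meant to resolve.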

Original language: English
Title of host publication: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Pages: 364-368
Number of pages: 5
State: Published - 2009
Event: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009 - Tianjin, China
Duration: 14 Aug 2009 to 16 Aug 2009

Publication series

Name: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Volume: 5

Conference

Conference: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Country/Territory: China
City: Tianjin
Period: 14/08/09 to 16/08/09

Keywords

  • Bayesian network
  • N-gram
  • Viterbi algorithm
