An encoding strategy based word-character LSTM for Chinese NER

  • Wei Liu
  • Tongge Xu
  • Qinghua Xu
  • Jiayu Song
  • Yueran Zu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A recently proposed lattice model has demonstrated that words in a character sequence can provide rich word boundary information for character-based Chinese NER models. In this model, word information is integrated into a shortcut path between the start and end characters of the word. However, the shortcut path may cause the model to degenerate into a partial word-based model, which suffers from word segmentation errors. Furthermore, the lattice model cannot be trained in batches due to its DAG structure. In this paper, we propose a novel word-character LSTM (WC-LSTM) model that adds word information to the start or the end character of the word, alleviating the influence of word segmentation errors while retaining the word boundary information. Four different strategies are explored in our model to encode word information into a fixed-size representation for efficient batch training. Experiments on benchmark datasets show that our proposed model outperforms other state-of-the-art models.
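
To make the idea of a fixed-size word representation concrete, here is a minimal sketch in Python. The abstract does not name the four encoding strategies, so the sketch assumes one simple possibility: averaging the vectors of all lexicon words that end at a given character. The function names, the toy lexicon, and the two-dimensional vectors are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Vector = List[float]


def words_ending_at(chars: str, lexicon: Dict[str, Vector],
                    max_len: int = 4) -> List[List[str]]:
    """For each character position, collect the multi-character
    lexicon words whose last character sits at that position."""
    matches: List[List[str]] = []
    for end in range(len(chars)):
        found: List[str] = []
        for length in range(2, max_len + 1):
            start = end - length + 1
            if start < 0:
                break
            word = chars[start:end + 1]
            if word in lexicon:
                found.append(word)
        matches.append(found)
    return matches


def average_word_vectors(words: List[str], lexicon: Dict[str, Vector],
                         dim: int) -> Vector:
    """Assumed strategy: average the matched word vectors.
    Positions with no matched word get a zero vector, so every
    position yields a vector of the same size."""
    if not words:
        return [0.0] * dim
    return [sum(lexicon[w][i] for w in words) / len(words)
            for i in range(dim)]


def encode_sentence(chars: str, lexicon: Dict[str, Vector],
                    dim: int) -> List[Vector]:
    """Return one fixed-size word vector per character."""
    return [average_word_vectors(ws, lexicon, dim)
            for ws in words_ending_at(chars, lexicon)]


# Hypothetical toy lexicon with 2-dimensional word vectors.
toy_lexicon: Dict[str, Vector] = {
    "南京": [1.0, 0.0],
    "南京市": [0.0, 1.0],
    "市长": [1.0, 1.0],
    "长江": [2.0, 0.0],
    "大桥": [0.0, 2.0],
    "长江大桥": [2.0, 2.0],
}

encoded = encode_sentence("南京市长江大桥", toy_lexicon, dim=2)
for ch, vec in zip("南京市长江大桥", encoded):
    print(ch, vec)
```

Because every character receives exactly one vector of the same size, however many words end there, sentences can be padded and stacked into ordinary tensors. This is the property the abstract credits with enabling batch training, in contrast to the variable number of shortcut paths in a lattice.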

Original language: English
Title of host publication: Long and Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 2379-2389
Number of pages: 11
ISBN (Electronic): 9781950737130
State: Published - 2019
Event: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 - Minneapolis, United States
Duration: 2 Jun 2019 → 7 Jun 2019

Publication series

Name: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
Country/Territory: United States
City: Minneapolis
Period: 2/06/19 → 7/06/19
