A graph-based coarse-to-fine method for unsupervised bilingual lexicon induction

  • Shuo Ren*
  • Shujie Liu
  • Ming Zhou
  • Shuai Ma
  • *Corresponding author for this work
  • Beihang University
  • Beijing Advanced Innovation Center for Big Data and Brain Computing
  • Microsoft USA

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Unsupervised bilingual lexicon induction is the task of inducing word translations from monolingual corpora of two languages. Recent methods are mostly based on unsupervised cross-lingual word embeddings, the key to which is to find initial solutions of word translations, followed by the learning and refinement of mappings between the embedding spaces of two languages. However, previous methods find initial solutions just based on word-level information, which may be (1) limited and inaccurate, and (2) prone to contain some noise introduced by the insufficiently pre-trained embeddings of some words. To deal with those issues, in this paper, we propose a novel graph-based paradigm to induce bilingual lexicons in a coarse-to-fine way. We first build a graph for each language with its vertices representing different words. Then we extract word cliques from the graphs and map the cliques of two languages. Based on that, we induce the initial word translation solution with the central words of the aligned cliques. This coarse-to-fine approach not only leverages clique-level information, which is richer and more accurate, but also effectively reduces the bad effect of the noise in the pre-trained embeddings. Finally, we take the initial solution as the seed to learn cross-lingual embeddings, from which we induce bilingual lexicons. Experiments show that our approach improves the performance of bilingual lexicon induction compared with previous methods.
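The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embeddings, the cosine-similarity threshold used to draw edges, the choice of maximal cliques, and the "closest to the centroid" definition of a central word are all assumptions made for illustration. The clique alignment between the two languages, which the paper obtains without supervision, is supplied by hand here, and the final step of learning cross-lingual embeddings from the seed is omitted.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_graph(emb, threshold):
    """Link two words when their cosine similarity exceeds `threshold`
    (an assumed edge rule, not taken from the paper)."""
    graph = {w: set() for w in emb}
    for a, b in combinations(emb, 2):
        if cosine(emb[a], emb[b]) > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def central_word(clique, emb):
    """Return the member closest to the clique centroid
    (an assumed definition of 'central word')."""
    dim = len(next(iter(emb.values())))
    centroid = [sum(emb[w][i] for w in clique) / len(clique) for i in range(dim)]
    return max(sorted(clique), key=lambda w: cosine(emb[w], centroid))


def seed_lexicon(aligned_cliques, src_emb, tgt_emb):
    """Pair the central words of each aligned (source, target) clique."""
    return {
        central_word(s, src_emb): central_word(t, tgt_emb)
        for s, t in aligned_cliques
    }


# Toy 2-d embeddings (hypothetical data).
en_emb = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "pet": [0.95, 0.15],
          "car": [0.1, 1.0], "bus": [0.2, 0.9]}
fr_emb = {"chat": [0.1, 1.0], "chien": [0.2, 0.9], "animal": [0.15, 0.95],
          "voiture": [1.0, 0.1], "autobus": [0.9, 0.2]}

src_cliques = maximal_cliques(build_graph(en_emb, threshold=0.9))
tgt_cliques = maximal_cliques(build_graph(fr_emb, threshold=0.9))


def clique_with(cliques, word):
    """Look up the clique that contains `word`."""
    return next(c for c in cliques if word in c)


# Hand-made clique alignment standing in for the unsupervised clique mapping.
aligned = [
    (clique_with(src_cliques, "cat"), clique_with(tgt_cliques, "chat")),
    (clique_with(src_cliques, "car"), clique_with(tgt_cliques, "voiture")),
]
seed = seed_lexicon(aligned, en_emb, fr_emb)
print(seed)
```

In this sketch the seed dictionary contains one word pair per aligned clique, so a poorly trained embedding for a single clique member has less influence than it would in a purely word-level match.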

Original language: English
Title of host publication: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 3476-3485
Number of pages: 10
ISBN (Electronic): 9781952148255
Publication status: Published - 2020
Event: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States
Duration: 5 Jul 2020 to 10 Jul 2020

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020
Country/Territory: United States
City: Virtual, Online
Period: 5/07/20 to 10/07/20
