XDailyDialog: A Multilingual Parallel Dialogue Corpus

  • Zeming Liu
  • Ping Nie
  • Jie Cai
  • Haifeng Wang*
  • Zheng Yu Niu
  • Peng Zhang
  • Mrinmaya Sachan
  • Kaiping Peng

  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

High-quality corpora are important to the development of dialogue models. However, most existing corpora for open-domain dialogue modeling are limited to a single language. The absence of multilingual open-domain dialogue corpora not only limits research on multilingual and cross-lingual transfer learning but also hinders the development of robust open-domain dialogue systems that can be deployed in other parts of the world. In this paper, we provide a multilingual parallel open-domain dialogue dataset, XDailyDialog, to enable researchers to explore the challenging task of multilingual and cross-lingual open-domain dialogue. XDailyDialog includes 13K dialogues aligned across 4 languages (52K dialogues and 410K utterances in total). We then propose a dialogue generation model, kNN-Chat, which has a novel kNN-search mechanism to support unified response retrieval for monolingual, multilingual, and cross-lingual dialogue. Experimental results show the effectiveness of this framework.

Original language: English
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 12240-12253
Number of pages: 14
ISBN (Electronic): 9781959429722
State: Published - 2023
Externally published: Yes
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 2023 to 14 Jul 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1
ISSN (Print): 0736-587X

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Toronto
Period: 9/07/23 to 14/07/23
