
LSDSCC: A large scale domain-specific conversational corpus for response generation with diversity oriented evaluation metrics

  • Zhen Xu
  • Nan Jiang
  • Bingquan Liu
  • Wenge Rong
  • Bowen Wu
  • Baoxun Wang
  • Zhuoran Wang
  • Xiaolong Wang
  • Harbin Institute of Technology
  • Beihang University
  • Tricorn (Beijing) Technology Co., Ltd.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

It has been proven that automatic conversational agents can be built up using the End-to-End Neural Response Generation (NRG) framework, and such a data-driven methodology requires a large number of dialog pairs for model training and reasonable evaluation metrics for testing. This paper proposes a Large Scale Domain-Specific Conversational Corpus (LSDSCC) composed of high-quality query-response pairs extracted from the domain-specific online forum, with thorough preprocessing and cleansing procedures. Also, a testing set, including multiple diverse responses annotated for each query, is constructed, and on this basis, the metrics for measuring the diversity of generated results are further presented. We evaluate the performances of neural dialog models with the widely applied diversity boosting strategies on the proposed dataset. The experimental results have shown that our proposed corpus can be taken as a new benchmark dataset for the NRG task, and the presented metrics are promising to guide the optimization of NRG models by quantifying the diversity of the generated responses reasonably.

Original language: English
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 2070-2080
Number of pages: 11
ISBN (electronic): 9781948087278
Publication status: Published - 2018
Event: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - New Orleans, United States
Duration: 1 Jun 2018 to 6 Jun 2018

Publication series

Name: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 1/06/18 to 6/06/18
