LSDSCC: A large scale domain-specific conversational corpus for response generation with diversity oriented evaluation metrics

  • Zhen Xu
  • Nan Jiang
  • Bingquan Liu
  • Wenge Rong
  • Bowen Wu
  • Baoxun Wang
  • Zhuoran Wang
  • Xiaolong Wang
  • Harbin Institute of Technology
  • Beihang University
  • Tricorn (Beijing) Technology Co., Ltd.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

It has been proven that automatic conversational agents can be built up using the End-to-End Neural Response Generation (NRG) framework, and such a data-driven methodology requires a large number of dialog pairs for model training and reasonable evaluation metrics for testing. This paper proposes a Large Scale Domain-Specific Conversational Corpus (LSDSCC) composed of high-quality query-response pairs extracted from the domain-specific online forum, with thorough preprocessing and cleansing procedures. Also, a testing set, including multiple diverse responses annotated for each query, is constructed, and on this basis, the metrics for measuring the diversity of generated results are further presented. We evaluate the performances of neural dialog models with the widely applied diversity boosting strategies on the proposed dataset. The experimental results have shown that our proposed corpus can be taken as a new benchmark dataset for the NRG task, and the presented metrics are promising to guide the optimization of NRG models by quantifying the diversity of the generated responses reasonably.

Original language: English
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 2070-2080
Number of pages: 11
ISBN (Electronic): 9781948087278
State: Published - 2018
Event: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - New Orleans, United States
Duration: 1 Jun 2018 to 6 Jun 2018

Publication series

Name: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 1/06/18 to 6/06/18
