Skip to main navigation Skip to search Skip to main content

Auto-dialabel: Labeling dialogue data with unsupervised learning

  • Chen Shi
  • , Qi Chen
  • , Lei Sha
  • , Sujian Li
  • , Xu Sun
  • , Houfeng Wang*
  • , Lintao Zhang
  • *Corresponding author for this work
  • Peking University
  • Microsoft USA

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

The lack of labeled data is one of the main challenges when building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and in low coverage. In this paper, we instead propose our framework auto-dialabel to automatically cluster the dialogue intents and slots. In this framework, we collect a set of context features, leverage an autoencoder for feature assembly, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can promote human labeling cost to a great extent, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.

Original languageEnglish
Title of host publicationProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
EditorsEllen Riloff, David Chiang, Julia Hockenmaier, Jun'ichi Tsujii
PublisherAssociation for Computational Linguistics
Pages684-689
Number of pages6
ISBN (Electronic)9781948087841
StatePublished - 2018
Externally publishedYes
Event2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - Brussels, Belgium
Duration: 31 Oct 20184 Nov 2018

Publication series

NameProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018

Conference

Conference2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018
Country/TerritoryBelgium
CityBrussels
Period31/10/184/11/18

Fingerprint

Dive into the research topics of 'Auto-dialabel: Labeling dialogue data with unsupervised learning'. Together they form a unique fingerprint.

Cite this