A Neural Expectation-Maximization Framework for Noisy Multi-Label Text Classification

  • Zhongguancun Laboratory
  • University of Leeds
  • University of Ottawa

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Multi-label text classification (MLTC) has a wide range of real-world applications. Neural networks have recently improved the performance of MLTC models, but training these models relies on a sufficient amount of accurately labelled data. Manually annotating large-scale MLTC datasets is expensive and impractical for many applications. Weak supervision techniques have thus been developed to reduce the cost of annotating text corpora. However, these techniques introduce noisy labels into the training data and may degrade model performance. This paper aims to deal with such noisy-label problems in MLTC in both single-instance and multi-instance settings. We build a novel Neural Expectation-Maximization Framework (nEM) that combines neural networks with probabilistic modelling. The nEM framework produces text representations using neural-network text encoders and is optimized with the Expectation-Maximization algorithm. It naturally accounts for noisy labels during learning by alternating between updating the model parameters and estimating the distribution of the ground-truth labels. We evaluate the nEM framework in multi-instance noisy MLTC on a benchmark relation extraction dataset constructed by distant supervision, and in single-instance noisy MLTC on synthetic noisy datasets constructed by keyword supervision and label flipping. The experimental results demonstrate that nEM significantly improves upon baseline models in both single-instance and multi-instance noisy MLTC tasks. Further analysis of the experiments suggests that the nEM framework efficiently reduces the noisy labels in MLTC datasets and significantly improves model performance.
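The alternation the abstract describes (estimate the distribution of the true labels, then update the model parameters) can be illustrated with a generic noisy-label EM loop. The sketch below is not the nEM framework from the paper. It is a minimal stand-in under stated assumptions: a single binary label (a multi-label task could apply it per label), a logistic-regression scorer in place of a neural text encoder, and a hypothetical `flip` table giving the probability that a true label is observed flipped. All function names are illustrative.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def predict(w, b, x):
    # Assumption: a linear scorer stands in for a neural text encoder + classifier head.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def e_step(prior, y, flip):
    # Posterior p(true label = 1 | x, observed label y).
    # flip[z] is the assumed probability that a true label z is observed as 1 - z.
    like1 = 1 - flip[1] if y == 1 else flip[1]   # p(y | true = 1)
    like0 = flip[0] if y == 1 else 1 - flip[0]   # p(y | true = 0)
    num = prior * like1
    return num / (num + (1 - prior) * like0)

def m_step(w, b, xs, ys, post, lr=0.5, steps=100):
    n = len(xs)
    # Fit the scorer to the soft targets by gradient descent on cross-entropy.
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, q in zip(xs, post):
            err = predict(w, b, x) - q
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    # Re-estimate flip rates from expected counts, clipped to stay in (0, 0.5].
    eps = 1e-3
    mass1 = sum(post)
    mass0 = n - mass1
    f1 = sum(q for q, y in zip(post, ys) if y == 0) / max(mass1, eps)
    f0 = sum(1 - q for q, y in zip(post, ys) if y == 1) / max(mass0, eps)
    flip = {0: min(max(f0, eps), 0.5), 1: min(max(f1, eps), 0.5)}
    return w, b, flip

def run_em(xs, ys, rounds=10):
    w, b = [0.0] * len(xs[0]), 0.0
    flip = {0: 0.2, 1: 0.2}  # assumed initial noise rates
    for _ in range(rounds):
        post = [e_step(predict(w, b, x), y, flip) for x, y in zip(xs, ys)]
        w, b, flip = m_step(w, b, xs, ys, post)
    return w, b, flip, post
```

Calling `run_em(xs, ys)` with feature vectors `xs` and observed 0/1 labels `ys` returns the fitted weights, the estimated flip rates, and the final per-example posteriors over the true label. When the scorer is confident, the E-step can override an observed label: `e_step(0.9, 0, {0: 0.2, 1: 0.2})` is above 0.5 even though the observed label is 0.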

Original language: English
Pages (from-to): 10992-11003
Number of pages: 12
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 11
DOI
Publication status: Published - 1 Nov 2023
