DropMix: A Textual Data Augmentation Combining Dropout with Mixup

  • Fanshuang Kong
  • Richong Zhang*
  • Xiaohui Guo
  • Samuel Mensah
  • Yongyi Mao
  • *Corresponding author for this work
  • Beihang University
  • Zhongguancun Laboratory
  • University of Sheffield
  • University of Ottawa

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Overfitting is a common problem when there is insufficient data to train deep neural networks in machine learning tasks. Data augmentation regularization methods such as Dropout, Mixup, and their enhanced variants, are effective and prevalent, and achieve promising performance to overcome overfitting. However, in text learning, most of the existing regularization approaches merely adopt ideas from computer vision without considering the importance of dimensionality in natural language processing. In this paper, we argue that the property is essential to overcome overfitting in text learning. Accordingly, we present a saliency map informed textual data augmentation and regularization framework, which combines Dropout and Mixup, namely DropMix, to mitigate the overfitting problem in text learning. In addition, we design a procedure that drops and patches fine grained shapes of the saliency map under the DropMix framework to enhance regularization. Empirical studies confirm the effectiveness of the proposed approach on 12 text classification tasks.

Original language: English
Title of host publication: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Editors: Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Publisher: Association for Computational Linguistics (ACL)
Pages: 890-899
Number of pages: 10
ISBN (electronic): 9781959429401
DOI
Publication status: Published - 2022
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 to 11 Dec 2022

Publication series

Name: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Hybrid, Abu Dhabi
Period: 7 Dec 2022 to 11 Dec 2022
