Skip to main navigation Skip to search Skip to main content

DropMix: A Textual Data Augmentation Combining Dropout with Mixup

  • Fanshuang Kong
  • , Richong Zhang*
  • , Xiaohui Guo
  • , Samuel Mensah
  • , Yongyi Mao
  • *Corresponding author for this work
  • Beihang University
  • Zhongguancun Laboratory
  • University of Sheffield
  • University of Ottawa

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Overfitting is a common problem when there is insufficient data to train deep neural networks in machine learning tasks. Data augmentation regularization methods such as Dropout, Mixup, and their enhanced variants, are effective and prevalent, and achieve promising performance to overcome overfitting. However, in text learning, most of the existing regularization approaches merely adopt ideas from computer vision without considering the importance of dimensionality in natural language processing. In this paper, we argue that the property is essential to overcome overfitting in text learning. Accordingly, we present a saliency map informed textual data augmentation and regularization framework, which combines Dropout and Mixup, namely DropMix, to mitigate the overfitting problem in text learning. In addition, we design a procedure that drops and patches fine grained shapes of the saliency map under the DropMix framework to enhance regularization. Empirical studies confirm the effectiveness of the proposed approach on 12 text classification tasks.

Original languageEnglish
Title of host publicationProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
EditorsYoav Goldberg, Zornitsa Kozareva, Yue Zhang
PublisherAssociation for Computational Linguistics (ACL)
Pages890-899
Number of pages10
ISBN (Electronic)9781959429401
DOIs
StatePublished - 2022
Event2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Hybrid, Abu Dhabi, United Arab Emirates
Duration: 7 Dec 202211 Dec 2022

Publication series

NameProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022

Conference

Conference2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/TerritoryUnited Arab Emirates
CityHybrid, Abu Dhabi
Period7/12/2211/12/22

Fingerprint

Dive into the research topics of 'DropMix: A Textual Data Augmentation Combining Dropout with Mixup'. Together they form a unique fingerprint.

Cite this