Learning to Sample Replacements for ELECTRA Pre-Training

  • Yaru Hao*
  • Li Dong
  • Hangbo Bao
  • Ke Xu
  • Furu Wei
  • *Corresponding author for this work
  • Beihang University
  • Microsoft USA

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

ELECTRA (Clark et al., 2020a) pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling. Despite its compelling performance, ELECTRA suffers from two issues. First, there is no direct feedback loop from the discriminator to the generator, which makes replacement sampling inefficient. Second, the generator's predictions tend to become over-confident as training progresses, which biases the replacements toward the correct tokens. In this paper, we propose two methods to improve replacement sampling for ELECTRA pre-training. Specifically, we augment sampling with a hardness prediction mechanism, so that the generator can encourage the discriminator to learn what it has not yet acquired. We also prove that efficient sampling reduces the training variance of the discriminator. Moreover, we propose to use a focal loss for the generator in order to mitigate the oversampling of correct tokens as replacements. Experimental results show that our method improves ELECTRA pre-training on various downstream tasks. Our code and pre-trained models will be released at: https://github.com/YRdddream/electra-hp.
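
To illustrate the second idea, the following is a minimal sketch of a focal loss in its commonly used form, FL(p_t) = -(1 - p_t)^gamma * log(p_t), applied to a single masked position. The abstract does not give the exact formulation used in the paper, so the function names, the value gamma = 2.0, and the toy logits are assumptions made only for this illustration; this is not the released implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard masked-language-modeling loss for one position."""
    p_t = softmax(logits)[target]
    return -math.log(p_t)

def focal_loss(logits, target, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    gamma is a hypothetical default chosen for illustration.
    With gamma = 0 this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Toy vocabulary of four tokens; index 0 is the correct token.
confident = [6.0, 0.0, 0.0, 0.0]   # generator is already very sure
uncertain = [1.0, 0.5, 0.0, 0.0]   # generator is still unsure

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.5f} focal={fl:.7f} ratio={fl / ce:.5f}")
```

The scaling factor shrinks the loss far more for the position the generator already predicts confidently than for the uncertain one, so well-learned tokens contribute less to the generator's updates. This is consistent with the abstract's stated goal of keeping the generator from becoming over-confident and oversampling the correct token as a replacement.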

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: ACL-IJCNLP 2021
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher: Association for Computational Linguistics (ACL)
Pages: 4495-4506
Number of pages: 12
ISBN (Electronic): 9781954085541
State: Published - 2021
Event: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 - Virtual, Online
Duration: 1 Aug 2021 to 6 Aug 2021

Publication series

Name: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Conference

Conference: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
City: Virtual, Online
Period: 1/08/21 to 6/08/21
