
Scheduled DropHead: A regularization method for transformer models

  • Wangchunshu Zhou*
  • Tao Ge
  • Ke Xu
  • Furu Wei
  • Ming Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of the transformer. In contrast to the conventional dropout mechanism, which randomly drops units or connections, DropHead drops entire attention heads during training to prevent the multi-head attention model from being dominated by a small subset of its heads. This reduces the risk of overfitting and allows the model to benefit more fully from multi-head attention. Given the interaction between multi-headedness and training dynamics, we further propose a novel dropout rate scheduler that adjusts the dropout rate of DropHead throughout training, which yields a better regularization effect. Experimental results demonstrate that our approach improves transformer models by 0.9 BLEU on the WMT14 En-De translation task and by around 1.0 accuracy point on various text classification tasks.
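The two ideas in the abstract, dropping whole attention heads and scheduling the drop rate, can be illustrated with a minimal sketch in PyTorch-style Python. This is not the authors' released implementation: the function names (drop_head, scheduled_rate), the per-example Bernoulli head mask, the inverted-dropout rescaling, and the linear ramp schedule are illustrative assumptions, and the paper's exact masking, normalization, and schedule shape may differ.

```python
import torch

def drop_head(attn_heads: torch.Tensor, p: float, training: bool = True) -> torch.Tensor:
    # attn_heads: (batch, num_heads, seq_len, head_dim), the per-head outputs
    # of multi-head attention before they are concatenated and projected.
    if not training or p <= 0.0:
        return attn_heads
    batch, num_heads = attn_heads.shape[:2]
    # One Bernoulli keep/drop decision per head and per example; broadcasting
    # over the last two dimensions zeroes out an entire head at once.
    keep = torch.rand(batch, num_heads, 1, 1, device=attn_heads.device) >= p
    # Inverted-dropout rescaling keeps the expected output magnitude unchanged.
    return attn_heads * keep.to(attn_heads.dtype) / (1.0 - p)

def scheduled_rate(step: int, total_steps: int, p_max: float = 0.3) -> float:
    # Hypothetical placeholder schedule (a simple linear ramp). The paper
    # proposes its own scheduler; only the idea of a step-dependent rate
    # is taken from the abstract.
    return p_max * min(step / max(total_steps, 1), 1.0)
```

In such a sketch, the attention module would call drop_head(heads, scheduled_rate(step, total_steps)) at each training step and pass training=False (or p=0) at inference time.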

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2020
Publisher: Association for Computational Linguistics (ACL)
Pages: 1971-1980
Number of pages: 10
ISBN (electronic): 9781952148903
Publication status: Published - 2020
Event: Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020 - Virtual, Online
Duration: 16 Nov 2020 - 20 Nov 2020

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2020

Conference

Conference: Findings of the Association for Computational Linguistics, ACL 2020: EMNLP 2020
Location: Virtual, Online
Period: 16/11/20 → 20/11/20
