Investigating Learning Dynamics of BERT Fine-Tuning

  • Yaru Hao
  • Li Dong
  • Furu Wei
  • Ke Xu
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks. In this paper, we inspect the learning dynamics of BERT fine-tuning with two indicators. We use JS divergence to detect the change of the attention mode and use SVCCA distance to examine the change to the feature extraction mode during BERT fine-tuning. We conclude that BERT fine-tuning mainly changes the attention mode of the last layers and modifies the feature extraction mode of the intermediate and last layers. Moreover, we analyze the consistency of BERT fine-tuning between different random seeds and different datasets. In summary, we provide a distinctive understanding of the learning dynamics of BERT fine-tuning, which sheds some light on improving the fine-tuning results.
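The abstract's first indicator, Jensen-Shannon (JS) divergence, measures how much an attention head's distribution over tokens shifts between the pre-trained and fine-tuned model. A minimal sketch of that comparison is below; the attention vectors are hypothetical illustrations, not values from the paper, and the paper's exact aggregation over heads and layers is not reproduced here.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base-2) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution
    kl = lambda a, b: np.sum(a * np.log2(a / b))  # KL divergence in bits
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical attention distributions of one head over 4 tokens,
# before and after fine-tuning; a larger JS divergence would indicate
# a bigger change in the head's attention mode.
pre_finetune = np.array([0.7, 0.1, 0.1, 0.1])
post_finetune = np.array([0.25, 0.25, 0.25, 0.25])
print(js_divergence(pre_finetune, post_finetune))
```

With base-2 logarithms the value is bounded in [0, 1], which makes divergences comparable across heads and layers.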

Original language: English
Title of host publication: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2020
Editors: Kam-Fai Wong, Kevin Knight, Hua Wu
Publisher: Association for Computational Linguistics (ACL)
Pages: 87-92
Number of pages: 6
ISBN (Electronic): 9781952148910
DOIs
State: Published - 2020
Event: 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2020 - Virtual, Online, China
Duration: 4 Dec 2020 - 7 Dec 2020

Publication series

Name: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2020

Conference

Conference: 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2020
Country/Territory: China
City: Virtual, Online
Period: 4/12/20 - 7/12/20
