TOCOL: Improving Contextual Representation of Pre-trained Language Models via Token-Level Contrastive Learning

  • Keheng Wang*
  • Chuantao Yin
  • Rumei Li
  • Sirui Wang
  • Yunsen Xian
  • Wenge Rong
  • Zhang Xiong

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens: because such tokens lie very close to each other in representation space, they receive higher similarity scores. This bias may ultimately degrade the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
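The abstract describes a token-level contrastive objective but not its exact form. As a point of reference, a generic token-level InfoNCE loss can be sketched as follows; this is an assumed, illustrative formulation (the function name, the two-view setup, and the temperature value are hypothetical), not the authors' TOCOL objective.

```python
import numpy as np

def token_infonce(view_a, view_b, temperature=0.1):
    """Illustrative token-level InfoNCE loss (not TOCOL's exact objective).

    view_a, view_b: (num_tokens, dim) arrays of embeddings for the same
    tokens under two encoder views; token i in view_a is the positive
    pair for token i in view_b, all other tokens serve as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives lie on the diagonal: token i should match token i
    return -np.mean(np.diag(log_prob))
```

Pulling each token toward its counterpart while pushing it away from all other tokens in the batch counteracts the clustering of high-frequency tokens that makes the embedding space anisotropic.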

Original language: English
Title of host publication: 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings
Editors: Yannis Manolopoulos, Zhi-Hua Zhou
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350345032
State: Published - 2023
Event: 10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023 - Thessaloniki, Greece
Duration: 9 Oct 2023 - 12 Oct 2023

Publication series

Name: 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings

Conference

Conference: 10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023
Country/Territory: Greece
City: Thessaloniki
Period: 9/10/23 - 12/10/23

Keywords

  • Contrastive Learning
  • GLUE
  • Natural Language Processing
  • Natural Language Understanding
