TY - GEN
T1 - TOCOL: Improving Contextual Representation of Pre-trained Language Models via Token-Level Contrastive Learning
T2 - 10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023
AU - Wang, Keheng
AU - Yin, Chuantao
AU - Li, Rumei
AU - Wang, Sirui
AU - Xian, Yunsen
AU - Rong, Wenge
AU - Xiong, Zhang
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens, as they are very close to each other in representation space and thus have higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
AB - Self-attention, which allows transformers to capture deep bidirectional contexts, plays a vital role in BERT-like pre-trained language models. However, the maximum likelihood pre-training objective of BERT may produce an anisotropic word embedding space, which leads to biased attention scores for high-frequency tokens, as they are very close to each other in representation space and thus have higher similarities. This bias may ultimately affect the encoding of global contextual information. To address this issue, we propose TOCOL, a TOken-Level COntrastive Learning framework for improving the contextual representation of pre-trained language models, which integrates a novel self-supervised objective into the attention mechanism to reshape the word representation space and encourages the PLM to capture the global semantics of sentences. Results on the GLUE Benchmark show that TOCOL brings considerable improvement over the original BERT. Furthermore, we conduct a detailed analysis and demonstrate the robustness of our approach in low-resource scenarios.
KW - Contrastive Learning
KW - GLUE
KW - Natural Language Processing
KW - Natural Language Understanding
UR - https://www.scopus.com/pages/publications/85179012802
U2 - 10.1109/DSAA60987.2023.10302506
DO - 10.1109/DSAA60987.2023.10302506
M3 - Conference contribution
AN - SCOPUS:85179012802
T3 - 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings
BT - 2023 IEEE 10th International Conference on Data Science and Advanced Analytics, DSAA 2023 - Proceedings
A2 - Manolopoulos, Yannis
A2 - Zhou, Zhi-Hua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 October 2023 through 12 October 2023
ER -