TY - GEN
T1 - Enhancing Prompt Tuning for Smaller Pretrained Models via Knowledge Distillation
AU - Yuan, Mengyang
AU - Lang, Bo
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Prompt tuning, a parameter-efficient fine-tuning method, plays a crucial role in adapting pre-trained models. However, due to the limited expressive power of smaller pre-trained models, prompt tuning on these models often falls short of its performance on larger ones. To address this issue, we propose a knowledge distillation approach that leverages the knowledge of a larger teacher model to improve prompt tuning on smaller models. Through analysis and experiments, we first establish that logit-based distillation is better suited to prompt tuning than feature-based distillation. Building on the commonly used inter-class relationship distillation, we then design an additional loss function that enables the student model to learn inter-instance relationships from the teacher model. This broadens the information extracted from the teacher, further strengthening the distillation effect. Experimental results on multiple tasks in the SuperGLUE benchmark show that our method significantly improves the prompt tuning performance of smaller models, matching or surpassing the larger teacher models on some tasks. Moreover, our method does not alter the structure of the student model, so the fine-tuned model retains all the advantages of prompt tuning at inference time.
AB - Prompt tuning, a parameter-efficient fine-tuning method, plays a crucial role in adapting pre-trained models. However, due to the limited expressive power of smaller pre-trained models, prompt tuning on these models often falls short of its performance on larger ones. To address this issue, we propose a knowledge distillation approach that leverages the knowledge of a larger teacher model to improve prompt tuning on smaller models. Through analysis and experiments, we first establish that logit-based distillation is better suited to prompt tuning than feature-based distillation. Building on the commonly used inter-class relationship distillation, we then design an additional loss function that enables the student model to learn inter-instance relationships from the teacher model. This broadens the information extracted from the teacher, further strengthening the distillation effect. Experimental results on multiple tasks in the SuperGLUE benchmark show that our method significantly improves the prompt tuning performance of smaller models, matching or surpassing the larger teacher models on some tasks. Moreover, our method does not alter the structure of the student model, so the fine-tuned model retains all the advantages of prompt tuning at inference time.
KW - Knowledge distillation
KW - Parameter-efficient fine-tuning
KW - Prompt tuning
UR - https://www.scopus.com/pages/publications/105010012495
U2 - 10.1007/978-981-96-7030-7_12
DO - 10.1007/978-981-96-7030-7_12
M3 - Conference contribution
AN - SCOPUS:105010012495
SN - 9789819670291
T3 - Communications in Computer and Information Science
SP - 164
EP - 178
BT - Neural Information Processing - 31st International Conference, ICONIP 2024, Proceedings
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Wong, Kevin
A2 - Leung, Andrew Chi Sing
A2 - Doborjeh, Zohreh
A2 - Tanveer, M.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 31st International Conference on Neural Information Processing, ICONIP 2024
Y2 - 2 December 2024 through 6 December 2024
ER -