TY - GEN
T1 - Towards Harmonized Uncertainty Estimation for Large Language Models
AU - Li, Rui
AU - Long, Jing
AU - Qi, Muge
AU - Xia, Heming
AU - Sha, Lei
AU - Wang, Peiyi
AU - Sui, Zhifang
N1 - Publisher Copyright:
© 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - To facilitate robust and trustworthy deployment of large language models (LLMs), it is essential to quantify the reliability of their generations through uncertainty estimation. While recent efforts have made significant advancements by leveraging the internal logic and linguistic features of LLMs to estimate uncertainty scores, our empirical analysis highlights the pitfalls of these methods to strike a harmonized estimation between indication, balance, and calibration, which hinders their broader capability for accurate uncertainty estimation. To address this challenge, we propose CUE (Corrector for Uncertainty Estimation): A straightforward yet effective method that employs a lightweight model trained on data aligned with the target LLM's performance to adjust uncertainty scores. Comprehensive experiments across diverse models and tasks demonstrate its effectiveness, which achieves consistent improvements of up to 60% over existing methods. Resources are available at https://github.com/O-L1RU1/Corrector4UE.
AB - To facilitate robust and trustworthy deployment of large language models (LLMs), it is essential to quantify the reliability of their generations through uncertainty estimation. While recent efforts have made significant advancements by leveraging the internal logic and linguistic features of LLMs to estimate uncertainty scores, our empirical analysis highlights the pitfalls of these methods to strike a harmonized estimation between indication, balance, and calibration, which hinders their broader capability for accurate uncertainty estimation. To address this challenge, we propose CUE (Corrector for Uncertainty Estimation): A straightforward yet effective method that employs a lightweight model trained on data aligned with the target LLM's performance to adjust uncertainty scores. Comprehensive experiments across diverse models and tasks demonstrate its effectiveness, which achieves consistent improvements of up to 60% over existing methods. Resources are available at https://github.com/O-L1RU1/Corrector4UE.
UR - https://www.scopus.com/pages/publications/105021046764
U2 - 10.18653/v1/2025.acl-long.1118
DO - 10.18653/v1/2025.acl-long.1118
M3 - 会议稿件
AN - SCOPUS:105021046764
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 22938
EP - 22953
BT - Long Papers
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -