TY - GEN
T1 - How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation
AU - Li, Rui
AU - Xia, Heming
AU - Yuan, Xinfeng
AU - Dong, Qingxiu
AU - Sha, Lei
AU - Li, Wenjie
AU - Sui, Zhifang
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Recently, LLMs have garnered increasing attention across academic disciplines for their potential as human digital twins, virtual proxies designed to replicate individuals and autonomously perform tasks such as decision-making, problem-solving, and reasoning on their behalf. However, current evaluations of LLMs primarily emphasize dialogue simulation while overlooking human behavior simulation, which is crucial for digital twins. To address this gap, we introduce BEHAVIORCHAIN, the first benchmark for evaluating LLMs' ability to simulate continuous human behavior. BEHAVIORCHAIN comprises diverse, high-quality, persona-based behavior chains, totaling 15,846 distinct behaviors across 1,001 unique personas, each with detailed history and profile metadata. For evaluation, we integrate persona metadata into LLMs and employ them to iteratively infer contextually appropriate behaviors within dynamic scenarios provided by BEHAVIORCHAIN. Comprehensive evaluation results demonstrated that even state-of-the-art models struggle with accurately simulating continuous human behavior. Resources are available at https://github.com/OL1RU1/BehaviorChain.
UR - https://www.scopus.com/pages/publications/105028596875
U2 - 10.18653/v1/2025.findings-acl.813
DO - 10.18653/v1/2025.findings-acl.813
M3 - Conference contribution
AN - SCOPUS:105028596875
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 15738
EP - 15763
BT - Findings of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -