TY - GEN
T1 - Chain-of-Imagination for Reliable Instruction Following in Decision Making
AU - Zhou, Enshen
AU - Qin, Yiran
AU - Yin, Zhenfei
AU - Shi, Zhelun
AU - Huang, Yuzhou
AU - Zhang, Ruimao
AU - Sheng, Lu
AU - Shao, Jing
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Enabling the embodied agent to imagine step-by-step the future states and sequentially approach these situation-aware states can enhance its capability to make reliable action decisions from textual instructions. In this work, we introduce a simple but effective mechanism called Chain-of-Imagination (CoI), which repeatedly employs a Multimodal Large Language Model (MLLM) equipped with diffusion model to facilitate imagining and acting upon the series of intermediate situation-aware visual sub-goals one by one, resulting in more reliable instruction-following capability. Based on the CoI mechanism, we propose an embodied agent DecisionDreamer as the low-level controller that can be adapted to different open-world scenarios. Extensive experiments demonstrate that Decision-Dreamer can achieve more reliable and accurate decision-making and significantly outperform the state-of-the-art generalist agents in the Minecraft and CALVIN sandbox simulators, regarding the instruction-following capability. For more demos, please see https://sites.google.com/view/decisiondreamer.
AB - Enabling the embodied agent to imagine step-by-step the future states and sequentially approach these situation-aware states can enhance its capability to make reliable action decisions from textual instructions. In this work, we introduce a simple but effective mechanism called Chain-of-Imagination (CoI), which repeatedly employs a Multimodal Large Language Model (MLLM) equipped with diffusion model to facilitate imagining and acting upon the series of intermediate situation-aware visual sub-goals one by one, resulting in more reliable instruction-following capability. Based on the CoI mechanism, we propose an embodied agent DecisionDreamer as the low-level controller that can be adapted to different open-world scenarios. Extensive experiments demonstrate that Decision-Dreamer can achieve more reliable and accurate decision-making and significantly outperform the state-of-the-art generalist agents in the Minecraft and CALVIN sandbox simulators, regarding the instruction-following capability. For more demos, please see https://sites.google.com/view/decisiondreamer.
UR - https://www.scopus.com/pages/publications/105029920762
U2 - 10.1109/IROS60139.2025.11246335
DO - 10.1109/IROS60139.2025.11246335
M3 - 会议稿件
AN - SCOPUS:105029920762
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 10010
EP - 10017
BT - IROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings
A2 - Laugier, Christian
A2 - Renzaglia, Alessandro
A2 - Atanasov, Nikolay
A2 - Birchfield, Stan
A2 - Cielniak, Grzegorz
A2 - De Mattos, Leonardo
A2 - Fiorini, Laura
A2 - Giguere, Philippe
A2 - Hashimoto, Kenji
A2 - Ibanez-Guzman, Javier
A2 - Kamegawa, Tetsushi
A2 - Lee, Jinoh
A2 - Loianno, Giuseppe
A2 - Luck, Kevin
A2 - Maruyama, Hisataka
A2 - Martinet, Philippe
A2 - Moradi, Hadi
A2 - Nunes, Urbano
A2 - Pettre, Julien
A2 - Pretto, Alberto
A2 - Ranzani, Tommaso
A2 - Ronnau, Arne
A2 - Rossi, Silvia
A2 - Rouse, Elliott
A2 - Ruggiero, Fabio
A2 - Simonin, Olivier
A2 - Wang, Danwei
A2 - Yang, Ming
A2 - Yoshida, Eiichi
A2 - Zhao, Huijing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025
Y2 - 19 October 2025 through 25 October 2025
ER -