Abstract
Traffic scene understanding plays a crucial role in reasoning about and predicting relationships among entities in traffic images. It focuses on analyzing behavioral interaction patterns and global semantic associations to support higher-level traffic applications. However, few existing frameworks achieve comprehensive scene understanding and semantic description in complex traffic environments; in particular, effective multiview semantic association modeling is still lacking. To address these challenges, we propose the multiview large language model (MVLLM), which integrates YOLO-based object detection with the reasoning ability of large language models (LLMs). Through prompt engineering, MVLLM uses the visual information extracted by YOLO to constrain the semantic space and guide the model's reasoning, thereby strengthening its scene parsing capability. Meanwhile, we design a chain-of-thought (CoT) reasoning mechanism that establishes spatiotemporal associations across multiple views and integrates their scene understanding with semantic descriptions. The framework enables intent understanding for vehicles in dynamic environments, enhancing driving safety, and provides comprehensive semantic descriptions for traffic management agencies, supporting holistic analyses of vehicles, roads, and environmental contexts.
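The abstract describes a pipeline in which YOLO detections are serialized into a prompt that constrains the LLM's semantic space, followed by a chain-of-thought instruction that links the camera views. The paper's actual prompt format is not given, so the following is only a minimal sketch of that idea; the `Detection` class, the `build_prompt` helper, and the view labels (`front`, `rear`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    """One YOLO detection from a single camera view (illustrative schema)."""
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    view: str                       # camera view identifier, e.g. "front"

def build_prompt(detections: List[Detection]) -> str:
    """Serialize per-view detections into grounding text, then append a
    CoT-style instruction that asks the LLM to associate views and
    infer intent (a sketch of the mechanism the abstract describes)."""
    lines = ["Detected objects by camera view:"]
    for d in sorted(detections, key=lambda d: d.view):
        lines.append(f"- [{d.view}] {d.label} (conf {d.confidence:.2f}) at box {d.box}")
    lines.append(
        "Reason step by step: (1) describe each view, "
        "(2) associate objects that appear across views, "
        "(3) infer vehicle intent and overall scene semantics."
    )
    return "\n".join(lines)

prompt = build_prompt([
    Detection("car", 0.91, (120, 80, 340, 260), "front"),
    Detection("pedestrian", 0.78, (400, 150, 450, 300), "front"),
    Detection("truck", 0.85, (60, 90, 280, 240), "rear"),
])
print(prompt)
```

In a full system the resulting string would be sent to an LLM; here it simply shows how detection output can bound what the model is allowed to talk about before the reasoning instruction is issued.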
| Original language | English |
|---|---|
| Article number | 2814128 |
| Journal | Journal of Advanced Transportation |
| Volume | 2026 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2026 |
Keywords
- LLM
- deep learning
- multiview integration
- road transportation
- traffic scene understanding
Article title: A Multiview-Integrated Framework for Traffic Scene Understanding Based on YOLO and LLM