TY - GEN
T1 - SafeRoute
T2 - 2025 IEEE/CVF International Conference on Computer Vision Workshops, ICCV-W 2025
AU - Shaw, Ankit Kumar
AU - Sah, Chandan Kumar
AU - Lian, Xiaoli
AU - Baig, Arsalan Shahid
AU - Wen, Tuopu
AU - Jiang, Kun
AU - Yang, Mengmeng
AU - Yang, Diange
AU - Zhang, Li
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Autonomous vehicles (AVs) require highly reliable traffic sign recognition and robust lane detection to navigate safely in complex and dynamic environments. This paper presents SafeRoute, a unified perception framework that integrates deep learning with an instruction-tuned Multimodal Large Language Model (MLLM) for comprehensive road scene understanding. For traffic sign recognition, we benchmark three state-of-the-art architectures, ResNet-50, YOLOv8, and RT-DETR, which achieve accuracies of 99.8%, 98.0%, and 96.6%, respectively. To address the limitations of traditional vision-only methods in lane detection under adverse conditions (e.g., occlusion, poor lighting, road wear), we introduce an MLLM-based pipeline, fine-tuned via instruction learning without requiring large-scale pretraining. Our approach introduces a novel Multimodal Adapter that fuses CNN-derived spatial features with EVA-CLIP embeddings, enabling fine-grained visual grounding and robustness to occlusion. By integrating these visual tokens into a LLaMA-2 decoder, our system performs semantic-level reasoning and interpretable scene understanding, moving beyond segmentation to structured, language-based lane perception. Quantitatively, SafeRoute achieves a Frame Overall Accuracy (FRM) of 53.87%, a Question Overall Accuracy (QNS) of 82.83%, and lane detection accuracies of 99.6% in clear conditions and 93.0% at night. It also demonstrates robust reasoning in adverse conditions, with 88.4% accuracy in rain and 95.6% under lane degradation. Overall, SafeRoute introduces a new paradigm in AV perception by offering a unified, multimodal approach, significantly improving both the robustness and explainability of lane detection in safety-critical scenarios.
KW - Autonomous Vehicles (AVs)
KW - Deep Learning
KW - Lane Detection
KW - Multimodal Large Language Models (MLLMs)
KW - Traffic Sign Recognition
UR - https://www.scopus.com/pages/publications/105035201600
U2 - 10.1109/ICCVW69036.2025.00478
DO - 10.1109/ICCVW69036.2025.00478
M3 - Conference contribution
AN - SCOPUS:105035201600
T3 - Proceedings - 2025 IEEE/CVF International Conference on Computer Vision Workshops, ICCV-W 2025
SP - 4606
EP - 4615
BT - Proceedings - 2025 IEEE/CVF International Conference on Computer Vision Workshops, ICCV-W 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 October 2025 through 20 October 2025
ER -