Abstract
Drivable area segmentation (DAS) plays an important role in autonomous driving. The segment anything model (SAM) has recently emerged as a powerful foundation model, demonstrating remarkable potential across diverse downstream segmentation tasks through domain-specific parameter-efficient fine-tuning (PEFT). This paper explores effective adaptation strategies for applying SAM to DAS. Existing approaches suffer from two limitations: 1) SAM employs a vanilla vision transformer (ViT) as its image encoder, which struggles to extract multi-scale features without incurring substantial computational overhead; 2) current fine-tuning approaches for SAM have been found to inadequately exploit traffic scene context, so they are not fully optimized for DAS and leave considerable room for improvement. To address these issues, we propose DAS-SAM, a segment anything model for drivable area segmentation: a novel, efficient adaptation framework that fine-tunes SAM for DAS. Our approach incorporates a lightweight, learnable network to extract multi-scale features and introduces three auxiliary learning objectives that incorporate traffic scene context. Furthermore, DAS-SAM employs mosaic image augmentation to improve robustness and generalization. Our framework is compatible with most existing PEFT methods, allowing flexible integration that further boosts performance. Extensive experiments on the BDD100K and Cityscapes datasets demonstrate that DAS-SAM outperforms both full fine-tuning and state-of-the-art PEFT methods.
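The abstract mentions mosaic image augmentation but the page carries no implementation details. The following is a minimal sketch of mosaic augmentation as commonly applied to segmentation data, not the authors' code: the function name `mosaic_augment`, the 1024-pixel canvas, and the crop-and-pad placement strategy are all illustrative assumptions.

```python
import numpy as np

def mosaic_augment(images, masks, out_size=1024, rng=None):
    """Compose four (image, mask) pairs into one mosaic sample.

    images: list of four HxWx3 uint8 arrays
    masks:  list of four HxW integer label maps
    A random center point splits the canvas into four quadrants;
    each source pair fills one quadrant via a random crop (padded
    first if the source is smaller than its quadrant).
    Illustrative sketch only; not the paper's implementation.
    """
    rng = rng or np.random.default_rng()
    canvas_img = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    canvas_msk = np.zeros((out_size, out_size), dtype=np.int64)

    # Random mosaic center, kept away from the canvas borders so
    # every quadrant retains a usable size.
    cx = int(rng.uniform(0.25, 0.75) * out_size)
    cy = int(rng.uniform(0.25, 0.75) * out_size)
    quads = [(0, 0, cx, cy), (cx, 0, out_size, cy),
             (0, cy, cx, out_size), (cx, cy, out_size, out_size)]

    for (x0, y0, x1, y1), img, msk in zip(quads, images, masks):
        qh, qw = y1 - y0, x1 - x0
        # Pad the source if needed, then take a random crop matching
        # the quadrant; image and mask are cropped identically so the
        # labels stay aligned with the pixels.
        pad_h = max(0, qh - img.shape[0])
        pad_w = max(0, qw - img.shape[1])
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))
        msk = np.pad(msk, ((0, pad_h), (0, pad_w)))
        ty = rng.integers(0, img.shape[0] - qh + 1)
        tx = rng.integers(0, img.shape[1] - qw + 1)
        canvas_img[y0:y1, x0:x1] = img[ty:ty + qh, tx:tx + qw]
        canvas_msk[y0:y1, x0:x1] = msk[ty:ty + qh, tx:tx + qw]

    return canvas_img, canvas_msk
```

Composing four traffic scenes at varied offsets exposes the model to diverse scene layouts and object scales in a single training sample, which is consistent with the robustness and generalization gains the abstract attributes to this augmentation.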
| Field | Value |
|---|---|
| Original language | English |
| Article number | 6 |
| Journal | Visual Intelligence |
| Volume | 4 |
| Issue number | 1 |
| DOIs | |
| State | Published - Dec 2026 |
Keywords
- Drivable area segmentation (DAS)
- Multi-scale feature extraction
- Parameter-efficient fine-tuning (PEFT)
- Segment anything model (SAM)
- Traffic scene context