TY - GEN
T1 - Chinese Traffic Guide Panel Text Detection Based on Pixel Aggregation
AU - Li, Xia
AU - Han, Tao
AU - Fan, Xudong
AU - Zhao, Wei
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The detection and recognition of road traffic signs has long been a key research topic in assisted driving and intelligent transportation and has attracted many researchers. However, while most methods focus on symbolic traffic signs, detecting and recognizing textual traffic signs remains challenging. To detect text in Chinese traffic guide panels efficiently and accurately, a text detection network based on pixel aggregation (PA) is proposed, which detects text kernels and precisely aggregates their surrounding text pixels using similarity vectors. Specifically, the model neck consists of one Single-scale Transformer Encoder (STE) and one Cross-scale Feature Fusion Module (CFFM). The STE learns position-aware feature representations with a self-attention mechanism. The CFFM is a lightweight U-shaped module that merges multi-level information to guide better segmentation. After the CFFM, the features from different levels are gathered into a final feature map for segmentation. Experimental results on CTST-1600 show that the proposed method outperforms other text detection methods while maintaining real-time performance.
KW - Segmentation
KW - Text detection
KW - Transformer
UR - https://www.scopus.com/pages/publications/85206821520
U2 - 10.1109/AIPMV62663.2024.10692256
DO - 10.1109/AIPMV62663.2024.10692256
M3 - Conference contribution
AN - SCOPUS:85206821520
T3 - 2024 2nd International Conference on Algorithm, Image Processing and Machine Vision, AIPMV 2024
SP - 249
EP - 252
BT - 2024 2nd International Conference on Algorithm, Image Processing and Machine Vision, AIPMV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Algorithm, Image Processing and Machine Vision, AIPMV 2024
Y2 - 12 July 2024 through 14 July 2024
ER -