TY - JOUR
T1 - TransCrack
T2 - revisiting fine-grained road crack detection with a transformer design
AU - Lin, Chunmian
AU - Tian, Daxin
AU - Duan, Xuting
AU - Zhou, Jianshan
N1 - Publisher Copyright: © 2023 The Author(s).
PY - 2023/9/4
Y1 - 2023/9/4
N2 - Prior convolution-based road crack detectors typically learn more abstract visual representation with increasing receptive field via an encoder-decoder architecture. Despite the promising accuracy, progressive spatial resolution reduction causes semantic feature blurring, leading to coarse and incontiguous distress detection. To these ends, an alternative sequence-to-sequence perspective with a transformer network termed TransCrack is introduced for road crack detection. Specifically, an image is decomposed into a grid of fixed-size crack patches, which is flattened with position embedding into a sequence. We further propose a pure transformer-based encoder with multi-head reduced self-attention modules and feed-forward networks for explicitly modelling long-range dependencies from the sequential input in a global receptive field. More importantly, a simple decoder with cross-layer aggregation architecture is developed to incorporate global with local attentions across different regions for detailed feature recovery and pixel-wise crack mask prediction. Empirical studies are conducted on three publicly available damage detection benchmarks. The proposed TransCrack achieves a state-of-the-art performance over all counterparts by a substantial margin, and qualitative results further demonstrate its superiority in contiguous crack recognition and fine-grained profile extraction. This article is part of the theme issue 'Artificial intelligence in failure analysis of transportation infrastructure and materials'.
AB - Prior convolution-based road crack detectors typically learn more abstract visual representation with increasing receptive field via an encoder-decoder architecture. Despite the promising accuracy, progressive spatial resolution reduction causes semantic feature blurring, leading to coarse and incontiguous distress detection. To these ends, an alternative sequence-to-sequence perspective with a transformer network termed TransCrack is introduced for road crack detection. Specifically, an image is decomposed into a grid of fixed-size crack patches, which is flattened with position embedding into a sequence. We further propose a pure transformer-based encoder with multi-head reduced self-attention modules and feed-forward networks for explicitly modelling long-range dependencies from the sequential input in a global receptive field. More importantly, a simple decoder with cross-layer aggregation architecture is developed to incorporate global with local attentions across different regions for detailed feature recovery and pixel-wise crack mask prediction. Empirical studies are conducted on three publicly available damage detection benchmarks. The proposed TransCrack achieves a state-of-the-art performance over all counterparts by a substantial margin, and qualitative results further demonstrate its superiority in contiguous crack recognition and fine-grained profile extraction. This article is part of the theme issue 'Artificial intelligence in failure analysis of transportation infrastructure and materials'.
KW - deep learning
KW - infrastructure maintenance
KW - intelligent transportation systems
KW - road crack detection
KW - transformer
UR - https://www.scopus.com/pages/publications/85164854895
U2 - 10.1098/rsta.2022.0172
DO - 10.1098/rsta.2022.0172
M3 - Article
C2 - 37454681
AN - SCOPUS:85164854895
SN - 1364-503X
VL - 381
JO - Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
JF - Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
IS - 2254
M1 - 20220172
ER -