TY - GEN
T1 - Contrastive Multi-Modal Fusion for Enhanced Airport Surface Surveillance
AU - Chao, Xu
AU - Cai, Kaiquan
AU - Zhao, Peng
AU - Su, Jian
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Remote and Virtual Tower (RVT) is increasingly attracting attention for its potential to reduce human resource requirements and construction costs in air traffic management. While advances in artificial intelligence have enhanced the safety and efficiency of RVT systems, key tasks such as aircraft detection remain limited by the quality of single-modality data, which reduces the accuracy of current implementations. To address these challenges, we introduce a framework named Vision-ADSB Network (VAD-Net) to fuse the primary surface monitoring data available in RVT. Unlike traditional methods, VAD-Net integrates visual data with Automatic Dependent Surveillance-Broadcast (ADS-B) information using a specialized method based on contrastive learning, generating a comprehensive semantic representation. By fusing camera and ADS-B data, VAD-Net addresses the shortcomings of single-modality data and enhances overall monitoring capabilities. Experimental results on data collected from RVT systems demonstrate that VAD-Net outperforms existing methods, even with limited datasets.
KW - Airport scene monitoring
KW - Contrastive learning
KW - Intelligent traffic management
KW - Multimodal fusion
KW - Remote and virtual towers
UR - https://www.scopus.com/pages/publications/105005205581
U2 - 10.1109/ICNS65417.2025.10976824
DO - 10.1109/ICNS65417.2025.10976824
M3 - Conference contribution
AN - SCOPUS:105005205581
T3 - Integrated Communications, Navigation and Surveillance Conference, ICNS
BT - ICNS 2025 - Integrated Communications, Navigation and Surveillance Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 Integrated Communications, Navigation and Surveillance Conference, ICNS 2025
Y2 - 8 April 2025 through 10 April 2025
ER -