TY - GEN
T1 - Line-level Semantic Structure Learning for Code Vulnerability Detection
AU - Wang, Ziliang
AU - Li, Ge
AU - Li, Jia
AU - Dong, Yihong
AU - Xiong, Yingfei
AU - Jin, Zhi
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s).
PY - 2025/10/27
Y1 - 2025/10/27
N2 - Unlike the flow structure of natural languages, programming languages have an inherent rigidity in structure and grammar. However, existing detection methods based on pre-trained models typically treat code as a natural language sequence, ignoring its unique structural information. This hinders the models from understanding the code’s semantic and structural information. To address this problem, we introduce the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which comprises four components: code preprocessing, global semantics awareness, line semantic awareness, and line semantic structure awareness. The preprocessing step transforms the code into two types of text: global code text and line-level code text. Unlike typical preprocessing methods, CSLS preserves structural elements such as line breaks and indentation characters while processing the global text. While preserving global code semantics, the CSLS network emphasizes capturing structural relationships between line semantics. By modeling each line’s semantics, CSLS treats line-level semantics as the smallest structural unit to learn nonlinear structural relationships, thereby improving code vulnerability detection accuracy. We conducted extensive experiments on vulnerability detection datasets from real projects. Results show that our preprocessing method significantly enhances the performance of existing baseline models. Additionally, the CSLS model outperforms the state-of-the-art baselines in code vulnerability detection, achieving 70.57% accuracy on the Devign dataset and a 49.59% F1 score on the Reveal dataset. These results demonstrate the importance of preserving and utilizing code structure information to improve the performance of code vulnerability detection models.
AB - Unlike the flow structure of natural languages, programming languages have an inherent rigidity in structure and grammar. However, existing detection methods based on pre-trained models typically treat code as a natural language sequence, ignoring its unique structural information. This hinders the models from understanding the code’s semantic and structural information. To address this problem, we introduce the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which comprises four components: code preprocessing, global semantics awareness, line semantic awareness, and line semantic structure awareness. The preprocessing step transforms the code into two types of text: global code text and line-level code text. Unlike typical preprocessing methods, CSLS preserves structural elements such as line breaks and indentation characters while processing the global text. While preserving global code semantics, the CSLS network emphasizes capturing structural relationships between line semantics. By modeling each line’s semantics, CSLS treats line-level semantics as the smallest structural unit to learn nonlinear structural relationships, thereby improving code vulnerability detection accuracy. We conducted extensive experiments on vulnerability detection datasets from real projects. Results show that our preprocessing method significantly enhances the performance of existing baseline models. Additionally, the CSLS model outperforms the state-of-the-art baselines in code vulnerability detection, achieving 70.57% accuracy on the Devign dataset and a 49.59% F1 score on the Reveal dataset. These results demonstrate the importance of preserving and utilizing code structure information to improve the performance of code vulnerability detection models.
KW - Large language model
KW - Model collaboration
KW - Pre-trained models
KW - Vulnerability detection
UR - https://www.scopus.com/pages/publications/105023676176
U2 - 10.1145/3755881.3755894
DO - 10.1145/3755881.3755894
M3 - Conference contribution
AN - SCOPUS:105023676176
T3 - 16th International Conference on Internetware, Internetware 2025 - Proceedings
SP - 269
EP - 280
BT - 16th International Conference on Internetware, Internetware 2025 - Proceedings
A2 - Mei, Hong
A2 - Lv, Jian
A2 - Jin, Zhi
A2 - Li, Xuandong
A2 - Zimmermann, Thomas
A2 - Li, Ge
A2 - Bu, Lei
A2 - Xia, Xin
PB - Association for Computing Machinery, Inc
T2 - 16th International Conference on Internetware, Internetware 2025
Y2 - 20 June 2025 through 22 June 2025
ER -