
Line-level Semantic Structure Learning for Code Vulnerability Detection

  • Ziliang Wang
  • Ge Li*
  • Jia Li
  • Yihong Dong
  • Yingfei Xiong
  • Zhi Jin
  • *Corresponding author for this work
  • Peking University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Unlike the flow structure of natural languages, programming languages have an inherent rigidity in structure and grammar. However, existing detection methods based on pre-trained models typically treat code as a natural language sequence, ignoring its unique structural information. This hinders the models from understanding the code’s semantic and structural information. To address this problem, we introduce the Code Structure-Aware Network through Line-level Semantic Learning (CSLS), which comprises four components: code preprocessing, global semantics awareness, line semantic awareness, and line semantic structure awareness. The preprocessing step transforms the code into two types of text: global code text and line-level code text. Unlike typical preprocessing methods, CSLS preserves structural elements such as line breaks and indentation characters while processing the global text. While preserving global code semantics, the CSLS network emphasizes capturing structural relationships between line semantics. By modeling each line’s semantics, CSLS treats line-level semantics as the smallest structural unit to learn nonlinear structural relationships, thereby improving code vulnerability detection accuracy. We conducted extensive experiments on vulnerability detection datasets from real projects. Results show that our preprocessing method significantly enhances the performance of existing baseline models. Additionally, the CSLS model outperforms the state-of-the-art baselines in code vulnerability detection, achieving 70.57% accuracy on the Devign dataset and a 49.59% F1 score on the Reveal dataset. These results demonstrate the importance of preserving and utilizing code structure information to improve the performance of code vulnerability detection models.
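
The preprocessing step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `PreprocessedCode` and `preprocess` are hypothetical, and the choices to normalize only line endings, to drop blank lines from the line-level view, and to keep leading indentation on each line are assumptions made for illustration. The abstract states only that the global text keeps line breaks and indentation characters and that a separate line-level text is produced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreprocessedCode:
    """Hypothetical container for the two text views named in the abstract."""
    global_text: str        # whole function, line breaks and indentation kept
    line_texts: List[str]   # one entry per non-blank source line


def preprocess(source: str) -> PreprocessedCode:
    # Normalize line endings only; tabs, spaces, and newlines are left intact
    # so the global view keeps the code's layout.
    normalized = source.replace("\r\n", "\n").replace("\r", "\n")

    # Line-level view (assumption): skip blank lines, strip trailing
    # whitespace, and keep leading indentation on each line.
    line_texts = [line.rstrip() for line in normalized.split("\n") if line.strip()]

    return PreprocessedCode(global_text=normalized, line_texts=line_texts)


sample = "int f(int x) {\r\n\tif (x > 0)\r\n\t\treturn x;\r\n\r\n\treturn 0;\r\n}\r\n"
result = preprocess(sample)
for number, text in enumerate(result.line_texts, start=1):
    print(number, repr(text))
```

In a pipeline of this shape, `global_text` would feed a whole-sequence encoder and each entry of `line_texts` would be encoded separately before relationships between lines are modeled; how CSLS does this in detail is not given in the abstract.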

Original language: English
Title of host publication: 16th International Conference on Internetware, Internetware 2025 - Proceedings
Editors: Hong Mei, Jian Lv, Zhi Jin, Xuandong Li, Thomas Zimmermann, Ge Li, Lei Bu, Xin Xia
Publisher: Association for Computing Machinery, Inc
Pages: 269-280
Number of pages: 12
ISBN (Electronic): 9798400719264
DOIs
State: Published - 27 Oct 2025
Externally published: Yes
Event: 16th International Conference on Internetware, Internetware 2025 - Trondheim, Norway
Duration: 20 Jun 2025 to 22 Jun 2025

Publication series

Name: 16th International Conference on Internetware, Internetware 2025 - Proceedings

Conference

Conference: 16th International Conference on Internetware, Internetware 2025
Country/Territory: Norway
City: Trondheim
Period: 20/06/25 to 22/06/25

Keywords

  • Large language model
  • Model collaboration
  • Pre-trained models
  • Vulnerability detection
