TY - GEN
T1 - Improving Vulnerability Detection with Hybrid Code Graph Representation
AU - Meng, Xiangxin
AU - Lu, Shaoxiao
AU - Wang, Xu
AU - Liu, Xudong
AU - Hu, Chunming
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The increasing richness of software applications contributes to enhanced productivity and convenience in daily life. However, growing software complexity simultaneously poses significant challenges to software security. As one of the most important solutions, vulnerability detection technology is attracting increasing attention. This paper proposes a novel vulnerability detection method, HybridNN, based on graph neural networks (GNNs). To begin, we simplify the code property graph (CPG) to design a hybrid code graph (HCG), which is better suited to deep semantic extraction via GNN models. Subsequently, datasets consisting of a considerable number of samples, including both artificially synthesized and real-world vulnerabilities, are constructed. Next, we leverage a GNN model with a hierarchical attention mechanism, which is proficient in extracting deep semantics from heterogeneous graphs, and apply it to the newly designed HCG representation. Moreover, we propose the UD-Sampling method, which combines up-sampling and down-sampling, to balance the distribution of the training samples. Finally, extensive experiments are conducted, showing that HybridNN outperforms all baseline methods.
AB - The increasing richness of software applications contributes to enhanced productivity and convenience in daily life. However, growing software complexity simultaneously poses significant challenges to software security. As one of the most important solutions, vulnerability detection technology is attracting increasing attention. This paper proposes a novel vulnerability detection method, HybridNN, based on graph neural networks (GNNs). To begin, we simplify the code property graph (CPG) to design a hybrid code graph (HCG), which is better suited to deep semantic extraction via GNN models. Subsequently, datasets consisting of a considerable number of samples, including both artificially synthesized and real-world vulnerabilities, are constructed. Next, we leverage a GNN model with a hierarchical attention mechanism, which is proficient in extracting deep semantics from heterogeneous graphs, and apply it to the newly designed HCG representation. Moreover, we propose the UD-Sampling method, which combines up-sampling and down-sampling, to balance the distribution of the training samples. Finally, extensive experiments are conducted, showing that HybridNN outperforms all baseline methods.
KW - graph neural network
KW - heterogeneous graph representation
KW - software security
KW - vulnerability detection
UR - https://www.scopus.com/pages/publications/85190561158
U2 - 10.1109/APSEC60848.2023.00036
DO - 10.1109/APSEC60848.2023.00036
M3 - Conference contribution
AN - SCOPUS:85190561158
T3 - Proceedings - Asia-Pacific Software Engineering Conference, APSEC
SP - 259
EP - 268
BT - Proceedings - 2023 30th Asia-Pacific Software Engineering Conference, APSEC 2023
PB - IEEE Computer Society
T2 - 30th Asia-Pacific Software Engineering Conference, APSEC 2023
Y2 - 4 December 2023 through 7 December 2023
ER -