TY - JOUR
T1 - Data-Driven Transferable Modeling for Cross-Project Software Vulnerability Detection via Dual-Feature Stacking Ensemble
AU - Liu, Yu
AU - Liu, Bin
AU - Wang, Shihai
AU - Hu, Bin
AU - Jin, Yujie
N1 - Publisher Copyright:
© 2026 by the authors.
PY - 2026/3
Y1 - 2026/3
AB - In recent years, deep learning-based vulnerability detection has drawn wide attention for its data-driven ability to analyze code semantics and learn vulnerability patterns without predefined models. However, data distribution differences across projects limit model generalization. Transfer learning provides a solution, yet most studies ignore expert-designed metrics. This paper proposes Decpvd, a data-driven cross-project software vulnerability detection method based on a dual-feature stacking ensemble. It builds an adaptive and transferable model using only code and vulnerability label data from source and target projects. It extracts code semantic features via Gated Graph Neural Networks, incorporates expert metrics from tools, performs cross-domain data-driven modeling with TrAdaBoost, and adaptively fuses the two features through stacking, overcoming fixed-weight fusion limitations. Experiments on six cross-project groups from three real datasets (FFmpeg, LibTIFF, LibPNG) show that Decpvd achieves an average AUC of 0.814, significantly outperforming mainstream baselines.
KW - adaptive model fusion
KW - cross-project vulnerability detection
KW - data-driven modeling
KW - expert metrics
KW - semantic metrics
KW - transfer learning
UR - https://www.scopus.com/pages/publications/105032815797
U2 - 10.3390/math14050780
DO - 10.3390/math14050780
M3 - Article
AN - SCOPUS:105032815797
SN - 2227-7390
VL - 14
JO - Mathematics
JF - Mathematics
IS - 5
M1 - 780
ER -