
Defect Prediction with Semantics and Context Features of Codes Based on Graph Representation Learning

  • Jiaxi Xu
  • Fei Wang
  • Jun Ai*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

To optimize the process of software testing and to improve software quality and reliability, many attempts have been made to develop more effective methods for predicting software defects. Previous work on defect prediction has combined machine learning with artificial (handcrafted) software metrics. Such metrics, however, cannot represent the syntactic, semantic, and contextual information of defective modules. In this article, we therefore propose a practical approach that identifies software defect patterns by combining semantic and contextual information through abstract syntax tree (AST) representation learning. Graph neural networks are also leveraged to capture the latent defect information of defective subtrees, which are pruned based on a fix-inducing change. To validate the proposed approach, we define mining rules based on the GitHub workflow and collect 6052 defects from 307 projects. The experiments indicate that the proposed approach outperforms the state-of-the-art approach and five traditional machine learning baselines. An ablation study shows that information about code concepts leads to a significant increase in accuracy.
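The pipeline the abstract outlines (prune the AST around a fix-inducing change, turn the subtree into a graph, and let a graph neural network propagate information along its edges) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Python's built-in `ast` module as the parser, a hypothetical pruning rule that keeps whole functions whose line span touches a changed line, one-hot node-type features in place of learned embeddings, and a parameter-free mean-aggregation step in place of a trained GNN layer. The names `changed_subtrees`, `ast_to_graph`, `one_hot`, `propagate`, and `readout` are hypothetical.

```python
import ast
from collections import defaultdict


def changed_subtrees(source, changed_lines):
    """Hypothetical pruning rule: keep every function whose line span
    contains at least one line touched by a fix-inducing change."""
    tree = ast.parse(source)
    kept = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            span = range(node.lineno, node.end_lineno + 1)
            if any(line in span for line in changed_lines):
                kept.append(node)
    return kept


def ast_to_graph(root):
    """Number every node occurrence under `root` and return
    (node_types, parent->child edges). Each occurrence gets a fresh id
    because CPython may share some node objects (e.g. Load) between parents."""
    node_types = [type(root).__name__]
    edges = []
    stack = [(root, 0)]
    while stack:
        node, node_id = stack.pop()
        for child in ast.iter_child_nodes(node):
            child_id = len(node_types)
            node_types.append(type(child).__name__)
            edges.append((node_id, child_id))
            stack.append((child, child_id))
    return node_types, edges


def one_hot(node_types):
    """Encode each node by its AST type (a stand-in for learned embeddings)."""
    vocab = sorted(set(node_types))
    pos = {t: i for i, t in enumerate(vocab)}
    rows = []
    for t in node_types:
        row = [0.0] * len(vocab)
        row[pos[t]] = 1.0
        rows.append(row)
    return vocab, rows


def propagate(features, edges):
    """One parameter-free message-passing step: each node takes the mean of
    its own vector and its neighbours' vectors (edges treated as undirected)."""
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    out = []
    for i in range(len(features)):
        group = [i] + neighbours[i]
        out.append([sum(features[j][k] for j in group) / len(group) for k in range(dim)])
    return out


def readout(features):
    """Mean-pool node vectors into a single vector for the whole subtree."""
    return [sum(col) / len(features) for col in zip(*features)]


source = '''
def ok(a, b):
    return a + b

def div(a, b):
    return a / b
'''
# Suppose the fix touched line 6 (the body of `div`); only `div` is kept.
subtrees = changed_subtrees(source, changed_lines={6})
types, edges = ast_to_graph(subtrees[0])
vocab, x = one_hot(types)
# Two propagation rounds let each node see information two hops away.
vector = readout(propagate(propagate(x, edges), edges))
```

Because every one-hot row sums to 1 and each step only averages rows, the final subtree vector also sums to 1; a real model would replace the averaging with learned weight matrices and nonlinearities and feed the pooled vector to a defect classifier.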

Original language: English
Article number: 9290043
Pages (from-to): 613-625
Number of pages: 13
Journal: IEEE Transactions on Reliability
Volume: 70
Issue number: 2
State: Published, June 2021

Keywords

  • Deep learning
  • defect prediction
  • graph representation learning
  • software defect dataset
  • software engineering
