Reducing False Positives of Static Bug Detectors Through Code Representation Learning

  • Beihang University
  • Huazhong University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the increasing significance of software correctness and security, automatic static analysis tools (ASATs) play an increasingly important role in software development thanks to their capability and scalability. However, compared to dynamic analysis methods, static tools often suffer from high false positive rates because of their analysis mechanisms. To alleviate the false positive problem, many approaches have been proposed that manually extract features from code snippets and then prioritize real warnings by means of statistics or machine learning techniques. However, manually encoded features are insufficient to achieve satisfactory performance across different datasets. In this study, we explore the effectiveness of various code representation learning (CRL) techniques in understanding the semantics of warnings generated by ASATs. In particular, our large-scale empirical study reveals not only that CRL models can effectively differentiate buggy code snippets (i.e., those containing warnings detected by ASATs) from clean ones (the median F1-score reaches 87.3% for binary classification and 77.4% for multi-class classification), but also that they are promising in identifying false positive warnings (the F1-score of the best performer is 75.6%). These findings drive us to design a novel approach named PRISM, to PRIoritize Static warnings by aggregating multiple CRL Models and thereby reduce the false positives generated by existing ASATs. Extensive evaluations demonstrate that our approach significantly outperforms existing baselines.

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 681-692
Number of pages: 12
ISBN (Electronic): 9798350330663
State: Published - 2024
Event: 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024 - Rovaniemi, Finland
Duration: 12 Mar 2024 to 15 Mar 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024

Conference

Conference: 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024
Country/Territory: Finland
City: Rovaniemi
Period: 12/03/24 to 15/03/24

Keywords

  • Code Representation Learning
  • False Positive Warnings
  • Static Bug Detector
