
A Novel Neural Source Code Representation Based on Abstract Syntax Tree

  • Beihang University
  • University of Newcastle

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Exploiting machine learning techniques for analyzing programs has attracted much attention. One key problem is how to represent code fragments well for follow-up analysis. Traditional information retrieval based methods often treat programs as natural language texts, which could miss important semantic information of source code. Recently, state-of-the-art studies demonstrate that abstract syntax tree (AST) based neural models can better represent source code. However, the sizes of ASTs are usually large and the existing models are prone to the long-term dependency problem. In this paper, we propose a novel AST-based Neural Network (ASTNN) for source code representation. Unlike existing models that work on entire ASTs, ASTNN splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements. Based on the sequence of statement vectors, a bidirectional RNN model is used to leverage the naturalness of statements and finally produce the vector representation of a code fragment. We have applied our neural network based source code representation method to two common program comprehension tasks: source code classification and code clone detection. Experimental results on the two tasks indicate that our model is superior to state-of-the-art approaches.
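The splitting step described in the abstract can be illustrated with a minimal sketch. It assumes Python source and the standard-library `ast` module, neither of which is stated in the abstract, and the helper names `statement_tokens` and `split_statements` are hypothetical. It is not the authors' implementation: it only shows how one AST might be cut into a sequence of small statement-level trees, each flattened here to a preorder list of node-type names. The statement encoder and the bidirectional RNN are not covered.

```python
import ast


def statement_tokens(node):
    """Preorder node-type names for one statement.

    Nested statements are skipped so that each statement tree stays small;
    they are emitted as separate trees by split_statements.
    """
    tokens = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            continue  # nested statement: handled as its own tree
        tokens.extend(statement_tokens(child))
    return tokens


def split_statements(node):
    """Return one token list per statement, in preorder over the AST."""
    trees = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            trees.append(statement_tokens(child))
        # Recurse in every case: statements can be nested inside other
        # statements (an if body) or inside non-statement nodes (an
        # except handler).
        trees.extend(split_statements(child))
    return trees


# Illustrative input (an assumption, not taken from the paper).
source = """
def f(x):
    if x > 0:
        return x
    return -x
"""

trees = split_statements(ast.parse(source))
for tokens in trees:
    print(tokens)
```

Each printed list corresponds to one statement of the fragment. In a model of the kind the abstract describes, every such statement tree would be encoded to a vector and the resulting sequence passed to a recurrent network.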

Original language: English
Title of host publication: Proceedings - 2019 IEEE/ACM 41st International Conference on Software Engineering, ICSE 2019
Publisher: IEEE Computer Society
Pages: 783-794
Number of pages: 12
ISBN (Electronic): 9781728108698
State: Published - May 2019
Event: 41st IEEE/ACM International Conference on Software Engineering, ICSE 2019 - Montreal, Canada
Duration: 25 May 2019 to 31 May 2019

Publication series

Name: Proceedings - International Conference on Software Engineering
Volume: 2019-May
ISSN (Print): 0270-5257

Conference

Conference: 41st IEEE/ACM International Conference on Software Engineering, ICSE 2019
Country/Territory: Canada
City: Montreal
Period: 25/05/19 to 31/05/19

Keywords

  • Abstract Syntax Tree
  • code classification
  • code clone detection
  • neural network
  • source code representation
