Natural Language to Code: How Far Are We?

  • Shangwen Wang
  • Mingyang Geng
  • Bo Lin
  • Zhensu Sun
  • Ming Wen*
  • Yepang Liu*
  • Li Li
  • Tegawendé F. Bissyandé
  • Xiaoguang Mao
  • *Corresponding author for this work
  • National University of Defense Technology
  • The Key Laboratory of Software Engineering for Complex Systems
  • Singapore Management University
  • Huazhong University of Science and Technology
  • Southern University of Science and Technology
  • University of Luxembourg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A longstanding dream in software engineering research is to devise effective approaches for automating development tasks based on developers' informally specified intentions. Such intentions are generally expressed as natural language descriptions. In recent literature, a number of approaches have been proposed to automate tasks such as code search and even code generation based on natural language inputs. While these approaches vary in their technical designs, their objective is the same: transforming a developer's intention into source code. The literature, however, lacks a comprehensive understanding of the effectiveness of existing techniques as well as their complementarity to each other. We propose to fill this gap through a large-scale empirical study in which we systematically evaluate natural language to code techniques. Specifically, we consider six state-of-the-art techniques targeting code search, and four targeting code generation. Through extensive evaluations on a dataset of 22K+ natural language queries, our study reveals the following major findings: (1) code search techniques based on model pre-training are so far the most effective, while code generation techniques can also provide promising results; (2) complementarity widely exists among the existing techniques; and (3) combining the ten techniques can improve performance by 35% compared with the most effective standalone technique. Finally, we propose a post-processing strategy to automatically integrate different techniques based on their generated code. Experimental results show that our devised strategy is both effective and extensible.

Original language: English
Title of host publication: ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Satish Chandra, Kelly Blincoe, Paolo Tonella
Publisher: Association for Computing Machinery, Inc
Pages: 375-387
Number of pages: 13
ISBN (electronic): 9798400703270
DOI
Publication status: Published - 30 Nov 2023
Event: 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023 - San Francisco, United States
Duration: 3 Dec 2023 to 9 Dec 2023

Publication series

Name: ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference: 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023
Country/Territory: United States
City: San Francisco
Period: 3/12/23 to 9/12/23
