CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training

  • Xin Wang
  • Yasheng Wang
  • Yao Wan
  • Jiawei Wang
  • Pingyi Zhou
  • Li Li
  • Hao Wu
  • Jin Liu*
  • *Corresponding author for this work
  • Wuhan University
  • Huawei Technologies Co., Ltd.
  • Huazhong University of Science and Technology
  • Monash University
  • Yunnan University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Recent years have witnessed increasing interest in code representation learning, which aims to represent the semantics of source code as distributed vectors. Various works have been proposed to represent the complex semantics of source code from different views, including plain text, Abstract Syntax Tree (AST), and several kinds of code graphs (e.g., Control/Data Flow Graph). However, most of them consider only a single view of source code independently, ignoring the correspondences among different views. In this paper, we propose to integrate different views with the natural-language description of source code into a unified framework with Multi-View contrastive Pre-training, and name our model CODE-MVP. Specifically, we first extract multiple code views using compiler tools, and learn the complementary information among them under a contrastive learning framework. Inspired by the type checking in compilation, we also design a fine-grained type inference objective in the pre-training. Experiments on three downstream tasks over five datasets demonstrate the superiority of CODE-MVP when compared with several state-of-the-art baselines. For example, we achieve gains of 2.4/2.3/1.1 in terms of MRR/MAP/Accuracy metrics on natural language code retrieval, code similarity, and code defect detection tasks, respectively.
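
The multi-view idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the compiler tools, the encoder, or the exact loss, so everything below is an assumption. The sketch uses Python's standard `tokenize` and `ast` modules to derive two views of one snippet, and a generic InfoNCE-style loss over toy embedding vectors; the function names `text_view`, `ast_view`, `cosine`, and `info_nce` are hypothetical.

```python
import ast
import io
import math
import tokenize


def text_view(source):
    """Plain-text view: the lexical token strings of the snippet."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in skip]


def ast_view(source):
    """AST view: node type names in the order ast.walk visits them."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss (an assumed, generic contrastive objective).

    anchors[i] and positives[i] are embeddings of two views of the same
    program; every other positives[j] in the batch acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


snippet = "def add(a, b):\n    return a + b\n"
tokens = text_view(snippet)
node_types = ast_view(snippet)

# Toy embeddings standing in for encoder outputs of two views per program.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = info_nce(view_a, view_b)
shuffled_loss = info_nce(view_a, list(reversed(view_b)))
```

In this toy setup the loss is lower when each embedding is paired with the other view of the same program than when the pairs are shuffled, which is the behaviour a contrastive objective of this kind rewards.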

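Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: NAACL 2022 - Findings
Publisher: Association for Computational Linguistics (ACL)
Pages: 1066-1077
Number of pages: 12
ISBN (Electronic): 9781955917766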
Publication status: Published - 2022
Externally published: Yes
Event: 2022 Findings of the Association for Computational Linguistics: NAACL 2022 - Seattle, United States
Duration: 10 Jul 2022 to 15 Jul 2022

Publication series

Name: Findings of the Association for Computational Linguistics: NAACL 2022 - Findings

Conference

Conference: 2022 Findings of the Association for Computational Linguistics: NAACL 2022
Country/Territory: United States
City: Seattle
Period: 10/07/22 to 15/07/22
