
MCodeSearcher: Multi-View Contrastive Learning for Code Search

  • Jia Li
  • Fang Liu
  • Yunfei Zhao
  • Ge Li*
  • Zhi Jin*
  • *Corresponding author for this work
  • Peking University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Code search is a critical software development activity that helps developers retrieve a proper code snippet from open-source repositories given a user intent. In recent years, large-scale pre-trained models have shown impressive performance in code representation learning and have achieved state-of-the-art performance on the code search task. However, it is challenging for these models to distinguish functionally equivalent code snippets with dissimilar implementations, or non-equivalent code snippets that look similar. Because code implementations are so diverse, code search engines need to identify the functional similarities or dissimilarities of source code in order to return functionally matched source code for a given query. Besides, existing pre-trained models mainly focus on learning the semantic representations of code snippets, and the semantic correlation between code snippets and natural language queries is not sufficiently exploited. An effective code search tool needs to understand not only the relationship between queries and code snippets but also the relationship between diversified code snippets. To address these limitations, we propose MCodeSearcher, a novel multi-view contrastive learning model for code retrieval that aims to sufficiently exploit (1) the semantic correlation between queries and code snippets, and (2) the relationship between functionally equivalent code snippets. To achieve this, we design contrastive training objectives from three views and pre-train our model with these objectives. Experimental results on five representative code search datasets show that our approach significantly outperforms state-of-the-art methods.
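The abstract names contrastive training objectives but does not give their form. As a hedged illustration only, the sketch below shows a generic in-batch InfoNCE loss, a common contrastive objective, applied to two of the relationships the abstract lists: query-to-code and code-to-functionally-equivalent-code. The function names (`cosine`, `info_nce`, `two_view_loss`), the cosine similarity, the temperature of 0.05, the toy embeddings, and the plain sum of the two terms are all assumptions. This is not the paper's actual formulation, and the third view is not modeled because the abstract does not describe it.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, 1e-12)  # guard against zero-length vectors


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.05) -> float:
    """Generic in-batch InfoNCE loss (an assumption, not the paper's exact loss).

    anchors[i] is paired with positives[i]; every positives[j] with j != i
    serves as an in-batch negative for anchors[i].
    """
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the true pair (i, i).
        total += log_denominator - logits[i]
    return total / len(anchors)


def two_view_loss(queries: List[Vector], codes: List[Vector],
                  equivalent_codes: List[Vector]) -> float:
    """Hypothetical combination of two views: query-code and code-code."""
    return info_nce(queries, codes) + info_nce(codes, equivalent_codes)


# Toy 2-D embeddings (hypothetical): query i is meant to retrieve code i,
# and equivalent_codes[i] is a functionally equivalent variant of codes[i].
queries = [[1.0, 0.0], [0.0, 1.0]]
codes = [[0.9, 0.1], [0.1, 0.9]]
equivalent_codes = [[0.8, 0.2], [0.2, 0.8]]

matched = info_nce(queries, codes)
swapped = info_nce(queries, codes[::-1])
print(f"matched: {matched:.4f}, swapped: {swapped:.4f}")
print(f"two-view loss: {two_view_loss(queries, codes, equivalent_codes):.4f}")
```

Reversing the code order turns each query's true match into a negative, so `swapped` comes out far larger than `matched`; minimizing such a loss pulls paired embeddings together and pushes in-batch negatives apart. The low temperature sharpens the softmax over candidates, and its value here is purely illustrative.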

Original language: English
Host publication title: 14th Asia-Pacific Symposium on Internetware, Internetware 2023 - Proceedings
Publisher: Association for Computing Machinery
Pages: 270-280
Number of pages: 11
ISBN (Electronic): 9798400708947
DOI
Publication status: Published - 4 Aug 2023
Event: 14th Asia-Pacific Symposium on Internetware, Internetware 2023 - Hangzhou, China
Duration: 4 Aug 2023 to 6 Aug 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 14th Asia-Pacific Symposium on Internetware, Internetware 2023
Country/Territory: China
City: Hangzhou
Period: 4/08/23 to 6/08/23
