
A method for identifying references between projects in GitHub

Research output: Contribution to journal › Article › peer-review

Abstract

On open source software platforms, software projects rarely develop in isolation; they depend on one another and evolve together. Identifying references between projects is therefore important in software development: it can help projects detect cross-project bugs or attract new contributors from related projects. In this paper, we propose IREL, a method to Identify References between projects by Extracting Links. We first extract links from the descriptions and comments of issues, pull requests, and commits using three matching patterns. We then identify changes in project names and replace the original project names with the new ones. Finally, we identify references between projects by selecting links whose source and target projects differ. We evaluate the method on datasets containing 20,347,228 projects. IREL obtains 934,322 references, 26.461 times as many as the Reference Coupling method and 16.483 times as many as the Issue Units method. Project PageRank scores computed from the references identified by IREL are also more strongly correlated with the projects' star counts. IREL thus helps researchers identify references between projects more effectively.
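The three steps in the abstract (extract links, resolve renamed projects, keep only cross-project links) can be illustrated with a minimal Python sketch. The abstract does not specify its three matching patterns, so the regular expressions below are assumptions based on common GitHub reference forms (a full URL, `owner/repo#123`, and `owner/repo@sha`). The function names and the `renames` mapping are likewise hypothetical and are not taken from the paper.

```python
import re

# ASSUMPTION: illustrative patterns only; the paper's actual three
# matching patterns are not given in the abstract.
URL_PATTERN = re.compile(
    r"https?://github\.com/([\w.-]+/[\w.-]+)/(?:issues|pull|commit)/\w+"
)
ISSUE_PATTERN = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)#\d+")
COMMIT_PATTERN = re.compile(r"(?<![\w/.-])([\w.-]+/[\w.-]+)@[0-9a-f]{7,40}\b")
PATTERNS = (URL_PATTERN, ISSUE_PATTERN, COMMIT_PATTERN)


def extract_targets(text):
    """Step 1: return the 'owner/repo' names linked from a piece of text."""
    targets = []
    for pattern in PATTERNS:
        targets.extend(m.group(1).lower() for m in pattern.finditer(text))
    return targets


def resolve(name, renames):
    """Step 2: follow old-name -> new-name entries until a current name is reached."""
    seen = set()
    while name in renames and name not in seen:  # guard against rename cycles
        seen.add(name)
        name = renames[name]
    return name


def identify_references(records, renames):
    """Step 3: keep only links whose source and target projects differ.

    records: iterable of (source_project, text) pairs, where text is a
    description or comment from an issue, pull request, or commit.
    """
    references = set()
    for source, text in records:
        src = resolve(source.lower(), renames)
        for target in extract_targets(text):
            tgt = resolve(target, renames)
            if src != tgt:
                references.add((src, tgt))
    return references


# Hypothetical example data (not from the paper's datasets).
records = [
    ("alice/app", "Caused by https://github.com/bob/lib/issues/12, see also bob/old-lib#3"),
    ("alice/app", "Duplicate of alice/app#7"),
    ("bob/lib", "Fixed in carol/tool@1a2b3c4d"),
]
renames = {"bob/old-lib": "bob/lib"}
references = identify_references(records, renames)
```

In this toy input, the self-link `alice/app#7` is discarded, and the link to the renamed project `bob/old-lib` is counted as a reference to `bob/lib`, so resolving renames before comparing source and target avoids both missed and duplicated references.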

Original language: English
Article number: 102858
Journal: Science of Computer Programming
Volume: 222
State: Published - 1 Oct 2022

Keywords

  • GitHub
  • Redirected project
  • References identification
