
Predicting the number of forks for open source software project

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

GitHub is a successful open source software platform that attracts many developers. On GitHub, developers can fork and copy repositories without asking for permission, which makes contributing to projects much easier than it has ever been. It is valuable to predict the number of forks for open source software projects: the prediction can help GitHub recommend popular projects and guide developers toward projects that are likely to succeed and are worthy of their contribution. In this paper, we use stepwise regression to design a model that predicts the number of forks for open source software projects. We collect a dataset of 1,000 repositories through GitHub's APIs. We use 700 repositories to compute the attribute weights and build the model, and the other 300 repositories to verify its prediction accuracy. Advantages of our model include: (1) Some attributes used in our model are new. GitHub differs from traditional open source software platforms and has some new features, and these features are used to build our model. (2) Our model uses project information from the first t months after a project's creation and predicts the number of forks in month T (t < T). Users can set the combination of time parameters to satisfy their own needs. (3) Our model predicts the exact number of forks rather than a range. (4) Experiments show that our model has high prediction accuracy. For example, we use project information from the first 3 months to predict the number of forks in month 6 after creation. The correlation coefficient is as high as 0.992, and the median absolute difference between the predicted and actual values is only 1.8, which shows that the predicted number of forks is very close to the actual number. Our model also has high prediction accuracy under other time parameter settings.
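The workflow the abstract describes (stepwise regression fitted on 700 repositories, checked on the remaining 300 with a correlation coefficient and a median absolute difference) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the paper's method: the data are synthetic, the four attribute columns are hypothetical stand-ins for early project information, and the forward-selection stopping rule (`min_gain`) is chosen only for illustration, since the abstract does not state the selection criterion or the attributes used.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y ~ [1, X]; returns (weights, residual sum of squares)."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ w) ** 2))
    return w, rss

def forward_stepwise(X, y, min_gain=0.02):
    """Greedy forward selection: repeatedly add the attribute that lowers the
    residual sum of squares most; stop when the relative improvement is at
    most min_gain (an assumed stopping rule, not taken from the paper)."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while remaining:
        trials = [(fit_ols(X[:, selected + [j]], y)[1], j) for j in remaining]
        rss, j = min(trials)
        if best_rss - rss <= min_gain * best_rss:
            break
        selected.append(j)
        remaining.remove(j)
        best_rss = rss
    return selected

def predict(X, selected, w):
    """Apply fitted weights (intercept first) to the selected attributes."""
    return np.column_stack([np.ones(len(X)), X[:, selected]]) @ w

# Synthetic stand-in for 1,000 repositories with 4 hypothetical attributes
# observed in the first t months; only columns 0 and 1 drive the target.
rng = np.random.default_rng(0)
n = 1000
X = rng.poisson(lam=[20, 5, 8, 3], size=(n, 4)).astype(float)
y = 0.6 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0.0, 1.0, n)  # "forks in month T"

# 700 repositories to fit the weights, 300 held out for verification.
X_train, y_train = X[:700], y[:700]
X_test, y_test = X[700:], y[700:]

selected = forward_stepwise(X_train, y_train)
w, _ = fit_ols(X_train[:, selected], y_train)
y_pred = predict(X_test, selected, w)

# The two accuracy measures named in the abstract.
r = np.corrcoef(y_pred, y_test)[0, 1]
med = np.median(np.abs(y_pred - y_test))
print("selected attributes:", selected)
print("correlation coefficient:", r)
print("median absolute difference:", med)
```

The numbers this prints describe the synthetic data only and say nothing about the paper's reported results; with real data, the attribute matrix would be built from whatever project information is available in the first t months.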

Original language: English
Title of host publication: 2014 3rd International Workshop on Evidential Assessment of Software Technologies, EAST 2014 - Proceedings
Publisher: Association for Computing Machinery
Pages: 40-47
Number of pages: 8
ISBN (Print): 9781450329651
State: Published - 2014
Event: 2014 3rd International Workshop on Evidential Assessment of Software Technologies, EAST 2014 - Nanjing, China
Duration: 26 May 2014 → 26 May 2014

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2014 3rd International Workshop on Evidential Assessment of Software Technologies, EAST 2014
Country/Territory: China
City: Nanjing
Period: 26/05/14 → 26/05/14

Keywords

  • Fork
  • Open Source Software
