TY - GEN
T1 - Predicting the number of forks for open source software project
AU - Chen, Fangwei
AU - Li, Lei
AU - Jiang, Jing
AU - Zhang, Li
PY - 2014
Y1 - 2014
N2 - GitHub is successful open source software platform which attract many developers. In GitHub, developers are allowed to fork repositories and copy repositories without asking for permission, which make contribution to projects much easier than it has ever been. It is significant to predict the number of forks for open source software projects. The prediction can help GitHub to recommend popular projects, and guide developers to find projects which are likely to succeed and worthy of their contribution. In this paper, we use stepwise regression and design a model to predict the number of forks for open source software projects. Then we collect datasets of 1,000 repositories through GitHub's APIs. We use datasets of 700 repositories to compute the weight of attributes and realize the model. Then we use other 300 repositories to verify the prediction accuracy of our model. Advantages of our model include: (1) Some attributes used in our model are new. This is because GitHub is different from traditional open source software platforms and has some new features. These new features are used to build our model. (2) Our model uses project information within t month after its creation, and predicts the number of forks in the month T (t < T). It allows users to set the combination of time parameters and satisfy their own needs. (3) Our model predicts the exact number of forks, rather than the range of the number of forks (4) Experiments show that our model has high prediction accuracy. For example, we use project information with 3 months to prediction the number of forks in month 6 after its creation. The correlation coefficient is as high as 0.992, and the median number of absolute difference between prediction value and actual value is only 1.8. It shows that the predicted number of forks is very close to the actual number of forks. Our model also has high prediction accuracy when we set other time parameters.
AB - GitHub is successful open source software platform which attract many developers. In GitHub, developers are allowed to fork repositories and copy repositories without asking for permission, which make contribution to projects much easier than it has ever been. It is significant to predict the number of forks for open source software projects. The prediction can help GitHub to recommend popular projects, and guide developers to find projects which are likely to succeed and worthy of their contribution. In this paper, we use stepwise regression and design a model to predict the number of forks for open source software projects. Then we collect datasets of 1,000 repositories through GitHub's APIs. We use datasets of 700 repositories to compute the weight of attributes and realize the model. Then we use other 300 repositories to verify the prediction accuracy of our model. Advantages of our model include: (1) Some attributes used in our model are new. This is because GitHub is different from traditional open source software platforms and has some new features. These new features are used to build our model. (2) Our model uses project information within t month after its creation, and predicts the number of forks in the month T (t < T). It allows users to set the combination of time parameters and satisfy their own needs. (3) Our model predicts the exact number of forks, rather than the range of the number of forks (4) Experiments show that our model has high prediction accuracy. For example, we use project information with 3 months to prediction the number of forks in month 6 after its creation. The correlation coefficient is as high as 0.992, and the median number of absolute difference between prediction value and actual value is only 1.8. It shows that the predicted number of forks is very close to the actual number of forks. Our model also has high prediction accuracy when we set other time parameters.
KW - Fork
KW - Open Source Software
UR - https://www.scopus.com/pages/publications/84903649357
U2 - 10.1145/2627508.2627515
DO - 10.1145/2627508.2627515
M3 - 会议稿件
AN - SCOPUS:84903649357
SN - 9781450329651
T3 - ACM International Conference Proceeding Series
SP - 40
EP - 47
BT - 2014 3rd International Workshop on Evidential Assessment of Software Technologies, EAST 2014 - Proceedings
PB - Association for Computing Machinery
T2 - 2014 3rd International Workshop on Evidential Assessment of Software Technologies, EAST 2014
Y2 - 26 May 2014 through 26 May 2014
ER -