TY - GEN
T1 - Recommending Good First Issues in GitHub OSS Projects
AU - Xiao, Wenxin
AU - He, Hao
AU - Xu, Weiwei
AU - Tan, Xin
AU - Dong, Jinhao
AU - Zhou, Migahui
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/7/5
Y1 - 2022/7/5
N2 - Attracting and retaining newcomers is vital for the sustainability of an open-source software project. However, it is difficult for new-comers to locate suitable development tasks, while existing 'Good First Issues' (GFI) in GitHub are often insufficient and inappropriate. In this paper, we propose RECGFI, an effective practical approach for the recommendation of good first issues to newcomers, which can be used to relieve maintainers' burden and help newcomers onboard. RECGFI models an issue with features from multiple dimensions (content, background, and dynamics) and uses an XGBoost classifier to generate its probability of being a GFI. To evaluate RECGFI, we collect 53,510 resolved issues among 100 GitHub projects and care-fully restore their historical states to build ground truth datasets. Our evaluation shows that RECGFI can achieve up to 0.853 AUC in the ground truth dataset and outperforms alternative models. Our interpretable analysis of the trained model further reveals in-teresting observations about GFI characteristics. Finally, we report latest issues (without GFI-signaling labels but recommended as GFI by our approach) to project maintainers among which 16 are confirmed as real GFIs and five have been resolved by a newcomer.
AB - Attracting and retaining newcomers is vital for the sustainability of an open-source software project. However, it is difficult for new-comers to locate suitable development tasks, while existing 'Good First Issues' (GFI) in GitHub are often insufficient and inappropriate. In this paper, we propose RECGFI, an effective practical approach for the recommendation of good first issues to newcomers, which can be used to relieve maintainers' burden and help newcomers onboard. RECGFI models an issue with features from multiple dimensions (content, background, and dynamics) and uses an XGBoost classifier to generate its probability of being a GFI. To evaluate RECGFI, we collect 53,510 resolved issues among 100 GitHub projects and care-fully restore their historical states to build ground truth datasets. Our evaluation shows that RECGFI can achieve up to 0.853 AUC in the ground truth dataset and outperforms alternative models. Our interpretable analysis of the trained model further reveals in-teresting observations about GFI characteristics. Finally, we report latest issues (without GFI-signaling labels but recommended as GFI by our approach) to project maintainers among which 16 are confirmed as real GFIs and five have been resolved by a newcomer.
KW - good first issues
KW - onboarding
KW - open-source software
UR - https://www.scopus.com/pages/publications/85133534834
U2 - 10.1145/3510003.3510196
DO - 10.1145/3510003.3510196
M3 - 会议稿件
AN - SCOPUS:85133534834
T3 - Proceedings - International Conference on Software Engineering
SP - 1830
EP - 1842
BT - Proceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering, ICSE 2022
PB - IEEE Computer Society
T2 - 44th ACM/IEEE International Conference on Software Engineering, ICSE 2022
Y2 - 22 May 2022 through 27 May 2022
ER -