TY - GEN
T1 - A method for detecting abnormal users with fake stars
AU - Jiang, Jing
AU - Li, Hao
AU - Liu, Yifan
AU - Zhang, Li
N1 - Publisher Copyright: © 2022 by KSI Research Inc. and Knowledge Systems Institute, USA.
PY - 2022
Y1 - 2022
N2 - On GitHub, users star repositories they find interesting, and the number of stars is widely viewed as a significant measure of repository popularity. Some repositories obtain fake stars through illegitimate means, which undermines the community's efforts to make stars a valuable indicator and harms the GitHub ecosystem. It is therefore important to curb the abuse of GitHub stars and to detect abnormal users who provide fake stars. In this paper, we first define features along the user dimension and the repository dimension. We then perform a differential analysis and find that most of the features differ significantly between abnormal and normal users. Next, we propose AUDetec, a method for Abnormal User Detection. AUDetec uses a decision tree to detect abnormal users based on two features: the total number of repositories starred by the user, and the median number of days since creation of the repositories starred by the user. We evaluate AUDetec on a data set containing 120 abnormal users and 240 normal users. The experimental results show that AUDetec performs well, achieving an average accuracy of 99.86%.
AB - On GitHub, users star repositories they find interesting, and the number of stars is widely viewed as a significant measure of repository popularity. Some repositories obtain fake stars through illegitimate means, which undermines the community's efforts to make stars a valuable indicator and harms the GitHub ecosystem. It is therefore important to curb the abuse of GitHub stars and to detect abnormal users who provide fake stars. In this paper, we first define features along the user dimension and the repository dimension. We then perform a differential analysis and find that most of the features differ significantly between abnormal and normal users. Next, we propose AUDetec, a method for Abnormal User Detection. AUDetec uses a decision tree to detect abnormal users based on two features: the total number of repositories starred by the user, and the median number of days since creation of the repositories starred by the user. We evaluate AUDetec on a data set containing 120 abnormal users and 240 normal users. The experimental results show that AUDetec performs well, achieving an average accuracy of 99.86%.
KW - Abnormal user detection
KW - Fake star
KW - GitHub
KW - Open source software
KW - Repository popularity
UR - https://www.scopus.com/pages/publications/85138106731
U2 - 10.18293/DMSVIVA22-005
DO - 10.18293/DMSVIVA22-005
M3 - Conference contribution
AN - SCOPUS:85138106731
T3 - DMSVIVA 2022 - Proceedings of the 28th International DMS Conference on Visualization and Visual Languages
SP - 63
EP - 68
BT - DMSVIVA 2022 - Proceedings of the 28th International DMS Conference on Visualization and Visual Languages
PB - Knowledge Systems Institute Graduate School, KSI Research Inc.
T2 - 28th International DMS Conference on Visualization and Visual Languages, DMSVIVA 2022
Y2 - 29 June 2022 through 30 June 2022
ER -