TY - GEN
T1 - Research on sampling method of CFSFDP clustering algorithm and its criteria for determining the best sample size
AU - Cheng, Chen
AU - Yang, Jun
AU - Kong, Xuefeng
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/10/6
Y1 - 2018/10/6
N2 - Clustering by fast search and find of density peaks (CFSFDP) is a novel density-based fast clustering method that has been widely studied and applied in many fields. However, when the data set is very large, the algorithm becomes inefficient because it consumes a great deal of time and storage space. To address this problem, a simple random sampling (SRS) method is proposed to speed up the optimized CFSFDP algorithm on real data with large sample sizes. The rate of correct classification of the sample is defined as a measure of its clustering performance, which we call the sampling accuracy. We first use the SRS method to generate small samples for cluster analysis. Then, we explore the relationship between the sampling rate and the sampling accuracy. Finally, to determine the best sample size, that is, one that achieves high sampling accuracy with high efficiency, the mean and standard deviation of the sampling accuracy are adopted as two criteria, and the best sample size is determined from them. A real case study demonstrates the implementation and effectiveness of the proposed method.
KW - CFSFDP
KW - Sampling accuracy
KW - Sampling rate
KW - Simple random sampling
KW - The best sample size
UR - https://www.scopus.com/pages/publications/85061039171
U2 - 10.1145/3292448.3292451
DO - 10.1145/3292448.3292451
M3 - Conference contribution
AN - SCOPUS:85061039171
T3 - ACM International Conference Proceeding Series
SP - 24
EP - 28
BT - ICAAI 2018 - 2018 the 2nd International Conference on Advances in Artificial Intelligence
PB - Association for Computing Machinery
T2 - 2nd International Conference on Advances in Artificial Intelligence, ICAAI 2018
Y2 - 6 October 2018 through 8 October 2018
ER -