TY - GEN
T1 - Parallel implementation of classification algorithms based on MapReduce
AU - He, Qing
AU - Zhuang, Fuzhen
AU - Li, Jincheng
AU - Shi, Zhongzhi
PY - 2010
Y1 - 2010
AB - Data mining has attracted extensive research for several decades. As an important data mining task, classification plays an important role in information retrieval, web search, CRM, etc. Most existing classification techniques are serial, which makes them impractical for large datasets: computing resources are under-utilized and the execution time is unacceptably long. Based on the MapReduce programming model, we propose parallel implementations of several classification algorithms, such as k-nearest neighbors, the naive Bayesian model, and decision trees. Preliminary experiments show that the proposed parallel methods can not only process large datasets but can also be extended to run on a cluster, which can significantly improve efficiency.
KW - Classification
KW - Data Mining
KW - Large Dataset
KW - MapReduce
KW - Parallel Implementation
UR - https://www.scopus.com/pages/publications/78349266817
U2 - 10.1007/978-3-642-16248-0_89
DO - 10.1007/978-3-642-16248-0_89
M3 - Conference contribution
AN - SCOPUS:78349266817
SN - 3642162479
SN - 9783642162473
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 655
EP - 662
BT - Rough Set and Knowledge Technology - 5th International Conference, RSKT 2010, Proceedings
T2 - 5th International Conference on Rough Set and Knowledge Technology, RSKT 2010
Y2 - 15 October 2010 through 17 October 2010
ER -