TY - GEN
T1 - Bootstrap sampling based data cleaning and maximum entropy SVMs for large datasets
AU - Wang, Senzhang
AU - Li, Zhoujun
AU - Zhang, Xiaoming
PY - 2012
Y1 - 2012
AB - Support Vector Machines (SVMs) are a popular family of machine learning algorithms based on Statistical Learning Theory (SLT). However, traditional solvers suffer from O(n^{2}) time complexity. In this paper, a novel two-stage informative pattern extraction algorithm is proposed. The first stage is data cleaning based on bootstrap sampling: a bundle of weak SVM classifiers is trained on small sampled datasets, and training data correctly classified by all the weak classifiers are cleaned out of the training set. In the second stage, to further improve the performance of the final classifier and reduce training time, two novel informative pattern extraction algorithms based on entropy maximization SVMs are proposed. Empirical studies show that the approach is effective in reducing the size of the training datasets and the computational cost, and that it outperforms the state-of-the-art SVM training algorithms PEGASOS, RSVM, and LIBLINEAR SVM at comparable classification accuracy.
KW - SVMs
KW - bootstrap sampling
KW - entropy maximization
UR - https://www.scopus.com/pages/publications/84876870819
U2 - 10.1109/ICTAI.2012.164
DO - 10.1109/ICTAI.2012.164
M3 - Conference contribution
AN - SCOPUS:84876870819
SN - 9780769549156
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 1151
EP - 1156
BT - Proceedings - 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
T2 - 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, ICTAI 2012
Y2 - 7 November 2012 through 9 November 2012
ER -