TY - GEN
T1 - swGBDT
T2 - 6th Asian Supercomputing Conference, SCFA 2020
AU - Yin, Bohong
AU - Li, Yunchun
AU - Dun, Ming
AU - You, Xin
AU - Yang, Hailong
AU - Luan, Zhongzhi
AU - Qian, Depei
N1 - Publisher Copyright: © 2020, The Author(s).
PY - 2020
Y1 - 2020
AB - Gradient Boosted Decision Trees (GBDT) is a practical machine learning method that is widely used in application fields such as recommender systems. Optimizing the performance of GBDT on heterogeneous many-core processors poses several challenges, such as designing an efficient parallelization scheme and mitigating the latency of irregular memory access. In this paper, we propose swGBDT, an efficient GBDT implementation on the Sunway processor. In swGBDT, we divide the 64 CPEs in a core group into multiple roles, such as loader, saver and worker, in order to hide the latency of irregular global memory access. In addition, we partition the data at two granularities, block and tile, to better utilize the LDM on each CPE for data caching. Moreover, we use register communication for collaboration among CPEs. Our evaluation with representative datasets shows that swGBDT achieves average speedups of 4.6× and 2× over the serial implementation on the MPE and parallel XGBoost on CPEs, respectively.
KW - Gradient Boosted Decision Tree
KW - Many-core processor
KW - Performance optimization
UR - https://www.scopus.com/pages/publications/85086180884
U2 - 10.1007/978-3-030-48842-0_5
DO - 10.1007/978-3-030-48842-0_5
M3 - Conference contribution
AN - SCOPUS:85086180884
SN - 9783030488413
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 67
EP - 86
BT - Supercomputing Frontiers - 6th Asian Conference, SCFA 2020, Proceedings
A2 - Panda, Dhabaleswar K.
PB - Springer
Y2 - 24 February 2020 through 27 February 2020
ER -