TY - GEN
T1 - DPS
T2 - 14th International Symposium on Pervasive Systems, Algorithms and Networks, I-SPAN 2017, 11th International Conference on Frontier of Computer Science and Technology, FCST 2017 and 3rd International Symposium of Creative Computing, ISCC 2017
AU - Sun, Chenggen
AU - Zhang, Yangyang
AU - Yu, Weiren
AU - Zhang, Richong
AU - Bhuiyan, Md Zakirul Alam
AU - Li, Jianxin
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/27
Y1 - 2017/11/27
N2 - With the emergence of big models with billions of parameters, the parameter server has come to be regarded as a high-throughput distributed machine learning (ML) architecture for efficiently storing and updating model parameters during the learning process. Current parameter servers, such as Parameter Server and Petuum, do not address data management and lack high-level data abstraction. Moreover, they have no task scheduling, do not fully utilize computing resources, and may lead to load imbalance. Their programming interfaces are too complicated, and they do not support data flow operations (e.g., map/reduce), which are very useful for data preprocessing. These drawbacks limit the performance and ease of use of such parameter servers. In this paper, we propose DPS, a parameter server based on Distributed Shared Memory (DSM) for machine learning. DPS provides flexible consistency models, high-level data abstraction and management that support data flow operations, a lightweight task scheduling system, and a user-friendly programming interface to solve the problems of the existing systems mentioned above. The experimental results show that DPS can reduce networking time by about 50% and achieve up to 1.9x the performance of Petuum, while the algorithms implemented on DPS use less code than those implemented on Petuum.
KW - Big data
KW - Machine learning
KW - Parameter server
UR - https://www.scopus.com/pages/publications/85048007706
U2 - 10.1109/ISPAN-FCST-ISCC.2017.48
DO - 10.1109/ISPAN-FCST-ISCC.2017.48
M3 - Conference contribution
AN - SCOPUS:85048007706
T3 - Proceedings - 14th International Symposium on Pervasive Systems, Algorithms and Networks, I-SPAN 2017, 11th International Conference on Frontier of Computer Science and Technology, FCST 2017 and 3rd International Symposium of Creative Computing, ISCC 2017
SP - 20
EP - 27
BT - Proceedings - 14th International Symposium on Pervasive Systems, Algorithms and Networks, I-SPAN 2017, 11th International Conference on Frontier of Computer Science and Technology, FCST 2017 and 3rd International Symposium of Creative Computing, ISCC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 June 2017 through 23 June 2017
ER -