TY - GEN
T1 - PMS
T2 - 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
AU - Cao, Yingjie
AU - Zhang, Yangyang
AU - Li, Jianxin
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - Recently, large-scale graph data processing and mining has drawn great attention, and many distributed graph processing systems [6-8, 10, 11, 13, 14, 17, 20] have been proposed. However, large-scale graph processing remains a challenging problem. Because the computation time in some cases is still unacceptable especially when the time is limited. As illustrated in Table 1, nearly three hours are needed when running Single-Source Shortest Path algorithm on the USA-road dataset 1 using performant open-source distributed graph processing systems. In this paper, we propose an effective priority-based message sampling (PMS) approach to further improve the performance of distributed graph processing at the cost of some accuracy loss. Noticing that the passing and processing of messages dominates the computation time, our approach works by eliminating those less useful messages directly without passing them which can effectively reduce the computation overhead.We implement our approach basing on Apache Giraph [6], a popular open-source implementation of Google's Pregel [14] and report the primary results of our prototype system. The experimental results show that our approach can achieve reasonable accuracy with much less computation time.
AB - Recently, large-scale graph data processing and mining has drawn great attention, and many distributed graph processing systems [6-8, 10, 11, 13, 14, 17, 20] have been proposed. However, large-scale graph processing remains a challenging problem. Because the computation time in some cases is still unacceptable especially when the time is limited. As illustrated in Table 1, nearly three hours are needed when running Single-Source Shortest Path algorithm on the USA-road dataset 1 using performant open-source distributed graph processing systems. In this paper, we propose an effective priority-based message sampling (PMS) approach to further improve the performance of distributed graph processing at the cost of some accuracy loss. Noticing that the passing and processing of messages dominates the computation time, our approach works by eliminating those less useful messages directly without passing them which can effectively reduce the computation overhead.We implement our approach basing on Apache Giraph [6], a popular open-source implementation of Google's Pregel [14] and report the primary results of our prototype system. The experimental results show that our approach can achieve reasonable accuracy with much less computation time.
KW - Approximate computation
KW - Distributed system
KW - Large-scale graph processing
UR - https://www.scopus.com/pages/publications/85037358402
U2 - 10.1145/3132847.3133087
DO - 10.1145/3132847.3133087
M3 - 会议稿件
AN - SCOPUS:85037358402
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1999
EP - 2002
BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 6 November 2017 through 10 November 2017
ER -