TY - GEN
T1 - Retrospection on the Performance Analysis Tools for Large-Scale HPC Programs
AU - Xuan, Zhibo
AU - You, Xin
AU - Yang, Hailong
AU - Li, Mingzhen
AU - Luan, Zhongzhi
AU - Liu, Yi
AU - Qian, Depei
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - As the performance gap between hardware and software widens, performance analysis tools are essential for understanding the behavior of large-scale High-Performance Computing (HPC) programs. These tools provide insights into the performance bottlenecks and help in optimizing the performance of the programs. In this paper, we present a comprehensive study of performance analysis tools for large-scale HPC systems including both sampling-based and instrumentation-based tools that are commonly adopted in the HPC community. We investigate the abundance and overheads of data collection as well as the analysis capabilities of HPCToolkit, TAU, and Scalasca with representative programs at scale. Our study shows that different performance analysis tools have distinct strengths and weaknesses, and the choice of a performance analysis tool depends on the specific requirements of the user. We also discuss the challenges and future directions in the field of performance analysis tools for large-scale HPC systems.
UR - https://www.scopus.com/pages/publications/105000006821
U2 - 10.1109/HIPC62374.2024.00013
DO - 10.1109/HIPC62374.2024.00013
M3 - Conference contribution
AN - SCOPUS:105000006821
T3 - Proceedings - 2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics, HiPC 2024
SP - 34
EP - 44
BT - Proceedings - 2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics, HiPC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 31st Annual IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2024
Y2 - 18 December 2024 through 21 December 2024
ER -