TY - GEN
T1 - PRoof
T2 - 53rd International Conference on Parallel Processing, ICPP 2024
AU - Wu, Siyu
AU - Yang, Hailong
AU - You, Xin
AU - Gong, Ruihao
AU - Liu, Yi
AU - Luan, Zhongzhi
AU - Qian, Depei
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/8/12
Y1 - 2024/8/12
N2 - The increasing diversity of deep neural network (DNN) models and hardware platforms necessitates effective model profiling for high-performance inference deployment. Current DNN profiling tools suffer from either limited optimization insights, due to the missing correlation between high-level DNN layer design and low-level hardware performance metrics, or prohibitive profiling overhead, due to the large number of performance measurements taken through hardware performance counters. Meanwhile, the roofline model has been widely used in the high-performance computing (HPC) domain for identifying performance bottlenecks and guiding optimizations. However, it lacks hierarchical (e.g., kernel/operator/layer), fine-grained, multi-platform support for profiling DNN models. To overcome the above limitations, we propose PRoof, a versatile DNN profiling framework that can effectively attribute hardware performance metrics back to the model design. In addition, PRoof does not require massive hardware profiling and thus mitigates the large profiling overhead. Specifically, our approach correlates the profiled results of each layer with its conceptual layer design by effectively handling layer fusion. Our approach also provides an analytical model to predict the floating-point operations (FLOP) and memory accesses of DNN models without massive profiling. We demonstrate the effectiveness of PRoof with representative DNN models across a wide range of hardware platforms. Derived from PRoof's profiling results, we obtain several insights that provide useful guidance for model design and hardware tuning.
AB - The increasing diversity of deep neural network (DNN) models and hardware platforms necessitates effective model profiling for high-performance inference deployment. Current DNN profiling tools suffer from either limited optimization insights, due to the missing correlation between high-level DNN layer design and low-level hardware performance metrics, or prohibitive profiling overhead, due to the large number of performance measurements taken through hardware performance counters. Meanwhile, the roofline model has been widely used in the high-performance computing (HPC) domain for identifying performance bottlenecks and guiding optimizations. However, it lacks hierarchical (e.g., kernel/operator/layer), fine-grained, multi-platform support for profiling DNN models. To overcome the above limitations, we propose PRoof, a versatile DNN profiling framework that can effectively attribute hardware performance metrics back to the model design. In addition, PRoof does not require massive hardware profiling and thus mitigates the large profiling overhead. Specifically, our approach correlates the profiled results of each layer with its conceptual layer design by effectively handling layer fusion. Our approach also provides an analytical model to predict the floating-point operations (FLOP) and memory accesses of DNN models without massive profiling. We demonstrate the effectiveness of PRoof with representative DNN models across a wide range of hardware platforms. Derived from PRoof's profiling results, we obtain several insights that provide useful guidance for model design and hardware tuning.
KW - DNN profiling
KW - hardware performance metrics
KW - roofline model
UR - https://www.scopus.com/pages/publications/85202436992
U2 - 10.1145/3673038.3673116
DO - 10.1145/3673038.3673116
M3 - Conference contribution
AN - SCOPUS:85202436992
T3 - ACM International Conference Proceeding Series
SP - 822
EP - 832
BT - 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 12 August 2024 through 15 August 2024
ER -