TY - GEN
T1 - Large Model based Sequential Keyframe Extraction for Video Summarization
AU - Tan, Kailong
AU - Zhou, Yuxiang
AU - Xia, Qianchen
AU - Liu, Rui
AU - Chen, Yong
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/4/12
Y1 - 2024/4/12
N2 - Keyframe extraction aims to summarize a video's semantics with the minimum number of its frames. This paper proposes Large Model based Sequential Keyframe Extraction for video summarization, dubbed LMSKE, which consists of three stages. First, we use the large model "TransNetV2" to cut the video into consecutive shots, and employ the large model "CLIP" to generate a visual feature for each frame within each shot. Second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, with each candidate keyframe being the frame located nearest to a cluster center. Third, we further reduce the candidate keyframes via redundancy elimination within each shot, and finally concatenate them in shot order to form the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct extensive experiments, whose results show that LMSKE performs much better than several SOTA competitors, with an average F1 of 0.5311, an average fidelity of 0.8141, and an average compression ratio of 0.9922.
AB - Keyframe extraction aims to summarize a video's semantics with the minimum number of its frames. This paper proposes Large Model based Sequential Keyframe Extraction for video summarization, dubbed LMSKE, which consists of three stages. First, we use the large model "TransNetV2" to cut the video into consecutive shots, and employ the large model "CLIP" to generate a visual feature for each frame within each shot. Second, we develop an adaptive clustering algorithm to yield candidate keyframes for each shot, with each candidate keyframe being the frame located nearest to a cluster center. Third, we further reduce the candidate keyframes via redundancy elimination within each shot, and finally concatenate them in shot order to form the final sequential keyframes. To evaluate LMSKE, we curate a benchmark dataset and conduct extensive experiments, whose results show that LMSKE performs much better than several SOTA competitors, with an average F1 of 0.5311, an average fidelity of 0.8141, and an average compression ratio of 0.9922.
KW - adaptive clustering
KW - keyframe extraction
KW - large model
KW - shot segmentation
KW - video summarization
UR - https://www.scopus.com/pages/publications/85197267073
U2 - 10.1145/3661725.3661781
DO - 10.1145/3661725.3661781
M3 - Conference contribution
AN - SCOPUS:85197267073
T3 - ACM International Conference Proceeding Series
BT - CMLDS 2024 - 2024 International Conference on Computing, Machine Learning and Data Science, Conference Proceedings
PB - Association for Computing Machinery
T2 - 2024 International Conference on Computing, Machine Learning and Data Science, CMLDS 2024
Y2 - 12 April 2024 through 14 April 2024
ER -