TY - GEN
T1 - ICMH-Net
T2 - 31st ACM International Conference on Multimedia, MM 2023
AU - Liu, Lei
AU - Hu, Zhihao
AU - Chen, Zhenghao
AU - Xu, Dong
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/10/27
Y1 - 2023/10/27
N2 - Neural image compression has gained significant attention thanks to the remarkable success of deep neural networks. However, most existing neural image codecs focus solely on improving human vision perception. In this work, our objective is to enhance image compression methods for both human vision quality and machine vision tasks simultaneously. To achieve this, we introduce a novel approach to Partition, Transmit, Reconstruct, and Aggregate (PTRA) the latent representation of images to balance the optimizations for both aspects. By employing our method as a module in existing neural image codecs, we create a latent representation predictor that dynamically manages the bit-rate cost for machine vision tasks. To further improve the performance of auto-regressive-based coding techniques, we enhance our hyperprior network and predictor module with context modules, resulting in a reduction in bit-rate. The extensive experiments conducted on various machine vision benchmarks such as ILSVRC 2012, VOC 2007, VOC 2012, and COCO demonstrate the superiority of our newly proposed image compression framework. It outperforms existing neural image compression methods in multiple machine vision tasks including classification, segmentation, and detection, while maintaining high-quality image reconstruction for human vision.
KW - human vision
KW - image compression
KW - machine vision
KW - scalable coding
UR - https://www.scopus.com/pages/publications/85179549064
U2 - 10.1145/3581783.3612041
DO - 10.1145/3581783.3612041
M3 - Conference contribution
AN - SCOPUS:85179549064
T3 - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
SP - 8047
EP - 8056
BT - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 29 October 2023 through 3 November 2023
ER -