TY - JOUR
T1 - Highly Paralleled Low-Cost Embedded HEVC Video Encoder on TI KeyStone Multicore DSP
AU - Jiang, Hongxu
AU - Fan, Rui
AU - Zhang, Yongfei
AU - Wang, Gang
AU - Li, Zhe
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2019/4
Y1 - 2019/4
N2 - Although HEVC, the emerging video coding standard, has doubled the coding performance of its predecessor H.264/AVC, its significantly increased computational complexity imposes great obstacles for HEVC encoders to be employed in real-time applications with embedded processors, such as digital signal processors (DSPs). In this paper, a TI Keystone multicore TMS320C6678 DSP-based highly paralleled low-cost fast HEVC encoding solution is well designed and implemented. First, the overall structure of HEVC encoder with CTU-level parallelism is re-designed to well support the encoding parallelism, with full consideration of the hardware characteristics. Second, a low-delay and low-memory multicore data transmission mechanism is proposed to reduce the latency of data access between internal L2 memory and external DDR3. Third, the encoding bottlenecks, i.e., the most time-consuming encoding modules, are identified and optimized for acceleration with TI powerful C6000 SIMD instructions. Experimental results show that our proposed HEVC encoder on TI TMS320C6678 DSPs can significantly improve the real-time capacity with tolerable performance loss, 0.93 dB performance loss under on average 465.50 times speedup as compared to CPU-based HM reference software, more specifically, which makes it desirable in power-constrained real-time video applications.
AB - Although HEVC, the emerging video coding standard, has doubled the coding performance of its predecessor H.264/AVC, its significantly increased computational complexity imposes great obstacles for HEVC encoders to be employed in real-time applications with embedded processors, such as digital signal processors (DSPs). In this paper, a TI Keystone multicore TMS320C6678 DSP-based highly paralleled low-cost fast HEVC encoding solution is well designed and implemented. First, the overall structure of HEVC encoder with CTU-level parallelism is re-designed to well support the encoding parallelism, with full consideration of the hardware characteristics. Second, a low-delay and low-memory multicore data transmission mechanism is proposed to reduce the latency of data access between internal L2 memory and external DDR3. Third, the encoding bottlenecks, i.e., the most time-consuming encoding modules, are identified and optimized for acceleration with TI powerful C6000 SIMD instructions. Experimental results show that our proposed HEVC encoder on TI TMS320C6678 DSPs can significantly improve the real-time capacity with tolerable performance loss, 0.93 dB performance loss under on average 465.50 times speedup as compared to CPU-based HM reference software, more specifically, which makes it desirable in power-constrained real-time video applications.
KW - HEVC
KW - SIMD
KW - TMS320C6678
KW - embedded system
KW - multicore DSP
UR - https://www.scopus.com/pages/publications/85045324017
U2 - 10.1109/TCSVT.2018.2826074
DO - 10.1109/TCSVT.2018.2826074
M3 - 文章
AN - SCOPUS:85045324017
SN - 1051-8215
VL - 29
SP - 1163
EP - 1178
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 4
M1 - 8336889
ER -