LOW POWER DIGITAL VIDEO COMPRESSION HARDWARE DESIGN ILKER HAMZAOGLU Sabancı University
Digital Video Compression
Why do we need video compression? To store high-quality video in limited storage space, or to transmit it within the limited bandwidth that networks provide.
There is tremendous growth in the amount of digital video creation and communication.
Higher spatial video resolutions: 1920x1080 (Full HD), 3840x2160 (Quad Full HD, 4K Ultra HD).
Higher temporal video resolutions: 30 frames per second, 120 frames per second.
3D and multi-view video.
Digital Video Compression
Applications of video compression: digital TV, DVD, cellular phones, internet video streaming, digital camcorders, video teleconferencing.
Digital Video Compression
Video compression algorithms remove redundancies: subjective redundancy, spatial redundancy, temporal redundancy, and statistical redundancy.
Digital Video Compression
The High Efficiency Video Coding (HEVC) / H.265 standard achieves 50% more compression at the same video quality than the Advanced Video Coding (AVC) / H.264 standard, at the expense of a significant increase in computational complexity.
[Figure: HEVC video encoder]
Low Power Digital Hardware Design
Low Power Digital Hardware Design
Batteries in battery-operated portable devices have limited capacity.
Excessive power consumption degrades performance, increases packaging and cooling costs, reduces reliability, and may cause device failures.
Therefore, power consumption is a very important design metric for integrated circuits (ICs).
Power Consumption Estimation
The power consumption of a hardware implementation on a Xilinx FPGA is estimated using the Xilinx XPower tool.
Timing simulation of the placed and routed netlist is performed using ModelSim, and the switching activities are stored in Value Change Dump (VCD) files for several video frames.
The VCD files are then used by XPower to estimate the power consumption of that hardware implementation.
Power Consumption Estimation
Hardware     Average Current without DBF   Average Current with DBF   Estimated Power   Measured Power
DBF_4×4      999 mA                        1076 mA                    220.6 mW          254.1 mW
DBF_16×16    1119 mA                       1152 mA                    89.7 mW           108.9 mW
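The measured power figures on this slide are consistent with multiplying the increase in average current by a 3.3 V supply. The supply voltage is not stated on the slide; 3.3 V is an assumption inferred from the numbers, and the function name below is illustrative only. A minimal sketch of that arithmetic:

```python
# Assumption: 3.3 V supply, inferred from the slide's numbers (not stated in the source).
SUPPLY_VOLTAGE_V = 3.3

def measured_power_mw(current_without_ma, current_with_ma, supply_v=SUPPLY_VOLTAGE_V):
    """Power attributed to the block under test: extra current (mA) times supply voltage (V) gives mW."""
    return (current_with_ma - current_without_ma) * supply_v

# Current readings taken from the slide (mA).
print(round(measured_power_mw(999, 1076), 1))   # DBF_4x4   -> 254.1
print(round(measured_power_mw(1119, 1152), 1))  # DBF_16x16 -> 108.9
```

Under this assumption, both measured values on the slide are reproduced from the current readings alone.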
Power Reduction Techniques
Glitch Reduction
Comparison Prediction
Multiple Constant Multiplication
Pixel Equality based Computation and Energy Reduction
Pixel Similarity based Computation and Energy Reduction
Computation and Energy Reduction for HEVC/H.265 Inverse Discrete Cosine Transform
Glitch Reduction
Glitch Reduction in Motion Estimation Hardware
A glitch is a spurious transition at a node within a single cycle, before the node settles to its correct value.
Reducing glitches by pipelining is an effective power reduction technique.
Glitch Reduction
An average dynamic power reduction of 20% is achieved for the Foreman and Mobile video sequences by pipelining the Motion Estimation hardware.
Comparison Prediction
Multiple Constant Multiplication
The Hcub MCM algorithm is used in an HEVC/H.265 Inverse Discrete Cosine Transform hardware to minimize the number of adders, their bit widths, and the adder tree depth in a multiplier block, which multiplies a single input by multiple constants.
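To illustrate what a multiplier block does, the sketch below multiplies one input by the three constants of the HEVC 4-point transform (64, 83, 36) using only shifts and additions, sharing the intermediate term 9x. The decomposition is hand-derived for illustration and is not the output of the Hcub algorithm; the function name is hypothetical.

```python
def multiplier_block(x):
    """Return (64*x, 83*x, 36*x) using shifts and three additions in total."""
    x9 = (x << 3) + x          # 9x = 8x + x, shared intermediate (adder 1)
    x36 = x9 << 2              # 36x = 9x * 4, shift only
    x64 = x << 6               # 64x, shift only
    x83 = x64 + (x9 << 1) + x  # 83x = 64x + 18x + x (adders 2 and 3)
    return x64, x83, x36

print(multiplier_block(5))  # (320, 415, 180)
```

Computing 36x and 83x independently would need four adders (one for 36x, three for 83x), so reusing 9x saves one. MCM algorithms search for this kind of sharing systematically.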
AVC/H.264 Intra Prediction Algorithm
The intra prediction algorithm predicts the pixels in a macroblock (MB) using the pixels in the available neighboring blocks.
AVC/H.264 4x4 Intra Prediction Modes
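As a concrete illustration of predicting a block from its neighbors, the sketch below implements two of the H.264 4x4 luma modes: vertical (mode 0) and DC (mode 2), for the case where both the top and left neighbors are available. The function names are illustrative, and the remaining modes and the unavailable-neighbor cases are omitted.

```python
def intra4x4_vertical(top):
    """Mode 0 (vertical): every row repeats the four pixels directly above the block."""
    return [list(top) for _ in range(4)]

def intra4x4_dc(top, left):
    """Mode 2 (DC), both neighbors available: rounded mean of the 8 neighboring pixels."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

top = [10, 20, 30, 40]    # reconstructed pixels above the block
left = [50, 60, 70, 80]   # reconstructed pixels to the left of the block
print(intra4x4_vertical(top)[3])  # [10, 20, 30, 40]
print(intra4x4_dc(top, left)[0])  # [45, 45, 45, 45]
```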
AVC/H.264 16x16 Intra Prediction Modes
HEVC/H.265 Intra Prediction Algorithm
The intra prediction algorithm predicts the pixels in a prediction unit (PU) using the pixels in the available neighboring blocks.
Intra PU sizes range from 4x4 up to 64x64.
A PU can have up to 35 intra prediction modes.

PU Size                # of Prediction Modes
4x4                    18
8x8, 16x16, 32x32      35
64x64                  4

Mode labels from the figure: 0: Intra_Planar, 3: Intra_DC
Pixel Equality based Computation and Energy Reduction Technique
The PECR technique compares the pixels used in the prediction equations of intra prediction modes. If the pixels used in a prediction equation are equal, the equation simplifies to a constant value and the prediction calculation for this equation becomes unnecessary.

Pixels   Equation                   # of Add. / # of Shift (where given)
I, J     [27I + 5J + 16] >> 5       6 / 5
J, K     [22J + 10K + 16] >> 5
K, L     [17K + 15L + 16] >> 5
L, M     [12L + 20M + 16] >> 5      4
M, N     [6M + 26N + 16] >> 5
N, O     [30N + 2O + 16] >> 5
O, P     [8O + 24P + 16] >> 5       3
A, R     [11A + 21R + 16] >> 5
A, B     [5A + 27B + 16] >> 5
B, C     [10B + 22C + 16] >> 5

Example (Mode 8): pred[0,0] = [27I + 5J + 16] >> 5. If I = J, then pred[0,0] = [32I + 16] >> 5 = I. No quality loss.
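The equality shortcut can be sketched in software as follows. Every equation on this slide has the form [wp*P + wq*Q + 16] >> 5 with wp + wq = 32, so when P = Q the result is exactly P. The function names and the returned "skipped" flag are illustrative assumptions; the actual hardware organization is not described here.

```python
def predict_exact(p, q, wp, wq):
    """Evaluate [wp*p + wq*q + 16] >> 5 for weights that sum to 32."""
    assert wp + wq == 32
    return (wp * p + wq * q + 16) >> 5

def predict_pecr(p, q, wp, wq):
    """Return (prediction, skipped). Skips the arithmetic when the two pixels are equal."""
    if p == q:
        return p, True  # [32p + 16] >> 5 == p, so nothing needs to be computed
    return predict_exact(p, q, wp, wq), False

print(predict_pecr(120, 120, 27, 5))  # (120, True)
print(predict_pecr(120, 88, 27, 5))   # (115, False)
```

Because the shortcut returns the same value as the full equation whenever it fires, the prediction is bit-exact, which matches the "no quality loss" claim.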
Pixel Similarity based Computation and Energy Reduction Technique
The PSCR technique compares the pixels used in the prediction equations of intra prediction modes. If the pixels used in a prediction equation are similar, the pixel predicted by this equation is assumed to be equal to one of these pixels. The prediction equation therefore simplifies to a constant value, and the prediction calculation for this equation becomes unnecessary. In this way, the amount of computation performed by the HEVC/H.265 intra prediction algorithm is reduced even further, with a small quality loss.
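A minimal sketch of the similarity shortcut is given below. The slide does not say how similarity is tested or which pixel is substituted; here similarity is assumed to mean an absolute difference of at most `threshold`, and the first pixel is returned. Both choices, and the function names, are hypothetical.

```python
def predict_exact(p, q, wp, wq):
    """Evaluate [wp*p + wq*q + 16] >> 5 for weights that sum to 32."""
    assert wp + wq == 32
    return (wp * p + wq * q + 16) >> 5

def predict_pscr(p, q, wp, wq, threshold):
    """Return (prediction, skipped). Assumed similarity test: |p - q| <= threshold."""
    if abs(p - q) <= threshold:
        return p, True  # approximate: substitute one of the two similar pixels
    return predict_exact(p, q, wp, wq), False

approx, skipped = predict_pscr(100, 102, 27, 5, threshold=2)
exact = predict_exact(100, 102, 27, 5)
print(approx, exact, skipped)  # 100 100 True
```

Since the exact result is a rounded weighted average that always lies between the two pixels, the error introduced by this particular shortcut is bounded by the threshold.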
HEVC/H.265 Intra Prediction Hardware
HEVC/H.265 Intra Prediction Hardware ML605 board with Xilinx Virtex 6 XC6VLX75T FPGA
Energy Consumption Results
HEVC/H.265 Inverse DCT
After forward DCT and quantization, most of the transformed and quantized high-frequency AC coefficients become zero.

IDCT(Transform Coefficients) {
    if (DC coefficient is not zero and predetermined AC coefficients are smaller than threshold)
        Residual ← IDCT(DC Coefficient)
    else
        Residual ← IDCT(Transform Coefficients)
}
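The decision in the pseudocode above can be sketched as runnable code. This is a simplified model, not the hardware: it uses a floating-point orthonormal DCT as a stand-in for the HEVC integer transform, and the parameters `checked_positions` (which AC coefficients are tested) and `threshold` are hypothetical, since the slide does not specify them.

```python
import math

def idct_2d(coeffs):
    """Reference orthonormal 2D inverse DCT (floating point, direct O(N^4) form)."""
    n = len(coeffs)
    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            out[y][x] = sum(
                scale(u) * scale(v) * coeffs[v][u]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for v in range(n) for u in range(n))
    return out

def idct_dc_only(dc, n):
    """Inverse transform of a block holding only a DC coefficient: a constant block."""
    return [[dc / n] * n for _ in range(n)]

def idct_with_skip(coeffs, checked_positions, threshold):
    """Return (residual, skipped): take the DC-only path when the tested ACs are small."""
    dc = coeffs[0][0]
    acs_small = all(abs(coeffs[v][u]) < threshold for v, u in checked_positions)
    if dc != 0 and acs_small:
        return idct_dc_only(dc, len(coeffs)), True
    return idct_2d(coeffs), False

block = [[32, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
residual, skipped = idct_with_skip(block, [(0, 1), (1, 0), (1, 1)], threshold=1)
print(skipped, residual[0])  # True [8.0, 8.0, 8.0, 8.0]
```

When the tested AC coefficients are exactly zero, the shortcut matches the full inverse transform. When they are small but nonzero, it is an approximation, which is the source of the small bitrate and PSNR changes such a technique trades for reduced computation.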
HEVC/H.265 Inverse DCT
Frame               QP    ∆Bitrate (%)   ∆PSNR (dB)
Steam Locomotive    22     0.49           0.003
                    27     0.53          -0.001
                    32     0.64          -0.007
Traffic             22     0.70           0.015
                    27     1.25           0.016
                    32     3.41           0.059
People on Street    22     0.77          -0.005
                    27     0.90          -0.019
                    32     3.05          -0.054
Park Scene          22     0.39          -0.010
                    27     0.68          -0.017
                    32     0.57          -0.085
Kimono              22     0.40          -0.004
                    27     0.63          -0.002
                    32     0.95          -0.009
Cactus              22    -0.04          -0.039
                    27     0.86          -0.016
                    32     2.59          -0.046
HEVC/H.265 Inverse DCT
Comparison columns: [14], [15], [16], [17], Proposed. Rows with fewer than five entries had merged cells in the slide table.
Technology: 0.13 um ASIC, 0.18 um, 90 nm
Gate Count: 109.2 K, 287 K, 12.3 K, 235.4 K, 142 K
Max Speed (MHz): 350, 300, 211, 311, 150
Frames per Second: 30 4096x2048, 30 3840x2160, 67 1920x1080, 4096x2048, 48
Transform Size: 4, 8, 16, 32; 16, 32; 8
Transform: 1D, 2D
Many Thanks 6 PhD students, 19 MS students, many BS students for everything Sabancı University for all the support TUBITAK for 11 BIDEB Scholarships TUBITAK for 1001 Research Projects EEEAG 115E290, EEEAG 111E013, EEEAG 108E239, EEEAG 106E153 TUBITAK and Korea Research Foundation for Joint Research Project EEEAG 107E179