Customizing Wide-SIMD Architectures for H.264
Sangwon Seo (1), Mark Woh (1), Scott Mahlke (1), Trevor Mudge (1), Vijay Sundaram (2), Chaitali Chakrabarti (2)
(1) University of Michigan, (2) Arizona State University

Outline
- Motivation
- H.264 Analysis
- Proposed Architecture
- H.264 Kernel Mappings
- Results
- Conclusion

Motivation – Smart Phone
[Figure: reference images]

Motivation – Inside Smart Phone
[Figure: reference images]

H.264 Design
[Figure: H.264 encoder/decoder reference design]
Reference image: I. Richardson, “H.264 and MPEG-4 video compression,” Wiley, 2003

H.264 – Analysis
H.264 Kernel Algorithms
- Heavy SIMD workload
- Different natural SIMD widths
- High and medium thread-level parallelism
- Need to support multiple SIMD widths to maximize SIMD utilization
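
The utilization point above can be made concrete with a small calculation. The following is only a sketch: the lane counts (8, 16, 32) and the function names are illustrative assumptions, not figures or interfaces taken from the slides. It shows how a kernel with a narrow natural width wastes lanes on a wide SIMD unit unless the unit can be partitioned.

```python
def lane_utilization(natural_width, partition_width):
    """Fraction of lanes doing useful work when a kernel whose natural
    vector width is `natural_width` runs on a SIMD partition that is
    `partition_width` lanes wide (both values are hypothetical inputs)."""
    if natural_width <= 0 or partition_width <= 0:
        raise ValueError("widths must be positive")
    # Number of passes over the partition needed to cover the whole vector
    passes = -(-natural_width // partition_width)  # ceiling division
    return natural_width / (passes * partition_width)


def best_partition(natural_width, supported_widths):
    """Pick the supported width with the highest utilization;
    ties are broken in favour of the wider partition."""
    return max(supported_widths,
               key=lambda w: (lane_utilization(natural_width, w), w))


if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, lane_utilization(8, width))
```

In this sketch an 8-wide kernel uses a quarter of a 32-lane unit but fills an 8-lane partition completely, which is the motivation for supporting several SIMD widths.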

H.264 – Analysis
Example – Deblocking Filter
- Multimedia algorithms operate on two-dimensional data.
- Row-order or column-order memory access works well for one set of edges, but not for the other.
- A diagonal memory bank system allows blocks to be accessed along either a row or a column.
[Figure: horizontal filtering and vertical filtering]
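
The bank-conflict argument can be sketched in a few lines. The slides do not give the exact address mapping, so the skew used here, bank = (row + column) mod N, is an assumption: it is one common diagonal scheme, and the 4-bank size and function names are hypothetical.

```python
N = 4  # hypothetical number of banks and block dimension


def row_major_bank(row, col, n=N):
    """Conventional layout: the column index alone selects the bank."""
    return col % n


def diagonal_bank(row, col, n=N):
    """Assumed diagonal (skewed) layout: each row is rotated by its index."""
    return (row + col) % n


def banks_touched(bank_fn, coords):
    """Banks hit by one parallel access to the given (row, col) elements."""
    return [bank_fn(r, c) for r, c in coords]


def is_conflict_free(bank_fn, coords):
    """True when every element of the access lives in a different bank."""
    banks = banks_touched(bank_fn, coords)
    return len(set(banks)) == len(banks)


row_access = [(1, c) for c in range(N)]  # one row of the block
col_access = [(r, 1) for r in range(N)]  # one column of the block

if __name__ == "__main__":
    for name, fn in (("row-major", row_major_bank), ("diagonal", diagonal_bank)):
        print(name,
              "row ok:", is_conflict_free(fn, row_access),
              "column ok:", is_conflict_free(fn, col_access))
```

With the row-major layout a column access lands entirely in one bank, while the skewed layout spreads both rows and columns across all banks. The data then arrives rotated, which is why such a scheme is usually paired with a shuffle network to realign it.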

H.264 – Analysis
Subgraphs for inner loops of two kernel algorithms
- Large amount of data locality
- Large register file (RF) power consumption (read/write)
- Bypass and temporary buffer support

H.264 – Analysis
Instruction Pairs
- Heavy usage of shuffle and arithmetic operations
- Add-Shift: round operation
- Sub-Abs: SAD (sum of absolute differences) operation
- Need to fuse the frequently used instruction pairs
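
The two instruction pairs named above can be written out as lane-wise operations. This is a minimal sketch in plain Python lists rather than the architecture's actual instructions; the function names are hypothetical and the rounding form (add half the divisor, then shift right) is the usual fixed-point idiom, assumed here.

```python
def add_shift(values, addend, shift):
    """Add then arithmetic-right-shift, applied to every lane."""
    return [(v + addend) >> shift for v in values]


def round_shift(values, shift):
    """Rounded division by 2**shift expressed as a single add-shift pair."""
    if shift < 1:
        raise ValueError("shift must be at least 1")
    return add_shift(values, 1 << (shift - 1), shift)


def sub_abs(a, b):
    """Subtract then absolute value, applied to every lane."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [abs(x - y) for x, y in zip(a, b)]


def sad(a, b):
    """Sum of absolute differences built on the sub-abs pair."""
    return sum(sub_abs(a, b))


if __name__ == "__main__":
    print(round_shift([0, 15, 16, 17, 31], 5))
    print(sad([10, 20, 30, 40], [12, 18, 30, 45]))
```

Each pair produces an intermediate value that is consumed immediately, which is what makes it a candidate for fusing into one operation.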

H.264 – Analysis
Permutation Patterns for Intra Prediction
- Fixed set of shuffle patterns
- Need for programmable shuffle network
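
A shuffle pattern can be modelled as a table of source-lane indices, so a small set of tables covers a fixed set of patterns. The sketch below uses the vertical and horizontal 4x4 intra prediction modes as examples; the lane layout (neighbouring pixels in lanes 0 to 3, predicted block in row-major order) and all names are assumptions for illustration, not the mapping used in the talk.

```python
def shuffle(src, pattern):
    """Output lane i receives src[pattern[i]]."""
    return [src[i] for i in pattern]


# Vertical mode: every row of the 4x4 block copies the four pixels above it.
VERTICAL_4X4 = [c for r in range(4) for c in range(4)]

# Horizontal mode: every row is filled with the pixel to its left.
HORIZONTAL_4X4 = [r for r in range(4) for c in range(4)]

if __name__ == "__main__":
    top = [10, 20, 30, 40]    # hypothetical pixels above the block
    left = [1, 2, 3, 4]       # hypothetical pixels left of the block
    print(shuffle(top, VERTICAL_4X4))
    print(shuffle(left, HORIZONTAL_4X4))
```

Because each mode is just a different index table, switching prediction modes means loading a different pattern rather than changing the datapath.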

Modified SIMD architecture

Modified SIMD architecture
- Multiple SIMD widths
- Thread-level parallelism

Modified SIMD architecture
- Diagonal memory organization
- Memory bank system + shuffle network

Modified SIMD architecture
- Short-lived values stored in temporary buffers

Modified SIMD architecture
- Short-lived values
- Fused operation

Modified SIMD architecture
- Shuffle networks are placed at multiple points in the architecture to align data

Mapping of H.264 Kernels
- Intra Prediction

Results
- System breakdown
- H.264 CIF video at 30 fps

Results
- Speedup breakdown
- 2.13x performance increase on average

Results
- Energy-delay product comparison
- 29% energy-delay improvement on average
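
For readers unfamiliar with the metric, the energy-delay product (EDP) is energy multiplied by execution time, and an improvement is the fractional reduction relative to a baseline. The sketch below only illustrates the arithmetic; the input numbers are made up and are not measurements from the talk.

```python
def energy_delay_product(energy, delay):
    """EDP = energy x delay (any consistent units)."""
    return energy * delay


def edp_improvement(base_energy, base_delay, new_energy, new_delay):
    """Fractional EDP reduction of the new design relative to the baseline."""
    base = energy_delay_product(base_energy, base_delay)
    new = energy_delay_product(new_energy, new_delay)
    return 1.0 - new / base


if __name__ == "__main__":
    # Hypothetical design: uses 20% more energy but runs in half the time.
    print(edp_improvement(10, 10, 12, 5))
```

The example shows why the metric is used: a design can spend more energy per run and still come out ahead if the reduction in delay is large enough.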

Results
- Comparison with latest H.264 encoders
[17] T. C. Chen et al., “2.8 to 62.7 mW low-power and power-aware H.264 encoder for mobile applications,” 2007 IEEE Symposium on VLSI Circuits, pp. 222–223, June
[18] M. Bhatnagar, “TMS320DM6446/3 Power Consumption Summary,” Texas Instruments Application Reports, Feb

Conclusion
Key architectural enhancements
- SIMD partitioning
- Diagonal memory bank system
- Bypass and temporary buffer support
- Fused operation support
- Programmable crossbar
Future work
- Image processing algorithms on SIMD architecture

Backup Slides

H.264 – Analysis
Diagonal Memory Organization
- Multimedia algorithms operate on two-dimensional data.
- Blocks along a row or a column need to be accessed easily.

Mapping of H.264 Kernels
- Deblocking Filter

Mapping of H.264 Kernels
- Motion Compensation

Mapping of H.264 Kernels
- Motion Estimation