Published by Dale Lewis. Modified over 9 years ago.
1
Customizing Wide-SIMD Architectures for H.264. Sangwon Seo (1), Mark Woh (1), Scott Mahlke (1), Trevor Mudge (1), Vijay Sundaram (2), Chaitali Chakrabarti (2). (1) University of Michigan; (2) Arizona State University
2
Outline: Motivation; H.264 Analysis; Proposed Architecture; H.264 Kernel Mappings; Results; Conclusion
3
Motivation – Smart Phone. Reference images: http://www.apple.com/iphone/gallery/
4
Motivation – Inside Smart Phone. Reference images: http://idannyb.files.wordpress.com/2008/07/xiuvbfueck3gsdum-large.jpg
5
H.264 Design: H.264 encoder/decoder reference design. Reference images: I. Richardson, “H.264 and MPEG-4 video compression,” Wiley, 2003
6
H.264 – Analysis. H.264 kernel algorithms: heavy SIMD workload; different natural SIMD widths; high and medium thread-level parallelism. Need to support multiple SIMD widths to maximize SIMD utilization.
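The utilization argument on this slide can be illustrated with a small calculation. This is only a sketch: the 32-lane datapath, the natural width of 8, and the function names are assumptions chosen for illustration, not figures taken from the presentation.

```python
def simd_utilization(natural_width: int, simd_width: int) -> float:
    """Fraction of lanes doing useful work when a single kernel
    instance runs across the whole SIMD datapath."""
    if natural_width <= 0 or simd_width <= 0:
        raise ValueError("widths must be positive")
    return min(natural_width, simd_width) / simd_width


def partitioned_utilization(natural_width: int, simd_width: int, threads: int) -> float:
    """Utilization when the datapath is split into groups of
    `natural_width` lanes and each group runs an independent thread."""
    if natural_width <= 0 or simd_width <= 0 or threads < 0:
        raise ValueError("invalid arguments")
    # Number of natural-width groups that fit in the datapath (at least one).
    groups = max(1, simd_width // natural_width)
    # Only as many groups are busy as there are threads to run.
    active = min(groups, threads)
    return active * min(natural_width, simd_width) / simd_width


# Hypothetical numbers: a kernel with natural width 8 on a 32-lane machine.
print(simd_utilization(8, 32))            # one instance, full-width SIMD
print(partitioned_utilization(8, 32, 4))  # four threads, four 8-lane groups
```

Under these assumed numbers, a narrow kernel leaves most lanes idle unless the datapath can be partitioned and enough independent threads are available to fill the groups.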
7
H.264 – Analysis. Example – Deblocking Filter (horizontal filtering and vertical filtering). Multimedia algorithms operate on two-dimensional data. Row-order or column-order memory access works well for one set of edges but not for the other. A diagonal memory bank system allows blocks along either a row or a column to be accessed.
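One common way to realize a diagonal bank layout is skewed storage, where element (row, col) is placed in bank (row + col) mod N. The slide does not state the exact mapping used, so the formula, the bank count of 4, and all function names below are assumptions for illustration only.

```python
NUM_BANKS = 4  # hypothetical bank count, chosen for illustration


def diagonal_bank(row: int, col: int, num_banks: int = NUM_BANKS) -> int:
    """Skewed (diagonal) placement: each row is rotated by one bank."""
    return (row + col) % num_banks


def row_major_bank(row: int, col: int, num_banks: int = NUM_BANKS) -> int:
    """Conventional placement: the bank depends only on the column."""
    return col % num_banks


def banks_for_row(mapping, row: int, num_banks: int = NUM_BANKS) -> list:
    """Banks touched when reading one full row of a num_banks-wide block."""
    return [mapping(row, c, num_banks) for c in range(num_banks)]


def banks_for_column(mapping, col: int, num_banks: int = NUM_BANKS) -> list:
    """Banks touched when reading one full column of the block."""
    return [mapping(r, col, num_banks) for r in range(num_banks)]


def is_conflict_free(banks: list) -> bool:
    """An access is conflict-free when every element comes from a different bank."""
    return len(set(banks)) == len(banks)


for name, mapping in [("row-major", row_major_bank), ("diagonal", diagonal_bank)]:
    rows_ok = all(is_conflict_free(banks_for_row(mapping, r)) for r in range(NUM_BANKS))
    cols_ok = all(is_conflict_free(banks_for_column(mapping, c)) for c in range(NUM_BANKS))
    print(name, "rows conflict-free:", rows_ok, "columns conflict-free:", cols_ok)
```

With the skewed mapping, a row read returns its elements in rotated bank order (for example, row 1 touches banks 1, 2, 3, 0), so some realignment step would be needed after the read.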
8
H.264 – Analysis. Subgraphs for inner loops of two kernel algorithms: large amount of data locality; large register file (RF) power consumption (read/write); bypass and temporary buffer support.
9
H.264 – Analysis. Instruction pairs: heavy usage of shuffle and arithmetic operations; Add-Shift: round operation; Sub-Abs: SAD operation. Need to fuse the frequently used instruction pairs.
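The two instruction pairs named on this slide can be sketched as single fused operations over a vector of lanes. The function names and the list-based lane model are assumptions made for illustration; the slide does not describe the actual instruction semantics in detail.

```python
def add_shift(lanes: list, shift: int) -> list:
    """Fused add + shift: add half of the divisor, then shift right.
    This rounds x / 2**shift to the nearest integer in every lane."""
    if shift < 1:
        raise ValueError("shift must be at least 1")
    rounding = 1 << (shift - 1)
    return [(x + rounding) >> shift for x in lanes]


def sub_abs(a: list, b: list) -> list:
    """Fused subtract + absolute value, applied lane by lane."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [abs(x - y) for x, y in zip(a, b)]


def sad(a: list, b: list) -> int:
    """Sum of absolute differences, built on the fused sub_abs step."""
    return sum(sub_abs(a, b))


# Rounding with shift = 5, i.e. (x + 16) >> 5 in each lane.
print(add_shift([31, 32, 48], 5))
# SAD between a current block and a candidate block (made-up pixel values).
print(sad([10, 20, 30], [12, 15, 30]))
```

Fusing each pair means the intermediate result (the sum before shifting, or the difference before taking the absolute value) never has to be written back to the register file.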
10
H.264 – Analysis. Permutation patterns for intra prediction: fixed set of shuffle patterns. Need for programmable shuffle network.
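To make the idea of a fixed set of shuffle patterns concrete, the sketch below expresses two 4x4 intra prediction modes (vertical and horizontal) as index patterns applied to neighboring pixels. The pattern encoding and the `shuffle` helper are assumptions for illustration and are not taken from the slides.

```python
def shuffle(src: list, pattern: list) -> list:
    """Generic permutation: output lane i takes the value of src[pattern[i]]."""
    return [src[i] for i in pattern]


# A 4x4 block flattened row-major into 16 lanes.
# Vertical mode: every row copies the 4 pixels above the block.
VERTICAL_PATTERN = [c for r in range(4) for c in range(4)]
# Horizontal mode: every row broadcasts the pixel to its left.
HORIZONTAL_PATTERN = [r for r in range(4) for c in range(4)]

top = [10, 20, 30, 40]  # made-up pixels above the block
left = [1, 2, 3, 4]     # made-up pixels left of the block

print(shuffle(top, VERTICAL_PATTERN))
print(shuffle(left, HORIZONTAL_PATTERN))
```

Because each prediction mode reduces to a different index table fed to the same `shuffle` step, a network whose pattern can be loaded at run time would cover all of them with one piece of hardware.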
11
Modified SIMD architecture
12
Modified SIMD architecture: multiple SIMD widths; thread-level parallelism
13
Modified SIMD architecture: diagonal memory organization; memory bank system + shuffle network
14
Modified SIMD architecture: short-lived values stored in temporary buffers
15
Modified SIMD architecture: short-lived values; fused operation
16
Modified SIMD architecture: shuffle networks are placed at several points in the architecture to align data
17
Mapping of H.264 Kernels: Intra Prediction
18
Results: system breakdown for H.264 CIF video at 30 fps
19
Results: speedup breakdown. 2.13x performance increase on average.
20
Results: energy-delay product comparison. 29% energy-delay improvement on average.
21
Results: comparison with latest H.264 encoders. [17] T. C. Chen et al., “2.8 to 62.7 mW low-power and power-aware H.264 encoder for mobile applications,” 2007 IEEE Symposium on VLSI Circuits, pp. 222–223, June 2007. [18] M. Bhatnagar, “TMS320DM6446/3 Power Consumption Summary,” Texas Instruments Application Reports, http://focus.ti.com/lit/an/spraad6a/spraad6a.pdf, Feb. 2008.
22
Conclusion. Key architectural enhancements: SIMD partitioning; diagonal memory bank system; bypass and temporary buffer support; fused operation support; programmable crossbar. Future work: image processing algorithms on SIMD architecture.
23
Backup Slides
24
H.264 – Analysis. Diagonal memory organization: multimedia algorithms operate on two-dimensional data, so blocks along a row or a column need to be easy to access.
25
Mapping of H.264 Kernels: Deblocking Filter
26
Mapping of H.264 Kernels: Motion Compensation
27
Mapping of H.264 Kernels: Motion Estimation