Published by Raymond Weaver. Modified over 9 years ago.
1
Performance Enhancement of Video Compression Algorithms Using SIMD
Valia, Shamik; Jamkar, Saket
2
Motivation
- Understand the SSE architecture.
- Understand the video compression algorithm and identify its bottlenecks.
- Improve the performance of the video compression algorithm on the SSE platform.
3
Components of a Video Compression Algorithm
- Motion estimation
- Motion compensation and image subtraction
- Discrete cosine transform (DCT)
- Quantization
- Run-length encoding
- Huffman coding
4
Bottlenecks
- Motion estimation: the process of calculating motion vectors by matching image blocks between a reference image and a new target image.
- DCT: a transform from the spatial (pixel) domain to the spatial-frequency domain; its energy compaction is second only to the Karhunen-Loève transform (KLT).
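Both bottleneck kernels can be written out as formulas. The slides do not name the block-matching criterion; the sketch below assumes the sum of absolute differences (SAD), which is what the `_mm_sad_epu8` intrinsic computes. T is the target image, R the reference image, (x, y) the top-left corner of an N x N block, and W the search window; these symbols are introduced here for illustration. The DCT line is the standard orthonormal 8-point DCT-II.

```latex
\begin{aligned}
\mathrm{SAD}(u,v) &= \sum_{i=0}^{N-1} \sum_{j=0}^{N-1}
  \bigl| T(x+i,\, y+j) - R(x+u+i,\, y+v+j) \bigr| \\
(u^{*}, v^{*}) &= \operatorname*{arg\,min}_{(u,v) \in W} \mathrm{SAD}(u,v) \\
X_k &= \frac{c_k}{2} \sum_{n=0}^{7} x_n \cos\frac{(2n+1)k\pi}{16},
  \qquad c_0 = \tfrac{1}{\sqrt{2}},\; c_k = 1 \ (k > 0)
\end{aligned}
```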
5
SSE2 Specifics
- Intel C/C++ Compiler 8
- Three coding styles: intrinsics, assembly, vector ops
- Use of intrinsics:
  - _mm_sad_epu8 for the __m128i data type
  - _m_psadbw for the __m64 data type
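As a sketch of how `_mm_sad_epu8` fits motion estimation, the function below computes the SAD of two 16 x 16 blocks of 8-bit pixels, one 16-byte row per instruction. The function name and the row-pitch parameter `stride` are hypothetical, not taken from the authors' code. The `__m64` form `_m_psadbw` works the same way on 8 bytes at a time and, as with any MMX code, should be followed by `_mm_empty()` before x87 floating-point code runs.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_sad_epu8, _mm_loadu_si128 */
#include <stdint.h>

/* Sum of absolute differences between two 16x16 blocks of 8-bit pixels.
   `cur` and `ref` point at each block's top-left pixel; `stride` is the
   row pitch in bytes (hypothetical parameter layout). */
uint32_t sad16x16_sse2(const uint8_t *cur, const uint8_t *ref, int stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int row = 0; row < 16; ++row) {
        /* Unaligned loads: search candidates are rarely 16-byte aligned. */
        __m128i a = _mm_loadu_si128((const __m128i *)(cur + row * stride));
        __m128i b = _mm_loadu_si128((const __m128i *)(ref + row * stride));
        /* PSADBW yields two partial sums (bytes 0-7 and 8-15), each in the
           low bits of a 64-bit lane; accumulate them as 64-bit integers. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
    }
    /* Fold the high 64-bit lane into the low lane and extract the total. */
    acc = _mm_add_epi64(acc, _mm_unpackhi_epi64(acc, acc));
    return (uint32_t)_mm_cvtsi128_si32(acc);
}
```

The largest possible result is 256 x 255 = 65280, so the 32-bit return value cannot overflow.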
6
SSE2 Platform for Motion Estimation

Search (block size)       Without SSE   With SSE
Full search, 16 x 16      3 s           1 s
Full search, 8 x 8        23 s          6 s
Three-step, 16 x 16       4 s           1 s
Three-step, 8 x 8         12 s          3 s
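The two search strategies in the timing table differ in how many candidate displacements they score. A minimal scalar sketch of both follows, assuming a SAD criterion, a single-plane 8-bit frame stored row-major with pitch equal to its width, and a block that lies inside the frame; all names are hypothetical. Full search tests every displacement in a ±p window ((2p+1)² candidates), while three-step search probes 9 points at step sizes 4, 2 and 1 around the current best (at most 25 distinct candidates, range ±7).

```c
#include <stdint.h>
#include <stdlib.h>   /* abs */

typedef struct { int dx, dy; uint32_t sad; } MotionVector;

/* Scalar SAD of an n x n block; both pointers share the frame pitch. */
static uint32_t block_sad(const uint8_t *cur, const uint8_t *ref, int stride, int n)
{
    uint32_t s = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            s += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
    return s;
}

/* Score one displacement; UINT32_MAX if the candidate leaves the frame. */
static uint32_t candidate_sad(const uint8_t *cur, const uint8_t *ref, int w, int h,
                              int bx, int by, int n, int dx, int dy)
{
    int rx = bx + dx, ry = by + dy;
    if (rx < 0 || ry < 0 || rx + n > w || ry + n > h) return UINT32_MAX;
    return block_sad(cur + by * w + bx, ref + ry * w + rx, w, n);
}

/* Full search: every displacement in [-p, p] x [-p, p]. */
MotionVector full_search(const uint8_t *cur, const uint8_t *ref, int w, int h,
                         int bx, int by, int n, int p)
{
    MotionVector best = {0, 0, UINT32_MAX};
    for (int dy = -p; dy <= p; ++dy)
        for (int dx = -p; dx <= p; ++dx) {
            uint32_t s = candidate_sad(cur, ref, w, h, bx, by, n, dx, dy);
            if (s < best.sad) best = (MotionVector){dx, dy, s};
        }
    return best;
}

/* Three-step search: probe the centre and its 8 neighbours at step 4, 2, 1,
   re-centring on the best match after each step. */
MotionVector three_step_search(const uint8_t *cur, const uint8_t *ref, int w, int h,
                               int bx, int by, int n)
{
    MotionVector best = {0, 0, candidate_sad(cur, ref, w, h, bx, by, n, 0, 0)};
    for (int step = 4; step >= 1; step /= 2) {
        int cx = best.dx, cy = best.dy;
        for (int dy = -step; dy <= step; dy += step)
            for (int dx = -step; dx <= step; dx += step) {
                uint32_t s = candidate_sad(cur, ref, w, h, bx, by, n, cx + dx, cy + dy);
                if (s < best.sad) best = (MotionVector){cx + dx, cy + dy, s};
            }
    }
    return best;
}
```

Three-step search can settle on a local minimum that full search would skip past; the trade is far fewer SAD evaluations. For 16 x 16 blocks, `block_sad` is the natural place to substitute an SSE2 SAD kernel.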
7
Original Frame from Video
8
Part of Frames 4 and 5
9
Motion-Compensated Frames: 16 x 16 and 8 x 8 Blocks
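A motion-compensated frame is assembled by copying, for each block, the reference block displaced by that block's motion vector; subtracting this prediction from the target gives the residual passed on to the DCT. The sketch below assumes one vector per n x n block in raster order, frame dimensions that are multiples of n, and vectors that keep every block inside the frame. `BlockVector`, `motion_compensate` and `residual` are hypothetical names.

```c
#include <stdint.h>

typedef struct { int dx, dy; } BlockVector;  /* hypothetical per-block vector */

/* Build the prediction: block (bx, by) is copied from the reference frame
   at (bx + dx, by + dy). Vectors are stored one per block, row by row. */
void motion_compensate(const uint8_t *ref, uint8_t *pred, int w, int h, int n,
                       const BlockVector *mv)
{
    int blocks_per_row = w / n;
    for (int by = 0; by < h; by += n)
        for (int bx = 0; bx < w; bx += n) {
            BlockVector v = mv[(by / n) * blocks_per_row + bx / n];
            for (int y = 0; y < n; ++y)
                for (int x = 0; x < n; ++x)
                    pred[(by + y) * w + bx + x] =
                        ref[(by + y + v.dy) * w + bx + x + v.dx];
        }
}

/* Image subtraction: residual = target - prediction.
   int16_t covers the full range -255..255. */
void residual(const uint8_t *target, const uint8_t *pred, int16_t *res, int count)
{
    for (int i = 0; i < count; ++i)
        res[i] = (int16_t)(target[i] - pred[i]);
}
```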
10
Discrete Cosine Transform
- The 2-D DCT is used extensively in the JPEG compression algorithm.
- It is highly computationally intensive.
Focus
- Explore DCT implementations on SSE2.
- Identify a DCT algorithm that scales with the SIMD architecture.
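To put a number on "highly computationally intensive": the 8 x 8 forward DCT as defined for JPEG is given below, followed by a multiplication count for direct evaluation versus row-column (separable) evaluation. The counts are for the plain definitions only; fast factorizations reduce them further.

```latex
\begin{aligned}
F(u,v) &= \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
  \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16},
  \qquad C(0) = \tfrac{1}{\sqrt{2}},\; C(k) = 1 \ (k > 0) \\
\text{direct:}\quad & 64 \text{ coefficients} \times 64 \text{ terms} = 4096 \text{ multiplications per block} \\
\text{row-column:}\quad & 2 \times (8 \text{ vectors} \times 8 \text{ outputs} \times 8 \text{ terms}) = 1024 \text{ multiplications per block}
\end{aligned}
```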
11
DCT Hardware Accelerator
- Distributed arithmetic (DA) was chosen for the DCT implementation because it scales with the SSE platform.
- The 2-D 8x8 DCT can be performed as:
  1. Preprocessing
  2. 1-D DCT (using DA)
  3. Transpose
  4. 1-D DCT (using DA)
  5. Post-processing
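The transpose between the two 1-D passes is where SIMD code has to reorganize data: after the row pass, the column pass needs each column packed into one register. A common SSE2 approach, shown here as a sketch and not as the authors' implementation, transposes an 8 x 8 block of 16-bit values with three rounds of unpack instructions. The 16-bit coefficient width and the function names are assumptions.

```c
#include <emmintrin.h>  /* SSE2 unpack/load/store intrinsics */
#include <stdint.h>

/* In-place transpose of 8 rows of 8 int16 values. Comments use "ij" for
   the element at row i, column j. */
void transpose8x8_epi16(__m128i r[8])
{
    /* Round 1: interleave 16-bit elements of row pairs. */
    __m128i a0 = _mm_unpacklo_epi16(r[0], r[1]);  /* 00 10 01 11 02 12 03 13 */
    __m128i a1 = _mm_unpackhi_epi16(r[0], r[1]);  /* 04 14 05 15 06 16 07 17 */
    __m128i a2 = _mm_unpacklo_epi16(r[2], r[3]);
    __m128i a3 = _mm_unpackhi_epi16(r[2], r[3]);
    __m128i a4 = _mm_unpacklo_epi16(r[4], r[5]);
    __m128i a5 = _mm_unpackhi_epi16(r[4], r[5]);
    __m128i a6 = _mm_unpacklo_epi16(r[6], r[7]);
    __m128i a7 = _mm_unpackhi_epi16(r[6], r[7]);
    /* Round 2: interleave 32-bit pairs. */
    __m128i b0 = _mm_unpacklo_epi32(a0, a2);      /* 00 10 20 30 01 11 21 31 */
    __m128i b1 = _mm_unpackhi_epi32(a0, a2);      /* 02 12 22 32 03 13 23 33 */
    __m128i b2 = _mm_unpacklo_epi32(a1, a3);
    __m128i b3 = _mm_unpackhi_epi32(a1, a3);
    __m128i b4 = _mm_unpacklo_epi32(a4, a6);      /* 40 50 60 70 41 51 61 71 */
    __m128i b5 = _mm_unpackhi_epi32(a4, a6);
    __m128i b6 = _mm_unpacklo_epi32(a5, a7);
    __m128i b7 = _mm_unpackhi_epi32(a5, a7);
    /* Round 3: join 64-bit halves into complete columns. */
    r[0] = _mm_unpacklo_epi64(b0, b4);            /* 00 10 20 30 40 50 60 70 */
    r[1] = _mm_unpackhi_epi64(b0, b4);            /* 01 11 21 31 41 51 61 71 */
    r[2] = _mm_unpacklo_epi64(b1, b5);
    r[3] = _mm_unpackhi_epi64(b1, b5);
    r[4] = _mm_unpacklo_epi64(b2, b6);
    r[5] = _mm_unpackhi_epi64(b2, b6);
    r[6] = _mm_unpacklo_epi64(b3, b7);
    r[7] = _mm_unpackhi_epi64(b3, b7);
}

/* Transpose a row-major 8x8 int16 block held in memory. */
void transpose_block(const int16_t in[64], int16_t out[64])
{
    __m128i r[8];
    for (int i = 0; i < 8; ++i)
        r[i] = _mm_loadu_si128((const __m128i *)(in + 8 * i));
    transpose8x8_epi16(r);
    for (int i = 0; i < 8; ++i)
        _mm_storeu_si128((__m128i *)(out + 8 * i), r[i]);
}
```

The transpose uses 24 unpack instructions and no scalar moves; the two 1-D DCT passes on either side of it can then both operate on rows.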
12
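1-D DCT on SSE2 Using DA
[Figure: DA datapath. Butterfly inputs x0+x7, x1+x6, x2+x5, x3+x4 and x0-x7, x1-x6, x2-x5, x3-x4 feed ROM lookups and a shift-accumulate stage (adder, x0.5 scaling, register); DAP units produce outputs X0, X2, X4, X6 and X1, X3, X5, X7.]
- A total of 8 DAP structures.
- Each DAP completes its operations in 8 cycles.
- Scalable to 16-, 32-, 64-, and 128-bit datapaths.
- Instruction format: DAP subword dest, source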
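The butterfly works because, in the 8-point DCT, x_n and x_(7-n) share a coefficient for even outputs and take opposite-signed coefficients for odd outputs; each output is therefore a 4-term dot product of either the sums or the differences. Distributed arithmetic evaluates such a dot product without multipliers: a 16-entry ROM holds every possible sum of the 4 coefficients, and one lookup is made per input bit. The sketch below assumes 10-bit two's complement inputs and integer (fixed-point) coefficients; the names, word length, and coefficient values are illustrative assumptions, not the authors' design.

```c
#include <stdint.h>

#define DA_INPUTS 4   /* butterfly terms feeding one DAP */
#define DA_BITS   10  /* assumed input word length, two's complement */

/* rom[m] = sum of coef[k] for every bit k that is set in m. */
void da_build_rom(const int32_t coef[DA_INPUTS], int32_t rom[1 << DA_INPUTS])
{
    for (int m = 0; m < (1 << DA_INPUTS); ++m) {
        int32_t s = 0;
        for (int k = 0; k < DA_INPUTS; ++k)
            if (m & (1 << k)) s += coef[k];
        rom[m] = s;
    }
}

/* Bit-serial dot product sum(coef[k] * x[k]): one ROM lookup per bit
   plane, LSB first. Hardware usually shifts the accumulator right by one
   each cycle (the x0.5 stage); here each lookup is shifted left by its
   bit weight instead, which gives the same exact integer result. */
int32_t da_dot(const int32_t rom[1 << DA_INPUTS], const int16_t x[DA_INPUTS])
{
    int32_t acc = 0;
    for (int b = 0; b < DA_BITS; ++b) {
        int m = 0;
        for (int k = 0; k < DA_INPUTS; ++k)
            m |= (((uint16_t)x[k] >> b) & 1) << k;  /* bit b of each input */
        int32_t term = rom[m] * (1 << b);
        /* The sign bit of a two's complement word has negative weight. */
        acc += (b == DA_BITS - 1) ? -term : term;
    }
    return acc;
}
```

For example, X2 uses the sums s0..s3 with coefficients (1/2)cos(π/8), (1/2)cos(3π/8), -(1/2)cos(3π/8), -(1/2)cos(π/8); scaled by 256 and rounded these become {118, 49, -49, -118}. Eight such units, one per output, would compute the full 1-D DCT.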
13
Work Done
Accomplished:
- Motion estimation coding and analysis
- DCT hardware accelerator in Verilog
- ISA extension for the DCT implementation
To be done:
- Synthesis to obtain delay and area estimates
- Assembly code with SSE-DCT enhancements, and its performance analysis
14
Questions