Implementation of DWT using SSE Instruction Set

Mehta, Ami; Muller, Gilles

Lifting based 2D-DWT
Lifting: fixed-point 1D horizontal lifting and 1D vertical lifting.
Fixed-point (9,7)-tap biorthogonal filter.
Lossy compression.
High compression levels.
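As an illustration of fixed-point 1D lifting with the (9,7) filter, here is a minimal scalar sketch. It is not the authors' code: the Q12 coefficient format, the names `lift_step` and `lifting97_forward`, the mirrored boundary handling, and the omission of the final scaling step are all assumptions made for this example. The coefficient values are the standard CDF 9/7 lifting constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIX_SHIFT 12 /* assumed Q12 format; the slides do not state one */

/* Standard CDF 9/7 lifting constants, rounded to Q12. */
static const int32_t ALPHA_Q12 = -6497; /* round(-1.586134342 * 4096) */
static const int32_t BETA_Q12  = -217;  /* round(-0.052980119 * 4096) */
static const int32_t GAMMA_Q12 = 3616;  /* round( 0.882911076 * 4096) */
static const int32_t DELTA_Q12 = 1817;  /* round( 0.443506852 * 4096) */

/* Fixed-point multiply. Assumes an arithmetic right shift for negative
   values, which mainstream compilers provide. */
static int32_t fix_mul(int32_t coef, int32_t v) {
    return (int32_t)(((int64_t)coef * v) >> FIX_SHIFT);
}

/* One lifting step: x[i] += coef * (x[i-1] + x[i+1]) for i = first,
   first + 2, ... Neighbours outside the array are mirrored. */
static void lift_step(int32_t *x, size_t n, size_t first, int32_t coef) {
    for (size_t i = first; i < n; i += 2) {
        int32_t left  = (i == 0)     ? x[1]     : x[i - 1];
        int32_t right = (i + 1 == n) ? x[n - 2] : x[i + 1];
        x[i] += fix_mul(coef, left + right);
    }
}

/* Forward 9/7 lifting on an interleaved signal of even length n.
   Afterwards even indices hold low-pass samples and odd indices hold
   high-pass samples; the final scaling step is omitted here. */
void lifting97_forward(int32_t *x, size_t n) {
    assert(n >= 2 && n % 2 == 0);
    lift_step(x, n, 1, ALPHA_Q12); /* predict 1 */
    lift_step(x, n, 0, BETA_Q12);  /* update 1  */
    lift_step(x, n, 1, GAMMA_Q12); /* predict 2 */
    lift_step(x, n, 0, DELTA_Q12); /* update 2  */
}
```

On a constant input the odd (high-pass) samples come out close to zero, which is a quick sanity check for the rounded coefficients.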

2D DWT matrix layout: Mallat strategy
Uses an auxiliary matrix to store the results of the horizontal filtering. No memory scattering: the horizontal high- and low-frequency components are not interleaved in memory, so each band occupies a contiguous block. This allows better exploitation of SIMD parallelism.
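The non-interleaved layout can be sketched as a copy from a filtered row into the auxiliary matrix. The function name `store_row_mallat` and the convention that even indices hold low-pass samples are assumptions for this example, not details given on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy one horizontally filtered, interleaved row into a row of the
   auxiliary matrix: low-pass samples (even indices) go to the left half,
   high-pass samples (odd indices) go to the right half. */
void store_row_mallat(const int32_t *row, int32_t *aux_row, size_t width) {
    assert(width % 2 == 0);
    size_t half = width / 2;
    for (size_t k = 0; k < half; ++k) {
        aux_row[k]        = row[2 * k];     /* low-frequency band  */
        aux_row[half + k] = row[2 * k + 1]; /* high-frequency band */
    }
}
```

With each band stored contiguously, the later vertical pass can read four neighbouring columns of the same band in a single 128-bit load.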

Optimizations: Cache access
The 2 matrices are aligned on the cache row size (128 bits = 16 B) to allow data fetching in one cycle. Input and output matrices are juxtaposed in memory to prevent associativity conflicts in a direct-mapped cache.
[Figures: cache layout without alignment; cache layout with alignment]
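One way to obtain two 16-byte-aligned, adjacent matrices is a single aligned allocation split in two. This is only a sketch: the `dwt_buffers` type and its functions are hypothetical names, reading "juxtaposed" as "contiguous in one block" is an assumption, and overflow checks on the size computation are left out. It relies on C11 `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for the input and output matrices. */
typedef struct {
    int32_t *input;  /* start of the single allocation, 16-byte aligned */
    int32_t *output; /* placed directly after the input matrix          */
} dwt_buffers;

/* Allocate both matrices in one 16-byte-aligned block. Returns 0 on
   success and -1 on allocation failure. */
int dwt_buffers_alloc(dwt_buffers *b, size_t width, size_t height) {
    size_t bytes = width * height * sizeof(int32_t);
    bytes = (bytes + 15) & ~(size_t)15;  /* keep the second matrix aligned */
    b->input = aligned_alloc(16, 2 * bytes);
    if (b->input == NULL) {
        b->output = NULL;
        return -1;
    }
    b->output = (int32_t *)((char *)b->input + bytes);
    return 0;
}

void dwt_buffers_free(dwt_buffers *b) {
    free(b->input); /* output lives in the same allocation */
    b->input = NULL;
    b->output = NULL;
}
```

If every row width is a multiple of 4 samples, each row of both matrices also starts on a 16-byte boundary.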

Optimizations: SIMD code using SSE2
Computes 4 pixels in parallel using fixed-point arithmetic. Profiling the C code showed that the column transform and cache access were the main bottlenecks. The DWT reuses intermediate values, so the intermediate computations are kept rather than recalculated.
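To make "4 pixels in parallel" concrete, here is a hedged sketch of one vertical (column) lifting step over three rows using SSE2 intrinsics. The Q12 format and the names `mullo32_sse2` and `lift_rows_sse2` are assumptions for this example, and the code assumes every product fits in 32 bits. SSE2 has no packed 32-bit low multiply, so the sketch emulates one with `_mm_mul_epu32`.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

#define FIX_SHIFT 12 /* assumed Q12 coefficient format */

/* Low 32 bits of four 32-bit products, built from the two-lane
   _mm_mul_epu32. The low half of a product is the same for signed and
   unsigned operands. */
static __m128i mullo32_sse2(__m128i a, __m128i b) {
    __m128i even = _mm_mul_epu32(a, b); /* lanes 0 and 2 */
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4)); /* lanes 1 and 3 */
    return _mm_unpacklo_epi32(
        _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
        _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0)));
}

/* Vertical lifting step on one row, four columns per iteration:
   target[j] += (coef * (above[j] + below[j])) >> FIX_SHIFT.
   Unaligned loads keep the sketch safe for any pointer; with rows aligned
   on 16 bytes, _mm_load_si128 and _mm_store_si128 could be used instead. */
void lift_rows_sse2(int32_t *target, const int32_t *above,
                    const int32_t *below, size_t width, int32_t coef) {
    assert(width % 4 == 0);
    const __m128i c = _mm_set1_epi32(coef);
    for (size_t j = 0; j < width; j += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *)(above + j));
        __m128i b = _mm_loadu_si128((const __m128i *)(below + j));
        __m128i t = _mm_loadu_si128((const __m128i *)(target + j));
        __m128i prod = mullo32_sse2(c, _mm_add_epi32(a, b));
        t = _mm_add_epi32(t, _mm_srai_epi32(prod, FIX_SHIFT));
        _mm_storeu_si128((__m128i *)(target + j), t);
    }
}
```

Because the column transform combines whole rows element by element, it maps directly onto packed operations; this is the case where the non-interleaved band layout pays off.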

Results
Image size: 1024 x 1024.
Profiling was done using VTune Analyzer©.
Cycles per uop improve from 3.38 to 2.28, an improvement of 32.5% ((3.38 - 2.28) / 3.38).


Thank you
