Published by Τρύφων Μαλαξός. Modified over 6 years ago.
1
Implementation of DWT using SSE Instruction Set
Mehta, Ami; Muller, Gilles
2
Lifting-based 2D-DWT
Lifting: fixed point, 1D horizontal lifting, 1D vertical lifting.
Fixed-point (9,7)-tap biorthogonal filter: lossy compression, high compression levels.
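The 1D lifting pass named on this slide can be sketched in scalar C as follows. This is a minimal sketch, not the authors' code: the Q13 fixed-point format, the function names, the in-place interleaved layout and the symmetric boundary extension are all assumptions made for illustration, and the final scaling step of the (9,7) filter is left out. The four coefficients are the commonly published (9,7) lifting constants rounded to Q13.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point format: Q13 (value * 8192). */
#define Q 13
#define ALPHA (-12994) /* -1.586134342 in Q13 */
#define BETA  (-434)   /* -0.052980119 in Q13 */
#define GAMMA (7233)   /*  0.882911076 in Q13 */
#define DELTA (3633)   /*  0.443506852 in Q13 */

/* Rounded fixed-point multiply. Relies on arithmetic right shift of
 * negative values, which mainstream compilers provide. */
static int32_t fxmul(int32_t v, int32_t coef)
{
    return (int32_t)(((int64_t)v * coef + (1 << (Q - 1))) >> Q);
}

/* One lifting step: every sample of one parity (start = 0 or 1) is
 * updated from its two neighbours of the other parity. The signal is
 * mirrored at both ends (symmetric extension). */
static void lift_step(int32_t *x, int n, int start, int32_t coef, int sign)
{
    for (int i = start; i < n; i += 2) {
        int32_t left  = x[i > 0 ? i - 1 : i + 1];
        int32_t right = x[i + 1 < n ? i + 1 : i - 1];
        x[i] += sign * fxmul(left + right, coef);
    }
}

/* Forward pass, in place: even indices end up holding low-pass
 * samples, odd indices high-pass samples. Scaling is omitted. */
static void dwt97_forward_1d(int32_t *x, int n)
{
    assert(n >= 2 && n % 2 == 0);
    lift_step(x, n, 1, ALPHA, +1);
    lift_step(x, n, 0, BETA,  +1);
    lift_step(x, n, 1, GAMMA, +1);
    lift_step(x, n, 0, DELTA, +1);
}

/* Inverse pass: the same steps in reverse order with opposite sign. */
static void dwt97_inverse_1d(int32_t *x, int n)
{
    assert(n >= 2 && n % 2 == 0);
    lift_step(x, n, 0, DELTA, -1);
    lift_step(x, n, 1, GAMMA, -1);
    lift_step(x, n, 0, BETA,  -1);
    lift_step(x, n, 1, ALPHA, -1);
}
```

Because each step only adds a function of the other-parity samples, the inverse undoes the forward pass exactly, rounding included. The 2D transform applies such a pass to every row (horizontal lifting) and then to every column (vertical lifting).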
3
2D DWT matrix layout: Mallat strategy
Uses an auxiliary matrix to store the results of the horizontal filtering. No memory scattering: the horizontal high- and low-frequency components are not interleaved in memory, which allows better exploitation of SIMD parallelism.
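The layout described here can be illustrated with a small C sketch. It assumes, hypothetically, that each row has already been filtered in interleaved order (low, high, low, high, ...) and only shows the copy into the auxiliary matrix, with low-pass samples in the left half and high-pass samples in the right half of each row. The function names are invented for this example, and the slides do not say whether the authors fuse this step with the filtering itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy one filtered row from interleaved order (s0 d0 s1 d1 ...)
 * into Mallat order (s0 s1 ... | d0 d1 ...). */
static void split_row(const int32_t *row, int32_t *aux_row, int width)
{
    assert(width > 0 && width % 2 == 0);
    int half = width / 2;
    for (int i = 0; i < half; ++i) {
        aux_row[i]        = row[2 * i];     /* low-pass sample  */
        aux_row[half + i] = row[2 * i + 1]; /* high-pass sample */
    }
}

/* Apply the split to every row of a width x height matrix stored
 * row-major; aux must not overlap src. */
static void split_rows(const int32_t *src, int32_t *aux,
                       int width, int height)
{
    for (int y = 0; y < height; ++y)
        split_row(src + (size_t)y * width, aux + (size_t)y * width, width);
}
```

With this layout, the vertical pass reads whole rows of same-band samples, so four adjacent columns can be loaded into one SIMD register without any shuffling.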
4
Optimizations: cache
The 2 matrices are aligned on the cache row size (128 bits = 16 B) to allow data fetching in one cycle.
Input and output matrices are juxtaposed in memory to prevent conflicts in a direct-mapped cache (associativity conflicts).
[Figure: cache layout without alignment vs. cache layout with alignment]
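One way to obtain such a layout is sketched below, assuming a C11 toolchain that provides `aligned_alloc`. The struct and function names are hypothetical. Both matrices come from a single 16-byte-aligned allocation, with the output placed directly after the input, and rows are padded to a multiple of four 32-bit samples so that every row also starts on a 16-byte boundary. Whether this placement actually avoids conflict misses depends on the cache geometry of the target processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for the two juxtaposed matrices. */
typedef struct {
    int32_t *base;   /* start of the single allocation        */
    int32_t *in;     /* input matrix                          */
    int32_t *out;    /* output matrix, directly after `in`    */
    size_t   stride; /* row length in samples, multiple of 4  */
} dwt_buffers;

/* Allocate both matrices in one 16-byte-aligned block.
 * Returns 0 on success, -1 on allocation failure. */
static int dwt_buffers_alloc(dwt_buffers *b, size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t stride = (width + 3) & ~(size_t)3; /* pad rows to 16 bytes */
    size_t elems  = stride * height;
    /* Size is a multiple of 16 because elems is a multiple of 4. */
    int32_t *base = aligned_alloc(16, 2 * elems * sizeof(int32_t));
    if (base == NULL)
        return -1;
    b->base   = base;
    b->in     = base;
    b->out    = base + elems;
    b->stride = stride;
    return 0;
}

static void dwt_buffers_free(dwt_buffers *b)
{
    free(b->base);
    b->base = b->in = b->out = NULL;
}
```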
5
Optimizations … SIMD code
Using SSE2: computes 4 pixels in parallel using fixed-point arithmetic.
Profiling the C code showed that the column transform and cache accesses were the main bottlenecks.
In the DWT, intermediate values are reused: instead of recalculating them, we keep the intermediate computations.
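A single vertical lifting step over four columns at a time might look like the sketch below, written with SSE2 intrinsics from `<emmintrin.h>`. It is an illustration under stated assumptions, not the authors' kernel: samples are 32-bit integers whose neighbour sums fit in 16 bits, the coefficient is in Q13, and the function name is hypothetical. SSE2 has no packed 32-bit multiply, so the sketch uses `_mm_madd_epi16` with the upper 16 bits of each coefficient lane set to zero, which yields the 32-bit product of the lower 16 bits in each lane.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* row[x] += round(((above[x] + below[x]) * coef_q13) / 2^13)
 * for 4 columns per iteration.
 * Assumes |above[x] + below[x]| < 32768 and |coef_q13| < 32768. */
static void lift_rows_sse2(int32_t *row, const int32_t *above,
                           const int32_t *below, int32_t coef_q13,
                           int width)
{
    assert(width > 0 && width % 4 == 0);
    /* Coefficient in the low 16 bits of each lane, high 16 bits zero. */
    const __m128i c     = _mm_set1_epi32(coef_q13 & 0xFFFF);
    const __m128i round = _mm_set1_epi32(1 << 12);

    for (int x = 0; x < width; x += 4) {
        /* Unaligned loads keep the sketch safe for any buffer; with
         * 16-byte-aligned rows, _mm_load_si128 could be used instead. */
        __m128i a = _mm_loadu_si128((const __m128i *)(above + x));
        __m128i b = _mm_loadu_si128((const __m128i *)(below + x));
        __m128i d = _mm_loadu_si128((const __m128i *)(row + x));

        __m128i s = _mm_add_epi32(a, b);   /* neighbour sum          */
        __m128i p = _mm_madd_epi16(s, c);  /* 16x16 -> 32-bit product */
        p = _mm_srai_epi32(_mm_add_epi32(p, round), 13);
        d = _mm_add_epi32(d, p);

        _mm_storeu_si128((__m128i *)(row + x), d);
    }
}
```

For the column transform, `above`, `row` and `below` would be three consecutive rows of the matrix, so each iteration updates four adjacent columns at once.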
6
Results
Image size: 1024 x 1024.
Profiling done using VTune Analyzer.
Cycles per uop improve from 3.38 to 2.28, an improvement of 32.5%.
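The quoted percentage is consistent with a relative reduction measured against the original value, assuming that is how it was computed:

```latex
\frac{3.38 - 2.28}{3.38} = \frac{1.10}{3.38} \approx 0.325 = 32.5\%
```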
7
Results …
8
Thank you