Published by Δαμιανός Θεοτόκης. Modified over 6 years ago.
1
Hardware Acceleration of the Lifting Based DWT
Vidhu Niti Singh Pritam Kulkarni
2
Background: The Lifting Scheme
λ: Low-frequency components (scaling coefficients)
γ: High-frequency components (wavelet coefficients)
3
Advantages
- Does not rely on the Fourier transform, hence can be made even more efficient.
- Integer-to-integer transformation: maps integers to integers, eliminating floating-point operations. The process is reversible and lossless.
- Good for hardware implementation: a large amount of parallelism is available.
- All computations are done in place, i.e. no extra registers are needed to store the input. Registers are still needed to store the filter/lifting coefficients.
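The integer-to-integer, in-place, and reversible properties listed above can be illustrated with a small sketch. The slides do not say which filters were used, so this example assumes the well-known integer 5/3 lifting steps with mirrored borders; the function names `forward_53` and `inverse_53` are hypothetical.

```python
def forward_53(x):
    """One level of an integer 5/3 lifting transform (assumed filter choice).

    Even positions end up holding lambdas, odd positions hold gammas.
    The working list is overwritten in place; no second signal buffer is used.
    """
    s = list(x)
    n = len(s)
    assert n >= 2 and n % 2 == 0, "sketch assumes an even-length signal"
    # Predict: gamma = odd - floor((left even + right even) / 2)
    for i in range(1, n, 2):
        left = s[i - 1]
        right = s[i + 1] if i + 1 < n else s[i - 1]  # mirror at the right border
        s[i] -= (left + right) >> 1
    # Update: lambda = even + floor((left gamma + right gamma + 2) / 4)
    for i in range(0, n, 2):
        left = s[i - 1] if i > 0 else s[i + 1]       # mirror at the left border
        right = s[i + 1]
        s[i] += (left + right + 2) >> 2
    return s


def inverse_53(s):
    """Undo forward_53 by running the same steps in reverse order with flipped signs."""
    x = list(s)
    n = len(x)
    for i in range(0, n, 2):                         # undo update
        left = x[i - 1] if i > 0 else x[i + 1]
        right = x[i + 1]
        x[i] -= (left + right + 2) >> 2
    for i in range(1, n, 2):                         # undo predict
        left = x[i - 1]
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += (left + right) >> 1
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
coeffs = forward_53(signal)
restored = inverse_53(coeffs)
```

Because each step only adds or subtracts a value computed from samples that the step itself does not modify, the inverse recovers the input exactly, with integer arithmetic throughout.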
4
Motivation
- Parallel operations in predicting one
- Parallel operations in updating from one
- Reusing data in predicting one
- Reusing data in updating from one
- Parallel execution of 1-D predict and update phases
- Reusing data in 1-D predict and update phases
5
Parallelizing Predict Stage(1)
All lambdas and filter coefficients are known beforehand, so the entire operation can be done in parallel. Lambdas used for consecutive stages are stored in a buffer to reduce the number of memory accesses.
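The buffering idea can be sketched in software. This is only an illustration under assumptions: a 4-tap predictor with weights (-1, 9, 9, -1)/16, clamped borders, and a read counter that counts lambda fetches only. None of these details come from the slides, and the counts here are not meant to reproduce the figures reported in the results.

```python
COEFFS = (-1, 9, 9, -1)  # assumed 4-tap predictor weights, scaled by 16


def clamp(i, n):
    """Clamp an index into [0, n - 1] (assumed border handling)."""
    return max(0, min(n - 1, i))


def predict_one(window, odd_sample):
    # The four products are independent of each other, so hardware could
    # compute them in the same clock cycle; here they are simply listed.
    products = [c * t for c, t in zip(COEFFS, window)]
    return odd_sample - ((sum(products) + 8) >> 4)


def predict_naive(lam, odd):
    """Fetch all four lambdas from memory for every gamma."""
    n, reads, gam = len(lam), 0, []
    for i in range(len(odd)):
        window = [lam[clamp(i + k - 1, n)] for k in range(4)]
        reads += 4
        gam.append(predict_one(window, odd[i]))
    return gam, reads


def predict_buffered(lam, odd):
    """Keep the last four lambdas in a buffer; fetch one new lambda per gamma."""
    n = len(lam)
    window = [lam[clamp(k - 1, n)] for k in range(4)]  # initial fill
    reads, gam = 4, []
    for i in range(len(odd)):
        if i > 0:
            window = window[1:] + [lam[clamp(i + 2, n)]]  # shift in one value
            reads += 1
        gam.append(predict_one(window, odd[i]))
    return gam, reads


lam = [10, 12, 15, 20, 18, 14, 11, 9]
odd = [11, 13, 18, 19, 16, 12, 10, 9]
gam_naive, reads_naive = predict_naive(lam, odd)
gam_buf, reads_buf = predict_buffered(lam, odd)
```

Both versions produce the same gammas; the buffered one performs one lambda fetch per output after the initial fill, since consecutive predictions share three of their four inputs.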
6
Parallelizing Predict Stage(2)
The filter coefficients can be stored in RAM or registers in such a manner that they can all be accessed in parallel.
7
Predict Module
8
Accelerating the update stage
In the update stage we know all the lambdas and lifting coefficients. Hence there is no data dependency within one cycle, and we can do the entire cycle in parallel.
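One way to see the absence of data dependencies is that the updated values can be computed in any order with identical results. The sketch below assumes a simple 2-tap update, lambda + floor((gamma[i-1] + gamma[i] + 2) / 4), with the left border repeated; the slides do not give the actual lifting coefficients, and `update_one` and `update_stage` are hypothetical names.

```python
def update_one(lam, gam, i):
    """Updated lambda at position i; reads only stage inputs, never outputs."""
    left = gam[i - 1] if i > 0 else gam[0]  # repeat the first gamma at the border
    return lam[i] + ((left + gam[i] + 2) >> 2)


def update_stage(lam, gam, order=None):
    """Compute every updated lambda, visiting positions in the given order."""
    indices = range(len(lam)) if order is None else order
    out = [None] * len(lam)
    for i in indices:
        out[i] = update_one(lam, gam, i)
    return out


lam = [10, 20, 30, 40]
gam = [1, -2, 3, 4]
forward_order = update_stage(lam, gam)
reverse_order = update_stage(lam, gam, order=[3, 2, 1, 0])
shuffled_order = update_stage(lam, gam, order=[2, 0, 3, 1])
```

Since no output feeds another output within the stage, the loop body could be replicated once per position and evaluated simultaneously in hardware.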
9
Update Module
10
Pipelining the predict and update stages
A FIFO buffer is needed between the two stages to absorb the difference between the rate at which the predict stage produces gammas and the rate at which the update stage consumes them.
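A toy cycle-by-cycle model may help show why the FIFO is useful. Everything about the timing here is an assumption made for illustration: the predict stage is modelled as emitting two gammas on every even cycle and none on odd cycles, the update stage consumes at most one gamma per cycle, and the update formula is the same assumed 2-tap step, lambda + floor((previous gamma + gamma + 2) / 4). The name `run_pipeline` is hypothetical.

```python
from collections import deque


def run_pipeline(lam, gam_stream, burst=2):
    """Simulate predict -> FIFO -> update with mismatched rates (assumed timing)."""
    assert len(lam) == len(gam_stream), "sketch assumes one gamma per lambda"
    fifo = deque()
    out = []
    produced = 0
    prev_gamma = None
    max_depth = 0
    cycle = 0
    while len(out) < len(lam):
        # Predict side: bursty producer (assumed pattern).
        if cycle % 2 == 0:
            for _ in range(burst):
                if produced < len(gam_stream):
                    fifo.append(gam_stream[produced])
                    produced += 1
        max_depth = max(max_depth, len(fifo))
        # Update side: consumes at most one gamma per cycle.
        if fifo:
            g = fifo.popleft()
            left = g if prev_gamma is None else prev_gamma
            i = len(out)
            out.append(lam[i] + ((left + g + 2) >> 2))
            prev_gamma = g
        cycle += 1
    return out, max_depth, cycle


lam = [10, 20, 30, 40]
gam = [1, -2, 3, 4]
updated, depth, cycles = run_pipeline(lam, gam)
```

In this model the FIFO never holds more than two gammas, and the update stage stays busy on every cycle even though the producer is idle half the time.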
11
Work Done
- Wrote the code (MATLAB and C) for the predict and update modules, for both the forward and the inverse wavelet transform based on the lifting scheme.
- Designed the hardware shown above to help in the parallel implementation of the DWT.
- Completed coding of the modules in Verilog HDL.
- Synthesized the modules using Synopsys Design Compiler.
12
Results
Predict stage in a sequential machine needs:
- 16 memory accesses
- 4 multiplications
- 4 additions/subtractions
The new scheme needs:
- 1 memory access
- 1 clock cycle for all the multiplications
- 1 clock cycle for all the additions/subtractions
Exactly the same results hold for the update stage. Apart from the parallelism within the stages, this scheme allows pipelining of the two stages.