Hardware Acceleration of the Lifting Based DWT

Vidhu Niti Singh Pritam Kulkarni

Background: The Lifting Scheme
λ: low-frequency components (scaling coefficients)
γ: high-frequency components (wavelet coefficients)
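As a sketch in the usual lifting notation, one level of the forward transform splits the input into even and odd samples, predicts the odd samples from the even ones, and then updates the even samples from the resulting differences. The symbols x, P and U (input signal, predict operator, update operator) are assumptions for illustration; the slides name only the lambdas (low-frequency outputs) and gammas (high-frequency outputs).

```latex
\begin{aligned}
\text{split:}   \quad & e[k] = x[2k], \qquad o[k] = x[2k+1] \\
\text{predict:} \quad & \gamma[k] = o[k] - P(e)[k] \\
\text{update:}  \quad & \lambda[k] = e[k] + U(\gamma)[k]
\end{aligned}
\qquad
\begin{aligned}
\text{undo update:}  \quad & e[k] = \lambda[k] - U(\gamma)[k] \\
\text{undo predict:} \quad & o[k] = \gamma[k] + P(e)[k] \\
\text{merge:}        \quad & x[2k] = e[k], \qquad x[2k+1] = o[k]
\end{aligned}
```

The inverse runs the same steps in reverse order with the signs flipped, which is why the scheme stays invertible whatever P and U are.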

Advantages
- Does not rely on the Fourier transform, hence can be made even more efficient.
- Integer-to-integer transformation: maps integers to integers, thus eliminating the use of floating-point operations. This process is reversible and lossless.
- Good for hardware implementation.
- Large amount of parallelism available.
- All computations are done in place, i.e. no extra registers are needed to store the input. Registers are needed to store the filter/lifting coefficients, though.
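The integer-to-integer, reversible and in-place properties can be illustrated with a minimal sketch. It assumes the simplest possible lifting pair (an integer Haar step); the slides do not say which wavelet the authors used, and the function names here are hypothetical.

```python
def forward_lifting_inplace(x):
    """One level of integer Haar lifting, overwriting x in place.

    Afterwards even slots hold lambdas and odd slots hold gammas.
    Assumed filter pair for illustration only.
    """
    if len(x) % 2:
        raise ValueError("expected an even number of samples")
    for i in range(0, len(x), 2):
        x[i + 1] -= x[i]        # predict: gamma = odd - even
        x[i] += x[i + 1] >> 1   # update: lambda = even + floor(gamma / 2)
    return x


def inverse_lifting_inplace(x):
    """Undo forward_lifting_inplace: same steps, reverse order, flipped signs."""
    for i in range(0, len(x), 2):
        x[i] -= x[i + 1] >> 1   # undo update
        x[i + 1] += x[i]        # undo predict
    return x


samples = [5, 7, 2, 9, 4, 4, 8, 1]
coeffs = forward_lifting_inplace(list(samples))
restored = inverse_lifting_inplace(list(coeffs))
```

Only integer additions, subtractions and shifts are used, no storage beyond the input array is needed, and `restored` equals `samples` exactly.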

Motivation
- Parallel operations in predicting one
- Parallel operations in updating from one
- Reusing data in predicting one
- Reusing data in updating from one
- Parallel execution of 1-D predict and update phases
- Reusing data in 1-D predict and update phases

Parallelizing Predict Stage(1)
All lambdas and filter coefficients are known beforehand, so the entire operation can be done in parallel. Lambdas used for consecutive stages are stored in a buffer to reduce the number of memory accesses.
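A software model may make the buffering idea concrete. The sketch below assumes a 4-tap predict filter with coefficients (-1, 9, 9, -1)/16 and clamped borders; neither is stated in the slides, and `predict_stage` is a hypothetical name. The list of products stands in for a bank of multipliers that would all fire in the same clock cycle in hardware.

```python
from collections import deque

# Assumed 4-tap predict filter (cubic interpolation scaled by 16).
# The slides do not give the actual coefficients.
PREDICT_TAPS = (-1, 9, 9, -1)
SHIFT = 4  # divide by 16


def predict_stage(lambdas, odds):
    """Return (gammas, fetches) for one row.

    gamma[k] = odd[k] - round(sum(tap * lambda) / 16), using lambdas k-1..k+2.
    A 4-entry sliding window keeps the lambdas shared by consecutive outputs,
    so each output needs only one new fetch.
    """
    n = len(lambdas)
    fetches = 0

    def fetch(i):
        nonlocal fetches
        fetches += 1
        return lambdas[min(max(i, 0), n - 1)]  # clamp at the borders (assumption)

    # Prime the window with lambdas k-1, k, k+1 for k = 0.
    window = deque((fetch(i) for i in (-1, 0, 1)), maxlen=4)
    gammas = []
    for k in range(n):
        window.append(fetch(k + 2))  # the only new operand for this output
        # The four products do not depend on each other.
        products = [t * v for t, v in zip(PREDICT_TAPS, window)]
        prediction = (sum(products) + (1 << (SHIFT - 1))) >> SHIFT
        gammas.append(odds[k] - prediction)
    return gammas, fetches


# A linear ramp 0, 2, 4, ..., 22 split into even and odd samples.
ramp_lambdas = [0, 4, 8, 12, 16, 20]
ramp_odds = [2, 6, 10, 14, 18, 22]
ramp_gammas, ramp_fetches = predict_stage(ramp_lambdas, ramp_odds)
```

On the ramp, every gamma is zero except the last one, where the clamped border distorts the prediction. The fetch count is the number of outputs plus three priming reads, rather than four reads per output.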

Parallelizing Predict Stage(2)
The filter coefficients can be stored in RAM or registers in such a manner that they can all be accessed in parallel.

Predict Module

Accelerating the update stage
In the update stage we know all the lambdas and lifting coefficients. Hence there is no data dependency within one cycle, and we can do the entire cycle in parallel.
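The absence of data dependencies can be checked in a small model: if no output reads another output, the outputs can be evaluated in any order (and hence all at once) with identical results. The 2-tap update filter (1, 1)/4, the border handling and the function names below are assumptions for illustration, not the authors' design.

```python
UPDATE_TAPS = (1, 1)  # assumed 2-tap update filter, scaled by 4
UPDATE_SHIFT = 2      # divide by 4


def update_one(lambdas, gammas, k):
    """New lambda[k] from the old lambda[k] and two neighbouring gammas."""
    g_prev = gammas[max(k - 1, 0)]  # clamp at the left border (assumption)
    acc = UPDATE_TAPS[0] * g_prev + UPDATE_TAPS[1] * gammas[k]
    return lambdas[k] + ((acc + (1 << (UPDATE_SHIFT - 1))) >> UPDATE_SHIFT)


def update_stage(lambdas, gammas, order=None):
    """Evaluate every output, optionally in a caller-chosen order."""
    out = [None] * len(lambdas)
    for k in (order if order is not None else range(len(lambdas))):
        out[k] = update_one(lambdas, gammas, k)  # reads inputs only, never out[]
    return out


old_lambdas = [10, 20, 30, 40]
stage_gammas = [4, -4, 8, 0]
in_order = update_stage(old_lambdas, stage_gammas)
reversed_order = update_stage(old_lambdas, stage_gammas, order=[3, 2, 1, 0])
```

Because `update_one` reads only the stage inputs, `in_order` and `reversed_order` are identical, which is the property a parallel hardware implementation relies on.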

Update Module

Pipelining the predict and update stages
A FIFO buffer is needed between the two stages to accommodate the varying rate at which gammas are produced.
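The role of the FIFO can be sketched with a cycle-by-cycle simulation. The production schedule and the one-gamma-per-cycle consumption rate are made-up assumptions, and `simulate_pipeline` is a hypothetical helper. The point is only that a queue lets the update stage run steadily while the predict stage emits gammas in bursts.

```python
from collections import deque


def simulate_pipeline(gammas_per_cycle, consume_per_cycle=1):
    """Simulate predict -> FIFO -> update.

    gammas_per_cycle[i] is how many gammas the predict stage emits in
    cycle i (an assumed schedule). Returns the order in which gammas were
    consumed, the deepest FIFO occupancy seen, and the total cycle count.
    """
    fifo = deque()
    consumed = []
    max_depth = 0
    next_id = 0
    cycle = 0
    while cycle < len(gammas_per_cycle) or fifo:
        produced = gammas_per_cycle[cycle] if cycle < len(gammas_per_cycle) else 0
        for _ in range(produced):       # predict stage pushes its outputs
            fifo.append(next_id)
            next_id += 1
        max_depth = max(max_depth, len(fifo))
        for _ in range(min(consume_per_cycle, len(fifo))):
            consumed.append(fifo.popleft())  # update stage pops in order
        cycle += 1
    return consumed, max_depth, cycle


order, depth, cycles = simulate_pipeline([2, 0, 2, 0, 1])
```

With this bursty schedule the update stage still receives one gamma in every cycle, in production order, and the reported depth shows how large the buffer would need to be for that schedule.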

Work Done
- Wrote the code (MATLAB and C) for the predict and update modules for both the forward and the inverse wavelet transform based on the lifting scheme.
- Designed the hardware shown above to help in the parallel implementation of the DWT.
- Completed coding of the modules in Verilog HDL.
- Synthesized the modules using Synopsys Design Compiler.

Results
- Predict stage on a sequential machine needs: 16 memory accesses, 4 multiplications, 4 additions/subtractions.
- The new scheme needs: 1 memory access, 1 clock cycle for all the multiplications, 1 clock cycle for all the additions/subtractions.
- Exactly the same results hold for the update stage.
- Apart from parallelism within the stages, this scheme allows pipelining of the two stages.
