Implementation in Hardware of Video Processing Algorithm
Performed by: Yony Dekell & Tsion Bublil
Supervisor: Mike Sumszyk
Semester project, Spring 2008
High Speed Digital System Lab
Project Goals
Real-time video signal filtering based on a nonlinear diffusion algorithm.
Studying the nonlinear diffusion algorithm.
Studying the Synplify DSP work environment.
Implementing a real-time video processing algorithm on an FPGA.
NONLINEAR DIFFUSION
It aims at filtering an image.
The filtered image is the solution of a nonlinear differential equation, called the nonlinear diffusion equation: [equation shown on slide]
Since no analytic solution is known, we need to solve it iteratively.
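The iterative solution described above can be sketched in a few lines of Python. The slide's equation is not reproduced in this text, so the sketch assumes a Perona-Malik style diffusivity. The names `diffusivity` and `diffuse` and the values of `beta` and `k` are illustrative assumptions, not the project's implementation. Here `beta` acts as a step size, which may differ from the ß defined later in the slides.

```python
# Sketch only: explicit iterations of a nonlinear diffusion filter.
# The diffusivity and the parameters beta and k are assumptions,
# not the equation used in the project.

def diffusivity(diff, k):
    """Weight close to 1 in flat areas and small across strong edges."""
    return 1.0 / (1.0 + (diff / k) ** 2)

def diffuse(image, iterations, beta=0.1, k=10.0):
    """Return a filtered copy of `image` (a list of rows of numbers)."""
    rows, cols = len(image), len(image[0])
    u = [[float(p) for p in row] for row in image]
    for _ in range(iterations):
        nxt = [row[:] for row in u]
        for r in range(rows):
            for c in range(cols):
                flow = 0.0
                # Sum the weighted differences to the 4 nearest neighbours.
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        d = u[rr][cc] - u[r][c]
                        flow += diffusivity(d, k) * d
                nxt[r][c] = u[r][c] + beta * flow
        u = nxt
    return u
```

Small differences (noise) are smoothed quickly because their weight is close to 1, while large differences (edges) receive a small weight and are largely preserved. Every pixel needs several multiplications and divisions per iteration, which illustrates why the computational load grows quickly with the number of iterations.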
ALGORITHM FEATURES
It smooths the image without damaging the edges.
It is an iterative algorithm: the more iterations, the stronger the effect.
Highly computationally demanding.
Real-time implementation is possible only in hardware.
[Figure: original image, and results after 15, 40 and 80 iterations]
CONTENTS
Part 1: Simulation in the Simulink environment
Part 2: Working directly with the FPGA
PART 1
A short reminder of what was done up to the midterm presentation
FROM Simulink TO SynplifyDSP
We had to change the original Matlab/Simulink design because:
1) We chose not to use any buffer between the DVI connection and the processing of the input.
2) In the Simulink design we use matrices to represent images, but SynplifyDSP can only use vectors.
REAL TIME DERIVATION
[Figure: a false result and a true result, comparing matrix derivation with vector derivation]
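One plausible reading of the matrix versus vector comparison is that pixels arrive as a single raster-scan vector, so the derivatives must be taken with offsets of 1 (horizontal) and one full row (vertical), and a naive difference across a row boundary gives a false value. The sketch below illustrates that reading only. The function names and the forward-difference scheme are assumptions, and a hardware version would use delay lines rather than indexing ahead.

```python
# Sketch only: image derivatives on a 2-D matrix versus on the
# raster-scan pixel vector. Forward differences are an assumption.

def matrix_derivatives(image):
    """Forward differences on a 2-D image; zero at the last column/row."""
    rows, cols = len(image), len(image[0])
    dx = [[image[r][c + 1] - image[r][c] if c + 1 < cols else 0
           for c in range(cols)] for r in range(rows)]
    dy = [[image[r + 1][c] - image[r][c] if r + 1 < rows else 0
           for c in range(cols)] for r in range(rows)]
    return dx, dy

def stream_derivatives(pixels, width):
    """The same differences computed on the flattened pixel vector."""
    n = len(pixels)
    dx, dy = [], []
    for i in range(n):
        last_col = (i % width) == width - 1
        # Without this check, the difference would mix two different rows.
        dx.append(0 if last_col else pixels[i + 1] - pixels[i])
        # The pixel one row below is `width` positions further in the stream.
        dy.append(pixels[i + width] - pixels[i] if i + width < n else 0)
    return dx, dy
```

With the row-boundary check in place, the vector version reproduces the matrix result element by element.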
SynplifyDSP DESIGN 9
FIXED POINT PRECISION
In Matlab and Simulink we work at full precision.
On the FPGA, however, fixed point precision has to be used.
We checked, for each block, how many bits it uses.
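The effect of choosing a bit width for a block can be illustrated with a small quantization helper. This is a minimal sketch: the function name `to_fixed`, the signed format and the saturation behaviour are assumptions, not the settings used in the SynplifyDSP design.

```python
# Sketch only: quantize a real value to signed fixed point.
# Word length, fraction length and saturation are assumptions.

def to_fixed(value, word_bits, frac_bits):
    """Return (stored integer, value it represents) for a signed format."""
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    raw = round(value * scale)
    raw = max(lowest, min(highest, raw))  # saturate instead of wrapping
    return raw, raw / scale
```

For example, `to_fixed(0.1, 8, 6)` stores the integer 6, which represents 0.09375, so the quantization error is visible directly. Repeating this per block shows how many fraction bits are needed to keep the overall error acceptable.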
12 SynplifyDSP SIMULATION
13 SynplifyDSP SIMULATION
SIMULATION RESULTS
[Figure: original image and SynplifyDSP result after 30 iterations]
Matlab AND SynplifyDSP COMPARISON
We measured the error between the Matlab code output and the SynplifyDSP output.
For 30 iterations: relative root MSE = % per pixel
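The error measurement can be sketched as follows. The slides do not give the exact formula, so this assumes the relative root MSE is the RMS of the pixel differences divided by the RMS of the reference image, expressed in percent. The function name is illustrative.

```python
# Sketch only: relative root mean square error between two images.
# The normalisation by the reference RMS is an assumption.
import math

def relative_rmse_percent(reference, result):
    """RMS error divided by the RMS of the reference image, in percent."""
    ref = [p for row in reference for p in row]
    out = [p for row in result for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    ref_power = sum(a * a for a in ref) / len(ref)
    return 100.0 * math.sqrt(mse / ref_power)
```

Here `reference` would be the full-precision Matlab output and `result` the fixed-point SynplifyDSP output for the same input frame.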
ß PARAMETER
Let’s simplify the diffusion equation: [equation shown on slide]
Now let’s show how one can get an iterative solution: [equation shown on slide]
We define ß in the following way: [equation shown on slide]
In our implementation this parameter can be changed online.
17 Original image
18 NLD image ß=0.1 iterations=10
WORK FLOW – Matlab & Simulink STAGE: Algorithm design. Simulation and error measurement.
WORK FLOW – SynplifyDSP STAGE: VHDL code generation.
WORK FLOW – SynplifyPRO STAGE: Synthesizes the VHDL code into a logic schematic. Creates a VQM file.
WORK FLOW – ProcWizard STAGE: Builds the VHDL code of the board interface.
WORK FLOW – Quartus STAGE: Configuration of the interface VHDL code. Links the VHDL interface to the VQM file. Place and route. Creates an RBF file.
WORK FLOW – ProcWizard STAGE: Loads the FPGA with the RBF file.
PART 2
Working with the FPGA
PROJECT PROGRESS
Memory lack
Frequency problem (pipeline)
Simple designs
Checking the card at Gidel
Gidel’s advice
Why does it work
Receiver configuration
Ideal control signal waveforms
Control signals on scope
Maximum iteration minimum frequency
Blanking check
MEMORY LACK
We used a ROM block to implement the “pow” function.
There is not enough ROM on the FPGA to load more than one iteration.
High MSE.
MEMORY LACK
We replaced the ROM with a “DIV” block and 3 “CONVERTER” blocks.
This solution gave us an MSE of 0.2% for one iteration.
FREQUENCY PROBLEM (PIPELINE)
The highest frequency we reached was 18 MHz.
Pipelining was done at the hardware level, not at the logic level.
FREQUENCY PROBLEM (PIPELINE)
To implement a correct pipeline we used the SynplifyDSP program: [figure shown on slide]
This solution gave us a frequency of 107 MHz.
SIMPLE DESIGNS
We still had noise coming from the DVI input.
To understand the problem we built 2 simple designs:
1) “delay”
2) “overhead_test”
We got noise.
CHECKING THE CARD AT GIDEL
Cleaning the card at the lab and switching the DVI cables.
Checking the card at Gidel:
1) Automatic card test
2) Test with a new PSDB
The board and the daughter board worked fine.
DVI CONNECTION
[Diagram: graphics card → DVI receiver → FPGA → DVI transmitter → screen. Labels: 24 data bits (12 MSB, 12 LSB), 3 control bits, Clk, 12 double data rate bits. The receiver and transmitter sit on the DVI PSDB.]
INVALID DVI CONNECTION
[Diagram: graphics card → DVI receiver → FPGA → DVI transmitter → screen. Inside the FPGA, the Synplify DSP VHDL code feeds 12 MSB data bits and 12 LSB data bits into a MUX. Labels: 24 data bits, 3 control bits, Clk, 12 double data rate bits. The receiver and transmitter sit on the DVI PSDB.]
VALID DVI CONNECTION
[Diagram: graphics card → DVI receiver → FPGA → DVI transmitter → screen. Inside the FPGA, the Synplify DSP VHDL code feeds 12 MSB data bits and 12 LSB data bits into a DDR block; a second DDR block has inputs ‘1’ and ‘0’; a PLL produces a phased Clk. Labels: 24 data bits, 3 control bits, Clk, 12 double data rate bits. The receiver and transmitter sit on the DVI PSDB.]
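The data path in the DVI connection diagrams, where each 24-bit pixel leaves the FPGA as two 12-bit words per clock cycle, can be modelled in software as below. This is only a behavioural sketch: the function names are hypothetical, and which half is sent on which clock edge is an assumption exposed through the `lsb_first` parameter.

```python
# Sketch only: behavioural model of sending 24-bit pixels as 12-bit
# double data rate words. The edge ordering is an assumption.

def split_pixel(pixel24):
    """Split a 24-bit pixel into its (12 MSB, 12 LSB) halves."""
    return (pixel24 >> 12) & 0xFFF, pixel24 & 0xFFF

def to_ddr_words(pixels, lsb_first=True):
    """Emit two 12-bit words per pixel, one for each clock edge."""
    words = []
    for pixel in pixels:
        msb, lsb = split_pixel(pixel)
        words.extend((lsb, msb) if lsb_first else (msb, lsb))
    return words

def from_ddr_words(words, lsb_first=True):
    """Rebuild 24-bit pixels from consecutive pairs of 12-bit words."""
    pixels = []
    for i in range(0, len(words) - 1, 2):
        first, second = words[i], words[i + 1]
        msb, lsb = (second, first) if lsb_first else (first, second)
        pixels.append((msb << 12) | lsb)
    return pixels
```

The model ignores timing entirely. In hardware the two halves must also meet the transmitter's setup and hold requirements.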
WHY DOES IT WORK
[Timing diagram: the 12 MSB and 12 LSB pass through flip-flops (FF) and a DDR mux clocked by the DDR clk, producing the 12-bit DDR data sent to the screen transmitter (1# LSB, 1# MSB, 2# LSB, 2# MSB, 3# MSB). Marked delays: Tcd-ff, Tpd-ff, Tcd-mux, Tpd-mux. Marked Tsu and Thold windows are shown relative to the transmitter clk.]
WHY DOES IT WORK
[Timing diagram: the same data path with a PLL added. The 12 MSB and 12 LSB pass through flip-flops (FF) and a DDR mux clocked by the DDR clk, producing the 12-bit DDR data sent to the screen transmitter (1# LSB, 1# MSB, 2# LSB, 2# MSB, 3# MSB). Marked delays: Tcd-ff, Tpd-ff, Tcd-mux, Tpd-mux. Marked Tsu and Thold windows are shown relative to the phased clk, which serves as the transmitter clk.]
DVI CONNECTION
[Diagram: graphics card → DVI receiver → FPGA → DVI transmitter → screen. Labels: 24 data bits, 3 control bits, Clk, 12 double data rate bits.]
RECEIVER CONFIGURATION
“overhead_test” worked perfectly, but “delay” and “NLD” still had noise.
We found that the solution was to configure the receiver differently.
[Figure: bad configuration and good configuration. Labels: valid data and control signal; FPGA clk obtained from the receiver.]
IDEAL CONTROL SIGNAL WAVEFORMS
“NLD” works perfectly.
We need to check the control signals with a scope.
CLOCK SIGNAL
[Figure: clock at scope and expected clock]
CONTROL SIGNALS AT SCOPE
[Figure: hsync, vsync and enable on the scope, with the 19” TFT LCD SXGA monitor data sheet]
This signal is caused when vsync=‘1’.
ITERATION AND FREQUENCY TRADEOFF
The more we pipelined our design to reach higher frequencies, the fewer iterations we could load:
7 iterations at 53 MHz
12 iterations at 24.01 MHz
BLANKING CHECK
We built a special design that counts the clock cycles of a row, of the blanking interval and of the data.
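The counting done by the blanking check design can be mimicked in software on a sampled enable signal. This is a sketch under assumptions: the function name is hypothetical, one sample stands for one clock cycle, and a row is taken to start at the rising edge of enable and to consist of the data cycles followed by the blanking cycles.

```python
# Sketch only: count data and blanking cycles per row from an
# enable signal sampled once per clock cycle (an assumption).

def measure_rows(enable):
    """Return (data_cycles, blanking_cycles) for each complete row."""
    rows = []
    data = blank = 0
    started = False
    prev = 0
    for e in enable:
        if e and not prev:            # rising edge: a new row begins
            if started:
                rows.append((data, blank))
            started, data, blank = True, 0, 0
        if started:
            if e:
                data += 1
            else:
                blank += 1
        prev = e
    return rows
```

The total row length is the sum of the two counts. The last row is dropped because its blanking interval may be cut off at the end of the capture.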
AND THE BIG BONUS…….. 56
JOINING OF FOUR FPGAs
[Diagram: the DVI PSDB receiver (REC) and transmitter (TRNS) connected to four FPGAs (1st to 4th), each loaded with an 11-iteration design (vqm). Labels: control signals, data, clk, control and data signals, PLL, PLL clk, DDR.]
THANKS…
Our great supervisor Mike Sumszyk
Lab staff
Michael Yampolsky
Gadi Tuchman
Our God in the sky
We are happy to invite you to our demonstration at the lab.
APPENDIX 60
Image processing
[Figure: Beltrami smoothing and Gaussian smoothing]
Comparison between SynplifyDSP and direct VHDL implementation
Pros:
The SynplifyDSP tool plugs into the familiar Simulink environment.
Development is fast.
Cons:
It is hard to obtain an optimal implementation (non-optimal critical path).
The generated VHDL code is hard to understand, so it is difficult to make changes.
DVI
The Digital Visual Interface (DVI) is a video interface standard designed to maximize the visual quality of digital display devices.
Simulink design 64
Simulink design 65
SynplifyDSP – VHDL code 66
Synplify Pro 67
Synplify Pro 68
ProcWizard + Quartus
In ProcWizard we create the interface between the FPGA and the daughter board DVI port.
Quartus performs the place and route according to the ProcWizard interface and the SynplifyPRO node-level netlist.
Project stages
1) Simulink design of an existing Matlab code
2) Adaptation of the Simulink design to SynplifyDSP components and constraints
3) Synthesis of the VHDL code produced by SynplifyDSP using SynplifyPro
4) Integration of the above RTL component within the Gidel card architecture using Quartus II and ProcWizard
5) Place and route using Quartus II
6) Loading the RBF file to Gidel’s Procstar II card using ProcWizard