Published by Garey Day. Modified over 9 years ago.
1
Roman Kofman & Sergey Kleyman, Neta Peled & Hillel Mendelson
Supervisor: Mike Sumszyk
Final Presentation of Part A (Annual project)
2
- Project Recap
- Data Flow
- Blocks implementation
- Conclusions
- Project B - Time Table
3
The algorithm: Nonlinear Diffusion
- Use a numeric solution with iterations to solve the diffusion equation
Why use it for image processing?
- Image noise is smoothed
- Edges remain sharp
4
Original image
5
dt = 30 !!! one iteration
- Look at the edges (sharp!)
- Look at the hat (smoothed)
6
Difficulties with the semi-implicit model:
- Very complex design (Thomas algorithm), which makes real time almost impossible
- Transpose of the entire image
- Reverse-order loop
- Multiple memory accesses
So why use this model?
- Strong effect: good results after very few iterations
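The reverse-order loop comes from the Thomas algorithm itself: a forward elimination sweep over the line is followed by a back substitution that walks from the last element to the first. A minimal Python sketch of the standard tridiagonal solver is shown here for illustration; the function and argument names are hypothetical and not taken from the project, and the system is assumed to be diagonally dominant so no pivoting is needed.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the standard Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused),
    diag[i]  multiplies x[i],
    upper[i] multiplies x[i+1] (upper[-1] is unused).
    Assumes diagonal dominance, so no pivoting is performed.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: first element to last.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: last element to first (the reverse-order loop).
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

Because the second loop consumes the results of the first in the opposite order, a streaming hardware version has to store a whole line before it can start producing output.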
7
[Block diagram. Blocks shown: DVI IN; lines pipe (Thomas 3, M4K line-reverse write/read buffers); transpose T’; columns pipe (Thomas 3, M4K line-reverse write/read buffers); DVI OUT]
How to implement T’ in real time?
8
[Data-flow block diagram. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); rows pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); frequency controller F to 4F (DDRII write/read); DVI OUT]
Diagram labels: double buffers, external memory, balanced channels, reduced frequency
9
AGENDA
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
10
Addressing controller
Addressing method, first attempt: use a cache organization approach
- Fast: direct access to data in memory
- Easy to implement: no logic is needed for “translation”
- However, expensive: 10 bits is more than we need for column representation
[Address layout figure, 15 bits total. Field labels: row, area, column. Field widths shown: 4 bits, 10 bits, 1 bit]
11
Addressing controller
- 1st attempt implementation requires 98KB
- 1 M-RAM block is 64KB
Solution: use consecutive addressing
- Address = block + row + phase
- Requires “translation”, but size is 61KB: fits! (Quartus report)
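The 98KB and 61KB figures are consistent with a simple count, sketched below in Python as a plausibility check. The 15-bit address, the 10-bit column field and the 64KB M-RAM size come from the slides; the 3-byte (24-bit) pixel, the 640-pixel line and the `translate` helper are assumptions made for this illustration and are not the project's actual translation unit.

```python
BYTES_PER_PIXEL = 3      # assumption: 24-bit pixels
LINE_WIDTH = 640         # assumption: pixels per line
COLUMN_BITS = 10         # from the slides: column field width
TOTAL_BITS = 15          # from the slides: full address width
MRAM_BYTES = 64 * 1024   # from the slides: one M-RAM block is 64KB


def cache_style_bytes():
    """Storage when every address the bit fields can express is backed by memory."""
    return (1 << TOTAL_BITS) * BYTES_PER_PIXEL


def consecutive_bytes():
    """Storage when only real pixels are stored, packed one after another."""
    rows = 1 << (TOTAL_BITS - COLUMN_BITS)   # the 5 non-column bits
    return rows * LINE_WIDTH * BYTES_PER_PIXEL


def translate(row, column):
    """Hypothetical translation: a multiply-add replaces bit concatenation."""
    return row * LINE_WIDTH + column
```

Under these assumptions the cache-style layout needs 98,304 bytes, more than one M-RAM block, while the packed layout needs 61,440 bytes and fits, at the cost of the translation logic.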
12
Addressing controller Address translation units
13
AGENDA
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
14
[Data-flow block diagram, repeated. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); frequency controller F to 4F (DDRII write/read); DVI OUT]
15
Transpose
Goal: write the transposed data, so it can later be read sequentially, in rows
Problem: random access in DDR is too expensive: 32 clk penalty!
Solution:
- Use internal memory to invert the order: “pay” most of the penalty in random accesses to FPGA memory
- Write to DDR in “windows”: enables sequential row writes, with a penalty only at every row skip
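The window idea can be sketched in Python as follows. This is an illustrative software model, not the hardware design: the `buffer` list stands in for on-chip memory, the `out` lists stand in for DDR, and the function name and `window` parameter are hypothetical.

```python
def transpose_in_windows(image, window):
    """Transpose a row-major image by buffering `window` input rows at a time.

    `image` is a list of equal-length rows. Column-order reads happen only in
    `buffer` (the on-chip stand-in). Each write to `out` (the DDR stand-in) is
    a run of up to `window` consecutive values inside one output row.
    Returns the transposed image and the number of sequential runs written.
    """
    height, width = len(image), len(image[0])
    out = [[None] * height for _ in range(width)]
    runs = 0
    for top in range(0, height, window):
        buffer = image[top:top + window]            # fill the on-chip window
        for col in range(width):
            run = [row[col] for row in buffer]      # random access, on chip
            out[col][top:top + len(run)] = run      # one sequential burst
            runs += 1
    return out, runs
```

With `window=1` every pixel is its own run, which models paying the row-skip penalty on every access; a larger window divides the number of runs by the window size.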
16
Transpose: how it works
[Diagram: M-RAM write/read feeding DDRII T’ write/read. Labels: penalty every row skip; sequential read from DDR; penalty all the time!]
17
AGENDA
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
18
[Data-flow block diagram, repeated. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); frequency controller F to 4F (DDRII write/read); DVI OUT]
19
Reverse Line Order
- Used for the Thomas algorithm
Implementation:
- On M4K blocks
- Double-sized buffer with alternating pointers for read/write
[Diagram: two buffer halves, each addressed 0 to 640, one marked Read and one marked Write, with a "swap addresses" step between them]
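A minimal Python model of the alternating-pointer buffer is sketched below. It assumes one line is written forward into one half while the previously stored line is read backward from the other half, and that the two base addresses are swapped at each line boundary; the class and method names are hypothetical.

```python
class LineReverser:
    """Ping-pong buffer: write one line forward, read the previous one backward."""

    def __init__(self, line_length=640):
        self.line_length = line_length
        self.memory = [None] * (2 * line_length)   # double-sized buffer
        self.write_base = 0                        # half being filled
        self.read_base = line_length               # half being drained

    def push_line(self, line):
        """Store `line` and return the previously stored line in reverse order."""
        assert len(line) == self.line_length
        reversed_previous = []
        for i, pixel in enumerate(line):
            # Write pointer counts up; read pointer counts down in the other half.
            self.memory[self.write_base + i] = pixel
            reversed_previous.append(
                self.memory[self.read_base + self.line_length - 1 - i])
        # Swap addresses so the line just written is read back on the next call.
        self.write_base, self.read_base = self.read_base, self.write_base
        return reversed_previous
```

The first call returns placeholder values because nothing has been stored yet; every later call returns the preceding line reversed, so the block adds exactly one line of latency.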
20
AGENDA
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
21
[Data-flow block diagram, repeated. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); frequency controller F to 4F (DDRII write/read); DVI OUT]
22
- We need very large double buffers that can be integrated easily with FPGA designs
- FPGA is resource limited
- Solution: use external memory for this purpose
23
- Enables efficient usage of the memory on the GiDEL PROC board
- Up to 16 ports per bank, 2 banks per FPGA
- Each port may be forced to access a different memory area and limited to a certain address space
- Straightforward random memory access with random ports: slow and not efficient
- Segmented working mode option for sequential ports: enables fast read/write bursts
24
Two ports: sequential read and write. Each accesses a different memory area. The double buffer is implemented by switching the starting address at the end of every burst.
25
Pipeline design
[Diagram: multi-port core and our entity with controller, connected by control signals, a write sequential port and a read sequential port. Clock labels: fixed CLK, external DVI CLK. PROBLEM]
26
- Add FIFOs to implement data rate matching
- Altera provides a dual-clock FIFO (DCFIFO) megafunction. Using it before and after each write/read port would solve the problem
- Control logic is integrated into the control entity
- Extra FIFOs = extra FPGA resources
27
Solution: pipeline design
[Diagram: multi-port core and our entity with controller, connected by control signals, a write sequential port and a read sequential port]
28
[Diagram labels: DVI clk, Multi clk]
29
Buffer controller schema
[State diagram. States: Reset, Prepare for read/write, Read/write, Flush]
- Follows the DDR protocol, including wait states
- Symmetric read/write bursts according to the FIFOs' states
- Burst length can be adjusted
Next slide…
30
Problem: data is written to DDR only when the internal DDR FIFO is full
Solution: flush forces the FIFO to pass data. Not using the accurate flush length results in image noise!
Problem: flush delay length is not constant and depends on burst length
Solution: stretch write bursts until the FIFO is almost full. This lowers the flush influence.
31
Fixed controller schema
[State diagram. States: Reset, Prepare for read/write, Read/write, Flush. Condition shown: internal FIFO is almost full]
32
- Up to 8 buffers per memory bank
- Must comply with bandwidth restrictions (MultiPort utilization)
- Integration effort
33
AGENDA
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
34
[Data-flow block diagram, repeated. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); frequency controller F to 4F (DDRII write/read); DVI OUT]
35
- In the original design, down rate used internal memory. However, the needed FIFO will not fit on the FPGA
- Implementation is based on the DDR buffer with asymmetric read/write
- Extra DDR access
- Input and output DCFIFOs are asymmetric in size
- Full data path
- Down rate buffer saves to DDR only 1 frame out of 4
- Up rate buffer reads the same frame from DDR 4 times
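The frame-rate behaviour of the two controllers can be modelled in a few lines of Python. This sketch only shows which frames pass through; it ignores the DDR and FIFO mechanics, and the choice of keeping the first frame of each group of four is an assumption, as are the function names.

```python
def down_rate(frames, factor=4):
    """Keep one frame out of every `factor`; the others are never saved."""
    return [frame for index, frame in enumerate(frames) if index % factor == 0]


def up_rate(frames, factor=4):
    """Emit each stored frame `factor` times to restore the original rate."""
    return [frame for frame in frames for _ in range(factor)]
```

Chaining the two, `up_rate(down_rate(frames))`, gives an output stream with the same frame count as the input (when the input length is a multiple of 4) while the processing in between only has to handle a quarter of the frames.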
36
Re/Wr sync controller
[State diagrams. State labels shown: Reset, Prepare for write, Prepare for read, Read, Write, Read/write, Flush]
37
AGENDA
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
39
[Block diagram. Blocks shown: DVI rx (24 data bits, hsync, vsync, data enable, clk); inside the FPGA: DVI in controller, flag frame mux, data path with memory access, flag detector, signal generation (gen hsync, gen vsync, gen de, clk), PLL, 24-bit to 12-bit double-rate conversion; DVI tx (12 bits)]
- The signals must pass through the same long delays as the data
- Extra bits written to memory
40
[Block diagram. Blocks shown: DVI rx (24 data bits, hsync, vsync, data enable, clk); inside the FPGA: DVI in controller, flag frame mux, data path with memory access, flag detector, signal generation (gen hsync, gen vsync, gen de, clk), PLL, 24-bit to 12-bit double-rate conversion; DVI tx (12 bits)]
- Send a known flag through the data path
- Start generating the sync signals according to flag arrival
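The flag approach can be illustrated with a small Python model: a marker word travels through the same delay as the pixels, and the output side starts generating its enable signal only when the marker arrives. The marker value, the fixed-latency pipeline and all names here are hypothetical; a real design would need a marker that pixel data cannot take, or a dedicated flag frame.

```python
from collections import deque

FLAG = 0xFFF  # hypothetical marker word, assumed never to occur in pixel data


def data_path(words, latency):
    """Stand-in for the processing pipeline: a pure delay of `latency` words."""
    pipe = deque([0] * latency)
    for word in words:
        pipe.append(word)
        yield pipe.popleft()


def regenerate_enable(output_words, frame_length):
    """Assert data enable for `frame_length` words after the flag is seen."""
    remaining = 0
    enable = []
    for word in output_words:
        if word == FLAG:
            remaining = frame_length   # flag arrival starts signal generation
            enable.append(0)
        elif remaining > 0:
            enable.append(1)
            remaining -= 1
        else:
            enable.append(0)
    return enable
```

The generator never needs to know the pipeline latency: whatever delay the data sees, the flag sees too, so the regenerated signal lines up with the pixels that follow it.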
41
[Data-flow block diagram with synchronization path. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); delay (M-RAM write/read); DVI OUT. Width label: 48bit]
42
[Data-flow block diagram with synchronization path. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); delay (M-RAM write/read); frequency controller F to 4F (DDRII write/read); DVI OUT. Width label: 48bit]
43
[Data-flow block diagram, repeated. Blocks shown: DVI IN; frequency controller 4F to F (DDRII write/read); lines pipe (Thomas 3, M4K line-reverse write/read); transpose (M-RAM write/read, DDRII T’ write/read); columns pipe (Thomas 3, M4K line-reverse write/read); frequency controller F to 4F (DDRII write/read); DVI OUT]
44
Summary
- Internal memory blocks: addressing controller, transpose, line reverse
- External memory: double buffer on DDR, up/down rate controller
- DVI synchronization
45
- Problem with the board’s RESET
- Problem with loading the design
46
- Plan and implement logic blocks: SQRT and DIV are the main problem; verify required precision (based on our conclusions from part A)
- Integration of frequency controllers and transpose blocks
- Implement one full iteration
47
Divide between 2 problems:
- Design of logic blocks
- Full DDR blocks integration
How?
- Implement the processing algorithm for a smaller frame, which avoids using external memory
48
[Block diagram. Blocks shown: DVI IN; sample smaller frame; M-RAM write/read; logic blocks; M-RAM write/read; DVI OUT]
49
Project B goal: create end to end data path - with Image Processing