1
Video on DSP and FPGA
John Johansson
April 12, 2004
2
Agenda
► Overview of video processing
► A typical video encoder and the DCT
► Requirements of DCT
► Comparison of DSP and FPGA chips
► Analysis and conclusions
► Questions
3
Overview of Video Processing
Video processing generally involves
► Compression / Decompression
► Special Effects
► TV Broadcasting
► Focus on Compression
4
Video Encoding
Typical Video Encoder
► Focus on DCT algorithm
5
The Discrete Cosine Transformation
► DCT maps spatial (pixel) data into the frequency domain, much like the FFT
► Rearranges data into a more compressible format
► Typically done on 64 (8x8) pixels at a time
► Big nasty equation …
► … But no sharp teeth (optimizes extremely well)
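The "big nasty equation" itself is not reproduced in the slide text. As a rough sketch, assuming the usual orthonormal 2-D DCT-II definition on an 8x8 block (the function name `dct_2d` and the test block are illustrative, not from the slides), a direct, unoptimized version looks like this:

```python
import math

N = 8  # block edge length; the slides use 8x8 = 64 pixels


def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block (a list of N rows of N numbers)."""
    def c(k):
        # Scale factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * total
    return out


# A flat grey block: all the energy should land in the single DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0; every other coefficient is ~0
```

The flat block shows why the output is "more compressible": 64 identical pixels collapse into one non-zero coefficient. This direct form costs 64 x 64 = 4096 multiply-accumulates per block, which is what real implementations optimize away.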
6
Requirements for DCT
Basic Idea
► Read in data (64 values, 8-24 bits signed / unsigned)
► Do transformation
► Write out data
► Profit !!!
► Easy, right ??
7
Requirements for DCT
Memory Limitations
► Load an entire frame?
► One uncompressed frame can vary from 50 KB to 50 MB in size
► External memory is much slower, but more plentiful
► Do the DCT in chunks (8x8 block)
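To put numbers on the memory point, here is a small sketch that computes uncompressed frame sizes and walks a frame one 8x8 block at a time. The resolutions and bit depths are assumed examples (the slides do not say which formats the 50 KB to 50 MB range refers to), and `frame_bytes` and `blocks` are hypothetical helper names.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def blocks(frame, size=8):
    """Yield (row, col, tile) for each size x size tile of a 2-D list.

    Assumes the frame dimensions are exact multiples of `size`.
    """
    height, width = len(frame), len(frame[0])
    for r in range(0, height, size):
        for c in range(0, width, size):
            yield r, c, [row[c:c + size] for row in frame[r:r + size]]


print(frame_bytes(176, 144, 16))    # 50688 bytes, roughly 50 KB
print(frame_bytes(1920, 1080, 24))  # 6220800 bytes, roughly 6 MB

# Only one 64-value tile needs to sit in fast internal memory at a time.
frame = [[r * 16 + c for c in range(16)] for r in range(16)]
tiles = list(blocks(frame))
print(len(tiles))  # 4 tiles in a 16x16 frame
```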
8
Requirements for DCT
Degree of Parallelism
► DCT can be done serially, or broken up and done in parallel
► Parallelism depends largely on available memory
► Price / Performance tradeoffs
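One common way to break the DCT up is the row-column decomposition: eight independent 1-D DCTs over the rows, then eight over the columns. The sketch below assumes the orthonormal DCT-II; the names `dct_1d`, `dct_2d_separable`, and the `mapper` parameter are illustrative. The thread pool is there only to show that the eight transforms in each pass do not depend on each other; it is not a claim about speed in Python.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8


def dct_1d(vec):
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
        out.append(scale * sum(
            vec[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)))
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d_separable(block, mapper=map):
    # Pass 1: eight independent row transforms.
    rows = list(mapper(dct_1d, block))
    # Pass 2: eight independent column transforms (rows of the transpose).
    cols = list(mapper(dct_1d, transpose(rows)))
    return transpose(cols)


ramp = [[x * N + y for y in range(N)] for x in range(N)]
serial = dct_2d_separable(ramp)  # one transform after another
with ThreadPoolExecutor(max_workers=N) as pool:
    parallel = dct_2d_separable(ramp, pool.map)  # eight at a time per pass
```

Each pass is 8 transforms of 64 multiply-accumulates, so 1024 per block in total. On an FPGA the same structure could map to anywhere from one shared 1-D unit to eight side by side, which is where the price / performance tradeoff comes in.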
9
The Challengers
Xilinx Spartan-3 FPGA
► 50K – 5M gates
► 326 MHz
► 100 KB – 2.3 MB internal memory
► 4 – 104 dedicated multipliers
► Oodles of I/O pins (up to 784)
Look at XC3S1000
► 1M gates, 560 KB memory, 24 multipliers, 376 I/O pins
10
The Challengers
ADSP-BF5xx Blackfin Processor
► 200 – 750 MHz
► Single or dual core
► DMA memory controller
► 52 KB – 326 KB internal memory
► Other processor goodies
Look at ADSP-BF533
► 500 MHz, single core, 148 KB memory
11
Performance
How do we correctly benchmark an algorithm between two completely different processors?
► I don’t really know
► Look at some rough performance indicators and try to draw a conclusion
12
Performance
FPGA
► Varies from 1-25 cycle(s) / pixel for DCT
► Reading and writing of data takes additional time
► Clock speed limited by degree of parallelism
DSP
► Roughly 5 cycles / pixel for DCT
► DMA controller allows parallel reading and writing with some setup overhead
13
(Ideal) Performance
Spartan-3
► 64 read + 64 compute + 64 write = 192 cycles / block
► 326 MHz = 1.70 Mblocks / second
Blackfin
► 319 compute + 10 DMA transfer = 329 cycles / block
► 500 MHz = 1.52 Mblocks / second
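The ideal throughput figures are just clock rate divided by cycles per block. A quick check, using only the cycle counts and clock speeds from this slide (the helper name `blocks_per_second` is illustrative):

```python
def blocks_per_second(clock_hz, cycles_per_block):
    """Ideal throughput, ignoring stalls and setup overhead."""
    return clock_hz / cycles_per_block


# Spartan-3: three 64-cycle phases (read, compute, write) at 326 MHz.
spartan3 = blocks_per_second(326e6, 64 + 64 + 64)
# Blackfin: 319 compute cycles plus 10 cycles of DMA transfer at 500 MHz.
blackfin = blocks_per_second(500e6, 319 + 10)

print(f"Spartan-3: {spartan3 / 1e6:.2f} Mblocks/s")  # 1.70
print(f"Blackfin:  {blackfin / 1e6:.2f} Mblocks/s")  # 1.52
```

The three 64-cycle phases sum to 192 cycles per block. The Blackfin's 319 compute cycles agree with the earlier estimate of roughly 5 cycles / pixel, since 5 x 64 = 320.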
14
Advantages
FPGA
► Potential for very high parallelism
► Existing video designs available for purchase
► Good middleman functionality
DSP
► Higher potential clock speed
► Much more flexible design
► DMA memory controller
15
Disadvantages
FPGA
► Low flexibility
► Hard to optimize
► Limited logic blocks
DSP
► Difficult to achieve full utilization
► Higher power consumption
16
Conclusions
FPGA
► Best for well defined roles, like DCT
► Faster in situations where throughput matters
► Can be very expensive
DSP
► Better off for more flexible roles, like full encoder
► Situations where large amounts of (additional) memory are needed
17
Questions?
18
References
Xilinx Spartan-3
http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Spartan-3
Analog Devices Blackfin
http://www.analog.com/processors/processors/blackfin/index.html
19
References
Other articles
http://www.xilinx.com/publications/products/services/xc_pdf/xc_videoapps44.pdf
http://www.xilinx.com/publications/products/sp2e/xc_dspvid43.htm
http://www.reed-electronics.com/ednmag/article/CA336860?stt=000&pubdate=11%2F27%25