FPGA Based Video Codec: Implementation and Techniques
An 18-796 Seminar Series
Markus Adhiwiyogo, Benjamin Ernest-Jones, Matt Richey

Field Programmable Gate Arrays
• Ability to reconfigure circuitry for a desired application or function at any time after manufacturing
• Adaptive hardware that can change continuously in response to the input data or processing environment
• Combines traits of general-purpose processors and ASICs
• Quick reconfiguration time, on the order of 100 µs to 1 ms

Basic FPGA Design and Structure
• A myriad of Configurable Logic Blocks (CLBs)
• A CLB may be configured for a function such as adding or comparing two numbers
• Connections between CLBs are established through a signal-controlled grid of interconnects
• Current FPGAs have more than 100,000 logic gates

Advantages of FPGA
• Reconfigurability allows specific computational tasks to be performed at will
• Greater flexibility for adaptive coding to meet multimedia requirements such as:
  • Bandwidth availability
  • Quality of Service requirements
  • Channel characteristics
• Rapid prototyping and design iteration
• Certain function implementations lead to a reduction in die area

Disadvantages of FPGA
• Hardware is not an ASIC, which can lead to non-optimized performance and density
• Reconfiguration takes longer than loading software
• High power consumption during reconfiguration

Parallel Banks Technique
• Codec implemented on 2 or more FPGAs
• Each FPGA has all parts of the codec
• Enables multiple data items to be processed simultaneously
• Advantages:
  • Easy to implement
  • Die area is not a constraint
  • High data throughput due to parallelism
• Disadvantages:
  • Too much hardware
  • Leads to a non-optimized configuration

Compile-Time Reconfiguration
• Entire chip is configured once for the target application
• Advantages:
  • Easy control signals
• Disadvantages:
  • More than 1 FPGA may be needed

Run-Time Reconfiguration
• Chip is reconfigured to perform different functions during an application
• Advantages:
  • Reduced hardware
  • Critical path is small
• Disadvantages:
  • Reconfiguration causes significant delay (can be compensated by partial reconfiguration)
  • May lead to difficulty in control system implementation

Prototype Video Codec from UCLA
• Transformation scheme (e.g., DCT)
• Quantization
• Entropy Coding
• No Motion Compensation performed

Detailed Description of UCLA Video Codec
• Utilizes a run-time reconfiguration (RTR) implementation
• Partitioned into 3 separate configurations:
  • Discrete Wavelet Transform, Addressing, and Control Logic
  • Quantization and Run Length Coding
  • Entropy Coding
• RTR uses a partial reconfiguration technique
• QCIF resolution
• 60-600 kb/s
• CDMA for the RF link

Configuration One
• Discrete Wavelet Transform
  • Short filter with integer coefficients
  • Requires 318 gates and 241 flip-flops
  • Corresponds to 681 CLBs
• Addressing and Control Logic
  • Correct data retrieval from RAM
  • Provides access to peripheral system
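The slide does not name the wavelet filter, only that it is short and has integer coefficients. As an illustration, here is a minimal Python sketch of one such filter, the integer Haar transform (S-transform), which needs only adds, subtracts, and shifts. The choice of filter and the function names are assumptions, not the UCLA design.

```python
def haar_forward(samples):
    """One level of an integer Haar (S-transform) decomposition.

    Assumed filter for illustration only; the slides do not name the one used.
    """
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    low, high = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        low.append((a + b) >> 1)   # average, rounded down
        high.append(a - b)         # difference
    return low, high


def haar_inverse(low, high):
    """Exactly undo haar_forward using integer arithmetic only."""
    out = []
    for s, d in zip(low, high):
        a = s + ((d + 1) >> 1)     # recovers the bit lost by the shift
        b = a - d
        out.extend((a, b))
    return out


row = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up pixel row
low, high = haar_forward(row)
restored = haar_inverse(low, high)
```

Because both directions use only integer operations, the round trip is lossless, which is one reason short integer filters map well onto small amounts of logic.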

Configuration Two
• Quantization and Run Length Coding
  • Requires 2500 gates
• Addressing and Control Logic
  • Same as configuration 1
  • Never reconfigured
• Data from previous configuration stored in another RAM
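The two steps named in this configuration can be sketched as follows. This is a hedged illustration: the uniform quantizer, the step size, and the (zero run, value) pair format are assumptions, since the slides give no details of the actual scheme.

```python
def quantize(coeffs, step):
    """Uniform quantizer that truncates toward zero (assumed scheme)."""
    return [int(c / step) for c in coeffs]


def run_length_encode(values):
    """Encode as (zero_run, value) pairs; trailing zeros become (run, None)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, None))
    return pairs


def run_length_decode(pairs):
    """Inverse of run_length_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v is not None:
            out.append(v)
    return out


coeffs = [53, -3, 2, 0, 1, -9, 0, 0]     # made-up transform coefficients
quantized = quantize(coeffs, step=4)
pairs = run_length_encode(quantized)
```

Quantization is the lossy step: small coefficients collapse to zero, and the run-length coder then represents each run of zeros with a single pair.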

Configuration Three
• Entropy Coding
  • Provides 2:1 lossless compression
• Addressing and Control Logic
  • Same as configuration 1 and 2
  • Never reconfigured
• Data from previous configuration stored in another RAM
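The slides do not say which entropy coder was used. As one possible illustration of lossless entropy coding, the sketch below builds a Huffman code in Python; the choice of Huffman coding and the sample data are assumptions, and the sketch makes no claim about reaching the 2:1 figure.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate single-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)        # two least frequent subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:                 # prefix-free, so first match wins
            out.append(inverse[current])
            current = ""
    return out


data = [0] * 12 + [1] * 3 + [-1] * 3 + [5] * 2   # made-up quantized values
code = huffman_code(data)
bits = encode(data, code)
```

Frequent symbols (here the zeros) get the shortest codewords, so the encoded stream is shorter than a fixed two-bits-per-symbol encoding of the same data while remaining exactly decodable.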

Experimental Results
• RTR provides the lowest silicon area
• Partial reconfiguration cuts reconfiguration delay by 50% relative to global reconfiguration
• Critical path is 220 ns (5 MHz system)
• Load and ready time is approximately 1.6 ms
• Compression ratio of 15:1 was achieved, independent of frame size
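A quick arithmetic check of these figures, as a hedged sketch. QCIF is 176 x 144 pixels; the 8 bits per pixel (luminance only) is an assumption, and the 20 frames/second value is the pure-FPGA frame rate quoted in the question-and-answer slide.

```python
CRITICAL_PATH_NS = 220
max_clock_mhz = 1e3 / CRITICAL_PATH_NS      # about 4.5 MHz, quoted as a 5 MHz system

QCIF_WIDTH, QCIF_HEIGHT = 176, 144
BITS_PER_PIXEL = 8          # assumption: 8-bit luminance only
FRAMES_PER_SECOND = 20      # pure-FPGA figure from the Q&A slide
COMPRESSION_RATIO = 15

raw_bps = QCIF_WIDTH * QCIF_HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND
compressed_kbps = raw_bps / COMPRESSION_RATIO / 1000

print(f"max clock       : {max_clock_mhz:.2f} MHz")
print(f"raw bit rate    : {raw_bps / 1e6:.2f} Mb/s")
print(f"after 15:1      : {compressed_kbps:.0f} kb/s")
```

Under these assumptions the compressed stream comes to roughly 270 kb/s, which falls inside the 60-600 kb/s range listed for the codec.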

Alternate Implementation: FPGA-VSP Co-Processor
• Allows more operations:
  • 7 x 7 mask 2D filter (13.3 frames/s)
  • 8 x 8 block DCT (55 frames/s)
  • 4 x 4 block VQ at 0.5 bpp (7.4 frames/s)
  • 1-level WT (35.7 frames/s)
• Max FPGA clock of 20 MHz
• Max VSP clock of 50 MHz

Other Notable Implementations and Techniques
• Dual FPGA, one RTR at any time
• FPGA and General Processor Co-Processing
• Systolic
• Look Up Table for transform coefficients
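The look-up-table idea can be sketched in Python as below: the transform coefficients are computed once, scaled to integers, and the transform itself then needs only table reads, multiplies, adds, and shifts. The choice of an 8 x 8 DCT, the 12-bit scale, and all names are assumptions for illustration; the slides do not say which transform the table served.

```python
import math

N = 8
SCALE_BITS = 12   # assumed fixed-point precision

# LUT[k][n] = round(c(k) * cos((2n + 1) * k * pi / (2N)) * 2**SCALE_BITS)
LUT = [
    [
        round(
            (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            * (1 << SCALE_BITS)
        )
        for n in range(N)
    ]
    for k in range(N)
]


def dct_1d(x):
    """8-point DCT of integer samples using only table lookups and shifts."""
    # >> floors toward negative infinity, so results carry a small bias.
    return [sum(LUT[k][n] * x[n] for n in range(N)) >> SCALE_BITS
            for k in range(N)]


def dct_2d(block):
    """Separable 8 x 8 DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    return [[cols[c][k] for c in range(N)] for k in range(N)]


flat_block = [[100] * N for _ in range(N)]   # made-up constant block
coeffs = dct_2d(flat_block)
```

For a constant block almost all the energy lands in the top-left (DC) coefficient; the exact orthonormal DCT would give 800 there, and the fixed-point version lands within a few counts of that because of truncation.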

References
• J. Villasenor and W.H. Mangione-Smith, "Configurable Computing," Scientific American, pp. 66-71, June 1997.
• J. Villasenor, C. Jones, and B. Schoner, "Video Communications using Rapidly Reconfigurable Hardware," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, pp. 565-567, December 1995.
• B. Schoner, C. Jones, and J. Villasenor, "Issues in Wireless Video Coding Using Run-time-reconfigurable FPGAs," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, pp. 85-89, Napa, CA, April 1995.
• B. Schoner, J. Villasenor, S. Molloy, and R. Jain, "Techniques for FPGA Implementation of Video Compression," ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 1995.

Related Sites
• FPGA Based Codec Site
  • www.icsl.ucla.edu/~ipl
• Techniques and Implementations
  • www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15828-s98/www/index.html
  • www.ece.cmu.edu/research/piperench/
• Hardware Sites
  • www.xilinx.com
  • www.altera.com

Questions and Answers
• How does an FPGA compare to a direct hardware implementation?
  • Compared to today's video cards, an FPGA would be slower. I believe this is because today's semiconductor technology is still insufficient to make FPGA wiring and density optimal.
• What is the frame rate of the UCLA video codec?
  • The frame rate depends on which hardware implementation is used. In the co-processing method, the frame rate is variable (from 7 to 35 frames/second). The pure FPGA implementation runs at 20 frames/second. Although the comparison may look "funny," one must also take into account that the pure FPGA implementation is a much simpler codec than the co-processing method.
• How fast can an FPGA reconfigure itself?
  • Initial design download is 1.6 ms. Global reconfiguration is 3 ms. Partial reconfiguration is 1.5 ms.
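The reconfiguration times above can be put in proportion to the frame period with a short calculation. This is a sketch under a stated assumption: every frame passes through all three configurations, paying one reconfiguration per configuration. The slides do not confirm this schedule.

```python
FRAME_RATE = 20                 # frames/second, pure FPGA implementation
CONFIGS_PER_FRAME = 3           # assumption: one reconfiguration per configuration per frame
GLOBAL_RECONFIG_MS = 3.0
PARTIAL_RECONFIG_MS = 1.5

frame_period_ms = 1000 / FRAME_RATE


def reconfig_overhead(reconfig_ms):
    """Fraction of each frame period spent reconfiguring (assumed schedule)."""
    return CONFIGS_PER_FRAME * reconfig_ms / frame_period_ms


print(f"frame period          : {frame_period_ms:.0f} ms")
print(f"global reconfiguration: {reconfig_overhead(GLOBAL_RECONFIG_MS):.0%} of frame time")
print(f"partial reconfiguration: {reconfig_overhead(PARTIAL_RECONFIG_MS):.0%} of frame time")
```

Under that assumption, partial reconfiguration would consume about 9% of each 50 ms frame period and global reconfiguration about 18%, which shows why halving the reconfiguration delay matters for a run-time reconfigured codec.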
