FPGA Based Video Codec: Implementation and Techniques
A Seminar Series
Markus Adhiwiyogo, Benjamin Ernest-Jones, Matt Richey

Field Programmable Gate Arrays
- Circuitry can be reconfigured for a desired application or function at any time after manufacturing
- Adaptive hardware that can change continuously in response to the input data or processing environment
- Combines aspects of general-purpose processors and ASICs
- Quick reconfiguration time, on the order of 100 µs to 1 ms

Basic FPGA Design and Structure
- A myriad of Configurable Logic Blocks (CLBs)
- A CLB may provide functions such as adding or comparing two numbers
- Connections between CLBs are established through a signal-controlled connection grid
- Current FPGAs have more than 100,000 logic gates

Advantages of FPGA
- Reconfigurability allows specific computational tasks to be performed at will
- Higher flexibility for adaptive coding driven by multimedia requirements such as:
  - Bandwidth availability
  - Quality of Service requirements
  - Channel characteristics
- Rapid prototyping and design iteration
- Certain function implementations lead to a reduction in die area

Disadvantages of FPGA
- The hardware is not an ASIC, which can lead to non-optimized performance and density
- Reconfiguration takes longer than loading software
- High power consumption during reconfiguration

Parallel Banks Technique
- Codec implemented on 2 or more FPGAs
- Each FPGA holds all parts of the codec
- Enables multiple data streams to be processed simultaneously
- Advantages:
  - Easy to implement
  - Die area is not a constraint
  - High data throughput due to parallelism
- Disadvantages:
  - Too much hardware
  - Leads to a non-optimized configuration

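The parallel banks idea can be sketched in software as splitting each frame into strips and handing every strip to an identical, complete codec instance. This is only a minimal illustration: `split_rows`, `encode_strip`, and `encode_parallel` are hypothetical names, the strip-wise split is an assumption about how data might be divided, and `encode_strip` is a stand-in that merely counts pixels rather than compressing them.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(frame, n_banks):
    """Split a frame (list of rows) into n_banks contiguous strips of near-equal height."""
    base, extra = divmod(len(frame), n_banks)
    strips, start = [], 0
    for i in range(n_banks):
        height = base + (1 if i < extra else 0)
        strips.append(frame[start:start + height])
        start += height
    return strips

def encode_strip(strip):
    """Hypothetical stand-in for a full codec on one bank: only counts pixels."""
    return sum(len(row) for row in strip)

def encode_parallel(frame, n_banks=2):
    """Run the same 'codec' on every strip at once, one worker per bank."""
    strips = split_rows(frame, n_banks)
    with ThreadPoolExecutor(max_workers=n_banks) as pool:
        return list(pool.map(encode_strip, strips))

# A QCIF-sized frame (176 x 144 pixels) filled with zeros.
qcif_frame = [[0] * 176 for _ in range(144)]
results = encode_parallel(qcif_frame, n_banks=2)
```

Every bank duplicates the whole codec, which is why the technique is easy to build but costly in hardware.
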
Compile-Time Reconfiguration
- Entire chip is configured once for the target application
- Advantages:
  - Simple control signals
- Disadvantages:
  - More than 1 FPGA may be needed

Run-Time Reconfiguration (RTR)
- Chip is reconfigured to perform different functions during an application
- Advantages:
  - Reduced hardware
  - Critical path is small
- Disadvantages:
  - Reconfiguration causes significant delay (can be compensated by partial reconfiguration)
  - May make the control system difficult to implement

Prototype Video Codec from UCLA
- Transformation scheme (e.g., DCT)
- Quantization
- Entropy coding
- No motion compensation performed

Detailed Description of UCLA Video Codec
- Utilizes an RTR implementation
- Partitioned into 3 separate configurations:
  - Discrete Wavelet Transform, Addressing, and Control Logic
  - Quantization and Run Length Coding
  - Entropy Coding
- RTR uses the partial reconfiguration technique
- QCIF resolution
- kbs CDMA for RF link

Configuration One
- Discrete Wavelet Transform
  - Short filter with integer coefficients
  - Requires 318 gates and 241 flip-flops
  - Corresponds to 681 CLBs
- Addressing and Control Logic
  - Correct data retrieval from RAM
  - Provides access to the peripheral system

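The slides do not name the wavelet filter. As a minimal sketch of what a "short filter with integer coefficients" can look like, the example below assumes the integer Haar filter (the S-transform), which needs only an add, a subtract, and a shift per sample pair. The function names `dwt_1d` and `idwt_1d` are illustrative, not taken from the UCLA design.

```python
def dwt_1d(samples):
    """One level of the integer Haar (S-transform) on an even-length sequence.

    Returns (low, high): low holds floor-averaged pairs, high holds pair differences.
    """
    evens, odds = samples[0::2], samples[1::2]
    low = [(a + b) >> 1 for a, b in zip(evens, odds)]
    high = [a - b for a, b in zip(evens, odds)]
    return low, high

def idwt_1d(low, high):
    """Exact inverse of dwt_1d, using integer arithmetic only."""
    out = []
    for l, h in zip(low, high):
        a = l + ((h + 1) >> 1)  # recover the first sample of the pair
        out.extend([a, a - h])  # the second sample follows from the difference
    return out

row = [10, 12, 14, 200, 7, 7, 0, 255]
low, high = dwt_1d(row)
restored = idwt_1d(low, high)
```

Because the transform is integer-to-integer and exactly invertible, any loss in the codec would come from the later quantization step, not from the transform.
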
Configuration Two
- Quantization and Run Length Coding
  - Requires 2500 gates
- Addressing and Control Logic
  - Same as configuration 1
  - Never reconfigured
- Data from the previous configuration is stored in another RAM

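A minimal sketch of the two operations in this configuration is given below. It assumes a uniform quantizer with a single step size and a run-length code made of (zero run, value) pairs; the slides do not specify either detail, and the names `quantize`, `run_length_encode`, and `run_length_decode` are illustrative.

```python
def quantize(coeffs, step):
    """Uniform quantizer: divide magnitudes by step, truncating toward zero."""
    return [(1 if c >= 0 else -1) * (abs(c) // step) for c in coeffs]

def run_length_encode(values):
    """Encode as (zero_run, value) pairs; trailing zeros become (run, 0)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs

def run_length_decode(pairs):
    """Invert run_length_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out

coeffs = [-2, -186, 0, -255, 3, 0, 0, 5]
quantized = quantize(coeffs, step=4)
pairs = run_length_encode(quantized)
```

Quantization drives small wavelet coefficients to zero, and the run-length stage then collapses those zero runs into single pairs.
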
Configuration Three
- Entropy Coding
  - Provides 2:1 lossless compression
- Addressing and Control Logic
  - Same as configurations 1 and 2
  - Never reconfigured
- Data from the previous configuration is stored in another RAM

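The slides do not say which entropy coder is used. The sketch below assumes a Huffman code purely for illustration; `huffman_code`, `encode`, and `decode` are hypothetical names, and the sample symbol stream is made up to show how a skewed distribution shortens the output.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the codeword of every symbol."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Walk the bitstring, emitting a symbol whenever a codeword completes."""
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Made-up quantizer output in which zero dominates.
symbols = [0] * 12 + [1] * 3 + [-1] * 2 + [5]
code = huffman_code(symbols)
bits = encode(symbols, code)
```

The stage is lossless: decoding the bitstring returns the original symbols, and frequent symbols receive the shortest codewords.
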
Experiment Results
- RTR provides the lowest silicon area
- Partial reconfiguration decreases reconfiguration delay by 50% compared with global reconfiguration
- Critical path is 220 ns (5 MHz system)
- Load and ready time is approximately 1.6 ms
- Compression ratio of 15:1 was achieved
  - Independent of frame size

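The timing figures can be checked with a little arithmetic. The sketch below uses the 220 ns critical path from this slide together with the 20 frames/second rate and the 1.5 ms and 3 ms reconfiguration times quoted in the question-and-answer section. It assumes, as an illustration only, that all three configurations are loaded once per frame; the slides do not state the reconfiguration schedule.

```python
CRITICAL_PATH_S = 220e-9     # critical path from the experiment results
FRAME_RATE_HZ = 20           # pure FPGA codec frame rate
PARTIAL_RECONFIG_S = 1.5e-3  # partial reconfiguration time
GLOBAL_RECONFIG_S = 3e-3     # global reconfiguration time
CONFIGS_PER_FRAME = 3        # ASSUMPTION: each configuration is loaded once per frame

# A 220 ns critical path bounds the clock at roughly 4.5 MHz,
# in line with the "5 MHz system" figure on the slide.
max_clock_hz = 1 / CRITICAL_PATH_S

frame_period_s = 1 / FRAME_RATE_HZ  # 50 ms per frame

def reconfig_overhead(reconfig_time_s):
    """Fraction of each frame period spent reconfiguring."""
    return CONFIGS_PER_FRAME * reconfig_time_s / frame_period_s

partial_overhead = reconfig_overhead(PARTIAL_RECONFIG_S)  # about 9% of the frame
global_overhead = reconfig_overhead(GLOBAL_RECONFIG_S)    # about 18% of the frame
```

Under that assumption, halving the reconfiguration delay halves the share of each frame period lost to reconfiguration.
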
Alternate Implementation: FPGA-VSP Co-Processor
- Allows more operations:
  - 7 x 7 mask 2D filter (13.3 f/sec)
  - 8 x 8 block DCT (55 f/sec)
  - 4 x 4 block VQ at 0.5 bpp (7.4 f/sec)
  - 1-level WT (35.7 f/sec)
- Max FPGA clock of 20 MHz
- Max VSP clock of 50 MHz

Other Notable Implementations and Techniques
- Dual FPGA, one RTR at any time
- FPGA and general processor co-processing
- Systolic look-up table for transform coefficients

Documentation
- J. Villasenor and W.H. Mangione-Smith, "Configurable Computing," Scientific American, June 1997.
- J. Villasenor, C. Jones, and B. Schoner, "Video Communications using Rapidly Reconfigurable Hardware," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, December 1995.
- B. Schoner, C. Jones, and J. Villasenor, "Issues in Wireless Video Coding Using Run-time-reconfigurable FPGAs," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, Napa, CA, Apr
- B. Schoner, J. Villasenor, S. Molloy, and R. Jain, "Techniques for FPGA Implementation of Video Compression," ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 1995.

Related Sites
- FPGA Based Codec Site: Techniques and Implementations
  c/class/15828-s98/www/index.html
- Hardware Sites

Questions and Answers

How does an FPGA compare to a direct hardware implementation?
Compared to today's video cards, an FPGA would be slower. I believe this is because today's semiconductor technology cannot yet make FPGA wiring and density optimal.

What is the frame rate of the UCLA video codec?
The frame rate depends on which hardware implementation is used. With the co-processing method, the frame rate is variable (from 7 to 35 frames/second). The pure FPGA implementation runs at 20 frames/second. Although the comparison may look "funny", one must also take into account that the pure FPGA implementation is a much more simplified codec than the one used with the co-processing method.

How fast can an FPGA reconfigure itself?
The initial design download takes 1.6 ms, global reconfiguration takes 3 ms, and partial reconfiguration takes 1.5 ms.