1
FPGA Based Video Codec: Implementation and Techniques
An 18-796 Seminar Series
Markus Adhiwiyogo, Benjamin Ernest-Jones, Matt Richey
2
Field Programmable Gate Arrays
Circuitry can be reconfigured for a desired application or function at any time after manufacturing
Adaptive hardware that can change continuously in response to the input data or processing environment
Combines characteristics of general-purpose processors and ASICs
Quick reconfiguration time, on the order of 100 µs to 1 ms
3
Basic FPGA Design and Structure
Built from a large array of Configurable Logic Blocks (CLBs)
A CLB can be configured, for example, to add or compare two numbers
Connections between CLBs are established through a signal-controlled grid of interconnects
Current FPGAs have more than 100,000 logic gates
4
Advantages of FPGA
Reconfigurability allows specific computational tasks to be performed at will
Greater flexibility for adaptive coding in response to multimedia requirements such as:
    Bandwidth availability
    Quality of Service requirements
    Channel characteristics
Rapid prototyping and design iteration
Certain function implementations lead to a reduction in die area
5
Disadvantages of FPGA
Hardware is not an ASIC, which can lead to non-optimized performance and density
Reconfiguration takes longer than loading software
High power consumption during reconfiguration
6
Parallel Banks Technique
Codec implemented on 2 or more FPGAs
Each FPGA contains all parts of the codec
Enables multiple data items to be processed simultaneously
Advantages:
    Easy to implement
    Die area is not a constraint
    High data throughput due to parallelism
Disadvantages:
    Requires a large amount of hardware
    Leads to a non-optimized configuration
7
Compile-Time Reconfiguration
Entire chip is configured once for the target application
Advantages:
    Simple control signals
Disadvantages:
    More than 1 FPGA may be needed
8
Run-Time Reconfiguration
Chip is reconfigured to perform different functions during an application
Advantages:
    Reduced hardware
    Short critical path
Disadvantages:
    Reconfiguration causes significant delay (can be compensated by partial reconfiguration)
    May make the control system harder to implement
9
Prototype Video Codec from UCLA
Transformation scheme (e.g., DCT)
Quantization
Entropy Coding
No Motion Compensation performed
10
Detailed Description of UCLA Video Codec
Utilizes an RTR implementation
Partitioned into 3 separate configurations:
    Discrete Wavelet Transform, Addressing, and Control Logic
    Quantization and Run Length Coding
    Entropy Coding
RTR uses a partial reconfiguration technique
QCIF resolution
60-600 kb/s
CDMA for the RF link
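To put the QCIF resolution and the 60-600 kb/s channel in perspective, the following is a rough worked calculation, not a figure from the slides. QCIF is 176 x 144 pixels; the 8 bits per pixel (luminance only) and the 20 frames/second rate are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Rough bit-budget sketch for a QCIF stream.
# Assumptions (not from the slides): 8 bits/pixel, luminance only.

QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # standard QCIF picture size


def raw_bitrate(width, height, bits_per_pixel, frames_per_s):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_s


def required_ratio(raw_bps, channel_bps):
    """Compression ratio needed to fit the raw stream into the channel."""
    return raw_bps / channel_bps


if __name__ == "__main__":
    raw = raw_bitrate(QCIF_WIDTH, QCIF_HEIGHT, 8, 20)
    for channel in (60_000, 600_000):
        ratio = required_ratio(raw, channel)
        print(f"{channel / 1000:.0f} kb/s channel needs about {ratio:.1f}:1")
```

Under these assumptions the raw stream is about 4.06 Mb/s, so the two ends of the channel range call for roughly 68:1 and 7:1 compression respectively.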
11
Configuration One
Discrete Wavelet Transform
    Short filter with integer coefficients
    Requires 318 gates and 241 flip-flops
    Corresponds to 681 CLBs
Addressing and Control Logic
    Correct data retrieval from RAM
    Provides access to peripheral system
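The slides do not name the wavelet filter used. As an illustration of what a short filter with integer coefficients can look like, here is a minimal sketch of a one-level integer Haar transform (often called the S-transform). The choice of this particular filter is an assumption, and the function names are hypothetical. It uses only additions, subtractions, and shifts, which is what makes this family of filters cheap in logic.

```python
# One-level integer Haar wavelet (S-transform) on a 1-D signal.
# Assumption: this specific filter is illustrative only; the slides
# say only that the filter is short and has integer coefficients.


def s_transform_forward(samples):
    """Split an even-length list of ints into (low, high) subbands."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    low, high = [], []
    for i in range(0, len(samples), 2):
        a, b = samples[i], samples[i + 1]
        d = a - b             # detail (high-pass) coefficient
        s = b + (d >> 1)      # equals floor((a + b) / 2), the low-pass value
        low.append(s)
        high.append(d)
    return low, high


def s_transform_inverse(low, high):
    """Exactly reconstruct the original samples from (low, high)."""
    out = []
    for s, d in zip(low, high):
        b = s - (d >> 1)
        a = b + d
        out.extend((a, b))
    return out


if __name__ == "__main__":
    row = [10, 7, 3, 3, 0, 255, 200, 100]
    low, high = s_transform_forward(row)
    assert s_transform_inverse(low, high) == row
    print(low, high)
```

Because every step is an integer operation that can be undone exactly, the transform itself is lossless; any loss in a codec of this kind would come from the quantization stage.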
12
Configuration Two
Quantization and Run Length Coding
    Requires 2500 gates
Addressing and Control Logic
    Same as configuration 1
    Never reconfigured
Data from previous configuration stored in another RAM
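A minimal sketch of the two operations named in this configuration, assuming a uniform quantizer and a zero-run-length code. The slides do not give the actual step sizes or the code format, so both, along with the function names, are hypothetical.

```python
# Uniform quantization followed by zero-run-length coding.
# Assumptions: the step size and the (zero_run, value) pair format
# are illustrative; the slides do not specify either.


def quantize(coeffs, step):
    """Divide each coefficient by step, truncating toward zero."""
    return [int(c / step) for c in coeffs]


def run_length_encode(values):
    """Encode as (zero_run, value) pairs; trailing zeros become (run, 0)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


def run_length_decode(pairs):
    """Invert run_length_encode."""
    out = []
    for run, v in pairs:
        out.extend([0] * run)
        if v != 0:
            out.append(v)
    return out


if __name__ == "__main__":
    coeffs = [52, -7, 3, 0, 1, -2, 0, 0]
    q = quantize(coeffs, step=4)
    pairs = run_length_encode(q)
    assert run_length_decode(pairs) == q
    print(q, pairs)
```

Quantization is the lossy step: small coefficients collapse to zero, and the resulting zero runs are what the run-length coder compresses.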
13
Configuration Three
Entropy Coding
    Provides 2:1 lossless compression
Addressing and Control Logic
    Same as configuration 1 and 2
    Never reconfigured
Data from previous configuration stored in another RAM
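The slides do not say which entropy coder is used. As one common example of lossless entropy coding, the sketch below builds a Huffman code in software; treating Huffman coding as the method here is an assumption, and all names are hypothetical.

```python
# Huffman coding as an example of a lossless entropy coder.
# Assumption: the slides do not identify the coder; Huffman is used
# here only to illustrate the idea.
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bitstring} for a sequence of hashable symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    data = [0, 0, 0, 0, 1, 1, 2, 3]
    code = huffman_code(data)
    bits = encode(data, code)
    assert decode(bits, code) == data
    print(len(bits), "bits instead of", 2 * len(data))
```

Frequent symbols get short codewords, so skewed data, such as the output of a quantizer, shrinks without any loss of information. The achievable ratio depends entirely on the symbol statistics.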
14
Experiment Results
RTR provides the lowest silicon area
Partial reconfiguration decreases reconfiguration delay by 50% relative to global reconfiguration
Critical path is 220 ns (5 MHz system)
Load and ready time is approximately 1.6 ms
Compression ratio of 15:1 was achieved
    Independent of frame size
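The timing figures can be related with some simple arithmetic. The sketch below converts the critical path into a maximum clock rate and estimates how much of each frame period goes to reconfiguration. The 1.5 ms partial reconfiguration time and the 20 frames/second rate come from the question-and-answer slide of this deck; loading each of the three configurations once per frame is an assumption made for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope timing for the RTR codec.
# Assumption: each of the 3 configurations is loaded once per frame.


def max_clock_hz(critical_path_s):
    """Upper bound on clock frequency for a given critical path."""
    return 1.0 / critical_path_s


def reconfig_overhead(n_configs, reconfig_s, frame_rate_hz):
    """Fraction of each frame period spent reconfiguring."""
    return n_configs * reconfig_s * frame_rate_hz


if __name__ == "__main__":
    clock = max_clock_hz(220e-9)
    overhead = reconfig_overhead(3, 1.5e-3, 20)
    print(f"max clock is about {clock / 1e6:.2f} MHz")
    print(f"reconfiguration overhead is about {overhead:.0%} of a frame")
```

A 220 ns critical path corresponds to a maximum clock of roughly 4.5 MHz, close to the 5 MHz figure on the slide. Under the stated assumption, reconfiguration takes about 9% of each 50 ms frame period.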
15
Alternate Implementation: FPGA-VSP Co-Processor
Allows more operations:
    7 x 7 Mask 2D Filter (13.3 f/sec)
    8 x 8 Block DCT (55 f/sec)
    4 x 4 Block VQ at 0.5 bpp (7.4 f/sec)
    1-level WT (35.7 f/sec)
Max FPGA clock of 20 MHz
Max VSP clock of 50 MHz
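For reference, the 8 x 8 block DCT listed above can be sketched in software as a separable, orthonormal DCT-II: a 1-D transform over the rows, then over the columns. This is a plain reference computation, not the co-processor's implementation, and the function names are hypothetical.

```python
# Reference 8 x 8 block DCT (orthonormal DCT-II, row-column method).
# Assumption: the slides do not describe how the co-processor computes
# the DCT; this is only the textbook definition.
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """2-D DCT of a square block: transform rows, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]
    coeffs = dct_2d(flat)
    # A constant block has all of its energy in the DC coefficient.
    print(round(coeffs[0][0], 3))
```

A smooth block concentrates its energy in a few low-frequency coefficients, which is why a transform stage is followed by quantization and run-length coding.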
16
Other Notable Implementations and Techniques
Dual FPGA, one RTR at any time
FPGA and General Processor Co-Processing
Systolic Look Up Table for transform coefficients
17
Documentation
J. Villasenor and W.H. Mangione-Smith, "Configurable Computing," Scientific American, pp. 66-71, June 1997.
J. Villasenor, C. Jones, and B. Schoner, "Video Communications using Rapidly Reconfigurable Hardware," IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, pp. 565-567, December 1995.
B. Schoner, C. Jones, and J. Villasenor, "Issues in Wireless Video Coding Using Run-time-reconfigurable FPGAs," Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, pp. 85-89, Napa, CA, Apr. 1995.
B. Schoner, J. Villasenor, S. Molloy, and R. Jain, "Techniques for FPGA Implementation of Video Compression," ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 1995.
18
Related Sites
FPGA Based Codec Site
    www.icsl.ucla.edu/~ipl
Techniques and Implementations
    www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15828-s98/www/index.html
    www.ece.cmu.edu/research/piperench/
Hardware Sites
    www.xilinx.com
    www.altera.com
19
Questions and Answers
How does an FPGA compare to a direct hardware implementation?
    Compared to today's video cards, an FPGA would be slower. I believe this is because today's semiconductor technology cannot yet make FPGA wiring and density optimal.
What is the frame rate of the UCLA video codec?
    The frame rate depends on which hardware implementation is used. In the co-processing method, the frame rate is variable (from 7 to 35 frames/second). The pure FPGA implementation runs at 20 frames/second. Although the comparison may look "funny," one must also take into account that the pure FPGA implementation is a much more simplified codec than the one used in the co-processing method.
How fast can an FPGA reconfigure itself?
    Initial design download is 1.6 ms. Global reconfiguration is 3 ms. Partial reconfiguration is 1.5 ms.