Published by Rosamund Cain. Modified over 9 years ago.
1
HW/SW Co-Design of an MPEG-2 Decoder
Pradeep Dhananjay Kiran Divakar Leela Kishore Kothamasu Anthony Weerasinghe
2
Outline
Objectives
Background
HW-SW Partitioning
SW/HW Design
Testing and Debug
VGA Display Driver
Results
Lessons Learned
Future Work
3
Objectives
Accelerate MPEG-2 Decoder
  – Identify bottlenecks
  – Isolate bottleneck functions and partition design
  – Convert SW functions to HW blocks
  – Design HW/SW interfaces for communication
  – Measure accelerated performance
Design VGA display driver on FPGA
  – Attempt to display decoded stream in real time
4
Background
Development Platform
  – TLL-5000 prototyping board
      ARM9, Spartan-3 FPGA, VGA DAC (ADV7125)
Source code for MPEG2 Decoder
  – Obtained from sourceforge.net
5
Background – MPEG2
Consists of a sequence of groups of pictures (GOPs)
Types of pictures
  – I-picture (intra coded)
  – P-picture (forward predicted)
  – B-picture (bidirectionally predicted)
6
Background – MPEG2
7
HW-SW Partitioning
Linux profiling done to determine critical functions
  – Results based on a particular input (MPEG file)
  – Assumed to be representative of a typical use case
  – Profiling done on x86 Linux as well as on the board
      gmon.out generated on board
8
Profiling on x86-Linux
9
Profiling on ARM-Linux
10
HW-SW Partitioning
11
SW Design
IDCT function uses pointers to access an input array
  – Not suitable for synthesis by Catapult-C
  – Converted all pointer accesses to array accesses
IDCT performs non-sequential accesses with varying stride
  – Modified caller of the IDCT function to re-organize access pattern into sequential form
  – Created temporary array, which is passed to function
  – Return array from function is re-distributed to correct locations
Changes to software verified using golden code
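The caller-side change described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `idct_block` is a hypothetical stand-in that performs placeholder arithmetic rather than a real IDCT, and the names `idct_strided`, `base`, and `stride` are assumptions chosen for the example.

```c
#include <stddef.h>

#define BLK 8  /* 8x8 block, the size an MPEG-2 IDCT operates on */

/* Hypothetical stand-in for the IDCT. It only needs to accept a
 * contiguous 64-entry array, which is the property being illustrated. */
static void idct_block(short blk[BLK * BLK])
{
    for (int i = 0; i < BLK * BLK; i++)
        blk[i] = (short)(blk[i] * 2);   /* placeholder arithmetic, not an IDCT */
}

/* Gather a strided 8x8 block into a sequential temporary array, run the
 * transform on it, then scatter the results back to their original places. */
void idct_strided(short *frame, size_t base, size_t stride)
{
    short tmp[BLK * BLK];

    for (int r = 0; r < BLK; r++)
        for (int c = 0; c < BLK; c++)
            tmp[r * BLK + c] = frame[base + r * stride + c];

    idct_block(tmp);

    for (int r = 0; r < BLK; r++)
        for (int c = 0; c < BLK; c++)
            frame[base + r * stride + c] = tmp[r * BLK + c];
}
```

The transform itself only ever sees a flat, sequentially indexed array, which is the form a high-level synthesis tool can map to a simple memory interface.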
12
SW Flow Chart
MPEG2 SW code reaches the IDCT function call
Create temporary buffer
Pass input values in temporary buffer to FPGA memory
Issue Start command to FPGA
Wait for interrupt
  – IDCT does computation and stores data back in FPGA memory
  – Generates interrupt signal after computation is done
Reads values from FPGA memory to temporary buffer
Stores values from temp buffer back to original array in order
MPEG2 SW code continues
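The flow above can be mimicked in plain C with a simulated device, which may help when reasoning about the sequence without hardware. Everything here is hypothetical: `struct fake_fpga`, `fpga_start`, and `idct_offload` are illustrative names, the "computation" is placeholder arithmetic, and the real system would use memory-mapped accesses and a true interrupt rather than a flag.

```c
#include <stdbool.h>
#include <string.h>

#define BLK_WORDS 64

/* Simulated FPGA. In the real system these would be memory-mapped
 * locations and an interrupt line; all names here are hypothetical. */
struct fake_fpga {
    short mem[BLK_WORDS];
    volatile bool irq;   /* set when the computation is done */
};

/* Stand-in for the hardware block: placeholder math, then raise irq. */
static void fpga_start(struct fake_fpga *dev)
{
    for (int i = 0; i < BLK_WORDS; i++)
        dev->mem[i] = (short)(dev->mem[i] + 1);
    dev->irq = true;
}

/* Software side of the flow: copy in, start, wait, copy out. */
void idct_offload(struct fake_fpga *dev, short block[BLK_WORDS])
{
    short tmp[BLK_WORDS];

    memcpy(tmp, block, sizeof tmp);      /* create temporary buffer          */
    memcpy(dev->mem, tmp, sizeof tmp);   /* pass input values to FPGA memory */
    dev->irq = false;
    fpga_start(dev);                     /* issue Start command              */
    while (!dev->irq)                    /* wait for interrupt               */
        ;
    memcpy(tmp, dev->mem, sizeof tmp);   /* read values back from FPGA       */
    memcpy(block, tmp, sizeof tmp);      /* store back to the original array */
}
```

Each call makes four block copies around a single start/wait pair, so the per-call cost is dominated by data movement whenever the computation itself is short.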
13
HW Design
Mentor Catapult-C Synthesis Tool
  – High-level synthesis from C/C++ to Verilog RTL
14
HW Design
High-Level Synthesis
  – Tool schedules operations on a cycle-by-cycle basis
  – Constrained to available resources
      Uses target device and library information
  – Builds RTL as an interface + controller + datapath
15
Example: Y = A*C + B*D
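As a rough illustration of why resource constraints matter for this expression, the sketch below writes it as a C function and adds a toy cycle estimate. The estimate is an assumption made for the example (every multiply takes one cycle, followed by one cycle for the add); it is not how Catapult-C actually schedules, and the names `mac2` and `toy_cycles` are hypothetical.

```c
/* The expression from the slide as a plain C function, the kind of
 * input a high-level synthesis tool starts from. */
int mac2(int a, int b, int c, int d)
{
    return a * c + b * d;
}

/* Toy cycle estimate: the two multiplies are packed onto the available
 * multipliers, then one further cycle performs the add. This is a
 * simplification, not a real scheduling algorithm. */
int toy_cycles(int n_multipliers)
{
    const int n_mults = 2;
    int mult_cycles = (n_mults + n_multipliers - 1) / n_multipliers; /* ceiling */
    return mult_cycles + 1;
}
```

Under this toy model, one multiplier gives three cycles and two multipliers give two, which is the trade-off between area and latency that the scheduler resolves.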
19
HW Design
Code conversion for synthesis
  – Isolate IDCT function from MPEG2 code
  – Merge initialization functions
      One initialization construct was needed
  – Remove all global variables
      Few dependencies for the IDCT function
  – Convert pointer arithmetic to array offsets
      Most work needed for this conversion
      No standard guidelines available
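A small before-and-after pair may make the pointer-to-offset conversion concrete. Both functions are hypothetical and use placeholder arithmetic instead of the real IDCT row pass; the point is only the change in addressing style.

```c
/* Pointer-arithmetic style, typical of hand-optimized software:
 * the caller passes a pointer into the middle of the block. */
void scale_row_ptr(short *blk)
{
    short *end = blk + 8;
    while (blk < end) {
        *blk = (short)(*blk * 2);
        blk++;
    }
}

/* Same behaviour with a fixed-size array and an explicit offset,
 * so every access is an index into one statically sized array. */
void scale_row_idx(short blk[64], int row)
{
    for (int i = 0; i < 8; i++)
        blk[8 * row + i] = (short)(blk[8 * row + i] * 2);
}
```

Calling `scale_row_ptr(blk + 8 * row)` and `scale_row_idx(blk, row)` should leave the block in the same state, which is the sort of equivalence a C testbench against golden code can check.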
20
HW Design Pointer conversions
21
HW Design Hardware Interface
22
HW Design
Verifying isolated IDCT function in C and RTL
  – C testbench written to test isolated IDCT function
  – Catapult-C allows testing of C function vs. RTL
      Ensures RTL generation matches expected behavior
      Un-converted pointer code generated wrong RTL
23
HW Design
Integration with communication interface
  – Communication FSM given
  – Integrate IDCT block
24
Problems Faced
IDCT RTL would not synthesize at 66 MHz
  – 27 MHz clock used instead
IDCT code takes ~30 minutes to synthesize
  – Inefficiency of using Catapult-C to generate code
Catapult code difficult to debug
Some reads not returning correct values
  – Read/write alignment
  – Synthesis could be a problem
25
Debug Techniques
Removed IDCT block for fast synthesis
  – Used to check interface memory writes
  – Showed 16-bit writes were not successful
Routed state bits to board LEDs
  – Helpful when program hangs due to lack of DTACK
  – OR'd DTACK with DIP switch to prevent hang
printf and printk statements used to check addresses and data being sent
26
Delay Values
Hardware delay
  – Approximately 10 μs to compute IDCT
      Based on cycle count provided by Catapult-C and 27 MHz clock frequency of FPGA
Pure software implementation
  – Approximately 30 μs
Overhead for communication
  – ~15000 μs
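The figures above can be combined into a quick back-of-the-envelope check. The sketch assumes the communication overhead is paid on every IDCT call, which the slides suggest but do not state outright; the function names are hypothetical.

```c
/* Figures quoted on the slide. */
#define HW_COMPUTE_US     10.0
#define SW_COMPUTE_US     30.0
#define COMM_OVERHEAD_US  15000.0
#define FPGA_CLOCK_MHZ    27.0

/* Microseconds times MHz gives clock cycles. */
double hw_cycles(void)
{
    return HW_COMPUTE_US * FPGA_CLOCK_MHZ;
}

/* Cost of one offloaded call, assuming the overhead applies per call. */
double offload_total_us(void)
{
    return HW_COMPUTE_US + COMM_OVERHEAD_US;
}

/* How many times slower the offloaded call is than pure software. */
double slowdown_vs_sw(void)
{
    return offload_total_us() / SW_COMPUTE_US;
}
```

With these numbers the hardware computation corresponds to about 270 cycles at 27 MHz, and an offloaded call costs about 15010 μs against 30 μs in software, roughly 500 times slower despite the 3x faster computation.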
27
VGA Display Block Diagram
[Block diagram. Labels: ARM (VGA application, driver, generated ppm files); FPGA (VGA controller, main FSM, RAM 1, RAM 2); on board (ADV7125); VGA monitor.]
28
VGA Hardware: ADV7125 Video DAC
ADV7125 has triple 8-bit video DACs
VGA DAC requires 8-bit R, G, B values
Needs H-Sync and V-Sync signals
29
VGA Controller
Used double buffer to store frame data
  – FIFO implementation didn't work
      ARM cannot keep up with the display data rate requirement
  – Frame resolution: 64×48
  – Each frame transfer requires 3072 words
  – Used 12 KB of RAM to implement double buffer
One full frame transferred with a single driver call
  – Reduces system call overhead
  – Each call overhead ~26 μs
Interrupt used to communicate to user application
  – Fills the next buffer
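A software model of the double-buffer scheme might look like the sketch below. The 16-bit word size is an assumption: it is what makes two 3072-word buffers add up to 12 KB, but the slides do not state the word width. All type and function names are hypothetical.

```c
#include <stdint.h>

#define FRAME_W     64
#define FRAME_H     48
#define FRAME_WORDS (FRAME_W * FRAME_H)   /* 3072 words per frame */

/* Two frame buffers: one is displayed while the other is filled.
 * A 16-bit word is an assumption; it gives 2 x 3072 x 2 bytes = 12 KB. */
struct double_buffer {
    uint16_t buf[2][FRAME_WORDS];
    int front;                            /* index of the displayed buffer */
};

/* Write one full frame into the back buffer (one "driver call"). */
void write_frame(struct double_buffer *db, const uint16_t *frame)
{
    uint16_t *back = db->buf[1 - db->front];
    for (int i = 0; i < FRAME_WORDS; i++)
        back[i] = frame[i];
}

/* Swap roles. In hardware this would coincide with the interrupt that
 * tells the application to fill the next buffer. */
void swap_buffers(struct double_buffer *db)
{
    db->front = 1 - db->front;
}

/* Buffer the display side should currently read from. */
const uint16_t *front_buffer(const struct double_buffer *db)
{
    return db->buf[db->front];
}
```

Because the writer only ever touches the back buffer, the display side can read a complete frame at its own rate, which is what a FIFO could not guarantee when the ARM fell behind.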
30
VGA Display Demonstration
31
Lessons Learned
Debugging on an FPGA is difficult!
Hand-conversion of C code could have been more efficient
Create test bench to simulate ARM-FPGA communication
  – Allows quick debug of FPGA hardware
  – Visibility into internal signals
Hardware partition should have high computation-to-communication ratio
  – IDCT called many times with small computation time
  – ~10 μs of computation; ~15000 μs of communication
32
Future Work
Fix erroneous reads from IDCT
Integrate VGA display driver and MPEG2 Decoder
33
Thank you!