Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida

Slides:



Advertisements
Similar presentations
2009 Midyear Workshop F4-09: Virtual Architecture and Design Automation for Partial Reconfiguration All Hands Meeting November 10th, 2009 Dr. Ann Gordon-Ross.
Advertisements

Computer Architecture (EEL4713, Fall 2013) Partial Reconfiguration Not just a half baked job of reconfiguring Rohit Kumar Research Student University of.
Run-Time FPGA Partial Reconfiguration for Image Processing Applications Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross.
1 SECURE-PARTIAL RECONFIGURATION OF FPGAs MSc.Fisnik KRAJA Computer Engineering Department, Faculty Of Information Technology, Polytechnic University of.
HTR: On-Chip Hardware Task Relocation for Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing.
QUIZ What does ICAP stand for ? What is its main use ? Why is Partition Pin preferred over Bus Macro? 1.
A Matlab Playground for JPEG Andy Pekarske Nikolay Kolev.
Digital Signal Processing and Field Programmable Gate Arrays By: Peter Holko.
SWE 423: Multimedia Systems
Team Morphing Architecture Reconfigurable Computational Platform for Space.
JPEG.
Configurable System-on-Chip: Xilinx EDK
Case Study ARM Platform-based JPEG Codec HW/SW Co-design
5. 1 JPEG “ JPEG ” is Joint Photographic Experts Group. compresses pictures which don't have sharp changes e.g. landscape pictures. May lose some of the.
Dynamic Hardware Software Partitioning A First Approach Komal Kasat Nalini Kumar Gaurav Chitroda.
1 JPEG Compression CSC361/661 Burg/Wong. 2 Fact about JPEG Compression JPEG stands for Joint Photographic Experts Group JPEG compression is used with.jpg.
Bitstream Relocation with Local Clock Domains for Partially Reconfigurable FPGAs Adam Flynn, Ann Gordon-Ross, Alan D. George NSF Center for High-Performance.
Image Compression - JPEG. Video Compression MPEG –Audio compression Lossy / perceptually lossless / lossless 3 layers Models based on speech generation.
Juanjo Noguera Xilinx Research Labs Dublin, Ireland Ahmed Al-Wattar Irwin O. Irwin O. Kennedy Alcatel-Lucent Dublin, Ireland.
Exploring the Tradeoffs of Configurability and Heterogeneity in Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable.
JPEG C OMPRESSION A LGORITHM I N CUDA Group Members: Pranit Patel Manisha Tatikonda Jeff Wong Jarek Marczewski Date: April 14, 2009.
Benefits of Partial Reconfiguration Reducing the size of the FPGA device required to implement a given function, with consequent reductions in cost and.
Introduction to JPEG Alireza Shafaei ( ) Fall 2005.
ECE472/572 - Lecture 12 Image Compression – Lossy Compression Techniques 11/10/11.
DAPR: Design Automation for Partially Reconfigurable FPGAs Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross Associate.
Heng Tan Ronald Demara A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management.
JPEG. The JPEG Standard JPEG is an image compression standard which was accepted as an international standard in  Developed by the Joint Photographic.
Page 1 Reconfigurable Communications Processor Principal Investigator: Chris Papachristou Task Number: NAG Electrical Engineering & Computer Science.
Group No 5 1.Muhammad Talha Islam 2.Karim Akhter 3.Muhammad Arif 4.Muhammad Umer Khalid.
8. 1 MPEG MPEG is Moving Picture Experts Group On 1992 MPEG-1 was the standard, but was replaced only a year after by MPEG-2. Nowadays, MPEG-2 is gradually.
Design Framework for Partial Run-Time FPGA Reconfiguration Chris Conger, Ann Gordon-Ross, and Alan D. George Presented by: Abelardo Jara-Berrocal HCS Research.
Hardware Image Signal Processing and Integration into Architectural Simulator for SoC Platform Hao Wang University of Wisconsin, Madison.
Partial Region and Bitstream Cost Models for Hardware Multitasking on Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance.
MAPLD 2005/254C. Papachristou 1 Reconfigurable and Evolvable Hardware Fabric Chris Papachristou, Frank Wolff Robert Ewing Electrical Engineering & Computer.
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 10 – Compression Basics and JPEG Compression (Part 4) Klara Nahrstedt Spring 2014.
Task Graph Scheduling for RTR Paper Review By Gregor Scott.
StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date:
4/19/20021 TCPSplitter: A Reconfigurable Hardware Based TCP Flow Monitor David V. Schuehler.
Copyright © 2003 Texas Instruments. All rights reserved. DSP C5000 Chapter 18 Image Compression and Hardware Extensions.
FPGA Partial Reconfiguration Presented by: Abelardo Jara-Berrocal HCS Research Laboratory College of Engineering University of Florida April 10 th, 2009.
The JPEG Standard J. D. Huang Graduate Institute of Communication Engineering National Taiwan University, Taipei, Taiwan, ROC.
M. ALSAFRJALANI D. DZENITIS Runtime PR for Software Radio 2/26/2010 UFL ECE Dept 1 PARTIAL RECONFIGURATION (PR)
VAPRES A Virtual Architecture for Partially Reconfigurable Embedded Systems Presented by Joseph Antoon Abelardo Jara-Berrocal, Ann Gordon-Ross NSF Center.
JPEG.
Introduction to JPEG m Akram Ben Ahmed
SCORES: A Scalable and Parametric Streams-Based Communication Architecture for Modular Reconfigurable Systems Abelardo Jara-Berrocal, Ann Gordon-Ross NSF.
JPEG. Introduction JPEG (Joint Photographic Experts Group) Basic Concept Data compression is performed in the frequency domain. Low frequency components.
Implementing JPEG Encoder for FPGA ECE 734 PROJECT Deepak Agarwal.
Automated Software Generation and Hardware Coprocessor Synthesis for Data Adaptable Reconfigurable Systems Andrew Milakovich, Vijay Shankar Gopinath, Roman.
Runtime Temporal Partitioning Assembly to Reduce FPGA Reconfiguration Time Abelardo Jara-Berrocal, Ann Gordon-Ross HCS Research Laboratory College of Engineering.
Lab 4 HW/SW Compression and Decompression of Captured Image
An Automated Hardware/Software Co-Design
JPEG Compression What is JPEG? Motivation
High Speed Video Compression/Decompression Pipeline
Dynamo: A Runtime Codesign Environment
School of Engineering University of Guelph
CS644 Advanced Topics in Networking
JPEG Image Coding Standard
JPEG.
Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch
Anne Pratoomtong ECE734, Spring2002
Abelardo Jara-Berrocal Joseph Antoon Ph.D. Students
XC4000E Series Xilinx XC4000 Series Architecture 8/98
Aurelio Morales-Villanueva and Ann Gordon-Ross+
VLIW DSP vs. SuperScalar Implementation of a Baseline H.263 Encoder
Jian Huang, Matthew Parris, Jooheung Lee, and Ronald F. DeMara
JPEG Still Image Data Compression Standard
The JPEG Standard.
University of Florida, Gainesville, Florida, USA
Presentation transcript:

Run-Time FPGA Partial Reconfiguration for Image Processing Applications Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross Assistant Professor of ECE

Introduction Run-time reconfiguration is an important feature in SRAM-based FPGAs that allows changes in functionality dynamically Enables benefits such as flexibility, hardware reuse and reduced power consumption Drawbacks of run-time reconfiguration Entire fabric is reconfigured even for slight design changes System execution stalls completely Time to load a design onto the fabric from external memory (reconfiguration time) increases with bitstream size Run-time reconfiguration is enhanced by run-time partial reconfiguration (PR) which mitigate these drawbacks Design B Design A Design C Flexibility Designs loaded when required Hardware Reuse Current required design replaces old one on the same fabric Design A Power Savings Design A, B, & C stored in external memory Est time for slide: 0:45 min Total = 0:45 min Configuration controller Design B Design C Design C Required Design B Required Design A Required External memory FPGA Fabric

Partial Reconfiguration (PR) PR allows the ability to reconfigure a portion of an FPGA dynamically by dividing the FPGA into two types of regions Static region - contains static portion of the design (Static Modules) Partially reconfigurable region (PRR) - loaded with a partial reconfiguration module (PRM) PR benefits in addition to full reconfiguration benefits Only reconfigured PRR is stalled while static region and other PPRs continue operating Smaller bitstreams sizes Reduced power consumption Reduced memory requirements Reduced time to reconfigure Central Controlling Agent ICAP Mem controller Module A Module B Module C Module D Static modules Reconfigurable Modules (PRMs) Module: A & B PRR 1 Est time for slide: 0:45 min Total = 1:30 min Static modules Static region Modules: C & D PRR 2 Example with 2 PRRs FPGA Fabric

PR Challenges and Motivations PR is hard Complicated design flow Requires manual intervention and knowledge about target device Can decrease system performance as compared to full reconfiguration if system design is not carefully considered Despite these challenges, PR can potentially prove to be beneficial for certain application types Resource savings, flexibility, power savings, etc. Thus, it becomes necessary to explore PR for different application types Provides valuable insights Potentially ease PR for designers of similar application types Manual Steps Xilinx PR Implementation Flow HDL Design Description HDL Synthesis Set Design Constraints Timing/ Placement Analysis Implement Static Design and PR Modules Est time for slide: 2:00 min Total = 3:30 min Merge Final Generated Bitstreams

Contribution PR architecture benefits for a JPEG encoder system JPEG encoder/decoder systems are a key enabling technology for low-power and high-performance image transmission for on-line satellite communication The JPEG encoder PR architecture provides increased flexibility as compared to a non-PR architecture Leveraging the JPEG encoder PR architecture we propose a PR architecture for a JPEG encoder/decoder (codec) system The proposed PR codec architecture will provide significant benefits in terms of resource savings and power savings as well as flexibility Study of the PR architecture of the JPEG systems can be adapted to realize potential benefits for similar applications types Est time for slide: 0:30 min Total = 4:00 min

JPEG Encoder/Decoder Process JPEG encoding process for color images is divided into four main steps Color Space Transformation - RGB to YCbCr Forward Discrete Cosine Transform (FDCT) Quantization Entropy Encoding 8x8 data block of each color component Est time for slide: 2:00 min Total = 6:00 min Table Specifications Table Specifications Compressed Image Data RGB to YCbCr FDCT Quantizer Entropy Encoder 6

JPEG Encoder/Decoder Process JPEG encoding process for color images is divided into four main steps Color Space Transformation - RGB to YCbCr Forward Discrete Cosine Transform (FDCT) Quantization Entropy Encoding JPEG decoder performs these steps in reverse Reconstructed 8x8 data block of each color component Est time for slide: 2:00 min Total = 6:00 min Table Specifications Table Specifications Compressed Image Data YCbCr to RGB IDCT Dequantizer Entropy Decoder 7

JPEG Encoder Architecture HOST DATA HOST PROG RAM 8x8 Data blocks Control Signals Host Interface MUX Data Control Signals Control Signals Header Generator H: 0xFF Buffer Pipeline Controller Est time for slide: 2:00 min Total = 8:00 min RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder and Byte Stuffer Skipping ahead to when processing is almost done…… 8 8

JPEG Encoder Architecture HOST DATA HOST PROG RAM 8x8 Data blocks Control Signals Data Control Signals Host Interface MUX Control Signals Buffer Header Generator EOI : 0xFF Pipeline Controller Est time for slide: 2:00 min Total = 8:00 min RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder and Byte Stuffer Encoding Complete! 9 9

JPEG Encoder PR Architecture Pipeline controller module Ability to replace with updated module Ability to replace with different controller module HOST DATA HOST PROG RAM Control Signals Data Control Signals Host IF Est time for slide: 1:30 min Total = 9:30 min MUX Byte Stuffer Control Signals Buffer Pipeline Controller Header Generator RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder

JPEG Encoder PR Architecture RGB2YCbCr and FDCT module Ability to replace with an updated module Ability to skip color space transformation by replacing with a module that only does DCT Ability to replace different DCT types HOST DATA HOST PROG RAM PR Region Control Signals PR Module Data Control Signals Host IF Est time for slide: 1:30 min Total = 9:30 min MUX Byte Stuffer Control Signals Buffer Pipeline Controller Header Generator RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder 11

JPEG Encoder PR Architecture Entropy encoder modules (Zigzag, Run length, Huffman, Header Generator, Byte Stuffer) Ability to update each individual module Ability to employ different entropy encoding schemes Ability to replace Huffman code tables and update header accordingly HOST DATA HOST PROG RAM PR Region Control Signals PR Module Data Control Signals Host IF Est time for slide: 1:30 min Total = 9:30 min MUX Byte Stuffer Control Signals Buffer Pipeline Controller Header Generator RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder 12

JPEG Encoder PR Architecture Quantization Module Ability to replace with an updated module Ability to change quantization matrix tables to control image quality HOST DATA HOST PROG RAM PR Region Control Signals PR Module Data Control Signals Host IF Est time for slide: 1:30 min Total = 9:30 min MUX Byte Stuffer Control Signals Buffer Pipeline Controller Header Generator RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder 13

JPEG Encoder PR Architecture Advantages of JPEG Encoder PR Architecture Provides flexibility by allowing the ability to replace different modules More interesting benefits arise when the encoder architecture is combined with a decoder architecture HOST DATA HOST PROG RAM PR Region Control Signals PR Module Data Control Signals Host IF Est time for slide: 1:30 min Total = 9:30 min MUX Byte Stuffer Control Signals Buffer Pipeline Controller Header Generator RGB2YCbCr & FDCT2D ZigZag Quantizer Run Length Encoder Huffman Encoder 14

JPEG Codec PR Architecture HOST DATA PR Region HOST PROG RAM PR Module Control Signals Data Control Signals Host IF MUX Control Signals Buffer Byte Stuffer Encoder Pipeline Controller Decoder Pipeline Controller Est time for slide: 1:00 min Total = 10:00 min DEMUX Header Generator RGB2YCbCr & FDCT2D YCbCr2RGB & IDCT2D Reorder ZigZag Quantizer Dequantizer Run length Encoder Run length decoder Huffman Encoder Huffman Decoder 15

JPEG Codec PR Architecture Encoder Architecture Encoder Data Path HOST DATA PR Region HOST PROG RAM PR Module Control Signals Control Signals Host IF MUX Data Control Signals Buffer Byte Stuffer Decoder Pipeline Controller Encoder Pipeline Controller Est time for slide: 1:00 min Total = 10:00 min DEMUX Header Generator RGB2YCbCr & FDCT2D ZigZag Quantizer Run length Encoder Huffman Encoder 16

JPEG Codec PR Architecture Decoder Architecture Decoder Data Path HOST DATA PR Region HOST PROG RAM PR Module Control Signals Control Signals Host IF MUX Data Control Signals Buffer Byte Stripper Byte Stuffer Encoder Pipeline Controller Decoder Pipeline Controller Est time for slide: 1:00 min Total = 10:00 min DEMUX Header Generator Header Decoder YCbCr2RGB & IDCT2D Reorder Dequantizer Run Length Decoder Huffman Decoder 17

JPEG Codec PR Architecture Contd. Benefits Resource savings Same hardware resources shared between encoder and decoder Power savings PR module bitstreams stored in memory and loaded on demand (decoding or encoding) as opposed to both occupying actual hardware resources Increased flexibility Encoder and decoder PR modules can be updated as needed or replaced with one of another type as per application requirements Architecture limitations For a PR module loaded into a particular region The loaded PR module’s size and resource requirements (slices, FIFOs, BRAMs, DSPs) cannot exceed the maximum available in the PR region PR module port connections, both incoming and outgoing, cannot exceed the PR regions maximum incoming and outgoing port connections, respectively Est time for slide: 1:00 min Total = 10:00 min 18

Experimental Setup Software Hardware Xilinx ISE 9.204 with PR patch 12 installed Synthesize options Optimization Goal - Speed Optimization Effort - Normal Hardware Xilinx Virtex-4 LX60 Est time for slide: 0:20 min Total = 10:50 min

Results – Architecture Specifications Input image specifications Color images only (3 components, RGB input) Supported resolution 800x600 JPEG Encoder system JPEG baseline encoding JPEG ITU-T T.81 | ISO/IEC 10918-1 Standard JFIF header v 1.01 automatic generation Design operates above 100 MHz Hardcoded Huffman tables and two programmable quantization tables, one for luminance and one for chrominance at 50% quality settings 50% quality reduced 100Mhz Est time for slide: 0:40 min Total = 11:30 min T H JPEG Encoder System 20 20

Results: JPEG Encoder PR Architecture Floorplan Quantizer Module Pipeline controller Module FDCT Module Huffman Encoder Module Est time for slide: 1:00 min Total = 12:30 min Zigzag Module Byte Stuffer Module Header Generation Module Run length Encoder Module 21

Results: Resource Requirements JPEG Encoder Architecture Total Slices = 5,531 Total DSP48s = 9 Total FIFO/RAMB16s = 27 JPEG Encoder PR Architecture Total Slices = 5,678 Est time for slide: 1:00 min Total = 13:30 min Slice Requirements 22 22

Results: Resource Requirements JPEG Encoder Architecture Total Slices = 5,531 Total DSP48s = 9 Total FIFO/RAMB16s = 27 JPEG Encoder PR Architecture Total Slices = 5,678 Est time for slide: 1:00 min Total = 13:30 min DSP48 Requirements 23 23

Results: Resource Requirements JPEG Encoder Architecture Total Slices = 5,531 Total DSP48s = 9 Total FIFO/RAMB16s = 27 JPEG Encoder PR Architecture Total Slices = 5,678 Est time for slide: 1:00 min Total = 13:30 min FIFO/RAMB16s Requirements 24 24

Results: PR vs Non PR Architectures Encoder architecture slice requirement = 5531 Predicted decoder architecture slice requirement ~ 5600 Est time for slide: 1:00 min Total = 14:30 min Put 11,200 in graph 25

Results: PR vs Non-PR Architectures Predicted total slice requirement for non-PR codec architecture ~ 11200 Predicted PR codec architecture slice requirement ~ 6100 11200 Slices Slice Macro Overhead : 7% or 416 Slices Est time for slide: 1:00 min Total = 14:30 min Put 11,200 in graph 26

Results: PR vs Non-PR Architectures Predicted total slice requirement for non-PR codec architecture ~ 11200 Predicted PR codec architecture slice requirement ~ 6100 Predicted resource savings from PR codec architecture = ((11200-6100) * 100)/11200 = 45%!! 45% Savings!! Est time for slide: 1:00 min Total = 14:30 min Put 11,200 in graph 27

Conclusions and Future Work We created a JPEG encoder PR architecture for image encoding The architecture provides increased flexibility and potential power savings with a slice macro overhead of only 4% The architecture forms a base for a JPEG codec PR architecture The JPEG codec architecture is predicted to benefit from increased flexibility and power savings as well as area savings as much as 45% relative to a non-PR codec architecture Future Work Complete proposed JPEG codec PR architecture Extend work to development of PR architectures for MPEG/H.264 encoder and decoder systems Est time for slide: 0:30 min Total = 15:00 min

QUESTIONS? This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC-0642422. We also gratefully acknowledge tools provided by Xilinx.