Run-Time FPGA Partial Reconfiguration for Image Processing Applications Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross.

Slides:



Advertisements
Similar presentations
2009 Midyear Workshop F4-09: Virtual Architecture and Design Automation for Partial Reconfiguration All Hands Meeting November 10th, 2009 Dr. Ann Gordon-Ross.
Advertisements

Computer Architecture (EEL4713, Fall 2013) Partial Reconfiguration Not just a half baked job of reconfiguring Rohit Kumar Research Student University of.
Reconfigurable Computing (EEL4930/5934) Partial Reconfiguration Not just a half baked job of reconfiguring Rohit Kumar Joseph Antoon Research Students.
EEL6686 Guest Lecture February 25, 2014 A Framework to Analyze Processor Architectures for Next-Generation On-Board Space Computing Tyler M. Lovelly Ph.D.
1 SECURE-PARTIAL RECONFIGURATION OF FPGAs MSc.Fisnik KRAJA Computer Engineering Department, Faculty Of Information Technology, Polytechnic University of.
HTR: On-Chip Hardware Task Relocation for Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing.
QUIZ What does ICAP stand for ? What is its main use ? Why is Partition Pin preferred over Bus Macro? 1.
A Matlab Playground for JPEG Andy Pekarske Nikolay Kolev.
Digital Signal Processing and Field Programmable Gate Arrays By: Peter Holko.
SWE 423: Multimedia Systems
Team Morphing Architecture Reconfigurable Computational Platform for Space.
Fall 2006Lecture 16 Lecture 16: Accelerator Design in the XUP Board ECE 412: Microcomputer Laboratory.
1 Performed by: Lin Ilia Khinich Fanny Instructor: Fiksman Eugene המעבדה למערכות ספרתיות מהירות High Speed Digital Systems Laboratory הטכניון - מכון טכנולוגי.
Configurable System-on-Chip: Xilinx EDK
Case Study ARM Platform-based JPEG Codec HW/SW Co-design
5. 1 JPEG “ JPEG ” is Joint Photographic Experts Group. compresses pictures which don't have sharp changes e.g. landscape pictures. May lose some of the.
1 JPEG Compression CSC361/661 Burg/Wong. 2 Fact about JPEG Compression JPEG stands for Joint Photographic Experts Group JPEG compression is used with.jpg.
Bitstream Relocation with Local Clock Domains for Partially Reconfigurable FPGAs Adam Flynn, Ann Gordon-Ross, Alan D. George NSF Center for High-Performance.
Image Compression - JPEG. Video Compression MPEG –Audio compression Lossy / perceptually lossless / lossless 3 layers Models based on speech generation.
Juanjo Noguera Xilinx Research Labs Dublin, Ireland Ahmed Al-Wattar Irwin O. Irwin O. Kennedy Alcatel-Lucent Dublin, Ireland.
Exploring the Tradeoffs of Configurability and Heterogeneity in Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable.
JPEG C OMPRESSION A LGORITHM I N CUDA Group Members: Pranit Patel Manisha Tatikonda Jeff Wong Jarek Marczewski Date: April 14, 2009.
Benefits of Partial Reconfiguration Reducing the size of the FPGA device required to implement a given function, with consequent reductions in cost and.
Introduction to JPEG Alireza Shafaei ( ) Fall 2005.
ECE472/572 - Lecture 12 Image Compression – Lossy Compression Techniques 11/10/11.
High-Level Interconnect Architectures for FPGAs Nick Barrow-Williams.
HW/SW PARTITIONING OF FLOATING POINT SOFTWARE APPLICATIONS TO FIXED - POINTED COPROCESSOR CIRCUITS - Nalini Kumar Gaurav Chitroda Komal Kasat.
DAPR: Design Automation for Partially Reconfigurable FPGAs Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross Associate.
Heng Tan Ronald Demara A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management.
FPGA (Field Programmable Gate Array): CLBs, Slices, and LUTs Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side.
Page 1 Reconfigurable Communications Processor Principal Investigator: Chris Papachristou Task Number: NAG Electrical Engineering & Computer Science.
Group No 5 1.Muhammad Talha Islam 2.Karim Akhter 3.Muhammad Arif 4.Muhammad Umer Khalid.
Codec structuretMyn1 Codec structure In an MPEG system, the DCT and motion- compensated interframe prediction are combined. The coder subtracts the motion-compensated.
8. 1 MPEG MPEG is Moving Picture Experts Group On 1992 MPEG-1 was the standard, but was replaced only a year after by MPEG-2. Nowadays, MPEG-2 is gradually.
Design Framework for Partial Run-Time FPGA Reconfiguration Chris Conger, Ann Gordon-Ross, and Alan D. George Presented by: Abelardo Jara-Berrocal HCS Research.
LZRW3 Decompressor dual semester project Part A Mid Presentation Students: Peleg Rosen Tal Czeizler Advisors: Moshe Porian Netanel Yamin
Hardware Image Signal Processing and Integration into Architectural Simulator for SoC Platform Hao Wang University of Wisconsin, Madison.
Partial Region and Bitstream Cost Models for Hardware Multitasking on Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance.
MAPLD 2005/254C. Papachristou 1 Reconfigurable and Evolvable Hardware Fabric Chris Papachristou, Frank Wolff Robert Ewing Electrical Engineering & Computer.
CS Spring 2014 CS 414 – Multimedia Systems Design Lecture 10 – Compression Basics and JPEG Compression (Part 4) Klara Nahrstedt Spring 2014.
Task Graph Scheduling for RTR Paper Review By Gregor Scott.
StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date:
4/19/20021 TCPSplitter: A Reconfigurable Hardware Based TCP Flow Monitor David V. Schuehler.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
Copyright © 2003 Texas Instruments. All rights reserved. DSP C5000 Chapter 18 Image Compression and Hardware Extensions.
A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead Heng Tan and Ronald F. DeMara University of Central Florida.
Reconfigurable Embedded Processor Peripherals Xilinx Aerospace and Defense Applications Brendan Bridgford Brandon Blodget.
FPGA Partial Reconfiguration Presented by: Abelardo Jara-Berrocal HCS Research Laboratory College of Engineering University of Florida April 10 th, 2009.
The JPEG Standard J. D. Huang Graduate Institute of Communication Engineering National Taiwan University, Taipei, Taiwan, ROC.
M. ALSAFRJALANI D. DZENITIS Runtime PR for Software Radio 2/26/2010 UFL ECE Dept 1 PARTIAL RECONFIGURATION (PR)
JPEG - JPEG2000 Isabelle Marque JPEGJPEG2000. JPEG Joint Photographic Experts Group Committe created in 1986 by: International Organization for Standardization.
VAPRES A Virtual Architecture for Partially Reconfigurable Embedded Systems Presented by Joseph Antoon Abelardo Jara-Berrocal, Ann Gordon-Ross NSF Center.
JPEG.
Introduction to JPEG m Akram Ben Ahmed
SCORES: A Scalable and Parametric Streams-Based Communication Architecture for Modular Reconfigurable Systems Abelardo Jara-Berrocal, Ann Gordon-Ross NSF.
Implementing JPEG Encoder for FPGA ECE 734 PROJECT Deepak Agarwal.
Automated Software Generation and Hardware Coprocessor Synthesis for Data Adaptable Reconfigurable Systems Andrew Milakovich, Vijay Shankar Gopinath, Roman.
An Automated Hardware/Software Co-Design
JPEG Compression What is JPEG? Motivation
CS644 Advanced Topics in Networking
JPEG Image Coding Standard
Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch
Anne Pratoomtong ECE734, Spring2002
Abelardo Jara-Berrocal Joseph Antoon Ph.D. Students
Aurelio Morales-Villanueva and Ann Gordon-Ross+
Jian Huang, Matthew Parris, Jooheung Lee, and Ronald F. DeMara
Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida
The JPEG Standard.
University of Florida, Gainesville, Florida, USA
Presentation transcript:

Run-Time FPGA Partial Reconfiguration for Image Processing Applications Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross Assistant Professor of ECE NSF CHREC Center, University of Florida

Design A RequiredDesign C Required Design B Required 2 Introduction Run-time reconfiguration is an important feature in SRAM-based FPGAs that allows changes in functionality dynamically  Enables benefits such as flexibility, hardware reuse, and reduced power consumption Drawbacks of run-time reconfiguration  Entire fabric is reconfigured even for slight design changes System execution stalls completely  Time to load a design onto the fabric from external memory (reconfiguration time) increases with bitstream size Run-time reconfiguration is enhanced by run-time partial reconfiguration (PR) which mitigate these drawbacks Configuration controller Design A Design B Design C Design ADesign BDesign C FPGA Fabric External memory Flexibility Designs loaded when required Hardware Reuse Current required design replaces old one on the same fabric Power Savings Design A, B, & C stored in external memory

3 Partial Reconfiguration (PR) Central Controlling Agent ICAP Mem controller Module A Module B Module C Static modules Reconfigurable Modules (PRMs) PRR 1 PRR 2 Static region Static modules Module: A & B Modules: C & D Module D PR allows the ability to reconfigure a portion of an FPGA dynamically by dividing the FPGA into two types of regions  Static region - contains static portion of the design (Static Modules)  Partially reconfigurable region (PRR) - loaded with a partial reconfiguration module (PRM) PR benefits in addition to full reconfiguration benefits  Only reconfigured PRR is stalled while static region and other PPRs continue operating  Smaller bitstreams sizes Reduced power consumption Reduced memory requirements Reduced time to reconfigure FPGA Fabric Example with 2 PRRs

PR is hard  Complicated design flow  Requires manual intervention and knowledge about target device  Can decrease system performance as compared to full reconfiguration if system design is not carefully considered Despite these challenges, PR can potentially prove to be beneficial for certain application types  Resource savings, flexibility, power savings, etc. Thus, it becomes necessary to explore PR for different application types  Provides valuable insights  Potentially ease PR for designers of similar application types 4 PR Challenges and Motivations HDL Synthesis Set Design Constraints Implement Static Design and PR Modules HDL Design Description Final Generated Bitstreams Merge Timing/ Placement Analysis Xilinx PR Implementation Flow Manual Steps

5 Contribution PR architecture benefits for a JPEG encoder system  JPEG encoder/decoder systems are a key enabling technology for low- power and high-performance image transmission for on-line satellite communication The JPEG encoder PR architecture provides increased flexibility as compared to a non-PR architecture Leveraging the JPEG encoder PR architecture we propose a PR architecture for a JPEG encoder/decoder (codec) system  The proposed PR codec architecture will provide significant benefits in terms of resource savings and power savings as well as flexibility Study of the PR architecture of the JPEG systems can be adapted to realize potential benefits for similar applications types

6 JPEG Encoder/Decoder Process JPEG encoding process for color images is divided into four main steps  Color Space Transformation - RGB to YCbCr  Forward Discrete Cosine Transform (FDCT)  Quantization  Entropy Encoding RGB to YCbCr FDCTQuantizer Entropy Encoder Table Specifications Compressed Image Data 8x8 data block of each color component

Table Specifications Reconstructed 8x8 data block of each color component YCbCr to RGB IDCTDequantizer Entropy Decoder Compressed Image Data 7 JPEG Encoder/Decoder Process JPEG encoding process for color images is divided into four main steps  Color Space Transformation - RGB to YCbCr  Forward Discrete Cosine Transform (FDCT)  Quantization  Entropy Encoding JPEG decoder performs these steps in reverse

MUX RAM Header Generator Control Signals 8 JPEG Encoder Architecture Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder and Byte Stuffer Pipeline Controller HOST PROG HOST DATA Host Interface DataControl Signals 8x8 Data blocks Skipping ahead to when processing is almost done…… H: 0xFF

MUX RAM Header Generator Control Signals 9 JPEG Encoder Architecture Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder and Byte Stuffer Pipeline Controller HOST PROG HOST DATA Host Interface Data Control Signals 8x8 Data blocks Encoding Complete! EOI : 0xFF

Pipeline controller module  Ability to replace with updated module  Ability to replace with different controller module JPEG Encoder PR Architecture 10 Control Signals Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals Byte Stuffer Header Generator

RGB2YCbCr and FDCT module  Ability to replace with an updated module  Ability to skip color space transformation by replacing with a module that only does DCT  Ability to replace different DCT types JPEG Encoder PR Architecture 11 Control Signals Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Byte Stuffer Header Generator

Entropy encoder modules (Zigzag, Run length, Huffman, Header Generator, Byte Stuffer)  Ability to update each individual module  Ability to employ different entropy encoding schemes  Ability to replace Huffman code tables and update header accordingly JPEG Encoder PR Architecture 12 Control Signals Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Byte Stuffer Header Generator

Quantization Module  Ability to replace with an updated module  Ability to change quantization matrix tables to control image quality JPEG Encoder PR Architecture 13 Control Signals Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Byte Stuffer Header Generator

Advantages of JPEG Encoder PR Architecture  Provides flexibility by allowing the ability to replace different modules  More interesting benefits arise when the encoder architecture is combined with a decoder architecture JPEG Encoder PR Architecture 14 Control Signals Buffer RGB2YCbCr & FDCT2DZigZagQuantizer Run Length Encoder Huffman Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Byte Stuffer Header Generator

Byte Stuffer Header Generator YCbCr2RGB & IDCT2D ReorderDequantizer Run length decoder Huffman Decoder Decoder Pipeline Controller JPEG Codec PR Architecture 15 Control Signals DEMUX RGB2YCbCr & FDCT2D ZigZagQuantizer Run length Encoder Huffman Encoder Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Buffer

RGB2YCbCr & FDCT2D ZigZagQuantizer Huffman Encoder Run length Encoder Byte Stuffer Header Generator Decoder Pipeline Controller JPEG Codec PR Architecture 16 Control Signals DEMUX Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Buffer Encoder Architecture Encoder Data Path

Run Length Decoder Byte Stuffer Header Generator Header Decoder Decoder Architecture YCbCr2RGB & IDCT2D ReorderDequantizer Huffman Decoder Decoder Pipeline Controller JPEG Codec PR Architecture 17 Control Signals DEMUX Encoder Pipeline Controller MUX RAM HOST PROG HOST DATA Host IF Data Control Signals PR Region Control Signals PR Module Buffer Byte Stripper Decoder Data Path

18 JPEG Codec PR Architecture Contd. Benefits  Resource savings Same hardware resources shared between encoder and decoder  Power savings PR module bitstreams stored in memory and loaded on demand (decoding or encoding) as opposed to both occupying actual hardware resources  Increased flexibility Encoder and decoder PR modules can be updated as needed or replaced with one of another type as per application requirements Architecture limitations  For a PR module loaded into a particular region The loaded PR module’s size and resource requirements (slices, FIFOs, BRAMs, DSPs) cannot exceed the maximum available in the PR region PR module port connections, both incoming and outgoing, cannot exceed the PR regions maximum incoming and outgoing port connections, respectively

19 Experimental Setup Software  Xilinx ISE with PR patch 12 installed Synthesize options  Optimization Goal - Speed  Optimization Effort - Normal Hardware  Xilinx Virtex-4 LX60

20 Results – Architecture Specifications Input image specifications  Color images only (3 components, RGB input)  Supported resolution 800x600 JPEG Encoder system  JPEG baseline encoding JPEG ITU-T T.81 | ISO/IEC  Standard JFIF header v 1.01 automatic generation  Design operates above 100 MHz  Hardcoded Huffman tables and two programmable quantization tables, one for luminance and one for chrominance at 50% quality settings JPEG Encoder System H T 100Mhz 50% quality reduced

21 Results: JPEG Encoder PR Architecture Floorplan FDCT Module Quantizer Module Huffman Encoder Module Zigzag Module Byte Stuffer Module Header Generation Module Run length Encoder Module Pipeline controller Module

Slice Requirements 22 Results: Resource Requirements JPEG Encoder Architecture  Total Slices = 5,531  Total DSP48s = 9  Total FIFO/RAMB16s = 27 JPEG Encoder PR Architecture  Total Slices = 5,678  Total DSP48s = 9  Total FIFO/RAMB16s = 27

23 Results: Resource Requirements JPEG Encoder Architecture  Total Slices = 5,531  Total DSP48s = 9  Total FIFO/RAMB16s = 27 JPEG Encoder PR Architecture  Total Slices = 5,678  Total DSP48s = 9  Total FIFO/RAMB16s = 27 DSP48 Requirements

FIFO/RAMB16s Requirements 24 Results: Resource Requirements JPEG Encoder Architecture  Total Slices = 5,531  Total DSP48s = 9  Total FIFO/RAMB16s = 27 JPEG Encoder PR Architecture  Total Slices = 5,678  Total DSP48s = 9  Total FIFO/RAMB16s = 27

Encoder architecture slice requirement = 5531 Predicted decoder architecture slice requirement ~ Results: PR vs Non PR Architectures

Predicted total slice requirement for non-PR codec architecture ~ Predicted PR codec architecture slice requirement ~ Results: PR vs Non-PR Architectures Slices Slice Macro Overhead : 7% or 416 Slices Slice Macro Overhead : 7% or 416 Slices

Predicted total slice requirement for non-PR codec architecture ~ Predicted PR codec architecture slice requirement ~ 6100 Predicted resource savings from PR codec architecture = (( ) * 100)/11200 = 45%!! 27 Results: PR vs Non-PR Architectures 45% Savings!!

28 Conclusions and Future Work Conclusions  We created a JPEG encoder PR architecture for image encoding  The architecture provides increased flexibility and potential power savings with a slice macro overhead of only 4%  The architecture forms a base for a JPEG codec PR architecture  The JPEG codec architecture is predicted to benefit from increased flexibility and power savings as well as area savings as much as 45% relative to a non-PR codec architecture Future Work  Complete proposed JPEG codec PR architecture  Extend work to development of PR architectures for MPEG/H.264 encoder and decoder systems

QUESTIONS? This work was supported in part by the I/UCRC Program of the National Science Foundation under Grant No. EEC We also gratefully acknowledge tools provided by Xilinx.