Computer Architecture (EEL4713, Fall 2013) Partial Reconfiguration Not just a half baked job of reconfiguring Rohit Kumar Research Student University of.


Computer Architecture (EEL4713, Fall 2013) Partial Reconfiguration Not just a half baked job of reconfiguring Rohit Kumar Research Student University of Florida Dr. Ann Gordon-Ross Associate Professor of ECE University of Florida

Partial Reconfiguration is All Around Us
Changing situations require part of the system to reconfigure on the fly.

Partial Reconfiguration is All Around Us
But conventional FPGA reconfiguration is disruptive:
- It resets the device
- All data is lost
- It causes downtime
Downtime is dangerous.

Full Reconfiguration
[figure: swapping between Task 1 and Task 2 on the device; labels include a static region]

Why Partial Reconfiguration?
"So what? I'll just put both tasks on the same device!" Sure, why not? But devices have limited space. ("Not impressed.")
[figure: an FPGA that would have to hold Task 1 through Task 6 at once]
Reason #1: Sharing many tasks on a single region saves area.
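A back-of-the-envelope way to see Reason #1: without PR, every task needs its own circuitry, so the areas add; with PR, mutually exclusive tasks time-share one region that only has to fit the largest of them. The sketch below assumes six mutually exclusive tasks with made-up LUT counts and ignores PR overheads such as the reconfiguration controller and bitstream storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    luts: int  # hypothetical resource cost, in LUTs


def static_area(tasks):
    """No PR: every task is placed side by side, so the areas add up."""
    return sum(t.luts for t in tasks)


def pr_area(tasks):
    """PR: mutually exclusive tasks time-share one reconfigurable region,
    which only has to be as large as the biggest task."""
    return max(t.luts for t in tasks)


# Hypothetical LUT counts for Task 1 .. Task 6
tasks = [Task(f"Task {i}", luts)
         for i, luts in enumerate([1200, 900, 1500, 800, 1100, 1000], start=1)]

if __name__ == "__main__":
    print("without PR:", static_area(tasks))  # 6500 LUTs
    print("with PR:   ", pr_area(tasks))      # 1500 LUTs
```

The saving only holds if the tasks really are mutually exclusive; tasks that must run concurrently still need separate regions.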

Why Partial Reconfiguration?
Reason #2: Using less area on a smaller device is less costly.

Why Partial Reconfiguration?
("Man, what a buzz-kill.")
Reason #3: Replace tasks with low-power versions when possible.

Why Partial Reconfiguration?
"So what? I'll just use clock gating (CG) and dynamic frequency scaling (DFS), both of which are available for Xilinx FPGAs." Right… well… you see… actually… ("Hmm…" "Shut up.")

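Why Partial Reconfiguration?
But FPGA configuration memory uses SRAM, which is susceptible to radiation-induced upsets in harsh environments.
Reason #4: PR keeps circuits safe in harsh environments, because corrupted regions can be rewritten without taking down the whole device.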
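One common way PR is used against SRAM upsets is configuration scrubbing: read back configuration frames, detect those that no longer match a known-good ("golden") copy, and rewrite only those frames while the rest of the device keeps running. The sketch below models configuration memory as a Python list of byte-string frames; the frame layout and the use of CRC-32 for comparison are simplifying assumptions, not a specific device's readback mechanism.

```python
import zlib


def scrub(config_memory, golden_frames):
    """Rewrite every frame whose CRC-32 differs from the golden copy.

    config_memory: list of bytes objects (the live configuration frames)
    golden_frames: list of bytes objects (known-good copy, same length)
    Returns the indices of the frames that were repaired.
    """
    golden_crcs = [zlib.crc32(f) for f in golden_frames]
    repaired = []
    for i, frame in enumerate(config_memory):
        if zlib.crc32(frame) != golden_crcs[i]:
            config_memory[i] = golden_frames[i]  # partial rewrite of one frame only
            repaired.append(i)
    return repaired


if __name__ == "__main__":
    golden = [bytes([i] * 4) for i in range(4)]
    live = list(golden)
    live[2] = bytes([2, 2, 3, 2])  # simulated single-bit upset in frame 2
    print(scrub(live, golden))     # [2]
    print(live == golden)          # True
```

Only the damaged frame is rewritten; frames belonging to other logic are left alone, which is the property PR provides.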

So you wanna make a PR design…
First, we make partitions:
- Partitions are like black boxes
- They start out empty
Then we load modules:
- Modules run tasks
- To change tasks, load a new module; the old one is overwritten
[figure: the FPGA (not to scale) with Partition 1 and Partition 2, loaded with modules a, b, and f]
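The partition/module vocabulary maps naturally onto a tiny object model. In this sketch a partition is an empty slot, a module is any Python callable, and loading a new module overwrites the old one; `module_a` and `module_b` are hypothetical stand-ins for real hardware modules.

```python
class Partition:
    """A reconfigurable partition: a black box that starts out empty."""

    def __init__(self, name):
        self.name = name
        self.module = None  # empty until a module is loaded

    def load(self, module):
        """Load a new module; the previous one is overwritten (and returned)."""
        previous = self.module
        self.module = module
        return previous

    def run(self, *args):
        """Run the task implemented by whichever module is currently loaded."""
        if self.module is None:
            raise RuntimeError(f"{self.name} is empty")
        return self.module(*args)


def module_a(x):  # hypothetical module
    return x + 1


def module_b(x):  # hypothetical module
    return x * 2
```

For example, after `p = Partition("Partition 1")` and `p.load(module_a)`, `p.run(3)` returns 4; calling `p.load(module_b)` replaces it, and `p.run(3)` then returns 6.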

So you wanna make a PR design…
Modules have to fit like puzzle pieces:
- Black boxes have a defined interface
- All modules must fit that interface
Where the ports are matters as well:
- Ports must be in the same place for every module
- "Partition pins" are port location definitions
- They ensure connections are not broken during PR
[figure: the FPGA (not to scale) with Partition 1 and Partition 2]
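The two "puzzle piece" rules (same interface, same port locations) can be expressed as a simple compatibility check. This is an illustrative model only: the `Port` fields and the (x, y) partition-pin sites are hypothetical, and real tools place partition pins themselves rather than reading them from a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    direction: str  # "in" or "out"
    width: int      # bits


@dataclass
class PartitionInterface:
    """The black box's fixed interface: its ports plus a (hypothetical) site
    for each partition pin, shared by every module loaded into the partition."""
    ports: frozenset
    partition_pins: dict  # port name -> (x, y) site


def fit_problems(interface, module_ports, module_pin_sites):
    """Return a list of problems; an empty list means the module fits."""
    problems = []
    if frozenset(module_ports) != interface.ports:
        problems.append("port list differs from the partition interface")
    for port, site in interface.partition_pins.items():
        if module_pin_sites.get(port) != site:
            problems.append(f"port {port} is not at partition pin {site}")
    return problems


if __name__ == "__main__":
    iface = PartitionInterface(
        ports=frozenset({Port("din", "in", 8), Port("dout", "out", 8)}),
        partition_pins={"din": (10, 4), "dout": (10, 5)},
    )
    ok_ports = {Port("din", "in", 8), Port("dout", "out", 8)}
    print(fit_problems(iface, ok_ports, {"din": (10, 4), "dout": (10, 5)}))  # []
    print(fit_problems(iface, ok_ports, {"din": (10, 4), "dout": (11, 5)}))
```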

So you wanna make a PR design…
"Quit sugar-coating it, sirs, I am not a child, you know." Oh, fine. This is what you're going to learn today:
I. Logically partitioning your application into modules
II. Preparing your partitioned design in ISE
III. Floorplanning the layout of your device in PlanAhead
IV. Implementing your design in PlanAhead
V. Finding your inner child through meditation (time permitting)

Step 1: Logical partitioning
The first step to make a PR design is breaking the application into sets of mutually exclusive components. ("Easy there, buddy.")
Two components are mutually exclusive if:
- Only one is used at a time, and
- One's inputs don't directly depend on the other's outputs
Only mutually exclusive components can share a partition. So, before you can make your design, you must find as many of these sets as you can.
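The two mutual-exclusivity rules can be checked mechanically once each component is annotated with the operating modes in which it is used and the components it directly depends on. The sketch below (the `active_in`/`depends_on` annotations are assumptions, not part of any tool flow) applies the rules pairwise and then groups components greedily into partitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    active_in: frozenset                 # operating modes in which this component is used
    depends_on: frozenset = frozenset()  # components whose outputs feed this one directly


def mutually_exclusive(a, b):
    """The two rules: never used at the same time, and neither
    one's inputs depend directly on the other's outputs."""
    never_together = not (a.active_in & b.active_in)
    independent = a.name not in b.depends_on and b.name not in a.depends_on
    return never_together and independent


def group_into_partitions(components):
    """Greedy grouping: place each component in the first partition whose
    members are all mutually exclusive with it, else open a new partition."""
    partitions = []
    for comp in components:
        for group in partitions:
            if all(mutually_exclusive(comp, other) for other in group):
                group.append(comp)
                break
        else:
            partitions.append([comp])
    return partitions
```

For an up/down counter annotated as add (used in "up"), subtract (used in "down"), and store (used in both, depending on add and subtract), the grouping is `[[add, subtract], [store]]`; a group with a single member would simply stay in the static region. Greedy grouping is not guaranteed to find the fewest partitions, since that is a graph-coloring problem in general.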

Step 1: Logical partitioning
Okay, let's do an example. This is an up/down counter:
[figure: flowchart. Initialize Direction = up and Result = 0; get Direction; if up, Result++; if down, Result--; store Result]
- The add and the subtract are mutually exclusive: only one is used at a time, and they do not depend on each other
- The store and the add are not mutually exclusive: the store depends on the add's output
So the add and subtract can share a partition:
- The add forms one reconfigurable module
- The subtract forms another reconfigurable module
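A behavioral sketch of the counter once it is partitioned: the add and subtract are two modules for one shared partition, while Direction, Result, and the store stay in the static region. "Reconfiguring" here just swaps a Python function and counts the swaps; the names are hypothetical.

```python
def add_module(result):
    """Reconfigurable module 1: Result++"""
    return result + 1


def subtract_module(result):
    """Reconfigurable module 2: Result--"""
    return result - 1


class UpDownCounterPR:
    """Up/down counter whose add and subtract share one partition.
    Direction, Result, and the store stay in the static region."""

    MODULES = {"up": add_module, "down": subtract_module}

    def __init__(self):
        self.direction = "up"         # Direction = up
        self.result = 0               # Result = 0
        self.partition = add_module   # module currently loaded
        self.reconfigurations = 0

    def count(self, direction):
        """Get Direction; reconfigure only if it changed; compute and store Result."""
        if direction != self.direction:
            self.partition = self.MODULES[direction]  # overwrite the old module
            self.direction = direction
            self.reconfigurations += 1
        self.result = self.partition(self.result)     # Store Result
        return self.result
```

Counting with directions `["up", "up", "down", "up"]` yields results 1, 2, 1, 2 and two reconfigurations: the partition is only rewritten when the direction actually changes.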

Now some cool stuff that our group has been doing in CHREC

F4-13: Partially Reconfigurable System Development and Management
June 3-4, 2013. Number of supporting memberships: 1.5
Dr. Ann Gordon-Ross, Associate Professor of ECE, University of Florida
Research students: Rohit Kumar, Elizabeth Graham, Aurelio Morales, Shaon Yousuf, Zack Smaridge, University of Florida

F4-13: Goals, Motivations, and Challenges
Goal: Increase reconfigurable computing (RC) system designer productivity
- Optimize area, power, and performance
- Reduce design time effort
Motivations:
- Partial reconfiguration (PR) enables area and power savings
  - PR isolates reconfiguration to portions of the FPGA
  - Enables resource time-sharing
- Distributed computing provides increased system computation capability
  - Leverage a network of PR-capable FPGAs
  - Leverage distributed resource management services
- Early design space pruning reduces design time
  - The source code's PR analysis aids design parameter selection
- Design automation enables rapid system implementation
  - Scripts and tools reduce manual design flow steps
Challenges:
- PR requires application- and device-specific, low-level knowledge
- Efficient design space exploration (DSE) for PR-centric system design
- Maintaining application data integrity across PR-centric distributed RC systems
- Identifying automatable design flow steps

F4-13: Approach
PR-centric RC System Development, organized into three tasks:
Task A: One-click PR Design Space Exploration
- Leverage PRML to generate PR applications from source code
- Leverage high-level synthesis tools to generate VHDL code
- Leverage intermediate fabrics (1) and DAPR+ (2) for fast DSE
- Outcome: identifies resource- and performance-optimized PR architectures
Task B: Automated Design Implementation
- Design automation tool suite (DAPR++) to aid PR system design
- Generates a distributed RC system for increased computational capacity
- Outcome: alleviates tool flow overhead and reduces implementation effort
Task C: Node-Level Distributed Resource Management and Dynamic Hardware Task Management
- Adapt a system-wide version of DDRM for server/client operation
- Leverage dynamic hardware task management tools
- Design and test a DDRM application
- Expand context save and restore (CSR) and hardware task relocation (HTR) features
- Optimize CSR and HTR to maximize task throughput and resource utilization
- Outcomes: enables distributed processing and management across VAPRES nodes, and load balancing across local and remote VAPRES nodes
DAPR+: Design Automation for PR FPGAs; DDRM: Distributed Dynamic Resource Manager; PRML: PR Modeling Language; DSE: Design Space Exploration
(1) Developed by F… (2) Developed by F4-11

Task A: PR Design Space Exploration Framework
A streamlined framework for rapid application partitioning, PR design space exploration, and implementation. It automatically generates a PR application from non-PR high-level source code.
Framework components:
- Partitioning: automatic modeling and PR partitioning of the application's C source code via PRML (1)
- PR design space exploration: explores the PR design space to find an area/power/performance-optimized PR application, using low-level automated floorplanning and evaluation of the partitioned application's area, power, and performance
- Implementation: automation and integration of the vendor's and various third-party tools, alleviating the complexities of PR design implementation via automated tool flows
(1) Published in FCCM'13
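Exploring an area/power/performance design space usually ends with discarding dominated candidates, i.e. designs that are no better on any metric than some other design. The sketch below shows a generic Pareto-front filter over hypothetical PR partitionings; the candidate names and numbers are invented, and this is not the framework's own evaluation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    area: int          # hypothetical slices (lower is better)
    power_mw: float    # hypothetical power estimate (lower is better)
    latency_us: float  # hypothetical latency estimate (lower is better)

    def metrics(self):
        return (self.area, self.power_mw, self.latency_us)


def dominates(a, b):
    """a dominates b if it is no worse on every metric and strictly better on one."""
    ma, mb = a.metrics(), b.metrics()
    return all(x <= y for x, y in zip(ma, mb)) and any(x < y for x, y in zip(ma, mb))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]


candidates = [
    Candidate("A", 1000, 200.0, 50.0),
    Candidate("B", 800, 250.0, 60.0),
    Candidate("C", 1200, 220.0, 55.0),  # worse than A on every metric
    Candidate("D", 700, 300.0, 80.0),
]
```

Here `pareto_front(candidates)` keeps A, B, and D; the designer (or a weighting chosen for the application) then picks among the remaining trade-offs.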

Task B: PR System Design Automation with DAPR++ Tool Suite
The DAPR++ tool suite aids designing RC systems using automation:
- PR Architecture Generator: creates the master and slave FPGA component layout tree and creates FPGA VHDL black boxes for all components
- PRR Floorplanner: automatically generates the target device resource mapping and heuristically floorplans PRRs and partition pins
- Bitstream Manager: modifies bitstreams and enables task context save (CS) and context restore (CR)
- Network Generator: creates network protocols for the master and slave FPGAs
- PR Task Manager: creates PR task reconfiguration schedules to reduce reconfiguration time
- Throughput Profiler: records data packet transfer rates between the master and slave FPGAs
[slide annotations: CAW13, CMW12, CAW13]
[figure: a switch connecting a master FPGA (GPP and PRRs) to slave FPGA 1 and slave FPGA 2, each with PRRs]
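One way a reconfiguration schedule can reduce reconfiguration time is by reusing modules that are already resident and, when a PRR must be overwritten, evicting the module needed furthest in the future. The sketch below applies that classic cache-replacement idea (Belady's MIN policy) to PRRs; it is a swapped-in illustration, not the PR Task Manager's algorithm, and it assumes all PRRs are interchangeable and every module fits every PRR.

```python
def count_reconfigurations(task_sequence, num_prrs):
    """Count module loads for a task sequence on `num_prrs` interchangeable PRRs.

    A task whose module is already resident in some PRR needs no reconfiguration.
    On a miss with all PRRs occupied, evict the resident module whose next use
    lies furthest in the future (Belady's MIN policy; needs the whole sequence).
    """
    resident = set()
    reconfigurations = 0
    for i, task in enumerate(task_sequence):
        if task in resident:
            continue  # module already loaded: reuse it
        reconfigurations += 1
        if len(resident) >= num_prrs:
            future = task_sequence[i + 1:]
            victim = max(
                resident,
                key=lambda m: future.index(m) if m in future else len(future),
            )
            resident.remove(victim)
        resident.add(task)
    return reconfigurations
```

For the task sequence `["A", "B", "A", "C", "A", "B"]`, one PRR needs six reconfigurations, two PRRs need four, and three PRRs need only three, which shows how PRR count and schedule together determine reconfiguration overhead.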

Task C.1: Node-level DDRM
Node-level DDRM facilitates VAPRES network management:
- Automatically manages task relocation
  - Minimizes system delays caused by task relocation latency
- Uses custom node communication procedures
  - Maintains global node execution status
Task relocation circumvents node-level restrictions:
- Individual nodes have limited resources and power
- Networking the nodes lets them leverage a shared resource pool
- Example applications: sensor networks, target tracking
Node-level DDRM controls the nodes' task distribution:
- A node is a client for local tasks and a server for remote tasks
- The client determines the new node and PRR for task execution
  - The algorithm was developed in a system-level test version of DDRM
- Clients communicate with servers to locate a new PRR and transfer the PRM
- Created automated communication functions to coordinate inter-node transfer of bitstreams, context, test results, and node status
PRR: Partially Reconfigurable Region; PRM: Partially Reconfigurable Module; DDRM: Distributed Dynamic Resource Manager
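The client-side decision (which node and which PRR should run a task) can be sketched as a placement function. The slide does not describe DDRM's actual algorithm, so the policy below is hypothetical: try the local node first, then remote servers in order, skip nodes without enough spare power, and use the smallest free PRR that fits (best fit). Capacities and power figures are invented resource units.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_prrs: dict         # PRR name -> capacity (hypothetical resource units)
    power_budget_mw: float  # hypothetical spare power on this node


@dataclass(frozen=True)
class TaskRequest:
    name: str
    size: int        # resource units the PRM needs
    power_mw: float  # estimated power the PRM draws


def choose_placement(task, local, remotes):
    """Client-side choice of (node, PRR) for a task.

    Try the local node first, then the remote nodes (acting as servers) in order.
    Skip a node if the task exceeds its power budget; otherwise pick the
    smallest free PRR that is large enough. Returns None if nothing fits.
    """
    for node in [local, *remotes]:
        if task.power_mw > node.power_budget_mw:
            continue
        fitting = [(cap, prr) for prr, cap in node.free_prrs.items() if cap >= task.size]
        if fitting:
            _, prr = min(fitting)
            return node.name, prr
    return None
```

With a local node offering one 100-unit PRR and a remote node offering 300- and 200-unit PRRs, an 80-unit task stays local, while a 150-unit task is sent to the remote node's 200-unit PRR, leaving the larger PRR free for bigger tasks.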

Task C.2: Hardware Task Management Tools
On-chip context save and restore (CSR) and hardware task relocation (HTR) software:
- PRM execution state is retained on PRM preemption
- Enhances task switching in PR-capable FPGAs
- Suitable for autonomous, multitasking PR systems
- Portable across different FPGA architectures
New CSR and HTR features:
- Support for DSPs, BRAMs, and LUTRAMs, and for PRRs spanning multiple rows/columns
- Reduced execution times
Distributed processing and load balancing tools for networked VAPRES nodes
[figure: on-chip CSR and on-chip HTR within VAPRES nodes, showing modules M1, M2, and M3 in PRR 1 and PRR 2]
Experimental results on the XUPV5 board:
- CSR execution times grow linearly with the number of PRM flip-flops
- HTR execution times grow linearly for context save (CS) and context restore (CR), but non-linearly for task relocation (TR)
- System designers can trade off PRR size/granularity against CSR/HTR execution times based on application requirements
PRM: Partially Reconfigurable Module; PRR: Partially Reconfigurable Region; VAPRES: Virtual Architecture for Partially Reconfigurable Embedded Systems; DSP: Digital Signal Processing block; BRAM: Block RAM
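The key idea that makes context restore work in a different PRR is storing the saved state relative to the PRR's origin rather than at absolute device coordinates. The sketch below models flip-flop state as a dictionary from hypothetical (x, y) sites to bit values; it covers only the state half of relocation (moving the module's bitstream is not modeled) and is not the actual CSR/HTR software.

```python
def save_context(ffs, prr_origin, ff_offsets):
    """Context save (CS): read the PRM's flip-flop values out of the PRR and
    key them by offset from the PRR origin, so the context is placement-independent."""
    ox, oy = prr_origin
    return {(dx, dy): ffs[(ox + dx, oy + dy)] for dx, dy in ff_offsets}


def restore_context(ffs, prr_origin, context):
    """Context restore (CR): write saved values back at the same offsets
    inside the target PRR (which may differ from the original one)."""
    ox, oy = prr_origin
    for (dx, dy), value in context.items():
        ffs[(ox + dx, oy + dy)] = value


def relocate_state(ffs, src_origin, dst_origin, ff_offsets):
    """State half of task relocation (TR): save in the source PRR, restore in the
    destination PRR. Moving the module's bitstream itself is not modeled."""
    context = save_context(ffs, src_origin, ff_offsets)
    restore_context(ffs, dst_origin, context)
    return context
```

In this model, save and restore each touch every flip-flop once, so their cost grows with the number of PRM flip-flops, in line with the linear CSR trend reported above; the model says nothing about why relocation time grows non-linearly on real hardware.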