Berkeley: Sept 15, 1999 1 Physical Design Challenges of Reconfigurable Computing Systems Majid Sarrafzadeh NuCAD Department of ECE Northwestern University.

Presentation transcript:

Physical Design Challenges of Reconfigurable Computing Systems
Majid Sarrafzadeh, NuCAD, Department of ECE, Northwestern University
Ryan Kastner, Todd Haverkos, Kia Bazargan, Seda Ogrenci, Eli Bozorgzadeh, Candice McGrew
Sponsored by: DARPA, Motorola, AT&T, NSF

Faculty Position in VLSI Design & CAD (1-2 openings)
VLSI Design & CAD: one of the six focused research areas in the department
Assistant/Associate/Full Professor
  – Northwestern rank: top 10
  – ECE: top 20 (top 10 in 5 years)
Contact:

Field Programmable Gate Array: FPGA

FPGA (Xilinx)

Figure: Degraded Image, Restored Image

Figure: Degraded Image, Restored Image

System A: image stored in on-chip memory; the circuit to process the image resides on the rest of the chip
System B: FPGA chip plus on-board memory, where the image is stored
System C: FPGA chip plus host processor (the image is stored here)

The Architecture of a Reconfigurable System
Figure blocks: CPU, RFU, Data Memory, Instruction Memory (Program)
Figure signals: Control, Data, CPU instructions, RFUOPs

RFU
Field Programmable Gate Array: FPGA
SRAM cells used in configuration
  – Reconfigurable (runtime)
  – Static vs. dynamic configuration
Hardware functions implemented as rectangular areas on the FPGA
Figure labels: programmable logic, programmable connections, SRAM cells

System Components
Figure blocks: Configuration Memory, RFU Manager, Placement Engine, Cache Manager, Prefetch/Branch Prediction Unit, Program Manager, Instruction Mem. (Prog.), CPU, RFU, Data Memory
Figure signals: Config. Bits, RFUOPs, Control, CPU instructions, Data

System Behavior
Two kinds of instructions
  – CPU instructions: always run on the CPU. Assume known runtime.
  – RFUOPs: may be performed on the CPU if there is not enough room on the RFU. Assume known runtime and reconfiguration time.
Runtime profiles and RFU status are used to decide between CPU and RFU.
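
The CPU-versus-RFU decision described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the `RfuOp` fields, the scalar `free_area` (a real RFU would need a free rectangle, not just a cell count), and the rule "use the RFU only if runtime plus reconfiguration time beats the CPU runtime" are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class RfuOp:
    """Hypothetical record for one RFUOP with its known timings."""
    name: str
    cpu_time: float       # known runtime if executed on the CPU
    rfu_time: float       # known runtime if executed on the RFU
    reconfig_time: float  # known time to load the configuration
    area: int             # RFU cells required (assumed scalar measure)


def choose_target(op: RfuOp, free_area: int, loaded: bool = False) -> str:
    """Return "RFU" or "CPU" for one RFUOP (assumed decision rule)."""
    # Not enough room on the RFU: fall back to the CPU.
    if not loaded and op.area > free_area:
        return "CPU"
    # Reconfiguration is only paid when the op is not already on the RFU.
    rfu_total = op.rfu_time + (0.0 if loaded else op.reconfig_time)
    return "RFU" if rfu_total < op.cpu_time else "CPU"
```

For example, an op with `cpu_time=10`, `rfu_time=2`, `reconfig_time=5` goes to the RFU when it fits, and to the CPU when the free area is too small.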

PD Challenges
Problem: given RFUOPs to be performed on the RFU and DFG constraints, schedule them in time and assign them physical locations.
Must be very fast (mtools achieve 1000 cells per minute).
Existing tools/techniques are very slow. Quality is less important.
New PD algorithms/paradigms are needed.
In this presentation:
  – placement,
  – routing,
  – an application on reconfigurable systems.

Firm Macros
Not hard (too rigid), not soft (takes too much time to utilize the flexibility)
Each unit is 80%-100% pre-designed: can “break” the macros in limited ways
We have defined a network algebra for combining circuits (based on parameterization using VHDL generics): combine a fast and a slow adder in multiple ways


Execution of a Sample Program
Code:
    x = 3*a - b;
    …
    c = RFUOP1(x,5);
    y = 4*x - c;
    for (i=0;i<3;i++){ x += RFUOP2(y); ++y; }
    z = RFUOP1(x,3);
    a = z - y;
    b = RFUOP3(a,b);
    c = a - b;
    …
Figure: the code, its DFG, and the RFU over time (axes x, y, t). Annotations: (on CPU), (on RFU), (in parallel), and "No room on RFU to run all in parallel ==> run in sequence".

Placement
On-line placement
  – RFU calls need to be executed as the program proceeds
Off-line placement
  – Have a complete or partial profile of the operation

Online Placement
When a new RFUOP arrives
  – Is there enough space to place the RFUOP?
  – If yes, which location is best to place it?
Decision 1: Managing the empty space
  – Fast but sub-optimal: keep only O(n) empty rectangles. Shorter Seg. (SSEG), Square Empty Rects. (SQR), ...
  – Efficient use of RFU real estate: KAMER, keep all O(n^2) maximal empty rectangles
Decision 2: Packing rule
  – Best Fit, Bottom Left, First Fit

Figure: Keeping All Empty Rectangles versus Keeping O(n) Empty Rectangles (SSEG). Annotation on the SSEG case: "Cannot fit this".

Heuristics for Choosing an Empty Rectangle
Figure: current placement with empty rectangles A and B (bottom-left corners P1 and P2) and the new module to be inserted.
BF (Best Fit): places the new module in the empty rectangle which causes less wasted space. In the example the wasted area is smaller for A, so choose A.
FF (First Fit): either A or B could be chosen for placing the new module.
BL (Bottom Left): places the new module in the rectangle with the lower bottom-left corner, breaking ties by picking the leftmost one. In the example y(P2) < y(P1), so choose B.
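
The three packing rules can be sketched over a list of empty rectangles. The `Rect` type, the function names, and the choice not to rotate modules are assumptions made for illustration; "wasted space" is taken here to mean the empty rectangle's area minus the module's area.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    """Empty rectangle: bottom-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def fits(r: Rect, w: int, h: int) -> bool:
    # Assumption: modules are not rotated.
    return w <= r.w and h <= r.h


def first_fit(empties: Sequence[Rect], w: int, h: int) -> Optional[Rect]:
    """FF: take the first empty rectangle that can hold the module."""
    return next((r for r in empties if fits(r, w, h)), None)


def best_fit(empties: Sequence[Rect], w: int, h: int) -> Optional[Rect]:
    """BF: take the fitting rectangle that wastes the least area."""
    candidates = [r for r in empties if fits(r, w, h)]
    return min(candidates, key=lambda r: r.w * r.h - w * h, default=None)


def bottom_left(empties: Sequence[Rect], w: int, h: int) -> Optional[Rect]:
    """BL: lowest bottom-left corner, ties broken by the leftmost one."""
    candidates = [r for r in empties if fits(r, w, h)]
    return min(candidates, key=lambda r: (r.y, r.x), default=None)
```

Each function returns `None` when no empty rectangle can hold the module, which is the case where the RFUOP would have to be rejected or run elsewhere.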

Heuristics for Choosing a Segment
Figure: two candidate segments, S1 and S2. One choice creates empty rectangles A and B, the other creates C and D.
SSEG (Shorter Seg): chooses the shorter of the two segments. Condition shown: S1 < S2
LSEG (Longer Seg): chooses the longer of the two segments.
BER (Balanced Empty Rects): chooses the segment which creates less area difference. Condition shown: Area(B) - Area(A) > Area(D) - Area(C)
LER (Large Empty Rects): chooses the segment which creates the larger empty rectangle. Condition shown: Area(B) > Area(D)
LSQR (Larger Rect Square): chooses the segment which creates the larger rectangle closer to square. Condition shown: AspectRatio(B) > AspectRatio(D)
SQR (Square Rects): chooses the segment which creates empty rectangles closer to squares. Condition shown: Max{AR(A),AR(B)} < Max{AR(C),AR(D)}
AR = AspectRatio
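
A minimal sketch of the SSEG and LSEG rules follows. The geometry is an assumption, since the figure is not in the transcript: the module (w x h) is placed in the bottom-left corner of an empty rectangle (W x H), and the L-shaped leftover is cut either by a vertical segment of length H - h or by a horizontal segment of length W - w. The function names are hypothetical.

```python
def split_options(W, H, w, h):
    """Return both ways to cut the L-shaped leftover into two rectangles.

    Each option is (segment_length, [rect, rect]); rectangles are
    (x, y, width, height) relative to the empty rectangle's bottom-left.
    """
    if w > W or h > H:
        raise ValueError("module does not fit in the empty rectangle")
    # Vertical cut at x = w, running from the module's top up to y = H.
    vertical = (H - h, [(w, 0, W - w, H), (0, h, w, H - h)])
    # Horizontal cut at y = h, running from the module's right edge to x = W.
    horizontal = (W - w, [(0, h, W, H - h), (w, 0, W - w, h)])
    return vertical, horizontal


def choose_split(W, H, w, h, rule="SSEG"):
    """Pick one cut by segment length and return the non-empty rectangles."""
    vertical, horizontal = split_options(W, H, w, h)
    if rule == "SSEG":    # shorter of the two segments
        _, rects = min(vertical, horizontal, key=lambda opt: opt[0])
    elif rule == "LSEG":  # longer of the two segments
        _, rects = max(vertical, horizontal, key=lambda opt: opt[0])
    else:
        raise ValueError("only SSEG and LSEG are sketched here")
    # Drop degenerate rectangles with zero width or height.
    return [r for r in rects if r[2] > 0 and r[3] > 0]
```

Either cut leaves the same total empty area; the rules differ only in the shapes of the two rectangles that are kept, which is what determines whether later modules fit.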

Online Placement Results
Table 1. Percentage of accepted modules using different bin-packing and empty space partitioning rules

Online Placement Results
Figure: volume that does not fit. Annotation: BEST

Online Placement Results (cont.)

Off-line placement: 3-D Floorplanning
Figure: DFG, schedule (RFU and CPU), and the RFU floorplan with axes x, y (RFU area) and t (time)

3-D Floorplanning
Figure: DFG, schedule (RFU and CPU), and the RFU floorplan with axes x, y, t. Annotation: "By deleting this RFUOP (CPU performs the operation)..."

3-D Floorplanning
Figure: DFG, schedule (RFU and CPU), and the RFU floorplan with axes x, y, t

Our 3-D Floorplanner: No change in the schedule
Pure annealing
  – Move set:
      Move operation from CPU set to RFU set
      Move operation from RFU set to CPU set
      Displace an already placed RFUOP on the RFU
  – Cost function: volume
  – Very poor results
Start with an ASAP schedule, use on-line placement to get an initial solution, then apply low-temperature annealing

Offline Placement Results
Place X% of the largest-volume modules using on-line placement
Table columns: Algorithm, Data set, Offline Penalty Online Penalty Ratio (%)
Table rows: LTSA X=100% and LTSA X=20%, each on data sets T50, T100, S100, S200

Flexibility of the Modules
The library of modules has different implementations for each RFUOP
  – Experimental results with our online algorithms show about 60% reduction in penalty.
3-4 implementations are enough

Faster Routing: mostly offline
VPR CAD flow:
Inputs: technology-mapped netlist, architecture description file
VPR: place circuit or read in existing placement, then perform either global or combined global/detailed routing
Outputs: placement and routing output files

Routing Algorithm (VPR)
Call VPR's router with an arbitrary channel width
Based on the PathFinder negotiated congestion algorithm
Step 1: Each net is routed by the shortest path that can be found (regardless of any overuse of wiring segments)
Step 2: Sequentially rip up and re-route every net in the circuit (by the lowest-cost path found)

Fast Pattern Routing
Maze-based routing performs well but is very slow.
So speed up the router by partially using pattern routing: if an arbitrary net is picked and routed differently, the result does not change significantly.

Independent Subsets of Nets
Figure: two geometrically independent sets of nets
- Class 1
- Class 2

Routing Patterns
2-terminal net patterns
Multi-terminal net patterns (MST & RSTs)
Cost = L + const / Flexibility

Implementation of Algorithm
First choose the 2-terminal nets to route
- More than 50% of the nets are 2-terminal nets.
- To get the maximum independent sets, sort the 2-terminal nets by their bounding boxes.
- Classify the 2-terminal nets into geometrically independent classes.
- Route the classes sequentially by pattern routing.
Next choose the multi-terminal nets (low fan-out)
- Route them in their corresponding RST patterns
Finally, let the rest of the nets be routed by the traditional router
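
The classification step for 2-terminal nets can be sketched as a greedy grouping. Several points are assumptions, because the slide does not define them: two nets are treated as geometrically independent when their bounding boxes do not overlap, nets are sorted by ascending bounding-box area, and each net joins the first class it does not conflict with. The function names are hypothetical.

```python
def bbox(net):
    """Bounding box (xmin, ymin, xmax, ymax) of a 2-terminal net."""
    (x1, y1), (x2, y2) = net
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def bbox_area(net):
    xmin, ymin, xmax, ymax = bbox(net)
    return (xmax - xmin) * (ymax - ymin)


def overlaps(a, b):
    """True if two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def classify(nets):
    """Greedily group nets so that boxes within a class never overlap."""
    classes = []
    for net in sorted(nets, key=bbox_area):  # assumed sort key: box area
        box = bbox(net)
        for cls in classes:
            if not any(overlaps(box, bbox(other)) for other in cls):
                cls.append(net)
                break
        else:
            classes.append([net])  # conflicts with every existing class
    return classes
```

Nets in one class cannot compete for the same routing resources inside their bounding boxes, so each class can be pattern-routed without considering the order of its nets; the classes themselves are then routed one after another.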

Experimental Results


Image Restoration
Figure: 3 x 3 weight mask with $r_0$ at the center pixel and $r_1$ at each neighbor
The value of the center pixel in the next iteration: $x_{k+1} = \beta y + x_k - \beta (d ** x_k)$
y: the pixel value from the original degraded image
$x_k$: the pixel value from the previous iteration
$d ** x_k$ denotes the weighted sum $r_1 \cdot \sum(\text{eight neighbor pixels}) + r_0 \cdot \text{center pixel}$
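
One iteration of this update can be sketched in plain Python. The name `beta` for the weight that multiplies y and the weighted sum, the clamped handling of border pixels, and the function name are assumptions; the slide does not say how image borders are treated.

```python
def restore_step(x, y, beta, r0, r1):
    """Apply one iteration of the update to every pixel.

    x: current estimate, y: degraded image (equal-sized lists of rows).
    Returns a new image; x is not modified.
    """
    rows, cols = len(x), len(x[0])

    def pixel(i, j):
        # Assumption: out-of-range indices are clamped to the nearest edge.
        return x[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    out = []
    for i in range(rows):
        new_row = []
        for j in range(cols):
            neighbours = sum(
                pixel(i + di, j + dj)
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            # Weighted sum: r1 * (eight neighbours) + r0 * center pixel.
            weighted = r1 * neighbours + r0 * x[i][j]
            new_row.append(beta * y[i][j] + x[i][j] - beta * weighted)
        out.append(new_row)
    return out
```

Because each output pixel depends only on a 3 x 3 window of the previous iteration, all pixels of an iteration can be computed independently, which is what makes the update attractive for hardware.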

Incentive: processing of large images using FPGAs with limited resources
1. Segmentation of the image into smaller images suitable for the FPGA
Segments of size m x n are surrounded by an overlap of o.
Figure labels: m, n, o

Pixels of individual segments are restored in parallel by hardware.
Restored segments are written back after the overlap is discarded.
Figure: segments move between MEMORY and the RFU (labels m, n, o)
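
The segment, process, and write-back scheme can be sketched as below. This is a sequential illustration only: the hardware restores pixels in parallel, while this sketch visits segments one at a time. Treating m as rows and n as columns, clipping the overlap at the image border, and the `kernel` callback are all assumptions.

```python
def segments(height, width, m, n, o):
    """Yield (core, padded) boxes as (top, left, bottom, right) tuples.

    core is the m x n segment; padded adds an overlap of o on each side,
    clipped to the image.
    """
    for top in range(0, height, m):
        for left in range(0, width, n):
            bottom, right = min(top + m, height), min(left + n, width)
            padded = (max(top - o, 0), max(left - o, 0),
                      min(bottom + o, height), min(right + o, width))
            yield (top, left, bottom, right), padded


def process_in_segments(image, m, n, o, kernel):
    """Run kernel on each padded segment and keep only the core pixels."""
    height, width = len(image), len(image[0])
    out = [[None] * width for _ in range(height)]
    for (t, l, b, r), (pt, pl, pb, pr) in segments(height, width, m, n, o):
        tile = [row[pl:pr] for row in image[pt:pb]]
        result = kernel(tile)  # same shape as tile
        # Write back the core region; the overlap is discarded.
        for i in range(t, b):
            for j in range(l, r):
                out[i][j] = result[i - pt][j - pl]
    return out
```

Here `kernel` stands for whatever restoration is applied to one segment, for instance a fixed number of iterations of the pixel update; a larger overlap o lets the discarded border absorb more of the error introduced by cutting the image.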

How bad is the segmentation?
Theorem: The error introduced is about $w^{o}$
Example: $(1/16)^2 = 1/256$
Proof: By induction
Figure labels: m, n, o


Figure: Degraded Image, Restored Image

Figure: Degraded Image, Restored Image

System A: image stored in on-chip memory; the circuit to process the image resides on the rest of the chip
System B: FPGA chip plus on-board memory, where the image is stored
System C: FPGA chip plus host processor (the image is stored here)

Running Times of the Application on Software and on Different Systems (ignoring reconfiguration)
Table columns: Image, Software Running Time (sec), Running Time for System A (msec), Running Time for System C (msec)
Images: cameraman, moon, circle, animals, fish, barbara, yacht, soccer, announcer, bluegirl, cablecar, cornfield

Conclusions
Need a radical departure (new algorithms, etc.) from traditional PD algorithms.
Fast (and lower quality) place & route tools
Do as much as possible (building complex libraries, hierarchical routing, …) before compilation
All of the above (and more) are needed to make reconfigurable computing a reality.
