1
Berkeley: Sept 15, 1999
Physical Design Challenges of Reconfigurable Computing Systems
Majid Sarrafzadeh, NuCAD, Department of ECE, Northwestern University
Ryan Kastner, Todd Haverkos, Kia Bazargan, Seda Ogrenci, Eli Bozorgzadeh, Candice McGrew
Sponsored by: DARPA, Motorola, AT&T, NSF
2
Faculty Position in VLSI Design & CAD (1-2 openings)
- VLSI Design & CAD: one of the six focused research areas in the department
- Assistant/Associate/Full Professor
  - Northwestern rank: top 10
  - ECE: top 20 (top 10 in 5 years)
- Contact: majid@ece.nwu.edu
3
Field Programmable Gate Array: FPGA
4
FPGA (Xilinx)
5
[Figure: degraded image and restored image]
6
[Figure: degraded image and restored image]
7
- System A: FPGA chip; the image is stored in on-chip memory, and the circuit that processes the image resides on the rest of the chip.
- System B: FPGA chip with on-board memory, where the image is stored.
- System C: FPGA chip with a host processor (the image is stored here).
8
The Architecture of a Reconfigurable System
[Figure: block diagram with CPU, RFU, Data Memory, and Instruction Memory (Program); labels: Control, Data, RFUOPs, CPU instructions]
9
RFU
Field Programmable Gate Array (FPGA):
- Programmable logic
- Programmable connections
- SRAM cells used in configuration
  - Reconfigurable (runtime)
  - Static vs. dynamic configuration
- Hardware functions implemented as rectangular areas on the FPGA
10
System Components
[Figure: block diagram. Blocks: CPU, RFU, Data Memory, Instruction Mem. (Prog.), Configuration Memory, RFU Manager, Placement Engine, Cache Manager, Prefetch/Branch Prediction Unit, Control Program Manager. Labels: Config. Bits, RFUOPs, CPU instructions, Data]
11
System Behavior
Two kinds of instructions:
- CPU instructions: always run on the CPU
  - Assume known runtime
- RFUOPs: might be performed on the CPU if there is not enough room on the RFU
  - Assume known runtime and reconfiguration time
Runtime profiles and RFU status are used to decide between CPU and RFU.
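The CPU-versus-RFU decision described above can be sketched as follows. This is a minimal illustration, not the system's actual policy: the `RfuOp` fields, the scalar `free_area` check, and the `already_loaded` flag are all hypothetical names introduced here, and the rule "pick whichever finishes sooner" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RfuOp:
    name: str
    cpu_time: float       # known runtime if executed on the CPU
    rfu_time: float       # known runtime if executed on the RFU
    reconfig_time: float  # known time to load the configuration
    area: int             # RFU cells needed (hypothetical unit)


def choose_target(op, free_area, already_loaded=False):
    """Return 'RFU' or 'CPU' for one RFUOP (assumed rule: faster option wins)."""
    if not already_loaded and op.area > free_area:
        return "CPU"  # not enough room on the RFU
    # Reconfiguration is only paid when the configuration is not resident.
    rfu_cost = op.rfu_time + (0.0 if already_loaded else op.reconfig_time)
    return "RFU" if rfu_cost < op.cpu_time else "CPU"
```

For example, an RFUOP with `cpu_time=10`, `rfu_time=2`, `reconfig_time=3` goes to the RFU when it fits, and falls back to the CPU when `free_area` is too small.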
12
PD Challenges
Problem: given RFUOPs to be performed on the RFU and DFG constraints, schedule them in time and assign them physical locations.
- Must be very fast (mtools achieve 1000 cells per minute). Existing tools/techniques are very slow.
- Quality is less important.
- New PD algorithms/paradigms are needed.
In this presentation:
- placement,
- routing,
- an application on reconfigurable systems.
13
Firm Macros
- Not hard (too rigid), not soft (takes too much time to utilize the flexibility)
- Each unit is 80%-100% pre-designed: can "break" the macros in limited ways
- We have defined a network algebra for combining circuits (based on parameterization using VHDL generics), e.g., combine a fast and a slow adder in multiple ways
15
Execution of a Sample Program
Code:
    x = 3*a - b;
    …
    c = RFUOP1(x,5);
    y = 4*x - c;
    for (i=0;i<3;i++){
        x += RFUOP2(y);
        ++y;
    }
    z = RFUOP1(x,3);
    a = z - y;
    b = RFUOP3(a,b);
    c = a - b;
    …
[Figure: the code is turned into a DFG whose operations run either on the CPU or on the RFU, drawn in x, y, t coordinates. RFUOPs run in parallel when they fit; when there is no room on the RFU to run all of them in parallel, they run in sequence.]
16
Placement
On-line placement
- RFU calls need to be executed as the program proceeds
Off-line placement
- Have a complete or partial profile of the operation
17
Online Placement
When a new RFUOP arrives:
- Is there enough space to place the RFUOP?
- If yes, which location is best to place it?
Decision 1: Managing the empty space
- Fast but sub-optimal: keep only O(n) empty rectangles
  - Shorter Seg. (SSEG), Square Empty Rects. (SQR), ...
- Efficient use of RFU real estate: KAMER, keep all O(n^2) maximal empty rectangles
Decision 2: Packing rule
- Best Fit, Bottom Left, First Fit
18
[Figure: two panels, "Keeping All Empty Rectangles" and "Keeping O(n) Empty Rectangles - SSEG"; in the second panel a module is marked "Cannot fit this"]
19
Heuristics for Choosing an Empty Rectangle
[Figure: current placement with empty rectangles A and B (bottom-left corners P1 and P2), and a new module to be inserted]
- BF (Best Fit): places the new module in the empty rectangle that wastes the least space. Example: Area(A) < Area(B), so choose A.
- FF (First Fit): either A or B could be chosen for placing the new module.
- BL (Bottom Left): places the new module in the rectangle with the lowest bottom-left corner, breaking ties by picking the leftmost one. Example: y(P2) < y(P1), so choose B.
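The three packing rules can be sketched over a list of empty rectangles. This is an illustrative sketch only: the `(x, y, w, h)` tuple layout and the function names are assumptions, and "wasted space" for Best Fit is taken here to mean rectangle area minus module area.

```python
def fits(rect, mod):
    """rect = (x, y, w, h) empty rectangle; mod = (w, h) module size."""
    return mod[0] <= rect[2] and mod[1] <= rect[3]


def first_fit(rects, mod):
    # FF: take the first empty rectangle (in list order) that can hold the module.
    return next((r for r in rects if fits(r, mod)), None)


def best_fit(rects, mod):
    # BF: among rectangles that fit, minimize the leftover (wasted) area.
    cands = [r for r in rects if fits(r, mod)]
    return min(cands, key=lambda r: r[2] * r[3] - mod[0] * mod[1], default=None)


def bottom_left(rects, mod):
    # BL: lowest bottom-left corner first, ties broken by the leftmost one.
    cands = [r for r in rects if fits(r, mod)]
    return min(cands, key=lambda r: (r[1], r[0]), default=None)
```

Each function returns the chosen empty rectangle, or `None` when the module fits nowhere, in which case the RFUOP would be rejected or run on the CPU.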
20
Heuristics for Choosing a Segment
[Figure: after a module is placed, one of two segments S1 and S2 is chosen to split the remaining empty space; S1 yields empty rectangles A and B, S2 yields C and D]
- SSEG (Shorter Seg): chooses the shorter of the two segments. Condition shown: S1 < S2.
- LSEG (Longer Seg): chooses the longer of the two segments.
- BER (Balanced Empty Rects): chooses the segment which creates less area difference. Condition shown: Area(B) - Area(A) > Area(D) - Area(C).
- LSQR (Larger Rect Square): chooses the segment which creates the larger rectangle closer to square. Condition shown: AspectRatio(B) > AspectRatio(D).
- LER (Large Empty Rects): chooses the segment which creates the larger empty rectangle. Condition shown: Area(B) > Area(D).
- SQR (Square Rects): chooses the segment which creates empty rectangles closer to squares. Condition shown: Max{AR(A), AR(B)} < Max{AR(C), AR(D)}, where AR = AspectRatio.
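Two of these rules, SSEG and SQR, can be sketched as below. The geometry is an assumption made for illustration: the module sits in the bottom-left corner of an empty rectangle, and the leftover L-shaped space is cut either along the module's top edge (horizontal segment) or along its right edge (vertical segment). Function names are hypothetical.

```python
def split_candidates(rect, mod):
    """Return ((seg_len, rects), (seg_len, rects)) for the two possible cuts."""
    x, y, W, H = rect
    w, h = mod
    # Horizontal segment: runs from the module's top-right corner to the right edge.
    horiz = (W - w, [(x + w, y, W - w, h), (x, y + h, W, H - h)])
    # Vertical segment: runs from the module's top-right corner to the top edge.
    vert = (H - h, [(x, y + h, w, H - h), (x + w, y, W - w, H)])
    return horiz, vert


def nonempty(rects):
    return [r for r in rects if r[2] > 0 and r[3] > 0]


def sseg(rect, mod):
    # SSEG: cut along the shorter segment (ties go to the horizontal one).
    (len_h, rects_h), (len_v, rects_v) = split_candidates(rect, mod)
    return nonempty(rects_h if len_h <= len_v else rects_v)


def aspect(r):
    return max(r[2], r[3]) / min(r[2], r[3])


def sqr(rect, mod):
    # SQR: pick the cut whose worst aspect ratio is closest to 1 (a square).
    options = [nonempty(rects) for _, rects in split_candidates(rect, mod)]
    return min(options, key=lambda rs: max((aspect(r) for r in rs), default=1.0))
```

Keeping exactly two rectangles per placement is what holds the list at O(n) entries, at the price of sometimes missing a spot that the full set of maximal empty rectangles would have found.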
21
Online Placement Results
Table 1. Percentage of accepted modules using different bin-packing and empty space partitioning rules
22
Online Placement Results
[Figure: volume that does not fit; the best result is marked]
23
Online Placement Results (cont.)
24
Off-line placement: 3-D Floorplanning
[Figure: DFG and schedule split between RFU and CPU; RFUOPs drawn as boxes over RFU area (x, y) and time (t)]
25
3-D Floorplanning
[Figure: the same DFG and schedule; annotation: "By deleting this RFUOP (CPU performs the operation)..."]
26
3-D Floorplanning
[Figure: DFG and schedule split between RFU and CPU]
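The 3-D view treats each RFUOP as a box: a rectangle on the RFU that exists for a time interval. A minimal sketch of that model follows; the `Box` type, its field names, and the half-open interval convention are assumptions made here for illustration.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    x: int  # left edge on the RFU
    y: int  # bottom edge on the RFU
    t: int  # start time
    w: int  # width
    h: int  # height
    d: int  # duration


def volume(b):
    # Area times time: the quantity a 3-D floorplan packs.
    return b.w * b.h * b.d


def overlaps(a, b):
    # Two boxes collide only if they intersect in x, in y, and in time.
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h and
            a.t < b.t + b.d and b.t < a.t + a.d)


def feasible(boxes, rfu_w, rfu_h):
    # A floorplan is valid if every box lies on the RFU and no two collide.
    inside = all(b.x >= 0 and b.y >= 0 and
                 b.x + b.w <= rfu_w and b.y + b.h <= rfu_h for b in boxes)
    return inside and not any(overlaps(a, b) for a, b in combinations(boxes, 2))
```

Two RFUOPs may share the same RFU area as long as their time intervals do not intersect, which is what distinguishes this from ordinary 2-D floorplanning.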
27
Our 3-D Floorplanner: No change in the schedule
Pure annealing
- Move set:
  - Move operation from CPU set to RFU set
  - Move operation from RFU set to CPU set
  - Displace an already placed RFUOP on the RFU
- Cost function: volume
- Very poor results
Start with an ASAP schedule, use on-line placement to get an initial solution, then apply low-temperature annealing.
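The move set can be sketched as a small annealing loop with a fixed schedule. This is a toy sketch under several assumptions: the cost is taken to be the total volume of RFUOPs left on the CPU, the start state puts everything on the CPU (rather than using an on-line placement as the initial solution), and the temperature scaling, step count, and function names are all hypothetical.

```python
import math
import random


def overlaps(a, b):
    """a, b: (x, y, t, w, h, d) boxes; True if they share RFU area and time."""
    return all(a[i] < b[i] + b[i + 3] and b[i] < a[i] + a[i + 3] for i in range(3))


def anneal(ops, W, H, temp=0.05, steps=2000, seed=0):
    """ops: list of (w, h, start, dur) with a fixed schedule. Returns (placement, cost)."""
    rng = random.Random(seed)
    place = [None] * len(ops)  # None means the operation runs on the CPU
    vol = [w * h * d for (w, h, _, d) in ops]

    def box(i, pos):
        w, h, t, d = ops[i]
        return (pos[0], pos[1], t, w, h, d)

    def is_free(i, pos):
        b = box(i, pos)
        return all(j == i or place[j] is None or not overlaps(b, box(j, place[j]))
                   for j in range(len(ops)))

    for _ in range(steps):
        i = rng.randrange(len(ops))
        w, h, _, _ = ops[i]
        if place[i] is None or rng.random() < 0.5:
            # Move CPU -> RFU, or displace an already placed RFUOP.
            if w <= W and h <= H:
                pos = (rng.randint(0, W - w), rng.randint(0, H - h))
                if is_free(i, pos):
                    place[i] = pos
        elif rng.random() < math.exp(-vol[i] / (temp * max(vol))):
            # Move RFU -> CPU: an uphill move, rarely accepted at low temperature.
            place[i] = None
    cost = sum(v for v, p in zip(vol, place) if p is None)
    return place, cost
```

At low temperature the uphill move is almost never taken for large-volume operations, so the loop mostly refines an existing solution, which is why a good initial placement matters.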
28
Offline Placement Results
Place X% of the largest-volume modules using on-line placement

Algorithm     Data set   Offline Penalty   Online Penalty   Ratio
LTSA X=100%   T50        147287            213153           69.10%
LTSA X=100%   T100       253566            307879           82.36%
LTSA X=100%   S100       464049            508923           91.18%
LTSA X=100%   S200       539435            612623           88.05%
LTSA X=100%   A1024      427761            456627           93.68%
LTSA X=20%    T50        148975            213153           69.89%
LTSA X=20%    T100       225603            307879           73.28%
LTSA X=20%    S100       287153            508923           56.42%
LTSA X=20%    S200       359980            612623           58.76%
LTSA X=20%    A1024      213036            456627           46.65%
29
Flexibility of the Modules
The module library has different implementations for each RFUOP.
- Experimental results with our online algorithms show about 60% reduction in penalty.
- 3-4 implementations are enough.
30
Faster Routing: mostly offline
VPR CAD flow:
- Inputs: technology-mapped netlist, architecture description file
- VPR: place the circuit or read in an existing placement
- VPR: perform either global or combined global/detailed routing
- Outputs: placement and routing output files
31
Routing Algorithm (VPR)
Call VPR's router with an arbitrary channel width.
Based on the PathFinder negotiated congestion algorithm:
- Step 1: each net is routed by the shortest path that can be found (regardless of any overuse of wiring segments).
- Step 2: sequentially rip up and re-route every net in the circuit (by the lowest-cost path found).
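The two steps can be sketched on a toy routing graph. This is not VPR's code: it is a simplified PathFinder-style loop for two-terminal nets, assuming unit base cost, per-node capacity, a connected graph, and hypothetical parameter names (`pres_step`, `hist_step`). Node cost follows the usual negotiated-congestion form (base + history) * present-congestion penalty.

```python
import heapq


def shortest_path(adj, src, dst, cost):
    """Dijkstra where entering node v costs cost(v). Assumes dst is reachable."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v in adj[u]:
            nd = d + cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def pathfinder(adj, nets, capacity=1, max_iters=30, pres_step=0.5, hist_step=0.2):
    occ = {n: 0 for n in adj}      # how many nets currently use each node
    hist = {n: 0.0 for n in adj}   # accumulated congestion history
    routes, pres_fac = {}, 0.0     # pres_fac = 0 in iteration 1: overuse is ignored
    for _ in range(max_iters):
        for net in nets:
            for n in routes.get(net, []):
                occ[n] -= 1        # rip up the net's previous route

            def cost(n):
                over = max(0, occ[n] + 1 - capacity)
                return (1.0 + hist[n]) * (1.0 + pres_fac * over)

            routes[net] = shortest_path(adj, net[0], net[1], cost)
            for n in routes[net]:
                occ[n] += 1
        overused = [n for n in adj if occ[n] > capacity]
        if not overused:
            return routes          # legal routing: no shared resources
        for n in overused:
            hist[n] += hist_step * (occ[n] - capacity)
        pres_fac += pres_step
    return None                    # gave up within max_iters
```

In a small graph where two nets both prefer the same middle node, the net that has a detour is gradually pushed onto it as the history and present-congestion terms grow, while the net with no alternative keeps the contested node.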
32
Fast Pattern Routing
- Maze-based routing produces good results but is very slow.
- So, speed up the router by partially using pattern routing.
- Observation: if an arbitrary net is picked and routed differently, the result does not change significantly.
33
Independent subset of nets
[Figure: two geometrically independent sets of nets, Class 1 and Class 2]
34
Routing Patterns
- 2-terminal net patterns
- Multi-terminal net patterns (MST & RSTs)
Cost = L + const / Flexibility
35
Implementation of Algorithm
First, choose the 2-terminal nets to route:
- More than 50% of the nets are 2-terminal nets.
- In order to get the maximum independent sets, sort the 2-terminal nets in terms of their bounding boxes.
- Classify the 2-terminal nets into geometrically independent classes.
- Route the classes sequentially by pattern routing.
Next, choose the multi-terminal nets (low fan-out):
- Route them in their corresponding RST patterns.
Finally, let the rest of the nets be routed by the traditional router.
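The 2-terminal part of this flow can be sketched as follows. Several things here are assumptions for illustration: nets are pairs of grid points, two nets count as independent when their bounding boxes do not intersect, classes are built greedily after sorting by bounding-box area, only the two L-shaped patterns are tried, and a pattern is scored by summed grid usage as a stand-in for the slide's `L + const / Flexibility` cost, whose terms are not defined in the deck.

```python
def bbox(net):
    (x1, y1), (x2, y2) = net
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def independent_classes(nets):
    """Greedy grouping: nets in one class have pairwise disjoint bounding boxes."""
    def area(net):
        x0, y0, x1, y1 = bbox(net)
        return (x1 - x0 + 1) * (y1 - y0 + 1)

    classes = []
    for net in sorted(nets, key=area):
        for cls in classes:
            if all(not boxes_overlap(bbox(net), bbox(other)) for other in cls):
                cls.append(net)
                break
        else:
            classes.append([net])
    return classes


def line(p, q):
    """Grid cells on the axis-aligned segment from p to q, inclusive."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        step = 1 if y2 >= y1 else -1
        return [(x1, y) for y in range(y1, y2 + step, step)]
    step = 1 if x2 >= x1 else -1
    return [(x, y1) for x in range(x1, x2 + step, step)]


def l_patterns(net):
    p, q = net
    corners = [(q[0], p[1]), (p[0], q[1])]  # horizontal-first, vertical-first
    return [line(p, c) + line(c, q)[1:] for c in corners]


def route_pattern(net, usage):
    """Pick the less congested L-shape and record its cells in usage."""
    best = min(l_patterns(net), key=lambda path: sum(usage.get(c, 0) for c in path))
    for cell in best:
        usage[cell] = usage.get(cell, 0) + 1
    return best
```

Because nets inside one class cannot touch each other's bounding boxes, the order in which they are routed within the class does not affect the outcome, which is the property the slides rely on to skip maze search for them.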
36
Experimental Results
38
Image Restoration
The value of the center pixel in the next iteration:
    x_{k+1} = […]*y + x_k - […]*(d**x_k)
- y: the pixel value from the original degraded image
- x_k: the pixel value from the previous iteration
- d**x_k: the weighted sum r_1 * (eight neighbor pixels) + r_0 * center pixel
[Figure: 3x3 weight mask with r_0 at the center and r_1 at the eight neighbors]
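One iteration of this update can be sketched as below. The coefficient multiplying `y` and `d**x_k` is not legible in the slide text, so `beta` is a hypothetical stand-in, and using the same value in both places is an assumption. Leaving border pixels unchanged is also an assumption made to keep the sketch short.

```python
def restore_step(x, y, beta, r0, r1):
    """One iteration over a 2-D image stored as a list of lists of floats.

    x: current estimate, y: degraded image, beta: assumed relaxation
    coefficient, r0/r1: weights for the center pixel and its eight neighbors.
    """
    rows, cols = len(x), len(x[0])
    out = [row[:] for row in x]  # border pixels are copied through unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Sum of the eight neighbors = 3x3 window sum minus the center.
            neighbors = sum(x[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1)) - x[i][j]
            weighted = r1 * neighbors + r0 * x[i][j]   # d ** x_k
            out[i][j] = beta * y[i][j] + x[i][j] - beta * weighted
    return out
```

Each output pixel depends only on a 3x3 neighborhood of the previous iterate, which is what makes the computation easy to replicate as parallel hardware on the RFU.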
39
Incentive: processing of large images using FPGAs with limited resources
1. Segmentation of the image into smaller images suitable for the FPGA. Segments of size m x n are surrounded by an overlap of o.
[Figure: segment of size m x n surrounded by an overlap of width o]
40
2. Pixels of individual segments are restored in parallel by hardware.
3. Restored segments are written back after the overlap is discarded.
[Figure: MEMORY holding m x n segments with overlap o, connected to the RFU]
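The segment, restore, and write-back steps can be sketched in software. This is an illustration only: it assumes m counts rows and n counts columns, clips the overlap at the image border, and takes the per-segment restoration as a caller-supplied function `restore` (a hypothetical parameter standing in for the hardware on the RFU).

```python
def segment_windows(height, width, m, n, o):
    """Yield (core, padded) windows as (row0, row1, col0, col1), end-exclusive."""
    for r in range(0, height, m):
        for c in range(0, width, n):
            core = (r, min(r + m, height), c, min(c + n, width))
            padded = (max(r - o, 0), min(r + m + o, height),
                      max(c - o, 0), min(c + n + o, width))
            yield core, padded


def process_in_segments(image, m, n, o, restore):
    """Restore each padded segment, then keep only its m x n core."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for (r0, r1, c0, c1), (pr0, pr1, pc0, pc1) in segment_windows(height, width, m, n, o):
        tile = [row[pc0:pc1] for row in image[pr0:pr1]]  # segment plus overlap
        done = restore(tile)
        for r in range(r0, r1):
            # Discard the overlap: copy back only the core columns of this row.
            out[r][c0:c1] = done[r - pr0][c0 - pc0:c1 - pc0]
    return out
```

Every tile is read from the original image rather than from already restored neighbors, so the segments are independent of each other and could be processed in any order or in parallel.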
41
How bad is the segmentation?
Theorem: the error introduced is about (w)**O. Example: (1/16)**2 = 1/256.
Proof: by induction.
[Figure: segment of size m x n with overlap o]
43
[Figure: degraded image and restored image]
44
[Figure: degraded image and restored image]
45
- System A: FPGA chip; the image is stored in on-chip memory, and the circuit that processes the image resides on the rest of the chip.
- System B: FPGA chip with on-board memory, where the image is stored.
- System C: FPGA chip with a host processor (the image is stored here).
46
Running Times of the Application on Software and on Different Systems (ignoring reconfiguration)

Image       Software (sec)   System A (msec)   System C (msec)
cameraman   4.772            9.157             91.960
moon        2.812            5.725             54.494
circle      2.987            4.254             42.722
animals     6.761            8.826             88.628
fish        7.029            14.026            140.850
barbara     21.741           36.630            367.840
yacht       12.367           34.079            342.227
soccer      12.360           34.079            342.227
announcer   13.462           34.079            342.227
bluegirl    10.158           34.079            342.227
cablecar    12.354           34.079            342.227
cornfield   13.458           34.079            342.227
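The speedups implied by these running times can be worked out directly. The sketch below uses three rows from the table; the `speedup` helper is a name introduced here, and it only converts seconds to milliseconds and divides (reconfiguration time is ignored, as in the table).

```python
# (software seconds, System A milliseconds, System C milliseconds)
times = {
    "cameraman": (4.772, 9.157, 91.960),
    "fish": (7.029, 14.026, 140.850),
    "barbara": (21.741, 36.630, 367.840),
}


def speedup(sw_sec, hw_msec):
    """Software time divided by hardware time, both in milliseconds."""
    return sw_sec * 1000.0 / hw_msec


for name, (sw, sys_a, sys_c) in times.items():
    print(f"{name}: System A {speedup(sw, sys_a):.0f}x, System C {speedup(sw, sys_c):.0f}x")
```

For these rows System A comes out roughly ten times faster than System C, consistent with the image residing on-chip rather than in host memory.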
47
Conclusions
- Need a radical departure (new algorithms, etc.) from traditional PD algorithms.
- Fast (and lower quality) place & route tools.
- Do as much as possible (building complex libraries, hierarchical routing, …) before compilation.
- All of the above (and more) are needed to make reconfigurable computing a reality.