Published by Austen Glenn. Modified over 8 years ago.
1
University of Michigan, Electrical Engineering and Computer Science
Compiler-directed Synthesis of Multifunction Loop Accelerators
Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke
Advanced Computer Architecture Laboratory, University of Michigan
2
Accelerating Streaming Applications
Streaming applications:
– Discrete transformations operating on data stream
– High performance
Map application to pipeline of accelerators
Multifunction accelerators reuse hardware
– Improve hardware efficiency
[Figure: an application whose loops (Loop 1 to Loop 4, selected by "Frame Type?") are mapped onto an accelerator pipeline (LA1, LA2, LA3, …) connected to DRAM. Pipeline stages are labeled Loop Accelerator and Multifunction Loop Accelerator.]
3
Loop Accelerator Schema
Hardwired state machine for one or more critical loops
Order-of-magnitude power and performance improvements over more general designs
4
Single Function Accelerator Design
Use compiler as architecture synthesis tool
– Parameterized meta-architecture: all loop accelerators have same general organization
– Performance/throughput is input
– Compiler analysis to understand computation and communication requirements
– Hardware-sensitive optimization to reduce cost
5
Flow Diagram
[Figure: synthesis flow, with the artifact produced at each step.]
Application -> Loop, Desired II
Allocate FUs -> Abstract Arch
Modulo Schedule -> Scheduled Ops
Build Datapath -> Concrete Arch
Instantiate Arch -> Verilog, Control Signals
Synthesize -> Loop Accelerator
[Insets: a schedule of Op1, Op2, Op3, … over time and FUs; a datapath of FUs and a register file (RF).]
6
FU Allocation
Given operations in a loop and cost of hardware cells implementing those operations
Minimize total FU cost while supporting all operations
[Figure: a loop with 3 ADD, 1 SUB, and 2 LOAD operations at II = 2 is allocated the FUs +, +, -, MEM.]
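The arithmetic behind the allocation example on this slide can be sketched as follows. This is a minimal illustration, not the authors' allocator: it assumes every FU is fully pipelined, handles exactly one operation class, and can issue one operation per cycle, so a class with n operations needs at least ceil(n / II) units. The function names and the cell costs are hypothetical.

```python
from math import ceil

def min_fu_counts(op_counts, ii):
    """Lower bound on FUs per operation class: each FU offers `ii` issue slots."""
    return {op: ceil(n / ii) for op, n in op_counts.items()}

def total_fu_cost(fu_counts, cell_cost):
    """Sum of (hypothetical) hardware cell costs over all allocated FUs."""
    return sum(cell_cost[op] * n for op, n in fu_counts.items())

# Operation mix from the slide: 3 ADD, 1 SUB, 2 LOAD at II = 2.
ops = {"ADD": 3, "SUB": 1, "LOAD": 2}
fus = min_fu_counts(ops, ii=2)

# Assumed relative cell costs, chosen only for illustration.
cell_cost = {"ADD": 1.0, "SUB": 1.0, "LOAD": 4.0}
cost = total_fu_cost(fus, cell_cost)

print(fus)   # {'ADD': 2, 'SUB': 1, 'LOAD': 1}
print(cost)  # 7.0
```

The result matches the figure (two adders, one subtractor, one memory unit). A cost-minimizing allocator could do better than this bound-per-class approach when one cell can implement several operation types.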
7
Modulo Scheduling and Datapath Derivation
Schedule to abstract architecture (FUs)
Determine register and interconnect requirements from schedule
Source code:
r1 = Mem[r2]
r3 = r1 + 12
[Figure: the schedule places LOAD and ADD on FU1 and FU2 (times 1 and 4 are marked); the derived datapath connects a MEM unit to an adder with the constant 12.]
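The core constraint of modulo scheduling is that an operation issued at time t occupies its FU in slot t mod II of every iteration. The sketch below shows only that resource check plus simple dependence ordering. It is a simplified list scheduler, not the iterative modulo scheduler used in the talk: it has no backtracking and ignores loop-carried dependences. The operation names, FU names, and latencies are assumptions.

```python
def modulo_schedule(ops, ii, horizon=64):
    """Place ops (given in dependence order) using a modulo reservation table.

    ops: list of (name, fu, latency, deps) tuples.
    Returns {name: start_time}.
    """
    reserved = set()   # occupied (fu, time mod ii) slots
    start, finish = {}, {}
    for name, fu, latency, deps in ops:
        # Earliest time at which all producers have finished.
        t = max((finish[d] for d in deps), default=0)
        # Slide forward until the FU's modulo slot is free.
        while (fu, t % ii) in reserved:
            t += 1
            if t > horizon:
                raise ValueError(f"no free slot for {name} on {fu} at II={ii}")
        reserved.add((fu, t % ii))
        start[name] = t
        finish[name] = t + latency
    return start

# r1 = Mem[r2]; r3 = r1 + 12, with an assumed 3-cycle load latency.
loop = [
    ("load", "FU1", 3, []),
    ("add",  "FU2", 1, ["load"]),
]
schedule = modulo_schedule(loop, ii=2)
print(schedule)  # {'load': 0, 'add': 3}
```

From such a schedule, the gap between a producer's finish time and its consumer's start time indicates how long a value must be held, which is what drives the register requirements mentioned on the slide.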
8
Multifunction Accelerator
Single hardware accelerator to run multiple loops
Could place single function accelerators side by side
Want to exploit potential hardware sharing between loops
– Function units
– Registers
– Interconnect
9
Multifunction Design Strategies
1. Union Method
2. Phase Ordered Method
10
Union Method
Goal: combine FUs and register files to improve hardware sharing.
[Figure: Accel 1 (FUs +, -, M, M) and Accel 2 (FUs +, +, *, M) are merged into one multifunction accelerator.
Positional union: FUs +, +/-, M/*, M; storage cost 15.
Smart union: FUs +, */-, M/+, M; storage cost 11.]
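A positional union, as the figure on this slide suggests, merges the i-th FU of one accelerator with the i-th FU of the other. The sketch below is an illustration under that reading. Modeling an FU as the set of operations it supports is an assumption, and the sketch covers FUs only, not register files.

```python
from itertools import zip_longest

def positional_union(accel_a, accel_b):
    """Merge FU i of accel_a with FU i of accel_b.

    Each accelerator is a list of FUs; each FU is a set of supported ops.
    The shorter accelerator is padded with empty FUs.
    """
    return [set(a) | set(b)
            for a, b in zip_longest(accel_a, accel_b, fillvalue=frozenset())]

accel1 = [{"+"}, {"-"}, {"M"}, {"M"}]
accel2 = [{"+"}, {"+"}, {"*"}, {"M"}]
merged = positional_union(accel1, accel2)
# merged FUs support: {+}, {+, -}, {M, *}, {M}
print(merged)
```

The merged FUs correspond to the +, +/-, M/*, M row of the figure. Because the pairing depends only on FU order, a different ordering of the same FUs can give a more or less expensive result.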
11
Union Method
Smart union formulated as ILP problem which minimizes FU and register cost
Benefit: Look at whole design at once
Limitation: Schedules are fixed prior to union phase
Fast runtime
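To show what the union is optimizing, the sketch below searches over every pairing of FUs between two accelerators and keeps the cheapest. It swaps the ILP formulation for exhaustive permutation search, which is only practical for a handful of FUs. It also models FU cost alone, whereas the slide's formulation includes register cost. The additive cost model and the per-operation costs are hypothetical.

```python
from itertools import permutations

# Hypothetical per-operation hardware costs; a merged FU costs the sum
# of the costs of all operations it must support.
OP_COST = {"+": 1, "-": 1, "*": 4, "M": 3}

def merged_cost(accel_a, accel_b):
    """Cost of pairing FU i of accel_a with FU i of accel_b."""
    return sum(OP_COST[op] for a, b in zip(accel_a, accel_b)
               for op in set(a) | set(b))

def smart_union(accel_a, accel_b):
    """Try every pairing of FUs and return (cost, merged_fus) for the cheapest."""
    n = max(len(accel_a), len(accel_b))
    a = list(accel_a) + [frozenset()] * (n - len(accel_a))
    b = list(accel_b) + [frozenset()] * (n - len(accel_b))
    best_perm = min(permutations(b), key=lambda p: merged_cost(a, p))
    merged = [set(x) | set(y) for x, y in zip(a, best_perm)]
    return merged_cost(a, best_perm), merged

accel1 = [{"+"}, {"-"}, {"M"}, {"M"}]
accel2 = [{"M"}, {"*"}, {"+"}, {"+"}]   # same FUs as the slide, reordered

positional = merged_cost(accel1, accel2)   # pair FUs in the given order
best_cost, best_fus = smart_union(accel1, accel2)
print(positional, best_cost)  # 17 13
```

With this ordering, pairing by position costs 17 while the best pairing costs 13, because the search matches the two memory units and the two adders with each other.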
12
Cost of Union of Accelerators
[Chart: benchmark groups Image Processing, MPEG4, Signal Processing.]
Worst union: 25% average savings
Positional union: 29% average savings
Best union: 33% average savings
13
Phase Ordered Method
Schedule loops in order
During scheduling, account for hardware from previous loop
Cost sensitive scheduler attempts to minimize hardware cost increase
[Figure: Loop 1 produces Accel 1; Loop 2 is then scheduled against it to produce Accel 1+2.]
14
Cost Sensitive Scheduling
Different valid scheduling alternatives are not equal
[Figure: two alternative schedules on FU1, FU2, FU3 over times 0 to 2, placing the second loop's operations (+2, LD2) differently relative to the first loop's operations (+1, LD1).]
15
Greedy Cost Sensitive Scheduler
Select scheduling alternative with minimum cost
Account for estimated cost of unscheduled ops
[Figure: for Loop 1, the modulo scheduler passes each alternative (Alt i) to a hardware cost modeler, which returns Cost i from the partial hardware for scheduled ops, an estimate for unscheduled ops, and a HW cost library. For Loop 2, the cost modeler also takes the Loop 1 hardware as input.]
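The greedy selection step can be sketched as follows: for each operation, evaluate every FU that still has a free slot and pick the one whose hardware cost increases least, given the hardware already built for earlier loops. This is a heavily simplified illustration. It ignores dependences, timing, registers, and the estimate for unscheduled operations that the slide mentions, and the cost values and function names are hypothetical.

```python
# Hypothetical cost of adding support for an operation to an FU.
OP_COST = {"+": 1, "-": 1, "*": 4, "M": 3}

def incremental_cost(fu_ops, op):
    """Extra hardware needed for this FU to execute `op` (0 if already supported)."""
    return 0 if op in fu_ops else OP_COST[op]

def greedy_assign(hardware, loop_ops, ii):
    """Assign each op to the cheapest FU that still has a free issue slot.

    hardware: list of sets, the ops each existing FU supports.
    Returns (new_hardware, assignment, added_cost).
    """
    hw = [set(fu) for fu in hardware]      # do not mutate the caller's data
    used_slots = [0] * len(hw)             # each FU offers `ii` slots per loop
    assignment, added = [], 0
    for op in loop_ops:
        candidates = [i for i in range(len(hw)) if used_slots[i] < ii]
        if not candidates:
            raise ValueError(f"no free FU slot for {op}")
        best = min(candidates, key=lambda i: incremental_cost(hw[i], op))
        added += incremental_cost(hw[best], op)
        hw[best].add(op)
        used_slots[best] += 1
        assignment.append((op, best))
    return hw, assignment, added

loop1_hw = [{"+"}, {"+"}, {"M"}]           # hardware left by the first loop
new_hw, assignment, added = greedy_assign(loop1_hw, ["+", "M", "*"], ii=1)
print(assignment)  # [('+', 0), ('M', 2), ('*', 1)]
print(added)       # 4
```

The adder and the memory operation reuse existing FUs at no cost, and only the multiply adds hardware. Because each choice is made locally, an early cheap choice can force a costly one later, which is the limitation of greedy selection.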
16
Phase Ordered Method
Extend conventional iterative modulo scheduler with hardware cost model
Benefits:
– Scheduler is aware of hardware for all previously scheduled loops
– Can adjust schedule to improve cost savings
Limitation: process is localized, greedy. Schedules of previous loops are fixed
Fast runtime
17
Cost Sensitive Scheduling Comparison
[Chart: benchmark groups Image Processing, MPEG4, Signal Processing.]
Greedy scheduling: 41% average savings
ILP scheduling: 51% average savings
18
Union vs. Phase Ordered Methods
[Chart: benchmark groups Image Processing, MPEG4, Signal Processing.]
Union method: 45% average savings
Phase ordered method: 41% average savings
19
Conclusion
Compiler-directed design system
Multifunction accelerator for hardware reuse
Two multifunction design methods
– Smart union of single-function accelerators: 45% average cost savings
– Phase ordered scheduling: 41% average cost savings
Overall, 20–61% hardware savings from sharing
20
Questions?