VEAL: Virtualized Execution Accelerator for Loops. Nate Clark (Georgia Tech), Amir Hormati (U. Michigan), Scott Mahlke (U. Michigan).

2 How to get Efficiency? Microarchitecture changes; multi-/many-core; heterogeneity. [Figure labels: STI Cell, Core2 Duo]

3 How is Heterogeneity Used? Program control is statically placed in the binary. [Figure labels: Engineer/Compiler, Program, GPP, Hetero.]

4 Problem With Static Control: not forward/backward compatible. [Figure labels: Engineer/Compiler, Program, CPU, Hetero.]

5 Solution: Virtualization. Abstract accelerator features (reexamine compiler algorithms). Key: do the hard stuff offline. [Figure labels: Engineer/Compiler, Program, Dyn Comp., CPU, Hetero., Offline, Online]

6 This Paper: Examines loops as heterogeneity target (ASICs often implement loops). Designs a generalized loop accelerator (not covered in this talk). Explores how to virtualize loop accelerators, i.e., abstract the accelerator interface.

7 Loop Accelerator Template [Figure not included in transcript]

8 Why More Efficient Than GPP? Simple control flow; decoupled memory accesses; I-cache unnecessary; execution resources customized for loops.

9 Proposed Loop Accelerator: 1 CCA (configurable compute accelerator); 2 Int units; 16 regs; Memory (4x) with 16 input streams and 8 output streams; 0.8 mm², 90 nm.

10 Modulo Scheduling: (+) High-quality software pipelining technique. (+) Simple control structure (low HW cost). (-) Can be slow, i.e., hard to do dynamically. (-) Loops: no side exits, no while loops, must be if-convertible.

11 Benchmark Execution Time [Chart not included in transcript]

12 Modulo Scheduling Basics [Figure labels: Kernel, FU, C]

13 Modulo Scheduling Example. Steps: 1. CCA mapping; 2. II calculation; 3. Priority; 4. Scheduling; 5. Reg. assignment/communication. [Figure: operations 2 to 7 placed on CCA and Int units over times 0 to 2; priority labels: 2, 4, 6 3, 5 7]
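
The steps listed on slide 13 can be illustrated with a small sketch. It is not the scheduler from the talk: it covers only a resource-bound II calculation and a greedy placement into a modulo reservation table. It assumes the operations arrive already sorted by priority with producers before consumers, and it ignores loop-carried dependences and register assignment. The unit counts (1 CCA, 2 Int) come from slide 9; the operation names, dependences, and latencies are made up for illustration.

```python
from math import ceil


def res_mii(ops, units):
    """Resource-constrained lower bound on the initiation interval (II)."""
    counts = {}
    for _, kind in ops:
        counts[kind] = counts.get(kind, 0) + 1
    return max(ceil(n / units[kind]) for kind, n in counts.items())


def try_schedule(ops, preds, latency, units, ii):
    """Greedily place ops (given in priority order) at a fixed II.

    Returns {op name: start time}, or None if some op finds no free slot.
    """
    # Modulo reservation table: units of each kind in use at each cycle mod II.
    mrt = {kind: [0] * ii for kind in units}
    start = {}
    for name, kind in ops:
        # An op may start once all of its producers have finished.
        earliest = max(
            (start[p] + latency[p] for p in preds.get(name, [])), default=0
        )
        # II consecutive cycles cover every row of the reservation table.
        for t in range(earliest, earliest + ii):
            if mrt[kind][t % ii] < units[kind]:
                mrt[kind][t % ii] += 1
                start[name] = t
                break
        else:
            return None
    return start


def modulo_schedule(ops, preds, latency, units, max_ii=32):
    """Try increasing II values, starting at the resource bound."""
    for ii in range(res_mii(ops, units), max_ii + 1):
        start = try_schedule(ops, preds, latency, units, ii)
        if start is not None:
            return ii, start
    raise ValueError("no schedule found up to max_ii")


# Hypothetical loop body: one CCA op fed by an Int op, feeding two Int ops.
units = {"cca": 1, "int": 2}
ops = [("a", "int"), ("b", "cca"), ("c", "int"), ("d", "int")]
preds = {"b": ["a"], "c": ["b"], "d": ["b"]}
latency = {"a": 1, "b": 1, "c": 1, "d": 1}

ii, start = modulo_schedule(ops, preds, latency, units)
print(ii, start)
```

Here three Int operations share two Int units, so the resource bound gives an II of 2, and a new loop iteration can begin every two cycles. A full modulo scheduler would also bound the II by recurrence cycles and would compute the priority order itself, which slide 14 reports as the most expensive step.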

14 Measured Scheduling Overhead: 70% priority, 19% CCA. [Chart not included in transcript]

15 Supporting Hybrid Compilation
Loop: 1 ld; 2 add; 3 sub; and, sub, xor; 5 or; 6 or; 7 add; 8 str
Loop: 1 ld; 2 add; 3 sub; 4 brl CCA; 5 or; 6 or; 7 add; 8 str
CCA: and; sub; xor; ret
Data: 0 1 4 6 3 …
Loop: 1 ld; 2 add; 3 sub; 4 brl CCA; 5 or …
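
One way to read slide 15 is that the static compiler outlines the CCA subgraph (and, sub, xor) into a procedure reached by `brl CCA`, so the same binary still runs on a machine without the accelerator. The sketch below illustrates that reading only. The list-of-strings instruction format, the `translate` function, and the `has_cca` flag are hypothetical, and the transcript does not say how the real translator consumes the outlined procedure or the priority data.

```python
# Instruction sequences copied from slide 15; the encoding is an assumption.
loop = ["ld", "add", "sub", "brl CCA", "or", "or", "add", "str"]
procedures = {"CCA": ["and", "sub", "xor", "ret"]}


def translate(loop, procedures, has_cca):
    """Expand a loop body into (unit kind, payload) pairs for one target.

    With a CCA, each outlined procedure becomes a single CCA operation.
    Without one, its body is expanded into ordinary integer operations,
    which matches what executing the call on a plain CPU would compute.
    """
    out = []
    for op in loop:
        if op.startswith("brl "):
            target = op.split()[1]
            # Drop the return; it only exists to make the call well-formed.
            body = [o for o in procedures[target] if o != "ret"]
            if has_cca:
                out.append(("cca", tuple(body)))
            else:
                out.extend(("int", o) for o in body)
        elif op in ("ld", "str"):
            out.append(("mem", op))
        else:
            out.append(("int", op))
    return out


with_cca = translate(loop, procedures, has_cca=True)
without_cca = translate(loop, procedures, has_cca=False)
print(len(with_cca), len(without_cca))
```

Because the subgraph boundary is already marked in the binary, a translator along these lines would not need to search for CCA candidates at run time, which is the step slide 14 measures at 19% of scheduling overhead.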

16 Speedups [Chart not included in transcript]

17 Summary: Virtualization is key to heterogeneity. VEAL speedup: 2.54 (2.63 w/o translation, i.e., not binary compatible; 2.17 fully dynamic). CCA and priority: 89% of overhead. mpeg2dec: 2.1 vs. 1.15.

18 Thank you! Questions? http://www.cc.gatech.edu/~ntclark http://cccp.eecs.umich.edu/

