1
Sculptor: Flexible Approximation with Selective Dynamic Loop Perforation
Shikai Li, Sunghyun Park, Scott Mahlke
Compilers Creating Custom Processors (CCCP) Research Group, EECS Department, University of Michigan
2
Problem
The end of Dennard scaling and the upcoming end of Moore’s Law
vs.
Data explosion and emerging compute-intensive applications in deep learning, data mining, etc.
Quantum Tunneling through the memory wall, K,Gpmez.
3
Opportunity
Compute-intensive applications in various domains are error-tolerant:
-- Machine Learning
-- Data Mining
-- Image Processing
-- Video Processing
-- Gaming
-- …
Figure: an example image at decreasing quality levels (100%, 95%, 90%, 85%). From Mehrzad Samadi et al., “SAGE: Self-Tuning Approximation for Graphics Engines”.
4
Solution
Rising Computation Demands + Emerging Error-Tolerant Applications
Approximate Computing: trade off output accuracy for performance improvement or energy reduction.
5
Approximate Computing
-- Reduce the amount of computation
-- Replace accurate computation with fuzzy computation
-- Perform computation without correctness guarantees
6
Previous Work
Hardware
Neural Network Accelerator
-- ASIC (H. Esmaeilzadeh, MICRO 2012)
-- Analog (R. St Amant, ISCA 2014)
-- FPGA (T. Moreau, HPCA 2015)
-- GPU (A. Yazdanbakhsh, MICRO 2015)
Approximate Value Prediction
-- J. S. Miguel, MICRO 2014
-- A. Yazdanbakhsh, TACO 2016
Cache and Memory System
-- Doppelgänger Cache (J. S. Miguel, MICRO 2015)
-- Bunker Cache (J. S. Miguel, MICRO 2016)
-- Concise Loads & Stores (A. Jain, MICRO 2016)
Approximate Operation and Storage
-- CPU (H. Esmaeilzadeh, ASPLOS 2012)
-- GPU (D. Wong, HPCA 2016)
Software
Programmer-Assisted Framework
-- Green (W. Baek, PLDI 2010)
-- EnerJ (A. Sampson, PLDI 2011)
Automatic Framework for GPU
-- SAGE (M. Samadi, MICRO 2013)
-- Paraprox (M. Samadi, ASPLOS 2014)
Unleash Parallelism
-- QuickStep (S. Misailovic, TECS 2013)
-- Helix-Up (S. Campanoni, CGO 2015)
Approximation Dynamism
-- M. A. Laurenzano, PLDI 2016
-- S. Mitra, CGO 2017
Task Skipping and Loop Perforation
-- M. Rinard, et al. SC 2016, MIT Tech Report 2009, SAS 2011, FSE 2011
7
Loop Perforation
Loops are transformed to periodically skip subsets of their iterations.
Slide callouts: “Periodically”, “Entirely”
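As a minimal sketch of the idea (not the authors' implementation), the Python snippet below perforates a reduction loop by executing one iteration out of every `rate` and rescaling the partial result. The name `perforated_sum`, the meaning given to `rate`, and the rescaling step are assumptions made for illustration.

```python
def perforated_sum(values, rate):
    """Approximate sum(values) by executing one iteration out of every `rate`."""
    if rate < 1:
        raise ValueError("rate must be at least 1")
    total = 0.0
    executed = 0
    for i in range(0, len(values), rate):  # skipped iterations are never visited
        total += values[i]
        executed += 1
    if executed == 0:
        return 0.0
    # Rescale so the partial sum estimates the full sum.
    return total * len(values) / executed


values = [float(i % 10) for i in range(1000)]
exact = sum(values)
approx = perforated_sum(values, 2)
relative_error = abs(approx - exact) / exact
print(exact, approx, relative_error)
```

With `rate=1` the loop is unchanged; larger rates do less work at the cost of a larger error, which is the trade-off the rest of the talk tunes.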
8
Skipping Different Instructions
Skipping different instructions has different influences on accuracy.
Figure: different final output errors caused by skipping a single instruction at rate 2 inside the kernel loop of Hotspot from Rodinia (instruction categories: Data, Addr, Mem, Cond).
9
Skipping Different Iterations
Skipping different iterations has different influences on accuracy.
Figure: different final output errors caused by skipping a single iteration inside a kernel loop of Bodytrack from PARSEC (axis label: Iteration ID).
10
Optimized Loop Perforation
-- Traditional Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Selective Instruction Loop Perforation
-- Selective Dynamic Loop Perforation
11
System Overview
12
Methodology
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Runtime Error Management
13
Selective Instruction Loop Perforation
Loops are transformed to skip a subset of instructions in each iteration.
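The following sketch illustrates the difference from skipping whole iterations: every iteration still runs, but one chosen instruction is skipped periodically and its last result is reused. The kernel, the choice of `math.exp` as the "selected" instruction, and the stale-value reuse are assumptions for illustration only.

```python
import math


def exact_kernel(xs):
    """Reference loop: every instruction runs in every iteration."""
    return [math.exp(-x * x) * x for x in xs]


def selectively_perforated_kernel(xs, rate):
    """Skip only the selected instruction on all but every `rate`-th iteration."""
    out = []
    weight = 0.0
    for i, x in enumerate(xs):
        if i % rate == 0:
            weight = math.exp(-x * x)  # selected instruction: executed
        # Otherwise the selected instruction is skipped and the stale weight is reused.
        out.append(weight * x)  # non-selected instructions run in every iteration
    return out


xs = [0.01 * i for i in range(200)]
exact = exact_kernel(xs)
approx = selectively_perforated_kernel(xs, 2)
max_abs_error = max(abs(a - e) for a, e in zip(approx, exact))
print(max_abs_error)
```

Because the output still has one element per iteration, the error comes only from the stale value, not from missing results.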
14
Selective Perforation Methodology
-- Instruction Level Selective Perforation
-- Load Based Selective Perforation
-- Store Based Selective Perforation
15
Instruction Level Selective Perforation
1. Selection Stage
2. Expansion Stage
3. Transformation Stage
16
Selection Stage
Selection criteria: Performance Impact
17
Selection Stage
Selection criteria: Performance Impact, Program Corruption
18
Selection Stage
Selection criteria: Performance Impact, Program Corruption, Output Error
Good temporal data similarity: … 101, 102, 103, 104, 105, 106, 107, 108, 109 …
Bad temporal data similarity: … 100, 200, 100, 300, 200, 500, 200, 300, 500 …
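To make the temporal-similarity point concrete, the sketch below estimates the error introduced when a skipped instruction reuses the last value it actually computed. It uses the two value sequences from the slide; the error measure (mean relative error against the true value) and the function name `stale_value_error` are assumptions, not the metric the authors use for selection.

```python
def stale_value_error(samples, rate=2):
    """Mean relative error when only every `rate`-th value is computed and
    the last computed value stands in for the skipped ones."""
    total = 0.0
    last = samples[0]
    for i, actual in enumerate(samples):
        if i % rate == 0:
            last = actual  # value is computed in this iteration
        total += abs(last - actual) / abs(actual)
    return total / len(samples)


good = [101, 102, 103, 104, 105, 106, 107, 108, 109]
bad = [100, 200, 100, 300, 200, 500, 200, 300, 500]

print(stale_value_error(good))  # small: neighbouring values are similar
print(stale_value_error(bad))   # large: neighbouring values differ a lot
```

An instruction whose results look like the first sequence is a much safer perforation candidate than one whose results look like the second.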
25
Expansion Stage
Perforate more instructions without additional output error:
-- Instructions that only use results of perforated instructions or loop invariants.
-- Instructions whose results are only used by perforated instructions.
31
Transformation Stage
Reduce control divergence overhead with compiler optimizations.
32
Transformation Stage
-- Instruction Re-ordering
33
Transformation Stage
-- Instruction Re-ordering
-- Loop Unswitching
34
Transformation Stage
-- Instruction Re-ordering
-- Loop Unswitching
-- Loop Unrolling
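The slides name the optimizations without showing code, so the sketch below only illustrates the general idea behind one of them, loop unrolling: for a fixed skip rate of 2, unrolling by two removes the per-iteration "skip or execute?" branch. Both functions and the callback `f` are hypothetical and written for illustration.

```python
def branchy(xs, f):
    """Selective perforation at rate 2 with a skip check in every iteration."""
    out = []
    cached = 0.0
    for i, x in enumerate(xs):
        if i % 2 == 0:          # control divergence: evaluated every iteration
            cached = f(x)
        out.append(cached + x)
    return out


def unrolled(xs, f):
    """Same result, unrolled by two so the loop body contains no skip check."""
    out = []
    n = len(xs)
    i = 0
    while i + 1 < n:
        cached = f(xs[i])               # executed copy of the instruction
        out.append(cached + xs[i])
        out.append(cached + xs[i + 1])  # perforated copy reuses the result
        i += 2
    if i < n:                           # remainder iteration for odd lengths
        out.append(f(xs[i]) + xs[i])
    return out


def square(x):
    return x * x


even_input = [float(v) for v in range(10)]
odd_input = [float(v) for v in range(11)]
print(unrolled(even_input, square) == branchy(even_input, square))
print(unrolled(odd_input, square) == branchy(odd_input, square))
```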
35
Methodology
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Runtime Error Management
36
Dynamic Iteration Loop Perforation
Loops are transformed to skip a flexible subset of iterations during program execution.
37
Dynamic Perforation Methodology
-- Dynamic Perforation Rate
-- Dynamic Start Point
38
Dynamic Rate
Adapt approximation aggressiveness by changing skip rates under different circumstances during program execution.
39
Active Function Call Based Dynamic Rate
Loop executions tend to have different accuracy impacts during different function calls.
40
Active Loop Iteration Based Dynamic Rate
Loop executions tend to have different accuracy impacts during different “outer-loop” iterations.
(Figure; axis label: Iteration ID)
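A dynamic rate can be sketched as a lookup keyed by the current execution context. In the hypothetical example below the key is the outer-loop iteration number; an identifier for the active function call could serve as the key in the same way. The table contents and the names `rate_for` and `count_executed` are assumptions, and how such a table would be derived is not shown here.

```python
def rate_for(context_id, rate_table, default_rate=1):
    """Pick the skip rate for the current context; unknown contexts run exactly."""
    return rate_table.get(context_id, default_rate)


def count_executed(data, outer_iterations, rate_table):
    """Run a perforated inner loop whose rate depends on the outer iteration."""
    executed = 0
    for outer in range(outer_iterations):
        rate = rate_for(outer, rate_table)
        for _ in range(0, len(data), rate):
            executed += 1  # stands in for the real inner-loop body
    return executed


data = list(range(100))
# Hypothetical table: early outer iterations tolerate more aggressive skipping.
rate_table = {0: 4, 1: 2}
print(count_executed(data, 3, rate_table))
```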
41
Dynamic Start
-- Coverage guarantees that each iteration is executed at least once.
-- Fairness gives each iteration an equal chance of being executed.
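One simple way to obtain both properties, shown here as an assumption rather than as the authors' scheme, is to rotate the starting offset of the perforated loop on every invocation. Over `rate` consecutive invocations each iteration is then executed exactly once.

```python
from collections import Counter


def executed_indices(n, rate, invocation):
    """Iterations executed on a given invocation when the start point rotates."""
    start = invocation % rate
    return list(range(start, n, rate))


n, rate = 10, 3
counts = Counter()
for invocation in range(rate):
    counts.update(executed_indices(n, rate, invocation))

print(executed_indices(n, rate, 0))
print(executed_indices(n, rate, 1))
print(sorted(counts.items()))
```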
42
Methodology
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
-- Runtime Error Management
43
Runtime Error Management
A calibration-based aggressiveness adjustment mechanism to perform error management at runtime.
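The slide does not spell out the mechanism, so the following is only a sketch of what calibration-based adjustment could look like: on selected invocations the exact result is also computed, the error is measured, and the skip rate is lowered or raised accordingly. The thresholds, the calibration interval, and every name in the snippet are hypothetical.

```python
def relative_error(approx, exact):
    """Relative error, falling back to absolute error when exact is zero."""
    return abs(approx - exact) / abs(exact) if exact != 0 else abs(approx)


def adjust_rate(rate, error, budget, max_rate=8):
    """Back off when over budget, get more aggressive when well under it."""
    if error > budget:
        return max(1, rate - 1)
    if error < 0.5 * budget:
        return min(max_rate, rate + 1)
    return rate


def run_with_calibration(inputs, budget, calibrate_every=4):
    """Process each input approximately and recalibrate every few invocations."""
    rate = 1
    history = []
    for n, values in enumerate(inputs):
        sampled = values[::rate]
        approx = sum(sampled) * len(values) / len(sampled)
        if n % calibrate_every == 0:
            exact = sum(values)  # extra exact run, paid only when calibrating
            rate = adjust_rate(rate, relative_error(approx, exact), budget)
        history.append(rate)
    return history


inputs = [[1.0] * 100 for _ in range(12)]
print(run_with_calibration(inputs, budget=0.10))
```

On this constant input the measured error is always zero, so the rate climbs by one at each calibration point; noisier inputs would push it back down.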
44
Evaluation
Evaluation benchmarks:
-- 7 benchmarks from PARSEC
-- 2 additional benchmarks from Rodinia
Error metric:
-- Most error metrics are based on relative mean error
-- Clustering applications use the NMI score as the error metric
Evaluation platform:
-- LLVM 4.0
-- Clang 4.0, -O3
-- Ubuntu 16.04
-- Intel Skylake i GHz
45
Selective & Dynamic Perforation
Figure: Selective Dynamic Loop Perforation speedup with different error budgets (left: 5%, right: 10%)
-- 5% error budget: average speedup improved from 1.47x to 2.89x
-- 10% error budget: average speedup improved from 1.93x to 4.07x
46
Selective / Dynamic Loop Perforation
-- Selective Loop Perforation speedup with an error budget of 10%: average speedup 2.62x, compared to 4.07x for Selective Dynamic Loop Perforation
-- Dynamic Loop Perforation speedup with an error budget of 10%: average speedup 2.91x, compared to 4.07x for Selective Dynamic Loop Perforation
47
Conclusion
Motivation:
-- Space Limitation in Loop Perforation
-- Time Limitation in Loop Perforation
Methodology:
-- Selective Instruction Loop Perforation
-- Dynamic Iteration Loop Perforation
Evaluation:
-- Average speedup 1.93x -> 4.07x
48
Q & A
49
Thank you!