1
An Evaluation of Staged Run-Time Optimizations in DyC
Brian Grant, Matthai Philipose, Markus Mock, Craig Chambers, and Susan J. Eggers
Presented by: Franjo Plavec
October 16, 2002
2
Agenda
- Motivation
- DyC System Overview
- DyC’s Run-Time Optimizations
- Performance Analysis
- Pros and Cons
3
Motivation
- Dynamic compilation improves the performance of small kernels
- Scale to larger programs (e.g. a benchmark suite)
- Evaluate the contribution of each optimization to performance
4
What is DyC?
- Selective, value-specific dynamic compilation system
- Run-time constants
- Targets complex C programs
- Declarative, annotation-based system
- Staged optimization
- Low overhead
5
DyC System Overview
[Diagram: DyC pipeline]
- Static compile time: Annotated Program Source, Traditional Optimizations, Binding-Time Analysis (BTA), Dynamic-Compiler Generator, Statically Generated Code
- Run time: Program Input, Dynamic Compiler, Dynamically Generated Code
6
Dynamic-to-Static Promotions and Polyvariant Specialization
- Specialization: generate multiple versions of the code, each specialized to different values of the static variables
- A variable treated this way is said to be promoted from dynamic to static
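The idea of keeping one specialized version per static value can be sketched in plain C. This is only an illustration, not DyC's mechanism: DyC emits machine code, whereas here a "version" is a recorded list of square/multiply steps for one exponent. The names `Version`, `specialize`, `lookup`, and `power` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STEPS 64
#define MAX_VERSIONS 8

/* Hypothetical stand-in for generated code: the square-and-multiply
   steps for one fixed exponent, planned once and replayed afterwards. */
typedef struct {
    uint32_t exponent;        /* the promoted (static) value */
    int nsteps;
    char steps[MAX_STEPS];    /* 'S' = square, 'M' = multiply by x */
} Version;

static Version cache[MAX_VERSIONS];
static int versions_built = 0;

/* Stand-in for the dynamic compiler: runs once per distinct exponent. */
static Version *specialize(uint32_t exponent) {
    assert(versions_built < MAX_VERSIONS);
    Version *v = &cache[versions_built++];
    v->exponent = exponent;
    v->nsteps = 0;
    int top = 31;
    while (top >= 0 && !((exponent >> top) & 1u)) top--;
    for (int bit = top - 1; bit >= 0; bit--) {   /* bits below the leading 1 */
        v->steps[v->nsteps++] = 'S';
        if ((exponent >> bit) & 1u) v->steps[v->nsteps++] = 'M';
    }
    return v;
}

/* Dispatch: find the version for this value, building it on a miss. */
static Version *lookup(uint32_t exponent) {
    for (int i = 0; i < versions_built; i++)
        if (cache[i].exponent == exponent) return &cache[i];
    return specialize(exponent);
}

/* x raised to a promoted exponent, using the cached version. */
double power(double x, uint32_t exponent) {
    const Version *v = lookup(exponent);
    if (exponent == 0) return 1.0;
    double r = x;
    for (int i = 0; i < v->nsteps; i++)
        r = (v->steps[i] == 'S') ? r * r : r * x;
    return r;
}
```

Calling `power` twice with the same exponent reuses one version; a new exponent adds a second version alongside the first, which is the polyvariant part.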
7
Internal Promotions
- Promotion can occur in the middle of a dynamic region
- Enables multi-stage specialization
8
Unchecked Dispatching
- Every execution of a promotion point triggers a code cache lookup
- If the programmer annotates that a static variable doesn’t change ⇒ no need for a cache lookup
- Policy: cache-one-unchecked
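A rough sketch of the difference between a checked and an unchecked single-entry cache follows. It is an assumption-laden model, not DyC code: the struct `OneEntryCache` and the functions `run_checked` and `run_unchecked` are hypothetical, and the "specialized code" is just a stored multiplier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool filled;
    int key;       /* static value the cached version was built for */
    int lookups;   /* key comparisons performed so far */
    int builds;    /* times a version was (re)built */
} OneEntryCache;

static void build_version(OneEntryCache *c, int key) {
    c->filled = true;
    c->key = key;
    c->builds++;
}

/* Checked policy: compare the key on every entry to the region. */
int run_checked(OneEntryCache *c, int key, int x) {
    assert(c != NULL);
    c->lookups++;
    if (!c->filled || c->key != key) build_version(c, key);
    return x * c->key;
}

/* Unchecked policy: build once, then trust the programmer's promise
   that the static value never changes. */
int run_unchecked(OneEntryCache *c, int key, int x) {
    assert(c != NULL);
    if (!c->filled) build_version(c, key);
    return x * c->key;
}
```

If the promise is broken, `run_unchecked` silently keeps using the version built for the old value, which is the risk the annotation shifts onto the programmer.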
9
Complete Loop Unrolling
- Polyvariant specialization can result in complete loop unrolling
- Single-way loop unrolling: simple loops
- Multi-way loop unrolling: more complex loops (e.g. loop induction variables change inside the loop body)
- The loop is eliminated, not enlarged
- Enables additional constant-folding and branch-folding optimizations
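To make single-way unrolling concrete, here is a hand-written before/after pair in C. The second function is what specialization might leave behind if the vector `a` and its length were static; the values {0, 1, 0, 3} and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* General version: the loop test and the loads of a[] run on every call. */
int dot_general(const int *a, const int *b, size_t n) {
    assert(a != NULL && b != NULL);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-written picture of the code after complete unrolling for the
   hypothetical static inputs a = {0, 1, 0, 3}, n = 4: the induction
   variable, the loop branch, and the loads of a[] are all gone. */
int dot_unrolled_0103(const int *b) {
    assert(b != NULL);
    int sum = 0;
    sum += 0 * b[0];
    sum += 1 * b[1];
    sum += 0 * b[2];
    sum += 3 * b[3];
    return sum;
}
```

The multiplications by 0 and 1 are left in place on purpose; removing them is the job of the later zero and copy propagation stage.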
10
Polyvariant Division
- The same program point may be analyzed multiple times
- Each time, a different set of variables is assumed static
- Programmer can annotate conditional specialization
11
Static Loads and Calls
- Loads from invariant parts of memory are treated as static computations
- Pure functions can be annotated as static
- An invocation of a static function with all-static arguments is treated as a static computation
12
Strength Reduction
- Applies to multiply, divide, and modulus operations with a single static operand
- Fits integer static operands into instruction immediate fields
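One common form of strength reduction, replacing a multiply or modulus by a power of two with a shift or mask, can be sketched as below. Which reductions DyC actually applies is not stated on the slide, so treat the choice of cases as an assumption; `MulPlan`, `plan_multiply`, `run_multiply`, and `mod_static` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_SHIFT, OP_MUL } MulKind;
typedef struct { MulKind kind; uint32_t operand; } MulPlan;

static int is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

static uint32_t log2_exact(uint32_t v) {
    uint32_t s = 0;
    while (v > 1) { v >>= 1; s++; }
    return s;
}

/* Decide once, when the static operand k becomes known. */
MulPlan plan_multiply(uint32_t k) {
    MulPlan p;
    if (is_pow2(k)) { p.kind = OP_SHIFT; p.operand = log2_exact(k); }
    else            { p.kind = OP_MUL;   p.operand = k; }
    return p;
}

/* Execute the planned operation on the dynamic operand x. */
uint32_t run_multiply(MulPlan p, uint32_t x) {
    return p.kind == OP_SHIFT ? x << p.operand : x * p.operand;
}

/* Unsigned modulus by a static k: a mask when k is a power of two. */
uint32_t mod_static(uint32_t x, uint32_t k) {
    assert(k != 0);
    return is_pow2(k) ? (x & (k - 1)) : (x % k);
}
```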
13
Staged Dynamic Zero and Copy Propagation, and Dead-Assignment Elimination
- Multiply by 1 ⇒ move
- Multiply by 0 ⇒ clear
- Dead-assignment elimination: if the value is propagated downstream, the original instruction (i.e. the move or clear) can be removed
- The planning stage is done at static compile time
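The effect of these three optimizations on a weighted sum can be imitated with a tiny emitter that inspects the static weights once and records only the operations that survive. This is a sketch under simplifying assumptions: the plan is built at set-up time in ordinary C rather than staged across compile time and run time, and `Op`, `emit_weighted_sum`, and `run_weighted_sum` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EMIT_LOAD_ADD, EMIT_LOAD_MULADD } OpKind;
typedef struct { OpKind kind; size_t index; float weight; } Op;

/* Walk the static weights once and keep only the surviving work. */
size_t emit_weighted_sum(const float *weights, size_t n, Op *out) {
    assert(weights != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* weight 0: the product is a clear; once the zero is propagated
           into the sum, the load, the clear, and the add are all dead. */
        if (weights[i] == 0.0f) continue;
        out[count].index = i;
        out[count].weight = weights[i];
        /* weight 1: the product is a move, so the add can use x directly. */
        out[count].kind = (weights[i] == 1.0f) ? EMIT_LOAD_ADD
                                               : EMIT_LOAD_MULADD;
        count++;
    }
    return count;
}

/* Replay the surviving operations against the dynamic inputs x[]. */
float run_weighted_sum(const Op *ops, size_t count, const float *x) {
    float sum = 0.0f;
    for (size_t i = 0; i < count; i++) {
        float v = x[ops[i].index];
        sum += (ops[i].kind == EMIT_LOAD_ADD) ? v : v * ops[i].weight;
    }
    return sum;
}
```

With weights {0, 1, 0, 2.5}, only two of the four terms remain, and one of them needs no multiply.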
14
Example
    void do_convol (float image [][], int irows, int icols,
                    float cmatrix [][], int crows, int ccols,
                    float outbuf [][])
    {
      float x, sum, weighted_x, weight;
      int crow, ccol, irow, icol, rowbase, colbase, crowso2, ccolso2;

      /* Annotation: specialize on the convolution matrix and its loop indices */
      make_static (cmatrix, crows, ccols, crow, ccol);

      crowso2 = crows/2;
      ccolso2 = ccols/2;
      for (irow=0; irow < irows; ++irow) {
        rowbase = irow-crowso2;
        for (icol=0; icol < icols; ++icol) {
          colbase = icol-ccolso2;
          sum = 0.0;
          for (crow=0; crow<crows; ++crow) {
            for (ccol=0; ccol<ccols; ++ccol) {
              weight = cmatrix @[crow] @[ccol];   /* @ marks a static load */
              x = image[rowbase+crow][colbase+ccol];
              weighted_x = x * weight;
              sum = sum + weighted_x;
            }
          }
          outbuf [irow][icol] = sum;
        }
      }
    }
15
Example – Partially Optimized
    for (irow=0; irow < irows; ++irow) {
      rowbase = irow-1;
      for (icol=0; icol < icols; ++icol){
        colbase = icol-1;
        sum = 0.0;
        x = image[rowbase+0][colbase+0];   // Iteration 0: crow=0, ccol=0
        weighted_x = x * 0.0;
        sum = sum + weighted_x;
        x = image[rowbase+0][colbase+1];   // Iteration 1: crow=0, ccol=1
        weighted_x = x * 1.0;
        sum = sum + weighted_x;
        x = image[rowbase+0][colbase+2];   // Iteration 2: crow=0, ccol=2
        x = image[rowbase+1][colbase+0];   // Iteration 3: crow=1, ccol=0
        …                                  // Iterations 4-8
        outbuf [irow][icol] = sum;
    }}}
16
Example – Partially Optimized
Same code as on the previous slide, with the first two iterations simplified step by step:
    x = image[rowbase][colbase];       // Iteration 0: crow=0, ccol=0
    weighted_x = 0.0;                  // was: weighted_x = x * 0.0
    sum = 0.0;                         // was: sum = sum + weighted_x
    x = image[rowbase][colbase+1];     // Iteration 1: crow=0, ccol=1
    sum = x;                           // was: weighted_x = x * 1.0; sum = sum + weighted_x
17
Example – Fully Optimized
    for (irow=0; irow < irows; ++irow) {
      rowbase = irow-1;
      for (icol=0; icol < icols; ++icol){
        colbase = icol-1;
        // Iteration 0: crow=0, ccol=0
        // All code eliminated
        x = image[rowbase][colbase+1];   // Iteration 1: crow=0, ccol=1
        sum = x;
        // Iteration 2: crow=0, ccol=2
        x = image[rowbase+1][colbase];   // Iteration 3: crow=1, ccol=0
        sum = sum + x;
        …                                // Iterations 4-8
        outbuf [irow][icol] = sum;
    }}}
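A self-contained C version of the same idea, small enough to check by hand, is sketched below. Several things are assumptions rather than content of the slides: the fixed 4x4 image size, the restriction to interior pixels (the slide code does not show boundary handling), and the kernel entries beyond the four weights the slides reveal (0, 1, 0 in the first row and 1 at the start of the second).

```c
#include <assert.h>

#define ROWS 4
#define COLS 4

/* Hypothetical 3x3 convolution matrix; only the first four entries
   are implied by the slides, the rest are made up. */
static const float kernel[3][3] = {
    {0.0f, 1.0f, 0.0f},
    {1.0f, 0.0f, 1.0f},
    {0.0f, 1.0f, 0.0f},
};

/* General version: loads every weight and multiplies every pixel. */
void convolve_general(float img[ROWS][COLS], float out[ROWS][COLS]) {
    assert(ROWS >= 3 && COLS >= 3);
    for (int r = 1; r < ROWS - 1; r++)
        for (int c = 1; c < COLS - 1; c++) {
            float sum = 0.0f;
            for (int kr = 0; kr < 3; kr++)
                for (int kc = 0; kc < 3; kc++)
                    sum += img[r - 1 + kr][c - 1 + kc] * kernel[kr][kc];
            out[r][c] = sum;
        }
}

/* Hand-written picture of the fully optimized code for this kernel:
   inner loops unrolled, zero weights dropped, unit weights turned
   into plain adds. */
void convolve_specialized(float img[ROWS][COLS], float out[ROWS][COLS]) {
    for (int r = 1; r < ROWS - 1; r++)
        for (int c = 1; c < COLS - 1; c++) {
            float sum = img[r - 1][c];
            sum += img[r][c - 1];
            sum += img[r][c + 1];
            sum += img[r + 1][c];
            out[r][c] = sum;
        }
}
```

For this kernel the specialized body performs four loads and three adds per pixel instead of nine loads, nine multiplies, and nine adds.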
18
Image Convolution
[Figure: Sharpen and Emboss convolution examples; legible kernel entries: -1, 5, 1]
19
Performance Analysis
Workload: Applications
- dinero: cache simulator
- m88ksim: Motorola 88000 simulator
- mipsi: MIPS R3000 simulator
- pnmconvol: image convolution
- viewperf: renderer
20
Performance Analysis
Workload: Kernels
- binary: binary search over an array
- chebyshev: polynomial function approximation
- dotproduct: dot product of two vectors
- query: tests a database entry for a match
- romberg: function integration by iteration
21
Performance Analysis
System
- DEC Alpha based workstation, 1.5 GB RAM, lightly loaded
- Multiflow compiler, comparable to gcc -O2
22
Performance Analysis – Static Variables
Program    | Annotated Static Variables          | Values of Static Variables
dinero     | cache configuration                 | 8kB I/D, direct-mapped, 32B blocks
m88ksim    | an array of breakpoints             | no breakpoints
mipsi      | input program                       | bubble sort
pnmconvol  | convolution matrix                  | 11x11 with 9% ones, 83% zeros
viewperf   | 3D projection matrix, lighting vars | perspective matrix, one light source
binary     | input array and its contents        | 16 integers
chebyshev  | the degree of the polynomial        | 10
dotproduct | contents of one of the vectors      | a 100-integer array with 90% zeros
query      | a query                             | 7 comparisons
romberg    | the iteration bound                 | 6
23
Measurement results
Program               | Asymptotic Speedup | Break-Even Point                       | Overhead (cycles per instruction generated) | Instr. Generated
dinero                | 1.7                | 1 invocation (3524 memory references)  | 334                                         | 634
m88ksim               | 3.7                | 28 breakpoint checks                   | 365                                         | 6
mipsi                 | 5.0                | 1 invocation ( instructions)           | 207                                         | 36614
pnmconvol             | 3.1                | 1 invocation (59 pixels)               | 110                                         | 2394
viewperf: project&clip | 1.3               | 16 invocations                         | 823                                         | 122
viewperf: shade       | 1.2                |                                        | 524                                         | 618
binary                | 1.8                | 836 searches                           | 72                                          | 304
chebyshev             | 6.3                | 2 interpolations                       | 31                                          | 807
dotproduct            | 5.7                | 6 dot products                         | 85                                          | 50
query                 | 1.4                | 259 database entry comparisons         | 53                                          | 71
romberg               |                    | 16 integrations                        | 13                                          | 1206
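The break-even point can be related to the other columns with a little arithmetic. The sketch below assumes the Overhead column means cycles spent per generated instruction, so that the one-time compilation cost is overhead times instructions generated, and it defines break-even as the smallest number of executions at which the dynamically compiled version is no slower overall. The per-execution cycle counts passed to `break_even` are hypothetical inputs, not figures from the table, and both function names are made up.

```c
#include <assert.h>
#include <limits.h>

/* One-time cost of dynamic compilation, in cycles. */
unsigned long long compile_overhead(unsigned long long cycles_per_instr,
                                    unsigned long long instrs_generated) {
    assert(instrs_generated == 0 ||
           cycles_per_instr <= ULLONG_MAX / instrs_generated);
    return cycles_per_instr * instrs_generated;
}

/* Smallest n with overhead + n * dynamic_cycles <= n * static_cycles.
   Returns ULLONG_MAX when the dynamic version saves nothing per
   execution, so the overhead is never recovered. */
unsigned long long break_even(unsigned long long overhead,
                              unsigned long long static_cycles,
                              unsigned long long dynamic_cycles) {
    if (dynamic_cycles >= static_cycles) return ULLONG_MAX;
    unsigned long long saved = static_cycles - dynamic_cycles;
    return (overhead + saved - 1) / saved;   /* ceiling division */
}
```

For dinero's row, `compile_overhead(334, 634)` gives 211756 cycles; how many executions that takes to recover depends on the per-execution saving, which the slide does not report.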
24
Measurement results
Program   | Asymptotic Speedup | Execution Time in the Dynamic Regions (% of total static execution) | Average Whole Program Speedup
dinero    | 1.7                | 49.9                                                                 | 1.5
m88ksim   | 3.7                | 9.8                                                                  | 1.05
mipsi     | 5.0                | ~ 100                                                                | 4.6
pnmconvol | 3.1                | 83.8                                                                 | 3.0
viewperf  | 1.3                | 41.4                                                                 | 1.02
26
Pros and Cons
Pros
- Performance improvement
- Low overhead (staged)
Cons
- Annotation is time-consuming
- Programmer has to statically predict which variables will be static
- What if the programmer is wrong?
- Overhead of all the optimizations?